[Nemeth10] 8.8. Logical volume management

Stefen 2011-01-26



Imagine a world in which you don’t know exactly how large a partition needs to be. Six months after creating the partition, you discover that it is much too large, but that a neighboring partition doesn’t have enough space... Sound familiar? A logical volume manager lets you reallocate space dynamically from the greedy partition to the needy partition.

Logical volume management is essentially a supercharged and abstracted version of disk partitioning. It groups individual storage devices into “volume groups.” The blocks in a volume group can then be allocated to “logical volumes,” which are represented by block device files and act like disk partitions.

However, logical volumes are more flexible and powerful than disk partitions. Here are some of the magical operations a volume manager lets you carry out:

Move logical volumes among different physical devices
Grow and shrink logical volumes on the fly
Take copy-on-write “snapshots” of logical volumes
Replace on-line drives without interrupting service
Incorporate mirroring or striping in your logical volumes

The components of a logical volume can be put together in various ways. Concatenation keeps each device’s physical blocks together and lines the devices up one after another. Striping interleaves the components so that adjacent virtual blocks are actually spread over multiple physical disks. By reducing single-disk bottlenecks, striping can often provide higher bandwidth and lower latency.
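The difference between the two layouts comes down to simple address arithmetic. The following is a minimal sketch, not how any particular volume manager implements it; the function names are hypothetical, and it assumes equal-sized disks and a stripe unit of one block.

```shell
# Map a virtual block number to "disk physical_block" under two layouts.
# Assumptions (illustrative only): all disks are the same size and the
# stripe unit is a single block.

# concat_map VBLOCK BLOCKS_PER_DISK
# Concatenation: fill disk 0 completely, then disk 1, and so on.
concat_map() {
    echo "$(( $1 / $2 )) $(( $1 % $2 ))"
}

# stripe_map VBLOCK NUM_DISKS
# Striping: adjacent virtual blocks rotate across the disks.
stripe_map() {
    echo "$(( $1 % $2 )) $(( $1 / $2 ))"
}

# Virtual blocks 0-3 on three disks of 1000 blocks each
for vb in 0 1 2 3; do
    echo "vblock $vb: concat=($(concat_map "$vb" 1000)) stripe=($(stripe_map "$vb" 3))"
done
```

Under concatenation, blocks 0 through 3 all land on disk 0. Under striping they land on disks 0, 1, 2, and 0 again, which is why sequential I/O can be serviced by several spindles at once.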

LVM implementations

All our example systems support logical volume management, and with the exception of Solaris’s ZFS, the systems are all quite similar.

In addition to ZFS, Solaris supports a previous generation of LVM called the Solaris Volume Manager, formerly Solstice DiskSuite. This volume manager is still supported, but new deployments should use ZFS.

Linux’s volume manager, called LVM2, is essentially a clone of HP-UX’s volume manager, which is itself based on software by Veritas. The commands for the two systems are essentially identical, but we show examples for both systems because their ancillary commands are somewhat different. AIX’s system has similar abstractions but different command syntax. Table 8.4 illustrates the parallels among these three systems.

Table 8.4. Comparison of LVM commands
	Operation	Linux	HP-UX	AIX
Physical vol	Create	pvcreate	pvcreate	–
	Inspect	pvdisplay	pvdisplay	lspv
	Modify	pvchange	pvchange	chpv
	Check	pvck	pvck	–
Volume group	Create	vgcreate	vgcreate	mkvg
	Modify	vgchange	vgchange	chvg
	Extend	vgextend	vgextend	extendvg
	Inspect	vgdisplay	vgdisplay	lsvg
	Check	vgck	–	–
	Enable	vgscan	vgscan	varyonvg
Logical vol	Create	lvcreate	lvcreate	mklv
	Modify	lvchange	lvchange	chlv
	Resize	lvresize	lvextend, lvreduce	extendlv
	Inspect	lvdisplay	lvdisplay	lslv

In addition to commands that deal with volume groups and logical volumes, Table 8.4 also shows a couple of commands that relate to “physical volumes.” A physical volume is a storage device that has had an LVM label applied; applying such a label is the first step to using the device through the LVM. Linux and HP-UX use pvcreate to apply a label, but AIX’s mkvg does it automatically. In addition to bookkeeping information, the label includes a unique ID to identify the device.

“Physical volume” is a somewhat misleading term because physical volumes need not have a direct correspondence to physical devices. They can be disks, but they can also be disk partitions or RAID arrays. The LVM doesn’t care.

Linux logical volume management

You can control Linux’s LVM implementation (LVM2) with either a large group of simple commands (the ones illustrated in Table 8.4) or with the single lvm command and its various subcommands. These options are for all intents and purposes identical; in fact, the individual commands are really just links to lvm, which looks to see how it’s been called to know how to behave. man lvm is a good introduction to the system and its tools.

A Linux LVM configuration proceeds in a few distinct phases:

Creating (defining, really) and initializing physical volumes
Adding the physical volumes to a volume group
Creating logical volumes on the volume group

LVM commands start with letters that make it clear at which level of abstraction they operate: pv commands manipulate physical volumes, vg commands manipulate volume groups, and lv commands manipulate logical volumes. A few commands with the prefix lvm (e.g., lvmchange) operate on the system as a whole.
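The naming convention is regular enough to express as a pattern match. This is only an illustrative sketch (the lvm_level function is hypothetical, not part of LVM2); note that the lvm prefix has to be tested before the shorter lv prefix.

```shell
# lvm_level COMMAND: report which layer an LVM2 command operates on,
# judged purely by its name prefix.
lvm_level() {
    case "$1" in
        lvm*) echo "system" ;;           # must be tested before the lv* pattern
        pv*)  echo "physical volume" ;;
        vg*)  echo "volume group" ;;
        lv*)  echo "logical volume" ;;
        *)    echo "unknown" ;;
    esac
}

for cmd in pvcreate vgextend lvresize lvmchange; do
    echo "$cmd: $(lvm_level "$cmd")"
done
```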

In the following example, we set up the /dev/md0 RAID 5 device we created on page 243 for use with LVM and create a logical volume. Since striping and redundancy have already been addressed by the underlying RAID configuration, we won’t make use of the corresponding LVM2 features, although they exist.

$ sudo pvcreate /dev/md0
Physical volume "/dev/md0" successfully created

Our physical device is now ready to be added to a volume group:

$ sudo vgcreate DEMO /dev/md0
Volume group "DEMO" successfully created

Although we’re using only a single physical device in this example, we could of course add additional devices. In this case, it would be strange to add anything but another RAID 5 array since there is no benefit to partial redundancy. DEMO is an arbitrary name that we’ve selected.

To step back and examine our handiwork, we use the vgdisplay command:

$ sudo vgdisplay DEMO
--- Volume group ---
VG Name                DEMO
System ID
Format                 lvm2
Metadata Areas         1
Metadata Sequence No   1
VG Access              read/write
VG Status              resizable
MAX LV                 0
Cur LV                 0
Open LV                0
Max PV                 0
Cur PV                 1
Act PV                 1
VG Size                975.99 GB
PE Size                4.00 MB
Total PE               249854
Alloc PE / Size        0 / 0
Free  PE / Size        249854 / 975.99 GB
VG UUID                NtbRLu-RqiQ-3Urt-iQZn-vEvJ-u0Th-FVYKWF

A “PE” is a physical extent, the allocation unit according to which the volume group is subdivided.
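The figures in the vgdisplay output are related by straightforward extent arithmetic. As a quick check (a sketch using only the values shown above; the helper names are hypothetical), the following reproduces the volume group size and works out how many extents a 100GB volume occupies.

```shell
# Extent arithmetic for the DEMO volume group (values from vgdisplay).
pe_size_mb=4
total_pe=249854

# Total capacity in MB: number of extents times the extent size.
vg_mb=$(( total_pe * pe_size_mb ))

# mb_to_gb MB: print MB as GB with two decimals (truncated), integer math only.
mb_to_gb() {
    hundredths=$(( $1 * 100 / 1024 ))
    printf '%d.%02d\n' $(( hundredths / 100 )) $(( hundredths % 100 ))
}

# extents_for_gb GB PE_SIZE_MB: extents needed for a volume of GB gigabytes.
extents_for_gb() {
    echo $(( $1 * 1024 / $2 ))
}

echo "VG size: $(mb_to_gb "$vg_mb") GB"
echo "Extents in a 100 GB volume: $(extents_for_gb 100 "$pe_size_mb")"
```

With 4MB extents, a 100GB logical volume consumes 25,600 of the group's 249,854 extents.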

The final steps are to create the logical volume within DEMO and then to create a filesystem within that volume. We make the logical volume 100GB in size:

$ sudo lvcreate -L 100G -n web1 DEMO
Logical volume "web1" created

Most of LVM2’s interesting options live at the logical volume level. That’s where striping, mirroring, and contiguous allocation would be requested if we were using those features.

We can now access the volume through the device /dev/DEMO/web1. We discuss filesystems in general starting on page 254, but here is a quick overview of creating a standard filesystem so that we can demonstrate a few additional LVM tricks.

$ sudo mkfs /dev/DEMO/web1
...
$ sudo mkdir /mnt/web1
$ sudo mount /dev/DEMO/web1 /mnt/web1

Volume snapshots

You can create copy-on-write duplicates of any LVM2 logical volume, whether or not it contains a filesystem. This feature is handy for creating a quiescent image of a filesystem to be backed up on tape, but unlike ZFS snapshots, LVM2 snapshots are unfortunately not very useful as a general method of version control.

The problem is that logical volumes are of fixed size. When you create one, storage space is allocated for it up front from the volume group. A copy-on-write duplicate initially consumes no space, but as blocks are modified, the volume manager must find space in which to store both the old and new versions. This space for modified blocks must be set aside when you create the snapshot, and like any LVM volume, the allocated storage is of fixed size.

Note that it does not matter whether you modify the original volume or the snapshot (which by default is writable). Either way, the cost of duplicating the blocks is charged to the snapshot. Snapshots’ allocations can be pared away by activity on the source volume even when the snapshots themselves are idle.

If you do not allocate as much space for a snapshot as is consumed by the volume of which it is an image, you can potentially run out of space in the snapshot. That’s more catastrophic than it sounds because the volume manager then has no way to maintain a coherent image of the snapshot; additional storage space is required just to keep the snapshot the same. The result of running out of space is that LVM stops maintaining the snapshot, and the snapshot becomes irrevocably corrupt.

So, as a matter of practice, LVM snapshots should be either short-lived or as large as their source volumes. So much for “lots of cheap virtual copies.”
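To make the sizing rule concrete, here is a deliberately simplified model of snapshot consumption. It assumes that each changed block costs exactly its own size in snapshot space and ignores copy-on-write metadata overhead, so real snapshots fill somewhat sooner; the snapshot_status function is hypothetical.

```shell
# snapshot_status SNAP_MB CHANGED_MB
# Simplified model: every block changed on the source (or on the snapshot)
# consumes the same amount of snapshot space; metadata is ignored.
snapshot_status() {
    snap_mb=$1
    changed_mb=$2
    if [ "$changed_mb" -gt "$snap_mb" ]; then
        echo "overflow"
    else
        echo "ok $(( changed_mb * 100 / snap_mb ))%"
    fi
}

snapshot_status 10240 2048     # 10 GB snapshot, 2 GB of changes
snapshot_status 10240 20480    # 10 GB snapshot, 20 GB of changes
```

Only a snapshot as large as its source can never reach the overflow case in this model, whatever the amount of write activity.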

To create /dev/DEMO/web1-snap as a snapshot of /dev/DEMO/web1, we would use the following command:

$ sudo lvcreate -L 100G -s -n web1-snap DEMO/web1

Note that the snapshot has its own name and that the source of the snapshot must be specified as volume_group/volume.

In theory, /mnt/web1 should really be unmounted first to ensure the consistency of the filesystem. In practice, ext4 will protect us against filesystem corruption, although we may lose a few of the most recent data block updates. This is a perfectly reasonable compromise for a snapshot used as a backup source.

To check on the status of your snapshots, run lvdisplay. If lvdisplay tells you that a snapshot is “inactive,” that means it has run out of space and should be deleted. There’s very little you can do with a snapshot once it reaches this point.

Resizing filesystems

Filesystem overflows are more common than disk crashes, and one advantage of logical volumes is that they’re much easier to juggle and resize than are hard partitions. We have experienced everything from servers used for personal MP3 storage to a department full of email pack rats.

The logical volume manager doesn’t know anything about the contents of its volumes, so you must do your resizing at both the volume and filesystem levels. The order depends on the specific operation. Reductions must be filesystem-first, and enlargements must be volume-first. Don’t memorize these rules: just think about what’s actually happening and use common sense.
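The ordering rule can be captured in a few lines. This is only a memory aid, and the resize_order function is hypothetical; the comments give the reasoning behind each case.

```shell
# resize_order grow|shrink
# Print the order in which the two layers must be resized.
resize_order() {
    case "$1" in
        grow)   echo "volume filesystem" ;;   # make room first, then expand into it
        shrink) echo "filesystem volume" ;;   # vacate the space before removing it
        *)      echo "usage: resize_order grow|shrink" >&2; return 1 ;;
    esac
}

resize_order grow
resize_order shrink
```

Shrinking the volume first would cut off blocks the filesystem still believes it owns; growing the filesystem first would leave it with nowhere to expand.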

Suppose that in our example, /mnt/web1 has grown more than we predicted and needs another 10GB of space. We first check the volume group to be sure additional space is available.


$ sudo vgdisplay DEMO
--- Volume group ---
VG Name                DEMO
System ID
Format                 lvm2
Metadata Areas         1
Metadata Sequence No   18
VG Access              read/write
VG Status              resizable
MAX LV                 0
Cur LV                 2
Open LV                1
Max PV                 0
Cur PV                 1
Act PV                 1
VG Size                975.99 GB
PE Size                4.00 MB
Total PE               249854
Alloc PE / Size        51200 / 200.00 GB
Free  PE / Size        198654 / 775.99 GB
VG UUID                NtbRLu-RqiQ-3Urt-iQZn-vEvJ-u0Th-FVYKWF

Plenty of space is available, so we unmount the filesystem and use lvresize to add space to the logical volume.

$ sudo umount /mnt/web1
$ sudo lvchange -an DEMO/web1
$ sudo lvresize -L +10G DEMO/web1
Extending logical volume web1 to 110.00 GB
Logical volume web1 successfully resized
$ sudo lvchange -ay DEMO/web1

The lvchange commands are needed to deactivate the volume for resizing and to reactivate it afterwards. This part is only needed because there is an existing snapshot of web1 from our previous example. After the resize operation, the snapshot will “see” the additional 10GB of allocated space, but since the filesystem it contains is only 100GB in size, the snapshot will still be usable.

We can now resize the filesystem with resize2fs. (The 2 comes from the original ext2 filesystem, but the command supports all versions of ext.) Since resize2fs can determine the size of the new filesystem from the volume, we don’t need to specify the new size explicitly. We would have to do so when shrinking the filesystem.

$ sudo resize2fs /dev/DEMO/web1
resize2fs 1.41.9 (22-Aug-2009)
Please run 'e2fsck -f /dev/DEMO/web1' first.

Oops! resize2fs forces you to double-check the consistency of the filesystem before resizing.

$ sudo e2fsck -f /dev/DEMO/web1
e2fsck 1.41.9 (22-Aug-2009)
Pass 1: Checking inodes, blocks, and sizes
...
/dev/DEMO/web1: 6432/6553600 files (0.1% non-contiguous), 473045/26214400
     blocks
$ sudo resize2fs /dev/DEMO/web1
resize2fs 1.41.9 (22-Aug-2009)
Resizing the filesystem on /dev/DEMO/web1 to 28835840 (4k) blocks.
The filesystem on /dev/DEMO/web1 is now 28835840 blocks long.
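The block counts reported by e2fsck and resize2fs can be checked by hand: they are simply the volume size divided by the 4KB filesystem block size. A small sketch (the blocks_for_gb helper is hypothetical):

```shell
# blocks_for_gb GB BLOCK_KB: filesystem blocks needed to fill a volume
# of GB gigabytes when each block is BLOCK_KB kilobytes.
blocks_for_gb() {
    echo $(( $1 * 1024 * 1024 / $2 ))
}

echo "100 GB: $(blocks_for_gb 100 4) blocks"   # size before the resize
echo "110 GB: $(blocks_for_gb 110 4) blocks"   # size after the resize
```

The results, 26214400 and 28835840, match the totals printed by e2fsck and resize2fs respectively.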

That’s it! Examining the output of df again shows the changes:

$ sudo mount /dev/DEMO/web1 /mnt/web1
$ df -h /mnt/web1
Filesystem             Size   Used  Avail  Use%  Mounted on
/dev/mapper/DEMO-web1  109G   188M   103G    1%  /mnt/web1

HP-UX logical volume management

As of HP-UX 10.20, HP provides a full logical volume manager. It’s a nice addition, especially when you consider that HP-UX formerly did not even support the notion of disk partitions. The volume manager is called LVM, just as on Linux, although the HP-UX version is in fact the original. (Really, it’s Veritas software...)

As a simple example of LVM wrangling, here’s how you would configure a 75GB hard disk for use with the logical volume manager. If you have read through the Linux example above, the following procedure will seem eerily familiar. There are a few minor differences, but the overall process is essentially the same.

The pvcreate command identifies physical volumes.

$ sudo pvcreate /dev/rdisk/disk4
Creating "/etc/lvmtab_p".
Physical volume "/dev/rdisk/disk4" has been successfully created.

If you will be using the disk as a boot disk, add the -B option to pvcreate to reserve space for a boot block, then run mkboot to install it.

After defining the disk as a physical volume, you add it to a new volume group with the vgcreate command. Two metadata formats exist for volume groups, versions 1.0 and 2.0. You specify which version you want with the -V option when creating a volume group; version 1.0 remains the default. Version 2.0 has higher size limits, but it’s not usable for boot devices or swap volumes. Even version 1.0 metadata has quite generous limits, so it should be fine for most uses. You can see the exact limits with lvmadm. For reference, here are the limits for 1.0:

$ sudo lvmadm -t -V 1.0
--- LVM Limits ---
VG Version                1.0
Max VG Size (Tbytes)      510
Max LV Size (Tbytes)      16
Max PV Size (Tbytes)      2
Max VGs                   256
Max LVs                   255
Max PVs                   255
Max Mirrors               2
Max Stripes               255
Max Stripe Size (Kbytes)  32768
Max LXs per LV            65535
Max PXs per PV            65535
Max Extent Size (Mbytes)  256

You can add extra disks to a volume group with vgextend, but this example volume group contains only a single disk.

$ sudo vgcreate vg01 /dev/disk/disk4
Increased the number of physical extents per physical volume to 17501.
Volume group "/dev/vg01" has been successfully created.
Volume Group configuration for /dev/vg01 has been saved in
    /etc/lvmconf/vg01.con

Once your disks have been added to a convenient volume group, you can split the volume group’s pool of disk space back into logical volumes. The lvcreate command creates a new logical volume. Specify the size of the volume in megabytes with the -L flag or in logical extents (typically 4MiB) with the -l flag. Sizes specified in MiB are rounded up to the nearest multiple of the logical extent size.
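The rounding rule for -L is ordinary round-up-to-a-multiple arithmetic. The sketch below assumes a 4MB extent size and uses a hypothetical helper; it does not call lvcreate.

```shell
# round_up_to_extent SIZE_MB EXTENT_MB
# Round a requested size up to the next multiple of the extent size.
round_up_to_extent() {
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

round_up_to_extent 25000 4    # already a multiple of 4: stays 25000
round_up_to_extent 25001 4    # rounded up to 25004
```

A request for 25000MB therefore maps to exactly 6,250 four-megabyte extents, while a request for 25001MB would be granted 6,251.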

To assess the amount of free space remaining in a volume group, run vgdisplay vgname as root. The output includes the extent size and the number of unallocated extents.


$ sudo lvcreate -L 25000 -n web1 vg01
Logical volume "/dev/vg01/web1" has been successfully created with character
     device "/dev/vg01/rweb1".
Logical volume "/dev/vg01/web1" has been successfully extended.
Volume Group configuration for /dev/vg01 has been saved in
    /etc/lvmconf/vg01.conf

The command above creates a 25GB logical volume named web1. Once you’ve created your logical volumes, you can verify them by running vgdisplay -v /dev/vgname to double-check their sizes and make sure they were set up correctly.

In most scenarios, you would then go on to create a filesystem on /dev/vg01/web1 and arrange for it to be mounted at boot time. See page 258 for details.

Another common way to create a logical volume is to use lvcreate to create a zero-length volume and then use lvextend to add storage to it. That way, you can specify exactly which physical volumes in the volume group should compose the logical volume. If you allocate space with lvcreate (as we did above), it simply uses free extents from any available physical volumes in the volume group—good enough for most situations.

As in Linux, striping (which HP-UX’s LVM refers to as “distributed allocation”) and mirroring are features at the logical volume level. You can request them at the time the logical volume is created with lvcreate, or later with lvchange. In contrast to Linux, the logical volume manager does not allow snapshots. However, temporary snapshots are available as a feature of HP’s VxFS filesystem.

If you plan to use a logical volume as a boot or swap device or to store system core dumps, you must specify contiguous allocation and turn off bad block remapping with the -C and -r flags to lvcreate, as shown below.^[16]

^[16] HP-UX limitations require swap space to reside in the first 2GiB of the physical disk and the boot volume to be the first logical volume. The 1.5GB root and 500MB swap shown here were chosen to work around these constraints. You can have a root partition that is larger than these values, but you must then have separate boot and root volumes. See the man page for lvlnboot for more details.


# lvcreate -C y -r n -L 1500 -n root vg01
Logical volume "/dev/vg01/root" has been successfully created with character
     device "/dev/vg01/rroot".
Logical volume "/dev/vg01/root" has been successfully extended.
Volume Group configuration for /dev/vg01 has been saved in
    /etc/lvmconf/vg01.conf
# lvcreate -C y -r n -L 500 -n swap vg01
Logical volume "/dev/vg01/swap" has been successfully created with character
     device "/dev/vg01/rswap".
...

You must then run the lvlnboot command to notify the system of the new root and swap volumes. See the man page for lvlnboot for more information about the special procedures for creating boot, swap, and dump volumes.

AIX logical volume management

AIX’s logical volume manager uses a different command set from the volume managers of Linux and HP-UX, but its underlying architecture and approach are similar. One potentially confusing point is that AIX calls the objects more commonly known as extents (that is, the units of space allocation within a volume group) “partitions.” Because the entities normally referred to as partitions do not exist in AIX, there is no ambiguity within the AIX sphere itself. However, tourists visiting from other systems may wish to bring along an AIX phrase book.

In other respects—physical volume, volume group, logical volume—AIX terminology is standard. The SMIT interface for logical volume management is pretty complete, but you can also use the commands listed in Table 8.4.

The following four commands create a volume group called webvg, a logical volume within it, and a JFS2 filesystem inside that volume. The filesystem is then mounted on /mnt/web1.

$ sudo mkvg -y webvg hdisk1
webvg
$ sudo crfs -v jfs2 -g webvg -m /mnt/web1 -a size=25G
File system created successfully.
26213396 kilobytes total disk space.
New File System size is 52428800
$ sudo mkdir /mnt/web1
$ sudo mount /mnt/web1
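The "New File System size" figure in the crfs output makes sense if it is read as a count of 512-byte blocks, which is the assumption behind this short check (the helper name is hypothetical):

```shell
# gb_to_512_blocks GB: number of 512-byte blocks in GB gigabytes.
# 1 GB = 1024 * 1024 KB, and each KB holds two 512-byte blocks.
gb_to_512_blocks() {
    echo $(( $1 * 1024 * 1024 * 2 ))
}

gb_to_512_blocks 25
```

The result, 52428800, matches the size reported for the 25G request.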

AIX does not require you to label disks to turn them into physical volumes. mkvg and extendvg automatically label disks as part of the induction process. Note that mkvg takes a device name and not the path to a disk device.

You can create the logical volume and the filesystem inside it in separate steps (with mklv and mkfs, respectively), but crfs performs both tasks for you and updates /etc/filesystems as well. The exact name of the logical volume device that holds the filesystem is made up for you in the crfs scenario, but you can determine it by inspecting /etc/filesystems or running mount. (On the other hand, it can be hard to unscramble filesystems in the event of problems if the volumes all have generic names.)

If you run mklv directly, you can specify not only a device name of your choosing but also various options to the volume manager such as striping and mirroring configurations. Snapshots are implemented through the JFS2 filesystem and not through the volume manager.