Here’s a quick recipe to enlarge a RAID1 array with minimal downtime, using Linux 2.6 and mdadm.
We’ll start with a two-disk setup, /dev/sda and /dev/sdb, containing two arrays, /dev/md0 and /dev/md1. /dev/md0 is mounted on / and /dev/md1 on /backup. We want to grow /dev/md1 from 230GB to 898GB (switching from 250GB disks to 1TB ones).
/dev/md0 has /dev/sda1 and /dev/sdb1, /dev/md1 has /dev/sda3 and /dev/sdb3, while swap partitions are on /dev/sda2 and /dev/sdb2.
Obligatory warning: Use your own brain when following this procedure. Don’t follow me blindly – it’s your data at stake.
Booting on degraded array: don’t shoot yourself in the foot.
When you remove one of the existing disks, your computer won’t be able to boot if grub isn’t installed in the other disk’s boot sector, so make sure that grub is installed in both disks’ MBR:
grub> find /boot/grub/menu.lst
grub> root (hd0,0)
grub> setup (hd0)
grub> setup (hd1)
Shut down the computer, remove sdb, put one of the new 1TB disks in its place, and reboot. Booting can take some time while the initrd’s mdadm looks for the missing disk.
You’ll boot with degraded arrays, as shown here:
md0 : active raid1 sda1
19534912 blocks [1/2] [U_]
md1 : active raid1 sda3
223134720 blocks [1/2] [U_]
Now, we’ll dump sda’s partition table:
#sfdisk -d /dev/sda > partitions.txt
Edit the partitions.txt file to remove the size=xxxxxxx field on the sda3 line, so that the largest possible partition size will be used. The file will look like:
# partition table of /dev/sda
/dev/sda1 : start= 63, size= 39070017, Id=fd, bootable
/dev/sda2 : start= 39070080, size= 1959930, Id=82
/dev/sda3 : start= 41030010, Id=fd
/dev/sda4 : start= 0, size= 0, Id= 0
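If you’d rather not edit the file by hand, the same change can be scripted. A sketch, assuming (as in this setup) that sda3 is the only partition whose size= field must go:

```shell
# Dump sda's partition table, then strip the size= field from the sda3
# line so sfdisk will extend that partition to the end of the disk.
sfdisk -d /dev/sda > partitions.txt
sed -i '/sda3/s/, *size=[ 0-9]*,/,/' partitions.txt
```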
Now partition sdb using this table:
#sfdisk /dev/sdb < partitions.txt
Recreate swap if needed:
#mkswap /dev/sdb2; swapon -a
Put sdb back in the arrays:
#mdadm --manage /dev/md0 --add /dev/sdb1
#mdadm --manage /dev/md1 --add /dev/sdb3
Wait until the arrays are resynchronised and clean. I use:
#watch cat /proc/mdstat #(quit with Ctrl-C)
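If you’d rather wait unattended, mdadm --wait /dev/md0 /dev/md1 blocks until the resync completes. A rough hand-rolled equivalent, polling mdstat in a loop (the MDSTAT variable is mine, only there so the loop can be dry-run against a saved copy of /proc/mdstat):

```shell
# Poll mdstat until no resync/recovery line remains, then report.
MDSTAT=${MDSTAT:-/proc/mdstat}
while grep -qE 'resync|recovery' "$MDSTAT"; do
    sleep 60
done
echo "arrays are clean"
```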
Install grub on the new disk as before (sdb is hd1 for grub), so that you’ll be able to boot from it.
Changing the second disk
Shut down, remove sda, put the second new disk in its place, and reboot. Make sure your BIOS is configured to try booting from both drives.
Now you’ll have degraded arrays again:
md0 : active raid1 sdb1
19534912 blocks [1/2] [_U]
md1 : active raid1 sdb3
223134720 blocks [1/2] [_U]
Redo the whole disk initialisation section, this time on sda instead of sdb. Don’t forget to reinstall grub on sda.
In the end you’ll get your arrays clean, as they were before, but /dev/md1 will still be 230GB instead of using all the room available on the disks’ third partitions.
Grow the things
Let’s ask mdadm to use the whole partition size for md1:
#mdadm --grow /dev/md1 --size=max
You’ll have to wait for synchronisation again (watch cat /proc/mdstat).
The only remaining step is to grow the ext3 filesystem sitting on md1, and that’s where most of the downtime happens (your data won’t be available unless you do a live FS resize, which I didn’t want to test); these steps took about 30 minutes to complete for me:
#umount /dev/md1 #(the filesystem must be offline for the following steps)
#e2fsck -f /dev/md1 #(it’s better to force a check to avoid a resize failure)
#resize2fs /dev/md1 #(this makes the filesystem the biggest possible)
#e2fsck -f /dev/md1 #(verify that everything is OK)
#mount /dev/md1 #(and you’re done, as df -h should show you):
# df -h /dev/md1
Filesystem Size Used Avail Use% Mounted on
/dev/md1 898G 228G 634G 27% /backup
Rambling about half-finished RAID setups
One thing you may have noticed is that I install grub on both drives. This may seem obvious, but most software RAID setups I’ve seen couldn’t boot from the second disk for lack of an MBR. Such a setup is still useful when your second disk fails, but if it’s the first, you’re forced to resort to a rescue CD or PXE boot to get your server back up. That makes things much harder to fix, and provokes cold sweats, downtime, and user annoyance. Install grub on both disks, and check that the system boots with either disk removed, the first or the second, before going into production.

Don’t mistake your RAID arrays for a backup system. RAID provides redundancy and makes recovering from a failed disk much easier (a LOT easier), but it doesn’t help when two disks fail, and it doesn’t recover data lost to human mistakes either. Regarding failed disks, the best results come from monitoring the drives — with smartd for example — and replacing suspicious disks too soon rather than too late.
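On that note, mdadm itself can watch the arrays too. A sketch of an /etc/mdadm/mdadm.conf fragment (the mail address is a placeholder); with it, the distribution’s mdadm monitor daemon, or mdadm --monitor --scan run by hand, will mail you as soon as a disk drops out:

```
# Where failure notifications go (placeholder address)
MAILADDR root@localhost
# The arrays from this article; identifying them by UUID, taken from
# "mdadm --detail --scan", is more robust than device names
ARRAY /dev/md0 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md1 devices=/dev/sda3,/dev/sdb3
```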