Archive for the 'Sysadmin' Category

RAID1 array enlarging

mercredi, mars 4th, 2009

Here’s a quick  recipe to easily enlarge a RAID1 array with the least possible downtime, using linux 2.6 and mdadm.

We’ll start with a two-disk setup, /dev/sda and /dev/sdb, containing two arrays, /dev/md0 and /dev/md1. /dev/md0 is mounted on / and /dev/md1 is mounted on /backup. We want to grow /dev/md1 from 230GB to 898G (switching from 250GB disks to 1TB).

/dev/md0 has /dev/sda1 and /dev/sdb1, /dev/md1 has /dev/sda3 and /dev/sdb3, while swap partitions are on /dev/sda2 and /dev/sdb2.

Obligatory warning: Use your own brain when following this procedure. Don’t follow me blindly – it’s your data at stake.

Booting on degraded array: don’t shoot yourself in the foot.

When you’ll remove one of the existing disks, your computer won’t be able to boot if grub isn’t installed on the other disk’s bootsector, so make sure that grub is installed on both disks’ MBR:
grub> find /boot/grub/menu.lst
grub> root (hd0,0)
grub> setup (hd0)
grub> root(hd1,0)
grub> setup (hd1)

Shutdown the computer, remove sdb, put in one of the new 1TB disks in place, and reboot. Booting can take some time while the initrd’s mdadm tries to find the missing disk.

You’ll boot with degraded arrays, as shown there:
#cat /proc/mdstat
md0 : active raid1 sda1[0]
19534912 blocks [1/2] [U_]

md1 : active raid1 sda3[0]
223134720 blocks [1/2] [U_]

Now, we’ll dump sda’s partition table:
#sfdisk -d /dev/sda > partitions.txt

Edit the partitions.txt file to remove the size=xxxxxxx field on the sda3 line, so that the biggest possible partition size will be used. The file will look like:

# partition table of /dev/sda
unit: sectors
/dev/sda1 : start=       63, size= 39070017, Id=fd, bootable
/dev/sda2 : start= 39070080, size=  1959930, Id=82
/dev/sda3 : start= 41030010, Id=fd
/dev/sda4 : start=        0, size=        0, Id= 0

Disk initialisation

Now partition sdb using this table:
#sfdisk /dev/sdb < partitions.txt

recreate swap if needed:
#mkswap /dev/sdb2; swapon -a

Put sdb back in the arrays:
#mdadm –manage /dev/md0 –add /dev/sdb1
#mdadm –manage /dev/md1 –add /dev/sdb3

Wait until the array is resynchronised and clean. I use:
#watch cat /proc/mdstat #(quit with Ctrl-C)

Install grub on the new disk using grub, like previously (sdb is hd1 for grub), so that you’ll be able to boot from it.

Changing the second disk

Shutdown, remove sda, put the second new disk in place of it, and reboot – make sure your BIOS is configured to try and boot on both drives.

Now you’ll have degraded arrays again:
#cat /proc/mdstat
md0 : active raid1 sdb1[0]
19534912 blocks [1/2] [_U]

md1 : active raid1 sdb3[0]
223134720 blocks [1/2] [_U]

Redo the whole disk initialisation section, this time on sda instead of sdb. Don’t forget to reinstall grub on sda.

In the end you’ll get your arrays clean as they were before, but /dev/md1 will still be 230GB instead of using the whole available room on the disks’ partitions 3.

Grow the things

Let’s ask mdadm to take the whole partitions size for md1:
#mdadm –grow /dev/md1 –size=max

You’ll have to wait for synchronisation again (watch cat /proc/mdstat).

The only remaining thing is to grow the ext3 filesystem sitting on md1, and that’s where the most downtime happen (your data won’t be available unless you do a live FS resize, which I didn’t want to test); these steps took about 30 minutes to complete for me:
#umount /dev/md1
#e2fsck -f /dev/md1 #(it’s better to force a check to avoid a resize failure)
#resize2fs /dev/md1 #(this makes the filesystem the biggest possible)
#e2fsck -f /dev/md1 #(verify that everything is OK)
#mount /dev/md1 #(and you’re done, as df -h should show you):

# df -h /dev/md1
Filesystem            Size  Used Avail Use% Mounted on
/dev/md1              898G  228G  634G  27% /backup

Rambling about half-finished RAID setups

One thing you may have noticed is that I’m installing grub on both drives. This can seem evident, but most software RAID arrays I’ve seen couldn’t boot out of the second disk for lack of an MBR. It makes the RAID setup useful when your second disk fails, but if it’s the first, you’re forced to resort to a rescue CD or PXE boot to reboot your server. This makes things much harder to fix, provokes cold sweats, downtimes, and user annoyment. Install grub on both disks. Check the system boots when removing one disk, both the first or the second, before going into production. Don’t misunderstand your RAID arrays as a backup system. RAID arrays provides redundancy and eases (a LOT) recovering from a failed disk, but it doesn’t eases recovering from two failed disks; and it doesn’t recover lost data from human mistakes either. Regarding failed disks, best results are achieved by monitoring the disks – with smartd for example – and replacing suspicious disks too soon rather than too late.

Registar switch

mardi, novembre 25th, 2008

Back in may 2001, I grabbed my first domain name, I wanted to stop switching URL each time I switched the hosting. At the time I had no debit card, and I chose the first registrar I found which accepted payment by cheque,

Since then, I grumbled and grumbled each time I had to log in to their web administration interface, for domain renewal, DNS glue records updates, etc ; but out of habit, I bought two more domain names from them : my wife‘s, and a second one I had.

Finally, last week, after more failing glue records updates, I switched my domain and my wife’s domain to I left the third one on amen, as I don’t plan on renewing it, it’s useless for me to keep two domain names.

I knew Gandi beforehand as it’s the registrar for my work’s domain, and I’m glad I did the switch. Their interface is multiple times better, it does what it has to do and doesn’t bug out when hitting Submit with not even an error message.

How to change Dell’s BIOS settings from a Linux command-line

mercredi, mai 21st, 2008

To be able to change BIOS settings from the command-line on a Dell Poweredge, you need the syscfg utility. It’s very useful when you want to change a configuration on, for example, 32 nodes at once, without having to plug screen, plug keyboard, reboot, change setting, reboot 32 times. Here is how I installed it on the CentOS 5 distribution :

# cd ; wget -q -O – | bash
# yum install srvadmin-hapi
# wget
# mkdir dtk
# mount -o loop dtk_2.5_80_Linux.iso dtk/
# cd dtk/isolinux/
# cp  SA.2 ~/SA.2.gz
# cd; gunzip SA.2.gz
# mkdir stage2
# cd stage2
# cpio -i < ../SA.2
# cd lofs
# mkdir dell
# mount -o loop dell.cramfs dell/
# mkdir -p /usr/local/sbin ; cp dell/toolkit/bin/syscfg /usr/local/sbin/
# umount dell
# cd
# umount dtk

And voilà! You can now use syscfg:

# /usr/local/sbin/syscfg –biosver
# /usr/local/sbin/syscfg –virtualization=enable

I’d have preferred an easier way, but couldn’t find syscfg’s RPM.

When deploying that to a lot of nodes, you probably don’t want to go through all the associated network downloads of the first phase (wget of the yum repository, yum, and wget of the 230MB iso), so you can take shortcuts:

# for node in $(list_of_nodes); do scp /usr/local/sbin/syscfg /var/cache/yum/dell-hardware-auto/packages/srvadmin-*.rpm $node: ; ssh $node « mkdir -p /usr/local/sbin; mv syscfg /usr/local/sbin; rpm -ivh srvadmin-*.rpm »; done;

Lazily testing memory

mardi, mai 13th, 2008

I had, until recently, a problem when it came to test memory on the nodes in my lab. Until now, I was able to PXE boot memtest+, but had to go down to the lab and plug a screen to check the output. Multiple annoyances: first I had to move my ass to the lab room, then I add to do some difficult things to plug a screen to the node, then I had to come back from time to time and look at the output. All of these right in front of the cooling units, which blow some really cold air now that they work correctly.

This morning I investigated in the source code of memtest+ and found out it supports output to serial consoles since recently!

A little upgrade later, I can now boot memtest+ with the console=ttyS0,57600 command line parameter and just watch my serial line output, without moving at all! Yay!

DEFAULT memtest console=ttyS0,57600
LABEL memtest
KERNEL images/tools/memtest

Viva PXE!

(Btw for those who’ll find funny to use DEFAULT memtest… PXE boot choices are updated via a cgi script called from an intranet tool, which itself is wrapped in a little GTK systray applet. This applet allows to reboot, shut down, power on, reinstall various distributions, follow serial line, open an ssh connection, on the lab’s nodes. This tool is also useable via command line for scripting power).

Stuff that happens to sysadmins

mercredi, mars 5th, 2008

Buy one 1U server from $supplier, specifically ask for a pair of rails, learn that « Of course it comes with rails! ».

Two weeks later. Buy a 42U rack, and eight 1U servers, all of these from the same $supplier, and at the same time. Receive your rack and 8 servers, without rails. Inquire by email: « No, servers don’t come automatically with rails, you didn’t ask for them ».

Thanks, Dell. It’s always a pleasure.

news for few, stuff no-one cares about