Stolen from here. Summary: mdadm --assemble --run --force /dev/md0 /dev/sd[abcd]1

Recovery from a multiple disk failure with mdadm

If you lose more disks than your parity can protect you from, then your array is gone. But quite often you get a temporary failure of several disks at once (a bad cable or a failed HBA); afterwards the RAID superblocks are out of sync and you can no longer start your RAID array. Here are a couple of ways to try to get it working again.

Maybe a controller dies and takes two disks offline at the same time, or a cable comes loose. This will show you a quick way to put your array back together. It's likely that your array either didn't assemble or assembled incorrectly. The first thing you want to do is make sure you're working on the disks in a read-only state. To do that, you first need to stop any running arrays. To view them, do this:

sudo -i
cat /proc/mdstat
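
If more than one array shows up, the names can be pulled out of /proc/mdstat with a small helper. This is only a sketch: list_md_arrays is a hypothetical function name, and it assumes the usual layout in which each array line starts with "mdN :".

```shell
# list_md_arrays: hypothetical helper (not part of mdadm) that prints the
# /dev path of every array listed in an mdstat-style file.
# The file defaults to /proc/mdstat; pass another path to try it on a copy.
list_md_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    # Array lines look like "md0 : active raid5 sdb1[1] sda1[0]".
    # Everything else (Personalities, block counts, unused devices) is skipped.
    awk '$1 ~ /^md/ && $2 == ":" { print "/dev/" $1 }' "$mdstat_file"
}

# Usage (as root): stop every array that is currently listed.
# for md in $(list_md_arrays); do mdadm --stop "$md"; done
```

The loop is left commented out on purpose, since stopping every array is not always what you want.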

I'll assume you had a /dev/md0 array show up, so let's stop it.

mdadm --stop /dev/md0
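
With the array stopped, it can be useful to see how far out of sync the superblocks actually are. Each member's superblock carries an event counter, which mdadm --examine prints on an "Events :" line; members that dropped out earlier show a lower count than the rest. The helper below is a sketch under those assumptions: event_count is a hypothetical name, and it reads the --examine output on stdin so it can also be tried on saved output.

```shell
# event_count: hypothetical helper; reads `mdadm --examine` output on stdin
# and prints the value from the first "Events :" line it finds.
event_count() {
    awk -F' : ' '/^ *Events : / { print $2; exit }'
}

# Usage (as root), with the device list used on this page:
# for dev in /dev/sd[abdefghij]1; do
#     printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | event_count)"
# done
```

If the counts differ only slightly, a forced assemble is usually a reasonable next step; a member that is far behind the others deserves a closer look first.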

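Once it's stopped, the next thing to try is simply forcing the array back together using the proper disks. The --force option tells mdadm to assemble the array even if the metadata on some of the disks appears to be out of date.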

mdadm --assemble --force /dev/md0 /dev/sd[abdefghij]1

Hopefully that got your array assembled correctly, and unless something really bad happened, it's likely you're back in business. If that doesn't work, you can add --run, which tries to start the array even if fewer disks are present than the last time it was active:

mdadm --assemble --run --force /dev/md0 /dev/sd[abdefghij]1
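
If that still fails, it is worth saving what the old superblocks say before doing anything that overwrites them. The sketch below filters mdadm --examine output down to the fields that describe the array's geometry. create_params is a hypothetical helper name, the output path in the usage comment is just an example, and the field names are the ones printed for version 1.x metadata, so check them against your own output.

```shell
# create_params: hypothetical helper; reads `mdadm --examine` output on stdin
# and keeps only the lines describing how the array was laid out.
create_params() {
    grep -E '^ *(Version|Raid Level|Raid Devices|Chunk Size|Layout|Data Offset|Device Role) : '
}

# Usage (as root): record the geometry of every member in one file.
# The path /root/md0-superblocks.txt is only an example.
# for dev in /dev/sd[abcdefg]1; do
#     echo "== $dev"
#     mdadm --examine "$dev" | create_params
# done > /root/md0-superblocks.txt
```

The "Device Role" lines are what tell you the original order of the members.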

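If all else fails, you can zero the superblocks and then re-create the array. With --assume-clean, mdadm writes new superblocks and skips the initial resync, so the existing data shouldn't be overwritten. That only holds if the array is re-created exactly as it was: same RAID level, same number of devices in the same order, and the same chunk size, metadata version, and data offset. mdadm does not work these out from the old data, and --create requires at least --level and --raid-devices. In the command below, replace LEVEL with the original array's RAID level; the device list has seven members.

mdadm --create --assume-clean --level=LEVEL --raid-devices=7 /dev/md0 /dev/sd[abcdefg]1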
recovery_from_a_multiple_disk_failure_with_mdadm.txt · Last modified: 2012/06/02 10:24 by tkbletsc