The Raid saga continues
The Scenario:
The Linux box at my home has two mirrored raid arrays using Linux software raid. On top of that I’m using LVM to stripe across them. The effect is a raid1+0 array.
The Problem:
On the first Sunday of every month a script runs:
#6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
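Cron has no "first Sunday of the month" field, so the entry fires every Sunday (the 0 in the fifth field) and the date test only lets it continue when the day of the month is 7 or less. The backslash in \%d is there because an unescaped % is special inside a crontab. A minimal sketch of that guard on its own, where the function name is_first_week is made up for illustration:

```shell
#!/bin/sh
# Succeeds when the given day of month (as printed by `date +%d`) is 1-7.
# is_first_week is a hypothetical helper name, not part of mdadm.
is_first_week() {
    day=${1#0}          # strip a leading zero, e.g. "07" -> "7"
    [ "$day" -le 7 ]
}

if is_first_week "$(date +%d)"; then
    echo "day of month is 1-7: a Sunday today would be the first Sunday"
else
    echo "day of month is 8 or later: checkarray would be skipped"
fi
```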
The checkarray run kicks off a full check of each raid array, which reads every disk end to end much like a rebuild. However, on the first Sunday of every month my server dies. It appears to lock up without any error on the console. After a reboot the server boots, tries to continue rebuilding the raid array, and then locks up again. I have to go into Linux single-user mode, disable the 2nd disk of each raid array, reboot, then manually add the disk back to the first raid array, wait for it to finish rebuilding, and then add the 2nd member back to the other array. In the short to medium term I’ve disabled the cron entry.
The question: what is causing this?
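To narrow this down without waiting for the next first Sunday, a check can be started by hand on one array at a time with the sync speed capped. This is only a sketch: md0, the 10000 KB/s cap, and the function name start_check are example values, not taken from my setup. It relies on the md sync_action sysfs file and /proc/sys/dev/raid/speed_limit_max.

```shell
#!/bin/sh
# Sketch: cap the md sync speed, then start a check on ONE array.
# start_check is a hypothetical helper; paths are passed in so nothing
# is touched until it is called explicitly.
start_check() {
    md_sysfs=$1      # e.g. /sys/block/md0/md   (md0 is an example name)
    limit_file=$2    # e.g. /proc/sys/dev/raid/speed_limit_max
    limit_kbs=$3     # upper bound on sync speed, in KB/s

    echo "$limit_kbs" > "$limit_file" || return 1
    echo check > "$md_sysfs/sync_action" || return 1
}

# Real use, as root (commented out on purpose):
# start_check /sys/block/md0/md /proc/sys/dev/raid/speed_limit_max 10000
# cat /proc/mdstat                              # watch progress
# echo idle > /sys/block/md0/md/sync_action     # abort the check
```

If the box survives a throttled check but locks up at full speed, that would point at load on the disk controller path rather than at the check logic itself.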
At first I thought it was a bad disk, so I replaced one of the drives with a brand new drive; however, the problem still occurs. I tested the old disk and it also works correctly. I suspect there is a bug in the kernel somewhere, something to do with libata or similar.
http://ubuntuforums.org/showthread.php?t=748418
is a report of the same problem.
I thought perhaps it’s a load issue. I mean, a rebuild hammers the raid, right? So I tried:
date; dd if=/dev/zero of=/blah count=1024 bs=100M; date
which writes 100 GiB of zeros from /dev/zero as fast as the raid can go. Nothing unexpected happened during this test, so the issue doesn’t appear to be hardware related.
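One caveat with the dd run above: it only writes, while a check mostly reads every member disk from start to finish. A rough read-side counterpart might look like the sketch below. The device name /dev/md0 and the helper name read_test are assumptions for illustration, and bs=1M assumes GNU dd.

```shell
#!/bin/sh
# Sketch: read a given number of MiB from a device or file and discard it.
# read_test is a hypothetical helper name.
read_test() {
    src=$1     # device or file to read from
    mib=$2     # how many MiB to read
    dd if="$src" of=/dev/null bs=1M count="$mib" 2>/dev/null
}

# Real use, as root (commented out on purpose); 102400 MiB = 100 GiB:
# date; read_test /dev/md0 102400; date
```

Reading straight from the md device goes through the raid layer but skips LVM and the filesystem, so it is closer to what the check does than the write test is, though still not identical.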
I think I need to upgrade my kernel, but my current kernel is a custom-compiled job, so upgrading is hard.


