Saving an Iomega IX-200D from a dual drive fault

Yup. She’s down. No hardware faults that I can see, but the array claims that all the drives contain data from another system (Settings > Disks).

[Screenshot: 20141127-01]

I didn’t have any active I/O to the array at the time of the fault, so I’m hoping that the RAID structures are consistent enough for repair. The storage recovery page is accessed by going to the hidden support page: https://<iomega_ip>/support.html and clicking Recover Disks.

[Screenshot: 20141127-02]

On the Storage Recovery Verification page, click the check box and Apply.

[Screenshot: 20141127-03]

The array displays a confirmation screen, but no further status from what I can see.

[Screenshot: 20141127-04]

I waited a LONG time and couldn’t find any further indication of progress. The array appears idle. Anyway, I’m not willing to give up there. Surely the RAID structures aren’t that fragile…. I remember reading somewhere that these Iomega NAS boxes are Linux-based, so let’s see if I can SSH into this thing!

Disclaimer: If you follow steps after this line you may render your device unusable. Buyer beware; your mileage may vary; at your own risk; etc.

I came across this excellent article by Christopher Kusek (PKGuild): Shell access to your ix2/ix4 exposed! “Get yer red hot ssh here!” If you want to hack it yourself, read the post. I followed all of the steps, including the John the Ripper brute-force attack on the shadow file. Then I saw at the end of the post that the password is always soho+admin_password…. That’s the second time in my life that I learned to read the whole assignment before starting. 🙂

So, back on the support screen, click Support Access.

[Screenshot: 20141127-05]

Check the box to Allow remote access for support and click Apply.

[Screenshot: 20141127-06]

Now the fun can begin!!! SSH into the NAS as root. Remember, the password is soho+admin_password. So, if your admin password is “haxor”, the root password will be “sohohaxor”.

Jasons-MacBook-Pro:~ thornj$ ssh root@192.168.0.254
root@192.168.0.254's password: 
root@storage:

It looks like the Iomega uses a standard Linux RAID implementation. So it should be recoverable using standard Linux tools. You can find more detailed information about this on raid.wiki.kernel.org.
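A couple of quick, read-only sanity checks confirm that the standard md stack really is what’s in play here (assuming lsmod is available in this firmware; exact output will vary):

mdadm --version          # standard mdadm userspace tool is present
lsmod | grep raid        # raid1/raid456 kernel modules are loaded
cat /proc/partitions     # the sdX drives and their partitions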

So, first to check the array health.

root@storage:/# cat /proc/mdstat

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sdd2[4](S) sdc2[5](S) sda2[0] sdb2[1]
      5854422528 blocks level 5, 64k chunk, algorithm 2 [4/2] [UU__]

md0 : active raid1 sdd1[3] sdc1[2] sda1[0] sdb1[1]
      2040128 blocks [4/4] [UUUU]

unused devices: <none>
root@storage:/#

The device md1 is my RAID 5 array. The device md0 is the RAID 1 array that contains the IX-200D kernel. Right away it’s not looking good. Note the “UU__”: there are two up devices and two down devices on md1. All four devices are up on md0 (“UUUU”).
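Before touching md1, it’s worth confirming that the system mirror really is healthy, since everything from here on depends on the OS staying up. A quick, harmless check:

mdadm --detail /dev/md0 | egrep 'State|Active Devices|Failed Devices'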

Next to look at the status of md1.

root@storage:/# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Sat Nov  8 18:15:18 2014
     Raid Level : raid5
     Array Size : 5854422528 (5583.21 GiB 5994.93 GB)
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Nov 27 20:33:21 2014
          State : clean, degraded
 Active Devices : 2
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 2

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : bb9047c5:f742e3ce:2e29483d:f114274d (local to host storage)
         Events : 0.22892

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       0        0        2      removed
       3       0        0        3      removed

       4       8       50        -      spare   /dev/sdd2
       5       8       34        -      spare   /dev/sdc2
root@storage:/#

OUCH! It appears that drives 2 and 3 faulted. Before I make any changes, I’ll save the superblock info. I can come back to this later for reference if I make mistakes reassembling the array.

root@storage:/# mdadm --examine /dev/sd[abcd]2 > raid.status
root@storage:/#
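It’s also worth pulling a copy of raid.status off the NAS entirely; if the recovery goes sideways or the box gets re-imaged, a local copy of the superblock data is all that’s left to work from. From another machine (using the same IP as the SSH step above), something like:

scp root@192.168.0.254:/raid.status .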

I’ll check whether the event counts on the drives are similar; that determines how much data corruption I am likely to encounter.

root@storage:/# mdadm --examine /dev/sd[a-d]2 | egrep 'Event|/dev/sd'
/dev/sda2:
         Events : 22894
this     0       8        2        0      active sync   /dev/sda2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   4     4       8       50        4      spare   /dev/sdd2
   5     5       8       34        5      spare   /dev/sdc2
/dev/sdb2:
         Events : 22894
this     1       8       18        1      active sync   /dev/sdb2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   4     4       8       50        4      spare   /dev/sdd2
   5     5       8       34        5      spare   /dev/sdc2
/dev/sdc2:
         Events : 22894
this     5       8       34        5      spare   /dev/sdc2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   4     4       8       50        4      spare   /dev/sdd2
   5     5       8       34        5      spare   /dev/sdc2
/dev/sdd2:
         Events : 22894
this     4       8       50        4      spare   /dev/sdd2
   0     0       8        2        0      active sync   /dev/sda2
   1     1       8       18        1      active sync   /dev/sdb2
   4     4       8       50        4      spare   /dev/sdd2
   5     5       8       34        5      spare   /dev/sdc2
root@storage:/#

The event counts are identical, so the array should be safe to reconstruct without any data loss.
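A quick way to eyeball that: if this prints exactly one line, every member agrees on the event count.

mdadm --examine /dev/sd[a-d]2 | grep 'Events :' | sort -u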

First, I’ll try to re-add the drives.

root@storage:/# mdadm /dev/md1 --re-add /dev/sd[cd]2
mdadm: Cannot open /dev/sdc2: Device or resource busy
root@storage:/#

The devices are busy because md1 is still holding sdc2 and sdd2 as spares. Next, I’ll try to forcibly reassemble the array.

root@storage:/# mdadm --assemble --force /dev/md1 /dev/sd[a-d]2    
mdadm: device /dev/md1 already active - cannot assemble it
root@storage:/#
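(In hindsight, the “already active” error just means md1 is still assembled with the two spares attached. A gentler option than recreating might have been to stop the array and retry the forced assemble, roughly as below; I didn’t test that path.)

mdadm --stop /dev/md1
mdadm --assemble --force /dev/md1 /dev/sd[a-d]2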

Last, I’ll try to recreate the array in place. Before I do that, I need to find the per-drive (used device) size from the superblock info I saved earlier.

root@storage:/# grep Used raid.status
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
root@storage:/#
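The size isn’t the only parameter that has to match: when recreating an array in place, the RAID level, chunk size, layout, and device order all need to be identical to the originals, or the data will come back as garbage. They’re all in the saved superblock info too:

egrep 'Raid Level|Chunk Size|Layout|this' raid.status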

Now to create the array:

root@storage:/# mdadm --stop /dev/md1
mdadm: stopped /dev/md1
root@storage:/# mdadm --remove /dev/md1
root@storage:/# mdadm --create --assume-clean --level=5 --raid-devices=4 --size=1951474176 /dev/md1 /dev/sd[a-d]2
mdadm: /dev/sda2 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sat Nov  8 18:15:18 2014
mdadm: /dev/sdb2 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sat Nov  8 18:15:18 2014
mdadm: /dev/sdc2 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sat Nov  8 18:15:18 2014
mdadm: /dev/sdd2 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Sat Nov  8 18:15:18 2014
Continue creating array? y
mdadm: array /dev/md1 started.
root@storage:/#
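Before trusting the recreated array, a cheap smoke test is to see whether a recognizable signature (filesystem or LVM) still shows up at the start of /dev/md1; if the member order or chunk size were wrong, this would most likely come back empty or nonsensical (assuming blkid is present in the firmware):

blkid /dev/md1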

Now to check that the array came up.

root@storage:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdd2[3] sdc2[2] sdb2[1]
      5854422528 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      2040128 blocks [4/4] [UUUU]
      
unused devices: <none>
root@storage:/#

It looks good, but the UI doesn’t see any change. Reboot so that the NAS initializes properly.

root@storage:/# reboot

Success! After the reboot, all drives and RAID volumes look fine, and the UI reports that reconstruction is running again. With a little luck it will actually complete.

[Screenshot: 11_27_2014-7]

[Screenshot: 11_27_2014-8]

The UI remained stuck at 0% reconstructed, and the CLI never showed a rebuild running at all. Then I remembered that the original rebuild of /dev/sdd2 may never have completed, and because I recreated the array with --assume-clean, nothing would have rewritten that drive. So I decided to force a fault on /dev/sdd2 to trigger a proper rebuild.

root@storage:/# mdadm --manage --set-faulty /dev/md1 /dev/sdd2
mdadm: set /dev/sdd2 faulty in /dev/md1
root@storage:/#

Check the array status by looking at /proc/mdstat. The device is showing as faulted, but reconstruction has not started.

root@storage:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sda2[0] sdd2[4](F) sdc2[2] sdb2[1]
      5854422528 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      2040128 blocks [4/4] [UUUU]
      
unused devices: <none>
root@storage:/#
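In case the rebuild never starts on its own after marking a member faulty, the usual mdadm fallback is to remove the faulted device from the array and add it straight back (as it turns out below, I didn’t need to):

mdadm /dev/md1 --remove /dev/sdd2
mdadm /dev/md1 --add /dev/sdd2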

Check the array status again with mdadm. The faulted device now shows as a rebuilding spare; the reconstruction has started automatically.

root@storage:/# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Thu Nov 27 21:33:13 2014
     Raid Level : raid5
     Array Size : 5854422528 (5583.21 GiB 5994.93 GB)
  Used Dev Size : 1951474176 (1861.07 GiB 1998.31 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Nov 28 08:35:28 2014
          State : clean, degraded, recovering
 Active Devices : 3
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 1

         Layout : left-symmetric
     Chunk Size : 64K

 Rebuild Status : 0% complete

           UUID : c0c74b41:ea6d9b00:2e29483d:f114274d (local to host storage)
         Events : 0.20

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       1       8       18        1      active sync   /dev/sdb2
       2       8       34        2      active sync   /dev/sdc2
       4       8       50        3      spare rebuilding   /dev/sdd2
root@storage:/#

Continue to monitor status until reconstruction reaches 1%.

root@storage:/# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid5 sdd2[4] sda2[0] sdc2[2] sdb2[1]
      5854422528 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
      [>....................]  recovery =  1.0% (19524352/1951474176) finish=962.5min speed=33452K/sec
      
md0 : active raid1 sda1[0] sdd1[3] sdc1[2] sdb1[1]
      2040128 blocks [4/4] [UUUU]
      
unused devices: <none>
root@storage:/#
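Rather than re-running cat by hand, a couple of conveniences while the rebuild grinds along (assuming watch is in this firmware’s toolbox):

watch -n 60 cat /proc/mdstat              # refresh the status every minute
cat /proc/sys/dev/raid/speed_limit_min    # kernel rebuild speed floor/ceiling (KB/s)
cat /proc/sys/dev/raid/speed_limit_max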

At 1% progress, check the UI to see if it’s tracking.

[Screenshot: 20141127-09]

The reconstruct is running! In about 16 hours it should be complete.