Dead disk during expansion

Failed expansion + dead disk = lost data

A few weeks ago I replaced one of the disks in my ReadyNAS NVX with a larger one. The expansion process seemed to complete successfully… This morning one of my disks went bad…maybe due to a brief power outage…The NAS appeared to just be in a weird state, so I powered it off cleanly, restarted it, and told it to do another scan.

Now the NAS is telling me that disk #1… is “spare”, not part of the RAID array. It says disk #3, the one that appeared to fail this morning, is just gone. Since there are two “failed” disks, the array is “dead” and my data is gone.

Is there anything I can do at this point?

Expansion is a fragile process. All the disks of the original set must be in perfect shape before expansion. Expansion (also called reshape, because the array geometry is changed) requires every sector on every disk to be first read and then written to.

Normally, the expansion process survives a power outage. It certainly survives a normal shutdown or a UPS-initiated shutdown: a smart UPS can tell the NAS that power is lost, and the NAS then shuts itself down without any human intervention. Sudden power cuts are more of a problem, but the damage, if any, is usually well contained.

However, a drive failure during expansion makes a rebuild tricky. Theoretically, the RAID is still redundant, because the reshape algorithm is designed to maintain redundancy throughout the process. In practice, once the drive fails, accounting for what is where and how to recompute data from the parity suddenly gets complicated. Any further failure results in a half-reshaped array which is a mess to fix and certainly beyond the abilities of automatic recovery.
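To see why one failed drive is still recoverable in principle while a second failure is not, here is a minimal sketch of the parity arithmetic in a single RAID5 stripe. It is an illustration only, not the reshape code any NAS actually runs; the function names and the 4-disk layout are assumptions made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Recompute the one missing block of a stripe from all surviving blocks.

    Parity is the XOR of the data blocks, so XOR-ing everything that
    survived (data and parity) yields the single missing block.
    """
    return xor_blocks(surviving_blocks)


# One stripe on a hypothetical 4-disk RAID5: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# One block lost: the other two data blocks plus parity bring it back.
recovered = rebuild_missing([data[0], data[2], parity])

# Two blocks lost: XOR of what is left matches neither missing block.
not_recovered = rebuild_missing([data[0], parity])
```

During a reshape the same arithmetic applies, but the stripe layout differs between the already-reshaped and not-yet-reshaped parts of the array, which is where the bookkeeping gets complicated.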

What can be done to minimize the chance of failure?

  1. Think twice about whether you need the expansion. The traditional way, used before online expansion existed, is to back the data up, verify the backup, destroy the original array, build a new array, and copy the data back. This method still works.
  2. Have a backup before expanding the array. Once you have a backup, there is no requirement to destroy the original array; you can still use the expansion capability, and if something goes wrong, you have the backup.
  3. If the data is not that valuable and the risk of losing it is deemed acceptable, at least check the SMART status of all the array disks and run an extended test on them (if your NAS allows that).
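The SMART check in the last item can be sketched as a simple pre-expansion gate. This is a minimal sketch under stated assumptions: the attribute names are the ones smartmontools commonly reports, the watch list and the "any nonzero raw value is suspicious" rule are a rule of thumb rather than a vendor threshold, and the readings are invented. How you collect the values (the NAS status page, `smartctl -A`, and so on) is left out.

```python
# Attributes where a nonzero raw value is commonly treated as a warning sign.
# The choice of this watch list is an assumption for illustration.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def disks_to_distrust(smart_by_disk):
    """Return {disk: [watched attributes with nonzero raw values]}."""
    suspects = {}
    for disk, attrs in smart_by_disk.items():
        bad = [name for name in WATCHED if attrs.get(name, 0) > 0]
        if bad:
            suspects[disk] = bad
    return suspects


# Hypothetical raw values for a four-disk array.
readings = {
    "disk1": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
    "disk2": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
    "disk3": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 8},
    "disk4": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
}

suspects = disks_to_distrust(readings)
safe_to_expand = not suspects  # expand only if no disk is flagged
```

A clean result lowers the risk but does not prove the disks are healthy, which is why the extended test and the backup still matter.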

Know your RAID level

One ReadyNAS owner seems to be confused about what RAID level is (and was) used on their ReadyNAS Duo (full story).

I have a readyNas Duo with 2x 1Tb disks in raid. … I had to reset the NAS, unfortunally i holded the reset button to long so the disks are wiped clean….

After the reset i upgraded to the latest firmware and let the disk sync.

… is my data lost, or are there way’s to recover the data?

i tried to recover my data with [RecoverMyFiles] …without success…I checked only one disk since i wanted to have the other one as an backup to be examined by a company … [and they] … think my disks were not in mirror but in striping mode…I can’t check this ofcorse but i never saw more totall space then the size of one disk.

This looks real bad. There are two things we know for sure:

  1. this is ReadyNAS Duo, and
  2. there are two physical drives.

and that’s all. There are four conflicting bits in the above quote relevant to the RAID level.

  1. [did] let the disk sync suggests RAID1. RAID0 does not need any kind of sync.
  2. wanted to have the other one as an backup indicates the owner’s belief that the array is RAID1.
  3. company … thinks … striping mode, that’s pretty straightforward.
  4. never saw more totall space than the size of one disk, which again points to RAID1.

Taken alone, either level would be fine: it is fairly easy to recover data from a RAID0 or from a RAID1. Home NAS Recovery, as one obvious example, can work either way and does not even need to know the RAID level beforehand. The problem is the mixed case. If the initial array was RAID0, and after the reset the NAS switched to RAID1 mode and copied the contents of one disk over the other, there is nothing left to recover.
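The capacity clue from the owner's post can be checked with simple arithmetic. The sketch below is illustrative: it computes nominal capacity for the two standard levels possible on a two-disk unit and ignores filesystem overhead, so real figures shown by a NAS will be somewhat lower. The function names are made up for this example.

```python
def usable_capacity_tb(level, disk_sizes_tb):
    """Nominal usable capacity of a standard array, ignoring overhead."""
    count = len(disk_sizes_tb)
    smallest = min(disk_sizes_tb)
    if level == "RAID0":
        return count * smallest  # striping: every disk adds capacity
    if level == "RAID1":
        return smallest  # mirroring: every disk holds the same data
    raise ValueError(f"unsupported level: {level}")


def levels_matching(observed_tb, disk_sizes_tb):
    """List the levels whose nominal capacity equals what the owner saw."""
    return [
        level
        for level in ("RAID0", "RAID1")
        if usable_capacity_tb(level, disk_sizes_tb) == observed_tb
    ]


duo_disks = [1, 1]  # two 1 TB disks, as in the post
consistent = levels_matching(1, duo_disks)  # owner never saw more than one disk's worth
```

With two 1 TB disks, a striped array would have shown about 2 TB, so seeing only about 1 TB is consistent with a mirror.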

 

ReadyNAS RAID Levels

ReadyNAS comes in a wide variety of RAID levels, none of them seemingly matching the standard. There are (in different ReadyNAS devices)

  • Flex-RAID
  • X-RAID
  • X-RAID2

Flex-RAID is the simplest of them all. It is a stand-in for manual configuration. Once you choose Flex-RAID, the system asks you to pick one of the standard RAID levels and, if you go for multiple arrays, how many disks to allocate to each array.

X-RAID and X-RAID2 both hide the RAID settings from the user. Both automatically expand the array if more disks are added, or if enough disks are replaced with larger ones that the array can be expanded while maintaining redundancy. Internally,

  1. on one disk, it is just a simple partition,
  2. as the second disk is added, that simple partition is converted to a RAID1,
  3. at the addition of the third disk, RAID1 is converted to a RAID5,
  4. as more disks are added, RAID5 is reshaped to accommodate additional disks.
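The progression in the list above can be written down as a small lookup. This is only a restatement of that list in code, not NETGEAR's logic; the function name and the layout labels are made up for illustration.

```python
def xraid_layout(disk_count):
    """Underlying layout used for a given number of disks, per the list above."""
    if disk_count < 1:
        raise ValueError("at least one disk is required")
    if disk_count == 1:
        return "single partition"
    if disk_count == 2:
        return "RAID1"
    return "RAID5"  # three or more disks; reshaped as disks are added


# Layout after each disk is added to a four-bay unit.
progression = [xraid_layout(n) for n in range(1, 5)]
```

For recovery purposes, the useful part is the result: two disks mean a mirror, and three or more mean a RAID5.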

The exact difference between X-RAID and X-RAID2 is sort of moot. For all intents and purposes, once the RAID has crashed, they are the same.

Home NAS Recovery works with any of these configurations except

  1. where disks of different sizes are involved
  2. where multiple RAIDs are involved; Home NAS Recovery requires RAIDs fed to it one by one. If you have multiple RAIDs and you do not remember which disks form which RAID, your case is likely to end up in a data recovery lab anyway.

Talk about taking risks

This thread is about unnecessary risks.

It starts with some sort of intermittent hardware failure, probably a not-quite-dead disk or something.

ReadyNAS Duo [X-RAID]. I noticed is was offline in my LAN and I couldn’t access …. The device seemed to have frozen and it wouldn’t shut down normally using the power button, I had to pull the cable. When restarting … it didn’t boot up properly and become usable again

When asked if he has a valid backup, he goes on to say

For the relatively brief periods of time I can get it up and running I can browse the shares on the network as normal. i was kind of hoping to address the cause of it falling over without having to move the data off the disks.

Well, this is bold. I daresay it is sorta kinda bordering on excessively bold. In any anomaly like this, there are only two alternative courses of action

  1. back up data from the NAS, with the disks in the NAS, or
  2. get the disks out and clone them to blank new disks.

depending on the exact situation, one or the other may be better, but backup goes first in any case.
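For the second option, the idea behind cloning a shaky disk is to copy whatever reads cleanly and keep going past what does not. The sketch below shows that idea only; it is not a replacement for a purpose-built tool such as GNU ddrescue, which retries and logs far more carefully. The function name and block size are assumptions, and raw device access needs administrator rights.

```python
import os


def clone_disk(src_path, dst_path, block_size=1024 * 1024):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the list of byte offsets where a read failed.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # works for image files and block devices
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            try:
                src.seek(offset)
                block = src.read(length)
            except OSError:
                block = b""
                bad_offsets.append(offset)
            # Pad failed (or short) reads so later data keeps its original offset.
            dst.write(block.ljust(length, b"\0"))
            offset += length
    return bad_offsets
```

Calling `clone_disk(source, target)` with the failing disk as the source and a blank disk or an image file as the target returns the offsets that could not be read, which gives a rough idea of how much damage the clone carries. Recovery then works on the clone, not on the failing original.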

ReadyNAS BTRFS

There are few programs to help you with recent ReadyNASes. This is because of the BTRFS filesystem used by NETGEAR.

This happens to be an example case

Readynas Ultra 4 Plus – I think the Data is okay, but the Readynas is not functioning. After attempting all official methods to revive a Netgear RN-102ND NAS, I’ve removed the single WD 3TB Green drive to recover its data. Slaving it to a Windows PC, I open a popular program around here called Sysinternals Linux Reader for recovery. However…a dialogue appears stating “can’t open disk: Btrfs Volume 1 (0e34c953:data, raid) Check the disk and try again”.

Linux Reader (a DiskInternals product, not Sysinternals) does not know BTRFS. Furthermore, readers are not very good at reading damaged filesystems. By the time all official methods have been attempted with no effect, you most likely need recovery software, not a reader. You know, we have one, BTRFS and all.

How long to rebuild?

There is a story of a routine drive replacement going belly up

[I] have a Netgear ReadyNAS NV+ (RND4410)… [containing] four 1TB-disks…I changed all disks to 2Tb-disks. As it’s a X-Raid NAS I just pulled the disk from slot 1 and mounted the first new 2tb disk…[NAS] started the sync… 3 days later it still said syncing and I could still not reach the web-interface.

I pulled the plug and rebooted the NAS.

It did a check on the NAS volume and then it said booting with the LED on slot 1 still blinking.

[30 hours later it was] stuck on Booting and can not reach it in the web-interface.

The NAS owner specifically states that

No smart errors detected before start.

The case still looks very much like a failure of a second disk during rebuild. The most likely reason for a NAS failing to complete the rebuild while blinking its disk LEDs is a read failure on one of the remaining drives. The X-RAID mode in ReadyNAS is essentially a RAID5, so once a disk fails, the NAS is supposed to remain online and rebuild the content of the failed disk once it is replaced. The observed result is that the NAS does not even remain online. Leaving aside the possibility of the new disk being a dud, another likely reason is that a second disk failed to read a block and the NAS is now locked up, endlessly retrying the read.
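To judge whether a sync is merely slow or actually stuck, a back-of-the-envelope estimate helps: the rebuild has to write the whole member disk once, so the time is roughly size divided by sustained throughput. The sketch below is just that arithmetic. The throughput figure is an assumption you have to supply yourself; older units can be far slower than the raw disk speed suggests.

```python
def rebuild_hours(disk_size_tb, throughput_mb_s):
    """Rough time to write one full member disk at a steady rate.

    Uses decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = disk_size_tb * 10**12 / (throughput_mb_s * 10**6)
    return seconds / 3600


# Example with assumed numbers: a 2 TB disk at a steady 50 MB/s.
estimate = rebuild_hours(2, 50)  # about 11 hours
```

If the observed time is several times longer than the estimate for any plausible throughput of your unit, and the web interface is unreachable as well, a stuck rebuild is more likely than a slow one.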

What could be done in this situation?

  1. Pull the new disk out and try to start up the NAS. If the NAS starts, shut it down again and try with another blank new disk. This rules out the possibility that the replacement is a dud.
  2. If that does not work, try putting the original disk back. This further rules out any incompatibility of the new disks with the NAS. The entire batch of new disks may be DOA or incompatible. These things happen, albeit rarely.
  3. If putting the original disk back does not help, make yet another attempt with the slot empty (both the original and the replacement out). If the NAS comes online, the first thing to do is back up its content, starting with the most important data.

If none of this works, we have Home NAS Recovery for you. Give it only the disks that stayed in the NAS the whole time. That is, leave out both the original disk that was removed and any replacement disks. The only disks guaranteed to be in sync with each other are the ones that were never removed from the NAS.

 

ReadyNAS Ultra 4 plus

The original story (from this link):

[someone did] flip the circuit breakers without my knowledge.  Now my Readynas Ultra 4 Plus is not functioning properly.  … So thought I do some troubleshooting.  I tried to update the firmware [it won’t flash].  … then … i try disabling CIFS and re-enabling.  Now it won’t re-enable at all. Raidar reports everything is healthy.  The ReadyNas Admin page also seems like everything is okay,  However I just can’t access the data.  … So what are my options?

1) Can I go and get 4 blank drives.. Restore to factory?  Then try to put my old drives back in?

2) Do i have to get something like ReClaime or R-Studio to recover my data?

b)  If so is there any good step by step guide on how to do it?

I wish there was a way to do restore the OS to factory without losing the data.  I am sure the Raid Partition is intact, it’s just the OS is malfunctioning. …

What can we learn from this?

  1. Even in 2015, sudden power failure can kill your NAS. NASes are considered fault-tolerant, just not with all types and kinds of failures.
  2. Firmware flash on a broken NAS is unlikely to work. In other cases, it may actually make things worse, so I would advise against even trying.

Does he actually have to use something like Home NAS Recovery, ReclaiMe, or R-Studio? In all probability, yes. There is a good step-by-step guide, here. Restoring the unit to factory condition with 4 blank disks and then swapping those disks for the ones with data will not work. The configuration is stored on the disks, so the factory configuration after the reset applies only to the 4 new disks. Once the original disks are back in, the fault is back too.