Creating RAID5 over existing data is bad

Irreversibly bad, in fact, if you do it the way this user did:

I bought a Zyxel Nas540 … plugged my HDD’s with my data in them and created a RAID5 volume. I didn’t know it will change the file format completely. I tried to connect my hdd’s back to my computer to backup the data but all of the hdd’s are damaged. … Is there a way to restore last file system ?

No.

RAID5 interleaves parity with data across all disks. Once the RAID5 is first synchronized, every Nth block on each of the N disks is overwritten with parity. Given the typical block sizes used in RAIDs, this prevents any practical recovery.
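To see why the damage is spread so evenly, here is a minimal sketch of where parity lands. It assumes the left-symmetric rotation (the default in Linux md); the NAS in question may use a different layout, and the function names are ours, made up for illustration.

```python
def parity_disk(stripe: int, n_disks: int) -> int:
    """Disk index holding parity for a stripe (left-symmetric rotation, assumed)."""
    return n_disks - 1 - stripe % n_disks


def overwritten_chunks(disk: int, n_disks: int, n_stripes: int) -> list[int]:
    """Stripe numbers whose chunk on `disk` is replaced by parity on first sync."""
    return [s for s in range(n_stripes) if parity_disk(s, n_disks) == disk]


if __name__ == "__main__":
    n = 4  # hypothetical four-disk array
    for d in range(n):
        # Each disk loses one chunk in every n stripes.
        print(f"disk {d}: parity overwrites stripes {overwritten_chunks(d, n, 12)}")
```

With chunks of a few tens or hundreds of kilobytes, a hole every N chunks hits practically every file larger than one stripe, on every disk.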

Rebuild goes wrong

Everything looked nominal. This case is a rather unexpected example of a routine operation going belly up: the NAS owner ends up looking for recovery options after a RAID 5 crash.

Recently our quite old NAS Iomega Storcenter 200rl with a RAID 5 out of 4x500GB, reported a harddisk failure and our administration replaced one of the disks that was marked. Then the Storecenter told that it was synchronizing the data. After synchronization the NAS told that 100% of storage capacity is free and so it tell until now… I am interessted in options to revocer the data.

It certainly does not look like anything was done wrong. If the NAS had sensed that the rebuild could not be completed, it would have refused to even start the process. And with a wrong disk replaced, the rebuild does not happen at all. In this case, however, the rebuild completed with no reported anomaly, so something else went wrong.

With something unknown going wrong, it is difficult to predict whether the case is recoverable. One can give our Home NAS Recovery a spin, but there is no guarantee of success. Cases where a rebuild goes wrong for no apparent reason are always dicey.

Lost RAID5 on Thecus

Trying to rebuild lost Raid5 on Thecus

I have lost a raid 5 in a 4200PRO and now I am desparatly trying to revive the Raidset.

What happened: The Raid5 over 4x 2TB WDGreen Disks degraded and I accidently marked the wrong disk as spare (this was definately my mistake, failure before always happend on drive #3, this time it was #4…), next boot the raid was gone.

[tried to fix the problem using Linux commands but to no avail].

Has anybody any experience how to get further than this?

Yes, but success depends on what exactly was done during troubleshooting. Unless a rebuild was forced with the wrong disk order, Home NAS Recovery is a definite answer to the question.

IX4 falling apart

This documents a typical sequence of multiple drives failing in a RAID5, this time in an Iomega ix4-200d.

I’ve received an automatic email from the dashboard saying

Data protection is being reconstructed. Data is available during this operation, however performance may be degraded.

After that, the NAS started ‘Data recovery procedure’ [and then came another message]

Drive number 1 encountered a recoverable error.

[and] NAS started recovery procedure from the scratch. Even though that mentioned drive has failed, everything worked fine untill yesterday [when] new message … said

Storage failed and some data loss may have occurred. Multiple drives may have either failed or been removed from your storage system. Visit the Dashboard on the management interface for details.

Is there any way to recover at least some data if it is NAS that failed?

Depending on the condition of the disks, Home NAS Recovery may or may not be able to extract data from them.

Maybe there was a spare involved, as “Data protection is being reconstructed” in a RAID5 can only refer to a rebuild of the array. A rebuild happens either when a defective disk is replaced or when a hot spare kicks in after one of the active array disks fails. There is another, less obvious variation on that tune: a transient failure causes one of the disks to drop out of the array momentarily, then report back online and be accepted back into the array.

Anyhow, while the rebuild is in progress, it turns out that a second drive in the array is unreadable. This halts the rebuild. The error is at first deemed recoverable, and the rebuild is retried. However, the error recovery is not successful and the second disk (#1) drops offline. With two drives offline, the data is no longer accessible.
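The reason two offline drives are fatal comes straight from how RAID5 parity works: the parity chunk is the XOR of the data chunks in a stripe, so one missing chunk can be recomputed from the rest, but two cannot. A minimal sketch follows (the function names and the tiny four-byte chunks are ours, for illustration only):

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_missing(stripe):
    """Recompute the single missing chunk (marked None) in one stripe.

    `stripe` holds all data chunks plus the parity chunk, in any order.
    Raises ValueError if nothing is missing or if more than one chunk is.
    """
    missing = [i for i, chunk in enumerate(stripe) if chunk is None]
    if not missing:
        raise ValueError("nothing to rebuild")
    if len(missing) > 1:
        raise ValueError("more than one chunk missing: unrecoverable from parity")
    return xor_blocks([chunk for chunk in stripe if chunk is not None])


if __name__ == "__main__":
    data = [b"AAAA", b"BBBB", b"CCCC"]
    stripe = data + [xor_blocks(data)]   # three data chunks + parity
    stripe[1] = None                     # one disk drops out
    print(rebuild_missing(stripe))       # the lost chunk comes back
    stripe[2] = None                     # a second disk drops out
    try:
        rebuild_missing(stripe)
    except ValueError as err:
        print(err)
```

A rebuild has to read every remaining disk from start to end, which is why a marginal second drive so often shows itself at exactly this moment.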

For our data recovery software, the logical reconstruction is not a problem, provided that the disks are still readable enough. That may not be the case, though. In the worst case, the disks need to be cloned to blank new disks, and the clones then used for recovery. The NAS will not accept the clones because it has already recorded the disks as “failed”, and cloning the entire disk content also clones the “failed” marks for the respective disks.

Fixing an array with mdadm goes wrong

Now this is complex.

I have an mdadm-created RAID5 array consisting of 4 discs. One of the discs was dropping out, so I decided to replace it. Somehow, this went terribly wrong and I foolishly succeeded in marking two of the (wrong) drives as faulty, and then re-adding them as spare.

Now the array is (logically) no longer able to start:

mdadm: Not enough devices to start the array.

Degraded and can’t create RAID,auto stop RAID [md1]

As I don’t want to ruin the maybe small chance I have left to rescue my data…

This sure is complicated. Obviously, if you fail two array members, RAID5 goes down. Worse yet, once this happens, it stays down. You can’t tell it to accept the spares back in a normal way. Theoretically, some more fiddling with mdadm can force the array back into shape, but I doubt it is safe in a DIY environment. If your unit is still under warranty (this particular case was with Thecus), then by all means open a ticket and ask them to fix the issue – they are pretty good with mdadm. If the case is beyond Linux repair, fall back on our Home NAS Recovery – we are pretty good too.

Dead disk during expansion

Failed expansion + dead disk = lost data

A few weeks ago I replaced one of the disks in my ReadyNAS NVX with a larger one. The expansion process seemed to complete successfully… This morning one of my disks went bad…maybe due to a brief power outage…The NAS appeared to just be in a weird state, so I powered it off cleanly, restarted it, and told it to do another scan.

Now the NAS is telling me that disk #1… is “spare”, not part of the RAID array. It says disk #3 — the one that appeared to fail this morning — is just gone. Since there are two “failed” disks, the array is “dead’ and my data is gone.

Is there anything I can do at this point?

Expansion is a fragile process. All the disks of the original set must be in perfect shape before expansion. Expansion (also called reshape, because the array geometry is changed) requires every sector on every disk to be first read and then written to.

Normally, the expansion process survives a power outage. It certainly survives a normal shutdown or a UPS-initiated shutdown. A smart UPS can tell the NAS that the power is lost, and the NAS then proceeds to shut itself down without any human intervention. This is certainly not a problem. Sudden power cuts are more of a problem, but the damage, if any, is usually well contained.

However, a drive failure during expansion makes a rebuild tricky. Theoretically, the RAID is still redundant, because the reshape algorithm is designed to maintain redundancy throughout the process. In practice, once the drive fails, accounting for what is where and how to recompute data from the parity suddenly gets complicated. Any further failure results in a half-reshaped array which is a mess to fix and certainly beyond the abilities of automatic recovery.
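To get a feel for how much data a reshape has to move, here is a rough sketch. It assumes a left-symmetric RAID5 layout and a hypothetical expansion from four disks to five; the ReadyNAS case above may work differently, and the function names are ours.

```python
def locate(chunk: int, n_disks: int) -> tuple[int, int]:
    """Map a logical data chunk to (stripe, disk), assuming left-symmetric RAID5."""
    data_disks = n_disks - 1
    stripe, index = divmod(chunk, data_disks)
    parity = data_disks - stripe % n_disks   # parity rotates backwards
    disk = (parity + 1 + index) % n_disks    # data starts right after parity
    return stripe, disk


def chunks_moved(old_disks: int, new_disks: int, n_chunks: int) -> int:
    """Count the logical chunks that end up at a different (stripe, disk)."""
    return sum(
        1 for k in range(n_chunks)
        if locate(k, old_disks) != locate(k, new_disks)
    )


if __name__ == "__main__":
    total = 1000
    moved = chunks_moved(4, 5, total)
    print(f"{moved} of {total} chunks change place going from 4 to 5 disks")
```

Practically every chunk gets a new home, and all the parity has to be recomputed on top of that. A reshape interrupted by a disk failure therefore leaves part of the array in the old geometry and part in the new one.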

What can be done to minimize the chance of the failure?

  1. Think twice about whether you need the expansion. The traditional way, used before on-line expansion existed, is to back the data up, verify the backup, destroy the original array, build a new array, and copy the data back. This method still works.
  2. Have a backup before expanding the array. Once you have a backup, there is no requirement to destroy the original array. You still have the expansion capability. If something goes wrong, you have a backup.
  3. If the data is not that valuable and a risk of losing it is deemed acceptable, make sure you check SMART status on all the array disks and do an extended test of the disks (if your NAS allows that).

Lenovo PX6-300D

This describes the behavior of the 6-bay Lenovo PX6 NAS with multiple disk failures:

I have px6-300D nas with 3TB X 6 drives. I configured it with Raid 5. Few Days back it was showing a message The amount of free space on your ‘Shares’ volume is below 5% of capacity. and asked to overwrite Drive 6…Then i contacted customer care they told that your few drives (3 or 4) has failed. … and go with some data recovery solution provide… If its NAS with raid protection my data must be protected. I really need my data back.

RAID protection is great, but it has its limits. It does not protect against anything other than disk failure, and RAID5 only protects against a single disk failure. If multiple disks fail, down it goes.

Reconstruction restarts at 45% and starts again from 0.

That’s what it looks like when implemented by Lenovo. Other vendors will have different indications, but the end result is the same and the array cannot be rebuilt. Short of packing the disks for a data recovery service, what else can be done?

  1. The cheapest option is to remove all the disks from the NAS, clone them to a set of new disks of the same capacity, and put the clones back. The NAS will hopefully pick up the copies and complete the rebuild successfully.
  2. If the rebuild does not pick up, our Home NAS Recovery software can in all likelihood do the job.