Resilvering after a controller failure (disks didn't fail)

oldnaser
NewUser
Posts: 1
Joined: 15 Oct 2020 17:08
Status: Offline

#1

Post by oldnaser »

Hello, I have 6x2TB disks in a z2 pool, and recently had one of my two SATA controllers fail. 2 of the 6 disks were on that controller, so the data was OK but the pool went into a (heavily) degraded state. Unfortunately, I did not notice the controller failure immediately, AND did not take immediate action once I did notice it. I did, however, back up most (but not all) of the data from the degraded pool before shutting the system down.

I now have a new controller, and I have tested the controller and system stability with different disks (with the 6 "production" disks disconnected). I also upgraded from the old NAS4Free version I was running to the latest XigmaNAS (primarily to make sure I have the latest LSI drivers, since the new controller is LSI).

What would be the best way to restore the full integrity of the original z2 pool?

Option 1) Physically reconnect all 6 disks, import all, and let the system do its magic? My concern is that the 2 disks that were on the failed controller are out of sync and will need at least partial resilvering, putting them under stress. However, the pool data was mostly read-only (family photos etc.), so the incremental updates to the 2 disks would be a fraction of the total data. Is ZFS smart enough to resilver only the required data on the 2 disks, or would it need to read/write the entire disks?

Option 2) Physically reconnect only the 4 disks that were "up to date and in sync" at the time of shutdown, back up ALL data (I now have extra HDDs for that), then reconnect the remaining 2 disks and let it resilver as needed?

arghdubya
NewUser
Posts: 10
Joined: 05 May 2018 02:31
Status: Offline

Re: Resilvering after a controller failure (disks didn't fail)

#2

Post by arghdubya »

There should be no harm in plugging them into the new controller and seeing what happens.
Since it's a fresh boot, the pool will see those drives and attempt to roll them forward... badda bing.
(So you actually complicated things by upgrading... does it have the old config?)
... But like you said, if too much time has passed they may be too far behind.
If that is the case, the pool should come up DEGRADED and you can do a backup.
Then you kinda have no choice but to wipe the unrecognized drives (clearing the MBR/GPT and zeroing the first GB, done offline on another computer, is sometimes enough) and do a replace, drive for drive.
If you are concerned about stressing the drives, check the SMART data for the drives before replacing. If one of the good 4 looks iffy, then do a sector-by-sector copy to a new drive (ddrescue is good for this)... mark the old drive and wipe it if the new drive takes (i.e. don't confuse ZFS).
If a drive fails on the replaced controller you aren't any worse off... if one finishes you are way better off.
Making the heads flick around isn't normally how you fail a drive... it's usually a time thing, where the head-to-platter spacing drifts or the platters degrade. A happy drive is like a sled dog: it likes to work, and it's better to find out you have a sick dog when you're prepared for it than to pamper it.
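
The "check the SMART data" step above could be sketched roughly like this in Python. It assumes you have saved the ATA attribute table printed by `smartctl -A <device>` as text. The watch list, the function name `iffy_attributes`, and the sample rows are illustrative assumptions (a common rule of thumb, not an official threshold and not data from the poster's drives).

```python
# Sketch: flag SMART attributes worth a closer look before a resilver.
# Assumption: input is the attribute table printed by `smartctl -A <device>`.
# The watch list is a rule of thumb, not an official failure threshold.

WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def iffy_attributes(smartctl_output):
    """Return {attribute_name: raw_value} for watched attributes with a nonzero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Made-up sample rows in the smartctl -A layout (hypothetical drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       41234
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    print(iffy_attributes(SAMPLE))
```

An empty result for a drive does not prove it is healthy; it only means none of the watched counters are nonzero in that snapshot.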

Doing scrubs occasionally is not what kills drives; that's just finding out a drive is failing when you choose, not when it chooses. I think if you wait, you increase the chances of more than one drive failing at once. Like... if you don't test anyone, no one has COVID?

-EDIT: I think your worry is that it recognizes the disconnected drives enough to do a full resilver on them, and you think a drive will fail while it's scrubbing (same thing basically). [If so, you could just as well cause a drive to fail during a full backup while the pool is degraded... it'll be doing the same thing.] Yes, check the SMART data to get a feel for the drives. I think you're OK to let it rip. You might hit some bad sectors, but the chances of an outright failure are very low. If you don't have the critical stuff backed up at all, then sure, bringing it up DEGRADED and copying that stuff off is wise. You never know.
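
To see whether the pool came up DEGRADED and which devices it is missing, something like the following Python sketch could summarize saved `zpool status` output. The pool name `tank`, the device names `da0` to `da5`, and the helper `pool_summary` are hypothetical; the sample text only mimics the usual layout and is not output from the poster's system.

```python
# Sketch: summarize saved `zpool status` text.
# Assumption: the usual layout with a "state:" line and a "config:" table
# whose rows are "NAME STATE READ WRITE CKSUM". All names below are made up.


def pool_summary(status_text):
    """Return (pool_state, [(name, state), ...]) for every config row that is not ONLINE."""
    state = None
    not_online = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("state:"):
            state = stripped.split(":", 1)[1].strip()
        elif stripped.startswith("config:"):
            in_config = True
        elif stripped.startswith("errors:"):
            in_config = False
        elif in_config and stripped:
            fields = stripped.split()
            # Skip the column header and anything that is not a table row.
            if fields[0] == "NAME" or len(fields) < 2:
                continue
            if fields[1] != "ONLINE":
                not_online.append((fields[0], fields[1]))
    return state, not_online


# Hypothetical sample: 6-disk raidz2 with two devices unavailable.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da3     ONLINE       0     0     0
            da4     UNAVAIL      0     0     0
            da5     UNAVAIL      0     0     0

errors: No known data errors
"""

if __name__ == "__main__":
    pool_state, problems = pool_summary(SAMPLE_STATUS)
    print(pool_state, problems)
```

The list includes the pool and vdev rows as well as the individual disks, since those also report DEGRADED when members are missing.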
1: N40L 4gb [(6) int 2tb +esata mediasonic (4) 2tb] 20tb's in Z2 , Marvell based eSata, boot on ZFSonRoot
2: Noontec (J1800 based) 4gb 2bay NAS with 1tb mirror, dedup on, rev5074, USB boot
3: UNAS 4bay Intel DH61DL 8gb, Marvell MiniPCIe AHCI, SSD boot root-on-ZFS , (6) 3tb in Z1 (10tb total currently)
& 4 QNAPs, 1 Synology, Win2008 EX495, Vsphere 6.7, Win Server 2016 AD
