
Still Degraded after replacing drive

Posted: 27 Feb 2014 13:23
by JoyMonkey
My Nas4Free box has been running 9.1.0.1 Sandstorm happily for some time now with five 2 TB drives in a raidz1 array. A few days ago one drive died, and a few of the other drives don't look healthy, so I decided to start replacing them one by one. It's been a while since I dealt with the box, so I did a lot of Googling and read through the wiki and these forums. Then I took these steps...

I shut down the box and physically removed the 'dead' drive from my case, replacing it with a similar drive. The replacement drive was previously used in a FreeNAS box, but I deleted all of its partitions using GParted.

In the Disks/Management tab, I clicked 'Import Disks' to get the new drive to show up, then I deleted the old drive from the listing.

I logged in via SSH and issued the command

Code:

zpool replace terraid 3305624593698899328 /dev/ada1

This began resilvering, but once the resilvering completed my ZFS pool still shows as degraded...

Code:

  pool: terraid
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 1.64T in 14h2m with 4 errors on Thu Feb 27 02:02:55 2014
config:

	NAME                                            STATE     READ WRITE CKSUM
	terraid                                         DEGRADED     0     0     4
	  raidz1-0                                      DEGRADED     0     0     8
	    ada0                                        ONLINE       0     0     0
	    replacing-1                                 DEGRADED     0     0     0
	      3305624593698899328                       UNAVAIL      0     0     0  was /dev/gptid/26971116-5d0b-11e2-ac5a-b8975a2730dc
	      ada1                                      ONLINE       0     0     0
	    gptid/26fa2600-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/27b1c2b5-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/282b425b-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/terraid/1gbfile1.dat
        /mnt/terraid/1gbfile2.dat
        /mnt/terraid/12gbfile.dat
        /mnt/terraid/8gbfile.dat

I thought a scrub might help, but the pool still showed as DEGRADED after scrubbing. I restarted the box, and when Nas4Free came up again it was resilvering all over again, ending in the same state as above.

Any ideas where I went wrong and how to remedy it? Thanks!

Re: Still Degraded after replacing drive

Posted: 27 Feb 2014 14:12
by JoyMonkey
Well don't I feel dumb? :?

I solved this by detaching the old unavailable drive. After issuing this command...

Code:

zpool detach terraid 3305624593698899328

The pool now shows as ONLINE instead of DEGRADED and the new drive has once again begun resilvering...

Code:

  pool: terraid
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 27 08:05:38 2014
        25.9G scanned out of 8.22T at 209M/s, 11h24m to go
        5.18G resilvered, 0.31% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	terraid                                         ONLINE       0     0     4
	  raidz1-0                                      ONLINE       0     0     8
	    ada0                                        ONLINE       0     0     0
	    ada1                                        ONLINE       0     0     0  (resilvering)
	    gptid/26fa2600-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/27b1c2b5-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/282b425b-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0

So should I have issued the detach command BEFORE issuing the replace command? Man, I'm an idiot. :oops:

Re: Still Degraded after replacing drive

Posted: 27 Feb 2014 20:10
by substr
Had you ever run a scrub on the pool before the failure?

It appears you have some uncorrectable block errors. The listed files are the ones affected: somewhere in each of them is a block that cannot be read back correctly. The files might be salvageable, depending on whether they can tolerate a gap. Otherwise, "restore from backup."
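If you want the affected paths as a plain list (for example, to feed a restore from backup), they can be pulled out of the zpool status -v output with a small filter. This is a minimal sketch that assumes the layout shown in the first post: a "Permanent errors" header followed by one indented path per line. affected_files is a hypothetical helper name, not a NAS4Free or ZFS command.

```shell
#!/bin/sh
# affected_files (hypothetical helper): read "zpool status -v" output on stdin
# and print every path listed under the "Permanent errors" header, one per line.
# Leading indentation is stripped; spaces inside a path are kept.
affected_files() {
    awk '
        /Permanent errors have been detected/ { inlist = 1; next }
        inlist && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Example:
#   zpool status -v terraid | affected_files > /tmp/restore-list.txt
```

Each path in the resulting list can then be restored from backup or deleted.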

I don't think you can detach before replacing in a raidz. After the replace, it is best not to detach the old drive until the resilver is complete, unless the situation seems to require it (a dead drive killing performance, etc.).
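JoyMonkey's fix was to detach the UNAVAIL GUID left under replacing-1. That step can be guarded so it only runs after the resilver has finished. Below is a sketch that assumes the zpool status layout shown in this thread, where children of a replacing-N vdev are indented one level deeper than it. The helper name is hypothetical. It only prints candidate GUIDs and leaves the actual zpool detach to you.

```shell
#!/bin/sh
# stale_replacing_members (hypothetical helper): read "zpool status" output on
# stdin and print the name/GUID of every UNAVAIL child of a replacing-N vdev.
# Exits with status 1 and prints nothing while a resilver is still in progress.
stale_replacing_members() {
    awk '
        /resilver in progress/ { busy = 1 }
        { match($0, /^[ \t]*/); indent = RLENGTH }   # depth of this row
        inrep && NF > 0 && indent > repindent {      # child of replacing-N
            if ($2 == "UNAVAIL") found[++n] = $1
            next
        }
        { inrep = 0 }                                # left the replacing-N group
        $1 ~ /^replacing-[0-9]+$/ { inrep = 1; repindent = indent }
        END {
            if (busy) exit 1
            for (i = 1; i <= n; i++) print found[i]
        }
    '
}

# Example (review the output before detaching anything):
#   zpool status terraid | stale_replacing_members
#   zpool detach terraid 3305624593698899328
```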

Re: Still Degraded after replacing drive

Posted: 28 Feb 2014 13:24
by JoyMonkey
This just gets weirder.
It finished resilvering the replacement drive (for the third time), so I thought I'd reboot. When it came back up, it started another resilver on the new drive all over again.

Code:

  pool: terraid
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Feb 28 05:43:31 2014
        1.13T scanned out of 8.23T at 202M/s, 10h13m to go
        231G resilvered, 13.70% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	terraid                                         ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    ada0                                        ONLINE       0     0     0
	    ada1                                        ONLINE       0     0     0  (resilvering)
	    gptid/26fa2600-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/27b1c2b5-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/282b425b-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0

Any ideas why it keeps doing this?

Re: Still Degraded after replacing drive

Posted: 28 Feb 2014 22:38
by substr
You could try exporting the pool (and importing it again) after the resilver finishes. That might make sure the new drive is considered a full member of the pool.
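The export suggestion could be scripted roughly as follows. This is a sketch only: reimport_pool and run are hypothetical helpers, and DRY_RUN=1 makes them print the commands instead of executing them. Exporting unmounts every dataset in the pool, so anything using those datasets should be stopped first.

```shell
#!/bin/sh
# run (hypothetical helper): print the command when DRY_RUN is set,
# otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# reimport_pool POOL (hypothetical helper): export POOL and import it again
# by name. Refuses to run while a resilver is still in progress.
reimport_pool() {
    pool=$1
    if [ -z "${DRY_RUN:-}" ] && zpool status "$pool" | grep -q 'resilver in progress'; then
        echo "resilver still running on $pool; not exporting" >&2
        return 1
    fi
    run zpool export "$pool" && run zpool import "$pool"
}

# Example:
#   DRY_RUN=1 reimport_pool terraid   # prints the two commands
#   reimport_pool terraid             # actually exports and re-imports
```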

Re: Still Degraded after replacing drive

Posted: 16 Apr 2014 20:43
by pakpenyo
Yes, this happened to me too.

ZFS Detected

Code:

Name            Type    Pool  Devices
vol1_raidz1_0   raidz1  vol1  /dev/ada0, /dev/ada1, /dev/ada2
vol1_raidz1_1   raidz1  vol1  /dev/replacing-0, /dev/ada4, /dev/ada5

ZFS Current

Code:

Name            Type    Pool  Devices
vol1_raidz1_0   raidz1  vol1  /dev/ada0, /dev/ada1, /dev/ada2
vol2_raidz1_0   raidz1  vol1  /dev/ada3, /dev/ada4, /dev/ada5

After a reboot it is still resilvering, and /dev/ada3/old still exists.

Code:

  pool: vol1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Apr 16 18:05:07 2014
        7.56T scanned out of 9.80T at 241M/s, 2h42m to go
        1.26T resilvered, 77.18% done
config:

	NAME                        STATE     READ WRITE CKSUM
	vol1                        DEGRADED     0     0     0
	  raidz1-0                  ONLINE       0     0     0
	    ada0.nop                ONLINE       0     0     2  (resilvering)
	    ada1.nop                ONLINE       0     0     0
	    ada2.nop                ONLINE       0     0     0
	  raidz1-1                  DEGRADED     0     0     0
	    replacing-0             DEGRADED     0     0     0
	      12609113799974789003  UNAVAIL      0     0     0  was /dev/ada3/old
	      ada3                  ONLINE       0     0     0  (resilvering)
	    ada4                    ONLINE       0     0    24  (resilvering)
	    ada5                    ONLINE       0     0    30  (resilvering)

errors: Permanent errors have been detected in the following files:

        vol1/progress@auto-20131213-140000:/VIDEO/Local/Blnd/01 DEN HAAG - nicely.avi

There is also a permanent error in a snapshot, but the .avi video itself is fine.
I don't know how to fix it. Maybe with the detach command, but I'm not sure.
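Here the error entry has the dataset@snapshot:/path form, so the damaged blocks belong to a snapshot while the live .avi is fine. Detaching the UNAVAIL member under replacing-0 addresses the stuck replacement, not this error. An option that is often suggested (an assumption here, not confirmed in this thread) is to destroy the affected snapshot and scrub again. Keep in mind that zfs destroy on a snapshot cannot be undone. The helper below has a hypothetical name and only lists the snapshots named in the error list. It does not destroy anything.

```shell
#!/bin/sh
# snapshots_with_errors (hypothetical helper): read "zpool status -v" output on
# stdin and print each dataset@snapshot that appears in the permanent-error
# list, once. Entries for live files (plain paths) are skipped.
snapshots_with_errors() {
    awk '
        /Permanent errors have been detected/ { inlist = 1; next }
        inlist && NF > 0 {
            sub(/^[ \t]+/, "")
            # Snapshot entries look like dataset@snapshot:/path/inside/snapshot
            if (match($0, /^[^ :]+@[^:]+:/)) {
                snap = substr($0, 1, RLENGTH - 1)
                if (!(snap in seen)) { seen[snap] = 1; print snap }
            }
        }
    '
}

# Example (review the list first; destroying a snapshot is irreversible):
#   zpool status -v vol1 | snapshots_with_errors
#   zfs destroy vol1/progress@auto-20131213-140000
#   zpool scrub vol1
```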

Re: Still Degraded after replacing drive

Posted: 17 Jan 2015 19:45
by Olvikolvi
Same problem here. After the replace and the first resilver, the old disk is UNAVAIL and the new disk is ONLINE, but it still says replacing-1, and it started resilvering again. Resilver, scrub, or clear does not help. Another disk is giving read errors and needs to be replaced too, but I don't know whether the already-replaced disk is part of the raidz or not.

Code:

	NAME                        STATE     READ WRITE CKSUM
	myfiles                     DEGRADED     0     0   141
	  raidz2-0                  DEGRADED     0     0   282
	    gptid/disk-xxxxxxxxa    ONLINE      16     0     0  (resilvering)
	    replacing-1             DEGRADED     0     0     0
	      12345678901234456     UNAVAIL      0     0     0  was /dev/gptid/disk-xxxxxxxxb
	      gptid/disk-xxxxxxxxc  ONLINE       0     0     0  (resilvering)
	    gptid/disk-xxxxxxxxd    ONLINE       0     0     0  (resilvering)
....
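To see which member is the next candidate for replacement (the disk giving read errors), you can filter the config table for rows with a non-zero READ count. This sketch assumes the column order NAME STATE READ WRITE CKSUM shown above. read_error_devices is a hypothetical name.

```shell
#!/bin/sh
# read_error_devices (hypothetical helper): read "zpool status" output on stdin
# and print "NAME READ" for every row of the config table whose READ column is
# not zero. zpool may abbreviate large counts (e.g. 1.2K), so the column is
# matched as text rather than compared as a number.
read_error_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { intable = 1; next }
        intable && NF == 0 { intable = 0 }
        intable && $3 ~ /^[0-9.]+[KMGT]?$/ && $3 != "0" { print $1, $3 }
    '
}

# Example:
#   zpool status myfiles | read_error_devices
```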