
My raidz2 array comes up faulted when disk is replaced

Posted: 18 Dec 2015 18:46
by fletchowns
I'm running NAS4Free 9.3.0.2.

I'm in the process of replacing my 2TB drives for 6TB drives. First two went fine:

Image

Image

However, after replacing the third one and booting it back up, pool status is 'faulted' instead of 'degraded' like before:

Image

I tried a couple of different new drives in this slot and the same thing happened. I've also tried replacing the other remaining drives, and I tried simply removing one of the old drives without putting in a new one. It still comes up faulted in every case. If I swap the old drive back in, it goes back to normal. Any ideas?

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 03:26
by fletchowns
Still having trouble with this. I ran a scrub and it completed with 0 errors. I also tried offlining ada2; it went into a degraded state as expected. Then after removing the drive in ada2 and booting up, the vdev goes into a failed state. I'm stumped!

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 03:40
by daoyama
It seems you need to run zpool upgrade first (judging from the second image).
If you already upgraded, did you offline the drive before replacing it?
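For anyone following along, the upgrade itself is a one-liner. A minimal sketch (assuming the pool name fletch_vdev used elsewhere in this thread):

```shell
# List pools still running an older on-disk version / missing feature flags
zpool upgrade

# Enable all feature flags supported by this ZFS release on the pool
zpool upgrade fletch_vdev
```

Note that feature-flag upgrades are one-way; older ZFS implementations may no longer be able to import the pool afterwards.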

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 05:06
by Parkcomm
It looks like that is the most recent upgrade; it only turns on feature flags, so I can't see it affecting a drive replacement.

This is an interesting fault. The root cause looks like this: when you remove the disk from the ada2 slot, three disks become unrecognised (ada0, ada1, ada2). The most obvious explanation I can think of is that removing the disk causes the controller to renumber the slots.

What happens if you try (the recommended) replacement process? (I'm assuming the replacements that worked were ada0 and ada1 and the one that failed is ada2)

Code: Select all

zpool offline fletch_vdev ada2
# shut down, physically replace the drive, then reboot

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 05:39
by fletchowns
Thank you daoyama. I ran the zpool upgrade:

Code: Select all

  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 8.31M in 0h0m with 0 errors on Sun Dec 20 02:08:07 2015
config:

	NAME          STATE     READ WRITE CKSUM
	fletch_vdev   ONLINE       0     0     0
	  raidz2-0    ONLINE       0     0     0
	    ada0.nop  ONLINE       0     0     0
	    ada1.nop  ONLINE       0     0     0
	    ada2.nop  ONLINE       0     0     0
	    ada3.nop  ONLINE       0     0     0
	    ada4.nop  ONLINE       0     0     0

errors: No known data errors
Now offline ada2 for replacement; looks fine:

Code: Select all

  pool: fletch_vdev
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
	Sufficient replicas exist for the pool to continue functioning in a
	degraded state.
action: Online the device using 'zpool online' or replace the device with
	'zpool replace'.
  scan: resilvered 8.31M in 0h0m with 0 errors on Sun Dec 20 02:08:07 2015
config:

	NAME                    STATE     READ WRITE CKSUM
	fletch_vdev             DEGRADED     0     0     0
	  raidz2-0              DEGRADED     0     0     0
	    ada0.nop            ONLINE       0     0     0
	    ada1.nop            ONLINE       0     0     0
	    878045480102891058  OFFLINE      0     0     0  was /dev/ada2.nop
	    ada3.nop            ONLINE       0     0     0
	    ada4.nop            ONLINE       0     0     0

errors: No known data errors
Power it down, physically replace the drive in ada2, and power it back up. Now I have:

Code: Select all

  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	fletch_vdev              UNAVAIL      0     0     0
	  raidz2-0               UNAVAIL      0     0     0
	    7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
	    5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
	    878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
	    ada3.nop             ONLINE       0     0     0
	    ada4.nop             ONLINE       0     0     0
Here's what I see in disk management:
Image

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 05:45
by fletchowns
Parkcomm wrote:It looks like that is the most recent upgrade; it only turns on feature flags, so I can't see it affecting a drive replacement.

This is an interesting fault. The root cause looks like this: when you remove the disk from the ada2 slot, three disks become unrecognised (ada0, ada1, ada2). The most obvious explanation I can think of is that removing the disk causes the controller to renumber the slots.

What happens if you try (the recommended) replacement process? (I'm assuming the replacements that worked were ada0 and ada1 and the one that failed is ada2)
Correct, the replacements that worked were ada0 and ada1, the one that failed is ada2.

Everything seems to look OK in Disk Management after replacing the drive (see previous reply), though; the disks don't seem to be out of order as far as I can tell. I also tried the offline-then-replace sequence.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 12:27
by Parkcomm
First - don't panic.
Second - this is pretty weird!

So the disk management screen is pretty clear - only one disk has changed. But you still have those three disks unavailable.

you should try:

Code: Select all

zpool online fletch_vdev ada2
and

Code: Select all

zpool online fletch_vdev

But that still doesn't tell us why three disks are unavailable.

Have a look at

Code: Select all

camcontrol devlist
camcontrol identify ada2
And see if the devices are as expected.

and print the output of

Code: Select all

zdb -l /dev/ada0
If that doesn't work, try

Code: Select all

zdb -l /dev/ada0.nop

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 16:57
by fletchowns
Parkcomm wrote:First - don't panic.
Second - this is pretty weird!
Trying not to :) Thank you for the reply!
So the disk management screen is pretty clear - only one disk has changed. But you still have those three disks unavailable.

you should try:

Code: Select all

zpool online fletch_vdev ada2
and

Code: Select all

zpool online fletch_vdev
Tried these but got "cannot open 'fletch_vdev': pool is unavailable" and "missing device name" for the latter.
But it still doesn't doesn't tell us why three disks are unavailable.

Have a look at

Code: Select all

camcontrol devlist
camcontrol identify ada2
And see if the devices are as expected.
That seems to look OK:

Code: Select all

fletchn40l: ~ # camcontrol devlist
<WDC WD60EFRX-68MYMN1 82.00A82>    at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD60EFRX-68MYMN1 82.00A82>    at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD60EFRX-68MYMN1 82.00A82>    at scbus2 target 0 lun 0 (ada2,pass2)
<ST2000DL003-9VT166 CC45>          at scbus3 target 0 lun 0 (ada3,pass3)
<ST2000DL003-9VT166 CC45>          at scbus4 target 1 lun 0 (ada4,pass4)
<SanDisk Cruzer Fit 1.26>          at scbus6 target 0 lun 0 (da0,pass5)
fletchn40l: ~ # camcontrol identify ada2
pass2: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
pass2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)

protocol              ATA/ATAPI-9 SATA 3.x
device model          WDC WD60EFRX-68MYMN1
firmware revision     82.00A82
serial number         WD-redacted
WWN                   50014ee20cac138e
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 4096, offset 0
LBA supported         268435455 sectors
LBA48 supported       11721045168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             5700

Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              no       no
unload                         yes      yes
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)      yes      no      11721045168/11721045168
HPA - Security                 no
and print the output of

Code: Select all

zdb -l /dev/ada0
Nothing jumping out at me here: https://gist.github.com/fletchowns/93617d5016f12d9ff4f8
if that doesn't work try

Code: Select all

zdb -l /dev/ada0.nop
Output is identical to previous command.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 21:16
by Parkcomm
I made a mistake in my previous post - I should have suggested

Code: Select all

zpool replace fletch_vdev ada2 
Although that also won't work if the pool is unavailable.

If you have free slots and free disks you could try http://docs.oracle.com/cd/E19253-01/819 ... index.html

OR

I do have another suggestion that is less safe, and potentially dangerous. If you have a backup it should be cool, especially if your setup supports hot swap. I have done it on an AHCI controller and it worked; however, I believe it is not recommended.

Put the original disk back in; let it resilver and get the pool back to a working state.
Reboot
Offline ada2 (do not reboot)
Remove ada2
Replace ada2
Wait for the disk to spin up (15 seconds)
camcontrol identify ada2 just to make sure the disk is as expected if you are paranoid (like me)
zpool online fletch_vdev ada2
zpool replace fletch_vdev ada2
zpool status
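For the record, here is the whole hot-swap sequence above as commands (a sketch only; it assumes an AHCI controller with hot swap actually enabled, and uses the pool name fletch_vdev from this thread rather than my own pool name):

```shell
zpool offline fletch_vdev ada2   # take the slot's disk out of the pool
# physically swap the drive, wait ~15 seconds for spin-up
camcontrol identify ada2         # confirm the new drive is what you expect
zpool replace fletch_vdev ada2   # start resilvering onto the new disk
zpool status fletch_vdev         # watch resilver progress
```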

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 23:07
by fletchowns
This is on an N40L so no hot swap and no free slots :(

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 20 Dec 2015 23:44
by Parkcomm
How are you using ada4 (the fifth disk)?

btw - I did the hot swap above on an N40L, but with the modified BIOS which enables hot swap as well as six-disk support. Have you modified the BIOS?
http://homeservershow.com/forums/index. ... -features/


Also, you do have one more slot: either the optical drive bay or the eSATA port on the back. Might be a PITA, but you should be able to use either of those temporarily. Obviously the optical drive bay without the modified BIOS will be slow, but it will work.
http://lime-technology.com/forum/index. ... ic=11585.0

If anyone else is reading this thread: the above are workarounds, because I cannot for the life of me work out why replacing one drive causes three drives to become unavailable. So if you have any ideas, please step forward.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 00:00
by Parkcomm
fletchowns - have you been through the dmesg output?

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 00:10
by Parkcomm
Btw fletchowns

There is one more option I haven't suggested. I'm pretty sure it will work, but having three UNAVAIL disks in Raidz2 scares me a little. It's pretty simple.

Export pool fletch_vdev
Replace the disk
Import pool fletch_vdev

The pool will read the labels on the disks and reconstruct the pool; the new drive will be UNAVAIL (probably; it might start resilvering) and the other two should be fine. If the device is UNAVAIL:
zpool online fletch_vdev ada2
zpool replace fletch_vdev ada2

The reason I have not suggested it before is that if the pool does not import, you might not be able to fall back. If it was my data, I would have already tried this (and I'd be prepared to rebuild the pool from scratch if necessary).
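In command form, the export/import route is roughly (a sketch; I have not tested this on an N40L):

```shell
zpool export fletch_vdev         # cleanly detach the pool from the OS
# power down, physically replace the ada2 drive, boot back up
zpool import fletch_vdev         # rebuild the config from the disk labels
zpool status fletch_vdev
# if the new device comes up UNAVAIL:
zpool replace fletch_vdev ada2   # resilver onto the new disk
```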

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 00:16
by fletchowns
Parkcomm wrote:How are you using ada4 (the fifth disk)?

btw - I did the hot swap above on an N40L, but with the modified BIOS which enables hot swap as well as six-disk support. Have you modified the BIOS?
http://homeservershow.com/forums/index. ... -features/


Also, you do have one more slot: either the optical drive bay or the eSATA port on the back. Might be a PITA, but you should be able to use either of those temporarily. Obviously the optical drive bay without the modified BIOS will be slow, but it will work.
http://lime-technology.com/forum/index. ... ic=11585.0

If anyone else is reading this thread: the above are workarounds, because I cannot for the life of me work out why replacing one drive causes three drives to become unavailable. So if you have any ideas, please step forward.

My fifth disk is in the optical drive slot, connected to the SATA port next to the internal USB. I have not modified the BIOS. I forgot about the eSATA port on the back; maybe I will give that a shot.
Parkcomm wrote:fletchowns - have you been through the dmesg output?

Code: Select all

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada0: Serial Number WD-WX71D65JEC3R
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada1: Serial Number WD-WX91D65350PR
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada2: Serial Number WD-WXA1D65421H7
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada2: quirks=0x1<4K>
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC45> ATA-8 SATA 3.x device
ada3: Serial Number 5YD9ZZ8V
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad10
ada4 at ata0 bus 0 scbus4 target 1 lun 0
ada4: <ST2000DL003-9VT166 CC45> ATA-8 SATA 3.x device
ada4: Serial Number 5YDA1EWE
ada4: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: quirks=0x1<4K>
ada4: Previously was known as ad1
SMP: AP CPU #1 Launched!
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <SanDisk Cruzer Fit 1.26> Removable Direct Access SCSI-5 device
da0: Serial Number 4C532000040821115343
da0: 40.000MB/s transfers
da0: 15267MB (31266816 512 byte sectors: 255H 63S/T 1946C)
da0: quirks=0x2<NO_6_BYTE>
The only thing that looks odd to me is the "Previously was known as" lines from this section; everything else looked normal.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 01:36
by Parkcomm
If you are using the ODD port you should definitely mod your BIOS; the ODD port is limited in the stock BIOS to (I think) 1.5 Gbps, which will cap the throughput of the whole vdev. You could get up to twice the throughput with the mod. You can see in the dmesg output it's listed as SATA, not SATA 2.x.

dmesg is as expected.

I don't have anything more to add to my ideas above.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 02:00
by Parkcomm
Dude - I've got it

Look at the output in your gist: the devices with GUIDs 7656447358237872088 (from your zpool status) and 5382396735686418498 do NOT appear.

So here's what I think is happening: when you reboot, ZFS finds a problem (the new and unrecognised drive) and falls back to using GUIDs because it can't trust the adaX labels. And what happens? It finds the labels on the pool devices, then looks for those devices and can't find them.

My guess is that these disks (ada0 and ada1) have been used in ZFS pools before. If so, fall back to the working config with the original disk:

offline /dev/ada0

Code: Select all

zpool labelclear /dev/ada0
online or replace it and then let it resilver

If this works, try /dev/ada1 the same
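Put together, the attempt would look something like this (a sketch; zpool labelclear destroys the ZFS label on the target device, so triple-check the device name first):

```shell
zpool offline fletch_vdev ada0   # drop the suspect disk from the pool
zpool labelclear /dev/ada0       # wipe any stale ZFS label on it
zpool replace fletch_vdev ada0   # resilver onto the now-blank disk
zpool status fletch_vdev         # confirm the resilver is running
```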

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 03:40
by fletchowns
Parkcomm wrote:Dude - I've got it

Look at the output in your gist: the devices with GUIDs 7656447358237872088 (from your zpool status) and 5382396735686418498 do NOT appear.

So here's what I think is happening: when you reboot, ZFS finds a problem (the new and unrecognised drive) and falls back to using GUIDs because it can't trust the adaX labels. And what happens? It finds the labels on the pool devices, then looks for those devices and can't find them.

My guess is that these disks (ada0 and ada1) have been used in ZFS pools before. If so, fall back to the working config with the original disk:

offline /dev/ada0

Code: Select all

zpool labelclear /dev/ada0
online or replace it and then let it resilver

If this works, try /dev/ada1 the same
This sounds really promising! The replacement drives are brand new though, never used in ZFS pools before. I'm gonna try it anyway.

Doh, of course now when I hook up the original drive it doesn't go back to normal like it did before. I swear I've done this like 10 times now and it always went back to normal. I haven't run any destructive commands as far as I know. Not sure if it is safe to proceed with your suggestion now. Maybe I should just put the other two original disks back in and start over; is that possible?

Code: Select all

  pool: fletch_vdev
 state: FAULTED
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	fletch_vdev              FAULTED      0     0     1
	  raidz2-0               DEGRADED     0     0     6
	    7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
	    5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
	    ada2.nop             ONLINE       0     0     0
	    ada3.nop             ONLINE       0     0     0
	    ada4.nop             ONLINE       0     0     0

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 04:28
by Parkcomm
Worth trying. It looks to me like the on-disk config thinks the old disks are still in place, so it should work.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 08:04
by fletchowns
Parkcomm wrote:Worth trying. It looks to me like the on-disk config thinks the old disks are still in place, so it should work.
It wouldn't let me do much of anything with the pool in a faulted state, so I just put the original drives back in. Unfortunately, this is what I see now:

Code: Select all

  pool: fletch_vdev
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from
	a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:

	NAME          STATE     READ WRITE CKSUM
	fletch_vdev   FAULTED      0     0     1
	  raidz2-0    ONLINE       0     0     6
	    ada0.nop  ONLINE       0     0     0
	    ada1.nop  ONLINE       0     0     0
	    ada2.nop  ONLINE       0     0     0
	    ada3.nop  ONLINE       0     0     0
	    ada4.nop  ONLINE       0     0     0
I have backups, so I'm not really worried about losing data. Just disappointed with ZFS, I guess; I figured replacing these drives would be routine, boring, and uneventful.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 10:53
by Parkcomm
It usually is; you were pretty unlucky, I think.

You should try

Code: Select all

zpool clear -F fletch_vdev 
I'm not hopeful, but it's worth a try.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 21 Dec 2015 22:06
by fletchowns
Parkcomm wrote:It usually is; you were pretty unlucky, I think.

You should try

Code: Select all

zpool clear -F fletch_vdev 
I'm not hopeful, but it's worth a try.
Well, this is an interesting turn of events! I booted it up this morning to try the zpool clear and, lo and behold:

Code: Select all

  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 2.03M in 0h0m with 0 errors on Mon Dec 21 20:28:26 2015
config:

	NAME          STATE     READ WRITE CKSUM
	fletch_vdev   ONLINE       0     0     0
	  raidz2-0    ONLINE       0     0     0
	    ada0.nop  ONLINE       0     0     0
	    ada1.nop  ONLINE       0     0     0
	    ada2.nop  ONLINE       0     0     0
	    ada3.nop  ONLINE       0     0     0
	    ada4.nop  ONLINE       0     0     0

errors: No known data errors
This is with the last known good config (first two disks replaced successfully, three originals to go):
Image

Now I'm debating whether to try the BIOS hotswap so I can offline & replace without rebooting, or try the labelclear.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 01:08
by Parkcomm
You might have flaky hardware (a cable or the onboard controller).

I don't think labelclear is the answer. You have a label issue of some kind, but labelclear is specifically about removing a stale label from a previously used ZFS disk.

If the disks have never been used before, I think something went wrong during the last replacement. Maybe you did zpool online instead of zpool replace, or something.

I suggest going through the steps of the process for one disk, then looking at the zdb -l output to see if anything looks suspicious. Then do a zpool export and zpool import. If it imports OK, then the on-disk config is OK.
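As a concrete checklist, that verification might look like this (a sketch):

```shell
# record the labels before touching anything
zdb -l /dev/ada2.nop

# round-trip the on-disk config
zpool export fletch_vdev
zpool import fletch_vdev
zpool status fletch_vdev   # if this imports cleanly, the on-disk config is OK
```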

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 12:57
by fletchowns
Parkcomm wrote:You might have flaky hardware (a cable or the onboard controller).

I don't think labelclear is the answer. You have a label issue of some kind, but labelclear is specifically about removing a stale label from a previously used ZFS disk.

If the disks have never been used before, I think something went wrong during the last replacement. Maybe you did zpool online instead of zpool replace, or something.

I suggest going through the steps of the process for one disk, then looking at the zdb -l output to see if anything looks suspicious. Then do a zpool export and zpool import. If it imports OK, then the on-disk config is OK.
I ran zdb -l and camcontrol identify for ada0 through ada4 before and after attempting to replace ada2 again, and everything lines up perfectly. I can't find any evidence of the devices being reordered. The zdb -l output after replacing ada2 doesn't look so good, though:

Code: Select all

fletchn40l: ~ # zdb -l /dev/ada2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
I really appreciate all this help Parkcomm. Can I paypal you some beer money for your troubles?

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 13:09
by Parkcomm
So have you replaced ada0 and ada1, or just ada2? I'm gonna guess you did zdb -l /dev/ada2 instead of zdb -l /dev/ada2.nop (hopefully). I've seen this happen when I've checked the disk instead of a partition.

Anyway, that does not look good. What does zpool status say?

No beer money, thanks - but thanks for the offer!

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 20:32
by fletchowns
Parkcomm wrote:So have you replaced ada0 and ada1, or just ada2? I'm gonna guess you did zdb -l /dev/ada2 instead of zdb -l /dev/ada2.nop (hopefully). I've seen this happen when I've checked the disk instead of a partition.

Anyway, that does not look good. What does zpool status say?

No beer money, thanks - but thanks for the offer!
Same output for both ada2 and ada2.nop. This is with zpool status in the original broken state (should be the same as my original post):

Code: Select all

  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	fletch_vdev              UNAVAIL      0     0     0
	  raidz2-0               UNAVAIL      0     0     0
	    7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
	    5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
	    878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
	    ada3.nop             ONLINE       0     0     0
	    ada4.nop             ONLINE       0     0     0

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 20:44
by Parkcomm
I helped a guy once before with a very strange failure mode. Not the same as yours, but it's possible you have a similar issue. Have a look at this post:

viewtopic.php?t=9520#p59219

It might be worth a try.

Also, in the above, did you try zpool export / zpool import before replacing ada2?

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 22 Dec 2015 23:39
by fletchowns
Parkcomm wrote:I helped a guy once before with a very strange failure mode. Not the same as yours, but it's possible you have a similar issue. Have a look at this post:

viewtopic.php?t=9520#p59219

It might be worth a try.

Also, in the above, did you try zpool export / zpool import before replacing ada2?
That issue does sound very similar. I only see one copy of that file, though:

Code: Select all

fletchn40l: ~ # find / -name zpool.cache
/cf/boot/zfs/zpool.cache
Yup I tried the zpool export / zpool import before replacing ada2, got the same thing.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 01:19
by Parkcomm
What's the date stamp on that file? Maybe it's stale?

You could test this by checking

Code: Select all

zdb -C
zdb -C poolname
The first reads the cache file; the second reads from the pool itself.


Sent from my foam - stupid auto correct.

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 04:28
by fletchowns
Parkcomm wrote:What's the date stamp on that file? Maybe it's stale?

You could test this by checking

Code: Select all

zdb -C
zdb -C poolname
The first reads the cache file; the second reads from the pool itself.


Sent from my foam - stupid auto correct.
Seems pretty old!

Code: Select all

fletchn40l: ~ # ls -la /cf/boot/zfs/zpool.cache
-rw-r--r--  1 root  wheel  2572 Jan  3  2015 /cf/boot/zfs/zpool.cache

Here are the other commands... should I have run these with the pool in a working state?

Code: Select all

fletchn40l: ~ # zdb -C
cannot open '/boot/zfs/zpool.cache': No such file or directory
fletchn40l: ~ # zdb -C fletch_vdev
zdb: can't open 'fletch_vdev': No such file or directory
fletchn40l: ~ # zpool status
  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              UNAVAIL      0     0     0
          raidz2-0               UNAVAIL      0     0     0
            7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
            5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
            878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 04:37
by Parkcomm
OK - you can work without a cache (it's a cache, after all).

Your cache should update every time you make a change to the pool config; if it's that old, it is stale.

So first, get the pool back to a working state.

Mount /cf as read/write, etc. ... basically, do the same as in the link above.
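On NAS4Free, /cf is normally mounted read-only, so refreshing the stale cache might look like this (a sketch only; the cache path is taken from the find output earlier in the thread, and the exact steps may differ on your install):

```shell
mount -uw /cf                     # remount the config flash read/write
zpool export fletch_vdev
zpool import fletch_vdev          # rebuild the pool config from the disk labels
# point the pool at the cache file so it gets rewritten with the fresh config
zpool set cachefile=/cf/boot/zfs/zpool.cache fletch_vdev
mount -ur /cf                     # remount read-only again
```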