My raidz2 array comes up faulted when disk is replaced
Posted: 18 Dec 2015 18:46
by fletchowns
I'm running NAS4Free 9.3.0.2.
I'm in the process of replacing my 2TB drives with 6TB drives. The first two went fine:
However, after replacing the third one and booting it back up, pool status is 'faulted' instead of 'degraded' like before:
I tried a couple of different new drives in this slot and the same thing happened. I've also tried replacing the other remaining drives, and simply removing one of the old drives without putting in a new one. The pool still comes up faulted in all of these cases. If I swap the old drive back in, it goes back to normal. Any ideas?
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 03:26
by fletchowns
Still having trouble with this. I ran a scrub and it completed with 0 errors. I also tried offlining ada2; it went into a degraded state as expected. Then, after removing the drive in ada2 and booting up, the vdev goes into a failed state. I'm stumped!
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 03:40
by daoyama
It seems you need to run zpool upgrade before doing this (judging from the second image).
If you have already upgraded, did you offline the drive before replacing it?
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 05:06
by Parkcomm
it looks like that is the most recent upgrade - it only turns on feature flags, I can't see that affecting replacing a drive.
This is an interesting fault - the root cause looks like when you remove the disk from the ada2 slot, three disks become unrecognised (ada0, ada1, ada2). Now the most obvious thing I can think of is that the removal of the disk causes the controller to renumber the slots.
What happens if you try (the recommended) replacement process? (I'm assuming the replacements that worked were ada0 and ada1 and the one that failed is ada2)
Shutdown
Replace the drive
Reboot
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 05:39
by fletchowns
Thank you daoyama. I ran the zpool upgrade:
Code: Select all
  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 8.31M in 0h0m with 0 errors on Sun Dec 20 02:08:07 2015
config:
        NAME            STATE     READ WRITE CKSUM
        fletch_vdev     ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0.nop    ONLINE       0     0     0
            ada1.nop    ONLINE       0     0     0
            ada2.nop    ONLINE       0     0     0
            ada3.nop    ONLINE       0     0     0
            ada4.nop    ONLINE       0     0     0
errors: No known data errors
Now offline ada2 for replacement, looks fine:
Code: Select all
  pool: fletch_vdev
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 8.31M in 0h0m with 0 errors on Sun Dec 20 02:08:07 2015
config:
        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            ada0.nop             ONLINE       0     0     0
            ada1.nop             ONLINE       0     0     0
            878045480102891058   OFFLINE      0     0     0  was /dev/ada2.nop
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0
errors: No known data errors
Power it down. Physically replace drive in ada2 and power it back up. Now I have:
Code: Select all
  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              UNAVAIL      0     0     0
          raidz2-0               UNAVAIL      0     0     0
            7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
            5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
            878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0
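For anyone comparing several of these listings, here is a minimal sketch that pulls out only the members ZFS could not open, i.e. the rows that carry a "was /dev/..." note. The helper name list_missing is made up for illustration and is not a ZFS tool; it assumes the whitespace-separated column layout shown in the status output above.

```shell
# list_missing: read `zpool status` text on stdin and print
# "<name-or-guid> <state> <last known path>" for every member row that
# is not ONLINE and carries a "was /dev/..." note.
# Hypothetical helper, assuming the column layout shown in this thread.
list_missing() {
    awk '$2 != "ONLINE" && $3 ~ /^[0-9]+$/ && $6 == "was" { print $1, $2, $7 }'
}

# Sample rows copied from the status output above.
list_missing <<'EOF'
NAME STATE READ WRITE CKSUM
fletch_vdev UNAVAIL 0 0 0
raidz2-0 UNAVAIL 0 0 0
7656447358237872088 UNAVAIL 0 0 0 was /dev/ada0.nop
5382396735686418498 UNAVAIL 0 0 0 was /dev/ada1.nop
878045480102891058 UNAVAIL 0 0 0 was /dev/ada2.nop
ada3.nop ONLINE 0 0 0
ada4.nop ONLINE 0 0 0
EOF
```

On a live system the same function could be fed with `zpool status fletch_vdev | list_missing`. The pool and raidz2 rows are skipped on purpose because they have no "was" column.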
Here's what I see in disk management:

Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 05:45
by fletchowns
Parkcomm wrote:it looks like that is the most recent upgrade - it only turns on feature flags, I can't see that affecting replacing a drive.
This is an interesting fault - the root cause looks like when you remove the disk from the ada2 slot, three disks become unrecognised (ada0, ada1, ada2). Now the most obvious thing I can think of is that the removal of the disk causes the controller to renumber the slots.
What happens if you try (the recommended) replacement process? (I'm assuming the replacements that worked were ada0 and ada1 and the one that failed is ada2)
Correct, the replacements that worked were ada0 and ada1, the one that failed is ada2.
Everything seems to look ok in Disk Management after replacing the drive (see previous reply) though, they don't seem to be out of order as far as I can tell. I also tried the offline and then replace.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 12:27
by Parkcomm
First - don't panic.
Second - this is pretty weird!
So the disk management screen is pretty clear - only one disk has changed. But you still have those three disks unavailable.
you should try:
and
Code: Select all
zpool online fletch_vdev
But it still doesn't tell us why three disks are unavailable.
Have a look at
Code: Select all
camcontrol devlist
camcontrol identify ada2
And see if the devices are as expected.
and print the output of
If that doesn't work, try
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 16:57
by fletchowns
Parkcomm wrote:First - don't panic.
Second - this is pretty weird!
Trying not to

Thank you for the reply!
So the disk management screen is pretty clear - only one disk has changed. But you still have those three disks unavailable.
you should try:
and
Tried these but got "cannot open 'fletch_vdev': pool is unavailable" and "missing device name" for the latter.
But it still doesn't tell us why three disks are unavailable.
Have a look at
Code: Select all
camcontrol devlist
camcontrol identify ada2
And see if the devices are as expected.
That seems to look OK:
Code: Select all
fletchn40l: ~ # camcontrol devlist
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus1 target 0 lun 0 (ada1,pass1)
<WDC WD60EFRX-68MYMN1 82.00A82> at scbus2 target 0 lun 0 (ada2,pass2)
<ST2000DL003-9VT166 CC45> at scbus3 target 0 lun 0 (ada3,pass3)
<ST2000DL003-9VT166 CC45> at scbus4 target 1 lun 0 (ada4,pass4)
<SanDisk Cruzer Fit 1.26> at scbus6 target 0 lun 0 (da0,pass5)
fletchn40l: ~ # camcontrol identify ada2
pass2: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
pass2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
protocol ATA/ATAPI-9 SATA 3.x
device model WDC WD60EFRX-68MYMN1
firmware revision 82.00A82
serial number WD-redacted
WWN 50014ee20cac138e
cylinders 16383
heads 16
sectors/track 63
sector size logical 512, physical 4096, offset 0
LBA supported 268435455 sectors
LBA48 supported 11721045168 sectors
PIO supported PIO4
DMA supported WDMA2 UDMA6
media RPM 5700
Feature                      Support  Enabled   Value           Vendor
read ahead                     yes      yes
write cache                    yes      yes
flush cache                    yes      yes
overlap                        no
Tagged Command Queuing (TCQ)   no       no
Native Command Queuing (NCQ)   yes              32 tags
NCQ Queue Management           no
NCQ Streaming                  no
Receive & Send FPDMA Queued    no
SMART                          yes      yes
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no
automatic acoustic management  no       no
media status notification      no       no
power-up in Standby            yes      no
write-read-verify              no       no
unload                         yes      yes
general purpose logging        yes      yes
free-fall                      no       no
Data Set Management (DSM/TRIM) no
Host Protected Area (HPA)      yes      no      11721045168/11721045168
HPA - Security                 no
Nothing jumping out at me here:
https://gist.github.com/fletchowns/93617d5016f12d9ff4f8
Output is identical to previous command.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 21:16
by Parkcomm
I made a mistake in my previous post - I should have suggested
Although that also won't work if the pool is unavailable.
If you have free slots and free disks you could try
http://docs.oracle.com/cd/E19253-01/819 ... index.html
OR
I do have another suggestion that is less safe, and potentially dangerous. If you have a backup it should be fine, especially if your setup supports hot swap. I have done it on an AHCI controller and it worked; however, I believe it is not recommended.
Put the original disk back in - let it resilver etc and get the disk back to working state.
Reboot
Offline ada2 (do not reboot)
Remove ada2
Replace ada2
Wait for the disk to spin up (15 seconds)
camcontrol identify ada2 just to make sure the disk is as expected if you are paranoid (like me)
zpool online data ada2
zpool replace data ada2
zpool status
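The sequence above can be written down as a dry run before touching anything. This is only a sketch: hotswap_plan is a made-up helper that prints the commands from the steps as posted and does not execute them, and the pool and device names are whatever you pass in (the steps above use "data" and "ada2"). It has not been verified against this pool.

```shell
# hotswap_plan POOL DEV: print (do not run) the hot-swap command sequence
# from the steps above, so it can be reviewed first.
# Hypothetical helper; it mirrors the posted steps and nothing more.
hotswap_plan() {
    cat <<EOF
zpool offline $1 $2
# ...physically swap the disk, then wait about 15 seconds for spin-up...
camcontrol identify $2
zpool online $1 $2
zpool replace $1 $2
zpool status $1
EOF
}

hotswap_plan fletch_vdev ada2
```

Printing the plan first makes it easy to spot a wrong pool or device name before anything is offlined, which matters here because the procedure is described as not recommended.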
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 23:07
by fletchowns
This is on an N40L so no hot swap and no free slots

Re: My raidz2 array comes up faulted when disk is replaced
Posted: 20 Dec 2015 23:44
by Parkcomm
How are you using ada4 (the fifth disk)?
btw - I did the hotswap above on an n40l, but with the modified bios which enables hotswap, as well as six disk support - have you modified the bios?
http://homeservershow.com/forums/index. ... -features/
Also you do have one more slot - either the optical drive or the eSATA port on the back. Might be a PITA but you should be able to use either of those, temporarily. Obviously the OD without the modified BIOS will be slow, but will work.
http://lime-technology.com/forum/index. ... ic=11585.0
If anyone else is reading this thread - the above are workarounds because I cannot for the life of me work out why replacing one drive causes three drives to become unavailable. So if you have any ideas please step forward.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 00:00
by Parkcomm
fletchowns - have you been through the dmesg output?
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 00:10
by Parkcomm
Btw fletchowns
There is one more option I haven't suggested. I'm pretty sure it will work, but having three UNAVAIL disks in Raidz2 scares me a little. It's pretty simple.
Export pool fletch_vdev
Replace the disk
Import pool fletch_vdev
The import will read the labels on the disks and reconstruct the pool. The new drive will probably be UNAVAIL (it might start resilvering instead) and the other two should be fine. If the device is UNAVAIL:
zpool online fletch_vdev ada2
zpool replace fletch_vdev ada2
The reason I have not suggested this is that if the pool does not import, you might not be able to fall back. If it was my data, I would have already tried this (and I'd be prepared to rebuild the pool from scratch if necessary).
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 00:16
by fletchowns
Parkcomm wrote:How are you using ada4 (the fifth disk)?
btw - I did the hotswap above on an n40l, but with the modified bios which enables hotswap, as well as six disk support - have you modified the bios?
http://homeservershow.com/forums/index. ... -features/
Also you do have one more slot - either the optical drive or the eSATA port on the back. Might be a PITA but you should be able to use either of those, temporarily. Obviously the OD without the modified BIOS will be slow, but will work.
http://lime-technology.com/forum/index. ... ic=11585.0
If anyone else is reading this thread - the above are workarounds because I cannot for the life of me work out why replacing one drive causes three drives to become unavailable. So if you have any ideas please step forward.
My fifth disk is in the optical drive slot going to the SATA port next to the internal USB. I have not modified the bios. I forgot about the esata port on the back, maybe I will give that a shot.
Parkcomm wrote:fletchowns - have you been through the dmesg output?
Code: Select all
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada0: Serial Number WD-WX71D65JEC3R
ada0: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada1: Serial Number WD-WX91D65350PR
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD60EFRX-68MYMN1 82.00A82> ATA-9 SATA 3.x device
ada2: Serial Number WD-WXA1D65421H7
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada2: quirks=0x1<4K>
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST2000DL003-9VT166 CC45> ATA-8 SATA 3.x device
ada3: Serial Number 5YD9ZZ8V
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad10
ada4 at ata0 bus 0 scbus4 target 1 lun 0
ada4: <ST2000DL003-9VT166 CC45> ATA-8 SATA 3.x device
ada4: Serial Number 5YDA1EWE
ada4: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: quirks=0x1<4K>
ada4: Previously was known as ad1
SMP: AP CPU #1 Launched!
da0 at umass-sim0 bus 0 scbus6 target 0 lun 0
da0: <SanDisk Cruzer Fit 1.26> Removable Direct Access SCSI-5 device
da0: Serial Number 4C532000040821115343
da0: 40.000MB/s transfers
da0: 15267MB (31266816 512 byte sectors: 255H 63S/T 1946C)
da0: quirks=0x2<NO_6_BYTE>
The only thing that looks odd to me is the "Previously known as" from this section, everything else looked normal
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 01:36
by Parkcomm
If you are using the OD port you should definitely mod your BIOS - the ODD port is limited in the stock BIOS to (I think) 1.5 Gbps, which will be the throughput of the whole vdev. You could get up to twice the throughput with the mod. You can see in the dmesg output that it's listed as SATA, not SATA 2.x.
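As a rough check on those numbers: SATA uses 8b/10b encoding, so every data byte costs 10 bits on the wire, and a 1.5 Gbit/s link tops out at 150 MB/s while a 3 Gbit/s link tops out at 300 MB/s. That matches the 150.000MB/s reported for ada4 and the 300.000MB/s reported for the other disks in the dmesg listing. A small sketch of the arithmetic (the function name is made up for illustration):

```shell
# sata_payload_mbs LINE_RATE_MBIT: link-level ceiling in MB/s for a SATA
# link, given its line rate in Mbit/s. With 8b/10b encoding each data
# byte takes 10 line bits, so MB/s = Mbit/s / 10.
# Hypothetical helper name; this is a ceiling, not a measured throughput.
sata_payload_mbs() {
    echo $(( $1 / 10 ))
}

sata_payload_mbs 1500   # SATA 1.5 Gbit/s -> 150
sata_payload_mbs 3000   # SATA 3 Gbit/s   -> 300
```

This is only the link ceiling for one port; what the pool actually achieves depends on the disks and the workload.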
dmesg is as expected.
I don't have anything more to add to my ideas above.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 02:00
by Parkcomm
Dude - I've got it
Look at the output in your GitHub gist - the devices with GUID 7656447358237872088 (from your zpool status) and 5382396735686418498 do NOT appear.
So here's what I think is happening: when you reboot, ZFS finds a problem (the new and unrecognised drive) and falls back to using GUIDs because it can't trust the adaX labels. And what happens? It finds the label file on the pool devices and then looks for those devices but can't find them.
My guess is that these disks (ada0 and ada1) have been used in zfs pools before. If so fall back to the working config with the original disk:
offline /dev/ada0
online or replace it and then let it resilver
If this works, try /dev/ada1 the same
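The GUID comparison described above can be done mechanically rather than by eye. A minimal sketch, assuming the label dump prints each device GUID on its own "guid: <number>" line, which is how zdb -l output is usually laid out; guid_in_labels is a made-up helper name, and the sample text below is invented for illustration and is not taken from the gist.

```shell
# guid_in_labels GUID: succeed if the label text on stdin contains a line
# of the form "guid: GUID". Lines such as "pool_guid:" or "top_guid:" are
# deliberately not matched. Hypothetical helper name.
guid_in_labels() {
    grep -Eq "^[[:space:]]*guid:[[:space:]]*$1[[:space:]]*\$"
}

# Invented label excerpt, for illustration only.
sample="    pool_guid: 1111
    guid: 2222
    path: '/dev/ada0.nop'"

for g in 2222 7656447358237872088; do
    if printf '%s\n' "$sample" | guid_in_labels "$g"; then
        echo "$g: present in label"
    else
        echo "$g: not found in label"
    fi
done
```

Against a real disk the input would come from something like `zdb -l /dev/ada0.nop | guid_in_labels 7656447358237872088`.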
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 03:40
by fletchowns
Parkcomm wrote:Dude - I've got it
Look at the output in your GitHub gist - the devices with GUID 7656447358237872088 (from your zpool status) and 5382396735686418498 do NOT appear.
So here's what I think is happening: when you reboot, ZFS finds a problem (the new and unrecognised drive) and falls back to using GUIDs because it can't trust the adaX labels. And what happens? It finds the label file on the pool devices and then looks for those devices but can't find them.
My guess is that these disks (ada0 and ada1) have been used in zfs pools before. If so fall back to the working config with the original disk:
offline /dev/ada0
online or replace it and then let it resilver
If this works, try /dev/ada1 the same
This sounds really promising! The replacement drives are brand new though, never been used in zfs pools before. I'm gonna try it anyways.
Doh, of course now when I hook up the original drive it doesn't go back to normal like it did before. I swear I've done this like 10 times now and it always went back to normal. Haven't run any destructive commands as far as I know. Not sure if it is safe to proceed with your suggestion now. Maybe I should just put the other 2 original disks back in and start over, is that possible?
Code: Select all
  pool: fletch_vdev
 state: FAULTED
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              FAULTED      0     0     1
          raidz2-0               DEGRADED     0     0     6
            7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
            5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
            ada2.nop             ONLINE       0     0     0
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 04:28
by Parkcomm
worth trying - it looks to me like the on disk config thinks the old disks are still in place, so it should work.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 08:04
by fletchowns
Parkcomm wrote:worth trying - it looks to me like the on disk config thinks the old disks are still in place, so it should work.
It wouldn't let me do much of anything with the pool in a faulted state, so I just put the original drives back in. Unfortunately, this is what I see now:
Code: Select all
  pool: fletch_vdev
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from
        a backup source.
   see: http://illumos.org/msg/ZFS-8000-72
  scan: none requested
config:
        NAME            STATE     READ WRITE CKSUM
        fletch_vdev     FAULTED      0     0     1
          raidz2-0      ONLINE       0     0     6
            ada0.nop    ONLINE       0     0     0
            ada1.nop    ONLINE       0     0     0
            ada2.nop    ONLINE       0     0     0
            ada3.nop    ONLINE       0     0     0
            ada4.nop    ONLINE       0     0     0
I have backups, so I'm not really worried about losing data. Just disappointed with ZFS I guess, I figured replacing these drives would be routine, boring and uneventful.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 10:53
by Parkcomm
It usually is - you were pretty unlucky, I think.
You should try zpool clear.
I'm not hopeful, but it's worth a try.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 21 Dec 2015 22:06
by fletchowns
Parkcomm wrote:It usually is - you were pretty unlucky, I think.
You should try zpool clear.
I'm not hopeful, but it's worth a try.
Well this is an interesting turn of events! I booted it up this morning to try the zpool clear and lo and behold:
Code: Select all
  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 2.03M in 0h0m with 0 errors on Mon Dec 21 20:28:26 2015
config:
        NAME            STATE     READ WRITE CKSUM
        fletch_vdev     ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            ada0.nop    ONLINE       0     0     0
            ada1.nop    ONLINE       0     0     0
            ada2.nop    ONLINE       0     0     0
            ada3.nop    ONLINE       0     0     0
            ada4.nop    ONLINE       0     0     0
errors: No known data errors
This is with the last known good config (first two disks replaced successfully, three originals to go):
Now I'm debating whether to try the BIOS hotswap so I can offline & replace without rebooting, or try the labelclear.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 01:08
by Parkcomm
You might have flaky hardware (cable or onboard controller).
I don't think labelclear is the answer - you have a label issue of some kind, but labelclear is specifically about removing a label from a previously used ZFS disk.
If the disks have never been used before, I think something went wrong during the last replacement. Maybe you did zpool online instead of zpool replace or something.
I suggest going through the steps as per the process for one disk, then look at the zdb -l output, and see if anything looks suspicious. Then do a zpool export and zpool import. If it imports OK then the on-disk config is OK.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 12:57
by fletchowns
Parkcomm wrote:You might have flaky hardware (cable or onboard controller).
I don't think labelclear is the answer - you have a label issue of some kind, but labelclear is specifically about removing a label from a previously used ZFS disk.
If the disks have never been used before, I think something went wrong during the last replacement. Maybe you did zpool online instead of zpool replace or something.
I suggest going through the steps as per the process for one disk, then look at the zdb -l output, and see if anything looks suspicious. Then do a zpool export and zpool import. If it imports OK then the on-disk config is OK.
I ran zdb -l and camcontrol identify for ada0 through ada4 before and after attempting to replace ada2 again, and everything lines up perfectly. I can't find any evidence of things being reordered. The zdb -l output after replacing ada2 doesn't look so good though:
Code: Select all
fletchn40l: ~ # zdb -l /dev/ada2
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
failed to unpack label 2
--------------------------------------------
LABEL 3
--------------------------------------------
failed to unpack label 3
I really appreciate all this help Parkcomm. Can I paypal you some beer money for your troubles?
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 13:09
by Parkcomm
So have you replaced ada0 and ada1 - or just ada2. I'm gonna guess you did zdb -l /dev/ada2 instead of zdb -l /dev/ada2.nop (hopefully). I've seen this happen when I've checked the disk instead of a partition.
Anyway that does not look good - what does zpool status say?
No beer money thanks - but thanks for the offer
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 20:32
by fletchowns
Parkcomm wrote:So have you replaced ada0 and ada1 - or just ada2. I'm gonna guess you did zdb -l /dev/ada2 instead of zdb -l /dev/ada2.nop (hopefully). I've seen this happen when I've checked the disk instead of a partition.
Anyway that does not look good - what does zpool status say?
No beer money thanks - but thanks for the offer
Same output for both ada2 & ada2.nop. This is with the zpool status in the original broken state (should be same as my original post):
Code: Select all
  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              UNAVAIL      0     0     0
          raidz2-0               UNAVAIL      0     0     0
            7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
            5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
            878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 20:44
by Parkcomm
I helped a guy once before with a very strange failure mode - not the same as yours, but it's possible you have a similar issue. Have a look at this post
viewtopic.php?t=9520#p59219
It might be worth a try.
Also, in the above, did you try zpool export / zpool import before replacing ada2?
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 22 Dec 2015 23:39
by fletchowns
Parkcomm wrote:I helped a guy once before with a very strange failure mode - not the same as yours, but it's possible you have a similar issue. Have a look at this post
viewtopic.php?t=9520#p59219
It might be worth a try.
Also, in the above, did you try zpool export / zpool import before replacing ada2?
That issue does sound very similar. I only see one copy of that file though:
Code: Select all
fletchn40l: ~ # find / -name zpool.cache
/cf/boot/zfs/zpool.cache
Yup I tried the zpool export / zpool import before replacing ada2, got the same thing.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 23 Dec 2015 01:19
by Parkcomm
What's the date mark on that file? Maybe its stale?
You could test this by checking
The first reads the cache file; the second reads from the pool itself.
Sent from my foam - stupid auto correct.
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 23 Dec 2015 04:28
by fletchowns
Parkcomm wrote:What's the date mark on that file? Maybe its stale?
You could test this by checking
The first reads the cache file; the second reads from the pool itself.
Sent from my foam - stupid auto correct.
Seems pretty old!
Code: Select all
fletchn40l: ~ # ls -la /cf/boot/zfs/zpool.cache
-rw-r--r-- 1 root wheel 2572 Jan 3 2015 /cf/boot/zfs/zpool.cache
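The date check can also be scripted. A minimal sketch using only POSIX find, so it does not depend on the differences between BSD and GNU stat; is_older_than is a made-up helper name and the 30-day threshold is an arbitrary assumption, not a ZFS rule.

```shell
# is_older_than FILE DAYS: succeed if FILE was last modified more than
# DAYS days ago. A missing file counts as "not older".
# Hypothetical helper name.
is_older_than() {
    [ -n "$(find "$1" -prune -mtime +"$2" -print 2>/dev/null)" ]
}

cache=/cf/boot/zfs/zpool.cache        # path from the find output above
if is_older_than "$cache" 30; then    # 30 days is an arbitrary threshold
    echo "$cache looks stale"
else
    echo "$cache is recent or missing"
fi
```

A cache file dated January 2015 on a pool whose disks were swapped in December 2015 would be reported as stale by this check.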
Here's the other commands...should I have run these with the pool in a working state?
Code: Select all
fletchn40l: ~ # zdb -C
cannot open '/boot/zfs/zpool.cache': No such file or directory
fletchn40l: ~ # zdb -C fletch_vdev
zdb: can't open 'fletch_vdev': No such file or directory
fletchn40l: ~ # zpool status
  pool: fletch_vdev
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:
        NAME                     STATE     READ WRITE CKSUM
        fletch_vdev              UNAVAIL      0     0     0
          raidz2-0               UNAVAIL      0     0     0
            7656447358237872088  UNAVAIL      0     0     0  was /dev/ada0.nop
            5382396735686418498  UNAVAIL      0     0     0  was /dev/ada1.nop
            878045480102891058   UNAVAIL      0     0     0  was /dev/ada2.nop
            ada3.nop             ONLINE       0     0     0
            ada4.nop             ONLINE       0     0     0
Re: My raidz2 array comes up faulted when disk is replaced
Posted: 23 Dec 2015 04:37
by Parkcomm
OK - you can work without a cache (it's a cache after all).
Your cache should update every time you make a change to the pool config - if it's that old, it is stale.
So first get the pool back to a working state.
Mount CF as read/write etc... actually basically do the same as the link above.