
ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 07:00
by fusion327
I recently had to pull the RAID card from my system to test it in another machine. The RAID card provides the SATA ports for two of my drives. I decided to keep the NAS up for read-only use in the meantime.

However, once I did that, only three of my six drives were showing up in my zpool, which faulted it. My server's disk page does show four drives imported.

After reinstalling the RAID card, all six drives show as ONLINE again. I want to fix this issue in case two of my drives die in the future.


  pool: vpool_1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                     STATE     READ WRITE CKSUM
	vpool_1                  UNAVAIL      0     0     0
	  raidz2-0               UNAVAIL      0     0     0
	    269575308383878843   UNAVAIL      0     0     0  was /dev/ada0.nop
	    6668283126672377540  UNAVAIL      0     0     0  was /dev/ada1.nop
	    918738776210664435   UNAVAIL      0     0     0  was /dev/ada0.nop
	    ada1.nop             ONLINE       0     0     0
	    ada2.nop             ONLINE       0     0     0
	    ada3.nop             ONLINE       0     0     0
I am attaching some additional information in the hope it is relevant.

My zpools were originally set up like this:
Drive0 ada0 Onboard SATA Port0
Drive1 ada1 Onboard SATA Port1
Drive2 ada2 Onboard SATA Port2
Drive3 ada3 Onboard SATA Port3
Drive4 ada4 RAID Card Port0
Drive5 ada5 RAID Card Port1

However, after one of the NAS4Free updates, the system was detecting it as such:
Drive0 ada2 Onboard SATA Port0
Drive1 ada3 Onboard SATA Port1
Drive2 ada4 Onboard SATA Port2
Drive3 ada5 Onboard SATA Port3
Drive4 ada0 RAID Card Port0
Drive5 ada1 RAID Card Port1

I then had to resilver Drive0, which was replaced by a new disk (Drive6):
Drive6 ada2 Onboard SATA Port0 (Resilvered)
Drive1 ada3 Onboard SATA Port1
Drive2 ada4 Onboard SATA Port2
Drive3 ada5 Onboard SATA Port3
Drive4 ada0 RAID Card Port0
Drive5 ada1 RAID Card Port1

This is the state I am in now. It seems the ZFS metadata is remembering the old ada device paths. Since the ada numbering has changed and a resilver has been done, the paths no longer line up, which causes problems when drives fail.

Is there a way to repair this without backing up the data and rebuilding the ZFS pool from scratch?

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 08:17
by Parkcomm

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 10:16
by raulfg3
Sync the webGUI and the current pool state, in ZFS | Config | Synchronize

http://wiki.nas4free.org/doku.php?id=do ... ynchronize

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 13:21
by Parkcomm
Raulf - fusion needs to sort out the problem ZFS has with the changed device numbers before he does any synchronising.

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 14:18
by b0ssman
the device numbers are irrelevant to zfs.

each device has metadata on it that identifies which array it belongs to, regardless of which ada device it is assigned.

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 14:28
by Parkcomm
As you can see from the OP, if the device numbers change, it does confuse an operating pool. I think this is because the metadata actually records the device path. Have a look at:


zdb -l /dev/ada#
I think you can solve this by exporting the pool and then importing it again, but if the device numbers just change under a live pool, as happened above, you do get problems in FreeBSD.
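The zdb check above can be previewed for all six disks at once. This is only a sketch, assuming the member disks are ada0-ada5; ZDB defaults to "echo zdb" so it merely prints the commands rather than running them.

```shell
# Preview of inspecting every member disk's on-disk ZFS label.
# Assumption: the six member disks are ada0-ada5. ZDB defaults to
# "echo zdb" so this only prints the commands; set ZDB=zdb on the
# NAS (as root) to actually read the labels.
ZDB="${ZDB:-echo zdb}"

check_labels() {
    for d in ada0 ada1 ada2 ada3 ada4 ada5; do
        $ZDB -l "/dev/$d"   # the label records pool name, guid and path
    done
}
check_labels
```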

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 14:31
by b0ssman
ah yes i created mine with labels.


--------------------------------------------
LABEL 2
--------------------------------------------
    version: 28
    name: 'store'
    state: 0
    txg: 5891
    pool_guid: 17045103106627381472
    hostname: ''
    top_guid: 246831764463355988
    guid: 10239615735373830627
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 246831764463355988
        nparity: 1
        metaslab_array: 30
        metaslab_shift: 35
        ashift: 12
        asize: 6001182375936
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 10239615735373830627
            path: '/dev/label/wd1'
            phys_path: '/dev/label/wd1'
            whole_disk: 1
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 6705638897804641365
            path: '/dev/label/wd2'
            phys_path: '/dev/label/wd2'
            whole_disk: 1
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 1191527253680171165
            path: '/dev/label/wd3'
            phys_path: '/dev/label/wd3'
            whole_disk: 1
            create_txg: 4

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 08 Sep 2015 14:51
by Parkcomm
You are a wise man b0ssman - you will never see this particular problem.

fusion, labelling a live system is a pain in the arse. Adding the label creates a slightly smaller partition, and if it's even one byte smaller you can't reattach it to the pool - ZFS only allows you to go bigger!

The alternative is to hot-swap disks when they fail, or at least to replace faulted disks before you reboot - as long as the good disks stay on working controllers, the device numbers will match.

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 09 Sep 2015 04:44
by fusion327
Parkcomm. Thank you very much for confirming my suspicion. Your help is very much appreciated.

I have spent some time looking into my options and was hoping you could give some input. I do want to fix the issue properly, as I wouldn't bet my money on the SATA controller (it is not a good one and is prone to failure).

I currently do not have a spare SATA port on my system but I do have a storage device that can provide backup for all my data.

1. execute zpool scrub vpool_1
2. back up the data
3. turn off the system
4. remove one hard drive and wipe it clean on another computer
5. insert it back into the system and power on
6. label it as a ZFS drive with the label lab/d00 (same character count as ada0.nop, to avoid the smaller-partition issue)
7. replace the 'missing' ada device with lab/d00 and wait for the resilver
8. repeat steps 3-7 for the remaining five drives

This seems like the best way to change the labels without destroying my zpool and without using an additional SATA port.

But it also seems too easy to be true. I would greatly appreciate your input!

Thanks a ton!

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 09 Sep 2015 06:57
by Parkcomm
Give it a try - this link says it will work.
https://forums.freebsd.org/threads/howt ... tem.28181/

You are pretty much describing option 1

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 09 Sep 2015 08:34
by b0ssman
however in the long run it is best to have one array on one controller.

consider getting an lsi 9211-8i (ibm m1015)

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 13 Sep 2015 07:26
by fusion327
Just want to provide a quick update.

So I took a stab at getting it done with the method above. It turns out ZFS recognizes a drive that used to belong to the pool and won't let you use it to replace another drive.

Instead of having to dd the entire drive with zeros, the zpool labelclear command clears all the ZFS metadata, after which ZFS lets you use the old drive as a replacement.

So the steps would be the following:
1. Backup all data
2. zpool export -f vpool_1
3. zpool labelclear -f /dev/ada0
4. zpool import vpool_1
5. glabel label d0 ada0
6. zpool replace vpool_1 ada0.nop label/d0
7. wait for resilver
8. repeat from step 2 for remaining hard drives
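Collected into one preview script, the cycle above might look like this. The names come from this thread (pool vpool_1, disks ada0-ada5, labels d0-d5), and RUN defaults to echo so the commands are only printed, never executed.

```shell
# Preview of the labelclear-and-replace cycle from the steps above.
# Names are from this thread: pool vpool_1, disks ada0-ada5, labels
# d0-d5. RUN defaults to "echo" so commands are only printed; set
# RUN= to execute for real, and only after backing everything up.
RUN="${RUN:-echo}"

relabel_disk() {
    i=$1
    $RUN zpool export -f vpool_1                        # step 2
    $RUN zpool labelclear -f "/dev/ada$i"               # step 3: wipe old metadata
    $RUN zpool import vpool_1                           # step 4: comes back degraded
    $RUN glabel label "d$i" "ada$i"                     # step 5: stable label
    $RUN zpool replace vpool_1 "ada$i.nop" "label/d$i"  # step 6: resilver onto it
    # step 7: watch 'zpool status' until the resilver finishes
}

for i in 0 1 2 3 4 5; do
    relabel_disk "$i"   # step 8: one full pass per disk
done
```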

As for the suggestion to replace the SATA controller card, why is that recommended? I know it is a bad card, but I thought the whole point of ZFS is that it doesn't care about controller cards.

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 13 Sep 2015 07:35
by b0ssman

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 13 Sep 2015 19:55
by fusion327
It seems I have made things worse....

Whenever I restart or shut down the system, all my drives are UNAVAIL until I do a quick export/import.


  pool: vpool_1
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                      STATE     READ WRITE CKSUM
	vpool_1                   UNAVAIL      0     0     0
	  raidz2-0                UNAVAIL      0     0     0
	    269575308383878843    UNAVAIL      0     0     0  was /dev/ada0.nop
	    6668283126672377540   UNAVAIL      0     0     0  was /dev/ada1.nop
	    918738776210664435    UNAVAIL      0     0     0  was /dev/ada0.nop
	    7595778247549049004   UNAVAIL      0     0     0  was /dev/ada1.nop
	    16678915790832657357  UNAVAIL      0     0     0  was /dev/ada2.nop
	    17195589601746205862  UNAVAIL      0     0     0  was /dev/ada3.nop
	    
Running the zdb command shows that all my drives are using labels.


--------------------------------------------
LABEL 0
--------------------------------------------
    version: 28
    name: 'vpool_1'
    state: 0
    txg: 10622922
    pool_guid: 11699457574323802911
    hostid: 1906542976
    hostname: 'nas.ka'
    top_guid: 186008531028276548
    guid: 15459262194439941612
    vdev_children: 1
    vdev_tree:
        type: 'raidz'
        id: 0
        guid: 186008531028276548
        nparity: 2
        metaslab_array: 30
        metaslab_shift: 37
        ashift: 12
        asize: 18003528253440
        is_log: 0
        create_txg: 4
        children[0]:
            type: 'disk'
            id: 0
            guid: 15459262194439941612
            path: '/dev/label/d0'
            phys_path: '/dev/label/d0'
            whole_disk: 1
            DTL: 110595
            create_txg: 4
        children[1]:
            type: 'disk'
            id: 1
            guid: 13647518853205248072
            path: '/dev/label/d1'
            phys_path: '/dev/label/d1'
            whole_disk: 1
            DTL: 110598
            create_txg: 4
        children[2]:
            type: 'disk'
            id: 2
            guid: 8858274501554315128
            path: '/dev/label/d2'
            phys_path: '/dev/label/d2'
            whole_disk: 1
            DTL: 110594
            create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 10634212950175155346
            path: '/dev/label/d3'
            phys_path: '/dev/label/d3'
            whole_disk: 1
            DTL: 110597
            create_txg: 4
        children[4]:
            type: 'disk'
            id: 4
            guid: 9236865265504671092
            path: '/dev/label/d4'
            phys_path: '/dev/label/d4'
            whole_disk: 1
            DTL: 110596
            create_txg: 4
        children[5]:
            type: 'disk'
            id: 5
            guid: 4785535645194044240
            path: '/dev/label/d5'
            phys_path: '/dev/label/d5'
            whole_disk: 1
            DTL: 111005
            create_txg: 4
    features_for_read:
--------------------------------------------
[LABEL 1 through LABEL 3 are identical to LABEL 0]
--------------------------------------------
Any idea why it still shows those silly ada#?

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 13 Sep 2015 23:13
by Parkcomm
I'm not sure what's gone wrong, so here are some ideas:
Do you have a zpool.cache file? (the default is to NOT have a cache)

if you do "zdb -C" it will show what's in the cache rather than what's on disk

You could try:
zpool import -d /dev poolname

It updates the metadata from the /dev directory, but I don't know whether it will use the ada? devices or the d? labels.

One thing I have never tried, so you might need to read up on it, but I believe this will force the use of the labels:
zpool import -d /dev/label/ poolname

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 14 Sep 2015 06:52
by fusion327
Hmmm.

zdb -C does show that the cache is using the /dev/label/d# paths.

and I did use zpool import -d /dev/label/ poolname (after I had done all the resilvering).

Could it be because I didn't use zpool import -d /dev/label/ poolname when I resilvered? From what I know, it shouldn't make a difference =\

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 14 Sep 2015 08:15
by Parkcomm
Would it be because when I resilvered it, I didnt use zpool import -d /dev/label/ poolname? But from what I know, it shouldnt make a difference =\
No it should not - this is an interesting one

When you did zdb -C (without a pool name) I would have expected you to get an error, unless you created a cache file yourself.

I actually did create a cache file and I just did a search and found:
/boot/zfs/zpool.cache
/cf/boot/zfs/zpool.cache

If they become out of synch you could get one being used before boot and one after, maybe?

If that is the case you need to mount /cf as read/write (and don't forget to change it back to read only)
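If the two copies have drifted apart, bringing them back in sync could be sketched like this. It is an assumption-laden preview: /boot/zfs/zpool.cache is taken to be the current copy and /cf the embedded config partition, and RUN defaults to echo so nothing is actually executed.

```shell
# Preview of re-syncing the two zpool.cache copies found above.
# Assumption: /boot/zfs/zpool.cache is the current one and /cf holds
# the copy used at boot. RUN defaults to "echo" so nothing runs.
RUN="${RUN:-echo}"

sync_cache() {
    $RUN mount -uw /cf                                       # remount read/write
    $RUN cp /boot/zfs/zpool.cache /cf/boot/zfs/zpool.cache   # copy current cache
    $RUN mount -ur /cf                                       # back to read-only
}
sync_cache
```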

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 15 Sep 2015 00:31
by fusion327
Yes! It works now.

The two cache files were out of sync. The one in the /cf/ folder was from February.

Parkcomm, did anyone tell you you are amazing? =D Thank you so much!

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 15 Sep 2015 01:09
by Parkcomm
My pleasure.

Re: ZFS Faulted with 4/6 Drives in RAIDZ2

Posted: 16 Sep 2015 00:12
by Parkcomm
I just did this - it's so easy with a healthy pool.

Boot from live USB/CD
Pool is exported (do not import it)

glabel label poolname0 ada0
glabel label poolname1 ada1
...
zpool import -f -d /dev/label poolname (to write the metadata)

zpool export poolname

Reboot into the embedded environment.

I did not need to import the pool after rebooting, but it might be necessary.
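The procedure can be previewed as one script; the names here are hypothetical (a pool called tank on ada0-ada3, already exported), and RUN defaults to echo so the commands are only printed.

```shell
# Preview of the live-USB relabel described above. Hypothetical names:
# pool "tank" on disks ada0-ada3; the pool must already be exported.
# RUN defaults to "echo" so the commands are only printed.
RUN="${RUN:-echo}"

offline_relabel() {
    for i in 0 1 2 3; do
        $RUN glabel label "tank$i" "ada$i"     # label each exported disk
    done
    $RUN zpool import -f -d /dev/label tank    # rewrites vdev paths to the labels
    $RUN zpool export tank                     # leave it clean for the real boot
}
offline_relabel
```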

zpool status:


  pool: MightyMouse
 state: ONLINE
  scan: scrub repaired 0 in 7h42m with 0 errors on Tue Sep 15 07:42:48 2015
config:

	NAME           STATE     READ WRITE CKSUM
	MightyMouse    ONLINE       0     0     0
	  mirror-0     ONLINE       0     0     0
	    label/MM0  ONLINE       0     0     0
	    label/MM1  ONLINE       0     0     0
	  mirror-1     ONLINE       0     0     0
	    label/MM3  ONLINE       0     0     0
	    label/MM2  ONLINE       0     0     0
Bingo - no rebuilding or resilvering required.

Go to Disks|ZFS|Configuration|Synchronize and hit Synchronise.

All done.