
Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 04:43
by fletchowns
Parkcomm wrote: OK - you can work without a cache (it's a cache, after all).

Your cache should update every time you make a change to the pool config - if it's that old, it is stale.

So first get the pool back to a working state.

Mount CF as read/write etc. ... basically, do the same as in the link above.
How do I mount CF as read/write?

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 05:30
by Parkcomm

Code:

umount -f /cf
mount -w /dev/da0s1a /cf
https://www.freebsd.org/doc/handbook/mount-unmount.html
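Putting those two commands in context (a sketch, assuming /dev/da0s1a is the CF device as in the post above - check your /etc/fstab for the actual device name):

```shell
# Remount the CF card read/write (device name is an assumption here;
# your fstab will show the real one):
umount -f /cf
mount -w /dev/da0s1a /cf

# ...make your changes under /cf...

# Then flip it back to read-only in place (-u updates an existing mount):
mount -ur /cf
```

Keeping the CF card read-only the rest of the time limits wear on the flash media, which is why the fstab ships with `ro`.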

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 07:40
by fletchowns
Awesome, thanks! Here's what I did:

First, quick double check of the starting point:

Code:

fletchn40l: ~ # ls -la /cf/boot/zfs/zpool.cache
-rw-r--r--  1 root  wheel  2572 Jan  3  2015 /cf/boot/zfs/zpool.cache
fletchn40l: ~ # zpool status
  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 2.03M in 0h0m with 0 errors on Mon Dec 21 20:28:26 2015
config:

        NAME          STATE     READ WRITE CKSUM
        fletch_vdev   ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            ada0.nop  ONLINE       0     0     0
            ada1.nop  ONLINE       0     0     0
            ada2.nop  ONLINE       0     0     0
            ada3.nop  ONLINE       0     0     0
            ada4.nop  ONLINE       0     0     0

errors: No known data errors
Now mount /cf as read/write:

Code:

fletchn40l: ~ # cat /etc/fstab
/dev/da0a /cf ufs ro 1 1
proc /proc procfs rw 0 0
fletchn40l: ~ # mount -v
/dev/xmd0 on / (ufs, local, noatime, acls, writes: sync 341 async 24, reads: sync 679 async 12, fsid 1c0695543ee2f0e6)
devfs on /dev (devfs, local, multilabel, fsid 00ff007171000000)
/dev/xmd1 on /usr/local (ufs, local, noatime, soft-updates, acls, writes: sync 2 async 21, reads: sync 1121 async 0, fsid 1c0695548d3627ad)
procfs on /proc (procfs, local, fsid 01ff000202000000)
fletch_vdev on /mnt/fletch_vdev (zfs, NFS exported, local, nfsv4acls, fsid d055f767de7bbb48)
/dev/xmd2 on /var (ufs, local, noatime, soft-updates, acls, writes: sync 109 async 542, reads: sync 9 async 0, fsid bd197a562dad175a)
tmpfs on /var/tmp (tmpfs, local, fsid 02ff008787000000)
/dev/da0a on /cf (ufs, local, soft-updates, writes: sync 2 async 0, reads: sync 1 async 0, fsid 500795543a72c3ce)
fletchn40l: ~ # umount -f /cf
fletchn40l: ~ # mount -w /dev/da0a /cf
Still get errors on these:

Code:

fletchn40l: ~ # zdb -C
cannot open '/boot/zfs/zpool.cache': No such file or directory
fletchn40l: ~ # zdb -C fletch_vdev
zdb: can't open 'fletch_vdev': No such file or directory
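(Side note for anyone else hitting this: zdb can be pointed at an alternate cachefile with -U, so the stale copy on the CF card could presumably have been inspected directly rather than relying on the default /boot/zfs/zpool.cache path:)

```shell
# Dump the pool config from a specific cachefile instead of the default:
zdb -U /cf/boot/zfs/zpool.cache -C
```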
Now try the export & import:

Code:

fletchn40l: ~ # zpool export fletch_vdev
fletchn40l: ~ # zpool import -d /dev fletch_vdev
fletchn40l: ~ # zpool status
  pool: fletch_vdev
 state: ONLINE
  scan: resilvered 2.03M in 0h0m with 0 errors on Mon Dec 21 20:28:26 2015
config:

        NAME          STATE     READ WRITE CKSUM
        fletch_vdev   ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            ada0.nop  ONLINE       0     0     0
            ada1.nop  ONLINE       0     0     0
            ada2.nop  ONLINE       0     0     0
            ada3.nop  ONLINE       0     0     0
            ada4.nop  ONLINE       0     0     0

errors: No known data errors
fletchn40l: ~ # zdb -C
fletch_vdev:
    version: 5000
    name: 'fletch_vdev'
    state: 0
    txg: 20387127
    pool_guid: 4714842937177408258
    hostid: 2142099219
    hostname: 'fletchn40l.local'
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 4714842937177408258
        children[0]:
            type: 'raidz'
            id: 0
            guid: 4622716132764498571
            nparity: 2
            metaslab_array: 30
            metaslab_shift: 36
            ashift: 12
            asize: 10001970626560
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 10751594757434406316
                path: '/dev/ada0.nop'
                phys_path: '/dev/ada0.nop'
                whole_disk: 1
                DTL: 189
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 2233689155040162778
                path: '/dev/ada1.nop'
                phys_path: '/dev/ada1.nop'
                whole_disk: 1
                DTL: 188
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 878045480102891058
                path: '/dev/ada2.nop'
                phys_path: '/dev/ada2.nop'
                whole_disk: 1
                DTL: 185
                create_txg: 4
            children[3]:
                type: 'disk'
                id: 3
                guid: 869598473616518127
                path: '/dev/ada3.nop'
                phys_path: '/dev/ada3.nop'
                whole_disk: 1
                DTL: 187
                create_txg: 4
            children[4]:
                type: 'disk'
                id: 4
                guid: 6785725410307062142
                path: '/dev/ada4.nop'
                phys_path: '/dev/ada4.nop'
                whole_disk: 1
                DTL: 179
                create_txg: 4
    features_for_read:
        com.delphix:hole_birth
That all seems to match what zdb -l was giving me before for /dev/ada0 through ada4.

What do I have for zpool.cache now?

Code:

fletchn40l: ~ # find / -name "zpool.cache"
/boot/zfs/zpool.cache
/cf/boot/zfs/zpool.cache
fletchn40l: ~ # ls -la /cf/boot/zfs/zpool.cache
-rw-r--r--  1 root  wheel  2572 Jan  3  2015 /cf/boot/zfs/zpool.cache
fletchn40l: ~ # ls -la /boot/zfs/zpool.cache
-rw-r--r--  1 root  wheel  2668 Dec 23 06:22 /boot/zfs/zpool.cache
So now there's a new one, and they're different. Sounds like I'm at the same point as in that other thread, but I'm not clear on what to do next. Do I overwrite /cf/boot/zfs/zpool.cache with /boot/zfs/zpool.cache and then put /cf back to read-only? Or is /cf/boot/zfs/zpool.cache already supposed to be updated now?
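(For what it's worth, a quick way to confirm two cache files really differ - beyond size and date - is cmp(1). Illustrated here with throwaway stand-in files; the real paths would be /boot/zfs/zpool.cache and /cf/boot/zfs/zpool.cache:)

```shell
# Stand-ins for the two cache files (hypothetical contents):
printf 'stale cache' > /tmp/old.cache
printf 'fresh cache bytes' > /tmp/new.cache

# cmp -s is silent and only reports via its exit status:
if cmp -s /tmp/old.cache /tmp/new.cache; then
    echo identical
else
    echo different
fi
```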

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 20:31
by Parkcomm
you can just delete /cf/boot/zfs/zpool.cache and reboot
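i.e. (with /cf still mounted read/write from the earlier steps; presumably the boot scripts lay down a fresh copy of the cache on the way back up):

```shell
# Remove the stale cache from the CF card, then reboot:
rm /cf/boot/zfs/zpool.cache
reboot
```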

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 21:14
by fletchowns
Parkcomm wrote:you can just delete /cf/boot/zfs/zpool.cache and reboot
Woohoo!!! That did it! You're amazing Parkcomm!!! Thank you so much for all of your help. Do you have a favorite charity I can make a donation to?

Resilvering the new ada2 right now :)

Re: My raidz2 array comes up faulted when disk is replaced

Posted: 23 Dec 2015 21:26
by Parkcomm
Now that was the least obvious ZFS problem yet!

Glad to be of help - I've been in the same position myself.

Why not donate to the project and maybe give it a review at https://sourceforge.net/projects/nas4fr ... rce=navbar, specifically mentioning how good the community support is.