faq:0149

Q: How do I remove / replace a failed disk in a ZFS array?
A:
This procedure applies to all types of vdevs (Stripe, Mirror, RAIDZ). Make sure you identify the correct disk before you begin, so that you do not remove the wrong one by mistake. Here is the basic procedure:

  1. Log into the WebGUI. Look up the name of the device to be replaced in Disks > ZFS > Pools > Information.
  2. If possible, take the bad disk offline in the WebGUI under Disks > ZFS > Pools > Tools > Step 1.
  3. Shut down the server, remove the bad disk, and install a good, new disk that has been wiped clean of any old partitions or data. If the replacement disk is not blank, you will not be able to complete this procedure. If you do not know how to prepare a drive for use, please read Q: How can I easily, quickly and completely wipe / prepare a disk for use?
  4. Boot the server and verify again the name of the device to be replaced in Disks > ZFS > Pools > Information.
      pool: pool0
     state: DEGRADED
    status: One or more devices could not be opened.  Sufficient replicas exist for
            the pool to continue functioning in a degraded state.
    action: Attach the missing device and online it using 'zpool online'.
       see: http://illumos.org/msg/ZFS-8000-2Q
      scan: none requested
    config:

            NAME                     STATE     READ WRITE CKSUM
            pool0                    DEGRADED     0     0     0
              raidz1-0               DEGRADED     0     0     0
                ada1                 ONLINE       0     0     0
                5794960136178487081  UNAVAIL      0     0     0  was /dev/ada2
                ada2                 ONLINE       0     0     0
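  5. In the WebGUI under Tools > Execute Command, or from a shell (CLI/SSH) as "root", issue the command 'zpool replace <poolname> <device>'. Substitute the pool name and device name obtained in step 4, e.g. 'zpool replace pool0 ada2'. If the new disk appears under a different device name than the old one, give both: 'zpool replace <poolname> <old device or numeric ID> <new device>'.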

The ZFS vdev should now begin resilvering. Verify its state in the WebGUI under Disks > ZFS > Pools > Information.
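
If you work from the shell, the device to replace can also be picked out of the status listing with a small filter. The following is only a minimal sketch, not part of the WebGUI: the function name failed_devices is hypothetical, and it assumes the 'zpool status' layout shown in step 4, where NAME and STATE are the first two columns of the config table.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# NAME of every device whose STATE is UNAVAIL, OFFLINE, FAULTED or REMOVED.
failed_devices() {
    awk '
        # The device table starts at the NAME/STATE header line.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The table ends at the "errors:" line.
        /^errors:/ { in_config = 0 }
        # Skip grouping rows such as raidz1-0 or replacing-6; they are not disks.
        in_config && $1 !~ /^(replacing|spare|mirror|raidz[1-3]?)-[0-9]+$/ &&
            ($2 == "UNAVAIL" || $2 == "OFFLINE" || $2 == "FAULTED" || $2 == "REMOVED") {
            print $1
        }
    '
}
```

For example, 'zpool status pool0 | failed_devices' would print the name or numeric ID to pass as the old device to 'zpool replace'. Always compare the result against the full status output before replacing anything.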

References:

For those of you comfortable with shell/CLI/SSH here is an example from Daoyama:

  1. Take the BAD disk offline with 'zpool offline'. Note that if you do not offline the disk, it may prevent normal access to the degraded pool.
  2. Power off the server if your motherboard does not support hot-swap. Even if it does, powering off avoids the risk of accidentally knocking a second disk loose while working on a live server.
  3. Install the replacement disk in the same location as the BAD disk.
  4. Run 'zpool replace' with the name of the BAD disk.
# zpool offline tank ada1p1
# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 8h45m with 0 errors on Wed Aug 29 14:46:37 2012
config:

        NAME                     STATE     READ WRITE CKSUM
        tank                     DEGRADED     0     0     0
          raidz2-0               DEGRADED     0     0     0
            ada0p1               ONLINE       0     0     0
            ada7p1               ONLINE       0     0     0
            ada6p1               ONLINE       0     0     0
            ada8p1               ONLINE       0     0     0
            ada4p1               ONLINE       0     0     0
            ada2p1               ONLINE       0     0     0
            8771208834592470066  OFFLINE      0     0     0  was /dev/ada0p1

errors: No known data errors
-----------(replace the disk)
# zpool replace tank 8771208834592470066 ada1p1
# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec  5 10:09:55 2012
        40.4G scanned out of 4.88T at 79.5M/s, 17h43m to go
        5.76G resilvered, 0.81% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       DEGRADED     0     0     0
          raidz2-0                 DEGRADED     0     0     0
            ada0p1                 ONLINE       0     0     0
            ada7p1                 ONLINE       0     0     0
            ada6p1                 ONLINE       0     0     0
            ada8p1                 ONLINE       0     0     0
            ada4p1                 ONLINE       0     0     0
            ada2p1                 ONLINE       0     0     0
            replacing-6            OFFLINE      0     0     0
              8771208834592470066  OFFLINE      0     0     0  was /dev/ada0p1
              ada1p1               ONLINE       0     0     0  (resilvering)

errors: No known data errors

Please note: Resilvering can take a very long time to complete. In the example above, ZFS estimated 17h43m to go for 4.88T of data at 79.5M/s.
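
Because a resilver can run for many hours, it can help to poll its progress from the shell instead of reloading the status page. The sketch below is an illustration only: the function names resilver_percent and wait_for_resilver are hypothetical, and the patterns assume the status wording shown in the example above ("resilver in progress" and "N resilvered, X% done"), which may differ on other ZFS versions.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# resilver percentage; prints nothing if no resilver line is present.
resilver_percent() {
    sed -n 's/.*resilvered, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical helper: report progress for the given pool every 10 minutes
# until the status output no longer says a resilver is in progress.
wait_for_resilver() {
    pool=$1
    while zpool status "$pool" | grep -q 'resilver in progress'; do
        echo "resilver: $(zpool status "$pool" | resilver_percent)% done"
        sleep 600
    done
    echo "no resilver in progress on $pool"
}
```

Running 'wait_for_resilver tank' as root would then print a progress line every 10 minutes. Once it returns, check 'zpool status' to confirm that the pool is back to ONLINE.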

Basic ZFS ( only! ) ⇒9.0.0.1
faq/0149.txt · Last modified: 2018/08/10 21:25 by zoon01