Question about ad*.nop devices, RAID-Z2 drive replacements

PaganGod

Question about ad*.nop devices, RAID-Z2 drive replacements

#1

Post by PaganGod » 26 Mar 2013 00:45

I have a RAID-Z2 of drives of mixed sizes. One of the 1TB drives was throwing a lot of SMART failures, so I OFFLINEd that disk and shut the system down. I replaced it with an identically sized 1TB drive from the same manufacturer, on the same SATA port, and powered the box back up. The new drive was detected, but I could not get Disks -> ZFS -> Pools -> Tools to let me issue the replace command. When I looked in Disks -> Management, it was reporting some issue with disk ada5, which was the one I had OFFLINEd and replaced.

I ended up deleting disk ada5 in Disks -> Management, and it gave some warning about the drive being removed from all configuration. I accepted, and after doing so I realized ada5 still could not be selected with the 'replace' command in Disks -> ZFS -> Pools -> Tools, because as far as the system was concerned there was no ada5 disk to replace.

I tried adding the new disk as a hot spare, but it was not picked up automatically, contrary to what I had expected. Anyway, I deleted the hot spare vdev on the new ada5 disk and issued the following command via an SSH session, where 3197540925103661344 is the label the old ada5 was showing before I physically removed it:

Code: Select all

zpool replace zPool1 3197540925103661344 /dev/ada5.nop
This was not accepted, so I issued the following slightly modified command:

Code: Select all

zpool replace zPool1 3197540925103661344 /dev/ada5
This was accepted, and now I can see the zpool rebuilding, but with ada5 not showing the .nop extension:

Code: Select all

pool: zPool1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Mar 25 15:57:40 2013
        210G scanned out of 400G at 212M/s, 0h15m to go
        35.0G resilvered, 52.46% done
config:

	NAME                       STATE     READ WRITE CKSUM
	zPool1                     DEGRADED     0     0     1
	  raidz2-0                 DEGRADED     0     0     2
	    ada0.nop               ONLINE       0     0     3  (resilvering)
	    ada1.nop               ONLINE       0     0     1  (resilvering)
	    ada2.nop               ONLINE       0     0     0
	    ada3.nop               ONLINE       0     0     2  (resilvering)
	    ada4.nop               ONLINE       0     0     0
	    replacing-5            OFFLINE      0     0     0
	      3197540925103661344  OFFLINE      0     0     0  was /dev/ada5.nop
	      ada5                 ONLINE       0     0     0  (resilvering)
I believe I misunderstood the use of the "Advanced Format" option and may not have needed it at all. I do have two WD 2TB "Red" drives which I think might use 512-byte sector emulation, but I am not sure. Do I need to be concerned that one of the disks in the vdev underlying the zpool is not showing the .nop extension whereas all the others are? Did I do something incorrectly, like removing the disk in Disks -> Management, which I should not have done? A lot of my actions were in pursuit of replacing the failed disk using only the GUI, without having to issue anything on the command line, because what I read and saw seemed to suggest that should be supported.
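From what I have since read, the "Advanced Format" option wraps each disk in a gnop(8) provider that advertises 4 KB sectors, so that ZFS creates the pool with ashift=12. My understanding of what that amounts to at the command line is roughly the following (ada5 is just an example device):

Code: Select all

gnop create -S 4096 /dev/ada5   # creates /dev/ada5.nop, which reports 4096-byte sectors
                                # the pool is then built on the .nop devices instead of the raw disks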

In case it is of any value, here is my zpool history output:

Code: Select all

History for 'zPool1':
2013-03-21.16:47:23 zpool create -f -m /mnt/zPool1 zPool1 raidz2 /dev/ada0.nop /dev/ada1.nop /dev/ada2.nop /dev/ada3.nop /dev/ada4.nop /dev/ada5.nop
2013-03-21.16:50:28 zfs create -o compression=off -o dedup=off -o sync=standard -o atime=on zPool1/VMware
2013-03-21.16:58:56 zfs create -o compression=off -o dedup=off -o sync=standard -o atime=on zPool1/FileShare
2013-03-21.17:34:27 zpool offline zPool1 ada5.nop
2013-03-21.18:16:09 zfs set sync=disabled zPool1/VMware
2013-03-25.15:57:49 zpool replace zPool1 3197540925103661344 /dev/ada5
EDIT:
Now I am seeing the following under Disks -> ZFS -> Pools -> Information

Code: Select all

  pool: zPool1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 66.6G in 0h33m with 2 errors on Mon Mar 25 16:31:14 2013
config:

	NAME                       STATE     READ WRITE CKSUM
	zPool1                     DEGRADED     0     0     1
	  raidz2-0                 DEGRADED     0     0     2
	    ada0.nop               ONLINE       0     0     4
	    ada1.nop               ONLINE       0     0     1
	    ada2.nop               ONLINE       0     0     1
	    ada3.nop               ONLINE       0     0     0
	    ada4.nop               ONLINE       0     0     0
	    replacing-5            DEGRADED     0     0     0
	      3197540925103661344  OFFLINE      0     0     0  was /dev/ada5.nop
	      ada5                 ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /mnt/zPool1/VMware/TESTBED-VIR/TESTBED-VIR_1-flat.vmdk
        /mnt/zPool1/VMware/WEBDEV8/WEBDEV8-flat.vmdk
And I am certain I saw the system reboot during the resilvering, though it had been 100% reliable for the last week of use before replacing the failed drive. The reboot was confirmed by checking the uptime once the system came back up.

And here are the details of the system:

Code: Select all

System information
Version      9.1.0.1 - Sandstorm (revision 636)
Build date   Tue Feb 5 01:22:23 CET 2013
Platform OS  FreeBSD 9.1-RELEASE (reldate 901000)
Platform     x64-embedded on Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz
System       Gigabyte Technology Co., Ltd. Z68X-UD3H-B3

crowi
Forum Moderator
Posts: 1184
Joined: 21 Feb 2013 16:18
Location: Munich, Germany

Re: Question about ad*.nop devices, RAID-Z2 drive replacements

#2

Post by crowi » 26 Mar 2013 09:38

Hi PaganGod,

Can you check your zpool and post the output here:

Code: Select all

zpool status -x
Then read this thread:
viewtopic.php?f=59&t=1494
and follow the instructions by fsbruva (third post from the top). I had a similar problem with *.nop devices showing up and was able to solve it.
In short, do the following:

Code: Select all

zdb | grep ashift
This should give you an ashift value of 12 (hopefully).
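The relevant line of zdb output should look something like this for each top-level vdev:

Code: Select all

ashift: 12
Then export the pool: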

Code: Select all

zpool export {poolname}
or

Code: Select all

zpool export -f {poolname}
(forced mode)

Code: Select all

gnop destroy /dev/ad0.nop /dev/ad1.nop
(list here all of the drives that show up with .nop)

Code: Select all

zpool import {poolname}
then check the status again:

Code: Select all

zpool status -x
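If everything worked, that last command should print just:

Code: Select all

all pools are healthy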
good luck!
NAS 1: Milchkuh: Asrock C2550D4I, Intel Avoton C2550 Quad-Core, 16GB DDR3 ECC, 5x3TB WD Red RaidZ1 +60 GB SSD for ZIL/L2ARC, APC-Back UPS 350 CS, NAS4Free 11.0.0.4.3460 embedded
NAS 2: Backup: HP N54L, 8 GB ECC RAM, 4x4 TB WD Red, RaidZ1, NAS4Free 11.0.0.4.3460 embedded
NAS 3: Office: HP N54L, 8 GB ECC RAM, 2x3 TB WD Red, ZFS Mirror, APC-Back UPS 350 CS NAS4Free 11.0.0.4.3460 embedded

PaganGod

Re: Question about ad*.nop devices, RAID-Z2 drive replacements

#3

Post by PaganGod » 26 Mar 2013 18:23

Thanks, crowi, but I am actually not really concerned about the one device in the vdev not using gnop translation. The performance of the array, both locally and over the network, meets or exceeds my expectations and requirements.

I am most concerned with the fact that the pool is still showing DEGRADED, the persistence of the failed drive's identifier even after the resilver, and the strange "replacing-5" device. I had expected to just see something like ada5 in place of the "replacing-5" entry after the resilver.
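From what I have read, once a resilver genuinely completes, the replacing-5 vdev should collapse on its own; failing that, detaching the old member by its GUID is supposed to do it. I have not tried this yet, but I believe it would look like:

Code: Select all

zpool detach zPool1 3197540925103661344   # remove the old, offlined member from replacing-5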

At this point I am considering installing an additional drive as a temporary measure, copying all the data from the zpool datasets over to it, and blowing away the zpool/vdev then recreating it and moving the data back. Any thoughts/suggestions on this proposal from anyone? Of course, if there is a way to fix my existing vdev/zpool I am wide open to that.

PaganGod

Re: Question about ad*.nop devices, RAID-Z2 drive replacements

#4

Post by PaganGod » 28 Mar 2013 16:46

SOLVED!

It turns out that detected file corruption prevents a scrub or resilver from completing. Once I was able to back up the VMware VMDK (virtual disk) file - which, by the way, required me to use Ghost inside the virtual machine with the option to ignore bad blocks - and switch in the copied virtual disk for the corrupt one, I was able to delete the corrupt file and run a scrub. This time it found and corrected some inconsistencies, and now I have a normal pool:

Code: Select all

  pool: zPool1
 state: ONLINE
  scan: scrub repaired 192K in 0h21m with 0 errors on Wed Mar 27 18:02:19 2013
config:

	NAME          STATE     READ WRITE CKSUM
	zPool1        ONLINE       0     0     0
	  raidz2-0    ONLINE       0     0     0
	    ada0.nop  ONLINE       0     0     0
	    ada1.nop  ONLINE       0     0     0
	    ada2.nop  ONLINE       0     0     0
	    ada3.nop  ONLINE       0     0     0
	    ada4.nop  ONLINE       0     0     0
	    ada5      ONLINE       0     0     0

errors: No known data errors
I should mention I did run "zpool clear" via the WebGUI to get the error counts reset, so I can tell if I have a new problem in the future. I will still need to swap out several of the devices and autogrow the pool to get to the number of disks in the vdev and the total storage capacity I want (see the sketch below), but I do think that is all very achievable.
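My understanding of the per-disk upgrade sequence is roughly the following; ada5 here is just an example, and I have not actually run this yet:

Code: Select all

zpool set autoexpand=on zPool1   # let the pool grow once all members are larger
zpool replace zPool1 ada5        # after physically swapping in the larger disk
zpool online -e zPool1 ada5      # request expansion once the resilver finishes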

I do still have the issue with the replaced ada5 device showing up without the .nop suffix, but from the reply I received above and information in other threads, I don't think this is a problem, and in fact I think I should probably try to get all the devices off GEOM translation.

The takeaway here is that you have to get rid of reported corrupt files before a resilver or scrub can complete and return your pool and underlying vdev to a normal, healthy state.
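For anyone following along, the whole recovery boiled down to something like this (the rm path is one of the corrupt files reported earlier, deleted only after it was backed up):

Code: Select all

rm /mnt/zPool1/VMware/WEBDEV8/WEBDEV8-flat.vmdk   # delete the corrupt file (after backing it up)
zpool scrub zPool1                                # the scrub can now complete
zpool status -v zPool1                            # verify: no known data errors
zpool clear zPool1                                # reset the per-device error counters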
