This is the old XigmaNAS forum in read-only mode;
it will be taken offline by the end of March 2021!



Still resilvering after hdd replace

Post by pakpenyo »

Hi,

I have a problem with my NAS after replacing a ZFS HDD.

progressnas: ~ # zpool status
  pool: vol1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jun 16 03:25:42 2014
        5.48T scanned out of 9.31T at 207M/s, 5h23m to go
        1.82T resilvered, 58.85% done
config:

	NAME                        STATE     READ WRITE CKSUM
	vol1                        DEGRADED     0     0    99
	  raidz1-0                  ONLINE       0     0    98
	    ada0                    ONLINE       0     0     0  (resilvering)
	    ada1.nop                ONLINE       0     0   150  (resilvering)
	    ada2.nop                ONLINE       0     0   173  (resilvering)
	  raidz1-1                  DEGRADED     0     0   100
	    replacing-0             DEGRADED     0     0     1
	      12609113799974789003  UNAVAIL      0     0     0  was /dev/ada3/old
	      ada3                  ONLINE       0     0     0  (resilvering)
	    ada4                    ONLINE       0     0    76  (resilvering)
	    ada5                    ONLINE       0     0    62  (resilvering)
I couldn't offline ada3/old before replacing it with the new HDD (my system could not boot until the HDD was replaced). I just swapped in the new HDD (WDC Red 2TB) and wiped it with the dd script (from the wiki). After the resilvering process finished, ada3/old still showed up in zpool status. I tried a scrub, but the pool is still degraded. Clearing the config and importing the disks did not help. The resilvering process starts automatically after every reboot.
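As a sanity check on the scan lines in the status output above, the remaining-time estimate can be reproduced by hand. This is a minimal sketch, assuming zpool prints T and M as binary units (TiB and MiB) and that the scan rate stays constant; the function name is made up for illustration.

```python
def resilver_eta(scanned_tib, total_tib, rate_mib_s):
    """Return (percent_done, hours, minutes) for a scan at a constant rate."""
    # Remaining work, converted from TiB to MiB to match the rate unit.
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    seconds = remaining_mib / rate_mib_s
    hours, rem = divmod(int(seconds), 3600)
    percent = 100.0 * scanned_tib / total_tib
    return percent, hours, rem // 60


# Figures from the status output: 5.48T of 9.31T scanned at 207M/s.
percent, hours, minutes = resilver_eta(5.48, 9.31, 207)
print(f"{percent:.2f}% done, {hours}h{minutes}m to go")
```

With the rounded figures from the post this gives 5h23m, matching the reported estimate. The percentage comes out marginally different from the reported 58.85% because the sizes shown in the output are themselves rounded.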

zpool replace

progressnas: ~ # zpool replace vol1 12609113799974789003 ada3
invalid vdev specification
use '-f' to override the following errors:
/dev/ada3 is part of active pool 'vol1'
zpool replace -f

progressnas: ~ # zpool replace -f vol1 12609113799974789003 ada3
invalid vdev specification
the following errors must be manually repaired:
/dev/ada3 is part of active pool 'vol1'
Disks|ZFS|Configuration|Current

Name	Type	Pool	Devices
vol1_raidz1_0	raidz1	vol1	/dev/ada0, /dev/ada1, /dev/ada2
vol2_raidz1_0	raidz1	vol1	/dev/ada3, /dev/ada4, /dev/ada5
Disks|ZFS|Configuration|Detected

Name	Type	Pool	Devices
vol1_raidz1_0	raidz1	vol1	/dev/ada0, /dev/ada1, /dev/ada2
vol1_raidz1_1	raidz1	vol1	/dev/replacing-0, /dev/ada4, /dev/ada5
ZDB

progressnas: ~ # zdb
vol1:
    version: 28
    name: 'vol1'
    state: 0
    txg: 5389466
    pool_guid: 586138567007580426
    hostname: ''
    vdev_children: 2
    vdev_tree:
        type: 'root'
        id: 0
        guid: 586138567007580426
        children[0]:
            type: 'raidz'
            id: 0
            guid: 8748666644158038467
            nparity: 1
            metaslab_array: 30
            metaslab_shift: 35
            ashift: 12
            asize: 6001182375936
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 8743880665745002447
                path: '/dev/ada0'
                phys_path: '/dev/ada0'
                whole_disk: 1
                DTL: 74
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 12713974317220211934
                path: '/dev/ada1.nop'
                phys_path: '/dev/ada1.nop'
                whole_disk: 1
                DTL: 4121
                create_txg: 4
            children[2]:
                type: 'disk'
                id: 2
                guid: 9091162303968359871
                path: '/dev/ada2.nop'
                phys_path: '/dev/ada2.nop'
                whole_disk: 1
                DTL: 4120
                create_txg: 4
        children[1]:
            type: 'raidz'
            id: 1
            guid: 17239451467821290322
            nparity: 1
            metaslab_array: 179
            metaslab_shift: 35
            ashift: 9
            asize: 6001182375936
            is_log: 0
            create_txg: 70872
            children[0]:
                type: 'replacing'
                id: 0
                guid: 12363376037576259361
                whole_disk: 0
                create_txg: 70872
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 12609113799974789003
                    path: '/dev/ada3/old'
                    phys_path: '/dev/ada3'
                    whole_disk: 1
                    not_present: 1
                    DTL: 4118
                    create_txg: 70872
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 18038151162092769873
                    path: '/dev/ada3'
                    phys_path: '/dev/ada3'
                    whole_disk: 1
                    DTL: 8196
                    create_txg: 70872
                    resilvering: 1
            children[1]:
                type: 'disk'
                id: 1
                guid: 1630761014705835153
                path: '/dev/ada4'
                phys_path: '/dev/ada4'
                whole_disk: 1
                DTL: 4116
                create_txg: 70872
            children[2]:
                type: 'disk'
                id: 2
                guid: 7221327795058766781
                path: '/dev/ada5'
                phys_path: '/dev/ada5'
                whole_disk: 1
                DTL: 4112
                create_txg: 70872
    features_for_read:
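One detail in the zdb output above is easy to miss: the two raidz vdevs report different ashift values (12 and 9). ashift is the base-2 logarithm of the sector size ZFS uses for a vdev, so these correspond to 4096-byte and 512-byte sectors. Below is a minimal sketch for pulling that out of zdb text, assuming the "key: value" layout shown above; the sample is abridged and the function name is hypothetical.

```python
import re

# Abridged from the zdb output in the post (only the lines needed here).
ZDB_SAMPLE = """\
        children[0]:
            type: 'raidz'
            id: 0
            guid: 8748666644158038467
            ashift: 12
            children[0]:
                type: 'disk'
                path: '/dev/ada0'
        children[1]:
            type: 'raidz'
            id: 1
            guid: 17239451467821290322
            ashift: 9
            children[0]:
                type: 'replacing'
"""


def ashift_by_vdev(zdb_text):
    """Return {vdev guid: ashift} for every vdev that reports an ashift."""
    result = {}
    guid = None
    for line in zdb_text.splitlines():
        match = re.match(r"\s*(\w+): (\S+)$", line)
        if not match:
            continue  # e.g. "children[0]:" lines carry no value
        key, value = match.groups()
        if key == "guid":
            guid = value  # remember the most recent guid seen
        elif key == "ashift":
            result[guid] = int(value)
    return result


for guid, ashift in ashift_by_vdev(ZDB_SAMPLE).items():
    print(f"vdev {guid}: ashift={ashift} -> {2 ** ashift}-byte sectors")
```

The sketch relies on ashift appearing after the guid of the vdev it belongs to, as it does in the output posted here.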
I have seen some topics in this forum about this. One thread was never resolved, and the only way out there was to back up to another NAS and set up a new NAS. The other threads did not help me. Maybe there is a solution that doesn't require moving the files elsewhere and setting up a new NAS? I have another NAS, but it doesn't have enough space to hold all the files. ;)

Thanks, and apologies for my bad English. :)

Post by b0ssman »

The resilvering looks OK.

Please post the output once the resilvering is complete.
Nas4Free 11.1.0.4.4517. Supermicro X10SLL-F, 16gb ECC, i3 4130, IBM M1015 with IT firmware. 4x 3tb WD Red, 4x 2TB Samsung F4, both GEOM AES 256 encrypted.

Post by pakpenyo »

Resilvering is complete:

progressnas: ~ # zpool status
  pool: vol1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 3.10T in 13h59m with 169 errors on Mon Jun 16 17:25:11 2014
config:

	NAME                        STATE     READ WRITE CKSUM
	vol1                        DEGRADED     0     0    61
	  raidz1-0                  ONLINE       0     0    96
	    ada0                    ONLINE       0     0     0
	    ada1.nop                ONLINE       0     0 1.21K
	    ada2.nop                ONLINE       0     0 1.17K
	  raidz1-1                  DEGRADED     0     0    26
	    replacing-0             DEGRADED     0     0     0
	      12609113799974789003  UNAVAIL      0     0     0  was /dev/ada3/old
	      ada3                  ONLINE       0     0     0
	    ada4                    ONLINE       0     0   155
	    ada5                    ONLINE       0     0   174

errors: 169 data errors, use '-v' for a list
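To compare checksum counters across devices, the config table above can be parsed mechanically. This is a minimal sketch, assuming the column layout shown (NAME, STATE, READ, WRITE, CKSUM) and that abbreviated counters such as 1.21K use a binary multiplier; that multiplier is an assumption, and the helper names are hypothetical.

```python
# Config table copied from the status output in the post.
STATUS_SAMPLE = """\
    NAME                        STATE     READ WRITE CKSUM
    vol1                        DEGRADED     0     0    61
      raidz1-0                  ONLINE       0     0    96
        ada0                    ONLINE       0     0     0
        ada1.nop                ONLINE       0     0 1.21K
        ada2.nop                ONLINE       0     0 1.17K
      raidz1-1                  DEGRADED     0     0    26
        replacing-0             DEGRADED     0     0     0
          12609113799974789003  UNAVAIL      0     0     0  was /dev/ada3/old
          ada3                  ONLINE       0     0     0
        ada4                    ONLINE       0     0   155
        ada5                    ONLINE       0     0   174
"""


def parse_count(text):
    """Convert a counter such as '155' or '1.21K' to an approximate integer."""
    units = {"K": 1024, "M": 1024 ** 2}  # assumption: binary multipliers
    if text[-1] in units:
        return round(float(text[:-1]) * units[text[-1]])
    return int(text)


def cksum_by_device(status_text):
    """Map each device name to its CKSUM counter."""
    counts = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows have at least five columns; skip the header row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        counts[fields[0]] = parse_count(fields[4])
    return counts


counts = cksum_by_device(STATUS_SAMPLE)
bad = {name: n for name, n in counts.items() if n > 0}
for name, n in sorted(bad.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {n}")
```

On this sample the disks with nonzero counters are ada1.nop, ada2.nop, ada4 and ada5, so the errors are spread over both raidz1 vdevs.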
b0ssman wrote: the resilvering looks ok. please post the output once the resilvering is complete

As I said, the resilvering process happens every time I reboot. This problem has been going on for two weeks. The scrub didn't help, but I will scrub again tonight.

On another forum, the recommendation was to shut down the machine, remove the new HDD (ada3 on my machine), and boot without ada3. Then run zpool clear and let it resilver. Shut down again, reinstall ada3, wipe it (with dd), and it should work after resilvering once more. But I'm not sure; I'm worried about the high risk.

Post by b0ssman »

OK, the old drive is still there. Can you try:

zpool offline vol1 12609113799974789003

Post by pakpenyo »

progressnas: ~ # zpool offline vol1 12609113799974789003
cannot offline 12609113799974789003: no valid replicas

Post by b0ssman »

How exactly did you trigger the replacement?

Post by b0ssman »

By the way, save your data now. The resilvering process is putting a strain on your drives. If one more fails, all your data will be gone.

Post by substr »

Your zpool status shows that you have permanent data errors due to checksum failures across multiple drives. What is the output of zpool status -v vol1?
Any files listed are corrupted. If you already have a backup of those files, do not back them up again.

At this point, I think you need to be making sure that you have a backup of all important data on that pool.

Then you need to determine why you have checksum errors across multiple drives. First step is to run MemTest on your system and make sure your RAM is good.
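To turn the list from zpool status -v into something a backup script can skip, the file section can be extracted mechanically. This is a minimal sketch, assuming the usual "errors: Permanent errors have been detected in the following files:" header followed by one indented entry per line; the sample text and paths are invented for illustration and are not taken from this pool.

```python
HEADER = "errors: Permanent errors have been detected in the following files:"

# Hypothetical sample; the real list from the pool was not posted.
SAMPLE = """\
  pool: vol1
 state: DEGRADED
errors: Permanent errors have been detected in the following files:

        /mnt/vol1/photos/img_0001.jpg
        /mnt/vol1/docs/report.pdf
        vol1/data:<0x1a2b>
"""


def corrupted_entries(status_text):
    """Return every entry listed after the permanent-errors header."""
    lines = status_text.splitlines()
    if HEADER not in lines:
        return []
    start = lines.index(HEADER) + 1
    return [line.strip() for line in lines[start:] if line.strip()]


def corrupted_paths(status_text):
    """Keep only entries that are plain file paths (start with '/')."""
    return [e for e in corrupted_entries(status_text) if e.startswith("/")]


for path in corrupted_paths(SAMPLE):
    print(path)
```

Entries that do not start with a slash (such as the dataset and object reference in the sample) are left out of the path list, since they cannot be handed to a file-level tool directly.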

Post by pakpenyo »

Sorry for the late response.

b0ssman wrote: how exactly did you trigger the replacement?

Three weeks ago my machine suddenly rebooted itself, and after that zpool status showed the pool as degraded. The status of /dev/ada3 was UNAVAILABLE.

I shut down the machine, switched to the new HDD (2TB WDC Red) with a new SATA cable, zero-formatted it (with the dd script from the NAS4Free wiki), brought it online, and started the replace process. I saw the zpool resilvering. But after the resilvering completed, /dev/ada3/old was still there.

b0ssman wrote: by the way. safe your data now. the resilvering process is putting a strain on your drives. if one more fails all your data will be gone.

Yes, I have already made a backup.
substr wrote:Your zpool status shows that you have permanent data errors due to checksum failures across multiple drives. What is the output of zpool status -v vol1
Any files listed are corrupted. If you already have a backup of those files, do not back them up again.

At this point, I think you need to be making sure that you have a backup of all important data on that pool.

Then you need to determine why you have checksum errors across multiple drives. First step is to run MemTest on your system and make sure your RAM is good.

zpool status -v lists 132 errors. The rsync process (to the other NAS) confirmed that these files are bad, but the corrupted files are not a problem: I have older versions of them.

Then I destroyed the degraded pool and set up a new machine with a new version of NAS4Free, RAIDZ2, new memory, and a new PSU :cry:
