zpool import causes system to reboot "blkptr at 0xfffffe0003a5fa40 DVA 1 has invalid VDEV 16384"
Posted: 01 Jan 2016 19:06
Hi
I recently installed NAS4Free 10.2.0.2 - Prester (revision 2235), set up 5x4TB drives in RAIDZ2, and transferred data from an existing FreeBSD 8 system over NFS. The copy completed without issues, but while I was getting the last settings ready to migrate to the new system, I noticed that ZFS had detected corruption in one directory.
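For reference, the pool creation and the copy were roughly along these lines (device names and paths are from memory, and the source host name here is just a placeholder):
Code: Select all
# zpool create pool0 raidz2 ada0 ada1 ada2 ada3 ada4
# mount_nfs oldnas:/data /mnt/old
# rsync -a /mnt/old/ /pool0/
This is what zpool status showed: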
Code: Select all
# zpool status -v
pool: pool0
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: none requested
config:
NAME          STATE     READ WRITE CKSUM
pool0         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    ada0      ONLINE       0     0     0
    ada1      ONLINE       0     0     0
    ada2      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
errors: Permanent errors have been detected in the following files:
/pool0/media/music/flac/Assemblage 23 - 2004 - Ground
So I deleted the folder and copied it from the source again, but the status did not change, so I decided to delete it again and run a scrub. Before running the scrub, the status changed to:
Code: Select all
errors: Permanent errors have been detected in the following files:
pool0:<0x2da8a>
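As I understand it, once the damaged file is deleted, ZFS can only refer to it by dataset and object number until a scrub confirms whether the error is gone, which would explain the <0x2da8a> entry above. In principle the object can be inspected with zdb, something like this (object number converted to decimal, and I'm not sure this works while the pool is in this state):
Code: Select all
# zdb -dddd pool0 187018   # 187018 == 0x2da8a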
The scrub ran for about 30 minutes, then the system rebooted and went into a boot loop. Unfortunately I could not capture any details of where it was rebooting.
I then did a fresh install of NAS4Free and tried to import the pool, but each time I did this the system rebooted. Out of interest I tried the import from a FreeBSD 10.2 live CD and it also reboots, so the behaviour is the same there.
zpool import reports pool0 as available:
Code: Select all
# zpool import
pool: pool0
id: 17274685908530395963
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:
pool0       ONLINE
  raidz2-0  ONLINE
    ada0    ONLINE
    ada1    ONLINE
    ada2    ONLINE
    ada3    ONLINE
    ada4    ONLINE
I enabled persistent logging so the messages would survive the reboots.
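On plain FreeBSD the equivalent would be roughly the following (NAS4Free exposes a similar option in the WebGUI, and the log file name here is just what I chose):
Code: Select all
# log kernel messages to a file; syslogd will not create the file itself
echo 'kern.*    /var/log/kernel.log' >> /etc/syslog.conf
touch /var/log/kernel.log
service syslogd restart
This is what turned up in the log:
Code: Select all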
Jan 1 16:21:28 nas4free syslogd: kernel boot file is /boot/kernel/kernel
Jan 1 16:21:28 nas4free kernel: Solaris: WARNING: blkptr at 0xfffffe0003a5fa40 DVA 1 has invalid VDEV 16384
Jan 1 16:21:28 nas4free kernel:
Jan 1 16:21:28 nas4free kernel:
Jan 1 16:21:28 nas4free kernel: Fatal trap 12: page fault while in kernel mode
Jan 1 16:21:28 nas4free kernel: cpuid = 1; apic id = 01
Jan 1 16:21:28 nas4free kernel: fault virtual address = 0x50
Jan 1 16:21:28 nas4free kernel: fault code = supervisor read data, page not present
Jan 1 16:21:28 nas4free kernel: instruction pointer = 0x20:0xffffffff81e79f94
Jan 1 16:21:28 nas4free kernel: stack pointer = 0x28:0xfffffe0169ef5740
Jan 1 16:21:28 nas4free kernel: frame pointer = 0x28:0xfffffe0169ef5750
Jan 1 16:21:28 nas4free kernel: code segment = base 0x0, limit 0xfffff, type 0x1b
Jan 1 16:21:28 nas4free kernel: = DPL 0, pres 1, long 1, def32 0, gran 1
Jan 1 16:21:28 nas4free kernel: processor eflags = interrupt enabled, resume, IOPL = 0
Jan 1 16:21:28 nas4free kernel: current process = 6 (txg_thread_enter)
Jan 1 16:21:28 nas4free kernel: trap number = 12
Jan 1 16:21:28 nas4free kernel: panic: page fault
Jan 1 16:21:28 nas4free kernel: cpuid = 1
Jan 1 16:21:28 nas4free kernel: KDB: stack backtrace:
Jan 1 16:21:28 nas4free kernel: #0 0xffffffff80a86a70 at kdb_backtrace+0x60
Jan 1 16:21:28 nas4free kernel: #1 0xffffffff80a4a1d6 at vpanic+0x126
Jan 1 16:21:28 nas4free kernel: #2 0xffffffff80a4a0a3 at panic+0x43
Jan 1 16:21:28 nas4free kernel: #3 0xffffffff80ecaedb at trap_fatal+0x36b
Jan 1 16:21:28 nas4free kernel: #4 0xffffffff80ecb1dd at trap_pfault+0x2ed
Jan 1 16:21:28 nas4free kernel: #5 0xffffffff80eca87a at trap+0x47a
Jan 1 16:21:28 nas4free kernel: #6 0xffffffff80eb0c72 at calltrap+0x8
Jan 1 16:21:28 nas4free kernel: #7 0xffffffff81e8071f at vdev_mirror_child_select+0x6f
Jan 1 16:21:28 nas4free kernel: #8 0xffffffff81e802d0 at vdev_mirror_io_start+0x270
Jan 1 16:21:28 nas4free kernel: #9 0xffffffff81e9cd86 at zio_vdev_io_start+0x1d6
Jan 1 16:21:28 nas4free kernel: #10 0xffffffff81e998b2 at zio_execute+0x162
Jan 1 16:21:28 nas4free kernel: #11 0xffffffff81e991b9 at zio_nowait+0x49
Jan 1 16:21:28 nas4free kernel: #12 0xffffffff81e1c91e at arc_read+0x8fe
Jan 1 16:21:28 nas4free kernel: #13 0xffffffff81e577b2 at dsl_scan_prefetch+0xc2
Jan 1 16:21:28 nas4free kernel: #14 0xffffffff81e574a3 at dsl_scan_visitbp+0x583
Jan 1 16:21:28 nas4free kernel: #15 0xffffffff81e5722f at dsl_scan_visitbp+0x30f
Jan 1 16:21:28 nas4free kernel: #16 0xffffffff81e5722f at dsl_scan_visitbp+0x30f
Jan 1 16:21:28 nas4free kernel: Copyright (c) 1992-2015 The FreeBSD Project.
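If I read the warning right, the DVA in a block pointer encodes the id of the vdev it points at, and this pool has only the one raidz2 top-level vdev, so VDEV 16384 means the block pointer itself is garbage rather than a reference to any real device. I have also seen the vfs.zfs.recover loader tunable suggested for this kind of panic, to demote it to a warning, but I have not tried it, so treat this purely as an untested note:
Code: Select all
# /boot/loader.conf -- untested; supposedly makes ZFS warn rather than
# panic on certain corrupt block pointers
vfs.zfs.recover=1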
I found some steps online to import the pool read-only, which worked:
Code: Select all
zpool import -F -f -o readonly=on -R /pool0 pool0
zpool status
pool: pool0
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://illumos.org/msg/ZFS-8000-8A
scan: scrub in progress since Wed Dec 30 13:34:03 2015
1.06T scanned out of 8.53T at 1/s, (scan is slow, no estimated time)
0 repaired, 12.45% done
config:
NAME          STATE     READ WRITE CKSUM
pool0         ONLINE       0     0     0
  raidz2-0    ONLINE       0     0     0
    ada0      ONLINE       0     0     0
    ada1      ONLINE       0     0     0
    ada2      ONLINE       0     0     0
    ada3      ONLINE       0     0     0
    ada4      ONLINE       0     0     0
errors: 1 data errors, use '-v' for a list
I then tried to run a check with zdb as per http://sigtar.com/2009/10/19/opensolari ... nel-panic/, but it runs for a while and then segfaults:
Code: Select all
zdb -e -bcsvL pool0
Traversing all blocks to verify checksums ...
22.1G completed ( 59MB/s) estimated time remaining: 41hr 19min 07sec
Segmentation fault
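For anyone following along, my understanding of the flags: -e operates on an exported pool rather than going through the cache file, -b collects block statistics, -c verifies checksums during the traversal, -s reports I/O statistics, -v adds verbosity, and -L disables leak tracking. I may have some of that slightly wrong.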
From the system.log:
Code: Select all
nas4free kernel: pid 2264 (zdb), uid 0: exited on signal 11
I could easily destroy the pool and start over, since I still have the source system, but it looks like there may be a bug in how zpool import handles whatever issue my pool has, and of course I'm curious what the problem actually is and whether it can be fixed without starting over.
I did test all the drives with SeaTools before using them and they all passed. I am currently waiting for smartctl long tests to complete on all the drives to see whether there are any other issues.
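For the record, the long tests were started along these lines (repeated for each drive):
Code: Select all
# smartctl -t long /dev/ada0   # kick off the long self-test (repeat for ada1..ada4)
# smartctl -a /dev/ada0        # check the results once the test has finished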
If anyone has any suggestions or wants further info, please let me know.
