
is my ZFS pool unrecoverable?

Posted: 19 Jul 2013 03:28
by stangri
I have a four-disk box with a ZFS raidz pool, running the last of Daisuke's FreeNAS builds. I recently decided to try N4F again, and it worked fine (no crashes) for almost a week, but then it crashed once, rebooted, and entered a perpetual reboot cycle. This is where things go bad:

Code:

Mounting local file systems:.
ZFS filesystem version 5
ZFS storage pool version 28
Load NOP GEOM class


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x101
fault code              = supervisor write data, page not present
instruction pointer     = 0x20:0xffffffff808bfe76
stack pointer           = 0x28:0xffffff804cd9d580
frame pointer           = 0x28:0xffffff804cd9d5a0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 1019 (zpool)
trap number             = 12
panic: page fault

I've tried moving the drives to another box (same make and model, though) and tried different CF cards with N4F and FreeNAS; they all either lock up or crash after "Load NOP GEOM class". I've tried removing each drive separately, with the same result. N4F and FN both boot when only two of the four drives are present, but then of course they say they can't bring the pool online.

This box doesn't have a CD drive and I don't have a USB CD drive either. I was able to make a flash drive with an OpenSolaris install, which does have ZFS support in the shell, but I'm clueless about what to try.

Are there any steps I could take to troubleshoot each drive individually so they would work together as a pool again?

I thought ZFS had superior fault tolerance; I'm shocked that something could happen to the drive(s) that makes the system crash just trying to bring the pool online.

Can anyone help me please?

PS. I do get a CPU panic in opensolaris if I try "zpool import" as well.

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 06:53
by stangri
So no suggestions like at all?

What has N4F done to my drives that they crash any ZFS-compatible OS when plugged?

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 07:31
by dnar
Just a suggestion, try physically removing one drive. Try removing each drive with the remaining 3 installed.

If this helps, you have at least isolated a potential troublesome drive which can be replaced and re-silvered.

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 07:36
by stangri
dnar wrote:Just a suggestion, try physically removing one drive. Try removing each drive with the remaining 3 installed.

If this helps, you have at least isolated a potential troublesome drive which can be replaced and re-silvered.
stangri wrote: I've tried moving drives to another box (same make & model tho), tried different CF cards with N4F and FreeNAS, they all either lock up or crash after "Load NOP GEOM class". I've tried removing each drive separately -- same result. N4F and FN both load when only two drives (out of four) are present in the system but then they of course say they can't bring the pool online.

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 10:37
by stangri
Just to add to that: if I flash a fresh nas4free image (without any config), it of course boots on my box, but the moment I try to import the pool, either from the WebGUI or from the shell, I get a kernel panic.

Code:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x248
fault code              = supervisor read, page not present
instruction pointer     = 0x20:0x80b9dbc8
stack pointer           = 0x28:0xe0054828
frame pointer           = 0x28:0xe005486c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2040 (zpool)
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
#0 0x80bc974f at kdb_backtrace+0x4f
#1 0x80b9617f at panic+0x16f
#2 0x80f2c363 at trap_fatal+0x323
#3 0x80f2c466 at trap_pfault+0xf6
#4 0x80f2d3ca at trap+0x44a
#5 0x80f169bc at calltrap+0x6
#6 0x80b9e457 at _sx_xlock+0x57
#7 0x8166767e at dnode_hold_impl+0x1ee
#8 0x81667a05 at dnode_hold+0x35
#9 0x816530b8 at dmu_buf_hold+0x48
#10 0x816b9d42 at zap_lockdir+0x52
#11 0x816bb6ff at zap_lookup_norm+0x4f
#12 0x816bb949 at zap_lookup+0x69
#13 0x8167b4fa at dsl_pool_open+0xfa
#14 0x81692895 at spa_load+0x615
#15 0x81695df8 at spa_tryimport+0xb8
#16 0x816e12ca at zfs_ioc_pool_tryimport+0x5a
#17 0x816e1fad at zfsdev_ioctl+0xcd

FreeBSD devs recommend compiling a debug kernel and running with it to get more information about the kernel panic, but building the embedded image with a debug kernel is beyond my technical abilities.

Can any Nas4Free devs build a custom debug embedded Nas4Free image so I could try to get more information about the problem?
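
For anyone reading the trace above: a hedged sketch, not part of the original post, of how the backtrace can be turned into a call path. It assumes Python 3, and `parse_backtrace` and `call_path` are hypothetical helper names. Read from the outermost caller inward, the panic happens inside `spa_tryimport` while `dsl_pool_open` performs a ZAP lookup, which suggests (but does not prove) that the kernel faults while reading pool metadata during the import probe.

```python
import re

# One backtrace frame looks like: "#7 0x8166767e at dnode_hold_impl+0x1ee"
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+at\s+(\w+)\+0x[0-9a-fA-F]+$")

# Frames that belong to the trap/panic machinery rather than the faulting code path.
TRAP_FRAMES = {"kdb_backtrace", "panic", "trap_fatal", "trap_pfault", "trap", "calltrap"}


def parse_backtrace(text):
    """Return the function names from a KDB backtrace, innermost frame first."""
    frames = []
    for line in text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.group(2))
    return frames


def call_path(frames):
    """Drop trap-handling frames and order the rest from outermost caller to fault site."""
    return [name for name in reversed(frames) if name not in TRAP_FRAMES]


# Backtrace copied from the panic output in this post.
BACKTRACE = """\
#0 0x80bc974f at kdb_backtrace+0x4f
#1 0x80b9617f at panic+0x16f
#2 0x80f2c363 at trap_fatal+0x323
#3 0x80f2c466 at trap_pfault+0xf6
#4 0x80f2d3ca at trap+0x44a
#5 0x80f169bc at calltrap+0x6
#6 0x80b9e457 at _sx_xlock+0x57
#7 0x8166767e at dnode_hold_impl+0x1ee
#8 0x81667a05 at dnode_hold+0x35
#9 0x816530b8 at dmu_buf_hold+0x48
#10 0x816b9d42 at zap_lockdir+0x52
#11 0x816bb6ff at zap_lookup_norm+0x4f
#12 0x816bb949 at zap_lookup+0x69
#13 0x8167b4fa at dsl_pool_open+0xfa
#14 0x81692895 at spa_load+0x615
#15 0x81695df8 at spa_tryimport+0xb8
#16 0x816e12ca at zfs_ioc_pool_tryimport+0x5a
#17 0x816e1fad at zfsdev_ioctl+0xcd
"""

frames = parse_backtrace(BACKTRACE)
path = call_path(frames)
print(" -> ".join(path))
```

The printed path starts at `zfsdev_ioctl` and ends at `_sx_xlock`, the lock call in which the page fault occurred.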

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 10:39
by kkd
What has N4F done to my drives that they crash any ZFS-compatible OS when plugged?
nothing bad, u can still use it, but u need to know: viewtopic.php?f=57&t=4568&p=24312&hilit=5000#p24312

but i don't think anyone can recover ur pool.

anyway, what kind of raid was it?

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 10:45
by stangri
What do you mean "nothing"? Nas4Free crashes trying to load or import my ZFS version 5 pool. That didn't happen on its own, did it?

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 12:03
by kkd
stangri wrote:What do you mean "nothing"? Nas4Free crashes trying to load or import my ZFS version 5 pool. That didn't happen on its own, did it?
ZFS pool information is damaged on ur hdds. Try google "zfs fschk".
but i don't think anyone can recover ur pool.

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 12:19
by stangri
Right, fsck. For zfs. Sure.

Re: is my ZFS pool unrecoverable?

Posted: 22 Jul 2013 12:24
by kkd

Re: is my ZFS pool unrecoverable?

Posted: 23 Jul 2013 08:03
by fsbruva
kkd wrote:Try google "zfs fschk".
Is this a joke?
No fsck utility equivalent exists for ZFS.
As a moderator, I would expect you not to send people on pointless errands. ZFS has no fschk capabilities. It has scrubs, and that's it.

Re: is my ZFS pool unrecoverable?

Posted: 26 Jul 2013 12:22
by shakky4711
Hello Stangri,

I'm not sure I understood correctly whether you have put the disks into a new machine and tried to boot a fresh NAS4Free install there.

If not, my idea would be to put the disks into another computer that you know is in a healthy condition, then boot a fresh NAS4Free there.

I am really surprised; I have never had a case where a pool import forced the system to reboot. Either the pool gets imported when everything is fine, or the operating system rejects it with a message saying why it is not able to import the pool.
In my first ZFS days I was mangling some test systems with a handful of USB thumb drives to check what is possible and find out how to fix problems. No matter what I did, I never got the machine to reboot.

So my first feeling was:

- damaged RAM --> use memtest86+
- damaged mainboard
- damaged PSU --> test with a new one
- damaged hard disk --> test with the hard drive manufacturer's tools
- dust inside the system or a defective CPU fan, causing overheating and, as a result, a CPU emergency shutdown. With the BIOS setting "Power on after power loss" it would boot up again, shut down, boot, shut down...

Did you change anything? Hardware, software, some new electrical installations at your home, a new power-hungry washing machine or air conditioner?


Good luck
Shakky

Re: is my ZFS pool unrecoverable?

Posted: 26 Jul 2013 18:15
by stangri
Tried in 3 other machines (the last one with 16 GB of RAM). The zpool import command always brings them down.

Re: is my ZFS pool unrecoverable?

Posted: 26 Jul 2013 19:16
by b0ssman
It sounds like there is some corruption in the metadata.

I doubt there is anyone on this forum who can help you.

What you could try:
- Install zfsonlinux on a Linux machine, see if the error occurs there, and post an issue there.
- Submit a FreeBSD ticket.
- Install Solaris 11 and open a service request with Oracle (probably requires paid support).
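
As a rough sketch of what a first look on any of those systems could involve, the snippet below only builds and prints a few cautious inspection commands; it runs nothing. The pool name `tank` and the device paths are placeholders, and `diagnostic_commands` is a hypothetical helper. `zdb -l`, `zpool import -F -n`, and `zpool import -o readonly=on -N -f` are, as far as I know, standard options on ZFS v28-era systems, but check the local man pages first. Given the panics reported in this thread, any of the `zpool import` variants may still bring the machine down.

```python
import shlex


def diagnostic_commands(pool, devices):
    """Build (but do not run) a list of cautious ZFS inspection commands."""
    commands = []
    # zdb -l dumps the vdev labels stored on one device; zdb is a userland
    # tool, so it reads the disk without importing the pool.
    for device in devices:
        commands.append(["zdb", "-l", device])
    # -F -n asks whether a recovery (rewind) import could succeed,
    # without actually performing the recovery.
    commands.append(["zpool", "import", "-F", "-n", pool])
    # readonly=on imports the pool read-only, -N skips mounting datasets,
    # -f forces import of a pool that looks in use by another system.
    commands.append(["zpool", "import", "-o", "readonly=on", "-N", "-f", pool])
    return commands


# Placeholder pool and device names; substitute the real ones.
for argv in diagnostic_commands("tank", ["/dev/ada0", "/dev/ada1"]):
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Printing the commands instead of executing them keeps the sketch harmless; each line can be reviewed and then typed by hand on the recovery system.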

Re: is my ZFS pool unrecoverable?

Posted: 26 Jul 2013 19:24
by stangri
Thanks, all I have at home are laptops, so I'll see if I can borrow someone else's computer long enough to install linux/solaris 11 to try.

BTW, I don't think it's a FreeBSD-specific issue, as it's bringing OpenIndiana down too.

So NAS4Free messed up my disks good.

Re: is my ZFS pool unrecoverable?

Posted: 15 Sep 2013 15:25
by RedneckBob
I too have an issue with my system crashing on "zpool import". It ran fine for weeks, then started crashing under load, and now it crashes on zpool import. A slow descent into complete dysfunction.

When doing the initial hardware setup, the motherboard and SATA controller were not the best match (PCI motherboard and PCI-X SATA card), but I eventually got it stable and the system ran fine for weeks. Later I added an SSD cache/log and the system continued to run fine for weeks.

Then one day while under load the system crashed. I rebooted, checked the log file only to find nothing, and it was fine for a few days only to crash again. I checked the hardware, reseated the RAM, reset the BIOS to its defaults, and checked all the drive cables, but it kept crashing under load. Then at one point it crashed and wouldn't mount my ZFS pool. I logged in from the command line and forced the zpool import, only to have it crash again. As a last-ditch effort I replaced the motherboard, eliminated the SATA card, and removed the SSD log/cache, but the problem persists. All 5 of the hard drives pass the SMART long test, so they appear to be in good shape.

At this stage I believe my ZFS is corrupt beyond repair and I'm out of ideas and I've just about given up recovering the disks. Fortunately I was still in test mode. The idea was to replace my old trusty Thecus box with a ZFS box, but I'm a little concerned at this point. I'd be less concerned if I could pinpoint the issue, but I get nothing but a bunch of binary characters in the log file(s). If I could determine the exact problem I wouldn't be as worried going forward with ZFS as a replacement for my Thecus RAID5.

Re: is my ZFS pool unrecoverable?

Posted: 16 Sep 2013 01:57
by stangri
Welcome to the club! They're going to tell you it's your RAM.

Re: is my ZFS pool unrecoverable?

Posted: 16 Sep 2013 03:06
by hotalot
System crashes under load may be caused by a dying power supply. I would check it.

Re: is my ZFS pool unrecoverable?

Posted: 18 Sep 2013 20:32
by RedneckBob
hotalot wrote:
System crashes under load may be caused by a dying power supply. I would check it.
I'll swap RAM and report back.

Re: is my ZFS pool unrecoverable?

Posted: 18 Sep 2013 20:36
by RedneckBob
Could be the power supply, though it is a brand new 700W (way, way overkill) CoolerMaster. I have another one I could try, but that is a lot of work.

Re: is my ZFS pool unrecoverable?

Posted: 18 Sep 2013 20:38
by RedneckBob
You know, the last modification to the system was the addition of an SSD for cache/log. Anyone know the procedure for removing it from ZFS? I'd like to remove it from the equation.

Re: is my ZFS pool unrecoverable?

Posted: 19 Sep 2013 16:07
by RedneckBob
Had two 8G sticks for a total of 16G, so I pulled one stick and that didn't help. Replaced it with a single 4G stick of a different brand and that didn't help either.

Fortunately I'm still in test mode, so I'm going to nuke the configuration and the drives and restart from scratch. This time I'm going to leave out the SSD cache/log drive, as I really don't need it and I'm thinking it may be the source of my issues.

Re: is my ZFS pool unrecoverable?

Posted: 26 Sep 2013 23:26
by foo_bar
kkd wrote:
stangri wrote:What do you mean "nothing"? Nas4Free crashes trying to load or import my ZFS version 5 pool. That didn't happen on its own, did it?
ZFS pool information is damaged on ur hdds. Try google "zfs fschk".
but i don't think anyone can recover ur pool.

This post has convinced me not to choose Nas4Free for my next NAS, and not to support the nas4free project. It's one thing to simply say "I don't know", but to give completely useless advice to someone who just lost a ton of data is just mean.

Re: is my ZFS pool unrecoverable?

Posted: 18 Jan 2017 14:57
by nassie
I had a power failure recently and my NAS did not come up afterwards (I have a UPS, so I suppose it shut down properly). It was also in a boot loop, crashing after 'Load NOP GEOM class'. I saw so many negative posts online that I was afraid it would not be possible to get to my drive anymore. I managed to install the latest nas4free (11.0.0.4), and there I was able to import the zpool (I got a warning, though, that it might be in use). I just put the original USB stick in another PC to back up the configuration, then restored that in the new installation, and it seems to be working fine again. Maybe I am lucky, maybe it helps someone....