an odd corruption
Posted: 27 Jun 2014 03:34
Firstly, this is using straight FreeBSD 9.2, not NAS4Free 9.2. The hardware is a Xeon with ECC RAM. The pool is a single mirror vdev plus a single SSD with an L2ARC partition. All three are layered over GELI, with the two mirror providers set to 4096-byte sectors in GELI to match the AF 4K SATA disks. One half of the mirror is on the Intel AHCI controller, the other half on a 3Ware controller.
I had a kernel panic caused by (I think, since I was able to reproduce it) a buggy userspace FUSE module, and when the system came back up, I had one file with permanent errors. What makes it mysterious is that this file, about 33GB, had been created earlier that day and had last been accessed only an hour before the panic. No read, write, or checksum errors were shown on the pool, and the file itself seemed completely fine, with no errors on a full read. Was this file's metadata damaged by the kernel panic? If so, how? Was the file damaged earlier in the day? If so, how? Where and how did ZFS determine this file was damaged?
I haven't destroyed the pool (and probably won't, although it is backed up), but I tried removing the file, rolling back to an earlier snapshot of the dataset, and finally destroying the dataset entirely, yet zpool status -v pool still shows the hexadecimal remnant of this permanently damaged file.
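For what it's worth, once the affected file (or its dataset) is gone, the entry in the persistent error log degrades to a hex <dataset>:<object> pair, and ZFS only drops it after the log rotates. A sketch of the usual cleanup sequence, assuming a pool named "tank" (substitute your pool name):

```shell
# After deleting the affected file/dataset, the status entry becomes a
# hexadecimal <dataset>:<object> pair in the persistent error log.

zpool clear tank       # reset error counters on all vdevs
zpool scrub tank       # a completed scrub rotates the error log

# ZFS keeps error lists for both the last and the current scrub, so a
# second scrub is sometimes needed before the entry disappears.
zpool status -v tank   # verify the "Permanent errors" section is gone
```

This doesn't answer the underlying question of how the damage happened, but it should at least remove the stale hex entry from zpool status -v.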
My concern is that there is an underlying cause that I need to find. The only explanation I can come up with is that, somehow, prior to the kernel panic, ZFS data structures in RAM were damaged and made it to disk, and then the kernel panicked. But this really isn't very convincing, as there was essentially no write load at the time, and especially not to that file or even the dataset it was in.
We've seen others report pool metadata corruption that prevents import, which usually comes down to lack of ECC RAM or similar causes. Could something else be happening here? A kernel panic, by itself, shouldn't be able to cause pool corruption.