
ZFS pool instability

Posted: 29 Sep 2013 15:31
by nicks88
Hi all,

Initially, when I opted to use ZFS on my NAS (was running FreeNAS, now N4F) around 3 years ago, there was not much information readily available on what processor/RAM specifications were required. Since then (having read many articles) I have accepted the poor performance (~25MB/s read/write) and the instability (drives occasionally drop out of the pool). Now I'm getting increasingly frustrated with the instability, so I wanted to get the forum's opinion on the following:

1) Is the instability definitely due to my weak setup? If so, what further steps can I take to tune it?
2) Should I consider moving to a software RAID instead? My data is not vastly important (music and ripped Blu-rays) so silent data corruption (bit rot) of a few files won't be the end of the world.

For 1) my setup is as follows:

N4F Version: 9.1.0.1 - Sandstorm (revision 636)
Platform: x64-embedded on AMD Athlon(tm) 64 X2 Dual Core Processor 5000+
RAM: 2GB DDR 800 (N4F reports 70-85% of 1674 MiB constantly in use when operating)
Zpool: 4 x 2TB Samsung HD203WI in RAIDZ1-0
ZFS kernel tune settings:
2GB available memory
Prefetch disabled
txg.timeout 5
vdev.max_pending 10
vdev.min_pending 4
write_limit_override 0
no_write_throttle 0

When I speak of instability, I mean that a drive (not the same one each time) is reported as having dropped out of the pool. I then have to remove the drive, install it in another system, format it, and reinsert it so that resilvering can take place (bringing the pool back to a 'healthy' state). Here's what it looks like degraded (incidentally, I have replaced all SATA cables):


  pool: media
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 211M in 0h13m with 0 errors on Fri Sep 27 12:21:25 2013
config:

        NAME                     STATE     READ WRITE CKSUM
        media                    DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            2840289362076943308  REMOVED      0     0     0  was /dev/ada1
            ada2                 ONLINE       0     0     0
            ada0                 ONLINE       0     0     0
            ada3                 ONLINE       0     0     0
It happens sporadically, sometimes every few weeks, sometimes every few days. Clearly not right.
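To pin down exactly when a drive drops out, the status output above could be checked automatically. The following is only a minimal sketch in Python: the function name `unhealthy_devices` and the `SAMPLE` text (abbreviated from the output above) are illustrative, and the parsing assumes the column layout shown in this post.

```python
# Device states that `zpool status` can report in the STATE column.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}

# Abbreviated copy of the status output from this post, used as test input.
SAMPLE = """\
  pool: media
 state: DEGRADED
config:

        NAME                     STATE     READ WRITE CKSUM
        media                    DEGRADED     0     0     0
          raidz1-0               DEGRADED     0     0     0
            2840289362076943308  REMOVED      0     0     0  was /dev/ada1
            ada2                 ONLINE       0     0     0
            ada0                 ONLINE       0     0     0
            ada3                 ONLINE       0     0     0
"""


def unhealthy_devices(status_text):
    """Return (name, state) pairs for config rows whose state is not ONLINE."""
    found = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Skip the header row and anything that is not "<name> <state> ...".
        if len(fields) < 2 or fields[1] not in STATES:
            continue
        if fields[1] != "ONLINE":
            found.append((fields[0], fields[1]))
    return found


for name, state in unhealthy_devices(SAMPLE):
    print(name, state)
```

Run periodically against the real `zpool status media` output (for example captured with `subprocess.run`), with a timestamp logged whenever the list is non-empty, this would give a time to compare against the system logs.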

For 2) Would a software RAID5 solution perform better and more reliably on my existing setup? As I say, I'm not too fussed about bit rot (I have a backup on 2 x 3TB disks). I don't particularly want to spend additional money on a new server (the current mainboard only supports 2GB).


Sorry for the long post - any advice or questions appreciated.

Thanks,

Re: ZFS pool instability

Posted: 29 Sep 2013 21:19
by Buhu
That's not normal! What's your complete hardware list? I suspect bad hardware.

Re: ZFS pool instability

Posted: 30 Sep 2013 07:48
by b0ssman
If prefetch is disabled and you have 80% memory use, there is something wrong.
What other processes are you running?

Also look into the dmesg output when a drive disconnects, as it might be the AMD SATA controllers that are giving you the problem.
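
A rough way to narrow down the dmesg output is to keep only lines that mention one of the disks together with a suspicious word. This is a sketch in Python, not a definitive tool: `suspicious_lines` and the `KEYWORDS` tuple are made up for illustration, and the keywords are guesses that should be adjusted to whatever your kernel actually prints.

```python
import re

# Assumption: the disks appear as adaN, as in the zpool status output earlier.
DEVICE_RE = re.compile(r"\bada\d+\b")

# Assumption: illustrative keywords only, not an official list of messages.
KEYWORDS = ("lost device", "detached", "timeout", "error")


def suspicious_lines(dmesg_text, keywords=KEYWORDS):
    """Return lines that name an adaN disk and contain one of the keywords."""
    hits = []
    for line in dmesg_text.splitlines():
        lower = line.lower()
        if DEVICE_RE.search(line) and any(k in lower for k in keywords):
            hits.append(line)
    return hits
```

Feeding it the saved output of `dmesg` right after a drive drops should leave only a handful of lines to read, which makes it easier to see whether the controller or a single disk is named each time.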

Re: ZFS pool instability

Posted: 30 Sep 2013 19:24
by alexplatform
Why are you concluding that this is a ZFS-related issue? It sounds to me like a hardware problem: perhaps a failing drive, bad cabling, etc. As suggested in a previous post, the answer is very likely in your logs.

Re: ZFS pool instability

Posted: 08 Oct 2013 15:35
by nicks88
Hi all,

Thanks for your replies and for confirming that my setup was not healthy given the specification.

I have now replaced the motherboard and RAM with the exact same model (I had an identical machine used for other purposes). The system appears to be healthier: no drives have dropped out yet after a couple of scrubs. Performance has improved slightly as well. RAM utilisation remains the same (I am also running SAB, CP, HP and SB).

Hence, I can only imagine that the SATA controller on the motherboard, or the RAM, was the issue.

Thanks again.