ZFS send / receive - system reboots
Posted: 31 May 2013 15:05
by mjackson
So I have two systems, each with two 6G LSI IT-mode controllers, and each controller is attached to a 45-bay JBOD (one channel to 21 drives in the back, the other channel to 24 drives in the front). I have 2 SLOG SSDs, 2 cache SSDs, and the rest of the bays have Hitachi 3TB SAS drives attached. ZFS config is 4-drive raidz2 vdevs.
The problem is that when I start to push datasets to these machines with zfs send/receive (data is coming from opensolaris machines), they reboot. It could be hours, it could be days, but they'll eventually reboot. I've run the zfs tune and set it to 32GB of memory (the boxes each have 48gb of memory, thought i might have exhausted it, so i turned zfs tune back to 32 from 48).
Additionally, after they reboot, the console is stuck at 'Load NOP GEOM class' for a while. By a while, i mean hours. It'll finally get past that, and come right up.
Any ideas on how to capture the crash? I'm running embedded off the USB stick... if I can't figure it out, i'm going to have to do a full load of FreeBSD 9.1 on it to get any further.
It sounds like this issue:
viewtopic.php?f=15&t=3981
Which sounds like this issue:
viewtopic.php?f=15&t=2861
Re: ZFS send / receive - system reboots
Posted: 02 Jun 2013 06:33
by Lee Sharp
So, you have 45 drives, right? How big? Most of the errors I am seeing with this are on systems that do not have 1 GB of RAM per TB of disk space. If you do, that is a big clue!
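To put rough numbers on that rule of thumb, here is a minimal sketch in Python. The drive count (86 spindles), drive size (3TB) and memory (48GB) are taken from the figures given elsewhere in this thread; the function name and the raw-capacity basis are assumptions for illustration only, and the rule itself is a guideline rather than a hard requirement.

```python
def ram_rule_check(drive_count, drive_tb, ram_gb, gb_per_tb=1.0):
    """Apply the '1 GB of RAM per TB of disk' rule of thumb to raw capacity.

    Returns (raw_tb, needed_gb, ok). All names here are hypothetical helpers.
    """
    raw_tb = drive_count * drive_tb      # raw capacity, before raidz2 parity
    needed_gb = raw_tb * gb_per_tb       # RAM suggested by the rule of thumb
    return raw_tb, needed_gb, ram_gb >= needed_gb


# Figures from the thread: 86 data drives of 3TB each, 48GB of memory.
raw_tb, needed_gb, ok = ram_rule_check(drive_count=86, drive_tb=3, ram_gb=48)
print(f"raw: {raw_tb} TB, suggested RAM: {needed_gb:.0f} GB, satisfied: {ok}")
```

By this measure the large systems fall well short of the guideline, whichever capacity figure (raw or usable) is plugged in.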
Re: ZFS send / receive - system reboots
Posted: 02 Jun 2013 09:17
by raulfg3
mjackson wrote:The problem is that when I start to push datasets to these machines with zfs send/receive (data is coming from opensolaris machines), they reboot. It could be hours, it could be days, but they'll eventually reboot.
Please post the ZFS version on the OpenSolaris machines. ZFS on N4F is version 28:
https://blogs.oracle.com/stw/entry/zfs_ ... ile_system
Re: ZFS send / receive - system reboots
Posted: 03 Jun 2013 18:08
by mjackson
ZFS version on the source machines is version 28.
Re: ZFS send / receive - system reboots
Posted: 04 Jun 2013 14:14
by mjackson
Hmmm, I just had a third system reboot, but this time it was a nas4free source system doing a zfs send / recv to a FreeBSD 9.1-P3 target system. The N4F box rebooted, but was back up in about 3-4 minutes.
I just added a 16gb swap zvol to the source n4f system, we'll see if that helps. It was 250GB into a 255GB zfs send/recv when it faulted.
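For anyone wanting to try the same thing, a minimal sketch of the commands involved, written as a Python helper that only prints them. The pool name `tank` and the volume name `swap` are assumptions, not the names used on these systems, and the `/dev/zvol/...` device path is the FreeBSD convention.

```python
import shlex


def swap_zvol_commands(pool, size="16G", name="swap"):
    """Build (but do not run) the commands to create a zvol and enable it as swap."""
    volume = f"{pool}/{name}"
    return [
        ["zfs", "create", "-V", size, volume],   # create a fixed-size volume
        ["swapon", f"/dev/zvol/{volume}"],       # enable it as a swap device
    ]


# "tank" is a placeholder pool name; review the output before running anything.
for cmd in swap_zvol_commands("tank"):
    print(shlex.join(cmd))
```

Swap added this way does not persist across a reboot on its own, so it would need to be re-enabled after each restart.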
Re: ZFS send / receive - system reboots
Posted: 04 Jun 2013 14:47
by raulfg3
ZFS needs some tuning to run stably.
Have you installed Zfskerntune on your system?
viewtopic.php?f=71&t=1278&p=19924
Re: ZFS send / receive - system reboots
Posted: 04 Jun 2013 16:53
by mjackson
Yes, i've run the not-included-with-nas4free zfstuner, and set it for both the real memory of the machine (48gb)(faulted), and one step below the real memory of the machine (32gb)(faulted again).
I actually have 90 drives attached to each of the two original systems that were faulting; 4 SSDs, and 86 spindles, but I've loaded FreeBSD 9.1-P3 on both of those systems, and they seem to be stable now. My smaller 24 drive n4f system is the one that rebooted on me late last night. I added 16gb of zvol swap this morning and restarted the zfs send/recv, and it's moved 533G so far without a fault. Maybe it just needed some swap to work with. I hope so.
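For reference, a small sketch of what a 32GB memory limit looks like when written out as a loader tunable. This assumes the tuner's setting ends up as the FreeBSD `vfs.zfs.arc_max` value in bytes; the tuner also sets other values that are not shown here, and the helper name is hypothetical.

```python
GIB = 1024 ** 3  # bytes in one GiB


def arc_max_line(limit_gib):
    """Format a loader.conf style line that caps the ARC at limit_gib GiB."""
    return f'vfs.zfs.arc_max="{limit_gib * GIB}"'


# The 32GB setting tried on the 48GB machines.
print(arc_max_line(32))
```

A cap equal to the machine's physical memory leaves no headroom for the rest of the system, which is one reason to set it a step lower.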
Re: ZFS send / receive - system reboots
Posted: 05 Jun 2013 14:26
by mjackson
No reboots so far, and it's been replicating since my post yesterday (5.04T so far)
Re: ZFS send / receive - system reboots
Posted: 05 Jun 2013 19:58
by raulfg3
Please post a summary to help others in the same situation.
It's not clear to me what you did to get a stable system.
Re: ZFS send / receive - system reboots
Posted: 06 Jun 2013 03:22
by mjackson
I added a swap file to the system. There are plenty of references on how to do that. I don't know if n4f + zfs requires it, but if you're replicating with zfs send/recv, it might be necessary. I had 3 systems all rebooting. Two of them I switched from n4f to a straight FreeBSD install with geom mirrored boot, root, and swap, which stabilized them, and on the other one, which is still running n4f, I added a swap file. I've got about 106TB usable on each of the larger systems (4 drive raidz2 vdevs), and 20TB usable (2 drive zfs mirrors) on the smaller system.
Re: ZFS send / receive - system reboots
Posted: 06 Jun 2013 21:35
by mjackson
No luck, the smaller system running n4f embedded faulted twice this morning, even with the swap file. It definitely seems to be related to the zfs send/recv workload.
Re: ZFS send / receive - system reboots
Posted: 09 Jul 2013 16:16
by mjackson
I may have found a workaround to the problem. I believe that the ARC and the vnode cache are fighting for the same memory, and the system makes a large request for memory faster than the ARC will give it up. I've set primarycache and secondarycache to metadata, and it seems to have helped. I'm going to try to restart replication on my servers that were having the reboot issue and see how it performs.
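A minimal sketch of that workaround, again as a Python helper that only prints the commands. The dataset name `tank` is a placeholder; setting the properties on the pool's top-level dataset lets child datasets inherit them unless they override the value locally.

```python
import shlex

CACHE_PROPERTIES = ("primarycache", "secondarycache")
VALID_VALUES = ("all", "none", "metadata")


def cache_commands(dataset, value="metadata"):
    """Build (but do not run) the 'zfs set' commands for both cache properties."""
    if value not in VALID_VALUES:
        raise ValueError(f"value must be one of {VALID_VALUES}")
    return [["zfs", "set", f"{prop}={value}", dataset] for prop in CACHE_PROPERTIES]


# "tank" is a placeholder dataset name; review the output before running anything.
for cmd in cache_commands("tank"):
    print(shlex.join(cmd))
```

With both properties set to metadata, file data is no longer kept in the ARC or L2ARC, so expect read performance to drop in exchange for the lower memory pressure.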