Unstable system crashes performing zfs send/receive
Posted: 27 Jul 2014 07:01
I've got a NAS4Free system that is running really well EXCEPT that it crashes. I have swapped out hardware and tried to narrow down the issue without much (any) success.
I have found that I can reliably crash NAS4Free after ~15 minutes by making a copy of one pool to another:

Code:
# zfs snapshot -r tank@01
# zfs send -R tank@01 | zfs receive -Fdvu
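To narrow down which half of the pipeline triggers it, one experiment I can try (just a sketch - "newpool" and the scratch path below are placeholders, not my actual names) is to run the send and receive halves separately:

```shell
# Exercise only the read/send path: discard the stream entirely.
# If this alone crashes the box after ~15 minutes, the receive/write
# side is off the hook.
zfs send -R tank@01 > /dev/null

# Exercise the receive path from a file instead of a live pipe
# (needs scratch space large enough to hold the whole stream).
zfs send -R tank@01 > /mnt/scratch/tank01.zstream
zfs receive -Fdvu newpool < /mnt/scratch/tank01.zstream
```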
I had top running through ssh and the last update prior to the crash was:

Code:
last pid: 5289; load averages: 2.06, 1.60, 1.93 up 0+00:34:03 16:15:48
33 processes: 1 running, 32 sleeping
CPU: 0.0% user, 0.0% nice, 25.1% system, 0.0% interrupt, 74.9% idle
Mem: 442M Active, 32M Inact, 22G Wired, 12M Buf, 8151M Free
ARC: 20G Total, 586M MFU, 18G MRU, 830M Anon, 529M Header, 8631K Other
Swap: 64G Total, 64G Free
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
5105 root 1 52 0 37624K 3280K nokva 3 1:14 21.88% zfs
5104 root 1 29 0 37624K 3332K pipewr 1 0:29 4.59% zfs
4894 root 1 20 0 59672K 15828K select 1 0:07 0.29% bsnmpd
4287 root 309 20 0 1438M 423M uwait 3 0:10 0.00% istgt
5112 root 1 20 0 16596K 2488K CPU2 2 0:01 0.00% top
4354 root 1 20 0 12084K 1632K select 2 0:01 0.00% powerd
4317 root 1 20 0 54868K 6240K select 3 0:00 0.00% snmp-ups
4950 greg 1 20 0 56272K 5096K select 3 0:00 0.00% sshd
3877 root 1 20 0 12112K 1840K select 3 0:00 0.00% syslogd
4319 root 1 20 0 20392K 3692K select 2 0:00 0.00% upsd
3634 root 1 20 0 6280K 740K select 0 0:00 0.00% devd
4352 root 1 20 0 20404K 3828K nanslp 0 0:00 0.00% upsmon
4953 root 1 20 0 14508K 3564K pause 2 0:00 0.00% csh
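One thing that stands out to me in that last top snapshot: the sending zfs process is stuck in the "nokva" state (waiting on kernel virtual address space, as I understand it) with 22G wired and a 20G ARC on a 32G machine. On the theory that it's kernel memory pressure rather than bad hardware, another experiment is to cap the ARC in /boot/loader.conf (the 16G value is just a guess for 32G of RAM):

```shell
# /boot/loader.conf - cap the ZFS ARC so the rest of the kernel and
# the send/receive buffers have some headroom (requires a reboot)
vfs.zfs.arc_max="16G"
```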
The hardware is:
- Xeon E5-2609
- 32G ECC RAM (currently trying another 32G of non-ECC RAM in a single-socket desktop motherboard - didn't help)
- LSI 2008 controllers (P19 IT firmware)
- Intel X540 NICs (currently trying i350s - didn't help)
- two disks in a plain mirror for swap (doesn't seem to help)
I am using:
- IPv4 and IPv6
- 9k MTU
- lagg
- VLANs
- iSCSI and NFS (no CIFS)
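For completeness, the network side is configured along these lines in /etc/rc.conf (interface names, VLAN tag, and address below are illustrative stand-ins, not my exact values):

```shell
# /etc/rc.conf (sketch) - LACP lagg over the two 10G ports with
# jumbo frames, and a tagged VLAN riding on top of the lagg
ifconfig_ix0="up mtu 9000"
ifconfig_ix1="up mtu 9000"
cloned_interfaces="lagg0 vlan100"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1"
ifconfig_vlan100="inet 192.168.100.2 netmask 255.255.255.0 vlan 100 vlandev lagg0"
```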
As an experiment I tried the same zfs send/receive on FreeNAS. It looked to be going really well (it lasted well past the 15 minutes), but it crashed with a kernel problem when I added a VLAN interface (it was going so well that I thought I'd add some networking - bad idea).
I found this specific way to crash it because the pool has a vdev of 8 disks (a bad idea) and I want to migrate it onto a new pool with a raidz2 vdev of 6 disks. Is that somehow causing an issue?
How can I capture the cause of the crash, please? Without understanding what is crashing I'm not making any progress on this. When it crashes I don't see anything.
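In case it helps answer my own question, here's what I've enabled so far to try to actually catch a panic (the standard FreeBSD crash-dump machinery, as far as I understand it). One caveat: since my swap is on a mirror, I may need to point dumpdev at a single raw swap partition instead, because dumping to a mirror device doesn't work as far as I know.

```shell
# /etc/rc.conf - have the kernel write a crash dump to the dump
# device on panic; savecore(8) recovers it into /var/crash at the
# next boot
dumpdev="AUTO"
dumpdir="/var/crash"

# After the next crash and reboot, summarize the recovered core
# (panic message, backtrace) with:
#   crashinfo /var/crash/vmcore.0
```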