Server freezing, stuck process when using zfs send

Posted: 03 Nov 2015 15:38
by Bmillett
Good day,

I have been using NAS4Free on many systems for many years. Until recently, I had zero issues and found N4F to be very resilient and dependable. Over the past 3 months I have been upgrading older hardware for my clients. The following is the configuration:
Dell R710 Server, Dual XEON, 32-48GB RAM
Dell PERC 6/ir or IBM BR10i (both flashed to IT mode with LSI firmware)
6 SATA disks (some Western Digital RED, some Western Digital SE)
ZFS is set up as 3 mirrored pairs
3 Samsung SSDs on a card with a Marvell chipset (used for cache and log-mirror)
Booting on 64GB Flash

I use 2 storage servers at each client's location. These servers run ZFS storage that hosts several ESX 5.1 servers. In the evenings (and sometimes during the day) I use zxfer to copy or mirror the data between the 2 servers. On some sites, this is a fair amount of data. Presently, I have 4 clients with a pair of servers running 10.2.0.2 (1962). One site has 9.3.0.2 (1771). I discovered yesterday that I had not upgraded this site to v10.

I have had performance issues with the systems running 10.2.0.2. These systems will "hang" and have a stuck zfs send process that requires a reboot to fix. The running virtual machines (ESX) will sometimes continue to run, but often become unresponsive. This usually happens when I kick off a zxfer or zfs send between the 2 storage servers, maybe due to the heavy load. A few times, the same issue has appeared during normal daytime operation of the servers, usually after a few days of uptime, and also needed a restart. Presently, I have the automated "sync" between servers turned off until this issue can be resolved.

I have checked BIOS, memory configurations, RAID controller firmware, etc.
NAS4free was installed from scratch on most of these servers (10.2.0.2).
I have HP Procurve switches on a separate SAN network adapter for the virtual machines. Jumbo frames are enabled.
***** I discovered yesterday that the pair of servers NOT having these issues is the pair running 9.3.0.2.
After struggling with changing hardware, memory configurations, etc, I am led to believe that 10.2.0.2 is the culprit.

Questions:
1- Can I downgrade (fresh install) from 10.2.0.2 to 9.3.0.2?
Will my ZFS volume groups import on the v9 system ok?
I don't need to import the settings as I am very fast at setting up a N4F box at this point.
I just need my ZFS data to import, if this is the solution!
2- Is there something I am missing with 10.2.0.2 to cause this?
I have used the ZFSKernTune and played with several memory configurations.
I just upgraded the RAID controller from the PERC to the IBM on one pair of servers in hopes that would fix it. No joy.
I have nothing left to change and 10.2.0.2 is the only remaining difference between the systems with the issue and those without.

I have not had to seek assistance previously and have been in the IT systems business for 30+ years. I've been working with ZFS and ESX for over 5 years.
I am the type of person to always figure it out. I am frustrated that I cannot in this case.
I am hoping I can find a way to make 10.2.0.2 work as it is the newest, but if not, I hope I can downgrade to 9.3.0.2.

Your expert advice/opinions/guidance will be very much appreciated!!

Re: Server freezing, stuck process when using zfs send

Posted: 03 Nov 2015 18:41
by daoyama
Try these parameters only:

vfs.zfs.arc_max 16G
vm.kmem_size 24G

Add it from System|Advanced|loader.conf.
If you already use ZFSKernTune, remove all ZFS-related values in loader.conf before adding these.
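
For reference, the two parameters above would appear in loader.conf as vfs.zfs.arc_max="16G" and vm.kmem_size="24G". After a reboot, sysctl reports both in bytes, so a small conversion can help confirm that the values took effect. This is only a sketch; the helper name gib_to_bytes is made up for illustration.

```shell
# gib_to_bytes N: print N GiB expressed in bytes (hypothetical helper).
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 16   # 17179869184, the value to expect for vfs.zfs.arc_max
gib_to_bytes 24   # 25769803776, the value to expect for vm.kmem_size

# On the NAS itself, compare against the running values with:
#   sysctl -n vfs.zfs.arc_max
#   sysctl -n vm.kmem_size
```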

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 03:40
by Bmillett
Well, my hopes were very high as the system did its backup last night and was successful. I left the systems alone today and started the nightly backup a bit ago. Drat!
The zxfer script (zfs send/receive) process got stuck during a transfer. When the backup script was aborted, the zfs send process on the source server stayed in the process list and could not be killed. The only way to free the process is to restart the server. This means stopping several virtual machines.

If version 10 can't be made to work dependably, then I would want to downgrade to version 9, provided my ZFS pool would move across without issue. I can't imagine my setup is so different from a generic setup that nobody else is having these problems somewhere.

I appreciate your advice and ideas.
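
A process that survives kill is usually sitting in an uninterruptible wait inside the kernel (state D in ps). Before rebooting, it may be worth recording what the stuck zfs send is waiting on. The filter below is a minimal sketch that assumes the column order of the ps invocation shown in the comments; the function name stuck_procs is made up for illustration.

```shell
# stuck_procs: read `ps -axo pid,state,wchan,command` output on stdin and
# print only the rows whose state starts with D (uninterruptible wait).
# The header row (line 1) is skipped.
stuck_procs() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# On the affected server:
#   ps -axo pid,state,wchan,command | stuck_procs
# The wchan column names the kernel wait channel. For more detail,
# `procstat -kk <pid>` prints the kernel stack of that process.
```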

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 08:42
by Parkcomm
There are potentially a couple of features that might get in the way of a roll-back - if you did the zpool upgrade command.

http://open-zfs.org/wiki/Feature_Flags
https://www.freebsd.org/cgi/man.cgi?que ... ormat=html

embedded_data - "the contents of highly-compressible blocks are stored in the block "pointer" itself"
This feature becomes active automatically as soon as it is enabled, so it might prevent a rollback

large_block - "The large_block feature allows the record size on a dataset to be set larger than 128KB."
This will only be a problem if you set large block sizes

btw - I'm pretty sure that the open-zfs table is missing data, I believe the following features were also new in N4F V10
  • spacemap_histogram
  • enabled_txg
  • hole_birth
  • extensible_dataset
  • embedded_data
  • bookmarks
  • filesystem_limits
zpool get all will tell you which features you have active

zpool import will generate an error if you have features on the pool that are not supported, so I'd just do that and see what happens

"For each unsupported feature enabled on an imported pool a pool property named unsupported@feature_guid will indicate why the import was allowed despite the unsupported feature. "
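
To narrow the zpool get all output down to the flags that matter for a rollback, a filter along these lines could be used. It is a sketch that assumes the usual NAME PROPERTY VALUE SOURCE column layout; the function name active_features and the pool name tank are placeholders. Features that are merely "enabled" have not changed the on-disk format yet, so only "active" ones are listed.

```shell
# active_features: read `zpool get all <pool>` output on stdin and print
# the names of feature flags whose value is "active".
active_features() {
    awk '$2 ~ /^feature@/ && $3 == "active" {
        sub(/^feature@/, "", $2)   # strip the property prefix
        print $2
    }'
}

# On the server (tank is a placeholder pool name):
#   zpool get all tank | active_features
```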

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 14:00
by Bmillett
I will try this on a test system today.

Is there any information on why version 10 has the issues with my seemingly generic equipment?
Of course, it would be great if v10 would just work.
I have 14 servers that I would have to downgrade.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 14:50
by Parkcomm
Honestly I don't think it's the core OS - and it'll probably be one of those simple things you kick yourself for. But if I had 14 servers I'd downgrade a couple first and see if that solves the problem. (And someone has to be first if it's a new bug.)

If it's not the OS itself there should be a clue in here http://sourceforge.net/projects/nas4fre ... t/download or https://www.freebsd.org/releases/10.2R/relnotes.html (or associated docs)

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 14:55
by Bmillett
These are all fresh installs. No upgrades.
The strange thing is that 2 of the 14 are running version 9 on the exact same hardware without any issues.
This points to v10 as the culprit: either the OS itself, or some default that is different from v9 and is causing the problem.
It's extremely frustrating as some of these servers are in different states (locations in the country).

Thanks for the advice.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 15:32
by erico.bettoni
I'm also experiencing crashes after upgrading to 10.2.
I usually get out of memory errors. The system has 32GB of ECC RAM. I've set arc to 12G and kmem to 16G, and after a file copy the process accessing the file just dies.

Initially I thought it was specific to Samba, but I've been getting errors on SCP and FTP as well.

SCP error:

Code:

packet_write_poll: Connection to XXX.XXX.XXX.XXX: Cannot allocate memory
Fatal error: Lost connection with the server

At the moment of the crash:

Code:

CPU: 12.3% user,  0.0% nice,  1.4% system,  0.0% interrupt, 86.3% idle
Mem: 150M Active, 1058M Inact, 23G Wired, 5488K Cache, 108M Buf, 6973M Free
ARC: 12G Total, 6172M MFU, 4948M MRU, 272K Anon, 910M Header, 366M Other
Swap: 1024M Total, 1024M Free

The Samba process just crashes.
The FTP process crashes with an out of memory error too. I will try to get the exact message.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 15:37
by erico.bettoni
One additional piece of information: so far the crashes only happen when reading files. I've yet to see a crash when writing large files.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 17:15
by erico.bettoni
The error I get from the FTP is this:

Client:

Code:

451 Transfer aborted. Insufficient memory or file locked

Server:

Code:

notice: user XXXXXXXX: aborting transfer: Insufficient memory or file locked

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 18:28
by erico.bettoni
sysctl -a | grep mem

Code:

kern.ipc.maxmbufmem: 7516192768
device  mem
vm.lowmem_period: 10
vm.kmem_map_free: 2879500288
vm.kmem_map_size: 12152885248
vm.kmem_size_scale: 1
vm.kmem_size_max: 1319413950874
vm.kmem_size_min: 0
vm.kmem_zmax: 65536
vm.kmem_size: 15032385536
vfs.ufs.dirhash_lowmemcount: 0
vfs.ufs.dirhash_mem: 36723
vfs.ufs.dirhash_maxmem: 27111424
vfs.tmpfs.memory_reserved: 4194304
net.link.bridge.pfil_member: 1
hw.physmem: 34273939456
hw.usermem: 10216255488
hw.realmem: 34359738368
hw.pci.host_mem_start: 2147483648
hw.cbb.start_memory: 2281701376
p1003_1b.memlock: 0
p1003_1b.memlock_range: 0
p1003_1b.memory_protection: 0
p1003_1b.shared_memory_objects: 200112
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 80
dev.xen.balloon.high_mem: 0
dev.xen.balloon.low_mem: 0
compat.ia32.maxvmem: 0

sysctl -a | grep vfs.zfs.arc

Code:

vfs.zfs.arc_meta_limit: 3221225472
vfs.zfs.arc_free_target: 56540
vfs.zfs.arc_shrink_shift: 5
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_min: 12884901888
vfs.zfs.arc_max: 12884901888
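
The byte counts above are easier to compare with the configured sizes once converted to GiB. A minimal sketch (the helper name to_gib is made up for illustration):

```shell
# to_gib BYTES: print BYTES as whole GiB (integer division).
to_gib() {
    echo $(( $1 / 1073741824 ))
}

to_gib 12884901888   # 12, vfs.zfs.arc_max (and arc_min, so the ARC size is pinned)
to_gib 15032385536   # 14, vm.kmem_size
```

By this reading, the running vm.kmem_size is 14G rather than the 16G mentioned earlier in the thread. Whether that difference matters for the crashes is not established here.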

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 18:54
by daoyama
erico.bettoni wrote:I'm also experiencing crashes after upgrade to 10.2.
I usually get out of memory errors. The system has 32GB of ECC RAM. I've set arc to 12G, kmem to 16G and after a file copy the process accessing the file just dies.

Probably, your manual tuning is BAD. First check without any modification. I feel kmem 16G is too small.
If you want to keep memory for other use, you should use vm.kmem_size_scale=2 without other values.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 19:51
by erico.bettoni
daoyama wrote:
erico.bettoni wrote:I'm also experiencing crashes after upgrade to 10.2.
I usually get out of memory errors. The system has 32GB of ECC RAM. I've set arc to 12G, kmem to 16G and after a file copy the process accessing the file just dies.
Probably, your manual tuning is BAD. First check without any modification. I feel kmem 16G is too small.
If you want to keep memory for other use, you should use vm.kmem_size_scale=2 without other values.

I have tried other values. What do you consider good values for 32GB?
I used ZFSKernTune and tried with 8, 12, 16, 24 and 32, all leading to the same issue.

I also run virtualbox on the same machine, but the error happens even without any VM running...

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 20:20
by erico.bettoni
Other related thread:
viewtopic.php?t=9633

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 20:37
by erico.bettoni
Also, how do I remove every tuned value ZFSKernTune has added?

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 21:16
by daoyama
erico.bettoni wrote:Also, how do I remove every tuned value ZFSKernTune has added?

If you use the embedded version, you only need to upgrade to the same version from System|Firmware.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 22:16
by Bmillett
I will try the upgrade and clearing out of all the variables.
I am planning to reinstall all my machines with 9.3.
It will be very difficult as I have to move 3-4TB on each pair of machines.

I hope this thread helps those that may be considering v10 right now.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 22:22
by Parkcomm
You should have a look at this https://forums.freebsd.org/threads/memo ... d-10.41880


Sent from my foam - stupid auto correct.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 22:58
by Bmillett
That article doesn't give (at least from my reading) any clear fix. Even the last post shows 10.2 is having the same issues for others.
Maybe FreeBSD is not the way to go since this issue can't be identified and fixed.
We were previously using Nexentastor and it NEVER went down. 3+ years and almost no reboots.
I like NAS4Free, but if I can't maintain something stable in a process as simple as this (NFS server), then it is not for production.
I am going to try a 9.3 reinstall on my largest pair of servers this weekend.
Nexenta can't be installed on a flash drive. And it's not technically free.
Fingers crossed......

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 23:19
by erico.bettoni
Please keep us posted.
I will try to reset the values by upgrading the firmware to the same version.

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 23:34
by Parkcomm
Bmillett - That's why I raised it - I said earlier I didn't know of any stability issues.

Thought you should know, even though the reported bug was resolved in 10.2. One user does report still seeing it, but I'm not clear he's seeing the same error, and there is no open bug in 10.2 for this issue.

Do you have a lab setup you can work with or only the live systems? If you have a lab setup I'd:
  • Set it up as per the field installs
  • Thrash it with iozone (or similar) test tool
  • Remove tunables - thrash it again
  • Set up a simple vfs.zfs.arc_max tunable - thrash it again
  • Remove L2ARC - thrash it again
  • Install 9.3 - thrash it again
And see what causes the failure modes

Re: Server freezing, stuck process when using zfs send

Posted: 05 Nov 2015 23:49
by Bmillett
I do have a single machine that is near my office that is not production yet.
I am going to try 9.3 on it first. Then I'll go back to 10 and try the rest.

Will report back.

Thanks!!!

Re: Server freezing, stuck process when using zfs send

Posted: 06 Nov 2015 01:10
by Parkcomm
Can you provide the output of the following as well?

Code:

top -Sosize
sysctl -a | grep kmem
sysctl -a | grep zfs.arc

If you look at Erico's posts you can see that the ARC is maxed out, and how much memory is free, at the time he gets the memory error. What could be happening is that other processes in conjunction with ZFS are exhausting the kernel heap.

So daoyama's advice to Erico makes sense.

Re: Server freezing, stuck process when using zfs send

Posted: 06 Nov 2015 15:07
by erico.bettoni
Well, crap!

Just reset everything and got the same errors.

Then I tried limiting the ARC to half my RAM, and got the same error.

This is the memory status at the exact moment of the error:

Code:

Mem: 484M Active, 434M Inact, 27G Wired, 98M Buf, 3573M Free
ARC: 16G Total, 2143M MFU, 13G MRU, 33M Anon, 144M Header, 294M Other
Swap: 1024M Total, 1024M Free

And the sysctls:

Code:

vm.kmem_map_free: 21762248704
vm.kmem_map_size: 11616325632
vm.kmem_size_scale: 1
vm.kmem_size_max: 1319413950874
vm.kmem_size_min: 0
vm.kmem_zmax: 65536
vm.kmem_size: 33378574336
vfs.zfs.arc_meta_limit: 4294967296
vfs.zfs.arc_free_target: 56540
vfs.zfs.arc_shrink_shift: 5
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_min: 8589934592
vfs.zfs.arc_max: 17179869184

The shitty part is that I cannot go back to 9.3 because my pool is upgraded.

Re: Server freezing, stuck process when using zfs send

Posted: 06 Nov 2015 15:23
by erico.bettoni
daoyama-san, any insight?

Those values don't add up, right?
I should not be getting memory errors, right?
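
One way to read the two sysctl dumps posted in this thread: in both, vm.kmem_map_size plus vm.kmem_map_free equals vm.kmem_size, so the share of the kernel map in use can be worked out directly. The sketch below only does that arithmetic on the posted numbers; the helper name kmem_used_pct is made up for illustration, and it says nothing about where the allocation failure actually comes from.

```shell
# kmem_used_pct MAP_SIZE KMEM_SIZE: percentage of the kernel map in use
# (integer division, rounds down).
kmem_used_pct() {
    echo $(( $1 * 100 / $2 ))
}

# First dump (kmem limited to about 14G):
kmem_used_pct 12152885248 15032385536   # 80
# Second dump (tunables reset, ARC limited to 16G):
kmem_used_pct 11616325632 33378574336   # 34
```

So at the time of the second error only about a third of the kernel map was in use, which fits the impression that the numbers alone do not explain the failure.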

Re: Server freezing, stuck process when using zfs send

Posted: 06 Nov 2015 23:47
by Bmillett
I have 2 servers at each of my clients' locations. They mirror each other each night (and some at noon) using zfs send/receive.
It is this copy process that causes the machines to "break" quickly.
Because I have 2 machines, I will move everything to one, format the 2nd. Move everything to the 2nd. Format the 1st and then get the syncing back up again with both on 9.3.
Very disappointing that this is happening. It's comforting to see that I'm not the only one. My hardware setup is as simple as possible so it "runs forever".
I hope 9.3 will just run and run. Like a good unix should!!

Re: Server freezing, stuck process when using zfs send

Posted: 07 Nov 2015 00:17
by daoyama
Probably I cannot help. I'm using only 16GB, with vfs.zfs.prefetch_disable=1 and mirrored ZIL and L2ARC on SSD.

Re: Server freezing, stuck process when using zfs send

Posted: 07 Nov 2015 03:20
by erico.bettoni
I have two SSDs also, one for L2ARC and one for ZIL. I've tried removing both from the pool. Same error. :/

Re: Server freezing, stuck process when using zfs send

Posted: 08 Nov 2015 14:39
by daoyama
erico.bettoni wrote:I have two ssd also, one for l2arc and one for Zil. I've tried removing both from the pool. Same error. :/

Did you check with vfs.zfs.prefetch_disable=1?

Re: Server freezing, stuck process when using zfs send

Posted: 08 Nov 2015 22:48
by erico.bettoni
daoyama wrote:
erico.bettoni wrote:I have two ssd also, one for l2arc and one for Zil. I've tried removing both from the pool. Same error. :/
Did you check with vfs.zfs.prefetch_disable=1?

Yes, just did that. No change, same error. This is really weird.