
send heavy tcp stream temporarily freezes all networking

harryc
Starter
Posts: 25
Joined: 08 Nov 2012 22:12
Status: Offline

send heavy tcp stream temporarily freezes all networking

#1

Post by harryc »

sys1: ~# zfs send verybig@snapshot | ssh nas4freeclone "cat > /dev/null"

This sends about 60 MB/s over a 1 Gb/s link between two identical nas4free systems (S5000PSL, 20GB, 6-drive SATA raidz2)... until, at some random time, the sending system just stops sending. It resumes right where it left off, for a while, if a third computer browses to the nas4free web interface. It also resumes without a hitch if an "ifconfig lo0" command is given (yes, lo0), but only via an ssh session; the console doesn't echo characters until sending has otherwise resumed. The problem happens using the em driver. In frustration I installed an re nic board, and exactly the same thing happened.

Of interest: on sys1, in an ssh session from a third system, 'echo hi' does not restart the sending (even though it does generate tcp traffic). Also, "ping 127.0.0.1" will... just stop, anywhere from about seq=34 to perhaps 90. ^C breaks out to a prompt (but traffic doesn't resume). Giving the ping 127.0.0.1 command again works, and also restarts the apparently suspended zfs snapshot send.

Go figure! I almost need a never-ending batch job that just runs "ifconfig lo0; sleep 5" to resume the frozen re0 or em0 traffic. (That didn't work: the sleep never returned. But changing it to ifconfig lo0 > /dev/null in an infinite loop did automatically resume locked sessions, at the price of 100% cpu usage on one core...)
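
The busy-loop workaround described above could be sketched roughly as follows. The function name poke_loop and its COUNT argument are hypothetical additions for illustration; the original was a plain infinite loop around ifconfig lo0, and it deliberately contains no sleep because sleep itself hung during the freezes.

```shell
#!/bin/sh
# poke_loop COUNT CMD [ARGS...]   (hypothetical helper, not from the thread)
# Runs CMD over and over with its output discarded and no sleep in between.
# COUNT=0 loops forever, like the original workaround; a positive COUNT
# stops after that many runs so the function can be tried safely.
poke_loop() {
    count=$1
    shift
    i=0
    while :; do
        "$@" > /dev/null 2>&1 || true   # ignore failures of the poked command
        [ "$count" -gt 0 ] || continue  # COUNT=0: never stop
        i=$((i + 1))
        [ "$i" -lt "$count" ] || break
    done
}
```

On the affected system the equivalent of the original loop would be "poke_loop 0 ifconfig lo0", which pins one core at 100% exactly as described.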

This happens whether or not the nas4free "Enable tuning of some kernel variables" option is set, whether or not powerd is set, whether or not I/OAT is enabled in the BIOS, whether or not C1 sleep is enabled, and regardless of any other BIOS settings change. The console terminal doesn't echo characters during the lockups, but the numlock / shift lock keys work. The UPS via serial port complains of communications being lost.

But give the command 'ifconfig lo0' (or any 'ifconfig xx' command) and it all just picks up right where it left off, as if nothing happened. No errors in the logs (other than the note of the interrupted UPS communications). Don't know if it matters, but there is no swap on the system; it boots from usb, the embedded version.

Really, really frustrating. Extensive net searching turns up variations of this bug in freebsd 9.2; folks go back to v8 if their ZFS format hasn't been upgraded, and the problem goes away. I haven't been able to find any other suggestions or workarounds. Of course, this bug completely kills the ability of nas4free to be an iscsi target.

Any ideas?
Last edited by harryc on 13 Jun 2014 07:25, edited 1 time in total.


Re: send heavy tcp stream temporarily freezes all networking

#2

Post by harryc »

PS: when the sending stops, a 'sleep 10' command given in ssh sleeps forever, or until ^C. ^C brings a prompt back in an ssh terminal, and the ifconfig lo0 command wakes up the frozen sending process. So whatever it is that sleep depends upon stops ticking as well. This agrees with the many posts that speak of nas4free and freenas 'losing time': being minutes or hours off for unexplained reasons until the next ntp update.


Re: send heavy tcp stream temporarily freezes all networking

#3

Post by harryc »

Update: A sysctl command does the same thing... wakes up the hung thread. I'm starting to think this is an issue about timers.


SOLVED: send heavy tcp stream temporarily freezes all networking

#4

Post by harryc »

Well, it looks like the problem is freebsd choosing the wrong timer. Not about tcp stack locks and semaphores, not about Intel PRO/1000 ethernet em drivers or realtek re drivers, not about zfs deadlocks, or msi interrupts, or i/oat or processor sleep state settings. I've changed one sysctl:

sysctl kern.eventtimer.timer=LAPIC

That changes it from the previous HPET, and... the problem hasn't recurred in half an hour. The incoming traffic graph on the receiving ssh system hasn't taken a dip, hanging in at a rock-steady 60 MB/s. Replacing 'zfs send ... | ssh othersys "cat > /dev/null"' with 'cat /dev/random | ssh ...' gave the same steady throughput. If the problem recurs I'll update this post, but I've never before seen such a steady traffic graph. Previously the zfs sending system wouldn't last 5 minutes.

For the search engines: FreeBSD 9.2-RELEASE-p4. Systems: 2 S5000PSL, 6 SATA disks, zfs raidz2, booting from 'embedded' usb nas4free v972. Workaround to lost time, missing time, loses time, temporary hang, temporary freeze, temporary lock, temporary lockup.

Here's the bootup timer default, which led to the strange 'hangs'. Notice the odd 0 frequency for LAPIC, and the change later.

fs1: ~ # sysctl -a | grep timer
kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 0
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.periodic: 0
kern.eventtimer.timer: HPET
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
net.inet.tcp.timer_race: 0
net.inet.tcp.per_cpu_timers: 0
machdep.acpi_timer_freq: 3579545
p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32
dev.attimer.0.%desc: AT timer
dev.attimer.0.%driver: attimer
dev.attimer.0.%location: handle=\_SB_.PCI0.LPC_.TMR_
dev.attimer.0.%pnpinfo: _HID=PNP0100 _UID=0
dev.attimer.0.%parent: acpi0
dev.acpi_timer.0.%desc: 24-bit timer at 3.579545MHz
dev.acpi_timer.0.%driver: acpi_timer
dev.acpi_timer.0.%location: unknown
dev.acpi_timer.0.%pnpinfo: unknown
dev.acpi_timer.0.%parent: acpi0
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.1.interrupts.rx_pkt_timer: 0
dev.em.1.interrupts.rx_abs_timer: 0
dev.em.1.interrupts.tx_pkt_timer: 0
dev.em.1.interrupts.tx_abs_timer: 0
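
The kern.eventtimer.choice line in the dump above lists each available event timer with a quality score in parentheses, and the timer in use at boot (HPET) is the one with the highest score. A minimal sketch for ranking that line is below; the function names rank_timers and best_timer are made up for illustration, and the idea that the kernel simply prefers the highest quality is an inference from this dump, not something the thread states.

```shell
#!/bin/sh
# rank_timers "NAME(QUALITY) NAME(QUALITY) ..."   (hypothetical helper)
# Prints one "QUALITY NAME" pair per line, highest quality first.
rank_timers() {
    for entry in $1; do
        name=${entry%%\(*}        # text before the opening parenthesis
        quality=${entry##*\(}     # text after it, e.g. "450)"
        quality=${quality%\)}     # strip the closing parenthesis
        echo "$quality $name"
    done | sort -rn
}

# best_timer "NAME(QUALITY) ...": print only the top-ranked timer name.
best_timer() {
    rank_timers "$1" | head -n 1 | cut -d' ' -f2
}
```

On a FreeBSD box this could be fed live data with best_timer "$(sysctl -n kern.eventtimer.choice)" and compared against sysctl -n kern.eventtimer.timer to see whether a non-default timer is in use.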

The command:

sysctl kern.eventtimer.timer=LAPIC

led to:

sysctl -a | grep timer
kern.eventtimer.choice: HPET(450) HPET1(440) HPET2(440) LAPIC(400) i8254(100) RTC(0)
kern.eventtimer.et.LAPIC.flags: 15
kern.eventtimer.et.LAPIC.frequency: 166253845
kern.eventtimer.et.LAPIC.quality: 400
kern.eventtimer.et.RTC.flags: 17
kern.eventtimer.et.RTC.frequency: 32768
kern.eventtimer.et.RTC.quality: 0
kern.eventtimer.et.i8254.flags: 1
kern.eventtimer.et.i8254.frequency: 1193182
kern.eventtimer.et.i8254.quality: 100
kern.eventtimer.et.HPET.flags: 3
kern.eventtimer.et.HPET.frequency: 14318180
kern.eventtimer.et.HPET.quality: 450
kern.eventtimer.et.HPET1.flags: 3
kern.eventtimer.et.HPET1.frequency: 14318180
kern.eventtimer.et.HPET1.quality: 440
kern.eventtimer.et.HPET2.flags: 3
kern.eventtimer.et.HPET2.frequency: 14318180
kern.eventtimer.et.HPET2.quality: 440
kern.eventtimer.periodic: 0
kern.eventtimer.timer: LAPIC
kern.eventtimer.activetick: 1
kern.eventtimer.idletick: 0
kern.eventtimer.singlemul: 2
net.inet.tcp.timer_race: 0
net.inet.tcp.per_cpu_timers: 0
machdep.acpi_timer_freq: 3579545
p1003_1b.timers: 200112
p1003_1b.delaytimer_max: 2147483647
p1003_1b.timer_max: 32
dev.attimer.0.%desc: AT timer
dev.attimer.0.%driver: attimer
dev.attimer.0.%location: handle=\_SB_.PCI0.LPC_.TMR_
dev.attimer.0.%pnpinfo: _HID=PNP0100 _UID=0
dev.attimer.0.%parent: acpi0
dev.acpi_timer.0.%desc: 24-bit timer at 3.579545MHz
dev.acpi_timer.0.%driver: acpi_timer
dev.acpi_timer.0.%location: unknown
dev.acpi_timer.0.%pnpinfo: unknown
dev.acpi_timer.0.%parent: acpi0
dev.em.0.interrupts.rx_pkt_timer: 0
dev.em.0.interrupts.rx_abs_timer: 0
dev.em.0.interrupts.tx_pkt_timer: 0
dev.em.0.interrupts.tx_abs_timer: 0
dev.em.1.interrupts.rx_pkt_timer: 0
dev.em.1.interrupts.rx_abs_timer: 0
dev.em.1.interrupts.tx_pkt_timer: 0
dev.em.1.interrupts.tx_abs_timer: 0


Wow. What a ride. Cost me about three days. Now, what am I going to do with two add-on ethernet re type nics it turns out I don't need? I hope this helps someone else. Could someone who knows post it in the right spot on the freebsd bug system? It looks like freebsd just chose the wrong timer.
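
A sysctl set from the command line, as in this post, does not survive a reboot. A minimal sketch for making such a setting persistent in a name=value config file follows. The helper name set_conf_line is hypothetical, and using /etc/sysctl.conf as the target assumes a standard FreeBSD install; on the embedded nas4free described here the setting would more likely have to be entered through the web GUI so that it is regenerated at boot, which the thread does not confirm.

```shell
#!/bin/sh
# set_conf_line FILE NAME VALUE   (hypothetical helper, not from the thread)
# Makes sure FILE ends up with exactly one "NAME=VALUE" line, replacing any
# earlier assignment of NAME and leaving all other lines untouched.
set_conf_line() {
    file=$1
    name=$2
    value=$3
    touch "$file"
    # Keep every line that does not start with "NAME=" (plain string match,
    # so the dots in sysctl names are not treated as wildcards).
    awk -v prefix="$name=" 'index($0, prefix) != 1' "$file" > "$file.tmp"
    printf '%s=%s\n' "$name" "$value" >> "$file.tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$file.tmp" > "$file"
    rm -f "$file.tmp"
}
```

Under those assumptions the change from this post would be recorded with set_conf_line /etc/sysctl.conf kern.eventtimer.timer LAPIC, run as root.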

Parkcomm
Advanced User
Posts: 388
Joined: 21 Sep 2012 12:58
Location: Australia
Status: Offline

Re: send heavy tcp stream temporarily freezes all networking

#5

Post by Parkcomm »

Hey Harry,

I've observed a similar behaviour since upgrading to 10.1. I've looked at my timers and I have


kern.eventtimer.timer: HPET
However I have


kern.eventtimer.et.HPET.flags: 7
Whereas you had a value of 3. Flag 4 means the "timer is per CPU", so it makes sense that this could be a problem. LAPIC also has flag 8, "timer may stop when CPU goes to sleep state", which I can't see having an effect under load.
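
The flags values being compared here are bitmasks, so 3, 7, 15 and 17 can be decoded mechanically. A small sketch is below; the function name decode_et_flags and the short labels are made up for illustration. Bits 1, 2, 4 and 8 follow the meanings quoted in this thread and in eventtimers(4); bit 16 is taken from the kernel's ET_FLAGS_POW2DIV definition and is included only to account for the RTC value of 17.

```shell
#!/bin/sh
# decode_et_flags N   (hypothetical helper, not from the thread)
# Prints a space-separated label for each capability bit set in a
# kern.eventtimer.et.<name>.flags value.
decode_et_flags() {
    flags=$1
    out=""
    [ $((flags & 1)) -ne 0 ]  && out="$out periodic"
    [ $((flags & 2)) -ne 0 ]  && out="$out one-shot"
    [ $((flags & 4)) -ne 0 ]  && out="$out per-CPU"
    [ $((flags & 8)) -ne 0 ]  && out="$out may-stop-in-sleep"
    [ $((flags & 16)) -ne 0 ] && out="$out power-of-2-divisors-only"
    echo "${out# }"           # drop the leading space
}
```

With this, decode_et_flags 3 (harryc's HPET) prints "periodic one-shot", 7 adds "per-CPU", and 15 (LAPIC in the dumps earlier in the thread) has both the per-CPU and the may-stop-in-sleep bits set.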

So I was wondering how did you arrive at the LAPIC/HPET issue?
Last edited by Parkcomm on 30 Aug 2015 03:27, edited 2 times in total.
NAS4Free Embedded 10.2.0.2 - Prester (revision 2003), HP N40L Microserver (AMD Turion) with modified BIOS, ZFS Mirror 4 x WD Red + L2ARC 128M Apple SSD, 10G ECC Ram, Intel 1G CT NIC + inbuilt broadcom


Re: send heavy tcp stream temporarily freezes all networking

#6

Post by Parkcomm »

Played with LAPIC and HPET, and as expected this had little effect.

However


/boot/loader.conf
hw.em.rxd=4096
hw.em.txd=4096
reduced the interrupt load to a third of its previous value under load, and got 3-4 times the throughput


Re: send heavy tcp stream temporarily freezes all networking

#7

Post by harryc »

Can't speak to v10, of course. In v9 I had already set the tx and rx descriptor pool sizes to 4096; the problem was only resolved by using a different timer. Remember, my issue wasn't throughput; it was a complete thread lockup in the context of tcp transfers. The timer capability in my case was set by the hardware detection routines; the system provided three, which were independent of the processor core count.

No idea where in FreeBSD the timer expiration was being lost. If I had to guess, I'd bet some other hardware shared the interrupt line (MSI is off in that system's case), while the *BSD code depended on MSI, thinking it 'owned' the interrupt. I've long since moved on, as this system was part of a much larger project, not a 'let's dive into BSD' task.


Re: send heavy tcp stream temporarily freezes all networking

#8

Post by harryc »

You might find these changes of interest re: throughput. I don't remember if they're installed by default on nas4free or I pasted them in long ago:

hw.em.rx_abs_int_delay   250    Tune em driver for servers
hw.em.rx_int_delay       250    Tune em driver for servers
hw.em.rx_process_limit   -1     Tune em driver for servers
hw.em.rxd                4096   Tune em driver for servers
hw.em.tx_abs_int_delay   250    Tune em driver for servers
hw.em.tx_int_delay       250    Tune em driver for servers
hw.em.txd                4096   Tune em driver for servers


Re: send heavy tcp stream temporarily freezes all networking

#9

Post by Parkcomm »

Here are the defaults - I have not been able to measure the effect of the above yet.


hw.em.tx_int_delay: 66
hw.em.rx_int_delay: 0
hw.em.tx_abs_int_delay: 66
hw.em.rx_abs_int_delay: 66
hw.em.rxd: 4096 (modified by me)
hw.em.txd: 4096 (modified by me)
hw.em.smart_pwr_down: 0
hw.em.sbp: 0
hw.em.enable_msix: 1
hw.em.rx_process_limit: 100
hw.em.eee_setting: 1


Re: send heavy tcp stream temporarily freezes all networking

#10

Post by Parkcomm »

Problem solved using this config, thanks harryc


#sysctl -a | grep hw.em
hw.em.tx_int_delay: 512
hw.em.rx_int_delay: 512
hw.em.tx_abs_int_delay: 1024
hw.em.rx_abs_int_delay: 1024
hw.em.rxd: 4096
hw.em.txd: 4096
hw.em.smart_pwr_down: 0
hw.em.sbp: 0
hw.em.enable_msix: 1
hw.em.rx_process_limit: 100
hw.em.eee_setting: 1
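
Earlier in the thread hw.em.rxd and hw.em.txd were set in /boot/loader.conf, and em(4) documents the interrupt delay values as loader tunables as well. Assuming the rest of this listing is meant to be applied the same way, a sketch for turning "name: value" output like the above into loader.conf-style name="value" lines could look like this. The function name sysctl_to_loader is hypothetical, and whether every hw.em entry shown is accepted as a tunable on a given release is not confirmed by the thread.

```shell
#!/bin/sh
# sysctl_to_loader   (hypothetical helper, not from the thread)
# Reads "name: value" lines on stdin and prints name="value" lines.
# Anything after the first word of the value (such as a trailing note)
# is dropped, and lines that do not look like "name: value" are skipped.
sysctl_to_loader() {
    sed -n 's/^\([A-Za-z0-9_.%]*\): *\([^ ]*\).*$/\1="\2"/p'
}
```

For example, sysctl -a | grep '^hw\.em\.' | sysctl_to_loader would print lines that can be reviewed and then pasted into /boot/loader.conf, or entered through the nas4free loader.conf page.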

brejoc
NewUser
Posts: 1
Joined: 23 Feb 2016 22:07
Status: Offline

Re: send heavy tcp stream temporarily freezes all networking

#11

Post by brejoc »

Could this be related to this problem? -> https://bugs.freebsd.org/bugzilla/show_ ... ?id=202140


Re: send heavy tcp stream temporarily freezes all networking

#12

Post by Parkcomm »

Not for me. I had power management turned off while I sorted out the heavy interrupt issue.
