RTL8169 NIC Drops Connection Under Load

bjones371
NewUser
Posts: 9
Joined: 17 Nov 2014 11:49

#1

Post by bjones371 »

Hi,

I've set up N4F using SMB to share a ZFS Dataset with compression enabled to a Windows Server so I can take backups. After some tweaking with ZFSKernTune and a few other recommended best practices I've managed to get the throughput quite high, however the network card dies after a period of time under load, and requires a reboot of N4F to come back.

I have a 10/100 onboard card which is connected and used purely for management and is on my normal LAN, and a 10/100/1000 RTL8169 PCI card which is the card bound to the CIFS service and the one being used for the data transfer. The PCI card is connected to a separate switch off the main LAN, as is another PCI card in the Windows Server, so the data transfer is going across its own switch with no other data. When the problem occurs, the PCI card in the N4F box stops responding to ping and I can no longer connect to the CIFS share. If I connect to the WebUI on the LAN card then I can reboot N4F and the share becomes accessible again, but it drops the connection again under load.

Are there any log files I can obtain that will help diagnose this issue? Hardware is a Core 2 Quad Q6600 and 4GB (2 x 2GB) RAM. The RAM usage does not go above 60% during file transfers according to Status > System. I'm running the embedded version of N4F.

Thanks,
B

b0ssman
Forum Moderator
Posts: 2454
Joined: 14 Feb 2013 08:34
Location: Munich, Germany

#2

Post by b0ssman »

Get an Intel card. Don't waste your time with Realtek crap.
Nas4Free 11.1.0.4.4517. Supermicro X10SLL-F, 16gb ECC, i3 4130, IBM M1015 with IT firmware. 4x 3tb WD Red, 4x 2TB Samsung F4, both GEOM AES 256 encrypted.

#3

Post by bjones371 »

b0ssman wrote: Get an Intel card. Don't waste your time with Realtek crap.

That was my first thought and I'm hunting around to see if I've got one I can try instead, but all I've found so far is another Realtek based one :lol:

Just thought I'd post here to see if there was anything else I could look into in the meantime.

#4

Post by bjones371 »

Definitely looking like a NIC problem, tried running a backup through the 10/100 onboard NIC instead and reached 30GB without issue, normally it'd bomb out within the first 5GB. Will try the other RTL card I have (8168-based rather than 8169, though I don't hold much hope since it's the re(4) driver for both of those cards unless I try compiling a new one) out of curiosity, and look to move to an Intel-based card in the future.

ChriZathens
Forum Moderator
Posts: 799
Joined: 23 Jun 2012 09:14
Location: Athens, Greece

#5

Post by ChriZathens »

Yep, Intel cards work much better on *nix systems...
Having said that, my NAS uses a Realtek card and I have transferred as much as 3TB of data at once without issues.
In fact I have an Intel PCI-E card lying on my desk, but since I have no issues with the Realtek card, I am too lazy to plug in the Intel.
But perhaps I am just one of the lucky few...
My Nas
  1. Case: Fractal Design Define R2
  2. M/B: Supermicro x9scl-f
  3. CPU: Intel Celeron G1620
  4. RAM: 16GB DDR3 ECC (2 x Kingston KVR1333D3E9S/8G)
  5. PSU: Chieftec 850w 80+ modular
  6. Storage: 8x2TB HDDs in a RaidZ2 array ~ 10.1 TB usable disk space
  7. O/S: XigmaNAS 11.2.0.4.6625 -amd64 embedded
  8. Extra H/W: Dell Perc H310 SAS controller, crossflashed to LSI 9211-8i IT mode, 8GB Innodisk D150SV SATADOM for O/S

Backup Nas: U-NAS NSC-400, Gigabyte MB10-DS4 (4x4TB Seagate Exos disks in RaidZ configuration - 32GB RAM)

#6

Post by bjones371 »

Found a 1Gbps Intel NIC and the whole thing is so much more responsive. Even things like browsing to the share via SMB are instant now rather than suffering from a few seconds' worth of delay... Didn't think it would make that much of a difference!

Running a backup to it now and hoping for some sustained high throughput. It's started at around 50-70 MB/s, with Task Manager averaging around 350 Mb/s. Strange how speeds were fine in iPerf but no good when trying to use Samba!

Anyway, all sorted - thanks folks.

EDIT: Spoke too soon! The throughput was much higher while it worked, but it's now bombed again with exactly the same symptoms...
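
For comparing the two throughput figures in this post: file copies are usually reported in megabytes per second (MB/s), while the Task Manager network graph reports megabits per second (Mb/s). A minimal Python sketch of the conversion, assuming decimal units and ignoring protocol overhead (the function name is purely illustrative):

```python
def mbytes_to_mbits(mbytes_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mbytes_per_s * 8


if __name__ == "__main__":
    # The two ends of the 50-70 MB/s range quoted in the post.
    for rate in (50, 70):
        print(f"{rate} MB/s = {mbytes_to_mbits(rate):.0f} Mb/s")
```

So 50-70 MB/s corresponds to roughly 400-560 Mb/s on the wire. A lower Task Manager average of around 350 Mb/s may simply reflect pauses between bursts, though that is a guess rather than something measured here.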

#7

Post by bjones371 »

Getting a lot of "em0: watchdog timeout -- resetting" errors logged when the issue occurs, if that helps.

#8

Post by ChriZathens »

Found this:
https://forum.pfsense.org/index.php?topic=81929.0
See if it helps...

#9

Post by bjones371 »

ChriZathens wrote: Found this:
https://forum.pfsense.org/index.php?topic=81929.0
See if it helps...

Thanks, I'd spotted that earlier but couldn't get the settings to "stick" after a reboot. I then realised it's because I'm running embedded, so I needed to remount /cf in rw to edit /cf/boot/loader.conf. Unfortunately it seems to have made the problem worse if anything; I can only manage 500MB or so now before the NIC shuts off. There are a few more errors in dmesg following that change too:

Code:

em0: link state changed to UP
em0: Watchdog timeout -- resetting
em0: link state changed to DOWN
ahcich3: Timeout on slot 21 port 0
ahcich3: is 00000001 cs 00000000 ss 00000000 rs 00300000 tfd 50 serr 00000000 cmd 00049517
(ada0:ahcich3:0:0:0): WRITE_DMA. ACB: ca 00 20 16 02 46 00 00 00 00 00 00
(ada0:ahcich3:0:0:0): CAM status: Command timeout
(ada0:ahcich3:0:0:0): Retrying command

Since ada0 is the disk drive, this suggests there's either a failing disk (SMART checks out OK), or maybe the SATA interface can't keep up with the incoming data from the NIC? Starting to wonder if it's just that the hardware can't cope with ZFS :roll:
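
To see whether the NIC resets and the disk timeouts arrive together, it can help to tally them per device from a saved copy of the dmesg output. Below is a rough Python sketch of that idea. The regular expressions are modelled only on the lines quoted in this post, so they are assumptions and other drivers may word their messages differently; the function and label names are made up for illustration.

```python
import re
from collections import Counter

# Patterns based solely on the log lines quoted in this thread (assumption:
# other drivers or FreeBSD versions may format these messages differently).
PATTERNS = (
    ("nic_watchdog", re.compile(r"^(\w+): watchdog timeout", re.IGNORECASE)),
    ("ahci_timeout", re.compile(r"^(ahcich\d+): Timeout on slot")),
    ("disk_cam_timeout", re.compile(r"^\((\w+):[^)]*\): CAM status: Command timeout")),
)


def count_timeouts(log_text: str) -> Counter:
    """Count timeout events per (kind, device) in dmesg-style text."""
    counts = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        for kind, pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                counts[(kind, match.group(1))] += 1
    return counts


# The dmesg excerpt from the post, reduced to the lines that matter here.
SAMPLE = """\
em0: link state changed to UP
em0: Watchdog timeout -- resetting
em0: link state changed to DOWN
ahcich3: Timeout on slot 21 port 0
(ada0:ahcich3:0:0:0): WRITE_DMA. ACB: ca 00 20 16 02 46 00 00 00 00 00 00
(ada0:ahcich3:0:0:0): CAM status: Command timeout
(ada0:ahcich3:0:0:0): Retrying command
"""

if __name__ == "__main__":
    for (kind, device), n in sorted(count_timeouts(SAMPLE).items()):
        print(f"{kind:18} {device:8} {n}")
```

If both the em0 watchdog count and the ada0/ahcich3 timeout counts climb during the same transfer, that would hint at something shared between the NIC and the SATA controller rather than at either one alone.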

#10

Post by b0ssman »

By the way, the mainboard should have a PCIe slot that you could use for a PCIe network card.

#11

Post by ChriZathens »

Just for the record, you can add the settings you like under System | Advanced | loader.conf, so there's no need to remount /cf and such.
As for your AHCI error, check the SMART status of the specific disk and see whether UDMA_CRC_Error_Count is anything but 0. A non-zero value usually means you have a bad cable.

#12

Post by bjones371 »

Thanks for the pointer on the loader.conf. I'd used Advanced | Sysctl.conf already, so not sure how I missed loader! I rebuilt the pen drive from fresh yesterday and recreated the ZFS pool and dataset from scratch, as I'd done a lot of tinkering with various things in trying to get the Realtek card working. The throughput on my next attempt was considerably higher than it had been, but it still bombed after around 30GB worth of data. I've also got a different module for the Intel NIC (v7.4.2) that I compiled in FreeBSD 9.2-RELEASE. I tried it out yesterday before the rebuild and it didn't help either, though I've not tried it since rebuilding. Disabling MSI in the loader.conf still makes the network connection fail more quickly, so at least it's consistent.

The motherboard does have a PCIe slot, but the only PCIe NIC I have is another Realtek one, so I've not bothered entertaining the idea of putting it in :lol:

UDMA CRC errors come back clean according to SMART. Here's the info it's spitting out:

Code:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST2000DM001-1ER164
Serial Number:    Z4Z0T6SM
LU WWN Device Id: 5 000c50 079579072
Firmware Version: CC25
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Nov 18 09:20:09 2014 UTC

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(   80) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 213) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x1085)	SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   100   006    Pre-fail  Always       -       61222872
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       24
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       199662
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       43
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       24
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   067   045    Old_age   Always       -       31 (Min/Max 31/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       16
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       122
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       40h+08m+43.886s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       360122796
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       8614696021

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         4         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Thanks for all the help so far, by the way!
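
The cable check suggested earlier (UDMA_CRC_Error_Count should be 0) can also be done by script rather than by eye. Here is a small Python sketch that pulls the RAW_VALUE column for a named attribute out of a smartctl attribute table. It assumes the ten-column layout shown in the output above; the function name is illustrative, and the sample rows are copied from that output.

```python
from typing import Optional


def smart_raw_value(smartctl_output: str, attribute: str) -> Optional[str]:
    """Return the RAW_VALUE column for a named SMART attribute, or None if absent."""
    for line in smartctl_output.splitlines():
        # RAW_VALUE is the 10th column and may itself contain spaces,
        # so split at most 9 times and keep the remainder together.
        fields = line.split(None, 9)
        if len(fields) == 10 and fields[0].isdigit() and fields[1] == attribute:
            return fields[9].strip()
    return None


# A few rows taken from the attribute table posted above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
194 Temperature_Celsius     0x0022   031   040   000    Old_age   Always       -       31 (0 18 0 0 0)
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    crc = smart_raw_value(SAMPLE, "UDMA_CRC_Error_Count")
    print("UDMA_CRC_Error_Count raw value:", crc)
    if crc is not None and crc != "0":
        print("Non-zero CRC count: worth checking the SATA cable")
```

A raw value of 0 here only rules out link-level CRC errors on that drive. It says nothing about the command timeouts seen in dmesg.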

#13

Post by ChriZathens »

OK, so the cables seem fine...
Please tell me something else: do you have any scripts that frequently check the status of your HDDs?

#14

Post by b0ssman »

What chipset does the motherboard have?

#15

Post by bjones371 »

No scripts or anything like that running, it's a clean NAS4Free Embedded build I have running now - all I've done is set up a basic ZFS structure and shared it using Samba, beyond that it's pretty much as you'd find it installed fresh off the disk.

The motherboard uses an nVidia GeForce 7050 / 610i chipset, so not the most modern of hardware, I admit. This website has a good breakdown of the motherboard specs: http://www.ascendtech.us/ecs-mcp73vt-pm ... 3vtpm.aspx

#16

Post by b0ssman »

The NVIDIA chipset will be your problem.
It is not well supported under FreeBSD and causes problems with the SATA controller and other stuff.

#17

Post by bjones371 »

Cool - I'll start looking for an alternative that's not BSD based :-)

sarwanov
NewUser
Posts: 5
Joined: 29 Jan 2015 16:08

#18

Post by sarwanov »

There is no need to waste any more of your time; you just need to get an Intel card and that's all.
Graduated from Soran University with First Class Degree with Honours in Computer Science.
