*New 11.3 series Release:
2019-10-19: XigmaNAS 11.3.0.4.7014 - released

*New 12.0 series Release:
2019-10-05: XigmaNAS 12.0.0.4.6928 - released!

*New 11.2 series Release:
2019-09-23: XigmaNAS 11.2.0.4.6881 - released!

We really need "Your" help on XigmaNAS https://translations.launchpad.net/xigmanas translations. Please help today!

Producing and hosting XigmaNAS costs money. Please consider donating for our project so that we can continue to offer you the best.
We need your support! eg: PAYPAL

ISCI failures from Xenserver...

iSCSI over TCP/IP.
Forum rules
Set-Up GuideFAQsForum Rules
Post Reply
rsgooch
NewUser
NewUser
Posts: 1
Joined: 24 Oct 2013 22:35
Status: Offline

ISCI failures from Xenserver...

#1

Post by rsgooch » 24 Oct 2013 23:16

I have a xenserver 6.0 server running. Iw as connection via SMB for my backup software. The vm is running Ubuntu 12.04 and iscsid version 2.0-817. I can connect to the Nas4Free Box and mount the iscsi target but when my backup goes off the next morning the VM has remounted the iscsi target as read only and generated a lot of I/O errors. On the NAS4Free box has the following errors in the Log:
User Event
Oct 25 02:35:06 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=14, CID=0, HeaderDigest=off, DataDigest=off
Oct 25 00:34:29 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=13, CID=0, HeaderDigest=off, DataDigest=off
Oct 25 00:30:30 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=12, CID=0, HeaderDigest=off, DataDigest=off
Oct 25 00:03:00 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=11, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:54:05 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=10, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:50:24 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=9, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:44:13 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=8, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:41:28 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=7, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:41:25 nyx istgt[1742]: istgt_iscsi.c:5702:worker: ***ERROR*** iscsi_read_pdu() failed
Oct 24 23:41:25 nyx istgt[1742]: istgt_iscsi.c:5685:worker: ***ERROR*** conn->state = 1
Oct 24 23:41:25 nyx istgt[1742]: istgt_iscsi.c: 766:istgt_iscsi_read_pdu: ***ERROR*** readv() failed (-1,errno=54,iqn.1993-08.org.debian:01:8c18fe31eff0,time=0)
Oct 24 23:37:51 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=6, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:37:47 nyx istgt[1742]: istgt_iscsi.c:5702:worker: ***ERROR*** iscsi_read_pdu() failed
Oct 24 23:37:47 nyx istgt[1742]: istgt_iscsi.c:5685:worker: ***ERROR*** conn->state = 1
Oct 24 23:37:47 nyx istgt[1742]: istgt_iscsi.c: 766:istgt_iscsi_read_pdu: ***ERROR*** readv() failed (-1,errno=54,iqn.1993-08.org.debian:01:8c18fe31eff0,time=0)
Oct 24 23:24:29 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=5, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:23:32 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=4, CID=0, HeaderDigest=off, DataDigest=off
Oct 24 23:16:56 nyx istgt[1742]: Login from iqn.1993-08.org.debian:01:8c18fe31eff0 (192.168.24.5) on iqn.2007-09.jp.ne.peach.istgt:disk0 LU1 (192.168.24.251:3260,2), ISID=23d010000, TSIH=3, CID=0, HeaderDigest=off, DataDigest=off


Errors from the VM on Xen:
Oct 24 23:16:04 backup kernel: [105066.336084] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321156376, last ping 4321157628, now 4321158880
Oct 24 23:16:04 backup kernel: [105066.336112] connection1:0: detected conn error (1011)
Oct 24 23:16:05 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:16:54 backup iscsid: connection1:0 is operational after recovery (4 attempts)
Oct 24 23:17:01 backup CRON[27743]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 24 23:23:09 backup kernel: [105491.712082] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321262720, last ping 4321263972, now 4321265224
Oct 24 23:23:09 backup kernel: [105491.712110] connection1:0: detected conn error (1011)
Oct 24 23:23:10 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:23:31 backup iscsid: connection1:0 is operational after recovery (2 attempts)
Oct 24 23:24:05 backup kernel: [105547.872088] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321276760, last ping 4321278012, now 4321279264
Oct 24 23:24:05 backup kernel: [105547.872113] connection1:0: detected conn error (1011)
Oct 24 23:24:06 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:24:28 backup iscsid: connection1:0 is operational after recovery (2 attempts)
Oct 24 23:37:23 backup kernel: [106345.920080] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321476272, last ping 4321477524, now 4321478776
Oct 24 23:37:23 backup kernel: [106345.920108] connection1:0: detected conn error (1011)
Oct 24 23:37:24 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:37:50 backup iscsid: connection1:0 is operational after recovery (2 attempts)
Oct 24 23:41:14 backup kernel: [106577.072098] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321534063, last ping 4321535313, now 4321536564
Oct 24 23:41:14 backup kernel: [106577.072126] connection1:0: detected conn error (1011)
Oct 24 23:41:15 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:41:26 backup iscsid: connection1:0 is operational after recovery (1 attempts)
Oct 24 23:43:03 backup kernel: [106685.696083] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321561217, last ping 4321562467, now 4321563720
Oct 24 23:43:03 backup kernel: [106685.696110] connection1:0: detected conn error (1011)
Oct 24 23:43:04 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:43:54 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:44:11 iscsid: last message repeated 2 times
Oct 24 23:44:11 backup iscsid: connection1:0 is operational after recovery (7 attempts)
Oct 24 23:45:22 backup kernel: [106824.304070] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321595871, last ping 4321597121, now 4321598372
Oct 24 23:45:22 backup kernel: [106824.304097] connection1:0: detected conn error (1011)
Oct 24 23:45:22 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:45:58 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:46:33 iscsid: last message repeated 5 times
Oct 24 23:47:22 iscsid: last message repeated 9 times
Oct 24 23:47:22 backup kernel: [106944.992093] session1: session recovery timed out after 120 secs
Oct 24 23:47:23 backup kernel: [106944.996307] sd 0:0:0:0: [sda] Unhandled error code
Oct 24 23:47:23 backup kernel: [106944.996311] sd 0:0:0:0: [sda] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
Oct 24 23:47:23 backup kernel: [106944.996316] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 a2 b8 6c 18 00 02 e0 00
Oct 24 23:47:23 backup kernel: [106944.996329] end_request: recoverable transport error, dev sda, sector 2729995288
Oct 24 23:47:23 backup kernel: [106944.996346] Buffer I/O error on device sda1, logical block 341249155
Oct 24 23:47:23 backup kernel: [106944.996355] Buffer I/O error on device sda1, logical block 341249156
Oct 24 23:47:23 backup kernel: [106944.996362] Buffer I/O error on device sda1, logical block 341249157
Oct 24 23:47:23 backup kernel: [106944.996368] Buffer I/O error on device sda1, logical block 341249158
Oct 24 23:47:23 backup kernel: [106944.996374] Buffer I/O error on device sda1, logical block 341249159
Oct 24 23:47:23 backup kernel: [106944.996381] Buffer I/O error on device sda1, logical block 341249160
Oct 24 23:47:23 backup kernel: [106944.996388] Buffer I/O error on device sda1, logical block 341249161
Oct 24 23:47:23 backup kernel: [106944.996394] Buffer I/O error on device sda1, logical block 341249162
Oct 24 23:47:23 backup kernel: [106944.996401] Buffer I/O error on device sda1, logical block 341249163
Oct 24 23:47:23 backup kernel: [106944.996407] Buffer I/O error on device sda1, logical block 341249164
Oct 24 23:47:23 backup kernel: [106944.996414] Buffer I/O error on device sda1, logical block 341249165
Oct 24 23:47:23 backup kernel: [106944.996420] Buffer I/O error on device sda1, logical block 341249166
Oct 24 23:47:23 backup kernel: [106944.996427] Buffer I/O error on device sda1, logical block 341249167
.....
Oct 24 23:47:29 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:48:33 iscsid: last message repeated 10 times
Oct 24 23:49:34 iscsid: last message repeated 10 times
Oct 24 23:50:22 iscsid: last message repeated 8 times
Oct 24 23:50:22 backup iscsid: connection1:0 is operational after recovery (47 attempts)
Oct 24 23:51:32 backup kernel: [107194.272075] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321688360, last ping 4321689612, now 4321690864
Oct 24 23:51:32 backup kernel: [107194.272103] connection1:0: detected conn error (1011)
Oct 24 23:51:32 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:52:22 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:53:28 iscsid: last message repeated 11 times
Oct 24 23:53:32 backup kernel: [107314.656082] session1: session recovery timed out after 120 secs
Oct 24 23:53:34 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:54:03 iscsid: last message repeated 4 times
Oct 24 23:54:03 backup iscsid: connection1:0 is operational after recovery (21 attempts)
Oct 24 23:54:28 backup kernel: [107370.400083] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321732392, last ping 4321733644, now 4321734896
Oct 24 23:54:28 backup kernel: [107370.400112] connection1:0: detected conn error (1011)
Oct 24 23:54:28 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 24 23:55:04 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:56:04 iscsid: last message repeated 9 times
Oct 24 23:56:28 iscsid: last message repeated 4 times
Oct 24 23:56:28 backup kernel: [107490.784088] session1: session recovery timed out after 120 secs
Oct 24 23:56:28 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 24 23:57:34 iscsid: last message repeated 10 times
Oct 24 23:58:34 iscsid: last message repeated 10 times
Oct 24 23:59:34 iscsid: last message repeated 10 times
Oct 25 00:00:00 iscsid: last message repeated 5 times
Oct 25 00:00:00 backup kernel: [107703.207040] sd 0:0:0:0: [sda] Unhandled error code
Oct 25 00:00:00 backup kernel: [107703.207046] sd 0:0:0:0: [sda] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
Oct 25 00:00:00 backup kernel: [107703.207050] sd 0:0:0:0: [sda] CDB: Read(16): 88 00 00 00 00 02 0b 04 08 08 00 00 00 08 00 00
Oct 25 00:00:00 backup kernel: [107703.207067] end_request: recoverable transport error, dev sda, sector 8774748168
Oct 25 00:00:00 backup kernel: [107703.207111] sd 0:0:0:0: [sda] Unhandled error code
Oct 25 00:00:00 backup kernel: [107703.207114] sd 0:0:0:0: [sda] Result: hostbyte=DID_TRANSPORT_FAILFAST driverbyte=DRIVER_OK
Oct 25 00:00:00 backup kernel: [107703.207117] sd 0:0:0:0: [sda] CDB: Read(16): 88 00 00 00 00 02 0b 04 08 08 00 00 00 08 00 00
Oct 25 00:00:00 backup kernel: [107703.207131] end_request: recoverable transport error, dev sda, sector 8774748168
Oct 25 00:00:05 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 25 00:01:11 iscsid: last message repeated 11 times
Oct 25 00:02:17 iscsid: last message repeated 11 times
Oct 25 00:02:58 iscsid: last message repeated 6 times
Oct 25 00:02:58 backup iscsid: connection1:0 is operational after recovery (82 attempts)
Oct 25 00:03:08 backup kernel: [107890.384083] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4321862389, last ping 4321863640, now 4321864892
Oct 25 00:03:08 backup kernel: [107890.384112] connection1:0: detected conn error (1011)
Oct 25 00:03:08 backup iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Oct 25 00:03:29 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
Oct 25 00:04:34 iscsid: last message repeated 10 times
Oct 25 00:05:08 iscsid: last message repeated 6 times
Oct 25 00:05:08 backup kernel: [108010.976136] session1: session recovery timed out after 120 secs
Oct 25 00:05:11 backup iscsid: connect to 192.168.24.251:3260 failed (No route to host)
:

My backup kicks off at about 9PM the biggest load is around 11 PM and then things go bad.

I installed a separate NIC in the the Xenserver and the NAS4Free box and have only 1 vm using the iscsi target and the only 1 target setup on the NAS box. The are on their own GB switch. I moved them from single nic configuration thinking it was the load across the Procurve switch. I have 8GB of RAM on the NAS box and using an ADAPTEC RAID PCIExpress RAID card on the NAS box.

I am still trying to find a solutions with Google, but thought I would post here as well.

Any help would be greatly appreciated!

Thanks!
rsgooch

00Roush
Starter
Starter
Posts: 64
Joined: 15 Sep 2013 09:27
Status: Offline

Re: ISCI failures from Xenserver...

#2

Post by 00Roush » 08 Dec 2013 06:51

In doing a bit of searching on Google it appears this issue is most likely related to very high load on your disks in the NAS4Free server. Basically when your backup job runs it causes NAS4Free disks to be very busy. ISCSI in Xenserver is most likely running into cases were I/O is taking longer than it expects (I/O timeout?). Running gstat on your NAS4Free box during backup could help verify the load.

Based on my understanding there is a couple of things that might help.
If you haven't already... install the ZFS kernel tune extension and set yours for 8GB.
Depending on your work load you could look at adding a SDD log (SLOG) device if you haven't already and are running ZFS.
The most effective would probably be to examine your NAS4Free setup and possibly adjust to get higher performance. This could mean adding more disks or more powerful CPU.

Could you post up your disk setup in NAS4Free?

Not really an expert at this stuff so that is all I can come up with. Hope it helps.

00Roush

hailthemelody
Starter
Starter
Posts: 39
Joined: 26 Feb 2017 15:03
Status: Offline

Re: ISCI failures from Xenserver...

#3

Post by hailthemelody » 04 Mar 2017 02:25

What NICs are you using? I've seen this issue with Broadcoms before. Can elaborate if you confirm.

hailthemelody
Starter
Starter
Posts: 39
Joined: 26 Feb 2017 15:03
Status: Offline

Re: ISCI failures from Xenserver...

#4

Post by hailthemelody » 04 Mar 2017 02:29

Whoa - pardon the necrobump! In case anybody has the same issue, I'll elaborate that I had a similar error happen when sending a high amount of ISCSI traffic to a Broadcom NIC. A SysAdmin colleague shared that the Broadcoms can report that they received a packet when they are overwhelmed (even though they dropped them), resulting in corruptions to what is being saved to the LUN disk. The VM then freaks out in the way shown in your screen grabs, depending on the VM OS of course.

Post Reply

Return to “iSCSI (Internet Small Computer Systems Interface)”