Drive timeout and removal (LSI9211 & Intel X25-E)

ccie4526
NewUser
Posts: 12
Joined: 30 Nov 2014 22:25
Status: Offline

Drive timeout and removal (LSI9211 & Intel X25-E)

Post by ccie4526 »

Ok, I'm going nuts trying to figure this out.

I have a ZFS pool created with separate L2ARC and ZIL disks. Randomly, the ZIL disk (Intel X25-E, 32GB) will disappear (ZFS shows DEGRADED), and I find corresponding timeout issues in the system.log:

Code:

Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_scsiio_timeout checking sc 0xffffff8002507000 cm 0xffffff8002554f00
Nov 21 08:30:38 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 736 command timeout cm 0xffffff8002554f00 ccb 0xfffffe0010ca4800
Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_alloc_tm freezing simq
Nov 21 08:30:38 mcs7835-nas kernel: mps0: timedout cm 0xffffff8002554f00 allocated tm 0xffffff800251a148
Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_scsiio_timeout checking sc 0xffffff8002507000 cm 0xffffff800254e5f0
Nov 21 08:30:38 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 command timeout cm 0xffffff800254e5f0 ccb 0xfffffe0010cb8800
Nov 21 08:30:38 mcs7835-nas kernel: mps0: queued timedout cm 0xffffff800254e5f0 for processing by tm 0xffffff800251a148
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 736 completed timedout cm 0xffffff8002554f00 ccb 0xfffffe0010ca4800 during recovery ioc 8048 scsi 0 state c xfer 4096
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 completed timedout cm 0xffffff800254e5f0 ccb 0xfffffe0010cb8800 during recovery ioc 804b scsi 0 state c(da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (noperiph:mps0:0:10:0): SMID 1 abort TaskMID 736 status 0x4a code 0x0 count 2
Nov 21 08:30:42 mcs7835-nas kernel: (noperiph:mps0:0:10:0): SMID 1 finished recovery after aborting TaskMID 736
Nov 21 08:30:42 mcs7835-nas kernel: mps0: mpssas_free_tm releasing simq
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: Command timeout
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 745 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 593 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 698 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 604 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 455 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 158 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 488 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 961 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 583 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 735 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 109 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 942 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 140 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 665 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 248 terminated ioc 804b scsi 0 state c xfe
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 691 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 271 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 351 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 527 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 952 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 847 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 76 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 184 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 146 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 520 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 288 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 683 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 336 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:44 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 656 terminated ioc 804b scsi 0 state 0 xfer 0
Nov 21 08:30:44 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 953 terminated ioc 804b scsi 0 state 0 xfer 0
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_alloc_tm freezing simq
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: CCB request aborted by the host
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: CCB request aborted by the host
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_remove_complete on handle 0x000c, IOCStatus= 0x0
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_free_tm releasing simq
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): lost device - 2 outstanding, 3 refs
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): oustanding 1
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): oustanding 0
Nov 21 08:30:46 mcs7835-nas kernel: (da2:mps0:0:10:0): removing device entry
Yet interestingly enough, mere seconds later, the drive shows back up:

Code:

Nov 21 08:30:49 mcs7835-nas kernel: da2 at mps0 bus 0 scbus0 target 10 lun 0
Nov 21 08:30:49 mcs7835-nas kernel: da2: <ATA SSDSA2SH032G1GN 8860> Fixed Direct Access SCSI-6 device
Nov 21 08:30:49 mcs7835-nas kernel: da2: 300.000MB/s transfers
Nov 21 08:30:49 mcs7835-nas kernel: da2: Command Queueing enabled
Nov 21 08:30:49 mcs7835-nas kernel: da2: 30517MB (62500000 512 byte sectors: 255H 63S/T 3890C)
Nov 21 08:30:49 mcs7835-nas kernel: ses0: pass2,da2: SAS Device Slot Element: 1 Phys at Slot 0
Nov 21 08:30:49 mcs7835-nas kernel: ses0:  phy 0: SATA device
Nov 21 08:30:49 mcs7835-nas kernel: ses0:  phy 0: parent 5001438022fbc726 addr 5001438022fbc70a
Needless to say, I have to remove and re-add the disk as the log disk for the zpool, and it runs fine again for a random period of time (sometimes days, sometimes weeks) before it drops back out again.
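
To put numbers on excerpts like the two above, here is a minimal Python sketch that counts the command timeouts for one device and measures how long it was gone between "removing device entry" and the re-attach line. The function name `summarize` and the `year` default are assumptions for illustration (syslog lines carry no year), and the matching strings are taken only from the log lines quoted in this post, so other driver versions may word things differently.

```python
import re
from datetime import datetime

# Syslog prefix and kernel message, in the shape shown in the excerpts above.
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")

def summarize(lines, device="da2", year=2014):
    """Count command timeouts and time each removal until the next re-attach.

    `year` is an assumption: syslog timestamps do not include one.
    """
    timeouts = 0
    removed_at = None
    outages = []  # seconds between removal and re-attach
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
        msg = m.group(2)
        if f"({device}:" in msg and "command timeout" in msg:
            timeouts += 1
        elif f"({device}:" in msg and "removing device entry" in msg:
            removed_at = stamp
        elif msg.startswith(f"{device} at ") and removed_at is not None:
            outages.append((stamp - removed_at).total_seconds())
            removed_at = None
    return timeouts, outages

sample = [
    "Nov 21 08:30:38 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 736 command timeout cm 0xffffff8002554f00 ccb 0xfffffe0010ca4800",
    "Nov 21 08:30:46 mcs7835-nas kernel: (da2:mps0:0:10:0): removing device entry",
    "Nov 21 08:30:49 mcs7835-nas kernel: da2 at mps0 bus 0 scbus0 target 10 lun 0",
]
print(summarize(sample))  # (1, [3.0])
```

Run over a full system.log, this would show whether the dropouts cluster at particular times and whether the drive always returns within a few seconds.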

I am running NAS4Free 9.2.0.1 (rev 972).

Controller is an LSI9211-8i with IT firmware for JBOD:

Code:

mps0: <LSI SAS2008> port 0x4000-0x40ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 26 at device 0.0 on pci20
mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Any ideas on what I can do to get this resolved? I really hate having my ZIL disk randomly disappear.
NAS1:Nas4Free 10.2.0.2.2235. HP DL380G6, 72GB DDR3, 2x X5560 QuadCore, 2x10Gb NIC, LSI9211-8i & LSI9200-8e with IT firmware. 16x 146GB 10K SAS, 25x 300GB 10K SAS. 2x 100GB SSD SAS, Intel 750 NVMe 400GB.
NAS2:Nas4Free 10.2.0.2.2433. HP N54L, 8GB DDR2, AMD Turion DualCore, 1Gb NIC, 2x 1TB WD Red, 1x 32Gb X25-e SSD
NAS3:Nas4Free 10.2.0.2.2235. Sun X4540, 64GB DDR2, 2x AMD Opteron QuadCore, 2x10Gb NIC, 19x 750GB SATA, Intel 750 NVMe 400GB

b0ssman
Forum Moderator
Posts: 2438
Joined: 14 Feb 2013 08:34
Location: Munich, Germany
Status: Offline

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Post by b0ssman »

Which firmware version did you flash? I think the FreeBSD driver is on version P16.
Nas4Free 11.1.0.4.4517. Supermicro X10SLL-F, 16gb ECC, i3 4130, IBM M1015 with IT firmware. 4x 3tb WD Red, 4x 2TB Samsung F4, both GEOM AES 256 encrypted.

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Post by ccie4526 »

Hmm, I thought that showed...
mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Post by b0ssman »

Then try flashing the version 14 firmware, so that it matches the driver.
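
As a rough illustration of the check being suggested here, this Python sketch pulls the firmware and driver versions out of the `mps0` boot line quoted earlier in the thread and reports whether their leading numbers differ. Treating "same leading number" as the criterion is only my reading of the advice in this thread, not an official rule, and the function name `phase_mismatch` is hypothetical.

```python
import re

# Shape of the line as quoted in this thread:
#   mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd
VERSION_RE = re.compile(
    r"Firmware: (\d+)\.\d+\.\d+\.\d+, Driver: (\d+)\.\d+\.\d+\.\d+"
)

def phase_mismatch(dmesg_line):
    """Return (firmware_major, driver_major) if they differ, else None."""
    m = VERSION_RE.search(dmesg_line)
    if m is None:
        raise ValueError("no mps firmware/driver version line found")
    fw, drv = int(m.group(1)), int(m.group(2))
    return (fw, drv) if fw != drv else None

line = "mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd"
print(phase_mismatch(line))  # (19, 14)
```

For the controller in this thread the result is a mismatch: firmware 19 against driver 14.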

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Post by ccie4526 »

Yesterday evening, the X25 timed out *again*, and it borked all of the ESXi servers that had the iSCSI target mounted. That took down the entire VM cluster for the umpteenth time. I'm done; I can't have production systems randomly dropping offline because of this. I just did a zpool remove of that drive from the pool, and I'm going to run that server without a separate log disk from now on.
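
For anyone following along, the removal above and the remove/re-add cycle from my first post come down to a couple of `zpool` commands. The Python sketch below only builds and prints them; it does not run anything. The pool name `tank` is a placeholder (the real pool name is not given in this thread), and `log_device_commands` is a hypothetical helper.

```python
import shlex

def log_device_commands(pool, device):
    """Build (not run) the commands to drop and re-add a separate log device."""
    return [
        ["zpool", "remove", pool, device],      # detach the log device
        ["zpool", "add", pool, "log", device],  # attach it again as a log vdev
        ["zpool", "status", pool],              # confirm the pool state
    ]

# "tank" is a placeholder pool name; da2 is the device from the logs above.
for argv in log_device_commands("tank", "da2"):
    print(shlex.join(argv))
```

Leaving out the second command gives the state described in this post: the pool keeps running with the ZIL on the main pool disks.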

I do have a new HP N54L that I'm going to build up as a home NAS (with N4F embedded). I'll move the X25 over to it and see what I can do at that point.

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Post by ccie4526 »

Just an update: I got that N54L up and running with the X25 as the log drive in that machine, and there have been no issues with timeout/removal thus far.
