
Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 30 Nov 2014 22:39
by ccie4526
Ok, I'm going nuts trying to figure this out.

I have a ZFS pool created with separate L2ARC and ZIL disks. At random intervals, the ZIL disk (Intel X25-E, 32GB) disappears (ZFS shows the pool as DEGRADED), and I find corresponding timeout messages in system.log:

Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_scsiio_timeout checking sc 0xffffff8002507000 cm 0xffffff8002554f00
Nov 21 08:30:38 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 736 command timeout cm 0xffffff8002554f00 ccb 0xfffffe0010ca4800
Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_alloc_tm freezing simq
Nov 21 08:30:38 mcs7835-nas kernel: mps0: timedout cm 0xffffff8002554f00 allocated tm 0xffffff800251a148
Nov 21 08:30:38 mcs7835-nas kernel: mps0: mpssas_scsiio_timeout checking sc 0xffffff8002507000 cm 0xffffff800254e5f0
Nov 21 08:30:38 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 command timeout cm 0xffffff800254e5f0 ccb 0xfffffe0010cb8800
Nov 21 08:30:38 mcs7835-nas kernel: mps0: queued timedout cm 0xffffff800254e5f0 for processing by tm 0xffffff800251a148
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 736 completed timedout cm 0xffffff8002554f00 ccb 0xfffffe0010ca4800 during recovery ioc 8048 scsi 0 state c xfer 4096
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 completed timedout cm 0xffffff800254e5f0 ccb 0xfffffe0010cb8800 during recovery ioc 804b scsi 0 state c
(da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 654 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (noperiph:mps0:0:10:0): SMID 1 abort TaskMID 736 status 0x4a code 0x0 count 2
Nov 21 08:30:42 mcs7835-nas kernel: (noperiph:mps0:0:10:0): SMID 1 finished recovery after aborting TaskMID 736
Nov 21 08:30:42 mcs7835-nas kernel: mps0: mpssas_free_tm releasing simq
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: Command timeout
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 745 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 593 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 698 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 604 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 455 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 474 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 158 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 488 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 961 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:42 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 583 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 735 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 109 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 942 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 792 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 140 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 665 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 248 terminated ioc 804b scsi 0 state c xfe
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 691 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 271 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 351 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 527 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 952 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 847 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 76 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 184 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 146 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 520 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 288 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 683 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:43 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 336 terminated ioc 804b scsi 0 state c xfer 0
Nov 21 08:30:44 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00 length 4096 SMID 656 terminated ioc 804b scsi 0 state 0 xfer 0
Nov 21 08:30:44 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00 length 131072 SMID 953 terminated ioc 804b scsi 0 state 0 xfer 0
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_alloc_tm freezing simq
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(10). CDB: 2a 00 00 04 03 00 00 01 00 00
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: CCB request aborted by the host
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): WRITE(6). CDB: 0a 04 04 00 08 00
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): CAM status: CCB request aborted by the host
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): Retrying command
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_remove_complete on handle 0x000c, IOCStatus= 0x0
Nov 21 08:30:45 mcs7835-nas kernel: mps0: mpssas_free_tm releasing simq
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): lost device - 2 outstanding, 3 refs
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): oustanding 1
Nov 21 08:30:45 mcs7835-nas kernel: (da2:mps0:0:10:0): oustanding 0
Nov 21 08:30:46 mcs7835-nas kernel: (da2:mps0:0:10:0): removing device entry
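
For anyone reading the CDB bytes in the log above, here is a minimal Python sketch that decodes the two write commands that keep timing out. The function name `decode_write_cdb` is made up for illustration, and the 512-byte block size is an assumption based on the "512 byte sectors" the drive reports when it re-attaches. It only handles WRITE(6) and WRITE(10), the two opcodes seen here.

```python
def decode_write_cdb(cdb_hex: str, block_size: int = 512) -> dict:
    """Decode a SCSI WRITE(6) or WRITE(10) CDB given as space-separated hex bytes.

    block_size is an assumption (512-byte sectors); adjust for other drives.
    """
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x0A and len(cdb) == 6:
        # WRITE(6): 21-bit LBA in the low 5 bits of byte 1 plus bytes 2-3,
        # one-byte transfer length where 0 means 256 blocks.
        name = "WRITE(6)"
        lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]
        blocks = cdb[4] or 256
    elif opcode == 0x2A and len(cdb) == 10:
        # WRITE(10): 32-bit big-endian LBA in bytes 2-5,
        # 16-bit big-endian transfer length in bytes 7-8.
        name = "WRITE(10)"
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    else:
        raise ValueError(f"unsupported CDB: {cdb_hex!r}")
    return {"command": name, "lba": lba, "blocks": blocks,
            "bytes": blocks * block_size}


if __name__ == "__main__":
    print(decode_write_cdb("0a 04 04 00 08 00"))
    print(decode_write_cdb("2a 00 00 04 03 00 00 01 00 00"))
```

Under that assumption the decoded sizes come out as 4096 and 131072 bytes, which agrees with the `length` fields the driver prints for the same commands.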
Yet interestingly enough, mere seconds later, the drive shows back up:

Nov 21 08:30:49 mcs7835-nas kernel: da2 at mps0 bus 0 scbus0 target 10 lun 0
Nov 21 08:30:49 mcs7835-nas kernel: da2: <ATA SSDSA2SH032G1GN 8860> Fixed Direct Access SCSI-6 device
Nov 21 08:30:49 mcs7835-nas kernel: da2: 300.000MB/s transfers
Nov 21 08:30:49 mcs7835-nas kernel: da2: Command Queueing enabled
Nov 21 08:30:49 mcs7835-nas kernel: da2: 30517MB (62500000 512 byte sectors: 255H 63S/T 3890C)
Nov 21 08:30:49 mcs7835-nas kernel: ses0: pass2,da2: SAS Device Slot Element: 1 Phys at Slot 0
Nov 21 08:30:49 mcs7835-nas kernel: ses0:  phy 0: SATA device
Nov 21 08:30:49 mcs7835-nas kernel: ses0:  phy 0: parent 5001438022fbc726 addr 5001438022fbc70a
Needless to say, I have to remove the disk and re-add it as the log disk for the zpool. It then runs fine for a random period of time (sometimes days, sometimes weeks) before it drops out again.
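
To keep track of how often this happens, something like the following Python sketch could summarize each incident from system.log. It is only a rough illustration: the function name `summarize_dropout`, the regular expression, and the matched substrings are assumptions modelled on the log lines quoted in this post, and the year has to be supplied because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Matches lines such as "Nov 21 08:30:38 mcs7835-nas kernel: <message>".
LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def summarize_dropout(lines, device="da2", year=2014):
    """Count timeout/terminated messages for a device and measure how long
    it was gone between 'removing device entry' and the next attach."""
    summary = {"timeouts": 0, "terminated": 0, "gap_seconds": None}
    removed_at = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines without the expected syslog prefix
        stamp, msg = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if f"({device}:" in msg:
            if "command timeout" in msg:
                summary["timeouts"] += 1
            if " terminated " in msg:
                summary["terminated"] += 1
            if "removing device entry" in msg:
                removed_at = when
        elif msg.startswith(f"{device} at ") and removed_at is not None:
            # First attach message after the removal closes the gap.
            if summary["gap_seconds"] is None:
                summary["gap_seconds"] = (when - removed_at).total_seconds()
    return summary


if __name__ == "__main__":
    with open("/var/log/system.log") as log:  # path is an assumption
        print(summarize_dropout(log))
```

Fed the two excerpts above, the gap comes out as 3 seconds (removal at 08:30:46, re-attach at 08:30:49).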

I am running 9.2.0.1 (rev 972).

Controller is an LSI9211-8i with IT firmware for JBOD:

mps0: <LSI SAS2008> port 0x4000-0x40ff mem 0xfbef0000-0xfbef3fff,0xfbe80000-0xfbebffff irq 26 at device 0.0 on pci20
mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Ideas what I can do to get this resolved? I really hate having my ZIL disk randomly disappearing.

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 01 Dec 2014 06:41
by b0ssman
Which firmware version did you flash? I think the FreeBSD driver is on version p16.



Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 04 Dec 2014 00:06
by ccie4526
Hmm, I thought that showed...
mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 04 Dec 2014 10:20
by b0ssman
Try updating to the 14 firmware then, so that it matches the driver version.
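
The comparison behind that suggestion can be sketched in a few lines of Python. This is only an illustration: the helper names `parse_mps_versions` and `same_major` are invented here, and the idea that the firmware and driver major numbers ought to match is the suggestion made in this thread, not something this snippet verifies.

```python
import re

# Matches the boot message "mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd".
VERSION_RE = re.compile(r"Firmware: ([\d.]+), Driver: ([\d.]+)")


def parse_mps_versions(line: str):
    """Return (firmware, driver) as tuples of ints from an mps boot line."""
    match = VERSION_RE.search(line)
    if match is None:
        raise ValueError("no firmware/driver versions found in line")
    firmware, driver = (
        tuple(int(part) for part in text.split(".")) for text in match.groups()
    )
    return firmware, driver


def same_major(firmware, driver) -> bool:
    """True when the leading version numbers of firmware and driver agree."""
    return firmware[0] == driver[0]


if __name__ == "__main__":
    fw, drv = parse_mps_versions(
        "mps0: Firmware: 19.00.00.00, Driver: 14.00.00.01-fbsd"
    )
    print(fw, drv, same_major(fw, drv))
```

For the line posted above this gives firmware 19 against driver 14, so the check reports a mismatch.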

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 05 Dec 2014 14:35
by ccie4526
Yesterday evening, the X25 timed out *again*, and it borked all of the ESXi servers that had the iSCSI target mounted. It took down the entire VM cluster for the umpteenth time. I'm done; I can't have production systems randomly dropping offline because of this. I just did a zpool remove of that drive from that array, and I'm going to run that array server without it from now on.

I do have a new HP N54L that I'm going to build up as a home NAS (with N4F embedded). I'll move the X25 over to it and see what I can do at that point.

Re: Drive timeout and removal (LSI9211 & Intel X25-E)

Posted: 27 Dec 2014 20:17
by ccie4526
Just an update: I got that N54L up and running with the X25 as a log drive in that machine, and there have been no issues with timeout/removal thus far.