- Lenovo RS110 with 8 GB RAM and a dual-core Xeon
- ARC-1320-4i4X controller
- SansDigital TR4X+ Enclosure
- 4x Western Digital WD30EFRX drives
- 2x Western Digital WD1003FBYX drives
The EXTERNAL pool uses RAID-Z2 and the INTERNAL pool is a ZFS mirror.
I have several ZFS volumes and datasets on each pool.
Every so often, without warning, the volumes and datasets on EXTERNAL stop responding. Any attempt to access them hangs, and so does the web interface. INTERNAL is unaffected and keeps working normally. The last entries in the log are:
```
Jul 28 15:55:49 <user.crit> storage kernel: arcsas: Completion Q Entry=0x30177, Slot No.=0x177, Status_Buff.Err_Info=0x00000000,01000000, INT status=0x1
Jul 28 15:55:49 <user.crit> storage kernel: Device 0x1 Task file error, Status Reg=0x51, Error Reg=0x40.
Jul 28 15:55:49 <user.crit> storage kernel: AbortReq reset command 0xffffff8139115720: Reset pPort(0x1) pCCB->EntryIndex(0x1) Slot(0x179)
Jul 28 15:55:49 <user.crit> storage kernel: arcsas_cmd_done: target=0x1, lun=0x0, SCSI Command=0x28,0x0,0x15,0x35,0xfd,0x10,0x0,0x0,0x38,0x0,cmd_status=0x208, scsi_status=0x0, ccb_status=0x6
Jul 28 15:55:49 <user.crit> storage kernel: AbortReq reset command 0xffffff8139191aa0: Reset pPort(0x1) pCCB->EntryIndex(0x1) Slot(0x17b)
Jul 28 15:55:49 <user.crit> storage kernel: arcsas_cmd_done: target=0x1, lun=0x0, SCSI Command=0x2a,0x0,0x7c,0x90,0x91,0xe0,0x0,0x0,0x8,0x0,cmd_status=0x208, scsi_status=0x0, ccb_status=0x6
Jul 28 15:55:49 <user.crit> storage kernel: AbortReq reset command 0xffffff81391aadc0: Reset pPort(0x1) pCCB->EntryIndex(0x1) Slot(0x180)
Jul 28 15:55:49 <user.crit> storage kernel: arcsas_cmd_done: target=0x1, lun=0x0, SCSI Command=0x2a,0x0,0x7c,0x90,0x91,0xd8,0x0,0x0,0x8,0x0,cmd_status=0x208, scsi_status=0x0, ccb_status=0x6
Jul 28 15:55:49 <user.crit> storage kernel: arcsas: Target=0x 1, lun=0, GONE!!!
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): lost device - 4 outstanding, 3 refs
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): oustanding 3
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): oustanding 2
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): oustanding 1
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): READ(10). CDB: 28 00 7c 8c 38 d0 00 00 28 00
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): CAM status: SCSI Status Error
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): SCSI status: Check Condition
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): SCSI sense: RECOVERED ERROR asc:0,0 (No additional sense information)
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): Info: 0x7c8c38d0
Jul 28 15:55:49 <user.crit> storage kernel: (da1:arcsas0:0:1:0): oustanding 0
```
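The CDBs in the log show which blocks were in flight on da1 when it dropped. In a READ(10) (opcode 0x28) or WRITE(10) (opcode 0x2A) CDB, bytes 2 to 5 hold the big-endian LBA and bytes 7 and 8 the transfer length in blocks. Decoding the `READ(10). CDB` line gives LBA 0x7c8c38d0, the same value reported in the `Info: 0x7c8c38d0` sense field. The `decode_cdb10` function below is a hypothetical helper sketched for this purpose, not part of FreeBSD or the arcsas driver; it assumes a plain POSIX shell.

```sh
#!/bin/sh
# Hypothetical helper: decode a 10-byte SCSI READ(10)/WRITE(10) CDB.
# Accepts bytes separated by spaces or commas, with or without a 0x prefix,
# so it handles both the "CDB: 28 00 7c ..." form and the
# "SCSI Command=0x28,0x0,..." form that appear in the log.
decode_cdb10() {
    # Normalise: commas -> spaces, then strip any 0x prefixes.
    bytes=$(printf '%s\n' "$1" | tr ',' ' ' | sed 's/0x//g')
    set -- $bytes
    if [ $# -lt 10 ]; then
        echo "decode_cdb10: expected 10 bytes, got $#" >&2
        return 1
    fi
    case $1 in
        28) op="READ(10)" ;;
        2a|2A) op="WRITE(10)" ;;
        *) op="opcode 0x$1" ;;
    esac
    # CDB byte 0 is $1, so LBA bytes 2..5 are $3..$6
    # and transfer-length bytes 7..8 are $8..$9.
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    len=$(( (0x$8 << 8) | 0x$9 ))
    printf '%s lba=0x%x len=%d\n' "$op" "$lba" "$len"
}
```

For example, `decode_cdb10 "0x2a,0x0,0x7c,0x90,0x91,0xe0,0x0,0x0,0x8,0x0"` prints `WRITE(10) lba=0x7c9091e0 len=8`, showing that the aborted writes targeted the same region of the disk as the failed read.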
```
storage: ~ # zpool status
  pool: external1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        external1   ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da2     ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       3    19     0
            da3     ONLINE       0     0     0

errors: No known data errors
```
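To pick out the affected disk in longer `zpool status` output, the device lines with non-zero error counters can be filtered with awk. This is a minimal sketch: `zpool_errors` is a hypothetical name, and it assumes the usual NAME STATE READ WRITE CKSUM column order shown above.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool status` output on stdin and print
# "NAME READ WRITE CKSUM" for every device whose counters are not all zero.
zpool_errors() {
    awk '
        BEGIN { n = "^[0-9][0-9.]*[KMGTPE]?$" }   # plain or abbreviated counts
        # Device lines: NAME STATE READ WRITE CKSUM [optional note].
        # The header line is skipped because "READ" does not match n.
        NF >= 5 && $3 ~ n && $4 ~ n && $5 ~ n {
            if ($3 != "0" || $4 != "0" || $5 != "0") print $1, $3, $4, $5
        }
    '
}
```

Run as `zpool status external1 | zpool_errors`, it prints `da1 3 19 0` for the output above.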
What could be happening?
How can I fix this without rebooting?


