[SOLVED] ZFS can't mount file system any more- help!

jjbinx
NewUser
NewUser
Posts: 8
Joined: 28 Jan 2013 22:27
Status: Offline

#1

Post by jjbinx » 28 Jan 2013 22:41

I've been happily running NAS4Free 9.1.0.1 - Sandstorm for about a month now. Everything was working fine until yesterday.

Here's how it is set up:

HP ProLiant server with a 250GB main drive, which it boots from.
3 x 3TB Seagate hard drives (ST3000DM001-1CH166), which the system names ada1, ada2, ada3
8GB RAM (replacing the 2GB which came with it)
RAIDZ1 (which I assume is the equivalent of RAID5)
ZFS

Yesterday the drive was no longer accessible. I am now getting this error message under Disks|ZFS|Pools|Management:

Health: FAULTED

Code:

 pool: Lisi
 state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
	replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

	NAME                      STATE     READ WRITE CKSUM
	Lisi                      UNAVAIL      0     0     0
	  raidz1-0                UNAVAIL      0     0     0
	    4840076910066282584   UNAVAIL      0     0     0  was /dev/ada1
	    raid5/Miles           ONLINE       0     0     0
	    14966468137362153201  UNAVAIL      0     0     0  was /dev/ada3

I'm unable to figure out how to solve the problem. Below is the complete output from dmesg:

Code:

$ dmesg
Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #0 r244224M: Fri Dec 14 19:53:48 JST 2012
    aoyama@nas4free.local:/usr/obj/nas4free/usr/src/sys/NAS4FREE-amd64 amd64
CPU: AMD Turion(tm) II Neo N40L Dual-Core Processor (1497.54-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f63  Family = 10  Model = 6  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 3961733120 (3778 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 <Version 2.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
acpi0: <HP ProLiant> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of fee00000, 1000 (3) failed
acpi0: reservation of ffb80000, 80000 (3) failed
acpi0: reservation of fec10000, 20 (3) failed
acpi0: reservation of fed80000, 1000 (3) failed
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, d7f00000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem 0xf0000000-0xf7ffffff,0xfe8f0000-0xfe8fffff,0xfe700000-0xfe7fffff irq 18 at device 5.0 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
bge0: <HP NC107i PCIe Gigabit Server Adapter, ASIC rev. 0x5784100> mem 0xfe9f0000-0xfe9fffff irq 18 at device 0.0 on pci2
bge0: CHIP ID 0x05784100; ASIC REV 0x5784; CHIP REV 0x57841; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5784 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:9c:02:aa:67:15
ahci0: <ATI IXP700 AHCI SATA controller> port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe6ffc00-0xfe6fffff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ohci0: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fe000-0xfe6fefff irq 18 at device 18.0 on pci0
usbus0 on ohci0
ehci0: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff800-0xfe6ff8ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
ohci1: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fd000-0xfe6fdfff irq 18 at device 19.0 on pci0
usbus2 on ohci1
ehci1: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff400-0xfe6ff4ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
pci0: <serial bus, SMBus> at device 20.0 (no driver attached)
atapci0: <ATI IXP700/800 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib3: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci3: <ACPI PCI bus> on pcib3
ohci2: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fc000-0xfe6fcfff irq 18 at device 22.0 on pci0
usbus4 on ohci2
ehci2: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff000-0xfe6ff0ff irq 17 at device 22.2 on pci0
usbus5: EHCI version 1.0
usbus5 on ehci2
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb4
acpi_button0: <Power Button> on acpi0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
ctl: CAM Target Layer loaded
acpi_throttle0: <ACPI CPU Throttling> on cpu0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 10.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging disabled
iSCSI boot driver version 0.2.6
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 480Mbps High Speed USB v2.0
ugen0.1: <ATI> at usbus0
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <ATI> at usbus1
uhub1: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ugen2.1: <ATI> at usbus2
uhub2: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <ATI> at usbus3
uhub3: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <ATI> at usbus4
uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <ATI> at usbus5
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
(aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): Retrying command
(aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
uhub4: 4 ports with 4 removable, self powered
uhub0: 5 ports with 5 removable, self powered
uhub2: 5 ports with 5 removable, self powered
uhub5: 4 ports with 4 removable, self powered
uhub1: 5 ports with 5 removable, self powered
uhub3: 5 ports with 5 removable, self powered
ugen0.2: <vendor 0x1267> at usbus0
ums0: <vendor 0x1267 PS2+USB Mouse, class 0/0, rev 1.10/0.01, addr 2> on usbus0
ums0: 3 buttons and [XYZ] coordinates ID=0
ugen0.3: <Logitech> at usbus0
ukbd0: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on usbus0
kbd0 at ukbd0
uhid0: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on usbus0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <VB0250EAVER HPG7> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 238475MB (488397168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 11699497 Hz quality 800
Trying to mount root from ufs:/dev/ufsid/50e0efb5bfee6516 [rw]...
GEOM_RAID5: registered shutdown event handler.
GEOM_RAID5: Miles: device created (stripesize=131072).
GEOM_RAID5: Miles: ada3(2): disk attached.
GEOM_RAID5: Miles: ada2(1): disk attached.
GEOM_RAID5: Miles: ada1(0): disk attached.
GEOM_RAID5: Miles: activated.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
pid 1836 (syslogd), uid 0: exited on signal 11
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.
ZFS WARNING: Unable to attach to ada1.
ZFS WARNING: Unable to attach to ada3.

I'm worried now that I've lost the contents of the drives. Any assistance would be much appreciated!
Last edited by al562 on 31 Jan 2013 05:22, edited 1 time in total.
Reason: Added [SOLVED] Tag.

al562
Advanced User
Advanced User
Posts: 221
Joined: 12 Dec 2012 08:02
Location: New Jersey, U.S.A.
Contact:
Status: Offline

Re: ZFS can't mount file system any more- help!

#2

Post by al562 » 28 Jan 2013 23:05

Hi Jjbinx,

I'd like to help, but I need to know the whole story.
It's obvious from the information you've posted these drives have been moved around.
It looks like you have a working SoftRAID5 using some of the drives you'd need for your pool.
Without knowing exactly what you did and why it will be almost impossible to recover your data.
Please post more information.

Thanks,
Al

Re: ZFS can't mount file system any more- help!

#3

Post by jjbinx » 28 Jan 2013 23:10

Hi Al,

The drives had not been moved around at all. The server sits in a Comms cabinet; it was working one day and had stopped the next. Just before I posted on here I did remove the drives so I could take out one of the memory sticks, in case that had something to do with it, but the drives were put back in the same order and the error message remains the same.

Interestingly when I go into Status | Disks all the drives appear to be online and working and in the same order:

Code:

Disk	Size	Description	Device model	Serial number	File system	I/O statistics	Temperature	Status
ada0	238476MB	VB0250EAVER HPG7 	VB0250EAVER 	Z2AXB6A5 	UFS 	115.20 KiB/t, 6 tps, 0.63 MiB/s 	35 °C 	ONLINE 
ada1	2861589MB	ada1 	ST3000DM001-1CH166 	W1F1QT4P 	ZFS storage pool device 	83.20 KiB/t, 0 tps, 0.02 MiB/s 	35 °C 	ONLINE 
ada2	2861589MB	ada2 	ST3000DM001-1CH166 	W1F1PBV6 	ZFS storage pool device 	106.78 KiB/t, 0 tps, 0.03 MiB/s 	33 °C 	ONLINE 
ada3	2861589MB	ada3 	ST3000DM001-1CH166 	W1F1TA1C 	ZFS storage pool device 	121.25 KiB/t, 0 tps, 0.03 MiB/s 	32 °C 	ONLINE 
Miles	5723177MB	Software RAID 	n/a 	n/a 	ZFS storage pool device 	n/a 	n/a 	COMPLETE

If there's anything else you need to see please ask, I'll be here for some time :(

Re: ZFS can't mount file system any more- help!

#4

Post by jjbinx » 28 Jan 2013 23:38

I've just realised something - there was a power failure at some point over the weekend.

Re: ZFS can't mount file system any more- help!

#5

Post by al562 » 29 Jan 2013 00:22

Hi Jjbinx,

OK, you posted about the power failure as I was typing this. That makes it even more important to answer the questions below, especially about what happened immediately after booting from the power failure.
jjbinx wrote:The drives had not been moved around at all.
This server was not set up a month ago and then left untouched. Your dmesg indicates the drives have been moved big time:

Code:

ada0: Previously was known as ad4
ada1: Previously was known as ad6
ada2: Previously was known as ad8
ada3: Previously was known as ad10

This indicates a motherboard/controller addition or change to me.
jjbinx wrote:I did remove the drives . . . . but the drives were put back in the same order,
OK, I'll take your word for it, but those drives are not in the same state they were in when the server was first built. So someone else must have changed things around.

Interestingly your dmesg shows a SoftRAID5 apparently built with the same members of your pool. I had no idea this was even possible nor can I think of a way to deliberately do it.

Code:

    GEOM_RAID5: Miles: device created (stripesize=131072).
    GEOM_RAID5: Miles: ada3(2): disk attached.
    GEOM_RAID5: Miles: ada2(1): disk attached.
    GEOM_RAID5: Miles: ada1(0): disk attached.
    GEOM_RAID5: Miles: activated.
    ZFS WARNING: Unable to attach to ada1.
    ZFS WARNING: Unable to attach to ada3.
    ZFS WARNING: Unable to attach to ada1.
    ZFS WARNING: Unable to attach to ada3.
    ZFS WARNING: Unable to attach to ada1.
    ZFS WARNING: Unable to attach to ada3.

When I build a SoftRAID5 it does not look like this. When I build a Raidz it does not look like this.

On top of all that it looks like you have at least one bad drive, or cable, connector, port, controller, possibly BIOS settings issue:

Code:

    (aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): Retrying command
    (aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted

If this is your OS drive it is a big problem.
jjbinx wrote: If there's anything else you need to see please ask
We need to know exactly what you did and why. I'm not sure you understand the amount of effort it will take to troubleshoot something like this so allow me to be more specific.
jjbinx wrote:it was working one day and had stopped the next
So you went home one night and it was running. You came back in the next day and it was turned off? What exactly happened? What state did you find it in? If it was still running, were there any errors on the screen? If it was still running, how did you reset it? Did you cold boot it? What errors appeared on the screen when you cold booted it? Were there any errors in the log?
jjbinx wrote:to remove one of the memory sticks in the system in case that had something to do with it
Why did you suspect RAM? What troubleshooting did you do that indicated RAM was a problem? So it had 8GB of RAM; how much did you remove and how much is it running with now? Were you always getting those CAM errors or are they new?

What did you configure? A RAIDz or a SoftRAID5? Were any changes made to the configuration later? If so, what? As far as I can tell right now, you have a possibly fixable SoftRAID5 and a Raidz with 2 drives out of three failed, which means it's about as recoverable as the space shuttle Challenger. Unless we get some more information that better explains what you've posted, I am not sure what to try to recover your data.

Regards,
Al

Re: ZFS can't mount file system any more- help!

#6

Post by jjbinx » 29 Jan 2013 01:00

Hi Al,

Thanks for your continued patience and very comprehensive reply. I'll try to answer all of your questions/points one by one, please let me know if I miss anything out.

The server belongs to a friend of mine, I work in IT but have little or no experience with BSD or NAS devices.

When we first installed NAS4Free we tried to build a RAID5 setup but had no joy in getting it to work. We did a bit of reading online and realised we should have configured a ZFS-based Raidz1 which we assumed would be similar to and superior to RAID5. We thought we had deleted all traces of our original RAID5 setup and proceeded to configure the server as Raidz1 using the 3 x 3TB drives. Apparently some of those RAID5 settings are lurking. It hadn't noticeably affected the server until this point. We set up the file structure under the ZFS Raidz1 pool we called Lisi.

Code:

ada0: Previously was known as ad4
ada1: Previously was known as ad6
ada2: Previously was known as ad8
ada3: Previously was known as ad10

We decided to call the drives ada1, ada2, and ada3 using the config/web interface. Since NAS4Free was installed the drives had never been removed until today. The server is kept in a locked Comms cabinet and the door of the microserver has a lock on it. We only removed the drives to get to the motherboard so we could remove a stick of RAM (we also took out a redundant graphics card).

The reason why we removed the RAM is because we looked at the Dmesg log and found a mention of "pid 1836 (syslogd), uid 0: exited on signal 11" which a friend had suggested could be caused by faulty memory (a segmentation fault).

The server had 8GB of RAM and now has 4GB, although if the cause of this is not memory-related we'd rather reinstall the memory and bring it back to 8GB.

We only know that there was a power failure because in the same room is an Acer NAS that has to be switched on manually after a power cut. The HP server (this one) had restarted itself but we didn't know it had lost the RAID until we tried to access the drive some time later.

Code:

(aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): Retrying command
    (aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
    (aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
    (aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
    (aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted

This bit is baffling. The server boots just fine from the smaller hard drive, and we can access the web interface OK. The server is an HP ProLiant Turion II N40L MicroServer http://h18004.www1.hp.com/products/quic ... 6_div.HTML

The BIOS indicates the SATA mode is set to AHCI - is that the correct setting? I've no idea if these error messages have always been present as it never occurred to us to check dmesg until we had the recent problems.

There definitely has not been a change of motherboard or controller. It's pretty much straight out of the box, except for a memory upgrade, the additional hard drives and a graphics card which we realised was not needed and have now removed.

The hard drives all appear to be working and also appear to be recognised by the system, hence our suspicion that something had gone wrong with the ZFS RAID setup.

Re: ZFS can't mount file system any more- help!

#7

Post by al562 » 29 Jan 2013 05:00

Hi Jjbinx,

OK, I can see how the data you've posted matches what you say you did. I can't recreate it exactly.
I hope you have a backup of the data since I doubt there's any other way to recover what was there.

It looks like you somehow managed to create a ZFS pool with RAID drives. This pool now shows 2 disks are gone. If you could somehow replace those 2 disks the pool would be OK. Problem is, it appears the 2 disks are there but not recognized. Furthermore it appears all disks are recognized as part of a SoftRAID5. If you try to fix the pool, you will probably break the RAID5. I have consulted with some of the other moderators and they have not come up with any miracles yet.

If I were you I'd rebuild the server from scratch making sure there were no errors and all the drives and RAM are good. I'd build a standard Raidz and restore my data from backup.

As far as I am concerned any advice I give you has the potential of destroying more data than it will recover. I am happy to give it to you, but only as long as you agree/realize there is almost no chance of getting this to work again.

You can wait a day or two and see if anyone else comes up with a better idea.

Let me know how you'd like to proceed.

Regards,
Al

raulfg3
Site Admin
Site Admin
Posts: 4978
Joined: 22 Jun 2012 22:13
Location: Madrid (ESPAÑA)
Contact:
Status: Offline

Re: ZFS can't mount file system any more- help!

#8

Post by raulfg3 » 29 Jan 2013 07:59

You need to check that your BIOS is properly configured and DOES NOT HAVE any BIOS (soft) RAID configured.

You say that the disks are configured as AHCI, which is correct, but perhaps some other BIOS parameter refers to soft RAID.
12.0.0.4 (revision 6766)+OBI on SUPERMICRO X8SIL-F 8GB of ECC RAM, 12x3TB disk in 3 vdev in RaidZ1 = 32TB Raw size only 22TB usable


Re: ZFS can't mount file system any more- help!

#9

Post by jjbinx » 29 Jan 2013 17:32

al562 wrote:Hi Jjbinx,
As far as I am concerned any advice I give you has the potential of destroying more data than it will recover. I am happy to give it to you, but only as long as you agree/realize there is almost no chance of getting this to work again.
Al
Thanks, Al. I think we're pretty much at that state right now. Fortunately we do have some of the most crucial data backed up elsewhere, but there's still plenty of data on the drives that we'd like to get back if at all possible. I think we're at the stage of "Let's try something more radical and if it doesn't work then we're no worse off".

One random idea we had, which probably won't work, was: could we somehow delete the Raid and then re-add/import it, or does the act of deleting the Raid also permanently delete all the data associated with it? Would it be possible to install a different OS on the machine and see if we could somehow read the data off the drives?

I'm not at the machine at the moment, so I'll double-check later, but to the best of my knowledge the BIOS is set to AHCI and not RAID.

So, happy to try out any ideas you might have.

Many thanks

fsbruva
Advanced User
Advanced User
Posts: 383
Joined: 21 Sep 2012 14:50
Status: Offline

Re: ZFS can't mount file system any more- help!

#10

Post by fsbruva » 29 Jan 2013 17:48

As far as you know, the zfs pool was the recipient of the data, correct? Additionally, can you download the config and find both the zfs and softraid sections?

To the experts:
Could it be as simple as using dd to wipe only the GEOM metadata? It appears to me that the GEOM raid is getting populated earlier in the boot sequence than the zfs pool, possibly because there is still metadata there? Or enough that it thinks the array is present?

Re: ZFS can't mount file system any more- help!

#11

Post by al562 » 29 Jan 2013 19:07

Hi Guys,

@ Raulfg3 - I agree, the only way I can think this would happen is if a RAID controller provided multiple instances of the drives. I certainly have not been able to reproduce this, but I have no RAID controller available. I see an ATI SATA controller (4 ports) and an ATI PATA controller (2 ports) in the dmesg; these look like they are built into the motherboard.

@ Fsbruva -
fsbruva wrote:the zfs pool was the recipient of the data, correct? Additionally, can you download the config and find both the zfs and softraid sections?
This is my understanding too. That may be a good way to approach this, especially if we assume a RAID controller provided duplicate drives. If this is the case then SoftRAID Metadata was written first. Part of that Metadata had to be overwritten/corrupted when the ZFS pool was created. In this scenario the pool would theoretically be where the data resides, therefore if we can recover the pool we may be able to recover the data.
fsbruva wrote:Could it be as simple as using dd to wipe only the GEOM metadata?
There is a possibility, but I would first employ all possible means that do not write to the drives. There is too much risk of doing more damage, especially if it is a hardware RAID. This line of thought does lead to some avenues I had not explored.

@ Jjbinx - Can you tell us if you are sure the RAID5 is SoftRAID and not Hardware RAID? I'd like to make sure you created it using the WebGUI and not the controller's firmware. Please go to the WebGUI and get a status.php. Like this:

Code:

http://nnn.nnn.nnn.nnn/index.php

Replace "index" with "status" in your address bar:

Code:

http://nnn.nnn.nnn.nnn/status.php

Then press Enter.
You will get a complete report on the system. Save it as "HTML complete" or "HTML single page", then zip it and attach it to your next post, please. We are going to want to see what your config looks like.

@ Raulfg3 & Fsbruva - Remember that the OS drive may be generating errors, and the damage could be the result of both this and the power outage.
I think it is worth booting from LiveCD or LiveUSB; this removes the config and OS drive from the equation and may allow an attempt at repairing the pool. The question I need you guys to think about is how to try to recover the pool if that appears possible. Remember, it now looks like 2 drives are toast.

@ Jjbinx - After you get the information requested above, please try to boot the server from LiveCD or Live USB and then post the results of the following:

Code:

zpool list
zpool status
zpool history Lisi

Give us some time to review these results and recommend further actions. Let us know if there are problems booting from LiveCD/USB.

Regards,
Al

Re: ZFS can't mount file system any more- help!

#12

Post by fsbruva » 30 Jan 2013 14:53

al562 wrote: There is a possibility, but I would employ all means possible that do not write the drives first. There is too much risk of doing more damage, especially if it is a Hardware RAID.
Well, complete deletion of the GEOM data is not necessarily what I meant. If there were a safe place to write data, dd could be used to make a backup of the GEOM data first (I do this every time I have been pressed into a corner and need to wipe out a partition table on my MacBook). So, if a USB drive could be mounted, the first dd operation would have if= the GEOM sectors and of= some file. Then another dd operation would have if=/dev/zero and of= the GEOM sectors. This way, the writing of zeros can be undone.
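
For anyone following along, the backup-then-zero idea can be rehearsed safely before going anywhere near a real disk. The sketch below is only an illustration under stated assumptions: it works on a small scratch image file rather than /dev/adaN, it assumes 512-byte sectors (as dmesg reports for these drives) and that the metadata sits in the last sector, and the file names and the marker string are made up for the demo.

```shell
#!/bin/sh
# Sketch only: rehearses the backup / zero / restore cycle on a scratch
# image file instead of a real disk. All names below are hypothetical.
set -eu

SECTOR=512                       # sector size dmesg reports for ada1..ada3
WORK=$(mktemp -d)                # scratch directory standing in for the USB stick
IMG="$WORK/fake-disk.img"        # stand-in for /dev/adaN
BACKUP="$WORK/last-sector.bak"   # where the saved sector goes

# Build a 64-sector fake disk and plant a marker in its last sector.
dd if=/dev/zero of="$IMG" bs="$SECTOR" count=64 2>/dev/null
printf 'GEOM-METADATA-MARKER' | dd of="$IMG" bs="$SECTOR" seek=63 conv=notrunc 2>/dev/null

# Index of the last sector = size in bytes / sector size - 1.
SIZE=$(wc -c < "$IMG")
LAST=$(( SIZE / SECTOR - 1 ))

# 1) Back up the last sector to a file.
dd if="$IMG" of="$BACKUP" bs="$SECTOR" skip="$LAST" count=1 2>/dev/null

# 2) Overwrite only that sector with zeros (conv=notrunc keeps the file length).
dd if=/dev/zero of="$IMG" bs="$SECTOR" seek="$LAST" count=1 conv=notrunc 2>/dev/null
AFTER_WIPE=$(dd if="$IMG" bs="$SECTOR" skip="$LAST" count=1 2>/dev/null | tr -d '\000' | wc -c)

# 3) Undo: write the saved sector back to the same position.
dd if="$BACKUP" of="$IMG" bs="$SECTOR" seek="$LAST" count=1 conv=notrunc 2>/dev/null
RESTORED=$(dd if="$IMG" bs="$SECTOR" skip="$LAST" count=1 2>/dev/null | tr -d '\000')

echo "last sector index: $LAST"
echo "non-zero bytes after wipe: $AFTER_WIPE"
echo "restored content: $RESTORED"

rm -rf "$WORK"
```

Against a real disk the same skip= and seek= arithmetic would apply, with the backup file kept on separate media. Whether the kernel permits the write while GEOM has the disk open is a separate question that this sketch does not address.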
al562 wrote: @ Raulfg3 & Fsbruba - Remember that the OS drive may be generating errors and the damage could be the result of both this and the power outage.
I am thinking it is worth booting from LiveCD or LiveUSB, this removes the config and OS drive from the equation and may allow an attempt at repairing the pool. The question I need you guys to think about is how to try and recover the pool if it appears possible, remember it now looks like 2 drives are toast.
I agree with the LiveCD suggestion. Furthermore, a dmesg dump from the live media would certainly help with narrowing down problem areas. However, if the boot process of the live CD results in similar behavior, I also wonder if we might be able to add the disks safely after boot. Given that the NAS was able to function when it thought it had a ZFS pool, I think the "devices could not be opened" error is a result of graid5 getting hold of the disks earlier than ZFS can.

Re: ZFS can't mount file system any more- help!

#13

Post by al562 » 30 Jan 2013 17:16

Hi Guys,
fsbruva wrote:if a USB drive could be mounted, then the dd operation could have if= the GEOM sectors, and of= some file. Then, another dd operation could have if=/dev/zero and of= the GEOM sectors. This way, the writing of zeros can be undone.
If Jjbinx is comfortable doing this then I would try it after we see the results of booting with LiveCD/USB. I've never attempted to erase just the GEOM metadata, but the good thing is that it's only in the last sector, while ZFS writes 4 copies of its label (2 at the start of the disk and 2 at the end). It would be best to determine the exact location of the GEOM data and only back up/remove that, but it could be as easy as just wiping out the last sector; ZFS should be capable of surviving that and still find the disks and use them.
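
To put rough numbers on that, here is a small arithmetic sketch using the sector count dmesg reports for each 3TB drive. It touches no disk. The 256 KiB label size and the rounding of the device size down to a label boundary are my assumptions about the ZFS on-disk layout, so treat the label offsets as an estimate rather than something read from these drives.

```shell
#!/bin/sh
# Sketch only: pure arithmetic, no disk access.
SECTORS=5860533168          # per-disk sector count from the dmesg above
SECTOR_SIZE=512             # bytes per sector, also from dmesg
LABEL=$(( 256 * 1024 ))     # assumed ZFS label size (256 KiB)

DISK_BYTES=$(( SECTORS * SECTOR_SIZE ))
LAST_SECTOR=$(( SECTORS - 1 ))
LAST_SECTOR_OFFSET=$(( LAST_SECTOR * SECTOR_SIZE ))

# Assumption: ZFS rounds the device size down to a multiple of the label
# size before placing the two trailing labels.
ALIGNED=$(( DISK_BYTES / LABEL * LABEL ))
L2_START=$(( ALIGNED - 2 * LABEL ))   # third label (first of the trailing pair)
L3_START=$(( ALIGNED - LABEL ))       # fourth label (ends at ALIGNED)

# Does the disk's last sector fall inside the fourth label?
if [ "$LAST_SECTOR_OFFSET" -ge "$L3_START" ] && [ "$LAST_SECTOR_OFFSET" -lt "$ALIGNED" ]; then
    INSIDE_L3=yes
else
    INSIDE_L3=no
fi

echo "disk size (bytes):       $DISK_BYTES"
echo "last sector (LBA):       $LAST_SECTOR"
echo "last sector byte offset: $LAST_SECTOR_OFFSET"
echo "trailing labels start:   $L2_START and $L3_START"
echo "last sector inside L3:   $INSIDE_L3"
```

Either way, the two labels at the start of the disk are nowhere near the last sector, which is the point being made above.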
Let's wait and see what the results are just booting LiveCD/USB.

Regards,
Al

fsbruva
Advanced User
Posts: 383
Joined: 21 Sep 2012 14:50
Status: Offline

Re: ZFS can't mount file system any more- help!

#14

Post by fsbruva » 30 Jan 2013 19:13

I think GEOM data is in the last sector, from what I can read of the geom_raid5.c file:
"This is the automatic method, where metadata are stored in every device's last sector."

@JJbinx, what are the results of a liveCD boot?
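To put a number on "every device's last sector": a small arithmetic sketch, assuming the 5860533168 sectors of 512 bytes that dmesg reports for the 3 TB drives in this thread, and a shell with 64-bit arithmetic. The variable names are just for illustration.

```shell
#!/bin/sh
# Sector count and sector size as reported by dmesg for the 3 TB drives:
#   5860533168 512 byte sectors
SECTORS=5860533168
SECTOR_SIZE=512

# Sectors are numbered from 0, so the last one is SECTORS - 1.
LAST_LBA=$(( SECTORS - 1 ))

# Byte offset at which the last sector starts (needs 64-bit shell arithmetic).
LAST_OFFSET=$(( LAST_LBA * SECTOR_SIZE ))

echo "last sector (LBA): $LAST_LBA"
echo "starts at byte:    $LAST_OFFSET"
```

These are the values a dd `skip=`/`seek=` (in 512-byte blocks) would need in order to touch only that sector.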

jjbinx
NewUser
Posts: 8
Joined: 28 Jan 2013 22:27
Status: Offline

Re: ZFS can't mount file system any more- help!

#15

Post by jjbinx » 30 Jan 2013 21:46

Hi everyone,

Here is the file you requested - status report in .zip format.

I didn't use a live CD as such; I installed the OS using a memory stick. I'm just trying to locate it and I'll report back once I've found it. I'll respond to your other questions shortly too!

Regards

jjbinx
NewUser
Posts: 8
Joined: 28 Jan 2013 22:27
Status: Offline

Re: ZFS can't mount file system any more- help!

#16

Post by jjbinx » 30 Jan 2013 22:31

Right, I've booted from the live CD/image via USB and then pressed option 6 to go into a shell.

nas4free:~# zpool list
no pools available

nas4free:~# zpool status
no pools available

nas4free:~# zpool history Lisi
Cannot open 'Lisi': no such pool

I assume I'm getting these because I booted from the USB and it hasn't mounted the drives?

Here is the output from dmesg while it was booted from the live image (see attached).

The BIOS is still set to AHCI (not RAID). The only thing we did change is to disable ACPI in the hopes that those errors would go, but they haven't.

jjbinx
NewUser
Posts: 8
Joined: 28 Jan 2013 22:27
Status: Offline

Re: ZFS can't mount file system any more- help!

#17

Post by jjbinx » 30 Jan 2013 23:19

Hi, this is 'The Friend' here who looked at the logic of the Shut Down Messages which said something like

GEOM_RAID5: Delete MILES raid
GEOM_RAID5: Destroy MILES raid

So I assumed the dataset for that RAID (which did not have any files in it) was very small, and that on startup GEOM_RAID5 was recreating the MILES raid and grabbing the 3 HDDs named ada1, ada2, ada3.

So, being the owner of the Server, I decided with a little trepidation to go into the WebGui

|Disks|Software RAID|RAID5|Management

And I decided to Delete the MILES (named) raid and crossed all my fingers and toes.

We then rebooted the Server and held our breath.

Guess what happened - the ZFS raidz1 pool, Lisi, was restored and we can access the files in the Share, and just for good measure we played the well-known popular song by Sir Cliff Richard called 'Congratulations' - well no, we didn't do the last bit - I hate Cliff Richard !!!!

Anyhow, the system now seems to be working. What seems to have happened is that when we set everything up on 30 December 2012, the server 'latched onto' the ZFS Lisi pool. But when there was a power cut and the server was rebooted, it seems to have seen the RAID5 instructions first and allocated those drives to the MILES raid5 array instead, so when the ZFS instructions were called there were no ada1, ada2 or ada3 drives left to allocate.

So, we have a solution, but for completeness I will now pass you on to jjbinx to give you the revised dmesg file.

Thanks for all your help

Nikki
x


Here's the output from dmesg now the system is working again:

Code: Select all

Copyright (c) 1992-2012 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.1-RELEASE #0 r244224M: Fri Dec 14 19:53:48 JST 2012
    aoyama@nas4free.local:/usr/obj/nas4free/usr/src/sys/NAS4FREE-amd64 amd64
CPU: AMD Turion(tm) II Neo N40L Dual-Core Processor (1497.53-MHz K8-class CPU)
  Origin = "AuthenticAMD"  Id = 0x100f63  Family = 10  Model = 6  Stepping = 3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x837ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT,NodeId>
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 3961733120 (3778 MB)
Event timer "LAPIC" quality 400
ACPI APIC Table: <HP     ProLiant>
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s)
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
ioapic0 <Version 2.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
acpi0: <HP ProLiant> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of fee00000, 1000 (3) failed
acpi0: reservation of ffb80000, 80000 (3) failed
acpi0: reservation of fec10000, 20 (3) failed
acpi0: reservation of fed80000, 1000 (3) failed
acpi0: reservation of 0, a0000 (3) failed
acpi0: reservation of 100000, d7f00000 (3) failed
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
Event timer "HPET1" frequency 14318180 Hz quality 450
Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
acpi_timer0: <32-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
vgapci0: <VGA-compatible display> port 0xe000-0xe0ff mem 0xf0000000-0xf7ffffff,0xfe8f0000-0xfe8fffff,0xfe700000-0xfe7fffff irq 18 at device 5.0 on pci1
pcib2: <ACPI PCI-PCI bridge> irq 18 at device 6.0 on pci0
pci2: <ACPI PCI bus> on pcib2
bge0: <HP NC107i PCIe Gigabit Server Adapter, ASIC rev. 0x5784100> mem 0xfe9f0000-0xfe9fffff irq 18 at device 0.0 on pci2
bge0: CHIP ID 0x05784100; ASIC REV 0x5784; CHIP REV 0x57841; PCI-E
miibus0: <MII bus> on bge0
brgphy0: <BCM5784 10/100/1000baseT PHY> PHY 1 on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:9c:02:aa:67:15
ahci0: <ATI IXP700 AHCI SATA controller> port 0xd000-0xd007,0xc000-0xc003,0xb000-0xb007,0xa000-0xa003,0x9000-0x900f mem 0xfe6ffc00-0xfe6fffff irq 19 at device 17.0 on pci0
ahci0: AHCI v1.20 with 4 3Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ohci0: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fe000-0xfe6fefff irq 18 at device 18.0 on pci0
usbus0 on ohci0
ehci0: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff800-0xfe6ff8ff irq 17 at device 18.2 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci0
ohci1: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fd000-0xfe6fdfff irq 18 at device 19.0 on pci0
usbus2 on ohci1
ehci1: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff400-0xfe6ff4ff irq 17 at device 19.2 on pci0
usbus3: EHCI version 1.0
usbus3 on ehci1
pci0: <serial bus, SMBus> at device 20.0 (no driver attached)
atapci0: <ATI IXP700/800 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xff00-0xff0f at device 20.1 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata1: <ATA channel> at channel 1 on atapci0
isab0: <PCI-ISA bridge> at device 20.3 on pci0
isa0: <ISA bus> on isab0
pcib3: <ACPI PCI-PCI bridge> at device 20.4 on pci0
pci3: <ACPI PCI bus> on pcib3
ohci2: <AMD SB7x0/SB8x0/SB9x0 USB controller> mem 0xfe6fc000-0xfe6fcfff irq 18 at device 22.0 on pci0
usbus4 on ohci2
ehci2: <AMD SB7x0/SB8x0/SB9x0 USB 2.0 controller> mem 0xfe6ff000-0xfe6ff0ff irq 17 at device 22.2 on pci0
usbus5: EHCI version 1.0
usbus5 on ehci2
amdtemp0: <AMD CPU On-Die Thermal Sensors> on hostb4
acpi_button0: <Power Button> on acpi0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
ctl: CAM Target Layer loaded
acpi_throttle0: <ACPI CPU Throttling> on cpu0
hwpstate0: <Cool`n'Quiet 2.0> on cpu0
ZFS filesystem version 5
ZFS storage pool version 28
Timecounters tick every 10.000 msec
ipfw2 (+ipv6) initialized, divert loadable, nat loadable, rule-based forwarding disabled, default to accept, logging disabled
iSCSI boot driver version 0.2.6
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
usbus2: 12Mbps Full Speed USB v1.0
usbus3: 480Mbps High Speed USB v2.0
usbus4: 12Mbps Full Speed USB v1.0
usbus5: 480Mbps High Speed USB v2.0
ugen0.1: <ATI> at usbus0
uhub0: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <ATI> at usbus1
uhub1: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
ugen2.1: <ATI> at usbus2
uhub2: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
ugen3.1: <ATI> at usbus3
uhub3: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
ugen4.1: <ATI> at usbus4
uhub4: <ATI OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4
ugen5.1: <ATI> at usbus5
uhub5: <ATI EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5
(aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): Retrying command
(aprobe0:ahcich0:0:0:0): SETFEATURES ENABLE SATA FEATURE. ACB: ef 10 00 00 00 40 00 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): CAM status: ATA Status Error
(aprobe0:ahcich0:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 04 (ABRT )
(aprobe0:ahcich0:0:0:0): RES: 51 04 00 00 00 40 00 00 00 02 00
(aprobe0:ahcich0:0:0:0): Error 5, Retries exhausted
uhub4: 4 ports with 4 removable, self powered
uhub0: 5 ports with 5 removable, self powered
uhub2: 5 ports with 5 removable, self powered
uhub5: 4 ports with 4 removable, self powered
uhub1: 5 ports with 5 removable, self powered
uhub3: 5 ports with 5 removable, self powered
ugen0.2: <vendor 0x1267> at usbus0
ums0: <vendor 0x1267 PS2+USB Mouse, class 0/0, rev 1.10/0.01, addr 2> on usbus0
ums0: 3 buttons and [XYZ] coordinates ID=0
ugen0.3: <Logitech> at usbus0
ukbd0: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on usbus0
kbd0 at ukbd0
uhid0: <Logitech USB Multimedia Keyboard, class 0/0, rev 1.10/0.70, addr 3> on usbus0
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <VB0250EAVER HPG7> ATA-8 SATA 2.x device
ada0: 300.000MB/s transfers (SATA 2.x, UDMA5, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 238475MB (488397168 512 byte sectors: 16H 63S/T 16383C)
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
ada3: Previously was known as ad10
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 11699488 Hz quality 800
Trying to mount root from ufs:/dev/ufsid/50e0efb5bfee6516 [rw]...
pid 1829 (syslogd), uid 0: exited on signal 11
Interestingly, syslogd is still exiting, so there are no log files being generated, but that's more of a minor irritation compared to the problems we had before now. Any ideas on what could be causing this?

I'm going to re-add the extra 4 GB of RAM and re-enable ACPI in the BIOS.

A huge, huge thank you to everyone who responded, especially to Al for his patience and advice.

al562
Advanced User
Posts: 221
Joined: 12 Dec 2012 08:02
Location: New Jersey, U.S.A.
Status: Offline

Re: ZFS can't mount file system any more- help!

#18

Post by al562 » 31 Jan 2013 04:59

Hi Jjbinx and Nikki,

First, very happy to hear you have your data back :D .
jjbinx wrote:GEOM_RAID5: Delete MILES raid
GEOM_RAID5: Destroy MILES raid
These messages at shutdown are normal; the RAID goes offline and is recreated when the server is booted. I am surprised you guys just set it up and started using it without a reboot/burn-in period.
jjbinx wrote:I decided to Delete the MILES (named) raid
This would try to remove GEOM metadata on the disks and so have the same effect as we were discussing earlier though using different means. If it failed we still could have tried dd, but I'm glad it wasn't necessary.
jjbinx wrote:Anyhow, the system now seems to be working.
That means it is now backup time. Please, please, please back up your data.

After the backup there are 3 things I recommend to make sure the server is bulletproof:
  1. Try to resolve the CAM errors from the log. These are probably related to your OS drive. Try replacing it, try a different cable, anything, but do your best to get rid of them.
  2. Make sure you determine how your disk controller presents the drives to the OS. Some RAID controllers are capable of displaying the same device 2 different ways (this is the only way I can think it was possible for you to create the pool over the RAID). Determine which drives you want to use and ignore or remove the others.
  3. Since you've backed up your data, wipe your drives clean. Then properly create a new pool and copy your data back. Starting fresh like this will help minimize surprises i
jjbinx wrote:syslogd is still exiting so there's no log files being generated,
This should not happen (is it possible the CAM errors are the cause?). I don't recall ever seeing this be a problem. After you take care of the items listed above, please start a new topic according to the Rules & Guidelines if this syslog issue continues.

Glad to help and nice to know you were able to fix it,
Al
