Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

m_seitz
Starter
Posts: 28
Joined: 01 Mar 2015 15:36
Status: Offline

Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by m_seitz »

My raidZ2 pool degraded after a disk suffered too many Ultra DMA CRC errors (118376!). To make sure that it was only a faulty cable, I took the disk offline and checked it in another PC. After that, I moved the disk back into my NAS (with a new cable) and set it online in the web interface. ZFS started resilvering the disk right away.
What puzzled me was the sheer speed of resilvering. My NAS consists of 6x 4 TB disks in one pool that is at 78 % capacity. I expected the resilvering to write 78 % of data to the "replaced" disk. Instead I got this:

Code:

 scan: resilvered 16.6G in 0h56m with 0 errors on Thu Nov 19 19:54:49 2015

The web interface showed a throughput of 1 GB/s, probably not taking into account that only 16.6 GB were resilvered. But why were only 16.6 GB resilvered?
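
For scale, the figures in that status line work out to an average of roughly 5 MiB/s, nowhere near 1 GB/s. A minimal sketch of the arithmetic, assuming the "G" in the zpool output means GiB; `resilver_rate` is a made-up helper name, not part of any tool.

```sh
#!/bin/sh
# Average resilver throughput in MiB/s from gigabytes written and elapsed minutes.
# Assumption: zpool's "G" is GiB, i.e. 1024 MiB.
resilver_rate() {
    awk -v gib="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / (min * 60) }'
}

# Values from the status line above: 16.6G in 0h56m.
resilver_rate 16.6 56
```

The low average suggests the resilver spent most of its time finding the few blocks that needed rewriting rather than streaming data to the disk.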

After some research on the net, I found out that I should have formatted the disk and used the "replace" command. I used the "online" command because I was hoping that ZFS would accept the disk as the same (already containing data) instead of a new "virgin" disk. But even in that case, wouldn't ZFS have to check the data and read ~3 TB of data?
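
This matches how ZFS normally behaves: a disk that comes back via "online" is recognised by its label, and only the blocks written while it was away are resilvered, which would explain the 16.6G. A "replace" rebuilds the whole disk, and a scrub is what reads and verifies everything. The sketch below only prints the commands for each path and runs nothing; `recovery_plan`, the pool name `tank` and the disk name `ada3` are placeholders.

```sh
#!/bin/sh
# Dry-run sketch: prints the zpool commands for each recovery path, executes nothing.
recovery_plan() {
    mode=$1 pool=$2 disk=$3
    case "$mode" in
        same-disk)
            # Disk returns with its data intact: only changes made while
            # it was offline get resilvered.
            echo "zpool online $pool $disk"
            ;;
        new-disk)
            # Disk was wiped or swapped: full rebuild from the other disks.
            echo "zpool replace $pool $disk"
            ;;
        *)
            echo "usage: recovery_plan same-disk|new-disk POOL DISK" >&2
            return 1
            ;;
    esac
    # Either way, a scrub afterwards reads and verifies every block.
    echo "zpool scrub $pool"
    echo "zpool status -v $pool"
}

# Placeholder names, adjust to the real pool and device.
recovery_plan same-disk tank ada3
```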

I am running a scrub now ...
Last edited by m_seitz on 20 Nov 2015 23:37, edited 2 times in total.
My Nas
MB: Asus M5A78L-M/USB3 RAM: 32GB unbuffered ECC (4x Kingston KVR16E11/8)
CPU: AMD Phenom II X2 550 Storage: 6x 12TB HGST (crap, don't buy!) HUH721212ALN604 (raidZ2, one single pool)
PSU: Cooler Master V450S (450W) UPS: CyberPower CP1300EPFCLCD (USB via usbhid-ups)
OS: XigmaNAS 12.1 x64-embedded


Re: Infeasible resilvering time after disk replacement

Post by m_seitz »

OK, something is seriously wrong. Now another disk is reporting Ultra DMA CRC errors and the pool is degraded again.
Nice, one disk in an undefined state and another one faulted ... :-(

Could this be a dying SATA controller?

And more importantly, how can I restore my pool? Should I connect all disks to my other PC (which also has ECC RAM), boot NAS4Free and repair the pool? I am really afraid of losing my data when another disk decides to drown in Ultra DMA CRC errors :shock:

Edit: I am currently setting up a new NAS4Free with some old disks and different cables to see if they would report Ultra DMA CRC errors too ...
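
Ultra DMA CRC errors are reported in SMART attribute 199 (UDMA_CRC_Error_Count) and count transfers that were corrupted on the link between disk and controller, which is why cables, connectors and ports are the usual suspects. A rough way to compare all disks is to pull that one value out of `smartctl -A`. The helper name `udma_crc_count` and the sample row are made up for illustration (only the count is from the first post), and the device names in the comment are examples.

```sh
#!/bin/sh
# Reads "smartctl -A" output on stdin and prints the raw value of attribute 199.
# In smartctl's attribute table the raw value is the tenth column.
udma_crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Sample attribute row, fabricated except for the count quoted earlier.
sample='199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       118376'
printf '%s\n' "$sample" | udma_crc_count

# On a live system (device names are examples):
#   for d in ada0 ada1 ada2 ada3 ada4 ada5; do
#       printf '%s: ' "$d"; smartctl -A "/dev/$d" | udma_crc_count
#   done
```

If the count only climbs on disks behind one cable or port, that points at the link; if it climbs everywhere, the controller or power supply becomes more plausible.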

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by m_seitz »

In case anyone is interested: A raidZ1 with 3 old disks showed that one of the SATA cables was faulty. Since they were all of the same type, I replaced all cables.

I connected the original 6 HDDs and backed up the data of my raidZ2 pool. For some reason, the second disk that failed due to Ultra DMA CRC errors was online without me having to do anything.
After the backup, a scrub verified that the pool is error free and all HDDs are working fine.
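
For a quick check after such a scrub, `zpool status -x` prints a single line when nothing is wrong. A small sketch that tests for that line; `pools_healthy` is a made-up helper name, and the example feeds it canned text so it can be tried without a pool.

```sh
#!/bin/sh
# Succeeds when stdin is what "zpool status -x" prints on a clean system.
pools_healthy() {
    grep -q '^all pools are healthy$'
}

# Canned input here; on the NAS this would be: zpool status -x | pools_healthy
if printf 'all pools are healthy\n' | pools_healthy; then
    echo "OK"
else
    echo "pool needs attention"
fi
```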

A lot of time wasted because of some faulty cables. Strange how such a low-tech part can break after so many months, without even touching it.
At least I learned a good deal about NAS4Free, ZFS and its resilience :-)

JoseMR
Hardware & Software Guru
Posts: 1058
Joined: 16 Apr 2014 04:15
Location: PR
Status: Offline

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by JoseMR »

m_seitz wrote:Strange how such a low-tech part can break after so many months, without even touching it.
IMHO it is hard for a SATA cable to get damaged unless it is operating under extreme vibration. Under high humidity, though, the contact joints will generally degrade, not only on SATA cables but on any contact joints inside a computer, even if gold plated.
System: FreeBSD 12 RootOnZFS Mirror, MB: Supermicro X8SI6-F, Xeon X3450, 16GB DDR3 ECC RDIMMs.
XigmaNAS RootOnZFS
Addons at GitHub
BastilleBSD
Boot Environments Intro
Resources Home Page

^nighthawk^
Starter
Posts: 23
Joined: 11 Sep 2014 10:02
Status: Offline

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by ^nighthawk^ »

I was about to reply to this: one of my new disks was reporting these errors, and thankfully I was still in the process of setting up the new device and trying a myriad of things to test resilience and the viability of the setup. I had some errors occur on a single disk that had passed both short and long SMART tests previously, but I had just transferred about 72GB of data to it... so this struck me as being quite odd.

Two things solved the problem. First, I checked the power cable connection and replaced the SATA cable on advice I had read elsewhere. Someone also mentioned that they had a similar problem and that their solution, weirdly, was to replace the USB stick they were running the embedded installation from. I forget exactly why that was the case, but as I was using quite an old stick that wasn't in great condition, I replaced that as well. Since then the errors have gone away.

I'm now looking for an easy way to have a redundant USB setup, where the config is saved back to both disks and they mirror each other... I have not found anything yet, however. I suppose it isn't a hardship to reload the config, but if I have a lot of scripts in the future it might be more painful.

Glad you solved your problem though. :)
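
On the redundant USB idea above: short of real mirroring, one low-effort stopgap is to keep a verified copy of the exported configuration on two sticks. This is only a sketch; `backup_twice` is a made-up helper, and the paths in the usage comment (`/conf/config.xml`, `/mnt/usb1`, `/mnt/usb2`) are assumptions that depend on the installation.

```sh
#!/bin/sh
# Copies one file into several destination directories and verifies each copy.
# All paths come from the caller; nothing here is specific to NAS4Free.
backup_twice() {
    src=$1; shift
    for dest in "$@"; do
        cp "$src" "$dest/" || return 1
        # cmp exits non-zero if the copy differs from the source.
        cmp -s "$src" "$dest/$(basename "$src")" || return 1
    done
}

# Usage sketch (paths are assumptions, adjust to the real mount points):
#   backup_twice /conf/config.xml /mnt/usb1 /mnt/usb2
```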


Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by JoseMR »

^nighthawk^ wrote: I'm now looking for an easy way to have a redundant USB setup, where the config is saved back to both disks and they mirror each other... I have not found anything yet, however. I suppose it isn't a hardship to reload the config, but if I have a lot of scripts in the future it might be more painful.

Glad you solved your problem though. :)
I've created an unofficial installer for a redundant solution, but it is for NAS4Free Full expert users: HERE

But I think I will play with mirrored embedded usb sticks :D

Regards

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by ^nighthawk^ »

JoseMR wrote: I've created an unofficial installer for a redundant solution, but it is for NAS4Free Full expert users: HERE

But I think I will play with mirrored embedded usb sticks :D

That would be excellent. I did read that thread, but it didn't look like it was possible for embedded; I need to revisit it. Having that redundancy saves a lot of hassle in pretty much all areas, especially with USB storage being so cheap at the moment :)


Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by m_seitz »

JoseMR wrote: IMHO it is hard for a SATA cable to get damaged unless it is operating under extreme vibration. Under high humidity, though, the contact joints will generally degrade, not only on SATA cables but on any contact joints inside a computer, even if gold plated.
This would be the only possible explanation. My NAS is standing in the living room, where it is not exposed to humidity. I can only imagine that one of the wires inside the cable was not properly connected to its contact joint and came loose over time. That cable actually prevented the BIOS from recognising HDDs attached to it. The second failure must have been caused by me touching/moving the other SATA cables and thereby exposing a second bad contact.
^nighthawk^ wrote: ... Two things solved the problem. First, I checked the power cable connection and replaced the SATA cable on advice I had read elsewhere. Someone also mentioned that they had a similar problem and that their solution, weirdly, was to replace the USB stick they were running the embedded installation from. I forget exactly why that was the case, but as I was using quite an old stick that wasn't in great condition, I replaced that as well. Since then the errors have gone away.
...
Glad you solved your problem though. :)
Oi, I will keep an eye on my USB stick and probably get a new one as well.

All in all, very scary :mrgreen:, but it feels good having a healthy raidZ2 back and knowing that it survived a blow :D

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by m_seitz »

After some days of use, everything is still working fine ... aaaaaaaaand it's gone! Again.
Same disk, same Ultra DMA CRC error.
Running my 3 old test disks again and trying to find out whether the motherboard is faulty or if it is the disk :-(
I wonder if jiggling the cables will uncover a loose joint on the motherboard, or if it will create one...

b0ssman
Forum Moderator
Posts: 2438
Joined: 14 Feb 2013 08:34
Location: Munich, Germany
Status: Offline

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by b0ssman »

Could you provide some information about your hardware components?

Sent from my D5803 using Tapatalk
Nas4Free 11.1.0.4.4517. Supermicro X10SLL-F, 16gb ECC, i3 4130, IBM M1015 with IT firmware. 4x 3tb WD Red, 4x 2TB Samsung F4, both GEOM AES 256 encrypted.


Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by JoseMR »

Also look for bulging capacitors around the motherboard; they are the main cause of silent hardware instability. That said, I've replaced lots of bad ones that were in good physical shape. In any case, working with unstable computer hardware is generally a pita.
[Attachment: badcaps.png]

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by m_seitz »

Thanks for all the support!
I added the hardware configuration to my signature. All components are brand-new.
The motherboard is populated with solid capacitors. I know those can die as well, but it is more difficult to see. There is no electrolyte around any of them and they are not bulged.
Right now, I am waiting for the "bad" disk to fail again. Testing cables, plugs and stressing solder joints did not provoke errors.
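
While waiting, it helps to remember that the CRC counter is cumulative on most drives and does not reset after a cable swap, so only the change between two readings says anything about the current cabling. A tiny sketch; `crc_delta` is a made-up helper, and the second reading in the example is invented.

```sh
#!/bin/sh
# Prints how many new CRC errors appeared between two readings of attribute 199.
crc_delta() {
    before=$1 after=$2
    echo $((after - before))
}

# First reading is the count from the first post; the later reading is invented.
# A result of 0 means no new link errors since the first reading.
crc_delta 118376 118376
```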

Re: Dying SATA controller? (Ultra DMA CRC errors killing my raidZ2)

Post by JoseMR »

Hi, as a side note, I've fixed lots of misbehaving drives with a simple full zero fill using the dd command. Alternatively, zero-filling just the first and last 100MB of the disk, like WD Tools does, has also worked for me.

Code:

#!/bin/sh
# Quick Disk Wiper 1.0
# Zero-fills the first and the last 100MB of a disk (FreeBSD: camcontrol, diskinfo, dd).
printf "\033[31m%s\033[0m\n" "Warning: This will wipe the first and the last 100MB of the selected disk."
echo "What disk do you want to wipe?"
camcontrol devlist
echo "For example - ada0 :"
read -r disk
printf "\033[31m%s\033[0m\n" "OK, in 15 seconds I will destroy all data on $disk!"
echo "Press CTRL+C to abort!"
sleep 15
echo "Disk wiping started!"
# diskinfo prints: device name, sector size, size in bytes, size in sectors, ...
diskinfo "${disk}" | while read -r disk sectorsize size sectors other
do
        # Sectors per 100MB; a fixed 204800 is only right for 512-byte sectors.
        count=$((104857600 / sectorsize))
        # Wipe the first 100MB.
        /bin/dd if=/dev/zero of="/dev/${disk}" bs="${sectorsize}" count="${count}"
        # Wipe the last 100MB.
        /bin/dd if=/dev/zero of="/dev/${disk}" bs="${sectorsize}" oseek=$((sectors - count)) count="${count}"
done
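
One detail worth checking before running a wipe like this: a count of 204800 sectors equals 100MB only when the disk reports 512-byte sectors; with 4096-byte sectors the same count covers 800MB. The sketch below shows how the count scales with sector size; `sectors_for_mib` is a made-up helper name.

```sh
#!/bin/sh
# Number of sectors that make up a given number of MiB for a given sector size.
sectors_for_mib() {
    sectorsize=$1 mib=$2
    echo $((mib * 1024 * 1024 / sectorsize))
}

sectors_for_mib 512 100    # 204800
sectors_for_mib 4096 100   # 25600
```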