
ZFS - Repairing extremely slow

Posted: 20 Aug 2016 16:53
by redline
Hello,
I am running NAS4Free 9.1 (Sandstorm) and replaced a disk a few days ago.
Before, I had 3TB, 3TB, 3TB and 2TB; after replacing the old 2TB disk,
I now have 4x 3TB Western Digital Red (NAS) WD30EFRX.

Autoexpand was set to on, and the replacement itself was no problem.
I also scrubbed the pool both before and after the resilvering.

Now, after 10 days of uptime, I noticed that it occasionally takes longer to access some files
over SMB from the Windows client.

In the log file I could see error messages about the new replacement 3TB hard disk.

I started scrubbing again to see whether there is a problem in the pool.
Up to 75% the scrub ran normally, but then the status of the new disk switched to "(repairing)".
Now, around 6 hours later, I see 77.58%.

Everything is in slow motion. Only 31 megabytes have been reported as repaired so far,
and I don't know whether I should wait until the scrub is complete.
The estimated time to finish has been growing over the last hours.

Access from the Windows client is also in slow motion.
The NAS itself is idling and has no CPU load.

I don't know whether I should replace the new hard disk as soon as possible
or look elsewhere.

  pool: RZ1
 state: ONLINE
  scan: scrub in progress since Sat Aug 20 01:21:42 2016
        6.22T scanned out of 8.02T at 117M/s, 4h28m to go
        31.0M repaired, 77.58% done
config:

        NAME        STATE     READ WRITE CKSUM
        RZ1         ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0  (repairing)

errors: No known data errors
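
A quick check of the figures in the status output above, as a rough sketch: assuming the T and M suffixes are binary units (TiB and MiB/s), the reported rate of 117M/s reproduces both the "4h28m to go" estimate and the time elapsed since the scrub started. That suggests the rate shown is an average over the whole scrub rather than the current speed, which would explain why the estimate keeps growing while the scrub crawls. The helper names below are made up for illustration.

```python
# Figures copied from the status output; units assumed binary (TiB, MiB/s).
MIB_PER_TIB = 1024 * 1024

def seconds_at_rate(tib, mib_per_s):
    """Time needed to scan `tib` TiB at `mib_per_s` MiB/s (hypothetical helper)."""
    return tib * MIB_PER_TIB / mib_per_s

def as_h_m(seconds):
    """Split a duration in seconds into whole hours and whole minutes."""
    return int(seconds // 3600), int(seconds % 3600 // 60)

scanned, total, rate = 6.22, 8.02, 117.0

remaining = as_h_m(seconds_at_rate(total - scanned, rate))
elapsed = as_h_m(seconds_at_rate(scanned, rate))

print("to go at the reported rate: %dh%02dm" % remaining)
print("time implied for the data scanned so far: %dh%02dm" % elapsed)
```

The second figure, roughly fifteen and a half hours, is close to the gap between the scrub start (01:21:42) and the time of this post (16:53).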



In the log file I see these error messages every few seconds,
hundreds of them in the last hours. I only clipped these two sequences as an example.


Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): Retrying command
Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): RES: 41 40 b7 b5 1c 40 01 01 00 00 00
Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): CAM status: ATA Status Error
Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 30 b5 1c 40 01 01 00 01 00 00
Aug 20 16:47:48 nas4free kernel: (ada3:ahcich3:0:0:0): Retrying command
Aug 20 16:47:48 nas4free kernel: (ada3:ahcich3:0:0:0): RES: 41 40 b0 b5 1c 40 01 01 00 00 00
Aug 20 16:47:48 nas4free kernel: (ada3:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Aug 20 16:47:48 nas4free kernel: (ada3:ahcich3:0:0:0): CAM status: ATA Status Error
Aug 20 16:47:48 nas4free kernel: (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 30 b5 1c 40 01 01 00 01 00 00
Aug 20 16:47:41 nas4free kernel: (ada3:ahcich3:0:0:0): Retrying command
Aug 20 16:47:41 nas4free kernel: (ada3:ahcich3:0:0:0): RES: 41 40 b0 b5 1c 40 01 01 00 00 00
Aug 20 16:47:41 nas4free kernel: (ada3:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Aug 20 16:47:41 nas4free kernel: (ada3:ahcich3:0:0:0): CAM status: ATA Status Error
Aug 20 16:47:41 nas4free kernel: (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 30 b5 1c 40 01 01 00 01 00 00
Aug 20 16:47:25 nas4free kernel: (ada3:ahcich3:0:0:0): Error 5, Retries exhausted
Aug 20 16:47:25 nas4free kernel: (ada3:ahcich3:0:0:0): RES: 41 40 a8 a5 1c 40 01 01 00 00 00
Aug 20 16:47:25 nas4free kernel: (ada3:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Aug 20 16:47:25 nas4free kernel: (ada3:ahcich3:0:0:0): CAM status: ATA Status Error
Aug 20 16:47:25 nas4free kernel: (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 58 a5 1c 40 01 01 00 01 00 00
Aug 20 16:47:17 nas4free kernel: (ada3:ahcich3:0:0:0): Retrying command
Aug 20 16:47:17 nas4free kernel: (ada3:ahcich3:0:0:0): RES: 41 40 af a5 1c 40 01 01 00 00 00
Aug 20 16:47:17 nas4free kernel: (ada3:ahcich3:0:0:0): ATA status: 41 (DRDY ERR), error: 40 (UNC )
Aug 20 16:47:17 nas4free kernel: (ada3:ahcich3:0:0:0): CAM status: ATA Status Error
Aug 20 16:47:17 nas4free kernel: (ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 58 a5 1c 40 01 01 00 01 00 00
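
To see which sectors the drive is complaining about, the RES lines can be decoded. This is only a sketch under an assumption: that the eleven RES bytes are, in order, status, error, LBA low/mid/high, device, the three extended LBA bytes, and two sector-count bytes, which is how I understand FreeBSD's CAM layer to print them. The function name is hypothetical.

```python
import re

# Eleven two-digit hex bytes following "RES:".
RES_RE = re.compile(r"RES:\s+((?:[0-9a-fA-F]{2}\s*){11})")

def failing_lba(log_line):
    """Return the 48-bit LBA from a 'RES:' kernel log line, or None.

    Assumed byte order: status, error, lba_low, lba_mid, lba_high, device,
    lba_low_exp, lba_mid_exp, lba_high_exp, count, count_exp.
    """
    match = RES_RE.search(log_line)
    if match is None:
        return None
    b = [int(x, 16) for x in match.group(1).split()]
    low, mid, high = b[2], b[3], b[4]
    low_exp, mid_exp, high_exp = b[6], b[7], b[8]
    return (high_exp << 40 | mid_exp << 32 | low_exp << 24
            | high << 16 | mid << 8 | low)

line = ("Aug 20 16:47:55 nas4free kernel: (ada3:ahcich3:0:0:0): "
        "RES: 41 40 b7 b5 1c 40 01 01 00 00 00")
lba = failing_lba(line)
print(hex(lba), "-> byte offset", lba * 512)  # 512-byte logical sectors assumed
```

Under this layout the sample line decodes to an LBA about 2.2 TB into the drive, and the other RES lines in the excerpt land within a few thousand sectors of it, which would point at one damaged region rather than errors scattered across the disk.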

Re: ZFS - Repairing extremely slow

Posted: 20 Aug 2016 17:06
by redline
I can see that all the other hard disks have zero at Raw_Read_Error_Rate.
The new hard disk has a raw value of 26369.


SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   112   112   051    Pre-fail  Always       -       26369
  3 Spin_Up_Time            0x0027   180   179   021    Pre-fail  Always       -       5991
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       42
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       292
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       40
194 Temperature_Celsius     0x0022   121   119   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
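
For comparing several disks, the attribute table can be read programmatically. A minimal sketch, assuming the ten-column attribute layout shown above and plain integer raw values as in this output; the function name and the choice of attributes to print are illustrative only.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value for the rows of a SMART attribute table."""
    raw_values = {}
    for row in text.splitlines():
        fields = row.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped.
        if len(fields) >= 10 and fields[0].isdigit():
            raw_values[fields[1]] = int(fields[9])  # assumes an integer raw value
    return raw_values

sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 112 112 051 Pre-fail Always - 26369
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
"""

attrs = parse_smart_attributes(sample)
for name in ("Raw_Read_Error_Rate", "Reallocated_Sector_Ct", "Current_Pending_Sector"):
    print(name, attrs[name])
```

In this output the reallocated and pending sector counts are still 0 while the raw read error value is 26369, so no single attribute should be read in isolation.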

Re: ZFS - Repairing extremely slow

Posted: 20 Aug 2016 17:23
by ernie
Hello redline,

For me, the message 'CAM status: ATA Status Error' points to a potentially bad cable.

How is your HDD connected? Do you use a backplane box?
If yes, it would be good to stop the scrub and connect the HDD directly to your motherboard.

If you don't use a backplane box, do you use SATA cables with a metal clip? If not, switching to them could be a solution.

In any case, you can try reconnecting your HDD (stop the scrub first).

A bad connection would explain the read errors.

Don't forget to back up before working on this.

Re: ZFS - Repairing extremely slow

Posted: 22 Aug 2016 12:00
by redline
Hello Ernie,
I have no backplane box, and my SATA cables have metal clips.
I also double-checked the cabling.
After a second scrub attempt, the same thing happened: up to 75% everything was fine, but then the
error messages appeared in the log file again.

So on Saturday I bought a new WD Red 3TB locally and swapped it in.
After resilvering and a pause of a few hours, I scrubbed again.

Now the scrub runs through without errors in the log.
Hopefully the problem is solved now. At least the pool is fine, without data errors.

The damaged disk, which I afterwards connected to the local Windows PC, is reported there as an 800 gigabyte hard disk.
And the Western Digital diagnostic tool came up with "too many bad sectors detected".
I will try to return it under warranty. It was bought less than 3 years ago, was lying around in its antistatic foil the whole time,
and has only 90 hours of uptime (SMART entry).

Thanks for your help
Redline

Re: ZFS - Repairing extremely slow

Posted: 22 Aug 2016 13:30
by ernie
Good news

Re: ZFS - Repairing extremely slow

Posted: 22 Aug 2016 16:40
by substr
It sounds like the HDD is failing/failed.

However, when your Windows PC says 800GB, that probably means your BIOS, hard disk controller, or the drivers for it, do not support 3TB or larger HDDs.
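
The 800GB figure fits that explanation. As a sketch, assuming the limit is a 32-bit sector address with 512-byte sectors, and taking the disk's nominal capacity as 3 * 10^12 bytes (the exact sector count is not given in this thread), the part of the disk that lies above the 2 TiB boundary comes out at about 800 GB:

```python
SECTOR_BYTES = 512                    # assumed logical sector size
ADDRESSABLE = 2**32 * SECTOR_BYTES    # reach of a 32-bit sector address (2 TiB)
NOMINAL_3TB = 3 * 10**12              # assumed nominal capacity in bytes

# Capacity left over once the sector count wraps around at 32 bits.
wrapped = NOMINAL_3TB % ADDRESSABLE

print("32-bit limit: %d bytes" % ADDRESSABLE)
print("capacity seen after wrap-around: %.0f GB" % (wrapped / 1e9))
```

So the reported size alone says more about the Windows PC's controller or driver than about the health of the disk.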