ZFS Scrub taking (almost) forever

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

ZFS Scrub taking (almost) forever

#1

Post by JoyMonkey » 05 Feb 2013 13:26

I imported a ZFS pool (5x2TB drives, raidz1) from FreeNAS 8.3 into NAS4Free 9.1.0.1.621 and figured it wouldn't hurt to give it a scrub, so I set it going last night and checked in on it this morning...

Code:

  pool: raider
 state: ONLINE
  scan: scrub in progress since Mon Feb  4 21:31:33 2013
        77.2G scanned out of 5.12T at 2.34M/s, 628h38m to go
        260K repaired, 1.47% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	raider                                          ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/26116fb6-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0  (repairing)
	    gptid/26971116-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/26fa2600-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/27b1c2b5-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/282b425b-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0

errors: No known data errors
77.2G scanned out of 5.12T at 2.34M/s, 628h38m to go :shock:

Something's obviously not right. My CPU usage is at 0% and memory usage is at 2% of 31972MiB, so it could definitely be working harder.
What should I be looking at to determine what could be keeping it so slow?
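For anyone following along, the usual way on FreeBSD to see what a slow scrub is actually doing is to watch the pool and the per-disk I/O load; a sketch using this thread's pool name (your device names may differ):

Code:

zpool status raider        # scrub progress and per-device state
zpool iostat -v raider 5   # per-vdev throughput, sampled every 5 seconds
gstat -a                   # live per-disk busy % and latency

One disk sitting near 100% busy in gstat while moving almost no data is the classic sign of a single struggling drive.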

rostreich

Re: ZFS Scrub taking (almost) forever

#2

Post by rostreich » 05 Feb 2013 13:48

One of the disks is faulty, I think. Maybe recoverable, maybe not. What does SMART say?

When the heads try to read faulty sectors, they will retry again and again, which makes data throughput slow as hell.

Let it run.

What is the status now?
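A minimal SMART check along those lines, assuming the disks appear as ada0 through ada4 as they do later in this thread:

Code:

smartctl -H /dev/ada0   # overall health verdict only
smartctl -a /dev/ada0   # full attributes plus the device error log

Note that the overall verdict can still say PASSED while the error log is full of uncorrectable (UNC) read errors, so the error log section is worth reading too.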

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#3

Post by JoyMonkey » 05 Feb 2013 14:02

Thanks for the quick reply.

Now the scrub is no different...

Code:

 pool: raider
 state: ONLINE
  scan: scrub in progress since Mon Feb  4 21:31:33 2013
        79.3G scanned out of 5.12T at 2.28M/s, 643h50m to go
        260K repaired, 1.51% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	raider                                          ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/26116fb6-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0  (repairing)
	    gptid/26971116-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/26fa2600-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/27b1c2b5-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0
	    gptid/282b425b-5d0b-11e2-ac5a-b8975a2730dc  ONLINE       0     0     0

errors: No known data errors
Each drive's SMART info says it's okay (SMART overall-health self-assessment test result: PASSED), so I don't think there's a faulty one (AFAIK). Here's the full SMART output...

Code:

Device /dev/ada0 - WDC WD20EADS-00S2B0 01.00A01
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD20EADS-00S2B0
Serial Number:    WD-WCAVY1729302
LU WWN Device Id: 5 0014ee 2ae80d117
Firmware Version: 01.00A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Feb  5 07:53:15 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(42360) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 482) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   158   158   051    Pre-fail  Always       -       3164178
  3 Spin_Up_Time            0x0027   144   144   021    Pre-fail  Always       -       9783
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       145
  5 Reallocated_Sector_Ct   0x0033   158   158   140    Pre-fail  Always       -       336
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22462
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       142
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       64
193 Load_Cycle_Count        0x0032   014   014   000    Old_age   Always       -       559499
194 Temperature_Celsius     0x0022   124   107   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   001   001   000    Old_age   Always       -       249
197 Current_Pending_Sector  0x0032   198   196   000    Old_age   Always       -       929
198 Offline_Uncorrectable   0x0030   199   199   000    Old_age   Offline      -       619
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2487
200 Multi_Zone_Error_Rate   0x0008   001   001   000    Old_age   Offline      -       1454259

SMART Error Log Version: 1
Warning: ATA error count 1340 inconsistent with error log pointer 2

ATA Error Count: 1340 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1340 occurred at disk power-on lifetime: 22453 hours (935 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 32 8a 40 e0  Error: UNC at LBA = 0x00408a32 = 4229682

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 00 8a 40 e0 00      02:35:48.890  READ DMA
  ef 02 00 00 00 00 e0 00      02:35:48.867  SET FEATURES [Enable write cache]
  ef aa 00 00 00 00 e0 00      02:35:48.867  SET FEATURES [Enable read look-ahead]
  c6 00 10 00 00 00 e0 00      02:35:48.867  SET MULTIPLE MODE
  ef 03 42 00 00 00 e0 00      02:35:48.866  SET FEATURES [Set transfer mode]

Error 1339 occurred at disk power-on lifetime: 22453 hours (935 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 99 79 40 e0  Error: UNC at LBA = 0x00407999 = 4225433

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 40 79 40 e0 00      02:26:26.499  READ DMA
  c8 00 00 40 78 40 e0 00      02:26:10.458  READ DMA
  c8 00 40 00 78 40 e0 00      02:26:06.106  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Device /dev/ada1 - WDC WD20EADS-00S2B0 01.00A01
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green
Device Model:     WDC WD20EADS-00S2B0
Serial Number:    WD-WCAVY1753136
LU WWN Device Id: 5 0014ee 2ae839519
Firmware Version: 01.00A01
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Feb  5 07:53:36 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(42360) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 482) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x303f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   150   148   021    Pre-fail  Always       -       9466
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       147
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23163
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       144
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       642567
194 Temperature_Celsius     0x0022   123   106   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   001   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Device /dev/ada2 - WDC WD20EARS-00S8B1 80.00A80
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY2657186
LU WWN Device Id: 5 0014ee 2595cdbd7
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Feb  5 07:53:46 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(39960) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 455) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   146   146   021    Pre-fail  Always       -       9666
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       132
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22890
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       130
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       77
193 Load_Cycle_Count        0x0032   046   046   000    Old_age   Always       -       463009
194 Temperature_Celsius     0x0022   123   109   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       2
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       9

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Device /dev/ada3 - WDC WD20EARS-00S8B1 80.00A80
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY2606521
LU WWN Device Id: 5 0014ee 2aeb28f43
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Feb  5 07:53:46 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(38400) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 437) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   155   155   021    Pre-fail  Always       -       9208
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       129
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22172
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       127
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       74
193 Load_Cycle_Count        0x0032   046   046   000    Old_age   Always       -       463067
194 Temperature_Celsius     0x0022   124   109   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Device /dev/ada4 - WDC WD20EARS-00S8B1 80.00A80
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (AF)
Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY2961990
LU WWN Device Id: 5 0014ee 259853670
Firmware Version: 80.00A80
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Tue Feb  5 07:53:46 2013 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)	Offline data collection activity
					was suspended by an interrupting command from host.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(38400) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 437) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.
SCT capabilities: 	       (0x3031)	SCT Status supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   147   144   021    Pre-fail  Always       -       9608
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       135
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       22972
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       133
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       75
193 Load_Cycle_Count        0x0032   047   047   000    Old_age   Always       -       459318
194 Temperature_Celsius     0x0022   126   108   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Device /dev/da0 - pqi IntelligentStick 0.00
/dev/da0: Unknown USB bridge [0x3538:0x0054 (0x100)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
I did add some tunings to /cf/boot/loader.conf (to try to utilize my 32GB of memory better). Could I have screwed up here?

Code:

mfsroot_load="YES"
mfsroot_type="mfs_root"
mfsroot_name="/mfsroot"
hw.est.msr_info="0"
hw.hptrr.attach_generic="0"
kern.maxfiles="65536"
kern.maxfilesperproc="50000"
kern.cam.boot_delay="8000"
vfs.zfs.prefetch_disable="1"
autoboot_delay="3"
isboot_load="YES"
zfs_load="YES"

vm.kmem_size="28G"
vfs.zfs.arc_min="22G"
vfs.zfs.arc_max="24G"
vfs.zfs.prefetch_disable="0"
vfs.zfs.txg.synctime="2"
vfs.zfs.txg.timeout="5"
vfs.zfs.vdev.min_pending="1"
vfs.zfs.vdev.max_pending="1"

raulfg3
Site Admin
Posts: 4978
Joined: 22 Jun 2012 22:13
Location: Madrid (ESPAÑA)

Re: ZFS Scrub taking (almost) forever

#4

Post by raulfg3 » 05 Feb 2013 14:22

Google vfs.zfs.prefetch_disable for details.
Only disable prefetch for RAM < 4GB; if you have more than 4GB it must be enabled, i.e. vfs.zfs.prefetch_disable="0".
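To confirm which value actually took effect at boot, query the live sysctl directly:

Code:

sysctl vfs.zfs.prefetch_disable   # 0 = prefetch enabled, 1 = disabled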
12.0.0.4 (revision 6766)+OBI on SUPERMICRO X8SIL-F 8GB of ECC RAM, 12x3TB disk in 3 vdev in RaidZ1 = 32TB Raw size only 22TB usable


jasch
experienced User
Posts: 144
Joined: 25 Jun 2012 10:25
Location: Germany

Re: ZFS Scrub taking (almost) forever

#5

Post by jasch » 05 Feb 2013 14:23

The conf looks good, with only one mistake: you have vfs.zfs.prefetch_disable="0" (which is correct), but you also have
vfs.zfs.prefetch_disable="1" some lines before.
I don't know which value is active now.
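loader.conf is read top to bottom, so the later assignment should win; a quick way to spot duplicated variables (using the path from earlier in this thread):

Code:

grep -n prefetch /cf/boot/loader.conf                                  # show both conflicting lines
grep = /cf/boot/loader.conf | awk -F= '{print $1}' | sort | uniq -d    # list any variable set more than once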
XigmaNAS 12.0.0.4 (6625)@PROXMOX 5.V - Supermicro X8DTH-6F | 2x Xeon L5640 | 96GB ECC | LSI 9210-8i|LSI 9500-8e|LSI 9201-16i | 40GBe IB Mellanox |

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#6

Post by JoyMonkey » 05 Feb 2013 14:25

jasch wrote:The conf looks good, with only one mistake: you have vfs.zfs.prefetch_disable="0" (which is correct), but you also have
vfs.zfs.prefetch_disable="1" some lines before.
I don't know which value is active now.
Gah! :oops:
I'll delete vfs.zfs.prefetch_disable="1", restart the box and see if it fixes things.
Thanks!

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#7

Post by JoyMonkey » 05 Feb 2013 15:15

Oh boy. After deleting vfs.zfs.prefetch_disable="1" and restarting I get...
82.8G scanned out of 5.12T at 62.8K/s, (scan is slow, no estimated time)
That'll take years!

CPU usage is still at 0% and Memory usage at 1%. SMART says all drives pass.

I thought it might be helpful to verify that my loader.conf settings are what I set them to. So I ran
sysctl -a | grep kmem
and
sysctl -a | grep zfs

In loader.conf I have vm.kmem_size="28G", but my system says...
vm.kmem_map_free: 29924339712
vm.kmem_map_size: 131973120
vm.kmem_size_scale: 2
vm.kmem_size_max: 329853485875 [that's 307.2 gigabytes !!!]
vm.kmem_size_min: 0
vm.kmem_size: 30064771072 [that's 28 gigabytes , same as I set loader.conf]

Should I try setting vm.kmem_size_max to 28G also?
I might just remove all my tunings from loader.conf and see what the scrub speed is like before making any more changes.
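As an aside, individual OIDs can be queried by name, which is less noisy than grepping the whole tree:

Code:

sysctl vm.kmem_size vm.kmem_size_max
sysctl vfs.zfs.arc_min vfs.zfs.arc_max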

here's the full output...

Code:

nas4free:~# sysctl -a | grep kmem
vm.kmem_map_free: 29924339712
vm.kmem_map_size: 131973120
vm.kmem_size_scale: 2
vm.kmem_size_max: 329853485875
vm.kmem_size_min: 0
vm.kmem_size: 30064771072

nas4free:~# sysctl -a | grep zfs
1 PART ada4p2 1998251364352 512 i 2 o 2147549184 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
1 PART ada3p2 1998251364352 512 i 2 o 2147549184 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
1 PART ada2p2 1998251364352 512 i 2 o 2147549184 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
1 PART ada1p2 1998251367936 512 i 2 o 2147549184 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
1 PART ada0p2 1998251367936 512 i 2 o 2147549184 ty freebsd-zfs xs GPT xt 516e7cba-6ecf-11d6-8ff8-00022d09712b
z0xfffffe001d938b00 [shape=box,label="ZFS::VDEV\nzfs::vdev\nr#4"];
      <name>zfs::vdev</name>
            <type>freebsd-zfs</type>
            <type>freebsd-zfs</type>
            <type>freebsd-zfs</type>
            <type>freebsd-zfs</type>
            <type>freebsd-zfs</type>
vfs.zfs.l2c_only_size: 0
vfs.zfs.mfu_ghost_data_lsize: 0
vfs.zfs.mfu_ghost_metadata_lsize: 0
vfs.zfs.mfu_ghost_size: 0
vfs.zfs.mfu_data_lsize: 0
vfs.zfs.mfu_metadata_lsize: 73728
vfs.zfs.mfu_size: 288768
vfs.zfs.mru_ghost_data_lsize: 0
vfs.zfs.mru_ghost_metadata_lsize: 59392
vfs.zfs.mru_ghost_size: 59392
vfs.zfs.mru_data_lsize: 0
vfs.zfs.mru_metadata_lsize: 4957696
vfs.zfs.mru_size: 5107200
vfs.zfs.anon_data_lsize: 0
vfs.zfs.anon_metadata_lsize: 0
vfs.zfs.anon_size: 16384
vfs.zfs.l2arc_norw: 1
vfs.zfs.l2arc_feed_again: 1
vfs.zfs.l2arc_noprefetch: 1
vfs.zfs.l2arc_feed_min_ms: 200
vfs.zfs.l2arc_feed_secs: 1
vfs.zfs.l2arc_headroom: 2
vfs.zfs.l2arc_write_boost: 8388608
vfs.zfs.l2arc_write_max: 8388608
vfs.zfs.arc_meta_limit: 6442450944
vfs.zfs.arc_meta_used: 5775416
vfs.zfs.arc_min: 23622320128
vfs.zfs.arc_max: 25769803776
vfs.zfs.dedup.prefetch: 1
vfs.zfs.mdcomp_disable: 0
vfs.zfs.write_limit_override: 0
vfs.zfs.write_limit_inflated: 100574306304
vfs.zfs.write_limit_max: 4190596096
vfs.zfs.write_limit_min: 33554432
vfs.zfs.write_limit_shift: 3
vfs.zfs.no_write_throttle: 0
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 0
vfs.zfs.mg_alloc_failures: 8
vfs.zfs.check_hostid: 1
vfs.zfs.recover: 0
vfs.zfs.txg.synctime_ms: 1000
vfs.zfs.txg.timeout: 5
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 0
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.write_gap_limit: 4096
vfs.zfs.vdev.read_gap_limit: 32768
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 1
vfs.zfs.vdev.max_pending: 1
vfs.zfs.vdev.bio_flush_disable: 0
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_replay_disable: 0
vfs.zfs.zio.use_uma: 0
vfs.zfs.snapshot_list_prefetch: 0
vfs.zfs.version.zpl: 5
vfs.zfs.version.spa: 28
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
security.jail.param.allow.mount.zfs: 0
security.jail.mount_zfs_allowed: 0
kstat.zfs.misc.xuio_stats.onloan_read_buf: 0
kstat.zfs.misc.xuio_stats.onloan_write_buf: 0
kstat.zfs.misc.xuio_stats.read_buf_copied: 0
kstat.zfs.misc.xuio_stats.read_buf_nocopy: 0
kstat.zfs.misc.xuio_stats.write_buf_copied: 0
kstat.zfs.misc.xuio_stats.write_buf_nocopy: 0
kstat.zfs.misc.zfetchstats.hits: 3990
kstat.zfs.misc.zfetchstats.misses: 97
kstat.zfs.misc.zfetchstats.colinear_hits: 0
kstat.zfs.misc.zfetchstats.colinear_misses: 97
kstat.zfs.misc.zfetchstats.stride_hits: 3945
kstat.zfs.misc.zfetchstats.stride_misses: 1
kstat.zfs.misc.zfetchstats.reclaim_successes: 45
kstat.zfs.misc.zfetchstats.reclaim_failures: 52
kstat.zfs.misc.zfetchstats.streams_resets: 0
kstat.zfs.misc.zfetchstats.streams_noresets: 45
kstat.zfs.misc.zfetchstats.bogus_streams: 0
kstat.zfs.misc.arcstats.hits: 8109
kstat.zfs.misc.arcstats.misses: 384
kstat.zfs.misc.arcstats.demand_data_hits: 0
kstat.zfs.misc.arcstats.demand_data_misses: 0
kstat.zfs.misc.arcstats.demand_metadata_hits: 2911
kstat.zfs.misc.arcstats.demand_metadata_misses: 61
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 5198
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 323
kstat.zfs.misc.arcstats.mru_hits: 1218
kstat.zfs.misc.arcstats.mru_ghost_hits: 1
kstat.zfs.misc.arcstats.mfu_hits: 1693
kstat.zfs.misc.arcstats.mfu_ghost_hits: 2
kstat.zfs.misc.arcstats.allocated: 528
kstat.zfs.misc.arcstats.deleted: 32
kstat.zfs.misc.arcstats.stolen: 0
kstat.zfs.misc.arcstats.recycle_miss: 0
kstat.zfs.misc.arcstats.mutex_miss: 0
kstat.zfs.misc.arcstats.evict_skip: 0
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 0
kstat.zfs.misc.arcstats.evict_l2_ineligible: 4096
kstat.zfs.misc.arcstats.hash_elements: 358
kstat.zfs.misc.arcstats.hash_elements_max: 358
kstat.zfs.misc.arcstats.hash_collisions: 0
kstat.zfs.misc.arcstats.hash_chains: 0
kstat.zfs.misc.arcstats.hash_chain_max: 0
kstat.zfs.misc.arcstats.p: 12884843008
kstat.zfs.misc.arcstats.c: 25769803776
kstat.zfs.misc.arcstats.c_min: 23622320128
kstat.zfs.misc.arcstats.c_max: 25769803776
kstat.zfs.misc.arcstats.size: 5775416
kstat.zfs.misc.arcstats.hdr_size: 113840
kstat.zfs.misc.arcstats.data_size: 5412352
kstat.zfs.misc.arcstats.other_size: 249224
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 2
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 0
kstat.zfs.misc.vdev_cache_stats.hits: 0
kstat.zfs.misc.vdev_cache_stats.misses: 0

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#8

Post by JoyMonkey » 05 Feb 2013 16:30

So now the only tuning I've done to loader.conf is changing vfs.zfs.prefetch_disable="0" (was 1).
After restarting I'm letting the server sit for a while.
At first it was scrubbing at about 70KB/s. After a few minutes that went up to 140KB/s.
Now, after an hour of uptime it's scrubbing at 1.02M/s, so it seems to be very slowly increasing in speed.

rostreich

Re: ZFS Scrub taking (almost) forever

#9

Post by rostreich » 05 Feb 2013 19:07

Good. In the meantime, I did a scrub myself.

I have 1.5TB of data to scrub, running now at 85.9M/s. I think it's really slow on small files.

But I still suggest you wait until it's over. Then replace the drive with a new one, resilver, and scrub. Watch the speed then, and investigate the faulty disk on another computer with SMART tools.
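For reference, that replacement procedure boils down to a few commands; a sketch using this thread's pool name and the gptid of the repairing disk (the new device name is an example):

Code:

zpool offline raider gptid/26116fb6-5d0b-11e2-ac5a-b8975a2730dc   # take the bad disk out of service
# ...power down, swap the physical drive...
zpool replace raider gptid/26116fb6-5d0b-11e2-ac5a-b8975a2730dc ada0p2
zpool status raider   # watch the resilver, then scrub again when it completes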

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#10

Post by JoyMonkey » 05 Feb 2013 19:20

The thing is, I don't see a faulty drive. SMART says they're all good. The scrub has now slowed to 457K/s, which will take several months if it doesn't pick up.

Something's just not right. I may roll back to FreeNAS to see if it's still performing normally there.

rostreich

Re: ZFS Scrub taking (almost) forever

#11

Post by rostreich » 05 Feb 2013 20:11

I looked deeper into your SMART logs. Your first disk is dead!

Look:

Code:

Warning: ATA error count 1340 inconsistent with error log pointer 2

Code:

Error 1340 occurred at disk power-on lifetime: 22453 hours (935 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 32 8a 40 e0  Error: UNC at LBA = 0x00408a32 = 4229682

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      c8 00 00 00 8a 40 e0 00      02:35:48.890  READ DMA
      ef 02 00 00 00 00 e0 00      02:35:48.867  SET FEATURES [Enable write cache]
      ef aa 00 00 00 00 e0 00      02:35:48.867  SET FEATURES [Enable read look-ahead]
      c6 00 10 00 00 00 e0 00      02:35:48.867  SET MULTIPLE MODE
      ef 03 42 00 00 00 e0 00      02:35:48.866  SET FEATURES [Set transfer mode]

    Error 1339 occurred at disk power-on lifetime: 22453 hours (935 days + 13 hours)
      When the command that caused the error occurred, the device was active or idle.

      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 00 99 79 40 e0  Error: UNC at LBA = 0x00407999 = 4225433

      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      c8 00 00 40 79 40 e0 00      02:26:26.499  READ DMA
      c8 00 00 40 78 40 e0 00      02:26:10.458  READ DMA
      c8 00 40 00 78 40 e0 00      02:26:06.106  READ DMA
Furthermore, you have 'Green' models. Those disks have a head-parking/spindown behavior built into the internal firmware. As a result they are not well suited for use in a NAS.

As you can see here:

Code:

 193 Load_Cycle_Count        0x0032   014   014   000    Old_age   Always       -       559499
This is too much compared with the runtime:

Code:

9 Power_On_Hours          0x0032   070   070   000    Old_age   Always       -       22462
There are tools to switch off this internal spindown so you can use the drives in a NAS.

BUT we found your problem. ;) And as you can see, the errors happened during the scrub -> current runtime: 22462 hours, errors logged at 22453 hours.
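To compare those two attributes across all drives at once, something like this works (device names assumed from the smartctl output above):

Code:

for d in ada0 ada1 ada2 ada3 ada4; do
  echo "== $d =="
  smartctl -A /dev/$d | egrep 'Load_Cycle_Count|Power_On_Hours'
done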

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#12

Post by JoyMonkey » 05 Feb 2013 20:21

Wow! Not sure how I missed that from the SMART report. I'll get that disk out stat.

For my purposes the green drives have served me well in other multi-disk arrays (mdadm raid5 devices), so I figured they'd be fine to use with NAS4Free. I'll look into modifying the internal spindown.
Thanks!

rostreich

Re: ZFS Scrub taking (almost) forever

#13

Post by rostreich » 05 Feb 2013 20:41

Good luck in recovering ;)

fsbruva
Advanced User
Posts: 383
Joined: 21 Sep 2012 14:50

Re: ZFS Scrub taking (almost) forever

#14

Post by fsbruva » 05 Feb 2013 21:10

Turning off the power-saving options will force ataidle to prevent the disk from spinning down.

The other thing that makes scrubbing slower is not using datasets. Instead of creating folders at the root of the pool, you should create datasets. This speeds up the scrub significantly (33% faster in my case).
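Creating datasets instead of plain folders is a one-liner each; a sketch with hypothetical dataset names on the OP's pool:

Code:

zfs create raider/media     # instead of mkdir /mnt/raider/media
zfs create raider/backups
zfs list -r raider          # confirm the layout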

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#15

Post by JoyMonkey » 05 Feb 2013 21:14

Luckily that drive is still under warranty, with 2 days to spare!

I have all my data backed up, so it's probably best to nuke the pool from orbit. It's the only way to be sure.

rostreich

Re: ZFS Scrub taking (almost) forever

#16

Post by rostreich » 06 Feb 2013 10:36

fsbruva wrote:Turning off the power-saving options will force ataidle to prevent the disk from spinning down.

The other thing that makes scrubbing slower is not using datasets. Instead of creating folders at the root of the pool, you should create datasets. This speeds up the scrub significantly (33% faster in my case).
I thought this has nothing to do with the internal spindown?

Yeah, with the datasets you're right.
Luckily that drive is still under warranty, with 2 days to spare!
Hehe, this is nice :D
I have all my data backed up, so it's probably best to nuke the pool from orbit. It's the only way to be sure.
If I were you, I would go through the procedure of replacing the faulty disk, to learn it and see it done, because this won't be the last disk dying in your arms. :lol:

You can still nuke the pool after that. ;)

fsbruva
Advanced User
Posts: 383
Joined: 21 Sep 2012 14:50

Re: ZFS Scrub taking (almost) forever

#17

Post by fsbruva » 06 Feb 2013 13:39

rostreich wrote:
fsbruva wrote:Turning off the power-saving options will force ataidle to prevent the disk from spinning down.
I thought this has nothing to do with the internal spindown?
My NAS uses 2 laptop drives, which have very aggressive internal power management schemes (read: parking the heads all the d@mn time). My solution was to set advanced power management to 254 and disable acoustic management. This stopped the disks from parking all the time, and it arrested my Load_Cycle_Count value.
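On FreeBSD/NAS4Free those two settings map roughly onto ataidle like this (a sketch; verify the flags against ataidle(8) on your build):

Code:

ataidle -P 254 /dev/ada0   # APM level 254 = maximum performance, minimal parking
ataidle -A 0 /dev/ada0     # 0 is assumed here to disable acoustic management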

rostreich

Re: ZFS Scrub taking (almost) forever

#18

Post by rostreich » 06 Feb 2013 17:26

Whoa, this is nice to know! :D

JoyMonkey: What is your config for APM and AAM? This could help run your disks safely!


I have the hard disk standby time set to 10 minutes, with APM and AAM disabled.

Onichan
Advanced User
Posts: 238
Joined: 04 Jul 2012 21:41

Re: ZFS Scrub taking (almost) forever

#19

Post by Onichan » 06 Feb 2013 19:08

rostreich: You are correct in that the spindown is different from the head parking that WD Greens have. By default the green drives will park their heads after 8 seconds of inactivity, which is quite excessive and bad in a NAS environment. Though I haven't tried forcing power saving off to see if that affects the WD head parking, I normally just disable it using wdidle3.

JoyMonkey: I have used Greens in my NASes for quite a while too and they have always been fine, but I do recommend using wdidle3 to change the head parking to either the maximum of 300 seconds or just disable it. Somebody has made a UBCD with it included at http://www.jzab.de/content/wdidle-bootcd that you can use to adjust it, or you can download the exe from WD http://support.wdc.com/product/download ... 09&sid=113 and make your own bootable disk.
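wdidle3 is a DOS tool, so it runs from that boot CD/stick rather than from NAS4Free; the commonly cited invocations are (from memory, check the readme that ships with it):

Code:

WDIDLE3 /R      # report the current idle timer
WDIDLE3 /S300   # set the timer to the 300 second maximum
WDIDLE3 /D      # disable the idle timer entirely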

rostreich

Re: ZFS Scrub taking (almost) forever

#20

Post by rostreich » 06 Feb 2013 19:43

rostreich: You are correct in that the spindown is different from the head parking that WD Greens have. By default the green drives will park their heads after 8 seconds of inactivity, which is quite excessive and bad in a NAS environment. Though I haven't tried forcing power saving off to see if that affects the WD head parking, I normally just disable it using wdidle3.
After doing more investigation:

- For WD disks it's called IntelliPark. It's special because the heads are parked on a ramp, not just off to the side.
- hdparm has NO effect, so I think ataidle is the same
- wdidle3 is the best way to change the settings


@fsbruva

what are your disk models?

fsbruva
Advanced User
Posts: 383
Joined: 21 Sep 2012 14:50

Re: ZFS Scrub taking (almost) forever

#21

Post by fsbruva » 06 Feb 2013 23:45

I have Toshiba drives. I forgot the OP had WD greens.

rostreich

Re: ZFS Scrub taking (almost) forever

#22

Post by rostreich » 07 Feb 2013 00:12

Ah OK, that is good to know. THX!

JoyMonkey
NewUser
Posts: 12
Joined: 05 Feb 2013 13:17

Re: ZFS Scrub taking (almost) forever

#23

Post by JoyMonkey » 22 Feb 2013 15:16

Well, the replacement drive arrived and I finally got around to popping it in and replacing the (now missing) drive with it. I couldn't figure out how to do that via the web UI, so I logged in via SSH and used the "zpool replace" command. It's finished 'resilvering' and the pool is no longer degraded. I started a scrub and it's just going to take about 6 hours, so I think everything is running like it should.
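For anyone searching later, that SSH route is a single command of this shape; when the old disk is already missing, zpool status shows a numeric guid that can stand in for the old device name:

Code:

zpool replace raider <old-device-or-guid> <new-device>
zpool status raider   # shows resilver progress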

Thanks for all the help guys!


Edit: Oh, and I did end up using the UBCD with wdidle3 included to disable the head parking on all my WD green drives. I followed this guide to get the ISO installed on a USB stick easily, then booted with 1 drive attached at a time to get them all set. Hopefully the drives that have been in there a couple of years will live a while longer now!

rostreich

Re: ZFS Scrub taking (almost) forever

#24

Post by rostreich » 23 Feb 2013 16:26

Nice :)
I couldn't figure out how to do that via the web UI, so I logged in via SSH and used the "zpool replace" command
It can be done with the WebGUI, but the console is better because you see the reports etc. immediately.
