This is the old XigmaNAS forum in read-only mode;
it will be taken offline by the end of March 2021!




S.M.A.R.T and ZFS

^nighthawk^
Starter
Posts: 23
Joined: 11 Sep 2014 10:02
Status: Offline

Post by ^nighthawk^ »

Hi all,

Long-time user, first-time poster :)

I woke up this morning to some dreaded "clicking" from a hard drive in my homebrew NAS. This machine is kept away from me and is not actually used a whole lot, so this could have been going on for some time.

The setup is:
nas4free 9.2.0.1 - Shigawire (revision 972) - embedded (on USB).
The pool is a RAIDZ1 (if that's what you call it nowadays?) with four disks. I was hoping to move the data off at some point and go to a clean RAIDZ2 with 5 (or maybe 7) disks later for more redundancy.


I'll be brief: I ran some S.M.A.R.T. checks on the disks, using the smartctl command to perform an extended offline test. I'm not an expert at using this by any means.

This was the output of smartctl -l selftest for each drive.

$ smartctl -l selftest /dev/ada0
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4836         -

$ smartctl -l selftest /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      5046         1142887794

$ smartctl -l selftest /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%      5043         727949262

$ smartctl -l selftest /dev/ada3
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4832         -

As you can see, two of the disks appear to be fine, but two are not happy.
I can still get data off my ZFS pool, but sometimes it is painful to access (typical of when a drive is failing and can't be spun up properly). The problem is that I'm not sure which disk is causing the issue.
As of right now I just want to protect the data if possible, so if there are some commands that will help me better diagnose this, please let me know.
I suspect /dev/ada1 is the culprit, based on an earlier aborted test, but cannot confirm it.

The zpool also reports itself as healthy/online.

I need to know of any commands that may help me determine which drive is failing and protect the integrity of the data in the zpool.
Are there any specific ZFS commands for this? I read an article on repairing bad blocks using smartmontools, but I was very wary of doing what it suggested if ZFS isn't aware of it.
If anyone has any ideas, please let me know. Thanks.

b0ssman
Forum Moderator
Posts: 2438
Joined: 14 Feb 2013 08:34
Location: Munich, Germany
Status: Offline

Re: S.M.A.R.T and ZFS

Post by b0ssman »

A self-test is not really helpful.
Please post the SMART values.

You can get them from the screen
Diagnostics|Information|S.M.A.R.T.
Nas4Free 11.1.0.4.4517. Supermicro X10SLL-F, 16gb ECC, i3 4130, IBM M1015 with IT firmware. 4x 3tb WD Red, 4x 2TB Samsung F4, both GEOM AES 256 encrypted.

Post by ^nighthawk^ »

Ok thanks, incoming wall of text.

I built this machine a few years ago on what was then FreeNAS, switched to nas4free not long after the buyout/takeover (whatever it was), and only recently updated to this latest version from 9.1 Sandstorm (672? iirc).

ADA0

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S1Y6J1LS917007
LU WWN Device Id: 5 0024e9 0023022b3
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Thu Sep 11 09:52:40 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(18312) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 306) minutes.
Conveyance self-test routine
recommended polling time: 	 (  32) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9380
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       744
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       11428
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4842
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       219
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   073   000    Old_age   Always       -       23 (Min/Max 17/25)
194 Temperature_Celsius     0x0022   077   070   000    Old_age   Always       -       23 (Min/Max 17/27)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       5182
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4836         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ADA1

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S1Y6J1LS917000
LU WWN Device Id: 5 0024e9 00230225b
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Thu Sep 11 09:52:40 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline
data collection: 		(20245) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 338) minutes.
Conveyance self-test routine
recommended polling time: 	 (  35) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   099   099   051    Pre-fail  Always       -       1434
  3 Spin_Up_Time            0x0007   070   070   011    Pre-fail  Always       -       9660
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       745
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       12130
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5057
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       219
 13 Read_Soft_Error_Rate    0x000e   099   099   000    Old_age   Always       -       1433
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       1433
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   072   000    Old_age   Always       -       23 (Min/Max 18/23)
194 Temperature_Celsius     0x0022   077   069   000    Old_age   Always       -       23 (Min/Max 18/26)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       3123
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      5046         1142887794
# 2  Offline             Interrupted (host reset)      80%      5041         -
# 3  Offline             Interrupted (host reset)      90%      5040         -
# 4  Conveyance offline  Completed: read failure       90%      5040         1142887794
# 5  Offline             Aborted by host               90%      5040         -
# 6  Offline             Aborted by host               90%      5040         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ADA2

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S1Y6J1LS917015
LU WWN Device Id: 5 0024e9 002302316
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Thu Sep 11 09:52:40 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 119)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline
data collection: 		(19734) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 330) minutes.
Conveyance self-test routine
recommended polling time: 	 (  34) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       2
  3 Spin_Up_Time            0x0007   071   071   011    Pre-fail  Always       -       9610
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       771
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       12244
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5053
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       219
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       2
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       2
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   072   000    Old_age   Always       -       25 (Min/Max 18/25)
194 Temperature_Celsius     0x0022   075   067   000    Old_age   Always       -       25 (Min/Max 18/29)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       1359
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       70%      5043         727949262
# 2  Offline             Completed: read failure       70%      5037         727949262

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ADA3

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F2 EG
Device Model:     SAMSUNG HD154UI
Serial Number:    S1Y6J1LS917006
LU WWN Device Id: 5 0024e9 002302295
Firmware Version: 1AG01118
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7, ATA8-ACS T13/1699-D revision 3b
Local Time is:    Thu Sep 11 09:52:40 2014 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection: 		(19470) seconds.
Offline data collection
capabilities: 			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 ( 325) minutes.
Conveyance self-test routine
recommended polling time: 	 (  34) minutes.
SCT capabilities: 	       (0x003f)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   070   070   011    Pre-fail  Always       -       9790
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       829
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       11699
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4838
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       279
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   075   072   000    Old_age   Always       -       25 (Min/Max 19/27)
194 Temperature_Celsius     0x0022   075   068   000    Old_age   Always       -       25 (Min/Max 19/29)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       11797
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4832         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Post by b0ssman »

ada1 is failing
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 1433
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 7
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 1

ada2 is failing as well
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 2
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 1

Since your NAS consists only of a RAIDZ1, which tolerates just one failed drive, and you have 2 failing drives, shut down the system now and buy 2 new drives.
These drives should be replaced as soon as possible. It is possible that you will have permanent damage to some files.
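
For anyone who wants to automate the check above, here is a minimal Python sketch. It assumes the attribute table has the column layout shown in this thread (smartctl 6.2) and only looks at the three attribute IDs quoted in this post. The function name flag_attributes and the ADA1_SAMPLE text are made up for illustration; the sample rows are copied from the ada1 output earlier in the thread.

```python
import re

# Attribute IDs singled out in this post (an assumption, not an exhaustive list).
WATCHED_IDS = {187, 197, 198}

# One row of the attribute table: ID, NAME, FLAG, VALUE, WORST, THRESH,
# TYPE, UPDATED, WHEN_FAILED, RAW_VALUE (only the leading integer of RAW is kept).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def flag_attributes(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # headers, blank lines, other sections
        attr_id = int(match.group(1))
        raw = int(match.group(3))
        if attr_id in WATCHED_IDS and raw > 0:
            flagged[match.group(2)] = raw
    return flagged


# Rows copied from the ada1 output posted in this thread.
ADA1_SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       1433
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       7
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
"""

print(flag_attributes(ADA1_SAMPLE))
# {'Reported_Uncorrect': 1433, 'Current_Pending_Sector': 7, 'Offline_Uncorrectable': 1}
```

An empty result only means these three counters are zero; it says nothing about the other attributes.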

Post by b0ssman »

Also, you should have SMART monitoring with email notification enabled.
It would have warned you a lot earlier about your failing drives.

Post by ^nighthawk^ »

Thanks, I will replace both, but will try to replace just ada1 first. If I replace just that one disk and resilver/scrub import... I think this may leave most of the other data intact and retrievable.

I'm aware I might lose some data, but is the above assumption correct?

Also, these drives are not really available anymore. Is there an issue with using some other disks, or possibly the same brand but larger, for compatibility's sake? Will ZFS be happy with that?

Thanks again in advance.

Post by b0ssman »

You can use bigger drives.
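
A rough way to see what larger replacements buy you: a RAIDZ vdev treats every member as if it were the size of the smallest one, so extra space can only become usable once every disk has been replaced. The Python sketch below is back-of-the-envelope arithmetic only. It ignores metadata and padding overhead, and the function name raidz_usable_tb is hypothetical.

```python
def raidz_usable_tb(disk_sizes_tb, parity=1):
    """Approximate usable capacity of one RAIDZ vdev, in TB.

    Assumption: each member counts as the smallest disk, and `parity`
    disks' worth of space goes to redundancy (1 for RAIDZ1, 2 for RAIDZ2).
    """
    return (len(disk_sizes_tb) - parity) * min(disk_sizes_tb)


# The four 1.5 TB disks from this thread.
print(raidz_usable_tb([1.5, 1.5, 1.5, 1.5]))  # 4.5

# One disk swapped for a 3 TB model: no gain yet.
print(raidz_usable_tb([3.0, 1.5, 1.5, 1.5]))  # 4.5

# All four swapped for 3 TB models.
print(raidz_usable_tb([3.0, 3.0, 3.0, 3.0]))  # 9.0
```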

Post by ^nighthawk^ »

What if I can't bring the zpool online for whatever reason (mainly the failed drive)?

I obviously can't offline the drive, as the zpool is not yet imported.

Probably my mistake for rebooting at some point.

Should I just pull that disk and do an import, or replace that disk with another and then attempt an import?
Is there a specific command that deals with one "dead" disk when having to reimport a zpool?

Most articles I've found appear to have the zpool up and running before the disk died. Mine fails on zpool import because the ada1 disk fails.

Any ideas?

Post by b0ssman »

If you can't bring the pool online, then you might have a bigger problem.

Post by ^nighthawk^ »

Just wanted to say I recovered all the data bar one file, and that file was negligible. It took a while to check through, though...

The failing disks have now been removed, and I have set up ZFS again as a stripe with the two remaining disks that were OK.

The data is also copied elsewhere now, so nothing will be lost in future.

BiLUX
NewUser
Posts: 7
Joined: 03 Dec 2014 23:54
Status: Offline

Post by BiLUX »

b0ssman, where does one find information on which values to watch, and how? I see the Wikipedia page; it seems to list all or most of them, but it does not say how large some of the numbers should get before you start worrying.

Also, is there a script that can read the table and compare it to an older table to warn you? If not, how do I get that table out with something like ECHO $? >smart_log.txt so I can write one?

Post by b0ssman »

Basically, if the raw value of any of
187 Reported_Uncorrect
197 Current_Pending_Sector
198 Offline_Uncorrectable
becomes higher than 0, I would not use that drive again.
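
On the question of comparing against an older table: smartctl -A prints just the attribute table, and redirecting it to a file (for example smartctl -A /dev/ada1 > smart_log.txt) gives you a snapshot to keep. Below is a rough Python sketch that compares two such snapshots and reports which of the three counters above went up. The column layout is assumed to match the output in this thread, the function names are hypothetical, and the OLD snapshot is invented purely for illustration (the NEW rows are taken from the ada1 output earlier in the thread).

```python
# Counters named in this post; extend the tuple if you want to watch more.
WATCHED = ("Reported_Uncorrect", "Current_Pending_Sector", "Offline_Uncorrectable")


def read_raw_values(smart_text):
    """Map attribute name -> raw value for each row of the attribute table."""
    values = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns, start with a numeric ID
        # and carry a hex FLAG in the third column.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        if fields[2].startswith("0x") and fields[9].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def increased(old_text, new_text, watch=WATCHED):
    """Return {name: (old_raw, new_raw)} for watched counters that grew."""
    old = read_raw_values(old_text)
    new = read_raw_values(new_text)
    return {
        name: (old[name], new[name])
        for name in watch
        if name in old and name in new and new[name] > old[name]
    }


# OLD is a made-up earlier snapshot; NEW uses rows from ada1 in this thread.
OLD = """\
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       4000
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
NEW = """\
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       5057
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       7
"""

print(increased(OLD, NEW))
# {'Current_Pending_Sector': (0, 7)}
```

Power_On_Hours is deliberately left out of the watch list, since it grows on a healthy drive too.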
