
APM, AAM, and S.M.A.R.T. Configuration

Posted: 01 Feb 2020 19:52
by jlagermann
After 3 years of FreeNAS/XigmaNAS, what I've learned the hard way about S.M.A.R.T. monitoring, APM and AAM.

Several months ago I had a multi-drive failure and did not have any early warnings. There were multiple contributing factors:
  1. I had 6 older 3TB drives lying around that I thought I would get some use out of.
  2. My email provider changed to require TLS for authentication (instead of SSL) and I did not know. This caused email notifications to stop.
  3. I did not have any regular status reports being emailed. If I had, I would have noticed them stopping and looked into it.
It turns out the older drives I added to my main pool (as a RAIDZ1 vdev) developed issues shortly after I installed them. For six months S.M.A.R.T. was generating errors and I never knew. S.M.A.R.T. was configured to send emails, including a TEST warning email on startup, and to enable S.M.A.R.T. on all new devices when they are added. This system had been up for over 300 days before I added the drives, so startup emails are very rare for me.

Recommendation #1 - Email setup
  1. Ensure you set up email right from the start. I now run my own internal SMTP gateway so I don't have to worry about a provider changing something.
  2. Configure routine email reports. It doesn't really matter which reports you select, just select at least one. My system sends me a Status report every Sunday night.
  3. In your S.M.A.R.T. settings, make sure you include a good email address.
    Enabling this option will add the following to smartd.conf for each drive you enable.


    -m <email address> -M /etc/mail/smartdreport.sh
    If you also enabled the test email, it will add the following:


    -m <email address> -M /etc/mail/smartdreport.sh -M test
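
Because a missing address fails silently, it can be worth checking the generated file directly. The following is a minimal sketch and not part of XigmaNAS: `missing_mail_directive` is a hypothetical helper, and it assumes one device entry per line with `#` comment lines. It prints every device entry that has no -m directive.

```shell
# Hypothetical helper: list device entries in a smartd.conf-style file
# that carry no -m (mail) directive.
# Assumptions: one entry per line, no backslash continuations.
missing_mail_directive() {
    awk '
        /^[[:space:]]*#/ { next }   # skip comment lines
        NF == 0          { next }   # skip blank lines
        {
            has_mail = 0
            for (i = 2; i <= NF; i++)
                if ($i == "-m") has_mail = 1
            if (!has_mail) print $1
        }
    ' "$1"
}
```

Run it against wherever your system writes smartd.conf, for example `missing_mail_directive /var/etc/smartd.conf` (the path is an assumption). No output means every entry has an address.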
Recommendation #2 - General S.M.A.R.T. settings
  1. Set Power Mode to Standby. If you are not using APM (Advanced Power Management), this setting does not make a difference. More on APM below.
  2. Enable S.M.A.R.T. monitoring for all new devices as they are added. If you already have devices added, you have to manually enable each one.
  3. Set the correct Temperature Monitoring settings. If you have different types of storage devices, you will most likely need different temperature settings for them. For example, WD Red drives have a warning at 65 deg C and critical at 85 deg C, while my NVMe drives have a warning at 84 deg C and critical at 88 deg C. (NVMe devices sit much closer to the motherboard and can get a lot hotter.) To find the manufacturer-recommended temperature limits, you can run the following command:


    smartctl -l scttemp /dev/ada0
    You will get output like the following for a WD Red drive.


    # smartctl -l scttemp /dev/ada0
    ....
    SCT Temperature History Version:     2
    Temperature Sampling Period:         1 minute
    Temperature Logging Interval:        2 minutes
    Min/Max recommended Temperature:      0/65 Celsius
    Min/Max Temperature Limit:           -41/85 Celsius
    Temperature History Size (Index):    478 (0)
    
  4. Create self-test schedules for all drives. Unfortunately, you have to enter them one at a time in the GUI (being able to select multiple drives when creating a scheduled test would be a nice feature). The following recommendations could also be used for scrubs, but that is another topic.
    • for non-Pro-grade SATA drives - I have found a lot of recommendations for daily short tests and weekly long tests
    • for Pro grade SATA or SAS drives - weekly short tests and monthly long tests seem to be the standard
  5. Enable Email Reports.
  6. Caution: If you enable "Send a TEST warning email on startup", you will get one email per S.M.A.R.T.-enabled device every time smartd is restarted (12 devices = 12 separate emails!). That's probably not what you want.
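
For the temperature settings in item 3, the two limits can be pulled out of the scttemp output instead of being read by eye. This is a rough sketch: `scttemp_to_W` is a hypothetical helper, and it assumes the limits map onto smartd's `-W DIFF,INFO,CRIT` directive, with DIFF left at 0 so that a temperature change alone is not reported.

```shell
# Hypothetical helper: read `smartctl -l scttemp` output on stdin and print
# a smartd -W DIFF,INFO,CRIT directive.
# Assumption: DIFF stays at 0 (no reports on temperature change alone).
scttemp_to_W() {
    awk '
        # "Min/Max recommended Temperature:  0/65 Celsius" -> INFO = 65
        /Min\/Max recommended Temperature:/ { split($(NF-1), r, "/"); info = r[2] }
        # "Min/Max Temperature Limit:  -41/85 Celsius" -> CRIT = 85
        /Min\/Max Temperature Limit:/       { split($(NF-1), l, "/"); crit = l[2] }
        END {
            if (info != "" && crit != "")
                printf "-W 0,%d,%d\n", info, crit
        }
    '
}
```

With the WD Red output shown in item 3, `smartctl -l scttemp /dev/ada0 | scttemp_to_W` prints `-W 0,65,85`. Drives that do not report SCT temperature limits produce no output.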
Disk S.M.A.R.T. settings
S.M.A.R.T. logs directly to the storage device and, in most cases, the log entries are one-way and cannot be cleared. Some entries go up and down on their own, but there is no manual way to change or reset them. Some of the attributes indicate issues with communications to and from the device, not necessarily with the device itself. For example, an increase in attribute 199 (UDMA_CRC_Error_Count) normally indicates a bad cable. Once you replace the cable, there is no way to reset the counter; you just have to monitor it for further increases. When you create your smartd.conf entry, there is a way to do just that.
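
Outside of smartd, the same counter can be spot-checked from the CLI after a cable swap. The helpers below are a hypothetical sketch: `crc_raw` and `crc_increased` are made-up names, the baseline file is an assumption, and the parsing assumes the usual ten-column attribute table printed by `smartctl -A`.

```shell
# Hypothetical helpers for spot-checking attribute 199 from the CLI.

# Print the raw value of attribute 199 from `smartctl -A` output on stdin.
# Assumption: the usual ten-column ATA attribute table (raw value in column 10).
crc_raw() {
    awk '$1 == 199 { print $10 }'
}

# Succeed (exit 0) if the current count is higher than the saved baseline.
# $1 = current raw value, $2 = baseline file (treated as 0 if missing or empty)
crc_increased() {
    baseline=$(cat "$2" 2>/dev/null || true)
    [ "$1" -gt "${baseline:-0}" ]
}
```

For example, save the count right after replacing the cable with `smartctl -A /dev/ada0 | crc_raw > /root/ada0.crc` (the file location is an assumption), then compare later readings against that file with `crc_increased`.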
In addition to the S.M.A.R.T. settings, you can also configure APM and AAM features here. To find out what your drive is capable of you can run the following command.


smartctl -g all /dev/ada0   # change the device name appropriately
This will give you output like the following for a WD Red drive. As you can see, AAM and APM are not available on this drive.


# smartctl -g all /dev/ada0
smartctl 7.0 2018-12-30 r4883 [FreeBSD 12.1-RELEASE amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
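
If you have many drives, the same check can be scripted. This is only a sketch: `feature_available` is a hypothetical helper that treats a feature as usable unless smartctl reports it as Unavailable. The `level is:` line is how I understand smartctl reports a feature that is currently enabled, so treat that part as an assumption.

```shell
# Hypothetical helper: read `smartctl -g all` output on stdin and succeed
# (exit 0) if the named feature (e.g. APM or AAM) is not "Unavailable".
feature_available() {
    awk -v name="$1" '
        # Assumption: an enabled feature is reported as "<name> level is: ..."
        index($0, name " level is:") == 1   { ok = 1 }
        # "<name> feature is: Disabled" means supported but switched off
        index($0, name " feature is:") == 1 { if ($0 !~ /Unavailable/) ok = 1 }
        END { exit (ok ? 0 : 1) }
    '
}
```

Usage: `smartctl -g all /dev/ada0 | feature_available APM && echo "APM can be configured"`. For the WD Red output above, nothing is printed.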
Recommendation #3 - Disk S.M.A.R.T. settings
  1. Transfer mode - I don't know of any reason not to use Automatic
  2. Standby time - Frequently starting and stopping HDDs shortens their life expectancy. In standby mode, the disk stops spinning. Only change this setting if the drive will stay in standby mode long enough to make the power savings worth the extra wear and tear on the drive.
  3. Power Management - if you enable power management, I don't recommend going below level 128. Levels 128 and above do not permit spin-down, which avoids extra start/stop wear and extends the life of the drive. Keep in mind that when a disk has dropped to idle speed, it has to spin back up to full speed before it can be accessed. That WILL add latency to the start of any read or write job. For WD drives, APM can be made available by placing a physical jumper on pins 3 & 4. Western Digital recommends leaving power management disabled for desktops/workstations and enabling it for enterprise environments and systems that support the appropriate ATA commands. https://support-en.wd.com/app/answers/d ... 0red%20pin
  4. Acoustic level - this feature should ALWAYS be Disabled. According to INCITS (formerly NCITS), AAM was declared an outdated feature in 2010. Seagate removed AAM capabilities in 2008 and WD began doing so in 2011.
  5. Activate S.M.A.R.T.
  6. Use the extra options - I want to know if attribute 199 (UDMA_CRC_Error_Count) increases, so I add -R 199! to the extra options. The ! at the end marks it as a critical attribute. You can also use -I to ignore changes in attributes you don't want alerts for, like temperature or power-on hours.
  7. Because of a bug in the way the smartd.conf file is built, use cron jobs to run S.M.A.R.T. tests for NVMe drives. The GUI will try to create the NVMe smartd.conf entry as /dev/nvd* and it will cause the smartd service to stop.
For more information about smartd.conf options, see the FreeBSD smartd.conf manual page.
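
As a rough sketch of how the directives mentioned in this post fit together in one smartd.conf entry, the helper below prints a single line. It is not what the GUI generates: `smartd_entry` is a hypothetical name, and the device, email address, test hours and the choice of -a are all assumptions. The two schedules follow the daily/weekly and weekly/monthly recommendations above.

```shell
# Hypothetical helper: print one smartd.conf entry.
# $1 = device, $2 = email address, $3 = "pro" or anything else
smartd_entry() {
    dev=$1
    email=$2
    tier=$3
    # -s takes a regex over T/MM/DD/d/HH (d: 1 = Monday ... 7 = Sunday)
    case $tier in
        pro) sched='(S/../../7/02|L/../01/./03)' ;;  # short: Sundays 02:00, long: 1st of month 03:00
        *)   sched='(S/../.././02|L/../../6/03)' ;;  # short: daily 02:00, long: Saturdays 03:00
    esac
    # -n standby : skip checks while the disk is spun down
    # -R 199!    : treat any change in the raw CRC error count as critical
    # -I 194/-I 9: ignore changes in temperature and power-on hours
    printf '%s -a -n standby -m %s -s %s -R 199! -I 194 -I 9\n' "$dev" "$email" "$sched"
}
```

For example, `smartd_entry /dev/ada0 admin@example.com consumer` prints `/dev/ada0 -a -n standby -m admin@example.com -s (S/../.././02|L/../../6/03) -R 199! -I 194 -I 9`. In the GUI, only the -R and -I parts would go into the extra options field.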

There is a difference between smartd and smartctl; smartd is what you are configuring in the GUI and smartd.conf, smartctl is what you run at the CLI or with a cron job. The options are different.
For more information about smartctl options, see the FreeBSD smartctl manual page.
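
For the cron route, here is a minimal sketch that prints crontab lines for a list of devices. Several things are assumptions: `nvme_cron_lines` is a hypothetical helper, the smartctl path is the usual FreeBSD ports location, the /dev/nvme0 style device names may differ on your system, and whether `-t short` works on NVMe depends on your smartmontools version.

```shell
# Hypothetical helper: print user-crontab lines that start a short self-test
# every Sunday at 03:MM for each device given as an argument.
# Assumptions: smartctl lives in /usr/local/sbin and supports -t short
# for these devices.
nvme_cron_lines() {
    minute=0
    for dev in "$@"; do
        # fields: minute hour day-of-month month day-of-week command
        printf '%d 3 * * 0 /usr/local/sbin/smartctl -t short %s\n' "$minute" "$dev"
        # stagger by 10 minutes so the tests do not all start together
        # (only valid for up to six devices, since minute must stay below 60)
        minute=$((minute + 10))
    done
}
```

`nvme_cron_lines /dev/nvme0 /dev/nvme1` prints two lines, one starting at 03:00 and one at 03:10, which can be pasted into a crontab or entered in the GUI's cron page.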


If you find something wrong in this post, please let me know. If you have additional recommendations, please comment and maybe we can create some documentation for this topic.