Scrub speeds vary greatly between identical systems
Posted: 25 Nov 2013 15:18
Hello folks,
We have a setup as follows:
2 x Dell R-310 (16GB RAM each, 4 x SATA 2TB hdd each)
All drives are connected directly to the on-board SATA controller (no raid controller is installed)
nas4free 9.1.0.1 - Sandstorm (revision 636)
nas4free (primary) is using a raidz1 NFS-exported dataset that connects to a XEN server pool as its main storage for running VMs
nas4free (secondary) is using a raidz1 NFS-exported dataset that connects to nas4free primary to back up snapshots of nas4free primary
Both systems' hardware and firmware are identical (the hard drives are all identical as well)
Drive tests indicate similar speeds on all 8 disks
On Sundays we scrub both pools and these are the results:
  pool: primary-pool
 state: ONLINE
  scan: scrub repaired 0 in 2h18m with 0 errors on Sun Nov 24 06:18:23 2013
config:

        NAME            STATE     READ WRITE CKSUM
        primary-pool    ONLINE       0     0     0
          raidz1-0      ONLINE       0     0     0
            ada0        ONLINE       0     0     0
            ada1        ONLINE       0     0     0
            ada2        ONLINE       0     0     0
            ada3        ONLINE       0     0     0

errors: No known data errors

  pool: secondary-pool
 state: ONLINE
  scan: scrub repaired 0 in 0h22m with 0 errors on Sun Nov 24 16:22:27 2013
config:

        NAME              STATE     READ WRITE CKSUM
        secondary-pool    ONLINE       0     0     0
          raidz1-0        ONLINE       0     0     0
            ada0          ONLINE       0     0     0
            ada1          ONLINE       0     0     0
            ada2          ONLINE       0     0     0
            ada3          ONLINE       0     0     0

errors: No known data errors
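For anyone who wants to compare scrub durations week over week, here is a minimal Python sketch that pulls the duration out of the two scan lines above. The regular expression only targets the "in NhNNm" wording shown in this output; other ZFS versions may format the line differently, and the function name is just illustrative.

```python
import re

# Scan lines copied from the two `zpool status` reports above.
SCAN_LINES = {
    "primary-pool": "scan: scrub repaired 0 in 2h18m with 0 errors on Sun Nov 24 06:18:23 2013",
    "secondary-pool": "scan: scrub repaired 0 in 0h22m with 0 errors on Sun Nov 24 16:22:27 2013",
}

# Matches the "in <hours>h<minutes>m" part of this particular output format.
DURATION_RE = re.compile(r"\bin (\d+)h(\d+)m\b")


def scrub_minutes(scan_line: str) -> int:
    """Return the scrub duration in minutes from a 'scan:' line."""
    match = DURATION_RE.search(scan_line)
    if match is None:
        raise ValueError(f"no scrub duration found in: {scan_line!r}")
    hours, minutes = (int(group) for group in match.groups())
    return hours * 60 + minutes


durations = {pool: scrub_minutes(line) for pool, line in SCAN_LINES.items()}

if __name__ == "__main__":
    for pool, minutes in durations.items():
        print(f"{pool}: {minutes} min")
```

This gives the 138 and 22 minute figures used in the discussion that follows.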
We can see that the primary scrubs about 185GB in 138 minutes and the secondary scrubs the same amount in about 22 minutes. We are well aware that the primary has running VMs and the secondary is only scrubbing, but the time difference is large: the primary takes more than 6 times longer to scrub the same size pool and amount of data on identical systems.
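To put those numbers in terms of throughput, here is a rough Python sketch of the arithmetic. It assumes the approximate 185GB figure applies to both pools and takes 1 GB as 1000 MB; the rates are averages over the whole scrub, not what any single disk was doing.

```python
# Figures from the post; 185 GB is the approximate amount of data scrubbed.
DATA_GB = 185
PRIMARY_MIN = 138    # 2h18m
SECONDARY_MIN = 22   # 0h22m


def throughput_mb_s(data_gb: float, minutes: float) -> float:
    """Average scrub rate in MB/s, taking 1 GB = 1000 MB (an assumption)."""
    return data_gb * 1000 / (minutes * 60)


primary_rate = throughput_mb_s(DATA_GB, PRIMARY_MIN)
secondary_rate = throughput_mb_s(DATA_GB, SECONDARY_MIN)
slowdown = PRIMARY_MIN / SECONDARY_MIN

if __name__ == "__main__":
    print(f"primary:   {primary_rate:.1f} MB/s")
    print(f"secondary: {secondary_rate:.1f} MB/s")
    print(f"primary takes {slowdown:.2f}x as long")
```

Under those assumptions the primary averages a little over 22 MB/s and the secondary about 140 MB/s, a ratio of roughly 6.3.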
Some difference in scrub speeds is to be expected. However, during scrubbing there is little or no VM access, and snapshots and replication are turned off. Does anyone else find this discrepancy odd? We have similar systems in a backup configuration for our company backups, and there the slower scrub takes less than 2 times as long as the faster one.
Our concern is that once we get into the terabyte range, the scrub times on the primary pool will become excessive. I know this topic has been beaten to death, but everything I have tried has had no positive effect. Has anyone else experienced similar scrub speeds? Is this what I should expect? Any pointers or shared experience with similar issues would be very much appreciated.
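To illustrate the concern about the terabyte range, here is a back-of-the-envelope Python sketch. It assumes scrub time grows linearly with the amount of data at the rates measured above, which is only a guess; real scrub times also depend on fragmentation, record sizes, and concurrent load. The 1000 GB target is a hypothetical example, not a figure from our systems.

```python
# Measured figures from the post (185 GB is approximate).
MEASURED_GB = 185
PRIMARY_MIN = 138
SECONDARY_MIN = 22


def projected_hours(target_gb: float, measured_gb: float, measured_min: float) -> float:
    """Linear extrapolation of scrub time in hours (assumes a constant average rate)."""
    return target_gb / measured_gb * measured_min / 60


TARGET_GB = 1000  # hypothetical 1 TB of data, taking 1 TB = 1000 GB

primary_hours = projected_hours(TARGET_GB, MEASURED_GB, PRIMARY_MIN)
secondary_hours = projected_hours(TARGET_GB, MEASURED_GB, SECONDARY_MIN)

if __name__ == "__main__":
    print(f"primary:   {primary_hours:.1f} h per TB")
    print(f"secondary: {secondary_hours:.1f} h per TB")
```

If the linear assumption held, the primary would need roughly 12 hours per terabyte against about 2 hours on the secondary.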
Thanks to the nas4free team, your product is excellent and a trusted part of our infrastructure.
Darren