Low throughput via my HBA
Posted: 12 Dec 2014 17:54
I've got a performance problem and I'm looking for ideas.
System:
- Dell PE 2950 III, 32 GB ECC RAM, PCIe 1.x
- Dell MD1000 disk enclosure (15-disk SAS/SATA expander) with two controller modules, running in split mode (one controller takes 7 disks, the other takes 8; both controllers are connected directly to the HBA).
- 15 x 1TB Hitachi enterprise SATA disks, 7,200 rpm, in the MD1000
- Dell SAS 6 Gbps HBA (equivalent to the LSI 9200-8e, SAS2008 chip), installed in a PCIe 1.x x8 slot, with two connections to the MD1000
- Dell PERC 6/i RAID controller with two Intel 320-series SSDs, used for the ZFS LOG and CACHE devices.
The problem:
Low throughput from the MD1000 to the system. This isn't specific to ZFS: it's a raw I/O problem that drags down ZFS performance, not a ZFS problem.
I've run a series of tests from the NAS4Free console and, long story short, I cannot get more than about 465 MB/sec out of the array during sequential reads, no matter how many disks are being read.
My test approach is as follows:
Read the devices directly with dd, dumping the output to /dev/null. No ZFS involvement - this is raw device throughput.
As root:
Code:
iostat da0 da1 da2 da3 da4 da5 da6 120 &
sleep 1
iostat da7 da8 da9 da10 da11 da12 da13 da14 120 &
for X in 0 7 1 8 2 9 3 10 4 11 5 12 6 13 14; do dd if=/dev/da$X of=/dev/null bs=1M & done
To gradually reduce the load, I simply fg and kill off two dd's at a time (one per half of the enclosure) while observing the results.
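As an aside on the harness: the interleaved device order in the loop alternates between the two enclosure halves (da0-da6 on one controller, da7-da14 on the other, matching the two iostat groups), so the load comes up evenly on both links. A quick check of that split:

```shell
# Count how many devices in the start order fall on each enclosure half
# (da0-da6 = half A, da7-da14 = half B, matching the two iostat groups)
a=0; b=0
for X in 0 7 1 8 2 9 3 10 4 11 5 12 6 13 14; do
  if [ "$X" -le 6 ]; then a=$((a+1)); else b=$((b+1)); fi
done
echo "half A: $a devices, half B: $b devices"   # 7 and 8
```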
Results:
With the test running, iostat reports per-device I/O every two minutes. Reading all 15 HDDs at once, I see an average of 31 MB/sec per device. Since the MD1000 is split into halves of 7 and 8 drives, one side fared slightly better: the 7-drive half showed higher per-drive read throughput than the 8-drive half. I then killed off the dd's two at a time, one per half, so each half of the MD1000 and each HBA link was serving one drive fewer. As the dd's dropped away, the per-drive read rate climbed. To approach the drives' maximum sequential read rate of 80 MB/sec, I had to cut down to three drives per half of the MD1000 (i.e. 3 drives per SAS connection to the HBA), which yielded about 73 MB/sec per drive. I CAN get 80+ MB/sec per drive, but only by dropping to just two drives per MD1000 connection.
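The arithmetic shows why the aggregate looks pinned: total throughput barely moves between 15 and 6 drives, and only falls once the drives themselves (~80 MB/sec each) become the limit. A quick sketch using the per-drive figures above:

```shell
# Aggregate throughput at each step, from the per-drive averages reported by iostat
echo "15 drives x 31 MB/s = $((15 * 31)) MB/s total"
echo " 6 drives x 73 MB/s = $(( 6 * 73)) MB/s total"
echo " 4 drives x 80 MB/s = $(( 4 * 80)) MB/s total"
```

The first two totals (465 and 438) sit within a few percent of each other, which is the signature of a shared bottleneck upstream of the disks.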
Regardless of the number of drives, the combined throughput remains about the same.
What next?! Where's the bottleneck?
I expected better: I expected to be able to max out each HDD's read/write capability. Each disk can sustain 80 MB/sec of sequential reads on its own, and the interconnect should have plenty of headroom for that - it's two x4 SAS 3 Gbps connections, one serving 7 disks and the other serving 8.
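To put rough numbers on that headroom claim (assuming standard 8b/10b encoding on SAS 3 Gbps links, so about 300 MB/sec of payload per lane):

```shell
# Back-of-envelope SAS link budget: 3 Gbps with 8b/10b encoding ~ 300 MB/s per lane
lane=300; lanes=4; links=2
echo "per x4 wide port: $((lane * lanes)) MB/s"
echo "both ports:       $((lane * lanes * links)) MB/s"
echo "needed, 15 drives at 80 MB/s: $((15 * 80)) MB/s"
```

Even a single x4 port should carry about 1200 MB/sec, so the two links together have roughly double the bandwidth the 15 drives can produce; the observed 465 MB/sec cap is well under any of these figures.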
Because the max throughput remains the same regardless of the number of drives being read, I believe the issue is due to one of the following:
- the MD1000 is incapable of sustaining higher read throughput per controller module
- the Dell SAS 6 Gbps HBA (LSI 9200-8e, SAS2008) is incapable of sustaining a higher transfer rate per connection, possibly because of the PCIe 1.x slot, or maybe it's just not a good HBA
Does anyone have any suggestions for how I might go about troubleshooting this, or tuning things?
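For what it's worth, here are a few FreeBSD-side checks I'd try first (a sketch - the mps device and sysctl names are assumptions for a SAS2008-based HBA under NAS4Free's FreeBSD base; adjust to your system):

```shell
# Confirm the HBA negotiated a full PCIe x8 link; a x1/x4 link would cap throughput
pciconf -lcv | grep -B 2 -A 10 mps
# List attached disks and verify which controller/path each one sits behind
camcontrol devlist -v
# Inspect the mps(4) driver's sysctl tree for settings and state
sysctl dev.mps.0
```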