ZFS speed & memory issue

Posted: 17 Nov 2013 07:40
by Impact
My ZFS pool has a dead drive and has begun to run very slowly. I am currently running the 'zpool replace' task, but it is moving very very slowly. I think my ZFS arc tuning may be part of the problem, as the machine never seems to use more than 1GB of the system memory (8GB installed).

Could somebody please offer some insight? I am pulling out my hair trying to get it back into an operating state. Here is all the relevant information I could find:

top
CPU: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Mem: 6524K Active, 9928K Inact, 529M Wired, 256K Cache, 14M Buf, 7335M Free

/boot/loader.conf
kernel="kernel"
bootfile="kernel"
kernel_options=""
kern.hz="100"
# ZFS kernel tune
vm.kmem_size="7000M"
vfs.zfs.arc_min="5000M"
vfs.zfs.arc_max="6000M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="0"
vfs.zfs.txg.timeout="10"
vfs.zfs.vdev.max_pending="3"
vfs.zfs.vdev.min_pending="6"

zpool status

  pool: Tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 7h2m, 2.41% done, 285h59m to go
config:

	NAME           STATE     READ WRITE CKSUM
	Tank           ONLINE       0     0     0
	  raidz1       ONLINE       0     0     0
	    replacing  ONLINE       0     0     0
	      ad14     ONLINE      16     0     0
	      ad10     ONLINE       0     0     0  38.8G resilvered
	    ad15       ONLINE       0     0     0  62.9M resilvered
	    ad16       ONLINE       0     0     0  70.5M resilvered
	    ad17       ONLINE       0     0     0  62.8M resilvered
	    ad18       ONLINE       0     0     0  70.8M resilvered
	    ad20       ONLINE       0     0     0  62.9M resilvered

errors: No known data errors

uname -a
FreeBSD freenas 7.3-RELEASE-p7 FreeBSD 7.3-RELEASE-p7 #0: Sun Oct 9 05:11:39 JST 2011 aoyama@fbsd7.freenas.local:/usr/obj/freenas/usr/src/sys/FREENAS-amd64 amd64

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 07:57
by b0ssman
vfs.zfs.prefetch_disable="1" should be "0"

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 08:11
by Impact
Thank you for the quick reply, b0ssman. If I change the vfs.zfs.prefetch_disable setting, can I reboot while it is replacing the drive? Do you think that will increase the memory usage to speed things up?

Thanks!
Impact

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 08:16
by b0ssman
I don't think it will impact the resilver speed, but I am not 100% sure.

2.4% in 7 hours is very slow.

Can you post the hardware and controller used?

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 08:28
by substr
Why is your max_pending less than your min_pending? Are the values reversed?

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 08:31
by Impact
This same ZFS pool used to run at very usable speeds. I believe the machine used to sit at nearly 100% memory usage before I did an update and mucked with the ZFS settings. My best guess is that I messed something up, but I cannot figure out what. I know that ZFS tends to need a lot of memory to run quickly.

Here are the machine specs:
Motherboard: ASUS P5G43T-M Pro LGA 775 Intel G43 HDMI Micro ATX
Processor: Intel Celeron E3300 Wolfdale 2.5GHz LGA 775 65W Dual-Core
Drives: Western Digital WD AV-GP WD20EVDS 2TB 32MB Cache SATA 3.0Gb/s 3.5"
Controller: SYBA SD-SATA2-4IR PCI SATA II (3.0Gb/s) RAID Controller Card
Memory: G.SKILL Ripjaws Series 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1066 (PC3 8500)
Boot drive: Transcend 4GB Compact Flash (CF) Flash Card (connected by SYBA SD-CF-IDE-A IDE to Compact Flash Adapter)

Some of the drives are connected directly to the motherboard, others are connected through the PCI SATA II card.

Thanks,
Impact

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 10:58
by ku-gew
Why have 38 GB been resilvered on the new drive, and a few MB on the other drives too?

Since I have never done a "replace" I have to ask: you seem to have a 5+1 RAIDZ1. Did you add the new drive, or did you remove the failed one?
If you didn't remove the failed one, I expect ZFS to try to use it during the resilver as well, making everything slower (reads from a failing drive take much longer).
Take the failed drive out and it may be faster.

And please keep us posted, this is interesting.

BTW: you have about a 55% probability of hitting a read error on the GOOD drives during the rebuild. If you keep the failing drive in, that error can be corrected; if you take it out, the rebuild will be faster, but then who knows what the outcome of an uncorrectable read error during the rebuild will be. If you are lucky it will be skipped and you will lose a block; if you are unlucky... everything stops.

As info, here is a formula to estimate the probability of an uncorrectable read error during the rebuild:
a = per-bit error probability (1E-14 for consumer drives, 1E-15 or 1E-16 for enterprise drives)
b = number of drives read during the rebuild (n-1 for an n-drive RAIDZ1)
c = size of each disk in bytes

probability = 1 - (1-a)^(8*b*c)

With a = 1E-14, b = 5, c = 2E12 (= 2 TB) you get about 55%.
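If you want to check the arithmetic yourself, here is a quick sketch (function and variable names are mine, not from any ZFS tool):

```python
import math

def ure_probability(a, b, c):
    """Probability of at least one uncorrectable read error while
    reading b disks of c bytes each, with per-bit error rate a.

    Uses log1p/expm1 instead of the naive 1 - (1-a)**bits so the
    result stays accurate even though a is tiny."""
    bits = 8 * b * c  # total number of bits read during the rebuild
    return -math.expm1(bits * math.log1p(-a))  # == 1 - (1-a)^bits

# Consumer drives (1E-14), 5 surviving disks of 2 TB each:
print(round(ure_probability(1e-14, 5, 2e12), 2))  # ~0.55
```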

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 20:55
by Impact
ku-gew,

Thanks for the response. Yes, I have left the old drive in while replacing it with an identical drive that was already in the machine. I suspect this is slowing down the resilvering; however, I still think the memory usage is also involved.

substr,

The switched vfs.zfs.vdev.max_pending and vfs.zfs.vdev.min_pending values are definitely a problem. I wonder what effect they are having?

Today:

  pool: Tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 20h32m, 10.75% done, 170h28m to go
config:

	NAME           STATE     READ WRITE CKSUM
	Tank           ONLINE       0     0     0
	  raidz1       ONLINE       0     0     0
	    replacing  ONLINE       0     0     0
	      ad14     ONLINE      16     0     0
	      ad10     ONLINE       0     0     0  173G resilvered
	    ad15       ONLINE       6     0     0  118M resilvered
	    ad16       ONLINE       0     0     0  130M resilvered
	    ad17       ONLINE       0     0     0  117M resilvered
	    ad18       ONLINE       0     0     0  131M resilvered
	    ad20       ONLINE       0     0     0  118M resilvered

errors: No known data errors

Thanks,
Impact

Re: ZFS speed & memory issue

Posted: 17 Nov 2013 22:05
by ku-gew
I have never resilvered a RAIDZ1, but I can't see why memory would affect resilvering speed. Data is read straight from the drives during a resilver, and almost no memory is needed for the low-level process; it's just a bunch of XORs.
Try posting the first 3-5 outputs of "zpool iostat -v 5".
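To illustrate what I mean by "a bunch of XORs", here is a toy sketch of single-parity reconstruction. This has nothing to do with ZFS's actual on-disk layout; it only shows the principle that a lost block is rebuilt by XOR-ing the survivors with the parity block:

```python
def parity(blocks):
    """XOR all blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Four "data disks" plus one parity disk:
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
p = parity(data)

# Lose disk 2; XOR-ing the survivors with the parity rebuilds it:
survivors = [data[0], data[1], data[3]]
rebuilt = parity(survivors + [p])
print(rebuilt == data[2])  # True
```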

Re: ZFS speed & memory issue

Posted: 18 Nov 2013 00:41
by substr
If the switched values are having any effect, it would be to limit the number of I/Os that can be issued at once. There is a small chance it is the cause of slow performance.

Fixing the values and rebooting should be all it takes. The resilver will continue after the reboot.

You should be concerned about why you have so many corrections across all the drives. Have you ever run a scrub before?

Re: ZFS speed & memory issue

Posted: 18 Nov 2013 21:53
by Impact
As requested, here are the first five results of the "zpool iostat -v 5" command. I have no idea if these are good numbers or not.

                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
Tank           9.44T  1.47T     77      2  9.52M  12.8K
  raidz1       9.44T  1.47T     77      2  9.52M  12.8K
    replacing      -      -      0     80      0  1.94M
      ad14         -      -      0      0    292  1.65K
      ad10         -      -      0     50      0  1.94M
    ad15           -      -     38      1  1.94M  2.59K
    ad16           -      -     22      1  1.94M  2.84K
    ad17           -      -     22      1  1.94M  2.59K
    ad18           -      -     38      1  1.94M  2.85K
    ad20           -      -     38      1  1.94M  2.60K
-------------  -----  -----  -----  -----  -----  -----

                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
Tank           9.44T  1.47T      0      2      0  7.18K
  raidz1       9.44T  1.47T      0      2      0  7.18K
    replacing      -      -      0      2      0  1.50K
      ad14         -      -      0      0      0  1.50K
      ad10         -      -      0      3      0  5.99K
    ad15           -      -      0      1      0  1.30K
    ad16           -      -      0      3      0  5.79K
    ad17           -      -      0      4      0  5.09K
    ad18           -      -      0      4      0  5.59K
    ad20           -      -      0      4      0  5.19K
-------------  -----  -----  -----  -----  -----  -----

                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
Tank           9.44T  1.47T      0      0      0  1.40K
  raidz1       9.44T  1.47T      0      0      0  1.40K
    replacing      -      -      0      0      0    408
      ad14         -      -      0      0      0    408
      ad10         -      -      0      0      0      0
    ad15           -      -      0      0      0    408
    ad16           -      -      0      0      0      0
    ad17           -      -      0      0      0      0
    ad18           -      -      0      0      0      0
    ad20           -      -      0      0      0      0
-------------  -----  -----  -----  -----  -----  -----

                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
Tank           9.44T  1.47T      0      1      0  3.69K
  raidz1       9.44T  1.47T      0      1      0  3.69K
    replacing      -      -      0      1      0   1021
      ad14         -      -      0      0      0   1021
      ad10         -      -      0      0      0      0
    ad15           -      -      0      0      0    919
    ad16           -      -      0      0      0      0
    ad17           -      -      0      0      0      0
    ad18           -      -      0      0      0      0
    ad20           -      -      0      0      0      0
-------------  -----  -----  -----  -----  -----  -----

                  capacity     operations    bandwidth
pool            used  avail   read  write   read  write
-------------  -----  -----  -----  -----  -----  -----
Tank           9.44T  1.47T      0      1      0  5.59K
  raidz1       9.44T  1.47T      0      1      0  5.59K
    replacing      -      -      0      1      0  1.30K
      ad14         -      -      0      0      0  1.30K
      ad10         -      -      0      0      0      0
    ad15           -      -      0      0      0  1.20K
    ad16           -      -      0      0      0      0
    ad17           -      -      0      0      0      0
    ad18           -      -      0      0      0      0
    ad20           -      -      0      0      0      0
-------------  -----  -----  -----  -----  -----  -----

Also, here is the status today:

  pool: Tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 45h29m, 19.24% done, 190h54m to go
config:

	NAME           STATE     READ WRITE CKSUM
	Tank           ONLINE       0     0     0
	  raidz1       ONLINE       0     0     0
	    replacing  ONLINE       0     0     0
	      ad14     ONLINE      16     0     0
	      ad10     ONLINE       0     0     0  310G resilvered
	    ad15       ONLINE       6     0     0  176M resilvered
	    ad16       ONLINE       0     0     0  194M resilvered
	    ad17       ONLINE       0     0     0  176M resilvered
	    ad18       ONLINE       0     0     0  195M resilvered
	    ad20       ONLINE       0     0     0  176M resilvered

errors: No known data errors

I am going to try correcting the loader.conf values (min_pending, max_pending, and prefetch_disable) and rebooting next. I will let you all know what happens.
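For reference, these are the /boot/loader.conf lines I plan to change, with the min/max values swapped back and prefetch re-enabled as suggested above (whether these are the ideal values is another question):

```
vfs.zfs.prefetch_disable="0"
vfs.zfs.vdev.max_pending="6"
vfs.zfs.vdev.min_pending="3"
```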

Thanks for the help!
Impact

Re: ZFS speed & memory issue

Posted: 18 Nov 2013 22:30
by ku-gew
The first iostat sample shows averages since the last boot, so your disks appear to be doing some read/write operations per second; but in the following samples you can see there are practically none: fewer than 5 per second, while the disks can do at least 30 IOPS.
Result: it's not a matter of memory, as I thought; it's a matter of timeouts. Disconnect the broken drive (or just remove it from the pool, that's enough) and see if it gets faster. It should.
However, I have NO IDEA what will happen if there are failed checksums in the other drives (and apparently you have a lot of them).

So, basically, get over it and wait for the resilver to complete at this speed; there is little else you can do to speed up the process.
Next time, do frequent (weekly) scrubs on the pool and then you will be able to disconnect a failing disk without too much fear. You should also set up a daily short self-test and a weekly long self-test on each drive; NAS4free makes this very easy. Just do not let the long self-test and the scrub overlap (they can each take up to a day to complete).

Re: ZFS speed & memory issue

Posted: 18 Nov 2013 22:32
by ku-gew
Is ad10 or ad14 failing?

Re: ZFS speed & memory issue

Posted: 18 Nov 2013 22:48
by Impact
ku-gew,

ad14 is the drive that is failing. Thank you for the help, I will wait it out and see what happens when it finishes and I can safely remove ad14.

Thanks,
Impact