
SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 10:20
by Manxmann
Hey Folks,

After many hours of Googling I'm at a loss as to the problem I'm having, so I thought it was time to ask those better informed than myself :)

So first off let me describe my home lab environment:

N4F Host:
HP DL160 G6 (4x3.5" Chassis)
2 x Intel L5630 (4-core/8-thread) CPUs
68GB ECC RAM
1 x Quad Port Broadcom Gig NIC (Broadcom NetXtreme Gigabit Ethernet, ASIC rev. 0x5719001) MTU 9000
1 x Quad Port SATA 3 Adapter (Marvell 88SE9230 AHCI SATA controller)

Discs:
ada1 Corsair Force LS SSD SandForce Driven SSDs 57242MB 14308168000101670012 Solid State Device 6.0 Gb/s Available, Enabled ahcich2 Marvell 88SE9230 AHCI SATA controller 30 °C ONLINE
pass2 Marvell Console 1.01 Marvell Console 1.01 MB n/a Unknown 150.000MB/s Unavailable ahcich7 Marvell 88SE9230 AHCI SATA controller n/a ONLINE
ada2 ST33000651AS Seagate Barracuda XT 2861589MB Z290YTG3 7200 rpm 3.0 Gb/s Available, Enabled ahcich8 Intel ICH10 AHCI SATA controller 31 °C ONLINE
ada3 ST33000651AS Seagate Barracuda XT 2861589MB Z294XQCY 7200 rpm 3.0 Gb/s Available, Enabled ahcich9 Intel ICH10 AHCI SATA controller 31 °C ONLINE
ada4 ST33000651AS Seagate Barracuda XT 2861589MB Z290X0CS 7200 rpm 3.0 Gb/s Available, Enabled ahcich10 Intel ICH10 AHCI SATA controller 32 °C ONLINE
ada5 ST33000651AS Seagate Barracuda XT 2861589MB Z294QWMV 7200 rpm 3.0 Gb/s Available, Enabled ahcich11 Intel ICH10 AHCI SATA controller 31 °C ONLINE
da0 SanDisk Cruzer Facet 1.27 SanDisk Cruzer Facet 1.27 7634MB 4C530102011102101190 Unknown 40.000MB/s Unavailable umass-sim0 Intel 82801JI (ICH10) USB 2.0 controller USB-A n/a ONLINE

The 4 x Seagate drives are connected to the on-board ICH10 SATA controller and the SSD is connected to the PCI-e add-in card.

ZFS:

pool: RAIDZ
state: ONLINE
scan: scrub repaired 0 in 6h26m with 0 errors on Sun Jun 12 04:17:10 2016
config:

NAME STATE READ WRITE CKSUM
RAIDZ ONLINE 0 0 0
ada2 ONLINE 0 0 0
ada4 ONLINE 0 0 0
ada3 ONLINE 0 0 0
ada5 ONLINE 0 0 0

2 x DS

iSCSI / comp - lz4 / dedup - no / sync - disabled
NFS / comp - lz4 / dedup - no / sync - disabled

Client:

DL385 G6 running XenServer 6.5
4 x iSCSI LUNs, multipath enabled

330000000eeafaaf9 dm-0 FreeBSD,iSCSI DISK
size=4.0T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 16:0:0:0 sdj 8:144 active ready running
|- 15:0:0:0 sdf 8:80 active ready running
`- 14:0:0:0 sdb 8:16 active ready running

All extents are file based and located within the iSCSI DS.

Switch:
Netgear GS724T
VLAN's 12/22/32 carry respective iSCSI traffic
Jumbo frames enabled

So again, this is a 'Home Lab' environment; it is not intended as a mission-critical tool, instead it is a learning/pre-testing tool for my work.

Having read everything I can find about using a SLOG/L2ARC to improve ZFS performance I thought I'd give it a try, as running sync=disabled in a prod environment would be a no-go! Knowing the SATA2 ports on offer from the Intel chipset would be an issue, I've added a Marvell-based PCI-e x2 SATA host card. To this I've connected a single 60G SSD.

A quick disktool test indicates this SSD gives me a transfer speed of around 350MBps, so certainly faster than the spinning rust and with a much higher IOPS rate.

From the CMD line I created two GPT partitions on the SSD (1 x 20G, 1 x 34G). I then added partition 1 as a LOG device and partition 2 as an L2ARC to my pool.

I then changed the sync configuration of the DS to Sync = Always.
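For reference, the setup steps above could be sketched roughly like this. This is only a sketch: it assumes the SSD is ada1 (as in the disk listing above), the pool is RAIDZ (from the zpool status), and the dataset is RAIDZ/iSCSI; the partition labels are purely illustrative.

```shell
# Partition the 60G SSD: one slice for the SLOG, one for the L2ARC.
gpart create -s gpt ada1
gpart add -t freebsd-zfs -s 20G -l slog0 ada1
gpart add -t freebsd-zfs -s 34G -l cache0 ada1

# Attach partition 1 as the log (SLOG) device and partition 2 as L2ARC.
zpool add RAIDZ log gpt/slog0
zpool add RAIDZ cache gpt/cache0

# Force every write on the dataset through the ZIL.
zfs set sync=always RAIDZ/iSCSI
```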

Now the problem.

From a Windows Server 2012 R2 VM running Crystal Disk 5.1.x I see the following performance:

No SLOG/Sync = Disabled
2G (x9 test)
4k Seq Q32 - Read 316MBs / Write 210MBs
4k Random Q32 - Read 91MBs / Write 72MBs
4k Seq - Read 166MBs / Write 146MBs
4k Random - Read 7MBs / Write 6MBs

SLOG/Sync = Always
2G (x9 test)
4k Seq Q32 - Read 310MBs / Write 11MBs
4k Random Q32 - Read 89MBs / Write 7MBs
4k Seq - Read 162MBs / Write 6MBs
4k Random - Read 6MBs / Write 1MBs

As you can see write performance has dropped through the floor! Obviously I expected to see a performance hit moving from Sync=Disabled to Sync=Always but nothing like this.

Advice / suggestions would be much appreciated.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 10:53
by Parkcomm
When you say before SLOG/Sync, do you mean before SLOG and before sync, or before SLOG but with sync? Please post all three:
no slog async
no slog sync
with slog sync

Also, you only need about 4G of ZIL on your SLOG - anything else should be used for L2ARC.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 11:52
by Manxmann
Ah yep just adding the missing piece now.

So NO SLOG / Sync = Always, i.e. the ZIL is now on the array.

2G (x9 test)
4k Seq Q32 - Read 271MBs / Write 5MBs
4k Random Q32 - Read 84MBs / Write 0.3MBs
4k Seq - Read 143MBs / Write 3MBs
4k Random - Read 6MBs / Write 0.5MBs

So even more horrid numbers; however, it does show that the SLOG is doubling ZIL performance.

Which then begs the question: does anyone else have before/after numbers for adding a SLOG with sync=always writes?

I appreciate 4 SATA spindles will always provide poor performance, so I was hoping to see a MUCH bigger improvement from a 400+MBs SSD. Very confused.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 12:21
by Parkcomm
You can see why I asked. The external ZIL is doing its job pretty well; the problem is with the underlying sync handshake.

So just to explain what the ZIL is doing: when the DS writes to the disk, it writes to the ZIL AND it begins writing to the disk. The ZIL receives the data much faster and sends a handshake back to the initiator, which sends the next packet - but the ZIL is not even caching, it is merely storing the data in case there is a power outage. So the ZIL is merely speeding up the handshaking in normal operation, but for sequential data you are doing two writes, not one.

Your sync transfers look horrendous - I'll see if I can pull my own performance data (no idea if or where I stored it)

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 13:10
by Parkcomm
Using iozone with an NFS share

Code:

iozone -s 2048 -t2
...

Code:

	File size set to 2048 kB
	Command line used: iozone -s 2048 -t2
	Output is in kBytes/sec
	Time Resolution = 0.000001 seconds.
	Processor cache size set to 1024 kBytes.
	Processor cache line size set to 32 bytes.
	File stride size set to 17 * record size.
	Throughput test with 2 processes
	Each process writes a 2048 kByte file in 4 kByte records
async

Code:

	Children see throughput for  2 initial writers 	=  129044.68 kB/sec
	Parent sees throughput for  2 initial writers 	=   70725.98 kB/sec
	Min throughput per process 			=   32599.87 kB/sec 
	Max throughput per process 			=   96444.81 kB/sec
	Avg throughput per process 			=   64522.34 kB/sec
	Min xfer 					=     748.00 kB
sync

Code:

	Children see throughput for  2 initial writers 	=   21803.33 kB/sec
	Parent sees throughput for  2 initial writers 	=   15886.63 kB/sec
	Min throughput per process 			=    7470.52 kB/sec 
	Max throughput per process 			=   14332.81 kB/sec
	Avg throughput per process 			=   10901.66 kB/sec
	Min xfer 					=    1252.00 kB
Note - the performance hit is significant even with a SLOG, AND I am in the middle of a resilver, which may impact actual speeds.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 13:50
by Manxmann
Thanks for the reply. Hopefully I do understand what a SLOG is doing: by writing the ZIL to a 'fast' device (the SLOG), the storage system can report the write as completed sooner to the client, and hence the client can send the next store request. In the meantime the final write of the data continues to the RAIDZ array, providing a pseudo-decoupling of the sync notification from the actual commit. In the case of a power/system failure, the SLOG is read (the only time that happens) to replay the transactions confirmed as committed to the client, and any entries missing from the pool are committed. (http://www.freenas.org/blog/zfs-zil-and ... mystified/)

So I thought I'd see my SLOG quickly fill up during the client write operation and then slowly drain as the commits are made to the 'rust' and the SLOG is purged.

So I'm guessing that somewhere in ZFS there must be a flag that says 'Woah, that's enough SLOG'ing, let's stop now and wait for the rust', i.e. a maximum SLOG/ZIL size?

From all this I thought the SLOG/ZIL should perform at least at the speed of the SSD; the only issue then becomes SLOG sizing, i.e. if the SLOG device fills to capacity before the rust can write the data to permanent storage and so drain the SLOG, performance tanks as if the SLOG did not exist.
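One way to actually watch whether the log device is absorbing the sync writes (and whether it drains) is per-vdev iostat - a quick sketch, assuming the pool name RAIDZ from the status output above:

```shell
# Per-vdev statistics refreshed once per second; the log device shows up
# as its own row, so you can watch its write bandwidth while a benchmark
# runs against the iSCSI target.
zpool iostat -v RAIDZ 1
```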

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 15:14
by Parkcomm
The ZIL is flushed periodically (I believe every five seconds) - it's not really a leaky bucket and won't continue to fill up. So if you have a single 1G link, the external ZIL can only ever hold about 0.625GB.
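That flush interval is what bounds how much SLOG you can ever use. As a back-of-the-envelope check (assuming the roughly five-second transaction-group commit interval mentioned above and a saturated gigabit link):

```shell
# Back-of-the-envelope SLOG sizing: the most data that can arrive over
# the link between two transaction-group commits.
LINK_GBPS=1     # single gigabit link
TXG_SECONDS=5   # assumed commit interval (~5 s)
# GB = Gbit/s / 8 * seconds
echo "$LINK_GBPS $TXG_SECONDS" | awk '{ printf "max in-flight data: %.3f GB\n", $1 / 8 * $2 }'
# prints: max in-flight data: 0.625 GB
```

Which is why a 20G SLOG partition on a 1G link is mostly wasted space - only the first fraction of a gigabyte can ever be dirty at once.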

Also, the external ZIL has a write penalty of two writes per transaction, in addition to your standard RAID writes (e.g. 2 writes for a mirror).

So if you are doing continuous transfers you will see a handshake (burst) every five seconds instead of many continuous handshakes; however, as you can see, there is a significant penalty for handshaking.

The good news is that sequential writes do not go through the ZIL (if sync=standard; if sync=always then sequential writes also go through the ZIL). This means that large file transfers go through at the higher rate above, but database transactions are slower - though reliable.

I found the best way to test the ZIL was to implement a BAMP-stack application (or several) and try them with the ZIL on the SLOG - and without. You may find snappier response when the SLOG is in place (not very scientific). Note, however, that MySQL has its own caches, which reduce the importance of the ZIL if implemented correctly.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 17:47
by Manxmann
K, got it, I think. It's a combination of the short-cycle 'flush' operation, my 'sync=always' and the heavy nature of the writes I'm performing that is the issue. Guess I wasn't thinking 4th dimensionally.

If I understand this correctly, you get to 'enjoy' the full SSD speed only so long as your writes do not exceed 5s in duration, as otherwise the SLOG effectively stalls, waiting for the slow rust to complete its commits before accepting new ZIL writes.

Why would you choose to implement a high-speed logging solution such as a SLOG and then adopt a full 'flush' mechanism over, say, a FIFO ladder (simplistic, I realise)? I guess if you're going to do a full flush, choosing a short cycle time at least means there's less waiting for the slow disks, but it severely hampers the usefulness.

Which nicely brings this in line with your last observation. So a SLOG really is targeted at a very specific operating window, one which my iSCSI target use falls well outside.

BTRFS here I come :) (Maybe not)

Thanks again, at least I now understand a little better.

Re: SLOG'ing What am I doing wrong?

Posted: 20 Jun 2016 23:56
by Parkcomm
Please note I still think your sync speeds are pretty abysmal. My sync-to-async ratio is 6:1; yours is in the 10-20:1 range. Now, that could be the tool you used, or it could be that you initially start at a higher transfer speed, but it looks pretty bad.

btw - what is your use case?

Re: SLOG'ing What am I doing wrong?

Posted: 04 Jul 2016 21:57
by Manxmann
Sorry for the late reply.

My NAS is used 100% as an iSCSI target for a XenServer host (now running XenServer 7.0).

On the XenServer I have a number of VMs ranging from Win2k12r2 AD & Exchange, various Debian/CentOS servers for mail, FW, load balancing and Owncloud, through to OpenBSD running Java & Serviio for DLNA media serving.

However, at 'work' I'm about to deploy several multi-node XenServer 7 pools running iSCSI via 10G Ethernet to a shared storage solution. Hosted on these will be various online gaming platforms utilising everything from Linux/Windows running MySQL/PostgreSQL/Couch/Erlang etc.

So far I've spec'd up the white-box servers, networking and firewall infrastructure, but instead of heading down the EMC/Nimble/Netgear route for an iSCSI storage solution, I wondered if NAS4FREE + a high-spec SuperMicro storage server would be a smarter choice.

In general our game platforms are not overly disc-i/o bound; instead everything is held in memory, disc i/o being generally logging along with account and configuration data. So absolute write speed isn't critical, but sync commits on the DB server VM discs are highly desirable.

My home lab is intended to simulate this setup in order to gauge any likely performance problems. I've just taken delivery of a couple of Intel 10G SFP+ cards which I'll use to upgrade my storage connection over the next couple of days.

P.S. I have a 2nd DL160 G6, this one being the 8x2.5" chassis. I'm considering upping the number of spindles in the array (this will better match the production units) by buying 8 x WD 2.5" RED 1TB drives and moving my NAS over to this host. From people's experience, how does doubling the number of spindles affect performance?

Re: SLOG'ing What am I doing wrong?

Posted: 05 Jul 2016 11:59
by Parkcomm
Write speed - not much


Sent from my iPhone using Tapatalk