ZFS + iSCSI requires sync=always?

iSCSI over TCP/IP.
rkagerer
NewUser
Posts: 5
Joined: 20 Feb 2013 07:13
Status: Offline

ZFS + iSCSI requires sync=always?

#1

Post by rkagerer » 06 Jul 2013 14:28

Hi everyone,

I've been testing the use of NAS4Free to provide a high-speed, reliable datastore for VMware ESXi. So far it's been going really well, but I've run into a possible snag with the iSCSI target implementation. After troubleshooting and searching around Google, I was dismayed to find indications that istgt doesn't honour the protocol's SYNCHRONIZE CACHE commands. This presents a reliability concern, i.e. risk of data loss in the case of power outage, UPS failure, kernel panic, accidental hard-reset, etc.

I'm surprised there isn't more information out there about this (e.g. bug reports, big warning labels), and want to reach out and ask if anyone else is familiar with the issue and can comment on it?

I've included some more detail below...

My first suspicions arose while running benchmark tests that I expected to utilize the ZIL - e.g. 4KB random writes. Eventually I wrote a small test program to investigate further; it forces lots of small, synchronous writes marked "write-through" and issues disk flushes. When monitoring with "zpool iostat -v 1" I saw zero ZIL activity with sync=standard. It didn't matter whether the target's Write Cache setting (under Services | iSCSI Target | Target | Edit) was on or off, and I'm still wondering if there's something else I need to do to see an impact from that checkbox.
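For anyone who wants to watch for the same thing, the monitoring side is just a couple of commands; the pool and dataset names below are only examples, substitute whatever backs your extent:

Code:

# Terminal 1: per-vdev I/O every second; synchronous writes should show up on the log vdev
zpool iostat -v tank 1

# Terminal 2: check which sync policy the dataset/zvol backing the iSCSI extent uses
zfs get sync tank/istgt-extent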

I exported the whole zpool to Nexenta (based on OpenIndiana), performed identical tests, and found the ZIL activity was as expected (provided their iSCSI writeback cache option is off). I've tried to investigate in NAS4Free and disable cache settings down through the whole stack*, and it still seems to me like the issue is in the iSCSI target. I found a few posts that concur, and suggestions that running iSCSI + istgt with sync=standard is essentially unsafe:

http://forums.freebsd.org/showthread.php?t=31716#5
http://forums.freebsd.org/showthread.php?t=38961#3
http://forums.freebsd.org/showthread.php?t=31694

I was really surprised these few mentions were all I could find about an issue that, to me, warrants a bit more attention: it's tough to engineer a truly reliable system if the storage subsystem lies about whether your writes have been committed to persistent storage.

I know I can set "sync=always" to work around the problem, but that seems like overkill as it will treat ALL writes as synchronous rather than just those requested by the application. It cut my write performance by roughly a third, even with an SSD ZIL.
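For reference, the workaround itself is just a property change on the dataset or zvol backing the extent (the name below is only an example):

Code:

zfs set sync=always tank/istgt-extent    # log every write to the ZIL before acknowledging it
zfs get sync tank/istgt-extent           # verify the setting
zfs set sync=standard tank/istgt-extent  # revert to the default behaviour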

I realize I could also use NFS instead of iSCSI, but the write performance I'm seeing there is even worse than iSCSI with sync=always.

I've also heard the FreeBSD Foundation is working on an iSCSI kernel implementation, but it won't be available until later this year at the earliest, and I'd rather not be the first guinea pig on that bandwagon.

I've also considered trying out the older iscsi-target port to see if it has the same issue - has anyone played with that at all?

Is anyone aware of whether it might be possible to get istgt modified to pass the SYNCHRONIZE CACHE command through to ZFS? (I'd be happy to make a donation toward someone's time on that...)

Any comments or advice would be appreciated.

* My vague understanding is the stack looks something like this:
My Application --> .NET File Buffers --> Windows File Buffers --> Windows disk driver buffers (i.e. "write cache" setting for the disk) --> VMware (which I confirmed never caches) --> iSCSI --> ZFS

** For completeness, here's a related discussion re Nexenta: http://www.nexentastor.org/boards/1/topics/2078

Lee Sharp
Advanced User
Posts: 255
Joined: 13 May 2013 21:12
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#2

Post by Lee Sharp » 07 Jul 2013 02:49

I have been going at this from the other end, trying to improve NFS performance with ESXi... I have only had reasonable performance with vfs.zfs.cache_flush_disable="1" or iSCSI. Looking at the documentation, the risk of data loss with iSCSI or vfs.zfs.cache_flush_disable="1" is rather low, and the risk of data corruption is non-existent...
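If anyone wants to try that tunable, it is a boot-time loader tunable, so it goes in loader.conf (NAS4Free exposes loader.conf variables under System | Advanced, if memory serves). A minimal sketch, assuming you accept the data-loss trade-off described above:

Code:

# /boot/loader.conf - tell ZFS to stop issuing cache-flush commands to the disks.
# Boot-time tunable; takes effect after a reboot.
vfs.zfs.cache_flush_disable="1"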

Not trying to say what way to go, but just giving a pep talk to a fellow brother in arms actually trying to figure out what is really happening with NFS. :)

rkagerer
NewUser
Posts: 5
Joined: 20 Feb 2013 07:13
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#3

Post by rkagerer » 08 Jul 2013 04:51

I'm able to recreate the issue consistently and have captured it on video: http://youtu.be/e3SKXYI2H-E

Here's a run-down. The first thing I show is that the "write cache" option for istgt is turned off. I've tested with it on and off (and confirmed the setting via istgt trace logs) and it makes no difference to this issue.

Next I show that sync=standard is set for my first test. In this mode I would expect typical writes to be cached in memory before being written to disk, and those sent as write-through (i.e. the Force Disk Access bit is on in the iSCSI op) to be written to the ZIL before the command returns.

In my guest VM, I start a program that appends numbers continuously to a text file. It opens the file with the FileOptions.WriteThrough flag set, which is supposed to indicate down the stack that write caching should be bypassed. For good measure, it also calls the FlushFileBuffers API after every write. It doesn't display the sequence number until this call returns.

Then I forcibly reboot the NAS4Free box. You can see my Windows VM freezes, and the last confirmed write was #18304.

After rebooting both VMs, I open the file and see that only the writes up to #10994 were persisted to disk. The most recent ~7000 writes (~50 kB) leading up to the power outage event were lost.

I would expect this behavior for "normal" writes, but not when I'm explicitly requesting write-through and flushing caches after each one.

Next I set sync=always and repeat the same test. This time you see that every single write, right up until the power outage at #8814, is persisted to disk. I think this also suggests that in the former case the writes are not getting lost before they reach the NAS4Free box (if they were, I would expect them to be lost here too).

I realize I haven't definitively isolated the problem - there are a lot of components in the "stack" between my test application and the disks - but I have shown that the stack as a whole does not seem to be operating as I would expect. Also I believe this is an ESXi + FreeBSD issue, not a NAS4Free one.

Below is the source code for my test app:

Code:

using System;
using System.IO;
using System.Runtime.InteropServices;
using System.ComponentModel;

namespace ZilTest2 {
  class Program {

    [DllImport("kernel32", SetLastError = true)]
    private static extern bool FlushFileBuffers(IntPtr handle);

    const string filename = @"C:\test.txt";

    static void Main(string[] args) {
      // WriteThrough asks the OS to bypass its write cache for this file.
      FileOptions opts = FileOptions.DeleteOnClose | FileOptions.WriteThrough;
      long i = 0;
      if (File.Exists(filename)) File.Delete(filename);
      using (var stream = new StreamWriter(File.Create(filename, 4096, opts))) {
        stream.AutoFlush = true;
        while (true) {
          // Append the next sequence number; it is only echoed to the console
          // below once FlushFileBuffers() has confirmed the write.
          stream.WriteLine(i);

          #pragma warning disable 618,612 // disable stream.Handle deprecation warning.
          if (!FlushFileBuffers(((FileStream)stream.BaseStream).Handle))   // Flush OS file cache to disk.
          #pragma warning restore 618,612
            {
              Int32 err = Marshal.GetLastWin32Error();
              throw new Win32Exception(err, "Win32 FlushFileBuffers returned error for " + ((FileStream)stream.BaseStream).Name);
            }

          Console.WriteLine(i);

          i++;
        }
      }
    }
  }
}


rkagerer
NewUser
Posts: 5
Joined: 20 Feb 2013 07:13
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#4

Post by rkagerer » 08 Jul 2013 05:29

I've also reached out to Daisuke Aoyama, who is the maintainer of istgt and, incidentally, also a project lead for NAS4Free. I've pasted the bulk of that email here to keep the documentation for this issue together.

I’ve been testing NAS4Free as a ZFS datastore for ESXi virtual machines. Overall it’s been going well, and has even out-performed Nexenta in some benchmarks. But I’ve run into a concern where “cache-bypass” flags sent from applications in my VM aren’t being honoured all the way down the stack. I’m trying to pinpoint where they’re being lost, and have detailed my findings so far [here].

Since my initial post, I’ve become a bit more skeptical of whether the problem is actually in istgt, or if ESXi simply isn’t sending appropriate iSCSI commands. I’d appreciate some help in isolating the culprit.

I dived into the source code for istgt, and also monitored output with trace set to “all” on my NAS4Free box. I notice the handler for SBC_SYNCHRONIZE_CACHE_10 should emit trace text, but I haven’t seen anything containing the word “SYNCHRONIZE” in the output. Is it correct to believe that means these commands aren’t getting sent?

Next, I wondered if perhaps ESXi is instead sending WRITE commands with the “Force Unit Access” bit set as a way to commit them. I see the trace output from write commands, but am unsure how to identify whether the FUA or FUA_NV bits are set. Do I need to modify and recompile the source code to do this?

Do you know if many other people are using ZFS in conjunction with istgt to provide datastores for ESXi?
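For what it's worth, a quick way to check the trace output for those commands is just to grep it; istgt normally logs through syslog, so the file below is only an example and depends on how the facility is routed in syslog.conf:

Code:

grep -ci 'SYNCHRONIZE' /var/log/messages   # count of SYNCHRONIZE CACHE trace lines, if any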

neptunus
experienced User
Posts: 86
Joined: 11 Jun 2013 08:50
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#5

Post by neptunus » 22 Jul 2013 08:13

Is there any news?

justin
Starter
Posts: 19
Joined: 22 Jul 2013 15:10
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#6

Post by justin » 28 Jul 2013 08:31

also interested
Justin
- NAS4Free 9.1.0.1 x64-full 804 | x64-full on Intel(R) Xeon(R) CPU E5620 @ 2.40GHz | 98271MiB RAM | X x YTB WD ZFS mirror striping compressed, Z x YTB WD ZFS zraid2 | 2 SSD ZIL, 1 SSD LOG
- NAS4Free 9.1.0.1 x64-full 804 | x64-full on Intel(R) Xeon(R) CPU E5620 @ 2.40GHz | 98271MiB RAM | Z x YTB WD ZFS zraid2

STAMSTER
experienced User
Posts: 80
Joined: 23 Feb 2014 15:58
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#7

Post by STAMSTER » 26 Jan 2015 11:33

Any improvement on this topic in the latest NAS4Free / FreeBSD builds, which also ship the latest istgt version?
rIPMI

STAMSTER
experienced User
Posts: 80
Joined: 23 Feb 2014 15:58
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#8

Post by STAMSTER » 11 Apr 2016 04:07

UP.
Any news on istgt in the latest N4F / FreeBSD editions?
rIPMI

Lord Crc
Starter
Posts: 52
Joined: 17 Jun 2013 12:29
Status: Offline

Re: ZFS + iSCSI requires sync=always?

#9

Post by Lord Crc » 28 Aug 2016 22:25

I've read the istgt source code, and as far as I can see (assuming zvols are RAW devices in istgt's world, as opposed to "file devices"), istgt does call fsync() upon receiving the SYNCHRONIZE CACHE command.

I will be testing whether the fsync() on the zvol is honored when sync=standard.
