ZFS volume connected to a Hyper-V VM via iSCSI (I/O errors, slowdowns, istgt errors)

Post by zoom »

Hello all,

I've been using NAS4free for a few years now at home and am very happy with it. So far, it is the most robust storage solution I have used. After this positive experience I decided to install a NAS4free box in a small company and use iSCSI to connect volumes to two servers. However, I encountered a problem and have been unable to resolve it yet. This will be a longer post, because I really took the time to troubleshoot it and would like to provide as much detail as possible.

TL;DR for those who don't like long posts: I have a ZFS volume connected via iSCSI to a Hyper-V virtual machine, and it experiences errors (I/O errors, slowdowns). For example, I cannot even resize an NTFS partition created on that volume (the operation ends with the error "The request could not be performed because of an I/O device error."). How can I diagnose why such an operation fails?

Environment

NAS4free box
• NAS4free version 9.3.0.2 (revision 1391)
• HW is a desktop computer based on Core 2 Duo E7200 (2.53 GHz), 8 GB RAM, 4× 3TB WD Red (WD30EFRX), 2× 2TB WD Green (WD20EARS)
• 4× WD Red in RAID-Z2 with 2 volumes connected via iSCSI
• 2× WD Green in Mirror with 1 volume connected via iSCSI

Main server
• SuperMicro 1U server with 4-core Xeon E3-1230 (3.20 GHz), 24 GB RAM, 2× Samsung 840 Pro SSD (RAID1 for system), 2x Intel 730 SSD (RAID1 for other data)
• It handles the usual in a domain network (Active Directory, DNS, WSUS, Hyper-V with several VMs)

Backup server
• Desktop Fujitsu server with 4-core Xeon E3-1226 v3 (3.30 GHz), 12 GB RAM, 2× Samsung 850 Pro (RAID1 for system storage)

Here is a scheme I made to better illustrate how everything is interconnected.
iscsi_scheme.png
To better explain what each ZFS volume does:
BackupVolServers is used by the Backup server to store backups from servers. It uses backup software to create system state backups from all the physical servers (even some not in the picture) and also from some virtual machine servers.
ShareVol is a volume in the same Virtual Device as BackupVolServers and is connected to one of the VMs (marked by the red X). It is used to provide shared data to the network (SMB, Windows Folder Sharing). This volume is the one causing trouble, and it is why I created this post.
BackupVolWorkstations is also used by the Backup server and stores backups from workstations.


The problem

The main issue here is that the backup software on the Backup server fails to back up the Main server. The backup process is interrupted and never finishes. It has something to do with a Volume Shadow Copy failure. I spent several weeks trying to resolve the issue. I tried various methods I found on Microsoft forums, from increasing VSS timeouts to resizing partitions to making sure all VSS writers are up and operational. Nothing helped.

After these weeks, I pinpointed the issue. It fails when trying to create a Volume Shadow Copy of the Virtual Machine 4 (the red X). By snooping around, I found out that it is very likely related to the iSCSI connected volume. Here is why I think so:
• The VSS operation fails with error "The backup operation that started at [...] has failed because the Volume Shadow Copy Service operation to create a shadow copy of the volumes being backed up failed with following error code '2155348129'." Error 2155348129 (0x807800A1 in hexadecimal) is not a very specific error and leads to many troubleshooting suggestions, all of which I tried as described above.
• When I try to resize the iSCSI drive partition, I get the error "The request could not be performed because of an I/O device error."
• When writing to the shared storage (i.e. from the network via Virtual Machine 4 to the iSCSI connected volume), after a while the writing slows down to 0-2 MB/s and the computer slows to a crawl. For example, when you try to upload a file via FTP, the FTP server disconnects you because it is not responsive.
• NAS4free System log contains many errors:

Code:

istgt[58394]: istgt_lu_disk.c:6277:istgt_lu_disk_execute: ***ERROR*** lu_disk_lbwrite() failed
istgt[58394]: istgt_lu_disk.c:4168:istgt_lu_disk_lbwrite: ***ERROR*** lu_disk_write() failed

There is something wrong with that specific iSCSI connected volume, but I don't know why. All disks are Online, SMART is not reporting any issues, ZFS scrub completed successfully and repaired 0 errors.
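
To get a feel for how often the istgt errors quoted above occur, and whether the two messages always appear together, the system log could be tallied with a short script. This is only a rough sketch: it assumes every relevant log line has exactly the shape of the two lines quoted above, and `count_istgt_errors` is a made-up helper name, not something that ships with NAS4Free or istgt.

```python
import re
from collections import Counter

# Matches istgt log lines of the shape quoted in this post:
#   istgt[PID]: FILE:LINE:FUNCTION: ***ERROR*** MESSAGE
ISTGT_ERROR = re.compile(
    r"istgt\[(?P<pid>\d+)\]: (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+): "
    r"\*\*\*ERROR\*\*\* (?P<msg>.*)"
)


def count_istgt_errors(lines):
    """Return a Counter mapping (function, message) to the number of hits."""
    counts = Counter()
    for line in lines:
        match = ISTGT_ERROR.search(line)
        if match:
            counts[(match["func"], match["msg"].strip())] += 1
    return counts


# The two lines from the post; a real run would pass the lines of an
# exported copy of the system log instead.
sample = [
    "istgt[58394]: istgt_lu_disk.c:6277:istgt_lu_disk_execute: ***ERROR*** lu_disk_lbwrite() failed ",
    "istgt[58394]: istgt_lu_disk.c:4168:istgt_lu_disk_lbwrite: ***ERROR*** lu_disk_write() failed ",
]

for (func, msg), hits in count_istgt_errors(sample).items():
    print(f"{hits:5d}  {func}: {msg}")
```

Grouping by function and message rather than by source line number keeps the tally stable across istgt versions, where the line numbers are likely to differ.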


The interesting part

It wasn't like this from the very beginning. When I built the storage from scratch, I copied several hundred GB of data there with no slowdown. The backup software was also able to back up the whole Main server. It somehow started to fail after some time (I know this sounds like "I didn't do anything, it broke itself").

I'd be tempted to say I did something wrong, but the ShareVol and BackupVolServers volumes are on the same VDev and one experiences the issue while the other does not. At this point I am not sure where the problem is. Perhaps in iSCSI, perhaps in some configuration, or maybe in the fact that the traffic goes through a Hyper-V machine. I haven't tried to connect the storage to a physical machine yet (it requires moving many services). Perhaps once we work out why I cannot even resize the volume, the other problems will be resolved too, but I am not sure how to approach this.

I am not even sure anyone will be able to help me with this, but any input is appreciated.
Thank you.

Post by daoyama »

How is ZFS configured? Did you use compression on the ZFS volume or dataset?
If you need compression, lz4 with dedup OFF is recommended.
If you use a dataset, set access time (atime) OFF and dedup OFF.
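
The settings asked about here can be read with `zfs get`. Below is a small sketch of doing that from a script. Assumptions: the volume name `VD0/ShareVol` is taken from elsewhere in this thread, the helper names are made up for illustration, and the canned output at the bottom is an illustrative sample in the tab-separated format that `zfs get -H` produces, so the example also runs on a machine without ZFS.

```python
import shutil
import subprocess

# Properties relevant to the questions above (volumes have no atime).
PROPERTIES = "compression,dedup,sync"


def parse_zfs_get(text):
    """Parse tab-separated `zfs get -H -o name,property,value` output."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        result.setdefault(name, {})[prop] = value
    return result


def read_properties(volume):
    """Run `zfs get` for one volume; returns {} when zfs is not installed."""
    if shutil.which("zfs") is None:
        return {}
    completed = subprocess.run(
        ["zfs", "get", "-H", "-o", "name,property,value", PROPERTIES, volume],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_zfs_get(completed.stdout)


# Illustrative sample output, not captured from a real system.
canned = (
    "VD0/ShareVol\tcompression\toff\n"
    "VD0/ShareVol\tdedup\toff\n"
    "VD0/ShareVol\tsync\tstandard\n"
)
print(parse_zfs_get(canned))
```

On the NAS itself, `read_properties("VD0/ShareVol")` would return the same kind of dictionary from the live pool.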

Post by zoom »

Hello, sorry for the late answer. I have been really busy with many other things that just dropped on my head out of nowhere. This week, I will be troubleshooting this storage again more actively.

To answer your questions:
• I do not use Datasets, I use Pools and Volumes (those volumes are then mounted as a HDD via iSCSI).
• There is no compression and no deduplication used on any Volume.
• All 3 Volumes have these options in common: Compression-OFF, Dedup-OFF, Sync-standard

Here is the ZFS information output:

Code:

NAME                        USED  AVAIL  REFER  MOUNTPOINT
VD0/BackupVolServers       4.13T   694G  3.45T  -
VD0/ShareVol               1.06T      0  1.06T  -
VD1/BackupVolWorkstations  1.75T  60.7G  1.72T  -
 
ZFS volume properties 
NAME                      PROPERTY              VALUE                  SOURCE
VD0/BackupVolServers      type                  volume                 -
VD0/BackupVolServers      creation              Mon Mar  9 20:31 2015  -
VD0/BackupVolServers      used                  4.13T                  -
VD0/BackupVolServers      available             694G                   -
VD0/BackupVolServers      referenced            3.45T                  -
VD0/BackupVolServers      compressratio         1.00x                  -
VD0/BackupVolServers      reservation           none                   default
VD0/BackupVolServers      volsize               4T                     local
VD0/BackupVolServers      volblocksize          8K                     -
VD0/BackupVolServers      checksum              on                     default
VD0/BackupVolServers      compression           off                    local
VD0/BackupVolServers      readonly              off                    default
VD0/BackupVolServers      copies                1                      default
VD0/BackupVolServers      refreservation        4.13T                  local
VD0/BackupVolServers      primarycache          all                    default
VD0/BackupVolServers      secondarycache        all                    default
VD0/BackupVolServers      usedbysnapshots       0                      -
VD0/BackupVolServers      usedbydataset         3.45T                  -
VD0/BackupVolServers      usedbychildren        0                      -
VD0/BackupVolServers      usedbyrefreservation  694G                   -
VD0/BackupVolServers      logbias               latency                default
VD0/BackupVolServers      dedup                 off                    local
VD0/BackupVolServers      mlslabel                                     -
VD0/BackupVolServers      sync                  standard               local
VD0/BackupVolServers      refcompressratio      1.00x                  -
VD0/BackupVolServers      written               3.45T                  -
VD0/BackupVolServers      logicalused           2.36T                  -
VD0/BackupVolServers      logicalreferenced     2.36T                  -
VD0/BackupVolServers      snapshot_limit        none                   default
VD0/BackupVolServers      snapshot_count        none                   default

VD0/ShareVol              type                  volume                 -
VD0/ShareVol              creation              Tue Mar 10  0:28 2015  -
VD0/ShareVol              used                  1.06T                  -
VD0/ShareVol              available             0                      -
VD0/ShareVol              referenced            1.06T                  -
VD0/ShareVol              compressratio         1.00x                  -
VD0/ShareVol              reservation           none                   default
VD0/ShareVol              volsize               1T                     local
VD0/ShareVol              volblocksize          8K                     -
VD0/ShareVol              checksum              on                     default
VD0/ShareVol              compression           off                    default
VD0/ShareVol              readonly              off                    default
VD0/ShareVol              copies                1                      default
VD0/ShareVol              refreservation        1.03T                  local
VD0/ShareVol              primarycache          all                    default
VD0/ShareVol              secondarycache        all                    default
VD0/ShareVol              usedbysnapshots       0                      -
VD0/ShareVol              usedbydataset         1.06T                  -
VD0/ShareVol              usedbychildren        0                      -
VD0/ShareVol              usedbyrefreservation  0                      -
VD0/ShareVol              logbias               latency                default
VD0/ShareVol              dedup                 off                    default
VD0/ShareVol              mlslabel                                     -
VD0/ShareVol              sync                  standard               default
VD0/ShareVol              refcompressratio      1.00x                  -
VD0/ShareVol              written               1.06T                  -
VD0/ShareVol              logicalused           741G                   -
VD0/ShareVol              logicalreferenced     741G                   -
VD0/ShareVol              snapshot_limit        none                   default
VD0/ShareVol              snapshot_count        none                   default

VD1/BackupVolWorkstations  type                  volume                 -
VD1/BackupVolWorkstations  creation              Mon Mar 23 19:22 2015  -
VD1/BackupVolWorkstations  used                  1.75T                  -
VD1/BackupVolWorkstations  available             60.7G                  -
VD1/BackupVolWorkstations  referenced            1.72T                  -
VD1/BackupVolWorkstations  compressratio         1.00x                  -
VD1/BackupVolWorkstations  reservation           none                   default
VD1/BackupVolWorkstations  volsize               1.70T                  local
VD1/BackupVolWorkstations  volblocksize          8K                     -
VD1/BackupVolWorkstations  checksum              on                     default
VD1/BackupVolWorkstations  compression           off                    local
VD1/BackupVolWorkstations  readonly              off                    default
VD1/BackupVolWorkstations  copies                1                      default
VD1/BackupVolWorkstations  refreservation        1.75T                  local
VD1/BackupVolWorkstations  primarycache          all                    default
VD1/BackupVolWorkstations  secondarycache        all                    default
VD1/BackupVolWorkstations  usedbysnapshots       0                      -
VD1/BackupVolWorkstations  usedbydataset         1.72T                  -
VD1/BackupVolWorkstations  usedbychildren        0                      -
VD1/BackupVolWorkstations  usedbyrefreservation  28.5G                  -
VD1/BackupVolWorkstations  logbias               latency                default
VD1/BackupVolWorkstations  dedup                 off                    local
VD1/BackupVolWorkstations  mlslabel                                     -
VD1/BackupVolWorkstations  sync                  standard               local
VD1/BackupVolWorkstations  refcompressratio      1.00x                  -
VD1/BackupVolWorkstations  written               1.72T                  -
VD1/BackupVolWorkstations  logicalused           1.71T                  -
VD1/BackupVolWorkstations  logicalreferenced     1.71T                  -
VD1/BackupVolWorkstations  snapshot_limit        none                   default
VD1/BackupVolWorkstations  snapshot_count        none                   default

I also noticed a newer version of NAS4free has been released (9.3.0.2.1480). I have already updated at home (so far no problems) and will update the company's box in the next few days to see if the situation improves.
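
One way to read the volume listing above is to check the AVAIL column for each volume. The sketch below does that for the three rows quoted in this post. It assumes the five-column layout shown above, and the function names are made up for illustration. It flags VD0/ShareVol, the volume that shows the errors, as the only one reporting 0 available, which may or may not be related to the write failures.

```python
def parse_zfs_list(text):
    """Parse the whitespace-separated NAME/USED/AVAIL/REFER/MOUNTPOINT listing."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "NAME":
            continue  # skip the header and anything that is not a data row
        name, used, avail, refer, _mountpoint = fields
        rows.append({"name": name, "used": used, "avail": avail, "refer": refer})
    return rows


def volumes_without_free_space(rows):
    """Return the names of volumes whose AVAIL column is exactly 0."""
    return [row["name"] for row in rows if row["avail"] == "0"]


# The listing as posted in this thread.
listing = """\
NAME                        USED  AVAIL  REFER  MOUNTPOINT
VD0/BackupVolServers       4.13T   694G  3.45T  -
VD0/ShareVol               1.06T      0  1.06T  -
VD1/BackupVolWorkstations  1.75T  60.7G  1.72T  -
"""

print(volumes_without_free_space(parse_zfs_list(listing)))
# prints ['VD0/ShareVol']
```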

Post by drnicolas »

Have you ever tried with "Sync" OFF? I remember having issues with Backup Exec using an iSCSI connected volume for storage until Sync was set to OFF.

Post by Lee Sharp »

drnicolas wrote: Have you ever tried with "Sync" OFF? I can remember having issues with backup Exec using an iSCSI connected volume for storage until SYNC was OFF
That is a really good way to break your filesystem. Very much not recommended. However, "vfs.zfs.cache_flush_disable 1" in loader.conf can make a big difference for VMware.
