I've been using NAS4free for a few years now at home and am very happy with it. So far, it is the most robust storage solution I have used. After this positive experience I decided to install a NAS4free box in a small company and use iSCSI to connect volumes to two servers. However, I encountered a problem and have been unable to resolve it yet. This will be a longer post, because I really took the time to troubleshoot it and would like to provide as much detail as possible.
TL;DR for those who don't like long posts: I have a ZFS volume connected via iSCSI to a Hyper-V virtual machine and it experiences some errors (I/O errors, slowdowns). For example, I cannot even resize an NTFS partition created on that volume (the operation ends with the error "The request could not be performed because of an I/O device error."). How can I diagnose why such an operation fails?
Environment
NAS4free box
• NAS4free version 9.3.0.2 (revision 1391)
• HW is a desktop computer based on Core 2 Duo E7200 (2.53 GHz), 8 GB RAM, 4× 3TB WD Red (WD30EFRX), 2× 2TB WD Green (WD20EARS)
• 4× WD Red in RAID-Z2 with 2 volumes connected via iSCSI
• 2× WD Green in Mirror with 1 volume connected via iSCSI
Main server
• SuperMicro 1U server with 4-core Xeon E3-1230 (3.20 GHz), 24 GB RAM, 2× Samsung 840 Pro SSD (RAID1 for system), 2× Intel 730 SSD (RAID1 for other data)
• It handles the usual in a domain network (Active Directory, DNS, WSUS, Hyper-V with several VMs)
Backup server
• Desktop Fujitsu server with 4-core Xeon E3-1226 v3 (3.30 GHz), 12 GB RAM, 2× Samsung 850 Pro (RAID1 for system storage)
Here is a diagram I made to better illustrate how everything is interconnected. To explain what each ZFS volume does:
• BackupVolServers is used by the Backup server to store backups from servers. It uses backup software to create system state backups of all the physical servers (even some not in the picture) and also of some virtual machine servers.
• ShareVol is a volume in the same Virtual Device as BackupVolServers and is connected to one of the VMs (marked by a red X). It is used to provide shared data to the network (SMB, Windows Folder Sharing). This is the volume that is causing trouble and the reason I created this post.
• BackupVolWorkstations is also used by the Backup server and stores backups from workstations.
The problem
The main issue here is that the backup software on the Backup server fails to back up the Main server. The backup process is interrupted and never finishes. It has something to do with a Volume Shadow Copy failure. I spent several weeks trying to resolve the issue. I tried various methods I found on Microsoft forums, from increasing VSS timeouts to resizing partitions to making sure all VSS writers are up and operational. Nothing helped.
After these weeks, I pinpointed the issue. It fails when trying to create a Volume Shadow Copy of the Virtual Machine 4 (the red X). By snooping around, I found out that it is very likely related to the iSCSI connected volume. Here is why I think so:
• The VSS operation fails with the error "The backup operation that started at [...] has failed because the Volume Shadow Copy Service operation to create a shadow copy of the volumes being backed up failed with following error code '2155348129'." Error 2155348129 is not a very specific error and leads to many troubleshooting suggestions, all of which I tried as described above.
• When I try to resize the iSCSI drive partition, I get error "The request could not be performed because of an I/O device error."
• When writing to the shared storage (i.e. from the network via Virtual Machine 4 to the iSCSI-connected volume), after a while the write speed drops to 0-2 MB/s and the computer slows to a crawl. For example, when you try to upload a file via FTP, the FTP server disconnects you because it stops responding.
• NAS4free System log contains many errors:
istgt[58394]: istgt_lu_disk.c:6277:istgt_lu_disk_execute: ***ERROR*** lu_disk_lbwrite() failed
istgt[58394]: istgt_lu_disk.c:4168:istgt_lu_disk_lbwrite: ***ERROR*** lu_disk_write() failed
The interesting part
It wasn't like this from the very beginning. When I built the storage from scratch, I copied several hundred GB of data there with no slowdown. The backup software was also able to back up the whole Main server. It somehow started to fail after some time (I know this sounds like "I didn't do anything, it broke itself").
I'd be tempted to say I did something wrong, but the ShareVol and BackupVolServers volumes are on the same VDev, and one experiences the issue while the other does not. At this point I am not sure where the problem is. Perhaps it is in iSCSI, perhaps in some configuration, or perhaps in the fact that the traffic goes through a Hyper-V machine. I haven't tried connecting the storage to a physical machine yet (it requires moving many services). Perhaps once we work out why I cannot even resize the volume, the rest will fall into place, but I am not sure how to approach this.
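To make the evidence above easier to work with, here is a minimal Python sketch of two helpers. The function names are my own and are not part of NAS4free, istgt, or Windows. The first assumes the VSS error number follows the standard HRESULT bit layout (severity, facility, code), which gives a hex form that is easier to search for. The second assumes the istgt log lines look exactly like the two quoted above and simply counts errors per reporting function.

```python
import re
from collections import Counter


def decode_hresult(value: int) -> dict:
    """Split a 32-bit code into HRESULT-style fields (assumed layout)."""
    value &= 0xFFFFFFFF
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),       # bit 31: severity flag
        "facility": (value >> 16) & 0x7FF,  # bits 16-26: facility number
        "code": value & 0xFFFF,             # bits 0-15: error code
    }


# Assumed line shape: istgt[pid]: file:line:function: ***ERROR*** message
ISTGT_ERROR = re.compile(
    r"istgt\[(?P<pid>\d+)\]: (?P<file>[\w.]+):(?P<line>\d+):(?P<func>\w+): "
    r"\*\*\*ERROR\*\*\* (?P<msg>.+)"
)


def tally_istgt_errors(lines):
    """Count istgt ***ERROR*** lines by the function that reported them."""
    counts = Counter()
    for line in lines:
        match = ISTGT_ERROR.search(line)
        if match:
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    print(decode_hresult(2155348129))
    sample = [
        "istgt[58394]: istgt_lu_disk.c:6277:istgt_lu_disk_execute: "
        "***ERROR*** lu_disk_lbwrite() failed",
        "istgt[58394]: istgt_lu_disk.c:4168:istgt_lu_disk_lbwrite: "
        "***ERROR*** lu_disk_write() failed",
    ]
    print(tally_istgt_errors(sample))
```

Running the tally over the full system log rather than the two sample lines should at least show whether the write failures are the only kind of istgt error being reported.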
I am not even sure anyone will be able to help me with this, but any input is appreciated.
Thank you.


