[SOLVED] Disk Corruption on Ubuntu VM

VirtualBox, VM config and HDD images.
Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

[SOLVED] Disk Corruption on Ubuntu VM

#1

Post by Tweak »

Friends--
I need help, but could not find this same symptom/failure anywhere on the forum.

I have a VM running under XigmaNAS, and it has Ubuntu Server 18.04.3 LTS installed.
It provides some services on my home network, and has been running flawlessly (and updating without incident) for more than a year... until recently.

I just updated XigmaNAS from 11.2 to 12.1 as the host OS.
[I believe this update also included an update to the VirtualBox version included in XigmaNAS... This might be part of the problem.]
After the Host OS update, I re-started the VM and logged into it to update the Guest OS (Ubuntu 18.04.3).
However, I hit a significant problem every time apt (dpkg) tries to unpack the linux-headers package for the update.
It freezes at that point, and then, after some number of minutes (anywhere from 3 to 15), with the VM's load meter climbing to 4 or 5 or <gulp!> 10 or more, I get error messages on the screen that there is an I/O error on /dev/sda (the virtual hard disk for the VM).
When this happens, I have to "force quit" (via a sudo kill command for the VBoxHeadless task) and re-start the VM.
This has happened consistently, for at least 20 attempts to "resurrect" the VM and get it updated.
Strangely, the VM will otherwise run happily for at least 2 hours without giving an I/O error. The error happens 100% of the time when I try to update the Guest OS, once the apt update gets to the point of unpacking the linux-headers package. (?!?!?)
Since the VM's /dev/sda is just a file on the host's ZFS pool, I am struggling to figure out how to diagnose whether it is filesystem corruption INSIDE the VM, filesystem corruption on the Host OS, or a potential impending mechanical failure of one of the drives in the pool.
(All of the SMART reports show that the constituent HDDs in the ZFS pool on the Host machine are A-OK.)
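(For anyone wanting to reproduce those checks from a shell instead of the WebGUI, something along these lines should work -- the device names are only examples from my box, so adjust for yours:)
On the XigmaNAS host, per pool member:
smartctl -a /dev/ada0
Inside the Ubuntu guest, while the apt run is in progress:
sudo dmesg -w | grep -iE 'i/o error|failed command'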

Can any of the experts in the Forum groups help me with some troubleshooting tips/techniques/procedures, and/or some advice on how to get the VM root disk healthy again?

Many thanks, in advance!

Cheers,
Mike
Last edited by Tweak on 08 Dec 2019 02:39, edited 2 times in total.

cookiemonster
Advanced User
Posts: 276
Joined: 23 Mar 2014 02:58
Location: UK
Status: Offline

Re: Disk Corruption on Ubuntu VM

#2

Post by cookiemonster »

No expert on this, but my thoughts are that to XigmaNAS the VM is just a file, probably a vdi or vmdk type. Inside it are all the files the VM requires. If there is a problem with that internal filesystem, that could be the cause, and unless ZFS detects a problem via its checksums or pool state, it is none the wiser.
In light of that, the first thing I would do, with the VM switched off, is a ZFS scrub.
Then I would start the VM in single-user mode and force an fsck, followed by a restart from within Ubuntu once it finishes.
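A rough sketch of the Ubuntu side of that (untested against your setup, so treat it as a pointer rather than a recipe): either add fsck.mode=force fsck.repair=yes to the kernel line at the GRUB menu for one boot, or, from a running session:
sudo touch /forcefsck
sudo reboot
Either way the root filesystem should get a full fsck at the next boot, before it is mounted read-write.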
Please note I'm not suggesting you do that, just sharing my thoughts. Maybe someone more experienced on these will chime in.
Main: Xigmanas 11.2.0.4 x64-full-RootOnZFS as ESXi VM with 24GB memory.
Main Host: Supermicro X8DT3 Memory: 72GB ECC; 2 Xeon E5645 CPUs; Storage: (HBA) - LSI SAS 9211-4i with 3 SATA x 1 TB in raidZ1, 1 x 3 TB SAS drive as single stripe, 3 x 4 TB SAS drives in raidZ1.
Spare1: HP DL360 G7; 6 GB ECC RAM; 1 Xeon CPU; 5 x 500 GB disks on H210i
Backup1: HP DL380 G7; 24 GB ECC RAM; 2 Xeon E5645 CPUs; 8 x 500 GB disks on IBM M1015 flashed to LSI9211-IT

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: Disk Corruption on Ubuntu VM

#3

Post by Tweak »

@cookiemonster--
Thanks for the quick reply!

I have the same mental concept as you explained in your post. (I think that's a good sign <for me>.)
I have a concern, though, because the VM can update *other* packages (via apt) with NO problems [i.e., it does NOT crash out with disk I/O errors]. For some reason, it is only the "linux-headers-4.15.0-72" package that **always** makes it crash.
I even forced a removal of the .deb file and then re-downloaded it for another install attempt -- trying to rule out a corrupted/flawed source file. Still no luck.
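(For reference, the re-download was nothing fancy -- roughly:
sudo apt-get clean
sudo apt-get install --reinstall linux-headers-4.15.0-72
i.e., flush the cached .deb and let apt fetch and unpack a fresh copy.)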

Because these troubles began after I upgraded the XigmaNAS system (to the 12.1 release), I am trying to discern whether there is some problem introduced in the VirtualBox OSE which could lead to memory overflows/faults ... which might manifest as though there were a "hardware" fault {like a buggy disk I/O controller, for instance} in the virtualized environment for the VM.

Does that make sense??

>> Also, since I am more of a Deb/Buntu and Arch kinda guy...and am NOT conversant in BSD...can you give me a quick "how to" for accomplishing your recommendation of "...do a zfs scrub"??

Again, many thanks, my friend!!

Cheers,
Mike
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.

cookiemonster
Advanced User
Posts: 276
Joined: 23 Mar 2014 02:58
Location: UK
Status: Offline

Re: Disk Corruption on Ubuntu VM

#4

Post by cookiemonster »

Hi. I'm thinking that if the VM bombs out in the same place, the likely problem is internal to its filesystem, not in XigmaNAS. Or at least that's what it makes me think. For the scrub: Disks > ZFS > Tools > Scrub a pool > Start, choose pool, Next.
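(From a shell, I believe the equivalent is simply the following, substituting your pool's name:
zpool scrub yourpool
zpool status -v yourpool
The second command shows scrub progress and any read/write/checksum errors it finds.)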
And then I would continue troubleshooting from Ubuntu.
I'm sure a dev will correct me, but the upgrade of XN would not have touched the VM files; the OS upgrade happened around them, so I think the upgrade is a red herring. Would it be feasible to export the VM and import it into another host to see if it happens there too?
Main: Xigmanas 11.2.0.4 x64-full-RootOnZFS as ESXi VM with 24GB memory.
Main Host: Supermicro X8DT3 Memory: 72GB ECC; 2 Xeon E5645 CPUs; Storage: (HBA) - LSI SAS 9211-4i with 3 SATA x 1 TB in raidZ1, 1 x 3 TB SAS drive as single stripe, 3 x 4 TB SAS drives in raidZ1.
Spare1: HP DL360 G7; 6 GB ECC RAM; 1 Xeon CPU; 5 x 500 GB disks on H210i
Backup1: HP DL380 G7; 24 GB ECC RAM; 2 Xeon E5645 CPUs; 8 x 500 GB disks on IBM M1015 flashed to LSI9211-IT

cookiemonster
Advanced User
Posts: 276
Joined: 23 Mar 2014 02:58
Location: UK
Status: Offline

Re: Disk Corruption on Ubuntu VM

#5

Post by cookiemonster »

Ah, I re-read your post. I see what you're saying about the update to a new version of VirtualBox OSE. Fair point. It reminds me, there was a user on the forum who posted that after his upgrade to one of these recent releases he had to recreate his VMs. It might be something else, though.
Also, for the headers update .deb... I would try installing it via apt-get install, although that's probably what you did.
And me too, I'm more Ubuntu than FreeBSD.
Main: Xigmanas 11.2.0.4 x64-full-RootOnZFS as ESXi VM with 24GB memory.
Main Host: Supermicro X8DT3 Memory: 72GB ECC; 2 Xeon E5645 CPUs; Storage: (HBA) - LSI SAS 9211-4i with 3 SATA x 1 TB in raidZ1, 1 x 3 TB SAS drive as single stripe, 3 x 4 TB SAS drives in raidZ1.
Spare1: HP DL360 G7; 6 GB ECC RAM; 1 Xeon CPU; 5 x 500 GB disks on H210i
Backup1: HP DL380 G7; 24 GB ECC RAM; 2 Xeon E5645 CPUs; 8 x 500 GB disks on IBM M1015 flashed to LSI9211-IT

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: Disk Corruption on Ubuntu VM

#6

Post by Tweak »

@cookiemonster--

Thank you, again, for sharing some of your mental horsepower with me for this problem. Much appreciated, out here in the dark!!

I ran a "scrub" as you described [Scrub a Pool -> select my main ZFS storage pool]. It returned with "Command execution was successful."
>> I hope that means "the outcome was positive (no errors)," and not merely that the command ran to completion after having to 'fix' things along the way (as fsck does).
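(In case it helps anyone else -- this is a generic ZFS note, not XigmaNAS-specific -- the actual scrub result can be read with
zpool status -v <poolname>
where the "scan:" line says how much the scrub repaired, and the bottom should read "errors: No known data errors" if the pool is clean.)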

Thanks for taking the time to re-read my previous reply.
I can't figure out why the failure mode is *intermittent* -- only with the one LARGE file (the linux-headers .deb) -- rather than a constant I/O error stream.
That's why I have concern that it's a *software* hiccup (in the OSE) masquerading as a <virtual> hardware hiccup.

Yes...I am using the 'vanilla' upgrade path (sudo apt update && sudo apt upgrade && sudo apt full-upgrade).
I have to execute a "dpkg -r --force-remove-reinstreq" every time I resurrect the VM, just to get the file locks and ghost 'handles' cleaned up.
No fun.
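(For anyone hitting the same wall, my cleanup after each crash is roughly the standard interrupted-upgrade recovery -- the package name is just the one that is wedged on my box:
sudo dpkg --configure -a
sudo dpkg --remove --force-remove-reinstreq linux-headers-4.15.0-72
sudo apt-get -f install )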

Frankly -- and I throw up in my mouth a little bit when I think about this -- I've got a Win Server 2010 VM running rock-steady as a guest in an Ubuntu host.
I was trying to work my way toward fully divesting from the MS-Win ecosystem, now that my wife has migrated to Apple (Linux-like ... AMEN!).
...but I'm still leery of pulling the plug on my AD and net-management services from the Win machine if I can't keep the VM guest running through an unattended-upgrades cycle.
(&%*#@!)

It hurts my heart to hear that others have had significant headaches. (Yes...I should have a pristine backup...but I don't have one, because it was still a "Work In Progress," and I had taken a fairly -erm- 'meandering' path to the limited success I had already achieved. I hate to see it disappear.)

All the same, I truly appreciate your insights and help! I hope you have a fantastic holiday!!

Cheers,
Mike
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.

cookiemonster
Advanced User
Posts: 276
Joined: 23 Mar 2014 02:58
Location: UK
Status: Offline

Re: Disk Corruption on Ubuntu VM

#7

Post by cookiemonster »

Just a thought here: you could install the previous version of XN on a spare USB stick, just to try to rule out the new VirtualBox version that came with the XN update.
I take it neither dmesg nor the filesystem utilities show any abnormalities? A bit out there, but would it be possible to create a dump device and capture a core dump to analyse? That is beyond my capabilities to guide you on; the one time I did something similar, for the time available it was faster to reinstall and restore a backup of personal files. That was a physical machine with a failing HD; I created a block-device backup with Clonezilla to recover some files. All in all, faster than learning to debug core dumps.
Main: Xigmanas 11.2.0.4 x64-full-RootOnZFS as ESXi VM with 24GB memory.
Main Host: Supermicro X8DT3 Memory: 72GB ECC; 2 Xeon E5645 CPUs; Storage: (HBA) - LSI SAS 9211-4i with 3 SATA x 1 TB in raidZ1, 1 x 3 TB SAS drive as single stripe, 3 x 4 TB SAS drives in raidZ1.
Spare1: HP DL360 G7; 6 GB ECC RAM; 1 Xeon CPU; 5 x 500 GB disks on H210i
Backup1: HP DL380 G7; 24 GB ECC RAM; 2 Xeon E5645 CPUs; 8 x 500 GB disks on IBM M1015 flashed to LSI9211-IT

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: Disk Corruption on Ubuntu VM

#8

Post by Tweak »

@cookiemonster--

Well...I don't know how I 'stumbled' upon this solution, but this is how I got past the hurdle:

I made another VM (from within the web front-end [phpvirtualbox/index.html]) and tried to start it and build a new 18.04.3 LTS server instance.
I noticed an error message that popped up saying the USB function was not available -- because the VirtualBox Extensions were not installed (in the OSE).
^^ THAT caught my attention. So, I disabled the USB in the control window, and re-started.
Nonetheless, I built another instance and got it up and running/updated to the latest kernel without incident.

** Note, I have been managing/running the extant VMs through the VBoxManage CLI, since I had been using scripts to execute on certain triggers.

As I was working to get all of the same packages/software loaded onto the new instance, I took a moment when BOTH machines were in a "down" (powered off) state, and I checked both of the 'showvminfo --details' outputs from the VBoxManage (CLI) interface.
I saw that the *problem* VM had the USB function *enabled* in its machine header...so I thought that might be a problem (based on the above error message).
Through the CLI interface, I disabled the USB function on the original VM (the one which had been causing the problem).
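** I don't have the exact command history in front of me, but it was along these lines, with the VM powered off ("MyVM" standing in for the VM name or UUID):
sudo VBoxManage modifyvm "MyVM" --usb off
sudo VBoxManage modifyvm "MyVM" --usbehci off
sudo VBoxManage modifyvm "MyVM" --usbxhci off
(The last two only matter if the USB 2.0/3.0 controllers were also enabled.)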

Voila -- on the next reboot (after executing a dpkg cleaning/re-homing), the apt update/upgrade cycle went through smoothly.
Yay!!!
The 'old' VM has been running smoothly for the last 8 hours.

Easy, peasy, lemon-squeezy.

Whew! (All of the work is still intact.)

Hopefully this thread might help someone else who faces the same enigma in the future when upgrading the XigmaNAS OS (and, therefore, the VBox OSE).

...now...I just wish I could remember how I got the USB function enabled in the FIRST place (over a year ago).
I must've found some way to hack the Extensions into the OSE, even though they're not available via conventional/default installation.
>> Any thoughts on my last question?

Again...MANY THANKS for sticking with me through several days' discernment and muddling-about in the OSE inquiry...!

Best regards,
Mike
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.

netware5
experienced User
Posts: 136
Joined: 31 Jan 2017 21:39
Location: Sofia, BULGARIA
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#9

Post by netware5 »

I am using an Ubuntu Server 16.04 LTS VM under XigmaNAS. The VM provides print services to my home network via a USB-connected printer, so I definitely use the USB function. No issues detected during the last two years. As far as I remember, the "USB function" was added by the devs at some later stage; before that I had tried to play with the VBox guest extensions, but without success. I really don't remember the whole story, but it is clear that I currently run my VM with the USB function enabled and this does not affect the VM update process.
XigmaNAS 12.1.0.4 - Ingva (rev.7542) embedded on HP Proliant Microserver Gen8, Xeon E3-1265L, 16 GB ECC, 2x4TB WD Red ZFS Mirror

cookiemonster
Advanced User
Posts: 276
Joined: 23 Mar 2014 02:58
Location: UK
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#10

Post by cookiemonster »

That was a really good spot @Tweak, and thanks for sharing your findings.
Main: Xigmanas 11.2.0.4 x64-full-RootOnZFS as ESXi VM with 24GB memory.
Main Host: Supermicro X8DT3 Memory: 72GB ECC; 2 Xeon E5645 CPUs; Storage: (HBA) - LSI SAS 9211-4i with 3 SATA x 1 TB in raidZ1, 1 x 3 TB SAS drive as single stripe, 3 x 4 TB SAS drives in raidZ1.
Spare1: HP DL360 G7; 6 GB ECC RAM; 1 Xeon CPU; 5 x 500 GB disks on H210i
Backup1: HP DL380 G7; 24 GB ECC RAM; 2 Xeon E5645 CPUs; 8 x 500 GB disks on IBM M1015 flashed to LSI9211-IT

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#11

Post by Tweak »

@netware5--

Your use-case is (for me) "the exception that proves the rule." :D
netware5 wrote:
08 Dec 2019 13:00
I am using an Ubuntu Server 16.04 LTS VM under XigmaNAS. The VM provides print services to my home network via a USB-connected printer, so I definitely use the USB function. No issues detected during the last two years. As far as I remember, the "USB function" was added by the devs at some later stage; before that I had tried to play with the VBox guest extensions, but without success. I really don't remember the whole story, but it is clear that I currently run my VM with the USB function enabled and this does not affect the VM update process.
I, too, had **somehow** gotten to a position where USB functionality was initially enabled in a previous iteration, via some -- now-forgotten -- admin "rain dance."
I did not mean to imply that correlation determined causation. :P

I only intended to highlight how/where I spotted a difference between the environment my VM was running under in the previous (11.2) OSE and the current (12.1) OSE.
When I changed the VM attributes to *match* what is 'required' under the 12.1 OSE (as presented in the php API webfront), the disk I/O errors {on VM device ata3, where my SATA hdd image is mounted} went away, and the updates proceeded without incident.

It solved *my* problem ..... and, maybe, it could be the "hidden gem" that might help someone else solve theirs.

Thanks for your reply! I appreciate hearing all the ways that others are using their servers (and VMs)! 8-)

Cheers, and Happy Holidays to you! :)

Mike
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.

disgustipated
NewUser
Posts: 5
Joined: 24 Sep 2012 17:40
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#12

Post by disgustipated »

I'm seeing what I believe to be the same thing. I had applied the latest update in hopes of correcting an issue on the main XigmaNAS drive; all of that went OK, but now the VMs are not working properly. I went so far as to resilver two new drives onto the ZFS mirror my VMs live on. The error I was getting was
failed command: READ FPDMA QUEUED
and a few other errors indicating I/O issues within the VM.
I first scrubbed the ZFS mirror, then replaced the drives. Still the same thing.
I could write to the mirror fine by creating files and folders.
Then I found this post https://www.virtualbox.org/ticket/8311 where one of the comments mentions ZFS and how they set the drive to fixed rather than dynamic, and that seemed to help.
I'm currently doing this with the clonemedium command and will report back if it works for me.
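For reference, the conversion I'm attempting is something like this (the file names are just from my setup):
VBoxManage clonemedium disk Ubuntu.vdi Ubuntu-fixed.vdi --variant Fixed
Then detach the old dynamic disk from the VM and attach the fixed one in its place.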

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#13

Post by Tweak »

@disgustipated

Hmmmm.....I will be interested to hear if this works for you.

I had also read some various sentiments that ZFS was the "hidden culprit" behind some of the hiccups with VM disk I/O under VirtualBox.
One of the things I saw recommended elsewhere was to modify the R/W cache flushing under VirtualBox.

There is a command in VBoxManage (CLI) which allows you to set a lower R/W cache flushing interval.
It goes something like this:
sudo VBoxManage setextradata <VM Name> "VBoxInternal/Devices/ahci/0/LUN#0/Config/FlushInterval" 1000
  • Replace <VM Name> with the UUID or vmname of the VM you want to modify. (...dropping the angle brackets)
  • YES, the quotation marks *are* used in the command.
  • You need to make sure you specify the correct I/O bus, controller instance, and device number (the "ahci/0/LUN#0" part).
  • You can select whatever numeral you would like for the end of the command.
    Note: A smaller number flushes more frequently, i.e., is more robust (but at a slight performance penalty).
>> You can find out the correct names/numerals for Bus, Channel, and Device by using the command:
sudo VBoxManage showvminfo <VM Name> --details
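>> And if you want to double-check which extradata keys are already set on a VM, I believe you can list them all with:
sudo VBoxManage getextradata <VM Name> enumerate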

In my view, unless you are trying to support some high-intensity server workloads or providing high-disk-utilization services, you don't need to fret too much about the miniscule performance penalty that results from more frequent I/O cache flushing.
[And if you really have a workload that is SOOOO intense, you might want to run it on a dedicated hardware device or, at least, as a virtualized instance on a bare-metal hypervisor (ESXi).]

Just my $0.02.

Good luck, and please post a follow-up when you complete the clonemedium process.

Cheers!
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.

disgustipated
NewUser
Posts: 5
Joined: 24 Sep 2012 17:40
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#14

Post by disgustipated »

So, I don't think the clonemedium worked completely.
Steps I took:
cloned to fixed disk
removed the dynamic disk, added the fixed
Things seemed to be good and I had no issues... but then I realized I had made a snapshot in November, and my fixed-disk data came from the base VDI, not the differencing disk... Looking back now from the end, I can't explain why this seemed to work properly.
I didn't want to redo the last two months of upgrading and running what I had on the VM, so I wanted to try to find another solution and use the most recent data... (Home Assistant).
Realizing the snapshot nonsense, I made a backup of the fixed-disk VDI that seemed to work, and then deleted the snapshot so the differencing disk would be merged in. I also had to remove a 2 MB differencing disk that appeared, I assume, because of the other disk being attached and then removed and the original disk being attached again.
I then booted up with the merged disk, which was still dynamic, and got I/O errors again.
I then made another fixed-disk clone of the merged disk.
After adding it as the drive I was able to log in, but then I/O messages such as "failed command: FLUSH CACHE" appeared.
I turned to Google and found this post https://bbs.archlinux.org/viewtopic.php?id=210100 which mentioned the "Use Host I/O Cache" check box in the storage controller options in VBox for the VM. I checked this box and everything appears to be working well so far.
I do not think that going to the fixed disk resolved the issue, but I'm not going back at this point since it appears to be working. I think it was the Use Host I/O Cache setting; however, I cannot recall whether this had been checked prior to the update.
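(If anyone prefers the command line to the phpVirtualBox checkbox, I believe the equivalent is something like the following, where "SATA" is whatever your storage controller is called in showvminfo:
VBoxManage storagectl "MyVM" --name "SATA" --hostiocache on )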

My case may also be a corner case, as I did have some sort of corruption: before the update, zpool status was complaining about the snapshot file on this VM. Prior to replacing both drives (one at a time, letting them resilver), I did attempt a zpool scrub, which appeared to resolve any concern zpool status had with the file.

Anyway, so far this is resolved for me... I will report back if things go south again.

Edit: I re-read your post and might experiment with the cache flushing some more, now that the Host I/O Cache checkbox appears to have fixed it; interesting stuff. I do not feel that my workload on the VM should be very intense at all. I'm only running a couple of maintenance scripts for certbot, a Python vdev for Home Assistant, and an instance of RabbitMQ.

Tweak
NewUser
Posts: 8
Joined: 11 Oct 2018 05:36
Location: Phoenix, AZ
Status: Offline

Re: [SOLVED] Disk Corruption on Ubuntu VM

#15

Post by Tweak »

@disgustipated

Good luck!
I'm glad you've gotten it working now.

I will continue to follow your progress.....

Cheers
XigmaNAS 12.1.0.4 - Ingva embedded on SanDisk Ultra USB
HP Z400, 2x Xeon W3565, 24 GB ECC, 4x 4TB WD Red (WD40EFRX) in ZFS RAIDz1 pool
SAMBA/CIFS, AFP (Time Machine), NFS, Web server, and 2x VMs
Rsync to off-board 10TB HDD backup drive.
