Kernel panic upon disconnecting drive from RAID-Z

Post by Crazor

I've got a test setup on a Dell PowerEdge R530 as follows:
* ESXi booted from USB
* NAS4Free 11.0.0.4 VM stored on USB
* Dell PERC H730 Mini connected to the VM via PCI Passthrough, no RAID (i.e. straight HBA)
* Three 4TB HDDs connected to the HBA
* ZFS pool on one raidz1 vdev consisting of those 3 HDDs

I've read a lot of articles regarding running NAS4Free/FreeNAS in a VM and decided that PCI Passthrough is the way to go for me. I've settled on NAS4Free because FreeNAS didn't want to boot with the HBA connected to the VM. Everything works great, and I'm quite liking NAS4Free's web interface.

I simulated the worst case by disconnecting a drive from its bay. To my surprise, the NAS4Free VM rebooted, but the array came up fine and notified me of the degraded status. I reconnected the drive and onlined it in the pool; everything resilvered pretty quickly, and the report now says that the pool is healthy.

Why did NAS4Free reboot though? I had hoped that a disconnected/failing drive event would just send an alert and continue normal operation.
Last edited by Crazor on 21 Jul 2017 10:58, edited 1 time in total.

Re: VM reboots upon disconnecting drive from RAID-Z

Post by Crazor

So I repeated the exercise, watching the console and catching a glimpse of a kernel panic before it reset. I found the following in the system.log:

Jul 21 10:51:07 fileserver kernel: mfi0: I/O error, cmd=0xfffffe0000eaa630, status=0xc, scsi_status=0                  
Jul 21 10:51:07 fileserver kernel: mfi0: mfi0: 2585 (88286s/0x0002/info) - Removed: PD 02(e0x20/s2) Info: enclPd=20, scsiType=0, portMap=02, sasAddr=4433221106000000,0000000sense error 0, sense_key 0, asc 0, ascq 0                        
Jul 21 10:51:07 fileserver kernel: 000000000               
Jul 21 10:51:07 fileserver kernel: mfisyspd2: hard error cmd=write 1479121720-1479121783                               
Jul 21 10:51:07 fileserver kernel:                         
Jul 21 10:51:07 fileserver kernel:                         
Jul 21 10:51:07 fileserver kernel: Fatal trap 12: page fault while in kernel mode                                      
Jul 21 10:51:07 fileserver kernel: cpuid = 1; apic id = 02 
Jul 21 10:51:07 fileserver kernel: fault virtual address        = 0x8                                                  
Jul 21 10:51:07 fileserver kernel: fault code           = supervisor read data, page not present                       
Jul 21 10:51:07 fileserver kernel: instruction pointer  = 0x20:0xffffffff80b1cfab                                      
Jul 21 10:51:07 fileserver kernel: stack pointer                = 0x28:0xfffffe022fece930                              
Jul 21 10:51:07 fileserver kernel: frame pointer                = 0x28:0xfffffe022fece980                              
Jul 21 10:51:07 fileserver kernel: code segment         = base 0x0, limit 0xfffff, type 0x1b                           
Jul 21 10:51:07 fileserver kernel: = DPL 0, pres 1, long 1, def32 0, gran 1                                            
Jul 21 10:51:07 fileserver kernel: processor eflags     = interrupt enabled, resume, IOPL = 0                          
Jul 21 10:51:07 fileserver kernel: current process              = 12 (irq258: mfi0)                                    
Jul 21 10:51:07 fileserver kernel: trap number          = 12                                                           
Jul 21 10:51:07 fileserver kernel: panic: page fault       
Jul 21 10:51:07 fileserver kernel: cpuid = 1               
Jul 21 10:51:07 fileserver kernel: KDB: stack backtrace:   
Jul 21 10:51:07 fileserver kernel: #0 0xffffffff80c2f637 at kdb_backtrace+0x67                                         
Jul 21 10:51:07 fileserver kernel: #1 0xffffffff80be44e2 at vpanic+0x182                                               
Jul 21 10:51:07 fileserver kernel: #2 0xffffffff80be4353 at panic+0x43                                                 
Jul 21 10:51:07 fileserver kernel: #3 0xffffffff81131c61 at trap_fatal+0x351                                           
Jul 21 10:51:07 fileserver kernel: #4 0xffffffff81131e53 at trap_pfault+0x1e3                                          
Jul 21 10:51:07 fileserver kernel: #5 0xffffffff811313fc at trap+0x26c                                                 
Jul 21 10:51:07 fileserver kernel: #6 0xffffffff81114ee1 at calltrap+0x8                                               
Jul 21 10:51:07 fileserver kernel: #7 0xffffffff806b55b6 at mfi_tbolt_complete_cmd+0x1b6                               
Jul 21 10:51:07 fileserver kernel: #8 0xffffffff806b537d at mfi_intr_tbolt+0x9d                                        
Jul 21 10:51:07 fileserver kernel: #9 0xffffffff80b9e1cf at intr_event_execute_handlers+0x20f                          
Jul 21 10:51:07 fileserver kernel: #10 0xffffffff80b9e436 at ithread_loop+0xc6                                         
Jul 21 10:51:07 fileserver kernel: #11 0xffffffff80b9ad35 at fork_exit+0x85                                            
Jul 21 10:51:07 fileserver kernel: #12 0xffffffff8111541e at fork_trampoline+0xe                                       
Jul 21 10:51:07 fileserver kernel: Uptime: 1d0h31m1s   

At first I thought there might be active swap space on the drive, but I checked and I had not set up any swap yet.
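
In case it helps anyone reading a similar trace: below is a minimal Python sketch that pulls the "KDB: stack backtrace" frames out of syslog lines like the ones above and reports the first frame that is not part of the generic panic/trap path. The function names (`parse_backtrace`, `first_non_generic`) and the `GENERIC` set are my own assumptions for illustration, not part of any FreeBSD tool. On this log it points at `mfi_tbolt_complete_cmd`, i.e. the mfi driver's interrupt path, which fits the "current process = 12 (irq258: mfi0)" line.

```python
import re

# Matches frames such as "#7 0xffffffff806b55b6 at mfi_tbolt_complete_cmd+0x1b6"
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)")

# Assumption: these frames are the generic panic/trap handling path and say
# nothing about which subsystem actually faulted.
GENERIC = {"kdb_backtrace", "vpanic", "panic", "trap_fatal",
           "trap_pfault", "trap", "calltrap"}


def parse_backtrace(lines):
    """Return (frame_number, function, offset) tuples found in the log lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m:
            frames.append((int(m.group(1)), m.group(3), int(m.group(4), 16)))
    return frames


def first_non_generic(frames):
    """Return the innermost function that is not generic trap/panic handling."""
    for _, func, _ in sorted(frames):
        if func not in GENERIC:
            return func
    return None


# A few lines copied from the log above.
SAMPLE = [
    "Jul 21 10:51:07 fileserver kernel: #5 0xffffffff811313fc at trap+0x26c",
    "Jul 21 10:51:07 fileserver kernel: #6 0xffffffff81114ee1 at calltrap+0x8",
    "Jul 21 10:51:07 fileserver kernel: #7 0xffffffff806b55b6 at mfi_tbolt_complete_cmd+0x1b6",
    "Jul 21 10:51:07 fileserver kernel: #8 0xffffffff806b537d at mfi_intr_tbolt+0x9d",
]

if __name__ == "__main__":
    print(first_non_generic(parse_backtrace(SAMPLE)))
```

To run it against a real log, read the file's lines (for example from a copy of system.log) and pass them to `parse_backtrace` instead of `SAMPLE`.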
