
Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 05:17
by JHM001
Hi, we have a UFS software RAID setup as a fileserver on a LAN: 2 x 3 TB WD Red drives running a fairly recent version of NAS4Free/XigmaNAS (in service almost 10 years in total, starting with NAS4Free). After a power failure the system will not boot.

GOAL: Save the data and get a functioning XigmaNAS system again.

INCIDENT DATA

(ada1:ahcich1:0:0:0) Error 5, Retries exhausted
GEOM_MIRROR: Request failed (error=5). ada1[READ(offset= . . . , length= . . .)]
. . . Error reading journal block nnnnn
. . . Unexpected SU+J Inconsistency
. . . Internal Error: Got To reply()
. . . . Unexpected Soft Update Inconsistency: RUN FSCK Manually.

EXPLORE SYSTEM

gpart show --> gives raw partitions AND this . . . in case it is helpful

=>          40  5860533088  mirror/RaidXY  GPT  (2.7T)
            40  5860533080           1  freebsd-ufs  (2.7T)
    5860533080           8              - free -  (4.0K)

geom disk list --> gives ada0 and ada1
cat /etc/fstab --> gives /dev/da0p2 /cf ufs ro 1 1

ATTEMPTED FIXES

fsck on any device does nothing -- everything reports "clean"
fsck_ufs likewise does nothing

YOUR ADVICE

* What would be the step-by-step fix? If we make another XigmaNAS USB stick, can we rebuild from there? Is the problem on the USB key?
* Are there some commands you'd like me to run to gather more information?

Thanks much!!

JHM

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 09:32
by ms49434
JHM001 wrote:
13 Jun 2019 05:17
Hi, we have a UFS software RAID setup as a fileserver on a LAN: 2 x 3 TB WD Red drives ... After a power failure the system will not boot.
...
* Are there some commands you'd like me to run to gather more information?
The FreeBSD manpages are a very good source of information: GEOM-Mirror
gmirror status will show you the status of each member disk and tells you whether your data is compromised.
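
For illustration, a degraded mirror looks something like this (a sketch only; the mirror name RaidXY is taken from your gpart listing, and the component states are made up):

# gmirror status
          Name    Status  Components
mirror/RaidXY  DEGRADED  ada0 (ACTIVE)
                         ada1 (SYNCHRONIZING, 42%)
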
Please read and follow the forum rules, it will help you to get better answers to your questions: Forum Rules

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 13:33
by JHM001
ms49434: thanks for the note. If you would be so kind -- which forum rules did I violate? I have more than 10 posts, the subject header is informative, and in fact I was already reading the GEOM man pages. I also searched for other answers: there is a lot of power-failure-driven GEOM discussion in various places on the web, but the problem seemed very specific to XigmaNAS. Anyway, per my separate reply: total success by doing nothing, via "rebuilding provider finished". :)

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 13:36
by JHM001
Pleased to report TOTAL SUCCESS -- for "doing nothing". I left the monitor hooked up to the machine overnight, and in the morning was greeted with "GEOM_MIRROR: Device NAME: rebuilding provider ada0 finished."

Rebooted and everything worked. If I get a chance I'll find out more about what went on overnight.
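
If I do dig in, the kernel log should still hold the mirror messages; something like this from a console or SSH session:

dmesg | grep GEOM_MIRROR            # rebuild start/finish messages still in the kernel buffer
grep GEOM_MIRROR /var/log/messages  # older entries, if the buffer has already rotated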

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 13:51
by ms49434
JHM001 wrote:
13 Jun 2019 13:33
ms49434: thanks for the note. If you would be so kind -- which forum rules did I violate? ...
Just one example:
a) XigmaNAS version, platform (Embedded/Full/LiveCD), and revision number.
vs.
"fairly recent version of NAS4Free/XigmaNAS (in service almost 10 years in total, starting with NAS4Free)"

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 15:22
by JHM001
Another update -- in fact one of the two RAID mirror disks was "not a consumer", and the RAID was degraded. I needed to "forget" the disconnected component (scary-sounding, but it does NOT touch the RAID, only components that are no longer connected) and then "insert" the disk again. "Status" shows the progress. All of this can be done either from the XigmaNAS GUI or from a shell on the console, roughly as sketched below. Currently it looks like a 10-hour rebuild job for 3 TB.
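
For the record, the shell equivalents look roughly like this (a sketch; RaidXY is the mirror name from the gpart output above, and ada1 stands in for whichever disk is being re-added):

gmirror forget RaidXY        # drop components that are no longer connected ("not a consumer")
gmirror insert RaidXY ada1   # re-add the disk; synchronization starts immediately
gmirror status               # shows the rebuild percentage as it progresses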

Question: Would ZFS have protected better against the power failure? (And yes, there is a UPS, but it is currently not triggering a shutdown; that still needs to be configured.)
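
From what I can tell, the UPS service is NUT under the hood, so the piece that triggers the shutdown would be upsmon; roughly, in upsmon.conf (the UPS name and credentials below are placeholders):

MONITOR myups@localhost 1 upsmon secret master   # which UPS to watch
SHUTDOWNCMD "/sbin/shutdown -p +0"               # run when the battery goes critical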

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 15:30
by JHM001
XigmaNAS 11.1.0.4 x64-embedded, Intel Core2Duo E6750 in a Dell Optiplex 755, 6 GB non-ECC RAM, GEOM software RAID-1 mirror on 2x WD Red 3.0 TB HDDs, CIFS, SSH, VirtualBox

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 13 Jun 2019 15:47
by raulfg3
JHM001 wrote:
13 Jun 2019 15:22

Question: Would ZFS have better protected against power failure? (And yes there is a UPS, but currently not triggering a shutdown. That needs to be configured.)
YES. ZFS was not specifically designed for this, but it is robust enough to survive a power failure without corrupting data.
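
And you can verify that after an outage; a quick sketch (the pool name tank is assumed):

zpool status -x    # one-line health summary, "all pools are healthy" when OK
zpool scrub tank   # re-reads every block in the pool and verifies its checksum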

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 14 Jun 2019 12:32
by JoseMR
JHM001 wrote:
13 Jun 2019 15:22
Hi, we have a UFS software RAID setup as a fileserver on a LAN ... After a power failure the system will not boot.
...
Question: Would ZFS have better protected against power failure? (And yes there is a UPS, but currently not triggering a shutdown. That needs to be configured.)
Hello, you are not the only one having major UFS corruption after power failures, ungraceful shutdowns, etc. I have been telling people to just use tuned ZFS for quite some time, but unfortunately the FUD about a strict ZFS/ECC requirement that has spread across the Web drives new XigmaNAS users to the wrong decision when choosing the right filesystem for their valuable data storage, which is hands down ZFS.
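
By "tuned" I mean mainly capping the ARC on small-RAM machines; a minimal sketch for /boot/loader.conf, with the 4G figure assumed for an 8 GB box:

# cap the ZFS ARC so it leaves RAM for services (figure assumed for an 8 GB machine)
vfs.zfs.arc_max="4G"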


Some references that could help others take the right direction before deploying serious data storage on UFS:
Recent XigmaNAS boot problem after power failure (supports RootOnZFS)
pfSense boot problems after power failure (since v2.4 ships ZFS)
Stability of UFS and ZFS on FreeBSD
OPNsense power-outage tolerance (supports ZFS)
And a lot more about the topic around the Web.

Some professional/authoritative advice on the supposed strict ZFS/ECC requirement, for reference:
Matthew Ahrens
Allan Jude
JRS Systems
Just to be clear, I do recommend using ECC for anything serious about data storage, regardless of the filesystem of choice, though it is certainly not mandatory. P.S. I also recommend the RootOnZFS platform if server high availability and reliability are a concern.

Regards

Re: Power Failure & FSCK Fail -- SoftRAID UFS

Posted: 14 Jun 2019 17:38
by JHM001
JoseMR -- Super thanks for your notes on ZFS. Ironically, about four years ago, after reading the ECC-paranoia material, I switched off ZFS and back to UFS! This time I did in fact read some of the same material you are sharing, and have concluded two things:

1) ZFS is fine with non-ECC RAM.
2) ZFS does NOT need more RAM than I can put in the current box (8 GB for a 3 TB ZFS RAID1).

Your point about sharing this information is a good one. Super thanks for reinforcing the analysis.

The plan now is to convert back to ZFS.
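
For reference, the basic shape of that conversion once the data is safely backed up elsewhere -- a sketch only: zpool create wipes the disks, and the pool name and device names are assumptions:

gmirror destroy RaidXY               # tear down the old GEOM mirror and clear its metadata
zpool create tank mirror ada0 ada1   # create a two-way ZFS mirror named "tank"
zpool status tank                    # confirm both disks show ONLINE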

JHM