
HAST + ZFS replace failed drive

Highly Available Storage.
Joined: 05 Feb 2014 18:06



Post by Manxmann

Hi Folks,

I'm experimenting with HAST for storage HA, and so far so good. I can switch master roles, fail an entire host, and present storage via iSCSI (block or file) and NFS, and everything seems to work exactly as planned :)

Config as follows:

6x discs configured as HAST0 - HAST5.

The file system is ZFS: 3 mirrors (HAST0/1, HAST2/3, HAST4/5) in one pool across the three vdevs.
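For reference, the pool was created roughly like this (a sketch from memory; the pool name "tank" is just a placeholder, and the /dev/hast/* nodes only exist on the node holding the primary role for each resource):

```shell
# Sketch only: three HAST-backed mirror vdevs in one pool.
# "tank" is a placeholder pool name; /dev/hast/* providers
# appear only on the node that is primary for each resource.
zpool create tank \
    mirror /dev/hast/hast0 /dev/hast/hast1 \
    mirror /dev/hast/hast2 /dev/hast/hast3 \
    mirror /dev/hast/hast4 /dev/hast/hast5
```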

Today I manually failed device HAST3 (pulled the drive) and replaced it with a blank device.

A number of entries appeared in the log showing that the drive failure and replacement were detected (at the hardware level), and hastd noticed an error with the drive:

hastd[82438]: [hast3] (primary) Remote node acts as primary for the resource and not as secondary.

Now the confusion: ZFS rightly doesn't see the failure, because HAST is serving HAST3 from the backup server. But on both the 'master' and 'secondary', issuing a hastctl status shows everything as good?!?:

nas1: /etc# hastctl status
Name Status Role Components
hast0 complete primary /dev/da0
hast1 complete primary /dev/da1
hast2 complete primary /dev/da2
hast3 complete primary /dev/da3
hast4 complete primary /dev/da4
hast5 complete primary /dev/da5

nas2: /etc# hastctl status
Name Status Role Components
hast3 complete secondary /dev/da3

Why does NAS2 not show itself as HAST3 primary?

I know the drive has failed (I pulled it), but HAST gives no clue that the drive is down other than the above log message.

So, thinking I would need to init the new disk, assign it as secondary, allow the mirror to complete, and then assign it as primary, I did this on the 'master' (the server with the replaced drive):

hastctl role init hast3
hastctl create hast3
hastctl role secondary hast3

At this point the status of hast3 changed to degraded; however, no replication between the hosts occurred, and now both servers listed hast3 as secondary.

Issuing hastctl role primary hast3 on the secondary server didn't do anything. In the end I did an init/create/primary on the secondary device, which restored the HAST config but destroyed the ZFS data, forcing a scrub to recover the drive.

So I know I've done this all wrong. Can anyone elaborate on the correct process to replace a drive protected by HAST and re-establish the sync without destroying the filesystem data?
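For comparing notes, here is the sequence I now suspect is right (untested on my side, and assuming the remote node really is acting as primary for hast3, as the log message suggests):

```shell
# Untested sketch: replace only the disk behind hast3.
# Run on the node holding the NEW blank disk, while the
# remote node stays primary for hast3.
hastctl role init hast3       # drop the stale local state
hastctl create hast3          # initialise metadata on the new disk
hastctl role secondary hast3  # let the remote primary resync to us
hastctl status hast3          # check progress until the resync completes
# Only after the resync finishes, fail back if desired:
#   remote node:  hastctl role secondary hast3
#   this node:    hastctl role primary hast3
```

If that's wrong, I'd appreciate a correction.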

