
HAST + ZFS replace failed drive


#1

Post by Manxmann » 02 Jan 2018 15:01

Hi Folks,

I'm experimenting with HAST for storage HA, and so far so good. I can switch master roles, fail an entire host, and present storage via iSCSI (block or file) and NFS, and everything seems to work exactly as planned :)
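
For context, the failover test itself is just a role swap on both nodes plus a pool import, roughly like this (a sketch only; the pool name 'tank' is a placeholder):

# on the current primary (nas1): demote all resources
hastctl role secondary all
# on the peer (nas2): promote all resources, then bring up the pool
hastctl role primary all
zpool import -f tank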

Config as follows:

6x disks configured as HAST0 - HAST5

File system is ZFS: 3 x mirrors (HAST0/1, 2/3, 4/5), with one pool across the 3 vdevs.
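
For completeness, the layout is roughly the following (the hast.conf block and the pool name 'tank' are illustrative; hast1-hast5 follow the same pattern):

# /etc/hast.conf, one resource per disk (hast0 shown)
resource hast0 {
        on nas1 {
                local /dev/da0
                remote 10.10.10.2
        }
        on nas2 {
                local /dev/da0
                remote 10.10.10.1
        }
}

# pool built on the primary from the /dev/hast/* devices
zpool create tank \
    mirror /dev/hast/hast0 /dev/hast/hast1 \
    mirror /dev/hast/hast2 /dev/hast/hast3 \
    mirror /dev/hast/hast4 /dev/hast/hast5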

Today I manually failed device HAST3 (pulled the drive) and replaced it with a blank device.

A number of entries appeared in the log showing that the drive failure and replacement were detected (at the hardware level), and hastd noticed an error with the drive:

hastd[82438]: [hast3] (primary) Remote node acts as primary for the resource and not as secondary.

Now the confusion: ZFS rightly doesn't see the failure, because HAST is serving HAST3 from the backup server. On both the 'master' and the 'secondary', hastctl status shows everything as good?!?

nas1: /etc# hastctl status
Name    Status     Role        Components
hast0   complete   primary     /dev/da0   10.10.10.2
hast1   complete   primary     /dev/da1   10.10.10.2
hast2   complete   primary     /dev/da2   10.10.10.2
hast3   complete   primary     /dev/da3   10.10.10.2
hast4   complete   primary     /dev/da4   10.10.10.2
hast5   complete   primary     /dev/da5   10.10.10.2

nas2: /etc# hastctl status
Name    Status     Role        Components
...
hast3   complete   secondary   /dev/da3   10.10.10.1
...

Why does NAS2 not show itself as primary for HAST3?

I know the drive has failed (I pulled it), but HAST gives me no clue that the drive is down other than the log message above.
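
The most detail I can get is from hastctl's verbose listing, e.g.:

hastctl list hast3

which dumps the full resource state (role, remote address, dirty bytes and so on), but even there nothing obviously says "local disk gone".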

So, thinking I would need to init the new disk, assign it as secondary, let the mirror complete, and then assign it as primary, I did this on the 'master' (the server with the replaced drive):

hastctl role init hast3
hastctl create hast3
hastctl role secondary hast3

At this point the status of hast3 changed to Degraded; however, no replication between the hosts occurred, and both servers now listed HAST3 as secondary.

Issuing hastctl role primary hast3 on the secondary server didn't do anything. In the end I did an init/create/primary on the secondary device, which restored the HAST config but destroyed the ZFS data, forcing a scrub to recover the drive.
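
For the record, recovery at the ZFS level was just the usual (pool name 'tank' again a placeholder):

zpool scrub tank
zpool status -v tank   # watch the scrub repair its way through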

So I know I've done this all wrong. Can anyone elaborate on the correct process for replacing a drive protected by HAST and re-establishing the sync without destroying the filesystem data?
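
My best guess at the proper sequence (untested, so very much a sketch) is that the node with the replaced disk has to be secondary for that resource before its metadata is recreated, so that the node holding the good copy is primary and can drive the resync:

# 1. swap roles for hast3 so nas2 (good copy) is primary
hastctl role secondary hast3   # on nas1
hastctl role primary hast3     # on nas2

# 2. on nas1: recreate the HAST metadata on the new disk and
#    rejoin as secondary; hastd should then sync from nas2
hastctl role init hast3
hastctl create hast3
hastctl role secondary hast3

# 3. once hastctl status shows 'complete' again, swap the roles back

Is that roughly what others do?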

Tar
