Typically in the same manner as with ext4/ldiskfs-backed filesystems: using Corosync or
similar HA software. Care needs to be taken to STONITH the failed node so that you
don't have multiple nodes importing the same pool (which can lead to almost immediate
corruption).
The MMP (multi-mount protection) feature that we use for ldiskfs to detect and prevent
multiple nodes concurrently mounting the same filesystem is not available for ZFS yet,
though there are some possible ways to get this on the cheap. If you "dd" the
überblocks (128KB at 4MB offset, IIRC) and checksum them, then sleep and repeat, the
checksum should not change unless another node has the pool imported and is writing to it.
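
As a rough illustration of the dd-and-checksum idea, here is a minimal Python sketch. The offset and length are the "IIRC" figures above and have not been checked against the ZFS on-disk format; the function names, the sample count and the sleep interval are made up for the example. Treat it as a sanity check to run before importing a pool, not as a replacement for STONITH.

```python
import hashlib
import time

# Assumed values, taken from the "IIRC" figures in the message above.
# Verify them against the ZFS on-disk format before relying on them.
UBERBLOCK_OFFSET = 4 * 1024 * 1024   # 4MB
UBERBLOCK_LENGTH = 128 * 1024        # 128KB


def region_checksum(path, offset=UBERBLOCK_OFFSET, length=UBERBLOCK_LENGTH):
    """Return the SHA-256 hex digest of `length` bytes at `offset` in `path`."""
    # buffering=0 only disables Python's own buffering. On a real block
    # device the kernel may still serve the read from its page cache, so a
    # production check would likely need direct I/O (as dd iflag=direct does).
    with open(path, "rb", buffering=0) as dev:
        dev.seek(offset)
        data = dev.read(length)
    return hashlib.sha256(data).hexdigest()


def region_is_quiet(path, interval=10.0, samples=3,
                    offset=UBERBLOCK_OFFSET, length=UBERBLOCK_LENGTH):
    """Checksum the region `samples` times, sleeping `interval` seconds between reads.

    Returns True if the checksum never changed, which suggests that no other
    node was writing to the pool during the observation window.
    """
    first = region_checksum(path, offset, length)
    for _ in range(samples - 1):
        time.sleep(interval)
        if region_checksum(path, offset, length) != first:
            return False
    return True
```

A failover script could call region_is_quiet() on one of the pool's devices and refuse to import the pool if it returns False. The observation window (interval times samples) needs to be comfortably longer than the interval at which an active node rewrites the überblocks, otherwise a live pool can look idle.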
Cheers, Andreas
On May 5, 2014, at 9:42, Andrew Holway
<andrew.holway(a)gmail.com> wrote:
Hello,
How would you manage failover for MDS / OSS for Lustre / ZFS?
Thanks,
Andrew