ZZX
Background
There are a lot of open-source automated ZFS replication solutions out there, and most of them approach the problem from a “backup” point of view. However, treating ZFS as a mere backup tool undersells many of its strengths, which can be used to build a powerful replication system.
Instead of yet another backup tool or snapshot job scheduler, I wanted a replication system capable of:
- Bi-directional configuration. That is to say, if a machine is sending ZFS streams, it should be able to start receiving streams back if so configured (effectively a poor man’s HA, a form of warm failover).
- Direct sending of ZFS streams over TCP, bypassing SSH. This is necessary when you want to allow an untrusted machine that is periodically online (such as a laptop) to push ZFS streams to a file server. Also perfect for a SAN or otherwise encrypted network path.
- Sending or “sharing” of a single ZFS snapshot to more than one destination.
- Simple to understand and easy to use: automated snapshot names should be easy for a human administrator to digest, for example.
- Bonus #1: The ability to mount a replicated read-only filesystem in an atomic manner.
- Bonus #2: No-activity detection, obviating the need to take and send a snapshot at all when nothing has changed since the last one.
Solutions
- zrepl was, for what it’s worth, the best of them all. However, it didn’t allow bi-directional configuration. You could hack this in by taking the daemon offline, automating a config rewrite to reverse roles, and then bringing it back up. This is, of course, a fairly dirty thing to do. It had no sharing of snapshots either, and it ended up littering my filesystem with hundreds of held, stray snapshots (because of a lack of auto-cleanup, I’m guessing).
- Sanoid in conjunction with Syncoid was a close second (with its “insecure-direct-connection” option), but was extremely obtuse, not only in syntax and configuration but in its prolix snapshot names as well! It didn’t tick any of my other checkboxes.
- zfs-auto-backup, znapzend, zrep, and zxfer all have a hard dependency on SSH. I do appreciate the simplicity of some of these early automation scripts, though.
A quick nod to drbd: it performed best of all as an HA solution despite having nothing to do with ZFS. If you need fast, instantaneous block-level bi-directional replication, this is the way to go.
ZZX
Introducing ZZX, a simple ZFS replication daemon that can:
- Allow any member of a trusted constellation of machines to become read-write master, and then allow push or pull to or from any of the other machines. The rest of the members of the constellation become read-only standbys, ready to take over read-write if needed. Split brain is prevented by confirming all active members are read-only before the new master enables read-write.
- Use a simple TCP protocol, with authentication (but not encryption), to send and receive ZFS streams and the associated metadata. This allows maximum performance without clogging up your syslog with hundreds of replication logins via SSH.
- Take a single snapshot and send it as an incremental stream to any number of peers. (No more @machine1, @machine2, @machine3, etc. snapshots.)
- Run without any arguments, using a config file with an easy-to-understand JSON block-like syntax. Snapshot names are simply a UNIX time_t value after a @z prefix.
- Receive a read-only filesystem and mount it in an atomic manner so that received changes are available to users immediately.
- Automatically clean up old, stray snapshots (both on the receiving and sending side).
- Detect if there have been no changes since the last snapshot, and if so, skip taking and sending a new snapshot since it would be redundant.
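To make two of the ideas in the list above concrete (the `@z` plus time_t naming scheme and skipping redundant snapshots), here is a minimal Python sketch of how they could fit together. It is not ZZX's actual code: every function name is hypothetical, and using the ZFS `written` property as the no-change signal is an assumption, since nothing above says how ZZX detects inactivity.

```python
import subprocess
import time
from typing import Optional

# From the text: snapshot names are "@z" followed by a UNIX time_t.
SNAPSHOT_PREFIX = "z"


def snapshot_name(dataset: str, now: Optional[int] = None) -> str:
    """Build 'dataset@z<time_t>', e.g. 'tank/home@z1700000000'."""
    ts = int(time.time()) if now is None else int(now)
    return f"{dataset}@{SNAPSHOT_PREFIX}{ts}"


def snapshot_time(name: str) -> Optional[int]:
    """Return the time_t encoded in a '@z<time_t>' name, or None if it does not match."""
    _, sep, snap = name.partition("@")
    suffix = snap[len(SNAPSHOT_PREFIX):]
    if not sep or not snap.startswith(SNAPSHOT_PREFIX):
        return None
    if not (suffix.isascii() and suffix.isdigit()):
        return None
    return int(suffix)


def read_written(dataset: str) -> str:
    """Ask ZFS how many bytes were written since the previous snapshot.

    Assumption: the 'written' dataset property is used as the activity signal.
    -H drops headers, -p prints exact numbers, -o value prints only the value.
    """
    result = subprocess.run(
        ["zfs", "get", "-Hp", "-o", "value", "written", dataset],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def has_changes(written_value: str) -> bool:
    """Treat any non-zero 'written' byte count as activity."""
    return int(written_value.strip()) > 0


def next_snapshot(dataset: str, written_value: str,
                  now: Optional[int] = None) -> Optional[str]:
    """Return the name of the snapshot to take, or None when nothing changed."""
    if not has_changes(written_value):
        return None
    return snapshot_name(dataset, now)
```

In a real loop, the value passed to `next_snapshot` would come from `read_written`, and a `None` result means this round of snapshot and send can be skipped. Because the name is just a timestamp, `snapshot_time` is also enough to sort snapshots or pick out old ones for cleanup.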