ZZX
Background
There are plenty of open-source automated ZFS replication solutions out there, and most of them approach the problem from a “backup” point of view. However, treating ZFS as a backup tool undersells many of its strengths, which can be used to build a powerful replication system.
Instead of yet another backup tool or snapshot job scheduler, I wanted a replication system capable of:
- Bi-directional configuration. That is to say, if a machine is sending ZFS streams, it should be able to start receiving streams back if so configured (effectively a poor man’s HA, a form of warm failover).
- Direct sending of ZFS streams over TCP, bypassing SSH. This is necessary when you want to allow an untrusted machine that is only periodically online (such as a laptop) to push ZFS streams to a file server. It is also a good fit for a SAN or an otherwise encrypted network path.
- Sending or “sharing” of a single ZFS snapshot to more than one destination.
- Simple to understand and easy to use. For example, automated snapshot names should be easy for a human administrator to digest.
- Bonus: The ability to mount a replicated read-only filesystem in an atomic manner.
Solutions
- zrepl was pretty good, except that it didn’t allow bi-directional configuration. You could hack around that by taking the daemon offline, automating a config rewrite to reverse the roles, and then bringing it back up. This is, of course, a fairly dirty thing to do. It didn’t support sharing of snapshots either, and judging by the number of errors in my syslog, it was fairly fragile in general.
- Sanoid in conjunction with Syncoid was a close second (thanks to its “insecure-direct-connection” option), but it was extremely obtuse, not only in syntax and configuration but in its prolix snapshot names as well. It didn’t tick any of my other checkboxes.
- zfs-auto-backup, znapzend, zrep, and zxfer all have a hard dependency on SSH. I do appreciate the simplicity of some of these early automation scripts, though.
ZZX
Introducing ZZX, a simple ZFS replication daemon that can:
- Allow any member of a trusted constellation of machines to become the read-write master, and then allow push or pull to or from any of the other machines. The remaining members of the constellation become read-only standbys, ready to take over read-write if needed. Split brain is prevented by confirming that all active members are read-only before the new master enables read-write.
- Use a simple TCP protocol, inclusive of authentication (but not encryption), to send and receive ZFS streams and the associated metadata. This allows maximum performance without clogging up your syslog with hundreds of replication logins.
- Take a single snapshot and send it as an incremental stream to any number of peers. (No more @machine1, @machine2, @machine3, and so on.)
- Run without any arguments, featuring an easy-to-understand JSON configuration syntax. Snapshot names are simply a UNIX time_t value after a @z prefix.
- Receive a read-only filesystem and mount it in such a way that received changes are available to users immediately.
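To make the snapshot naming concrete, here is a minimal Python sketch of how names of the form `@z` plus a UNIX timestamp could be generated, parsed, and compared. It is not taken from ZZX itself. The helper names (`snapshot_name`, `snapshot_time`, `incremental_base`) are hypothetical, and it assumes the timestamp is written as decimal seconds, which the description above suggests but does not state.

```python
import re
import time

# Assumption: a snapshot is "<dataset>@z<decimal seconds since the epoch>".
_SNAP_RE = re.compile(r"^(?P<dataset>[^@]+)@z(?P<ts>\d+)$")


def snapshot_name(dataset, now=None):
    """Build a snapshot name for `dataset` using the current (or given) time."""
    ts = int(time.time() if now is None else now)
    return f"{dataset}@z{ts}"


def snapshot_time(name):
    """Return the timestamp encoded in `name`, or None if it is not a @z snapshot."""
    m = _SNAP_RE.match(name)
    return int(m.group("ts")) if m else None


def incremental_base(local, remote):
    """Return the newest timestamp present on both sides, or None.

    None means there is no shared @z snapshot, so a full send would be needed.
    Only the timestamps are compared, so the dataset paths may differ.
    """
    local_ts = {snapshot_time(s) for s in local}
    remote_ts = {snapshot_time(s) for s in remote}
    common = (local_ts & remote_ts) - {None}
    return max(common) if common else None
```

Because one snapshot is shared by every peer, each peer only needs its own incremental base computed against the same local list. A peer that was offline for a while simply ends up with an older base than the others. A name match is only a hint that two snapshots are the same, so a real implementation would still need to confirm it with ZFS.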