YantrikDB Cluster Setup — Raft Replication, mTLS, Witness HA

YantrikDB Server ships hardened clustering (v0.8.x, substrate-batch; live on a 3-node Proxmox cluster with multiple tenants). It provides:

openraft consensus — proper Raft (leader election, log replication, snapshots). Replaces raft-lite from the v0.5.x line. Automatic failover in seconds, chaos-tested (leader kill, network partition, kill-9 mid-write).
Mutation commit log — total-ordered, content-addressed substrate behind every write (RFC 010-A). Tombstone-shaped mutations from day one for forget/audit.

v0.8.13 ships PR 6.4 handler migration: the four hot-path HTTP write endpoints (/v1/remember, /v1/remember/batch, /v1/forget, /v1/relate) now route through the durable commit log. On a leader, writes flow through openraft consensus; on every node, the state-machine apply path materializes engine state via record_with_rid / tombstone_with_rid / upsert_entity_edge_with_id. The cosmetic-openraft regression is structurally closed.

Empirical read-your-writes (RYW) validation on a real 2-node cluster is still pending. Until that test passes, multi-node deployments should keep following the interim cluster-routing runbook for safety: point clients at the leader URL first, and writes automatically fall back to another node on a 503 not-leader response. The runbook becomes obsolete once end-to-end RYW passes empirically on the homelab cluster.

Spec: docs/rfcs/rfc_010_pr6_write_path_migration.md. v0.8.14 follow-ups: PR 6.7 chunked snapshot (memory-bounded) + PR 6.8 backfill admin tool (migrating pre-commit-log engine rows).

Cluster mTLS — mutually-authenticated, encrypted cluster transport (RFC 014-A). Self-signed CA + per-node certs; rotate without downtime.
Witness daemon — safe HA with only 2 data nodes
Control plane replication — tokens and databases sync to followers within 30 seconds
Wire protocol versioning — prevents silent drift during rolling upgrades
Read-only enforcement — followers reject writes and return the current leader’s address so clients redirect
Multi-database — each database replicates independently
Per-tenant admission control — quotas (max memories, batch size, RPS) + circuit breaker, fail-degraded-conservative
Online backups — POST /v1/admin/snapshot with BLAKE3 checksums
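
The interim routing advice above (leader URL first, fall back on 503 not-leader) could look roughly like this on the client side. It is a minimal sketch, not the official client: `write_with_fallback`, `http_post`, the injectable `send` callback, and the payload shape are illustrative assumptions. Only the /v1/remember path, the Bearer token, and the 503 status come from this page.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Sequence

# Transport signature: (url, payload, token) -> (http_status, response_body)
Send = Callable[[str, dict, str], tuple[int, str]]


def http_post(url: str, payload: dict, token: str) -> tuple[int, str]:
    """POST JSON with a Bearer token; return (status, body) even on HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def write_with_fallback(
    node_urls: Sequence[str],
    payload: dict,  # request body; its fields are not specified on this page
    token: str,
    path: str = "/v1/remember",
    send: Send = http_post,
) -> tuple[str, str]:
    """Try each node in order (leader first); move on only when a node answers 503.

    Returns (base_url_that_accepted_the_write, response_body).
    """
    for base in node_urls:
        status, body = send(base.rstrip("/") + path, payload, token)
        if status == 503:
            continue  # not the leader: try the next node
        if 200 <= status < 300:
            return base, body
        raise RuntimeError(f"{base} rejected the write: HTTP {status}: {body}")
    raise RuntimeError("no node accepted the write (every node returned 503)")
```

The sketch only falls back on 503, as the runbook describes. A production client would probably also treat connection errors as a reason to try the next node.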

Topology

Recommended setup: 2 voters + 1 witness.

┌──────────────────┐  heartbeats   ┌──────────────────┐
│  data node 1     │ ◄───────────▶ │  data node 2     │
│  (voter)         │  oplog sync   │  (voter)         │
│  full storage    │               │  full storage    │
└────────┬─────────┘               └────────┬─────────┘
         │                                  │
         │     ┌──────────────────┐         │
         └────▶│  witness         │◄────────┘
               │  (vote-only)     │
               │  ~10 MB RAM      │
               └──────────────────┘

The witness is a tiny daemon (~3 MB binary, no data storage beyond a small state file) whose only job is to vote in elections. It breaks ties so that 2 data nodes can run safe HA without a third full node.

This is the same pattern as Azure SQL (witness instance), MongoDB (arbiter), Redis Sentinel, and MariaDB Galera (garbd).
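
The arithmetic behind the witness is plain majority voting. The sketch below is illustrative only (the member names and helper functions are made up for this example, not taken from the YantrikDB code base). It shows why three voting members need two votes, which matches the `quorum: 2` value reported by `\cluster` in the verify step.

```python
from itertools import combinations


def quorum_size(voting_members: int) -> int:
    """Smallest strict majority of the voting members."""
    return voting_members // 2 + 1


def has_quorum(alive: set[str], members: set[str]) -> bool:
    """True when the reachable members still form a majority."""
    return len(alive & members) >= quorum_size(len(members))


# Hypothetical member names: 2 voters + 1 vote-only witness.
MEMBERS = {"node1", "node2", "witness"}

if __name__ == "__main__":
    print("quorum:", quorum_size(len(MEMBERS)))
    # Enumerate every way to lose one or two members.
    for down_count in (1, 2):
        for down in combinations(sorted(MEMBERS), down_count):
            alive = MEMBERS - set(down)
            print(f"down={down} quorum={has_quorum(alive, MEMBERS)}")
```

Note that `quorum_size(2)` is also 2: two data nodes without a witness must both be up to make progress, which is why the third vote matters.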

Step-by-step setup

1. On node1, generate config

yantrikdb cluster init \
  --node-id 1 \
  --output /etc/yantrikdb.toml \
  --data-dir /var/lib/yantrikdb \
  --peers 192.168.1.2:7440 \
  --witnesses 192.168.1.3:7440

Output:

config written to /etc/yantrikdb.toml

cluster_secret: ydb_cluster_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
(use this as the auth token from any client to access the default database)

Save the cluster_secret. You’ll need it on every other node and as the auth token from clients.

2. On node2, generate config with the same secret

yantrikdb cluster init \
  --node-id 2 \
  --output /etc/yantrikdb.toml \
  --data-dir /var/lib/yantrikdb \
  --peers 192.168.1.1:7440 \
  --witnesses 192.168.1.3:7440 \
  --secret <PASTE_SECRET_FROM_NODE1>

3. Create the database on each voter

yantrikdb db --data-dir /var/lib/yantrikdb create default

4. On node3, start the witness

yantrikdb-witness \
  --node-id 99 \
  --port 7440 \
  --cluster-secret <PASTE_SECRET_FROM_NODE1> \
  --state-file /var/lib/yantrikdb-witness/state.json

The witness needs no database, no config file, no embedding model — just the secret and a state file.

5. Start the voters

On node1 and node2:

yantrikdb serve --config /etc/yantrikdb.toml

After ~5 seconds, an election runs and one voter becomes leader.

6. Verify

yql --host 192.168.1.1 -t <cluster_secret>
yantrikdb> \cluster

  node #1 — Leader
  term: 1
  leader: 1
  healthy: yes | writable: yes
  quorum: 2

+---------+-------------------+---------+-----------+------+----------+
| node_id | addr              | role    | reachable | term | last_seen|
+---------+-------------------+---------+-----------+------+----------+
| 2       | 192.168.1.2:7440  | voter   | ✓         | 1    | 0.5s ago |
| 99      | 192.168.1.3:7440  | witness | ✓         | 1    | 0.5s ago |
+---------+-------------------+---------+-----------+------+----------+

Test failover

Kill the leader (Ctrl+C or systemctl stop yantrikdb).

Within 5–10 seconds:

The other voter detects missed heartbeats
Runs an election
The witness grants its vote
The follower promotes itself to leader
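
The witness's part in the sequence above follows Raft's rule of at most one vote per term. The sketch below is a simplified illustration under stated assumptions, not the yantrikdb-witness implementation: the `WitnessState` fields, the JSON layout of the state file, and `handle_vote_request` are all hypothetical. A full Raft voter would also compare log positions before granting a vote.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WitnessState:
    term: int = 0                    # highest term seen so far
    voted_for: Optional[int] = None  # node_id voted for in `term`, if any


def load_state(path: str) -> WitnessState:
    """Read the state file, or start fresh when it does not exist yet."""
    try:
        with open(path, encoding="utf-8") as fh:
            return WitnessState(**json.load(fh))
    except FileNotFoundError:
        return WitnessState()


def save_state(path: str, state: WitnessState) -> None:
    """Write to a temp file and rename, so a crash never leaves a partial vote."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(asdict(state), fh)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)


def handle_vote_request(path: str, candidate_id: int, candidate_term: int) -> bool:
    """Grant at most one vote per term, persisting the decision before replying."""
    state = load_state(path)
    if candidate_term < state.term:
        return False  # stale candidate from an older term
    if candidate_term > state.term:
        state = WitnessState(term=candidate_term)  # new term: the vote is free again
    if state.voted_for in (None, candidate_id):
        state.voted_for = candidate_id
        save_state(path, state)
        return True
    return False  # already voted for a different candidate in this term
```

Persisting the vote before answering is what makes a restart safe: a witness that forgot its vote could hand out a second one in the same term and let two leaders be elected.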

curl -s http://192.168.1.2:7438/v1/cluster | jq .role
# "Leader"

When the old leader rejoins, it sees the higher term and demotes itself to follower automatically.
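
For scripted failover tests, the manual curl check can be replaced by a small poller. This is a sketch under assumptions: it relies only on GET /v1/cluster returning JSON with a `role` field equal to "Leader", as the curl example shows. The function names, the timeout values, and the unauthenticated request (mirroring that curl call) are illustrative choices.

```python
import json
import time
import urllib.request
from typing import Callable, Optional, Sequence


def fetch_role(base_url: str) -> Optional[str]:
    """Return the `role` field from GET /v1/cluster, or None if the node is unreachable."""
    try:
        url = base_url.rstrip("/") + "/v1/cluster"
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp).get("role")
    except (OSError, ValueError):  # connection/HTTP errors, or a non-JSON body
        return None


def wait_for_leader(
    node_urls: Sequence[str],
    timeout_s: float = 15.0,
    poll_s: float = 0.5,
    get_role: Callable[[str], Optional[str]] = fetch_role,
) -> str:
    """Poll every node until one reports role "Leader"; return its base URL."""
    deadline = time.monotonic() + timeout_s
    while True:
        for url in node_urls:
            if get_role(url) == "Leader":
                return url
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no leader among {list(node_urls)} after {timeout_s}s")
        time.sleep(poll_s)


if __name__ == "__main__":
    # Addresses taken from the examples in this guide.
    print(wait_for_leader(["http://192.168.1.1:7438", "http://192.168.1.2:7438"]))
```

Measuring the time between stopping the old leader and `wait_for_leader` returning gives a rough failover duration to compare against the 5–10 second expectation.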

Failure modes

Failure                              Behavior
-----------------------------------  ------------------------------------------------------------------
Leader voter dies                    Other voter + witness elect new leader in <10s
Follower voter dies                  Leader keeps writing (still has quorum with witness)
Witness dies                         Both voters keep going, no new elections allowed
Witness + follower die               Leader becomes read-only (no quorum)
Network partition isolates a voter   Isolated voter loses quorum, becomes read-only
All nodes die                        Restart any node; it loads persistent state and rejoins the cluster

Manual failover

To force a specific node to become leader (e.g. for maintenance):

yantrikdb cluster promote --url http://192.168.1.2:7438 -t <cluster_secret>

This triggers an election from that node.

Cluster authentication

When clustering is enabled, the cluster_secret doubles as a master Bearer token that works on any node in the cluster:

TOKEN=ydb_cluster_xxxxxxxx...

# This works whether node1 or node2 is leader
curl http://192.168.1.1:7438/v1/stats -H "Authorization: Bearer $TOKEN"
curl http://192.168.1.2:7438/v1/stats -H "Authorization: Bearer $TOKEN"

Per-node tokens (created with yantrikdb token create) still work for fine-grained access.

Configuration reference

Full [cluster] section:

[cluster]
node_id = 1                    # unique integer per node
role = "voter"                 # voter | read_replica | witness | single
cluster_port = 7440            # peer-to-peer port
heartbeat_interval_ms = 1000   # leader → follower heartbeat rate
election_timeout_ms = 5000     # follower → candidate transition delay
cluster_secret = "ydb_cluster_..."
replication_mode = "async"     # async (default) or sync

[[cluster.peers]]
addr = "192.168.1.2:7440"
role = "voter"

[[cluster.peers]]
addr = "192.168.1.3:7440"
role = "witness"

How replication works

Under the hood, every write is recorded as an oplog entry with a hybrid logical clock (HLC) timestamp. Followers continuously pull new ops from the leader and apply them locally using the same CRDT semantics the engine already uses:

Add-wins set for memories (UUIDv7 keys, no collisions)
LWW for graph edges (HLC tiebreaker)
Set-union for consolidation
Forget always wins (tombstones are absolute)

This means the cluster converges naturally even after network partitions — there’s no manual conflict resolution needed.
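
The merge rules listed above can be modeled in a few lines. This is a toy, state-based illustration rather than the engine's code: the `HLC` tuple layout, the `Replica` class, and its method names are assumptions, and consolidation (set-union) is not modeled. It shows an HLC that never runs backwards, last-writer-wins edges, and tombstones that override any copy of a memory.

```python
from dataclasses import dataclass, field
from typing import NamedTuple


class HLC(NamedTuple):
    """Hybrid logical clock stamp; tuples compare field by field, left to right."""
    wall_ms: int   # physical time component
    counter: int   # logical counter for events within the same millisecond
    node_id: int   # final tiebreaker between nodes


def hlc_tick(last: HLC, now_ms: int) -> HLC:
    """Stamp a local event: never move backwards, even if the wall clock does."""
    if now_ms > last.wall_ms:
        return HLC(now_ms, 0, last.node_id)
    return HLC(last.wall_ms, last.counter + 1, last.node_id)


@dataclass
class Replica:
    memories: dict[str, str] = field(default_factory=dict)  # rid -> text
    tombstones: set[str] = field(default_factory=set)       # forgotten rids
    edges: dict[tuple[str, str], tuple[HLC, str]] = field(default_factory=dict)

    def remember(self, rid: str, text: str) -> None:
        if rid not in self.tombstones:  # forget always wins
            self.memories[rid] = text

    def forget(self, rid: str) -> None:
        self.tombstones.add(rid)
        self.memories.pop(rid, None)

    def relate(self, src: str, dst: str, label: str, stamp: HLC) -> None:
        current = self.edges.get((src, dst))
        if current is None or stamp > current[0]:  # last writer wins
            self.edges[(src, dst)] = (stamp, label)

    def merge(self, other: "Replica") -> None:
        """Fold another replica's state in; order and repetition do not matter."""
        for rid in other.tombstones:
            self.forget(rid)
        for rid, text in other.memories.items():
            self.remember(rid, text)
        for (src, dst), (stamp, label) in other.edges.items():
            self.relate(src, dst, label, stamp)
```

Because every rule is commutative and idempotent, two replicas that exchange state in either order end up identical, which is the convergence property the paragraph above describes.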

For a deeper dive, see the Raft-lite design (written for the v0.5.x line, before openraft replaced raft-lite).