Multi-node setup

A CrateDB cluster consists of two or more CrateDB instances—called nodes—running on different hosts. Together, these nodes form a single, distributed database.

CrateDB nodes communicate with each other using a custom binary transport protocol, which uses byte-serialized Plain Old Java Objects (POJOs). This communication occurs on a dedicated transport port, which must be open and reachable from all participating nodes.
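For illustration, the relevant ports are usually pinned in crate.yml so that firewall rules stay predictable. The following is a minimal sketch, assuming a recent CrateDB version and its default port numbering (4300 for transport, 4200 for HTTP); adjust the bind address to one that every other node can reach:

    # crate.yml (sketch, assuming CrateDB's default port numbering)
    network.host: 0.0.0.0       # bind address; must be reachable by every other node
    transport.tcp.port: 4300    # node-to-node transport port; open it between all nodes
    http.port: 4200             # client-facing HTTP port; not used for inter-node traffic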

Cluster State Management

Each node in the cluster maintains a copy of the cluster state, but only the master node can modify it at runtime.

The cluster state contains:

  • Global settings for the cluster

  • A list of discovered nodes and their status

  • Schemas of all tables

  • Shard routing information, including the status and locations of primary and replica shards

When the master node updates the cluster state, it publishes the changes to all nodes and waits for acknowledgments before proceeding with the next update.


Master Node Election

In a CrateDB cluster, only one node can act as the master at any given time. The master is responsible for updating the cluster state and coordinating operations across the cluster.

The cluster only becomes available to serve requests when a master has been successfully elected and is able to maintain contact with a quorum of nodes from the voting configuration.

🗳️ Master Election Process

  • By default, all nodes are eligible to become master.

  • You can configure a node to opt out of master eligibility via the node.master: false setting, as shown in the sketch after this list.

  • Master election is performed among master-eligible nodes.
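For example, a node that should hold data and serve queries but never take part in master elections could be configured as follows (a minimal crate.yml sketch; setting names assume a recent CrateDB version):

    # crate.yml on a node that opts out of master eligibility (sketch)
    node.master: false    # never eligible to become master
    node.data: true       # still holds shards and serves queries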

CrateDB uses a quorum-based voting system to elect the master:

Quorum = floor(voting_configuration_size / 2) + 1

To become or stay master, a node must be able to communicate with a quorum of nodes from the voting configuration.
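For example, with a voting configuration of five nodes, quorum = floor(5 / 2) + 1 = 3, so a master must stay in contact with at least two other voting nodes in addition to itself.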

🧩 Voting Configuration

The voting configuration is the list of nodes that participate in master elections. It is:

  • Automatically managed by CrateDB

  • Persisted as part of the cluster state

  • Dynamically updated when nodes join or leave the cluster

  • Never shrinks below 3 nodes, to preserve fault tolerance

Rules for updating the voting configuration:

  • If the number of master-eligible nodes is odd, all are included.

  • If the number is even, one node is excluded to avoid split-brain.

  • The minimum voting configuration size is 3. Even if a node goes offline, CrateDB keeps the previous 3-node config and requires a quorum of 2 from those nodes.

🔍 Quorum and Cluster Availability

A CrateDB cluster remains available only if the current master can reach a quorum of the nodes from the most recent voting configuration.

If quorum is lost:

  • The master node steps down

  • The cluster becomes unavailable

  • No SQL queries or write operations can be performed until quorum is restored

🛠️ Infrastructure Maintenance Example

Let’s say you have a 5-node cluster where all nodes are master-eligible:

Nodes: 1, 2, 3, 4, 5
Current master: Node 1

1. You shut down Node 5

  • Voting configuration adjusts: Nodes 1–4

  • Quorum = 3

  • Node 1 can still reach 2, 3, 4 → ✅ Quorum met

2. You shut down Node 4

  • Voting configuration adjusts: Nodes 1–3

  • Quorum = 2

  • Node 1 can still reach 2 and 3 → ✅ Quorum met

3. You shut down Node 3

  • Only Nodes 1 and 2 remain online

  • Voting configuration is still Nodes 1, 2, 3

  • Node 1 can only reach Node 2 → ❌ Quorum not met

  • Node 1 steps down → cluster becomes unavailable

🔄 Recovery from Quorum Loss

To bring the cluster back online, you must:

✅ Restart Node 3 → Nodes 1, 2, 3 are online → quorum = 2 → cluster recovers

❌ Simply bringing back nodes not in the current voting config (e.g., Nodes 4 and 5) is not enough. They don’t count toward quorum unless the cluster first regains quorum and reconfigures itself.


Discovery

CrateDB uses a discovery module to manage:

  • Finding new nodes

  • Adding nodes to the cluster

  • Removing nodes that leave

Figure 1: Node discovery process: Node n3 joins an existing cluster with nodes n1 (master) and n2. State update is parallelized.

Discovery Steps

  1. Bootstrap: Each node needs a list of potential peer addresses when starting. This can be configured statically or dynamically (e.g., via DNS SRV records, EC2 API); see the example configuration after this list.

  2. Ping: The node pings all addresses. Any node that replies provides:

    • Its cluster information

    • The current master node

    • Its own node name

  3. Join Request:

    • The joining node sends a request to the master node

    • The master verifies and adds the node to the cluster state

  4. Cluster State Propagation:

    • The updated state is published to all nodes

    • All nodes now share knowledge of the full cluster
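As a concrete illustration of the bootstrap step, the list of peer addresses is normally provided in crate.yml. The following sketch uses placeholder host names and assumes the discovery.seed_hosts and cluster.initial_master_nodes settings of recent CrateDB versions; DNS SRV or EC2 discovery would replace the static list with a seed provider:

    # crate.yml (sketch with placeholder host names)
    cluster.name: my-cluster
    discovery.seed_hosts:            # peers to ping at startup
      - node-1.example.internal:4300
      - node-2.example.internal:4300
      - node-3.example.internal:4300
    cluster.initial_master_nodes:    # only needed when bootstrapping a brand-new cluster
      - node-1
      - node-2
      - node-3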

To reset the cluster state, use the crate-node CLI tool. Be careful—this may cause data loss.


Networking

CrateDB uses a full-mesh topology, where every node connects to every other node.

Figure 2: One-way full mesh topology in a 5-node cluster. Each line represents a one-way connection.

Characteristics

  • Every node maintains one-way connections to every other node

  • This ensures:

    • Shortest possible paths

    • High reliability

  • But it also introduces a scaling limitation: The number of connections grows quadratically with the number of nodes:

c = n × (n − 1)

Where:

  • c = number of connections

  • n = number of nodes
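For example, a 5-node cluster maintains 5 × 4 = 20 one-way connections, while a 20-node cluster already maintains 20 × 19 = 380.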
