Multi-node setup

A CrateDB cluster consists of two or more CrateDB instances—called nodes—running on different hosts. Together, these nodes form a single, distributed database.

CrateDB nodes communicate with each other using a custom binary transport protocol, which uses byte-serialized Plain Old Java Objects (POJOs). This communication occurs on a dedicated transport port, which must be open and reachable from all participating nodes.
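For illustration, the relevant ports are usually pinned in crate.yml so that firewall rules stay predictable. The following is a minimal sketch, assuming a recent CrateDB version and its default port numbering (4300 for transport, 4200 for HTTP); adjust the bind address to one that every other node can reach:

    # crate.yml (sketch, assuming CrateDB's default port numbering)
    network.host: 0.0.0.0       # bind address; must be reachable by every other node
    transport.tcp.port: 4300    # node-to-node transport port; open it between all nodes
    http.port: 4200             # client-facing HTTP port; not used for inter-node traffic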

Cluster State Management

Each node in the cluster maintains a copy of the cluster state, but only the master node can modify it at runtime.

The cluster state contains:

  • Global settings for the cluster

  • A list of discovered nodes and their status

  • Schemas of all tables

  • Shard routing information, including the status and locations of primary and replica shards

When the master node updates the cluster state, it publishes the changes to all nodes and waits for acknowledgments before proceeding with the next update.


Master Node Election

In a CrateDB cluster, only one node can act as the master at any given time. The master is responsible for updating the cluster state and coordinating operations across the cluster.

The cluster only becomes available to serve requests when a master has been successfully elected and is able to maintain contact with a quorum of nodes from the voting configuration.

🗳️ Master Election Process

  • By default, all nodes are eligible to become master.

  • You can configure a node to opt out of master eligibility via the node.master: false setting, as shown in the sketch after this list.

  • Master election is performed among master-eligible nodes.
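For example, a node that should hold data and serve queries but never take part in master elections could be configured as follows (a minimal crate.yml sketch; setting names assume a recent CrateDB version):

    # crate.yml on a node that opts out of master eligibility (sketch)
    node.master: false    # never eligible to become master
    node.data: true       # still holds shards and serves queries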

CrateDB uses a quorum-based voting system to elect the master:

Quorum = floor(voting_configuration_size / 2) + 1

To become or stay master, a node must be able to communicate with a quorum of nodes from the voting configuration.
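For example, with a voting configuration of five nodes, quorum = floor(5 / 2) + 1 = 3, so a master must stay in contact with at least two other voting nodes in addition to itself.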

🧩 Voting Configuration

The voting configuration is the list of nodes that participate in master elections. It is:

  • Automatically managed by CrateDB

  • Persisted as part of the cluster state

  • Dynamically updated when nodes join or leave the cluster

  • Never shrinks below 3 nodes, to preserve fault tolerance

Rules for updating the voting configuration:

  • If the number of master-eligible nodes is odd, all are included.

  • If the number is even, one node is excluded to avoid split-brain.

  • The minimum voting configuration size is 3. Even if a node goes offline, CrateDB keeps the previous 3-node config and requires a quorum of 2 from those nodes.

🔍 Quorum and Cluster Availability

A CrateDB cluster remains available only if the current master can reach a quorum of the nodes from the most recent voting configuration.

If quorum is lost:

  • The master node steps down

  • The cluster becomes unavailable

  • No SQL queries or write operations can be performed until quorum is restored

🛠️ Infrastructure Maintenance Example

Let’s say you have a 5-node cluster where all nodes are master-eligible:

Nodes: 1, 2, 3, 4, 5
Current master: Node 1

1. You shut down Node 5

  • Voting configuration adjusts: Nodes 1–4

  • Quorum = 3

  • Node 1 can still reach 2, 3, 4 → ✅ Quorum met

2. You shut down Node 4

  • Voting configuration adjusts: Nodes 1–3

  • Quorum = 2

  • Node 1 can still reach 2 and 3 → ✅ Quorum met

3. You shut down Node 3

  • Only Nodes 1 and 2 remain online

  • Voting configuration is still Nodes 1, 2, 3

  • Node 1 can only reach Node 2 → ❌ Quorum not met

  • Node 1 steps down → cluster becomes unavailable

🔄 Recovery from Quorum Loss

To bring the cluster back online, you must:

✅ Restart Node 3 → Nodes 1, 2, 3 are online → quorum = 2 → cluster recovers

❌ Simply bringing back nodes not in the current voting config (e.g., Nodes 4 and 5) is not enough. They don’t count toward quorum unless the cluster first regains quorum and reconfigures itself.


Discovery

CrateDB uses a discovery module to manage:

  • Finding new nodes

  • Adding nodes to the cluster

  • Removing nodes that leave

Figure 1: Node discovery process: Node n3 joins an existing cluster with nodes n1 (master) and n2. State update is parallelized.

Discovery Steps

  1. Bootstrap: Each node needs a list of potential peer addresses when starting. This can be configured statically or dynamically (e.g., via DNS SRV records, EC2 API); see the example configuration after this list.

  2. Ping: The node pings all addresses. Any node that replies provides:

    • Its cluster information

    • The current master node

    • Its own node name

  3. Join Request:

    • The joining node sends a request to the master node

    • The master verifies and adds the node to the cluster state

  4. Cluster State Propagation:

    • The updated state is published to all nodes

    • All nodes now share knowledge of the full cluster
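As a concrete illustration of the bootstrap step, the list of peer addresses is normally provided in crate.yml. The following sketch uses placeholder host names and assumes the discovery.seed_hosts and cluster.initial_master_nodes settings of recent CrateDB versions; DNS SRV or EC2 discovery would replace the static list with a seed provider:

    # crate.yml (sketch with placeholder host names)
    cluster.name: my-cluster
    discovery.seed_hosts:            # peers to ping at startup
      - node-1.example.internal:4300
      - node-2.example.internal:4300
      - node-3.example.internal:4300
    cluster.initial_master_nodes:    # only needed when bootstrapping a brand-new cluster
      - node-1
      - node-2
      - node-3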

To reset the cluster state, use the crate-node CLI tool. Be careful—this may cause data loss.


Networking

CrateDB uses a full-mesh topology, where every node connects to every other node.

Figure 2: One-way full mesh topology in a 5-node cluster. Each line represents a one-way connection.

Characteristics

  • Every node maintains one-way connections to every other node

  • This ensures:

    • Shortest possible paths

    • High reliability

  • But it also introduces a scaling limitation: The number of connections grows quadratically with the number of nodes:

c = n × (n − 1)

Where:

  • c = number of connections

  • n = number of nodes
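For example, a 5-node cluster maintains 5 × 4 = 20 one-way connections, while a 20-node cluster already maintains 20 × 19 = 380.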
