Replication

Replication in CrateDB provides data redundancy, high availability, and potentially faster reads by keeping multiple copies of each shard on different nodes.

When replication is enabled:

  • Each shard has one primary and zero or more replicas.

  • Writes always go to the primary shard.

  • Reads can go to either the primary or any replica.

  • Replicas are kept in sync with primaries via shard recovery.

If a primary shard is lost (e.g., node failure), CrateDB promotes a replica to primary, minimizing the risk of data loss.

More replicas mean higher redundancy and availability, but also more disk usage and more intra-cluster network traffic.

Replication can also improve read performance, because reads can be served by replica shards as well as primaries, enabling parallel query execution across nodes.


Configuring Replication

You configure replication per table using the number_of_replicas setting.

Example:

CREATE TABLE my_table (
    first_column INTEGER,
    second_column TEXT
) WITH (number_of_replicas = 0);

Fixed Replica Count

Set number_of_replicas to an integer for a fixed number of replicas per primary.
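
For example, the following (illustrative) table keeps exactly two replica copies of every primary shard:

CREATE TABLE metrics (
    ts TIMESTAMP,
    value DOUBLE
) WITH (number_of_replicas = 2);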

Replica Ranges

You can specify a range using "min-max". This allows CrateDB to adjust the number of replicas dynamically as cluster size changes.

  • 0-1: Default. One node → 0 replicas. Multiple nodes → 1 replica.

  • 2-4: Minimum of 2 replicas for green health. Up to 4 replicas if enough nodes are available. Fewer than 5 nodes → yellow health.

  • 0-all: One replica per additional node (excluding the primary’s node).
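
For instance, the 0-all range from the list above can be applied to a (hypothetical) table like this:

CREATE TABLE logs (
    ts TIMESTAMP,
    message TEXT
) WITH (number_of_replicas = '0-all');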

If not set, CrateDB defaults to 0-1.

You can change number_of_replicas any time with ALTER TABLE.
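
For example, to change my_table from zero replicas to a range after the fact:

ALTER TABLE my_table SET (number_of_replicas = '1-2');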

See also: CREATE TABLE: WITH clause


Shard Recovery

If a node becomes unavailable:

  1. CrateDB promotes replicas of the lost primary shards to primaries.

  2. It rebuilds missing replicas from the new primaries until the configured number_of_replicas is met.

This ensures continuous availability and partition tolerance, with some consistency trade-offs. See: CAP theorem
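
To monitor ongoing recoveries, sys.shards exposes a recovery object per shard; a query along these lines (field names assumed from the sys.shards schema) shows the recovery stage and progress:

SELECT table_name, id, recovery['stage'] AS stage, recovery['size']['percent'] AS percent_recovered
FROM sys.shards
WHERE table_name = 'my_table';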

You can control shard placement with Allocation Settings.
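
As a sketch, a node can be excluded from shard allocation (for example before maintenance) with a cluster-level routing setting; the setting name below comes from the allocation settings and the node name is illustrative:

SET GLOBAL TRANSIENT "cluster.routing.allocation.exclude._name" = 'node-1';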


Underreplication

Underreplication occurs when CrateDB cannot fully allocate the configured number of replicas because there aren’t enough suitable nodes.

Rules:

  • No node ever stores two copies of the same shard.

  • For 1 primary + n replicas, you need n + 1 nodes to fully replicate.

  • If not met → table enters underreplication → yellow health.
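
To check whether a table is currently underreplicated, sys.health reports the number of underreplicated shards per table (a minimal sketch):

SELECT table_name, health, underreplicated_shards
FROM sys.health
WHERE health != 'GREEN';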


Live Scaling Example

Goal: Start with minimal replication and scale out while letting CrateDB automatically increase redundancy.

1. Start Small

CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer TEXT,
    amount DOUBLE
) WITH (
    number_of_replicas = '0-2'
);

  • Cluster size: 1 node

  • Effective replicas: 0 (writes go only to the primary shard)

  • Reason: With one node, replicas can’t be placed elsewhere.
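
To confirm the configured range (as opposed to the currently allocated replicas), you can query information_schema.tables; the number_of_replicas column holds the configured value:

SELECT table_name, number_of_replicas
FROM information_schema.tables
WHERE table_name = 'orders';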

2. Add a Second Node

CrateDB automatically detects the new node and increases replicas within the configured range:

  • Cluster size: 2 nodes

  • Effective replicas: 1

  • Result: One primary + one replica per shard → green health.

Check:

SELECT table_name, COUNT(*) FILTER (WHERE "primary" = false) AS replicas
FROM sys.shards
WHERE table_name = 'orders'
GROUP BY table_name;

3. Add a Third Node

  • Cluster size: 3 nodes

  • Effective replicas: 2

  • Result: One primary + two replicas spread across three nodes.

The system automatically balances shard placement to avoid putting multiple copies of the same shard on a single node.
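
To see how the copies are spread across nodes, sys.shards records the node each shard is allocated on (a sketch using the node object column):

SELECT node['name'] AS node_name, COUNT(*) AS num_shards
FROM sys.shards
WHERE table_name = 'orders'
GROUP BY node['name'];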

4. Node Failure Scenario

If one node goes offline:

  • CrateDB promotes replicas to primaries as needed.

  • Missing replicas are rebuilt from the primaries once the failed node returns or another node is available.

Check replication health:

SELECT health, severity, missing_shards
FROM sys.health
WHERE table_name = 'orders';

5. Back to Full Health

Once the cluster regains enough nodes, CrateDB restores full replication automatically, within the 0-2 limit.

Outcome: By setting a replica range instead of a fixed count, replication adapts to cluster size without manual reconfiguration, improving both resilience and resource usage.
