Rolling Upgrade

CrateDB supports rolling upgrades, which let you upgrade a cluster with zero downtime by upgrading one node at a time.

A rolling upgrade is possible only between compatible versions—typically between consecutive feature releases. Some examples:

  • ✅ You can do a rolling upgrade from X.5.z to X.6.0

  • ✅ You can do a rolling upgrade from the last X version release to the first (X+1) version release

  • ❌ You cannot upgrade directly from X.5.x to X.8.x unless explicitly stated in the release notes


How It Works

Rolling upgrades involve stopping and upgrading one node at a time using CrateDB’s graceful stop mechanism. This ensures ongoing operations complete before the node shuts down.

Graceful Stop Behavior

  • The node stops accepting new requests

  • It completes all in-progress operations

  • It then reallocates shards based on your availability configuration


Data Availability Options

CrateDB offers three levels of minimum data availability during a graceful stop, configurable via the cluster.graceful_stop.min_availability setting:

  • full: All primary and replica shards are moved off the node before it stops. The cluster stays green.

  • primaries: Only primary shards are moved off the node; replicas stay. The cluster may go yellow.

  • none: No guarantees; the node stops even if data becomes temporarily unavailable. The cluster may go red.
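
For example, to request full availability for the next graceful stop, the setting can be changed at runtime with the same SET GLOBAL TRANSIENT syntax used later in this procedure:

SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = 'full';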


Requirements

For full Minimum Availability

Your cluster must have enough nodes and disk space to hold the full replica count even after one node shuts down.

  • Rule of thumb: number_of_nodes > max_number_of_replicas + 1

Examples:

  • If a table has 1 replica, you need at least 3 nodes

  • If a table allows a range of replicas (e.g., 0-1), CrateDB uses the maximum number for allocation logic

If the requirements are not met, the graceful stop will fail.
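
As a quick sanity check, you can compare the current node count against the replica settings of your tables (a sketch; note that number_of_replicas may be returned as text containing a range such as '0-1'):

-- Current number of nodes in the cluster
SELECT count(*) AS nodes FROM sys.nodes;

-- Replica settings per table
SELECT table_schema, table_name, number_of_replicas
FROM information_schema.tables
WHERE table_schema IN ('blob', 'doc')
ORDER BY table_schema, table_name;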

For primaries Minimum Availability

  • Ensure that enough shards remain to maintain write consistency

  • By default, CrateDB requires a quorum of active shards: quorum = floor(replicas / 2) + 1
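
For example, a table configured with 2 replicas has 3 copies of each shard; writes need floor(2 / 2) + 1 = 2 active copies, so one copy can be offline without blocking writes.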


Rolling Upgrade Procedure

Step 1: Disable Allocations (Optional)

To prevent CrateDB from reallocating shards while nodes are offline, temporarily restrict routing:

SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';

Skip this step if you are using min_availability = full, as CrateDB will handle shard movement internally.
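
To confirm the change took effect, you can inspect the runtime settings exposed by sys.cluster (a quick check, assuming the setting is reported under the settings column):

SELECT settings['cluster']['routing']['allocation']['enable'] AS allocation_enable
FROM sys.cluster;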


Step 2: Gracefully Stop the Node

Use the DECOMMISSION SQL command to initiate a graceful shutdown:

  • Moves shards off the node according to the min_availability setting

  • Ensures ongoing operations complete before the node shuts down
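
For example, using ALTER CLUSTER DECOMMISSION (assuming the node to be stopped is named 'your_node_name', the same placeholder used in the monitoring queries below):

ALTER CLUSTER DECOMMISSION 'your_node_name';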

Avoid stopping nodes using TERM signals (e.g., Ctrl+C or systemctl stop) unless you want a non-graceful shutdown.

Monitor Reallocation

You can track shard reallocation progress with:

-- Remaining shards on the node
SELECT count(*) AS remaining_shards
FROM sys.shards
WHERE _node['name'] = 'your_node_name';

-- Detailed view
SELECT schema_name AS schema, table_name AS "table", id, "primary", state
FROM sys.shards
WHERE _node['name'] = 'your_node_name' AND schema_name IN ('blob', 'doc')
ORDER BY schema, "table", id, "primary", state;

-- Tables with 0 replicas (primaries that will be moved)
SELECT table_schema AS schema, table_name AS "table"
FROM information_schema.tables
WHERE number_of_replicas = 0 AND table_schema IN ('blob', 'doc')
ORDER BY schema, "table";

Note: When using the Admin UI, you may briefly see a red cluster state during shutdown. This is usually a UI timing artifact, not an actual failure.


Step 3: Upgrade CrateDB

Once the node is stopped, perform the upgrade using your preferred method.

Examples:

Tarball:

Download and extract the new release tarball, then point your installation path (or symlink) at the new version, keeping your existing configuration and data directories.

RHEL/YUM:

yum update -y crate

Refer to your OS or package manager documentation for specific upgrade instructions.


Step 4: Restart the Node

After upgrading, restart CrateDB:

Tarball:

/path/to/bin/crate

RHEL/YUM:

service crate start
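
After the node has restarted, you can verify that it rejoined the cluster and reports the new version:

SELECT name, version['number'] AS version
FROM sys.nodes
ORDER BY name;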

Step 5: Repeat for All Nodes

Repeat steps 2–4 for each remaining node in your cluster.


Step 6: Re-enable Allocations

Once all nodes are upgraded and running:

SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
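
You can then confirm that all tables are fully replicated again by checking sys.health (every row should report GREEN):

SELECT health, count(*) AS num
FROM sys.health
GROUP BY health;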

Final Notes

  • Always test the upgrade in a staging environment first

  • Monitor logs and metrics during and after each upgrade step

  • Consider enabling alerts for cluster health changes during upgrades

  • If using snapshots, verify their validity before beginning the upgrade (see the example below)
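
One way to do the snapshot check is to list recent snapshots and their state via sys.snapshots (a sketch; a healthy snapshot reports the state SUCCESS):

SELECT repository, name, state, finished
FROM sys.snapshots
ORDER BY finished DESC
LIMIT 10;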
