Rolling Upgrade
CrateDB supports rolling upgrades to help you upgrade your cluster with zero downtime, by upgrading one node at a time.
A rolling upgrade is possible only between compatible versions—typically between consecutive feature releases. Some examples:
✅ You can do a rolling upgrade from X.5.z to X.6.0
✅ You can do a rolling upgrade from the last release of version X to the first release of version (X+1)
❌ You cannot upgrade directly from X.5.x to X.8.x unless explicitly stated in the release notes
Warning: Rolling upgrades are only supported for stable releases. If you are upgrading to a testing version, you must perform a full cluster restart. Always consult the release notes of your target version for specific upgrade guidance.
How It Works
Rolling upgrades involve stopping and upgrading one node at a time using CrateDB’s graceful stop mechanism. This ensures ongoing operations complete before the node shuts down.
Graceful Stop Behavior
The node stops accepting new requests
It completes all in-progress operations
It then reallocates shards based on your availability configuration
Note: Due to CrateDB’s distributed nature, some client requests may fail temporarily during a rolling upgrade.
Data Availability Options
CrateDB offers three levels of minimum data availability during a graceful stop, configurable via the cluster.graceful_stop.min_availability setting:
full: All primary and replica shards are moved off the node. The cluster stays green.
primaries: Only primary shards are moved; replicas stay. The cluster may go yellow.
none: No guarantees; the node stops even if data becomes temporarily unavailable. The cluster may go red.
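For example, to change the level before a planned stop, you can adjust the setting at runtime. The sketch below assumes you want primaries-level availability; replace the value to match your needs:
-- Minimum availability used during a graceful stop ('full', 'primaries', or 'none')
SET GLOBAL TRANSIENT "cluster.graceful_stop.min_availability" = 'primaries';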
Requirements
For full Minimum Availability
Your cluster must have enough nodes and disk space to hold the full replica count even after one node shuts down.
Rule of thumb:
number_of_nodes > max_number_of_replicas + 1
Examples:
If a table has 1 replica, you need at least 3 nodes
If a table allows a range of replicas (e.g., 0-1), CrateDB uses the maximum number for allocation logic
If the requirements are not met, the graceful stop will fail.
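As a rough pre-check before stopping a node, you can compare the current node count against each table's replica setting. This is only a sketch and assumes your tables live in the default blob and doc schemas:
-- How many nodes the cluster currently has
SELECT count(*) AS number_of_nodes FROM sys.nodes;
-- Replica settings per table; for ranges such as '0-1', the maximum value counts
SELECT table_schema AS schema, table_name AS "table", number_of_replicas
FROM information_schema.tables
WHERE table_schema IN ('blob', 'doc')
ORDER BY schema, "table";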
For primaries Minimum Availability
Ensure that enough shards remain to maintain write consistency.
By default, CrateDB requires a quorum of active shards:
quorum = floor(replicas / 2) + 1
Note: If a table has 1 replica, a single active shard (primary or replica) is enough for writes to succeed.
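For example, applying the formula above to a table with 2 replicas gives quorum = floor(2 / 2) + 1 = 2, so at least two active shard copies must be available for writes; with 1 replica the quorum is floor(1 / 2) + 1 = 1, which is why a single active copy suffices.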
Rolling Upgrade Procedure
Warning: Before starting, back up your data using snapshots.
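A minimal sketch of such a backup, assuming a snapshot repository named backups has already been created with CREATE REPOSITORY (adjust the repository and snapshot names to your setup):
-- Snapshot all tables and wait until the snapshot has finished
CREATE SNAPSHOT backups.pre_upgrade ALL WITH (wait_for_completion = true);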
Step 1: Disable Allocations (Optional)
To prevent CrateDB from reallocating shards while nodes are offline, temporarily restrict routing:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';
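You can verify that the setting took effect by inspecting the cluster settings; this sketch assumes the value is exposed under the nested settings column of sys.cluster:
-- Current allocation setting; expected to reflect the value set above
SELECT settings['cluster']['routing']['allocation']['enable'] AS allocation_enable
FROM sys.cluster;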
Step 2: Gracefully Stop the Node
Use the DECOMMISSION SQL command (shown below) to initiate a graceful shutdown. The command:
Moves shards off the node according to the min_availability setting
Ensures ongoing operations complete before the node shuts down
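A minimal sketch of the command, assuming the node to be stopped is named node-1; check the ALTER CLUSTER documentation for your CrateDB release for the exact form:
-- 'node-1' is a placeholder for the node's name or ID
ALTER CLUSTER DECOMMISSION 'node-1';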
Monitor Reallocation
You can track shard reallocation progress with:
-- Remaining shards on the node
SELECT count(*) AS remaining_shards
FROM sys.shards
WHERE _node['name'] = 'your_node_name';
-- Detailed view
SELECT schema_name AS schema, table_name AS "table", id, "primary", state
FROM sys.shards
WHERE _node['name'] = 'your_node_name' AND schema_name IN ('blob', 'doc')
ORDER BY schema, "table", id, "primary", state;
-- Tables with 0 replicas (primaries that will be moved)
SELECT table_schema AS schema, table_name AS "table"
FROM information_schema.tables
WHERE number_of_replicas = 0 AND table_schema IN ('blob', 'doc')
ORDER BY schema, "table";
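If your CrateDB version provides the sys.health table, a compact way to watch overall table health while shards move is a sketch like this:
-- Counts tables per health state (GREEN, YELLOW, RED)
SELECT health, count(*) AS tables
FROM sys.health
GROUP BY health;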
Step 3: Upgrade CrateDB
Once the node is stopped, perform the upgrade using your preferred method.
Examples:
Tarball: download and extract the new release, then replace your existing installation (keeping your configuration and data directories)
RHEL/YUM: yum update -y crate
Refer to your OS or package manager documentation for specific upgrade instructions.
Step 4: Restart the Node
After upgrading, restart CrateDB:
Tarball: /path/to/bin/crate
RHEL/YUM: service crate start
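Once the node has rejoined the cluster, you can confirm it is running the new version; this sketch reads the version from sys.nodes:
-- Each node's name and the CrateDB version it is running
SELECT name, version['number'] AS version
FROM sys.nodes
ORDER BY name;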
Step 5: Repeat for All Nodes
Repeat steps 2–4 for each remaining node in your cluster.
Step 6: Re-enable Allocations
Once all nodes are upgraded and running:
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';
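After allocations are re-enabled, it can be useful to confirm that the cluster has finished rebalancing; as a sketch, the following query should eventually return 0:
-- Shards that are not yet in the STARTED state
SELECT count(*) AS shards_not_started
FROM sys.shards
WHERE state != 'STARTED';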
Final Notes
Always test the upgrade in a staging environment first
Monitor logs and metrics during and after each upgrade step
Consider enabling alerts for cluster health changes during upgrades
If using snapshots, verify their validity before beginning the upgrade
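For the snapshot check, one starting point is the sys.snapshots table; this sketch assumes a recent CrateDB version that exposes these columns:
-- Most recent snapshots first; state should be SUCCESS for a usable snapshot
SELECT repository, name, state, finished
FROM sys.snapshots
ORDER BY finished DESC;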