Snapshots
CrateDB provides a cluster-wide backup and restore mechanism based on snapshots. Snapshots allow you to back up data safely with minimal disruption and restore it quickly in case of data loss, corruption, or infrastructure failure.
Overview
Snapshots in CrateDB work similarly to those in Elasticsearch, as both systems share underlying snapshot infrastructure. A snapshot represents a consistent backup of tables and partitions taken while the cluster is running—without requiring downtime.
You can use snapshots to:
Perform regular backups of your cluster
Recover data after accidental deletion or hardware failures
Transfer data between environments or clusters
Optimize storage costs by archiving older partitions to cold or frozen storage tiers
How It Works
CrateDB stores snapshots in an external location called a snapshot repository. You must register a repository before you can create or restore any snapshots.
Supported repository types:
Local filesystem
Amazon S3
Google Cloud Storage (GCS)
Microsoft Azure Blob Storage
Once a repository is registered, you can create snapshots of your entire cluster or specific tables.
Syntax Examples
1. Create a Repository
CREATE REPOSITORY backup
TYPE fs
WITH (
location = '/mount/backups/',
compress = false
);
This creates a repository named backup using the filesystem (fs) type.
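Registering a repository on object storage follows the same pattern. The following is a minimal sketch for Amazon S3, not a definitive configuration: the repository name s3_backup, the bucket name, the base path, and the credential placeholders are all assumptions for illustration.

```sql
-- Hypothetical names and credentials; replace them with your own values.
CREATE REPOSITORY s3_backup
TYPE s3
WITH (
  bucket = 'my-cratedb-backups',   -- assumed bucket name
  base_path = 'snapshots/prod',    -- assumed prefix inside the bucket
  access_key = '<ACCESS_KEY>',     -- placeholder credential
  secret_key = '<SECRET_KEY>'      -- placeholder credential
);
```

Every node in the cluster needs network access to the bucket, just as every node needs access to the shared mount for an fs repository.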
2. Create a Snapshot
CREATE SNAPSHOT backup.snapshot1 ALL
WITH (
wait_for_completion = true,
ignore_unavailable = true
);
This creates a snapshot of all tables in the cluster and waits for completion before returning.
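A snapshot does not have to cover the whole cluster. As a sketch of the table-level and partition-level forms, the statements below back up the quotes table and one partition of a partitioned table. The snapshot names snapshot2 and snapshot3, the table doc.events, and its partition column day are hypothetical.

```sql
-- Snapshot a single table.
CREATE SNAPSHOT backup.snapshot2 TABLE quotes
WITH (
  wait_for_completion = true
);

-- Snapshot one partition of a partitioned table.
-- doc.events and the partition column day are assumed for illustration.
CREATE SNAPSHOT backup.snapshot3 TABLE doc.events PARTITION (day = '2024-01-01')
WITH (
  wait_for_completion = true
);
```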
3. List Available Snapshots
SELECT repository, name, state
FROM sys.snapshots
ORDER BY repository, name;
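To check how recent backups went, the same table can be filtered and sorted by start time. A small sketch, assuming the backup repository from the earlier example:

```sql
-- Show the five most recent snapshots in the backup repository.
SELECT name, started, finished, state
FROM sys.snapshots
WHERE repository = 'backup'
ORDER BY started DESC
LIMIT 5;
```

A state other than SUCCESS is a signal to investigate the snapshot before relying on it for recovery.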
4. Restore a Snapshot
RESTORE SNAPSHOT backup.snapshot1
TABLE quotes
WITH (
wait_for_completion = true
);
This restores the quotes table from the snapshot snapshot1.
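Restores can also target everything in a snapshot or only a single partition. The sketch below shows both forms. The table doc.events and its partition column day are hypothetical. Note that a restore typically fails if the target table or partition already exists in the cluster, so drop or rename it first.

```sql
-- Restore everything contained in the snapshot.
RESTORE SNAPSHOT backup.snapshot1 ALL
WITH (
  wait_for_completion = true
);

-- Restore a single partition of a partitioned table.
-- doc.events and the partition column day are assumed for illustration.
RESTORE SNAPSHOT backup.snapshot1 TABLE doc.events PARTITION (day = '2024-01-01')
WITH (
  wait_for_completion = true
);
```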
System Tables
CrateDB exposes snapshot metadata via system tables:
sys.repositories – Lists all registered repositories
sys.snapshots – Lists all created snapshots
sys.snapshot_restore – Shows ongoing or past restore operations
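As a quick check that a repository was registered with the intended settings, you can query the repository table directly. A minimal sketch:

```sql
-- List registered repositories together with their type and settings.
SELECT name, type, settings
FROM sys.repositories
ORDER BY name;
```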
Usage Guidelines
Snapshots are incremental—only new or changed data is copied after the first snapshot, making repeated backups efficient.
Snapshot operations are non-blocking; your cluster remains available for read/write workloads.
Use snapshots as part of your disaster recovery and data migration strategies.
For large clusters or frequent snapshot operations, consider using remote object storage (e.g., S3, GCS) to decouple backups from local disks.
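Because snapshots are incremental, a retention routine usually removes old snapshots individually rather than deleting files from the repository by hand. A sketch, assuming snapshot1 is no longer needed:

```sql
-- Remove a snapshot that is no longer needed.
-- Data still referenced by other snapshots in the repository is kept.
DROP SNAPSHOT backup.snapshot1;
```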