Snapshots

CrateDB provides a cluster-wide backup and restore mechanism based on snapshots. Snapshots allow you to back up data safely with minimal disruption and restore it quickly in case of data loss, corruption, or infrastructure failure.


Overview

Snapshots in CrateDB work much like those in Elasticsearch, because both systems share the same underlying snapshot infrastructure. A snapshot is a consistent backup of tables and partitions taken while the cluster is running, so no downtime is required.

You can use snapshots to:

Perform regular backups of your cluster
Recover data after accidental deletion or hardware failures
Transfer data between environments or clusters
Reduce storage costs by archiving older partitions to cold storage and removing them from the live cluster

How It Works

CrateDB stores snapshots in an external location called a snapshot repository. You must register a repository before you can create or restore any snapshots.

Supported repository types:

Local filesystem
Amazon S3
Google Cloud Storage (GCS)
Microsoft Azure Blob Storage

Once a repository is registered, you can create snapshots of your entire cluster or specific tables.

Syntax Examples

1. Create a Repository

CREATE REPOSITORY backup
TYPE fs
WITH (
  location = '/mount/backups/',
  compress = false
);

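This creates a repository named backup of type fs (filesystem) with metadata compression disabled. For fs repositories, the location must be on a shared filesystem reachable from every master and data node, and it must be listed in the path.repo setting of each node.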
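Registering an object-store repository follows the same pattern; only the TYPE and its settings change. The sketch below assumes a hypothetical S3 bucket my-crate-backups and a hypothetical Azure container crate-snapshots, with placeholder credentials. Check the parameter list for each type against the CREATE REPOSITORY reference for your CrateDB version.

```sql
-- Hypothetical S3 repository: bucket, path, and credentials are placeholders.
CREATE REPOSITORY s3_backup
TYPE s3
WITH (
  bucket = 'my-crate-backups',
  base_path = 'cratedb/snapshots',
  access_key = '<your-access-key>',
  secret_key = '<your-secret-key>'
);

-- Hypothetical Azure Blob Storage repository: account, key, and container are placeholders.
CREATE REPOSITORY azure_backup
TYPE azure
WITH (
  account = '<storage-account-name>',
  key = '<storage-account-key>',
  container = 'crate-snapshots',
  base_path = 'cratedb/snapshots'
);
```

Snapshots in these repositories are then addressed the same way as in the fs case, for example s3_backup.snapshot1.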

2. Create a Snapshot

CREATE SNAPSHOT backup.snapshot1 ALL
WITH (
  wait_for_completion = true,
  ignore_unavailable = true
);

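This creates a snapshot named snapshot1 of all tables in the cluster and stores it in the backup repository. wait_for_completion = true makes the statement return only after the snapshot has finished, and ignore_unavailable = true skips tables that do not exist instead of failing the whole statement.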
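To back up only part of the cluster, list tables instead of ALL, optionally narrowing a partitioned table to a single partition. The same clause is the building block for archiving old partitions: snapshot the partition, confirm the snapshot succeeded, then delete the partition from the cluster. The sketch assumes a hypothetical table metrics partitioned by a column day; adapt the names and values to your schema.

```sql
-- Back up the quotes table plus one partition of the hypothetical
-- partitioned table metrics (partition column: day).
CREATE SNAPSHOT backup.archive_2024_01_01
TABLE quotes, metrics PARTITION (day = '2024-01-01')
WITH (
  wait_for_completion = true
);

-- Once the snapshot is confirmed, remove the archived partition from the cluster.
-- A WHERE clause that only filters on the partition column deletes the whole partition.
DELETE FROM metrics WHERE day = '2024-01-01';
```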

3. List Available Snapshots

SELECT repository, name, state
FROM sys.snapshots
ORDER BY repository, name;
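sys.snapshots also records when each snapshot started and finished and which tables it contains, which helps when choosing a snapshot to restore. A sketch, assuming the backup repository created earlier:

```sql
-- Ten most recently finished snapshots in the backup repository,
-- with their state, timing, and the tables they contain.
SELECT name, state, started, finished, tables
FROM sys.snapshots
WHERE repository = 'backup'
ORDER BY finished DESC
LIMIT 10;
```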

4. Restore a Snapshot

RESTORE SNAPSHOT backup.snapshot1
TABLE quotes
WITH (
  wait_for_completion = true
);

This restores the quotes table from snapshot1 in the backup repository. The table must not already exist in the cluster; otherwise the restore fails.
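RESTORE SNAPSHOT accepts the same PARTITION clause as CREATE SNAPSHOT, so a single partition can be brought back without restoring the whole table. The sketch assumes snapshot1 contains a hypothetical partitioned table metrics (partition column day) and that this partition is not currently present in the cluster, for example because it was archived and then deleted.

```sql
-- Restore one partition of the hypothetical partitioned table metrics.
RESTORE SNAPSHOT backup.snapshot1
TABLE metrics PARTITION (day = '2024-01-01')
WITH (
  wait_for_completion = true
);
```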

System Tables

CrateDB exposes snapshot metadata via system tables:

sys.repositories – Lists all registered repositories
sys.snapshots – Lists all created snapshots
sys.snapshot_restore – Shows the progress of snapshot restore operations
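These tables can be queried like any other table. The columns below (name, type, settings for repositories; name, repository, state for restores) follow the CrateDB system-table reference as we understand it, and the restore table is written as sys.snapshot_restore; verify both against the version you run.

```sql
-- Registered repositories with their type and configuration.
SELECT name, type, settings
FROM sys.repositories
ORDER BY name;

-- State of snapshot restore operations.
SELECT name, repository, state
FROM sys.snapshot_restore;
```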

Usage Guidelines

Snapshots are incremental: after the first snapshot, only data not already stored in the repository is copied, which makes repeated backups efficient.
Snapshot operations are non-blocking; your cluster remains available for read/write workloads.
Use snapshots as part of your disaster recovery and data migration strategies.
For large clusters or frequent snapshot operations, consider using remote object storage (e.g., S3, GCS) to decouple backups from local disks.
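Because snapshots are incremental, taking them on a schedule and pruning old ones is cheap: deleting a snapshot only removes files that no other snapshot in the repository still references. A minimal sketch of one cycle, assuming a hypothetical date-stamped naming scheme and an external scheduler (such as cron) that runs the statements:

```sql
-- Take today's snapshot (the nightly_YYYY_MM_DD name is a hypothetical convention).
CREATE SNAPSHOT backup.nightly_2024_06_02 ALL
WITH (
  wait_for_completion = true
);

-- Drop a snapshot that has aged out of the retention window.
DROP SNAPSHOT backup.nightly_2024_05_02;
```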

