Snapshots

CrateDB provides a cluster-wide backup and restore mechanism based on snapshots. Snapshots allow you to back up data safely with minimal disruption and restore it quickly in case of data loss, corruption, or infrastructure failure.


Overview

Snapshots in CrateDB work much like those in Elasticsearch, because both systems share the same underlying snapshot infrastructure. A snapshot is a consistent backup of tables and partitions taken while the cluster is running, so no downtime is required.

You can use snapshots to:

  • Perform regular backups of your cluster

  • Recover data after accidental deletion or hardware failures

  • Transfer data between environments or clusters

  • Optimize storage costs by archiving older partitions to cold or frozen storage tiers


How It Works

CrateDB stores snapshots in an external location called a snapshot repository. You must register a repository before you can create or restore any snapshots.

Supported repository types:

  • Local filesystem

  • Amazon S3

  • Google Cloud Storage (GCS)

  • Microsoft Azure Blob Storage

Once a repository is registered, you can create snapshots of your entire cluster or specific tables.


Syntax Examples

1. Create a Repository

CREATE REPOSITORY backup
TYPE fs
WITH (
  location = '/mount/backups/',
  compress = false
);

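This creates a repository named backup using the filesystem (fs) type. The location must be a shared path that all nodes in the cluster can access, and it must be allowed by the path.repo setting in the node configuration.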
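
For a remote repository, the statement has the same shape with a different type and parameters. The following is a minimal sketch for Amazon S3. The repository name backup_s3, the bucket name, the base path, and the credential placeholders are assumptions for illustration and must be replaced with your own values.

```sql
-- Hypothetical S3 repository; every value below is a placeholder.
CREATE REPOSITORY backup_s3
TYPE s3
WITH (
  bucket = 'my-cratedb-backups',  -- assumed bucket name
  base_path = 'cluster-a',        -- assumed prefix inside the bucket
  access_key = '<ACCESS_KEY>',
  secret_key = '<SECRET_KEY>'
);
```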

2. Create a Snapshot

CREATE SNAPSHOT backup.snapshot1 ALL
WITH (
  wait_for_completion = true,
  ignore_unavailable = true
);

This creates a snapshot of all tables in the cluster and waits for completion before returning.
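
To back up only part of the cluster, name the tables (and optionally individual partitions) instead of using ALL. A sketch, assuming the quotes table used elsewhere on this page and a hypothetical table doc.events that is partitioned by a column named day:

```sql
-- doc.events and its partition column "day" are illustrative assumptions.
CREATE SNAPSHOT backup.snapshot2
TABLE quotes, doc.events PARTITION (day = '2024-01-01')
WITH (
  wait_for_completion = true
);
```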

3. List Available Snapshots

SELECT repository, name, state
FROM sys.snapshots
ORDER BY repository, name;
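
The state column is useful for monitoring. As a sketch, the following query lists snapshots that did not finish successfully. It assumes the state values reported by sys.snapshots (such as SUCCESS, PARTIAL, FAILED, and IN_PROGRESS) and its started and finished timestamp columns.

```sql
-- Snapshots that are still running, partial, or failed, newest first
SELECT repository, name, state, started, finished
FROM sys.snapshots
WHERE state <> 'SUCCESS'
ORDER BY started DESC;
```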

4. Restore a Snapshot

RESTORE SNAPSHOT backup.snapshot1
TABLE quotes
WITH (
  wait_for_completion = true
);

This restores the quotes table from the snapshot snapshot1.
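
A restore can also cover everything contained in a snapshot. The following is a minimal sketch that reuses the snapshot name from the example above. Note that restoring a table fails if a table with the same name already exists in the cluster, so existing tables may need to be dropped or renamed first.

```sql
-- Restore everything contained in the snapshot
RESTORE SNAPSHOT backup.snapshot1 ALL
WITH (
  wait_for_completion = true
);
```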


System Tables

CrateDB exposes snapshot metadata via system tables:

  • sys.repositories – Lists all registered repositories

  • sys.snapshots – Lists all created snapshots

  • sys.snapshot_restore – Shows the state of snapshot restore operations
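
These tables can be queried like any other table. As a short sketch, the first query below shows each registered repository with its type and settings, and the second counts the snapshots stored in each repository:

```sql
-- Registered repositories with their type and settings
SELECT name, type, settings
FROM sys.repositories
ORDER BY name;

-- Number of snapshots per repository
SELECT repository, count(*) AS snapshot_count
FROM sys.snapshots
GROUP BY repository
ORDER BY repository;
```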


Usage Guidelines

  • Snapshots are incremental: after the first snapshot, only new or changed data is copied, which makes repeated backups efficient.

  • Snapshot operations are non-blocking; your cluster remains available for read/write workloads.

  • Use snapshots as part of your disaster recovery and data migration strategies.

  • For large clusters or frequent snapshot operations, consider using remote object storage (e.g., S3, GCS) to decouple backups from local disks.
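
Because snapshots are incremental, an old snapshot can be removed without invalidating newer ones: dropping a snapshot deletes only the files that no other snapshot references. A sketch of a manual cleanup, reusing the repository and snapshot names from the examples on this page:

```sql
-- Remove a snapshot that is no longer needed
DROP SNAPSHOT backup.snapshot1;

-- Deregister the repository; this removes the reference in the cluster,
-- not the snapshot files stored at the repository location
DROP REPOSITORY backup;
```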
