BLOB store

CrateDB includes built-in support for storing binary large objects (BLOBs) through an HTTP-accessible object storage subsystem, similar in concept to AWS S3, but tightly integrated with CrateDB’s distributed architecture.


Overview

CrateDB's BLOB storage subsystem allows you to store, retrieve, and manage large binary files such as images, videos, documents, and other unstructured content. BLOBs are stored in dedicated BLOB tables, which can be sharded and replicated across the CrateDB cluster just like regular database tables.

You can interact with BLOBs via:

  • HTTP endpoints (for uploading, downloading, deleting)

  • CrateDB SQL interface (for management of metadata and table structure)

  • CrateDB client drivers, including the Python driver


Why Use CrateDB BLOB Storage?

  • Distributed by default: Files are automatically sharded and replicated across your CrateDB cluster for resilience and scalability.

  • HTTP access: Upload and download files using simple HTTP PUT/GET/DELETE operations.

  • Efficient deduplication: Each file is addressed by the SHA-1 hash of its contents, so identical content is stored only once.

  • Simple integration: Use alongside structured SQL data in the same platform.


Example: Creating a BLOB Table

CREATE BLOB TABLE myblobs
CLUSTERED INTO 8 SHARDS
WITH (number_of_replicas = 3);

  • BLOB TABLE: Declares a table for storing binary objects.

  • CLUSTERED INTO 8 SHARDS: Distributes files across 8 shards.

  • number_of_replicas: Sets how many replica copies of each shard are kept in addition to the primary, for high availability.
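
To make the effect of these two settings concrete, here is a minimal Python sketch. The helper names `create_blob_table_sql` and `total_shard_copies` are hypothetical and exist only for this illustration. It assumes that `number_of_replicas` counts copies in addition to each primary shard.

```python
def create_blob_table_sql(name, shards, replicas):
    """Render a CREATE BLOB TABLE statement in the form shown above.

    Illustration only: the table name is interpolated as-is, so it must
    come from a trusted source.
    """
    return (
        f"CREATE BLOB TABLE {name} "
        f"CLUSTERED INTO {shards} SHARDS "
        f"WITH (number_of_replicas = {replicas})"
    )


def total_shard_copies(shards, replicas):
    """Each shard has one primary plus `replicas` replica copies."""
    return shards * (1 + replicas)


print(create_blob_table_sql("myblobs", 8, 3))
print(total_shard_copies(8, 3))  # 8 primaries + 24 replicas
```

Under that assumption, the example table is spread over 32 shard copies in total.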


How It Works

Uploading a File

To store a file:

curl -X PUT \
  -T image.jpg \
  http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592

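The last path segment must be the SHA-1 hash of the file contents, written as 40 hexadecimal characters. The 32-character value used in these examples is only a placeholder; replace it with the real digest of your file, for example the output of sha1sum image.jpg.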
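
The digest can also be computed in application code. The following is a minimal sketch using only the Python standard library. The helpers `sha1_hexdigest` and `blob_url` are hypothetical names introduced here, and the in-memory stream stands in for a real file opened in binary mode.

```python
import hashlib
import io


def sha1_hexdigest(stream, chunk_size=64 * 1024):
    """Return the SHA-1 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def blob_url(base_url, table, digest):
    """Build a blob URL using the /_blobs/<table>/<digest> layout."""
    return f"{base_url.rstrip('/')}/_blobs/{table}/{digest}"


# In practice the stream would come from open("image.jpg", "rb").
digest = sha1_hexdigest(io.BytesIO(b"hello"))
url = blob_url("http://localhost:4200", "myblobs", digest)
print(len(digest), url)
```

Reading in chunks keeps memory use flat for large files, which matters more for BLOBs than for typical request payloads.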

Downloading a File

curl http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592 --output image.jpg

Deleting a File

curl -X DELETE http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592

A file is removed from the cluster only once the delete has been applied to all of its replicas.
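
The three curl calls map directly onto HTTP requests that any client can issue. As a rough sketch, the snippet below prepares equivalent requests with Python's standard `urllib.request` module without sending them. `BASE_URL` and `blob_request` are hypothetical names, and the node address is assumed to match the curl examples.

```python
import hashlib
import urllib.request

BASE_URL = "http://localhost:4200"  # assumed local node, as in the curl examples


def blob_request(method, table, digest, data=None):
    """Prepare (but do not send) an HTTP request for one blob."""
    url = f"{BASE_URL}/_blobs/{table}/{digest}"
    return urllib.request.Request(url, data=data, method=method)


content = b"example bytes"
digest = hashlib.sha1(content).hexdigest()

upload = blob_request("PUT", "myblobs", digest, data=content)
download = blob_request("GET", "myblobs", digest)
delete = blob_request("DELETE", "myblobs", digest)

# Sending requires a running node, for example:
#     with urllib.request.urlopen(upload) as response:
#         print(response.status)
for request in (upload, download, delete):
    print(request.get_method(), request.full_url)
```

Because the digest is derived from the content, the same URL serves for upload, download, and deletion of a given file.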


Integration with Applications

CrateDB’s BLOB store is ideal for applications needing:

  • Media storage (e.g. images, audio, video)

  • Document archives (PDFs, spreadsheets)

  • IoT or sensor data logs (raw binary)

  • Any binary file needing scalable, fault-tolerant storage


Python Example: Uploading Files via the Python Client

The example below uses CrateDB’s Python driver to create the BLOB table and the requests library to upload a file over HTTP:

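import hashlib

import requests
from crate import client
from crate.client.exceptions import ProgrammingError

connection = client.connect("http://localhost:4200", username="crate")
cursor = connection.cursor()

# The documented CREATE BLOB TABLE synopsis has no IF NOT EXISTS clause,
# so tolerate the error raised when the table is already present.
try:
    cursor.execute("CREATE BLOB TABLE files WITH (number_of_replicas = 1)")
except ProgrammingError:
    pass

# Read the file once; its SHA-1 digest becomes the last URL segment.
file_path = "example.png"
with open(file_path, "rb") as f:
    content = f.read()

digest = hashlib.sha1(content).hexdigest()
url = f"http://localhost:4200/_blobs/files/{digest}"

# Upload via HTTP using requests.
response = requests.put(url, data=content)

# 201 means the blob was created; 409 means it was already stored.
print("Uploaded:", response.status_code == 201)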

Key Concepts

Concept         Description
BLOB Table      A special table type for storing binary files
SHA-1 Hash      Used as the unique file identifier and storage key
HTTP API        REST-like interface for interacting with BLOBs
Deduplication   CrateDB stores each file only once, even if uploaded multiple times
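
The link between the SHA-1 key and deduplication can be illustrated with a toy in-memory store. This is not CrateDB code and says nothing about its internals. `ToyBlobStore` is a hypothetical class that only shows why content-derived keys make repeated uploads collapse into one entry.

```python
import hashlib


class ToyBlobStore:
    """In-memory stand-in for a content-addressed store (not CrateDB code)."""

    def __init__(self):
        self._blobs = {}

    def put(self, content):
        """Store content under its SHA-1 digest; return (digest, created)."""
        digest = hashlib.sha1(content).hexdigest()
        created = digest not in self._blobs
        if created:
            self._blobs[digest] = content
        return digest, created

    def get(self, digest):
        """Return the bytes stored under the given digest."""
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


store = ToyBlobStore()
first_digest, first_created = store.put(b"same bytes")
second_digest, second_created = store.put(b"same bytes")
print(first_digest == second_digest, first_created, second_created, len(store))
```

The second `put` finds the key already present and stores nothing new, which is the same idea behind uploading an identical file twice.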
