BLOB store
CrateDB includes built-in support for storing binary large objects (BLOBs) through an HTTP-accessible object storage subsystem, similar in concept to AWS S3, but tightly integrated with CrateDB’s distributed architecture.
Overview
CrateDB's BLOB storage subsystem allows you to store, retrieve, and manage large binary files such as images, videos, documents, and other unstructured content. BLOBs are stored in dedicated BLOB tables, which can be sharded and replicated across the CrateDB cluster just like regular database tables.
You can interact with BLOBs via:
HTTP endpoints (for uploading, downloading, deleting)
CrateDB SQL interface (for management of metadata and table structure)
CrateDB client drivers, including the Python driver
Why Use CrateDB BLOB Storage?
Distributed by default: Files are automatically sharded and replicated across your CrateDB cluster for resilience and scalability.
HTTP access: Upload and download files using simple HTTP PUT/GET/DELETE operations.
Efficient deduplication: Files are stored based on their SHA-1 hash, avoiding duplicate storage.
Simple integration: Use alongside structured SQL data in the same platform.
Example: Creating a BLOB Table
CREATE BLOB TABLE myblobs
CLUSTERED INTO 8 SHARDS
WITH (number_of_replicas = 3);
BLOB TABLE: Declares a table for storing binary objects.
CLUSTERED INTO 8 SHARDS: Distributes files across 8 shards.
number_of_replicas: Sets the number of replica copies kept of each shard, for high availability.
How It Works
Uploading a File
To store a file:
curl -X PUT \
-T image.jpg \
http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592
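Here 5d41402abc4b2a76b9719d911017c592 stands in for the SHA-1 hash of the file contents. A real SHA-1 digest is 40 hexadecimal characters long, so the 32-character value used in these examples is only a placeholder; substitute the digest computed from your own file.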
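Because the URL path ends in the SHA-1 hash of the file contents, the digest has to be computed before the upload. The following is a minimal sketch of that step using only the Python standard library; the helper names sha1_hexdigest and blob_url are hypothetical and not part of any CrateDB API, and the host and table name are taken from the curl example above.

```python
import hashlib
import io


def sha1_hexdigest(stream, chunk_size=64 * 1024):
    """Return the SHA-1 hex digest of a binary stream, reading it in chunks."""
    hasher = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def blob_url(base_url, table, digest):
    """Build a /_blobs/<table>/<digest> URL (hypothetical helper)."""
    return f"{base_url.rstrip('/')}/_blobs/{table}/{digest}"


# In practice the stream would be open("image.jpg", "rb"); an in-memory
# stream keeps this sketch self-contained.
digest = sha1_hexdigest(io.BytesIO(b"hello"))
print(len(digest))  # 40
print(blob_url("http://localhost:4200", "myblobs", digest))
```

Reading in chunks keeps memory use flat for large files, which matters for the media and archive use cases this page targets.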
Downloading a File
curl http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592 --output image.jpg
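Since the BLOB key is the SHA-1 hash of the content, a client can check a download by re-hashing what it received. The sketch below shows one way to do that; matches_digest and download_blob are hypothetical helper names, and download_blob assumes a CrateDB node reachable at the URL you pass in.

```python
import hashlib
from urllib.request import urlopen


def matches_digest(content, expected_digest):
    """Return True if the SHA-1 of content equals the expected hex digest."""
    return hashlib.sha1(content).hexdigest() == expected_digest.lower()


def download_blob(url, expected_digest):
    """Fetch a BLOB over HTTP and verify it against its digest.

    Not called in this sketch, because it needs a running server.
    """
    with urlopen(url) as response:
        content = response.read()
    if not matches_digest(content, expected_digest):
        raise ValueError("downloaded content does not match the expected SHA-1 digest")
    return content


print(matches_digest(b"hello", "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d"))  # True
```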
Deleting a File
curl -X DELETE http://localhost:4200/_blobs/myblobs/5d41402abc4b2a76b9719d911017c592
A file is gone from the cluster only once every replica of it has been deleted.
Integration with Applications
CrateDB’s BLOB store is ideal for applications needing:
Media storage (e.g. images, audio, video)
Document archives (PDFs, spreadsheets)
IoT or sensor data logs (raw binary)
Any binary file needing scalable, fault-tolerant storage
Python Example: Uploading Files via the Python Client
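CrateDB’s Python driver supports working with BLOBs. The example below uses the driver to create the BLOB table and the requests library to upload the file over HTTP: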
import hashlib

import requests
from crate import client

# Create the BLOB table through the SQL interface
connection = client.connect("http://localhost:4200", username="crate")
cursor = connection.cursor()
cursor.execute("CREATE BLOB TABLE IF NOT EXISTS files WITH (number_of_replicas=1)")

# Read the file and compute its SHA-1 digest, which serves as the BLOB key
file_path = "example.png"
with open(file_path, "rb") as f:
    content = f.read()
digest = hashlib.sha1(content).hexdigest()

# Upload via HTTP using requests
url = f"http://localhost:4200/_blobs/files/{digest}"
response = requests.put(url, data=content)
print("Uploaded:", response.status_code == 201)
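The upload can also go through the driver instead of a hand-built HTTP request. The sketch below assumes the crate Python driver's blob container interface (connection.get_blob_container, with put returning the computed digest when none is passed and get yielding the content in chunks); check this against the driver version you use. The function upload_and_read_back is a hypothetical helper and accepts any object with that interface.

```python
import io


def upload_and_read_back(container, data):
    """Store bytes in a blob container and read them back again.

    Assumes container.put(file_like) returns the SHA-1 hex digest and
    container.get(digest) yields the stored content in chunks.
    """
    digest = container.put(io.BytesIO(data))
    content = b"".join(container.get(digest))
    return digest, content


# Usage against a running cluster (assumed setup, not executed here):
#
#   from crate import client
#   connection = client.connect("http://localhost:4200", username="crate")
#   container = connection.get_blob_container("files")
#   with open("example.png", "rb") as f:
#       digest, content = upload_and_read_back(container, f.read())
```

Passing the container in as a parameter keeps the helper independent of the connection setup.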
Key Concepts
BLOB Table: A special table type for storing binary files
SHA-1 Hash: Used as the unique file identifier and storage key
HTTP API: REST-like interface for interacting with BLOBs
Deduplication: CrateDB stores each file only once, even if uploaded multiple times
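To see why using the SHA-1 hash as the storage key leads to deduplication, consider the toy in-memory store below. It is an illustration of the content-addressing idea only, not CrateDB's implementation; the class name ContentAddressedStore is hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory store keyed by SHA-1 digest (illustration only)."""

    def __init__(self):
        self._blobs = {}

    def put(self, content):
        """Store content; return (digest, created)."""
        digest = hashlib.sha1(content).hexdigest()
        created = digest not in self._blobs
        if created:
            self._blobs[digest] = content
        return digest, created

    def get(self, digest):
        return self._blobs[digest]

    def __len__(self):
        return len(self._blobs)


store = ContentAddressedStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
print(first[1], second[1], len(store))  # True False 1
```

Identical content always maps to the same key, so a second upload of the same bytes finds the key already present and stores nothing new.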