Primary key strategies

CrateDB is built for horizontal scalability and high ingestion throughput. To achieve this, operations must complete independently on each node—without central coordination. This design choice means CrateDB does not support traditional auto-incrementing primary key types like SERIAL in PostgreSQL or MySQL by default.

This page explains why that is and walks you through five common alternatives to generate unique primary key values in CrateDB, including a recipe to implement your own auto-incrementing sequence mechanism when needed.

Why Auto-Increment Doesn't Exist in CrateDB

In traditional RDBMS systems, auto-increment fields rely on a central counter. In a distributed system like CrateDB, this would create a global coordination bottleneck, limiting insert throughput and reducing scalability.

Instead, CrateDB provides flexibility: you can choose a primary key strategy tailored to your use case, whether for strict uniqueness, time ordering, or external system integration.

Primary Key Strategies in CrateDB

1. Use a Timestamp as a Primary Key

BIGINT DEFAULT now() PRIMARY KEY

Pros

Auto-generated, always-increasing value
Useful when records are timestamped anyway

Cons

Can result in gaps
Collisions possible if multiple records are created in the same millisecond

2. Use UUIDs (v4)

TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY

Pros

Universally unique
No conflicts when merging from multiple environments or sources

Cons

Not ordered
Harder to read/debug
No efficient range queries

Use UUIDv7 for Time-Ordered IDs

UUIDv7 is a new format that preserves temporal ordering, making them better suited for distributed inserts and range queries.

You can use UUIDv7 in CrateDB via a User-Defined Function (UDF), based on your preferred language.

Pros

Globally unique and almost sequential
Range queries possible

Cons

Not human-friendly
Slight overhead due to UDF use

4. Use External System IDs

If you're ingesting data from a source system that already generates unique IDs, you can reuse those:

No need for CrateDB to generate anything
Ensures consistency across systems

See Replicating data from other databases to CrateDB with Debezium and Kafka for an example.

5. Implement a Custom Sequence Table

If you must have an auto-incrementing numeric ID (e.g., for compatibility or legacy reasons), you can implement a simple sequence generator using a dedicated table and client-side logic.

Step 1: Create a sequence tracking table

CREATE TABLE sequences (
    name TEXT PRIMARY KEY,
    last_value BIGINT
) CLUSTERED INTO 1 SHARDS;

Step 2: Initialize your sequence

INSERT INTO sequences (name, last_value)
VALUES ('mysequence', 0);

Step 3: Create a target table

CREATE TABLE mytable (
    id BIGINT PRIMARY KEY,
    field1 TEXT
);

Step 4: Generate and use sequence values in Python

Use optimistic concurrency control to generate unique, incrementing values even in parallel ingestion scenarios:

# Requires: records, sqlalchemy-cratedb
import time
import records

db = records.Database("crate://")
sequence_name = "mysequence"

max_retries = 5
base_delay = 0.1  # 100 milliseconds

for attempt in range(max_retries):
    select_query = """
        SELECT last_value, _seq_no, _primary_term
        FROM sequences
        WHERE name = :sequence_name;
    """
    row = db.query(select_query, sequence_name=sequence_name).first()
    new_value = row.last_value + 1

    update_query = """
        UPDATE sequences
        SET last_value = :new_value
        WHERE name = :sequence_name
          AND _seq_no = :seq_no
          AND _primary_term = :primary_term
        RETURNING last_value;
    """
    result = db.query(
        update_query,
        new_value=new_value,
        sequence_name=sequence_name,
        seq_no=row._seq_no,
        primary_term=row._primary_term
    ).all()

    if result:
        break

    delay = base_delay * (2**attempt)
    print(f"Attempt {attempt + 1} failed. Retrying in {delay:.1f} seconds...")
    time.sleep(delay)
else:
    raise Exception("Failed to acquire sequence after multiple retries.")

insert_query = "INSERT INTO mytable (id, field1) VALUES (:id, :field1)"
db.query(insert_query, id=new_value, field1="abc")
db.close()

Pros

Fully customizable (you can add prefixes, adjust increment size, etc.)
Sequential IDs possible

Cons

More complex client logic required
The sequence table may become a bottleneck at very high ingestion rates

Summary

Strategy

Ordered

Unique

Scalable

Human-Friendly

Range Queries

Notes

Timestamp

✅

⚠️

✅

Potential collisions

UUID (v4)

❌

✅

❌

Default UUIDs

UUIDv7

✅

❌

✅

Requires UDF

External System IDs

✅/❌

✅

Depends on source

Sequence Table

✅

⚠️

✅

Manual retry logic

PreviousConstraints NextColumn policy

Last updated 4 months ago

Good evening

Why Auto-Increment Doesn't Exist in CrateDB

Primary Key Strategies in CrateDB

1. Use a Timestamp as a Primary Key

2. Use UUIDs (v4)

Use UUIDv7 for Time-Ordered IDs

4. Use External System IDs

5. Implement a Custom Sequence Table

Summary