Relational data (Ivan - Live)

Status: Content copied to PR #270arrow-up-right as of Sep-2. See previewarrow-up-right. The content has been updated since!


CrateDB is a distributed SQL database that offers rich relational data modeling with the flexibility of dynamic schemas and the scalability of NoSQL systems. It supports primary keys, joins, aggregations, and subqueries, just like traditional RDBMS systems—while also enabling hybrid use cases with time-series, geospatial, full-text, vector search, and semi-structured data.

Use CrateDB when you need to scale relational workloads horizontally while keeping the simplicity of SQL.


Table Definitions

CrateDB supports strongly typed relational schemas using familiar SQL syntax:

CREATE TABLE customers (
  id         TEXT DEFAULT gen_random_text_uuid() PRIMARY KEY,
  name       TEXT,
  email      TEXT,
  created_at TIMESTAMP DEFAULT now()
);

Key Features:

  • Supports scalar types (TEXT, INTEGER, DOUBLE, BOOLEAN, TIMESTAMP, etc.)

  • gen_random_text_uuid(), now() or current_timestamp() recommended for primary keys in distributed environments

  • Default replication, sharding, and partitioning options are built-in for scale

circle-info

CrateDB supports column_policy = 'dynamic' if you want to mix relational and semi-structured models (like JSON) in the same table.


Joins & Relationships

CrateDB supports inner joins, left/right joins, cross joins, outer joins, and even self joins.

Example: Join Customers and Orders

Joins are executed efficiently across shards in a distributed query planner that parallelizes execution.


Normalization vs. Embedding

CrateDB supports both normalized (relational) and denormalized (embedded JSON) approaches.

  • For strict referential integrity and modularity: use normalized tables with joins.

  • For performance in high-ingest or read-optimized workloads: embed reference data as nested JSON.

Example: Embedded products inside an orders table:

circle-info

CrateDB lets you query nested fields directly using bracket notation: items['name'], items['price'], etc.


Aggregations & Grouping

Use familiar SQL aggregation functions (SUM, AVG, COUNT, MIN, MAX) with GROUP BY, HAVING, WINDOW FUNCTIONS ... etc.

circle-info

CrateDB's columnar storage optimizes performance for aggregations—even on large datasets.


Constraints & Indexing

CrateDB supports:

  • Primary Keys – enforced for uniqueness and data distribution

  • Check - enforces custom value validation

  • Indexes – automatic index for all columns

  • Full-text indexes - manually defined, supports many tokenizers, analyzers and filters

In CrateDB every column is indexed by default, depending on the datatype a different index is used, indexing is controlled and maintained by the database, there is no need to vacuum or re-index like in other systems. Indexing can be manually turned off.


Views & Subqueries

CrateDB supports views, CTEs, and nested subqueries.

Example: Reusable View

Example: Correlated Subquery

Example: Common table expression


Further Learning & Resources

Last updated