Sharding

In CrateDB, every table partition is divided into a configurable number of shards. These shards are distributed across the cluster nodes to optimize data distribution and query performance.

As you add more nodes to your cluster, CrateDB automatically redistributes shards to maintain an even balance across the cluster.

Non-partitioned tables behave like a single partition, but are still split into the configured number of shards.

Sharding is completely transparent at the table level—you don’t need to worry about shards when running queries. CrateDB handles the parallel execution of read requests across multiple shards and nodes, greatly improving query performance and scalability.

Table of Contents


Number of Shards

You define the number of shards when creating a table using the CLUSTERED INTO <number> SHARDS clause.

Example:

cr> CREATE TABLE my_table5 (
    first_column INTEGER
) CLUSTERED INTO 10 SHARDS;
CREATE OK, 1 row affected (... sec)

If you don’t specify the number of shards explicitly, CrateDB applies a sensible default value.

See also: CREATE TABLE: CLUSTERED

You can change the number of shards after table creation, but only to a multiple of the number_of_routing_shards (which is set at table creation, automatically or explicitly). Changing the shard count will temporarily put the table into a read-only state until the operation completes.


Routing

With a fixed number of primary shards, CrateDB routes each individual row to a specific shard based on a simple formula:

shard number = hash(routing_column) % total_primary_shards

When the hash function distributes values evenly (which usually happens), rows are evenly distributed across shards.

Routing Column

You specify the routing column with the CLUSTERED BY clause when creating a table. Rows with the same routing column value will always be stored on the same shard.

  • If you define a primary key, it is used as the default routing column.

  • Otherwise, CrateDB uses an internal document ID for routing.

Example:

cr> CREATE TABLE my_table6 (
    first_column INTEGER,
    second_column TEXT
) CLUSTERED BY (first_column);
CREATE OK, 1 row affected (... sec)

If primary key constraints exist, the routing column can be omitted since primary key columns are automatically used for routing.

Example with explicit routing column and multiple primary keys:

cr> CREATE TABLE my_table8 (
    first_column INTEGER PRIMARY KEY,
    second_column TEXT PRIMARY KEY,
    third_column TEXT
) CLUSTERED BY (first_column);
CREATE OK, 1 row affected (... sec)

Combining Custom Routing and Shard Count

You can combine both the routing column and the number of shards during table creation:

cr> CREATE TABLE my_table9 (
    first_column INTEGER PRIMARY KEY,
    second_column TEXT PRIMARY KEY,
    third_column TEXT
) CLUSTERED BY (first_column) INTO 10 SHARDS;
CREATE OK, 1 row affected (... sec)

Last updated