Sharding

In CrateDB, every table partition is divided into a configurable number of shards. These shards are distributed across the cluster nodes to optimize data distribution and query performance.

As you add more nodes to your cluster, CrateDB automatically redistributes shards to maintain an even balance across the cluster.

circle-info

Non-partitioned tables behave like a single partition, but are still split into the configured number of shards.

Sharding is completely transparent at the table level—you don’t need to worry about shards when running queries. CrateDB handles the parallel execution of read requests across multiple shards and nodes, greatly improving query performance and scalability.

Table of Contents


Number of Shards

You define the number of shards when creating a table using the CLUSTERED INTO <number> SHARDS clause.

Example:

cr> CREATE TABLE my_table5 (
    first_column INTEGER
) CLUSTERED INTO 10 SHARDS;
CREATE OK, 1 row affected (... sec)

If you don’t specify the number of shards explicitly, CrateDB applies a sensible default value.

See also: CREATE TABLE: CLUSTERED

circle-info

You can change the number of shards after table creation, but only to a multiple of the number_of_routing_shards (which is set at table creation, automatically or explicitly). Changing the shard count will temporarily put the table into a read-only state until the operation completes.

triangle-exclamation

Routing

With a fixed number of primary shards, CrateDB routes each individual row to a specific shard based on a simple formula:

When the hash function distributes values evenly (which usually happens), rows are evenly distributed across shards.

Routing Column

You specify the routing column with the CLUSTERED BY clause when creating a table. Rows with the same routing column value will always be stored on the same shard.

  • If you define a primary key, it is used as the default routing column.

  • Otherwise, CrateDB uses an internal document ID for routing.

Example:

If primary key constraints exist, the routing column can be omitted since primary key columns are automatically used for routing.

circle-exclamation

Example with explicit routing column and multiple primary keys:

Combining Custom Routing and Shard Count

You can combine both the routing column and the number of shards during table creation:

Last updated