Sharding
In CrateDB, every table partition is divided into a configurable number of shards. These shards are distributed across the cluster nodes to optimize data distribution and query performance.
As you add more nodes to your cluster, CrateDB automatically redistributes shards to maintain an even balance across the cluster.
Sharding is completely transparent at the table level—you don’t need to worry about shards when running queries. CrateDB handles the parallel execution of read requests across multiple shards and nodes, greatly improving query performance and scalability.
Table of Contents
Number of Shards
You define the number of shards when creating a table using the CLUSTERED INTO <number> SHARDS
clause.
Example:
cr> CREATE TABLE my_table5 (
first_column INTEGER
) CLUSTERED INTO 10 SHARDS;
CREATE OK, 1 row affected (... sec)
If you don’t specify the number of shards explicitly, CrateDB applies a sensible default value.
See also: CREATE TABLE: CLUSTERED
Proper shard allocation is critical for performance. Make sure to read the Sharding Guide to optimize your cluster’s shard setup.
Routing
With a fixed number of primary shards, CrateDB routes each individual row to a specific shard based on a simple formula:
shard number = hash(routing_column) % total_primary_shards
When the hash function distributes values evenly (which usually happens), rows are evenly distributed across shards.
Routing Column
You specify the routing column with the CLUSTERED BY
clause when creating a table. Rows with the same routing column value will always be stored on the same shard.
If you define a primary key, it is used as the default routing column.
Otherwise, CrateDB uses an internal document ID for routing.
Example:
cr> CREATE TABLE my_table6 (
first_column INTEGER,
second_column TEXT
) CLUSTERED BY (first_column);
CREATE OK, 1 row affected (... sec)
If primary key constraints exist, the routing column can be omitted since primary key columns are automatically used for routing.
If you specify the routing column explicitly, it must be a primary key column.
Example with explicit routing column and multiple primary keys:
cr> CREATE TABLE my_table8 (
first_column INTEGER PRIMARY KEY,
second_column TEXT PRIMARY KEY,
third_column TEXT
) CLUSTERED BY (first_column);
CREATE OK, 1 row affected (... sec)
Combining Custom Routing and Shard Count
You can combine both the routing column and the number of shards during table creation:
cr> CREATE TABLE my_table9 (
first_column INTEGER PRIMARY KEY,
second_column TEXT PRIMARY KEY,
third_column TEXT
) CLUSTERED BY (first_column) INTO 10 SHARDS;
CREATE OK, 1 row affected (... sec)
Last updated