Partitioning
A partitioned table is a virtual table composed of zero or more partitions, where each partition is itself backed by one or more shards.
Conceptual structure:
partitioned_table
├── partition 1
│ ├── shard 0
│ └── shard 1
└── partition 2
├── shard 0
└── shard 1A table becomes partitioned when one or more partition columns are defined. When a row with a new unique combination of partition column values is inserted, a new partition is automatically created.
Partitioned tables can be queried like regular tables, but offer additional flexibility for scaling and data lifecycle management.
Advantages
Flexible scaling – You can adjust the number of shards for future partitions without affecting existing ones.
Selective backup/restore – Partitions can be individually backed up and restored.
Partition pruning – Queries filtering by partition column(s) skip irrelevant partitions, reducing cost.
Efficient deletes – Dropping entire partitions is much faster than row-by-row deletes.
Disadvantages
Poor partition key choice can cause too many shards, reducing stability and performance.
Stale partitions – Deletes that cannot drop full partitions leave empty ones behind.
Limited optimizations – Some optimizations (e.g.,
GROUP BYonCLUSTERED BYcolumns) don’t apply to partitioned tables.
Partition names include base32-encoded partition column values, schema name, table name, and an internal overhead of 14 bytes.
The total table name length must not exceed 255 bytes.
Every partition has its own set of shards. Proper shard allocation is crucial. See the Sharding Guide for best practices.
Table of Contents
Creating Partitioned Tables
Use CREATE TABLE with the PARTITIONED BY clause:
This creates an empty partitioned table. Partitions are created on-demand when rows are inserted.
Using generated columns for partitioning:
This avoids having to compute partition values client-side.
Inspecting Partitioned Tables
Partitioned tables appear in information_schema.tables with a non-null partitioned_by column.
List partitions:
Inserting Data
On insert, if the PARTITIONED BY column(s) match an existing partition, data goes into that partition; otherwise, a new partition is created.
Subsequent inserts with the same day value will reuse the existing partition.
Updating Data
You cannot update partitioned-by columns (or any columns used in the generation of a partitioned-by column):
Other columns can be updated normally.
Deleting Data
If the WHERE clause targets all rows of a partition, CrateDB drops the entire partition instead of deleting row by row (much faster):
Querying Partitioned Tables
Queries are automatically partition-pruned if possible, based on the WHERE clause.
Example:
This query will only touch the partition for day='1970-01-01':
CrateDB evaluates the WHERE clause before execution to identify relevant partitions.
Partitioning by Generated Columns
When using generated columns as partition keys, CrateDB optimizes queries in the same way—inferring the minimum required partitions.
Altering Partitioned Tables
You can alter schema and parameters for partitioned tables as you would for regular tables.
Changing the Number of Shards
You can adjust number_of_shards for future partitions at any time:
Existing partitions keep their current shard count.
Altering a Single Partition
You can change parameters for a specific partition:
Changing Parameters
You can alter settings like number_of_replicas for specific partitions, but cannot alter their schema.
Altering Only New Partitions
Use ONLY to alter settings for future partitions without affecting existing ones:
Closing and Opening Partitions
You can close or open a single partition just like a regular table:
Closed partitions are excluded from queries.
Limitations
WHEREclauses cannot mix partitioned-by columns and regular columns in the sameORcondition:
Consistency Notes
If a partition is deleted during an active bulk insert/update, it will not be re-created automatically.
The reported number of affected rows always reflects the actual inserted/updated count.
Last updated

