Partitioning
A partitioned table is a virtual table composed of zero or more partitions, where each partition is itself backed by one or more shards.
Conceptual structure:
partitioned_table
├── partition 1
│ ├── shard 0
│ └── shard 1
└── partition 2
├── shard 0
└── shard 1
A table becomes partitioned when one or more partition columns are defined. When a row with a new unique combination of partition column values is inserted, a new partition is automatically created.
Partitioned tables can be queried like regular tables, but offer additional flexibility for scaling and data lifecycle management.
Advantages
Flexible scaling – You can adjust the number of shards for future partitions without affecting existing ones.
Selective backup/restore – Partitions can be individually backed up and restored.
Partition pruning – Queries filtering by partition column(s) skip irrelevant partitions, reducing cost.
Efficient deletes – Dropping entire partitions is much faster than row-by-row deletes.
Disadvantages
Poor partition key choice can cause too many shards, reducing stability and performance.
Stale partitions – Deletes that cannot drop full partitions leave empty ones behind.
Limited optimizations – Some optimizations (e.g.,
GROUP BY
onCLUSTERED BY
columns) don’t apply to partitioned tables.
Table of Contents
Creating Partitioned Tables
Use CREATE TABLE
with the PARTITIONED BY
clause:
CREATE TABLE parted_table (
id BIGINT,
title TEXT,
content TEXT,
width DOUBLE PRECISION,
day TIMESTAMP WITH TIME ZONE
) CLUSTERED BY (title) INTO 4 SHARDS
PARTITIONED BY (day);
This creates an empty partitioned table. Partitions are created on-demand when rows are inserted.
Using generated columns for partitioning:
CREATE TABLE computed_parted_table (
id BIGINT,
data DOUBLE PRECISION,
created_at TIMESTAMP WITH TIME ZONE,
month TIMESTAMP WITH TIME ZONE
GENERATED ALWAYS AS date_trunc('month', created_at)
) PARTITIONED BY (month);
This avoids having to compute partition values client-side.
Inspecting Partitioned Tables
Partitioned tables appear in information_schema.tables
with a non-null partitioned_by
column.
SELECT table_schema, table_name, number_of_shards, partitioned_by
FROM information_schema.tables
WHERE table_name='parted_table';
List partitions:
SELECT *
FROM information_schema.table_partitions
WHERE table_schema='doc' AND table_name='parted_table';
Inserting Data
On insert, if the PARTITIONED BY
column(s) match an existing partition, data goes into that partition; otherwise, a new partition is created.
INSERT INTO parted_table (id, title, width, day)
VALUES (1, 'Don''t Panic', 19.5, '2014-04-08');
Subsequent inserts with the same day
value will reuse the existing partition.
Updating Data
You cannot update partitioned-by columns (or any columns used in the generation of a partitioned-by column):
UPDATE parted_table
SET day = '2014-04-07'
WHERE id = 1;
-- Error: Updating a partitioned-by column is not supported
Other columns can be updated normally.
Deleting Data
If the WHERE
clause targets all rows of a partition, CrateDB drops the entire partition instead of deleting row by row (much faster):
DELETE FROM parted_table
WHERE day = 1396915200000;
Querying Partitioned Tables
Queries are automatically partition-pruned if possible, based on the WHERE
clause.
Example:
This query will only touch the partition for day='1970-01-01'
:
SELECT COUNT(*) FROM parted_table
WHERE day='1970-01-01';
CrateDB evaluates the WHERE
clause before execution to identify relevant partitions.
Partitioning by Generated Columns
When using generated columns as partition keys, CrateDB optimizes queries in the same way—inferring the minimum required partitions.
SELECT id, date_format('%Y-%m', month) AS m
FROM computed_parted_table
WHERE created_at = '2015-11-16T13:27:00.000Z';
Altering Partitioned Tables
You can alter schema and parameters for partitioned tables as you would for regular tables.
ALTER TABLE parted_table
SET (number_of_replicas = '0-all');
ALTER TABLE parted_table
ADD COLUMN new_col TEXT;
Changing the Number of Shards
You can adjust number_of_shards
for future partitions at any time:
ALTER TABLE parted_table
SET (number_of_shards = 10);
Existing partitions keep their current shard count.
Altering a Single Partition
You can change parameters for a specific partition:
ALTER TABLE parted_table
PARTITION (day=1396915200000)
SET (number_of_shards = 5);
Changing Parameters
You can alter settings like number_of_replicas
for specific partitions, but cannot alter their schema.
ALTER TABLE parted_table
PARTITION (day=1396915200000)
RESET (number_of_replicas);
Altering Only New Partitions
Use ONLY
to alter settings for future partitions without affecting existing ones:
ALTER TABLE ONLY parted_table
SET (number_of_replicas = 1);
Closing and Opening Partitions
You can close or open a single partition just like a regular table:
sqlCopierModifierALTER TABLE parted_table
PARTITION (day=1396915200000) CLOSE;
ALTER TABLE parted_table
PARTITION (day=1396915200000) OPEN;
Closed partitions are excluded from queries.
Limitations
WHERE
clauses cannot mix partitioned-by columns and regular columns in the sameOR
condition:WHERE partition_col = 'x' OR normal_col = x -- Not supported
Consistency Notes
If a partition is deleted during an active bulk insert/update, it will not be re-created automatically.
The reported number of affected rows always reflects the actual inserted/updated count.
Last updated