Partitioning

A partitioned table is a virtual table composed of zero or more partitions, where each partition is itself backed by one or more shards.

Conceptual structure:

partitioned_table
  ├── partition 1
  │     ├── shard 0
  │     └── shard 1
  └── partition 2
        ├── shard 0
        └── shard 1

A table becomes partitioned when one or more partition columns are defined. When a row with a new unique combination of partition column values is inserted, a new partition is automatically created.

Partitioned tables can be queried like regular tables, but offer additional flexibility for scaling and data lifecycle management.

Advantages

Flexible scaling – You can adjust the number of shards for future partitions without affecting existing ones.
Selective backup/restore – Partitions can be individually backed up and restored.
Partition pruning – Queries filtering by partition column(s) skip irrelevant partitions, reducing cost.
Efficient deletes – Dropping entire partitions is much faster than row-by-row deletes.

Disadvantages

Poor partition key choice can cause too many shards, reducing stability and performance.
Stale partitions – Deletes that cannot drop full partitions leave empty ones behind.
Limited optimizations – Some optimizations (e.g., GROUP BY on CLUSTERED BY columns) don’t apply to partitioned tables.

Partition names include base32-encoded partition column values, schema name, table name, and an internal overhead of 14 bytes.

The total table name length must not exceed 255 bytes.

Every partition has its own set of shards. Proper shard allocation is crucial. See the Sharding Guide for best practices.

Creating Partitioned Tables

Use CREATE TABLE with the PARTITIONED BY clause:

CREATE TABLE parted_table (
  id BIGINT,
  title TEXT,
  content TEXT,
  width DOUBLE PRECISION,
  day TIMESTAMP WITH TIME ZONE
) CLUSTERED BY (title) INTO 4 SHARDS
  PARTITIONED BY (day);

This creates an empty partitioned table. Partitions are created on-demand when rows are inserted.

Using generated columns for partitioning:

CREATE TABLE computed_parted_table (
  id BIGINT,
  data DOUBLE PRECISION,
  created_at TIMESTAMP WITH TIME ZONE,
  month TIMESTAMP WITH TIME ZONE
    GENERATED ALWAYS AS date_trunc('month', created_at)
) PARTITIONED BY (month);

This avoids having to compute partition values client-side.

Inspecting Partitioned Tables

Partitioned tables appear in information_schema.tables with a non-null partitioned_by column.

SELECT table_schema, table_name, number_of_shards, partitioned_by
FROM information_schema.tables
WHERE table_name='parted_table';

List partitions:

SELECT *
FROM information_schema.table_partitions
WHERE table_schema='doc' AND table_name='parted_table';

Inserting Data

On insert, if the PARTITIONED BY column(s) match an existing partition, data goes into that partition; otherwise, a new partition is created.

INSERT INTO parted_table (id, title, width, day)
VALUES (1, 'Don''t Panic', 19.5, '2014-04-08');

Subsequent inserts with the same day value will reuse the existing partition.

Updating Data

You cannot update partitioned-by columns (or any columns used in the generation of a partitioned-by column):

UPDATE parted_table
SET day = '2014-04-07'
WHERE id = 1;
-- Error: Updating a partitioned-by column is not supported

Other columns can be updated normally.

Deleting Data

If the WHERE clause targets all rows of a partition, CrateDB drops the entire partition instead of deleting row by row (much faster):

DELETE FROM parted_table
WHERE day = 1396915200000;

Querying Partitioned Tables

Queries are automatically partition-pruned if possible, based on the WHERE clause.

Example: This query will only touch the partition for day='1970-01-01':

SELECT COUNT(*) FROM parted_table
WHERE day='1970-01-01';

CrateDB evaluates the WHERE clause before execution to identify relevant partitions.

Partitioning by Generated Columns

When using generated columns as partition keys, CrateDB optimizes queries in the same way—inferring the minimum required partitions.

SELECT id, date_format('%Y-%m', month) AS m
FROM computed_parted_table
WHERE created_at = '2015-11-16T13:27:00.000Z';

Altering Partitioned Tables

You can alter schema and parameters for partitioned tables as you would for regular tables.

ALTER TABLE parted_table
SET (number_of_replicas = '0-all');

ALTER TABLE parted_table
ADD COLUMN new_col TEXT;

Changing the Number of Shards

You can adjust number_of_shards for future partitions at any time:

ALTER TABLE parted_table
SET (number_of_shards = 10);

Existing partitions keep their current shard count.

Altering a Single Partition

You can change parameters for a specific partition:

ALTER TABLE parted_table
PARTITION (day=1396915200000)
SET (number_of_shards = 5);

Changing Parameters

You can alter settings like number_of_replicas for specific partitions, but cannot alter their schema.

ALTER TABLE parted_table
PARTITION (day=1396915200000)
RESET (number_of_replicas);

Altering Only New Partitions

Use ONLY to alter settings for future partitions without affecting existing ones:

ALTER TABLE ONLY parted_table
SET (number_of_replicas = 1);

Closing and Opening Partitions

You can close or open a single partition just like a regular table:

sqlCopierModifierALTER TABLE parted_table
PARTITION (day=1396915200000) CLOSE;

ALTER TABLE parted_table
PARTITION (day=1396915200000) OPEN;

Closed partitions are excluded from queries.

Limitations

WHERE clauses cannot mix partitioned-by columns and regular columns in the same OR condition:
```
WHERE partition_col = 'x' OR normal_col = x -- Not supported
```

Consistency Notes

If a partition is deleted during an active bulk insert/update, it will not be re-created automatically.
The reported number of affected rows always reflects the actual inserted/updated count.

PreviousToken filters NextSharding & replica

Last updated 3 months ago

Good evening