Early filtering
Reducing the amount of data processed early in a query can significantly cut execution time and resource use.
Filter as Early as Possible
Avoid applying filters after multiple JOIN
s or in outer queries. This can confuse the optimizer and prevent index usage.
Bad:
SELECT *
FROM (
SELECT * FROM customers JOIN orders ON ...
)
WHERE country = 'DE';
Better:
SELECT *
FROM customers
JOIN orders ON ...
WHERE country = 'DE';
Avoid SELECT *
SELECT *
CrateDB is a columnar database. Selecting only the columns you need reduces disk I/O and memory usage.
Bad:
SELECT * FROM customers;
Better:
SELECT customer_id, country FROM customers;
Avoid Large Result Sets
CrateDB excels at processing large datasets but is not optimized for returning large volumes of raw results. Use LIMIT
or pagination when working with hundreds of thousands of rows.
See Fetching Large Result Sets from CrateDB for batching examples.
Propagate LIMIT
Clauses When Possible
LIMIT
Clauses When PossibleApply LIMIT
early in subqueries when possible to reduce intermediate processing load.
Instead of:
SELECT ...
FROM device_data
JOIN factory_metadata
WHERE reading_time BETWEEN '2024-01-01' AND '2025-01-01'
LIMIT 10;
Use:
WITH filtered_data AS (
SELECT factory_id, device_name, reading_value
FROM device_data
WHERE reading_time BETWEEN '2024-12-01' AND '2025-01-01'
LIMIT 10
)
SELECT ...
FROM filtered_data
JOIN factory_metadata ON ...;
Filter Before UNNEST
When working with arrays of objects, add a filter condition before UNNEST
to minimize unnecessary data expansion.
Bad:
SELECT *
FROM (
SELECT UNNEST(my_array) obj
FROM my_table
)
WHERE obj['field1'] = 1;
Better:
SELECT *
FROM (
SELECT UNNEST(my_array) obj
FROM my_table
WHERE 1 = ANY(my_array['field1'])
)
WHERE obj['field1'] = 1;
Last updated