Early filtering

Reducing the amount of data processed early in a query can significantly cut execution time and resource use.

Filter as Early as Possible

Avoid applying filters after multiple JOINs or in outer queries. This can confuse the optimizer and prevent index usage.

Bad:

SELECT * 
FROM (
  SELECT * FROM customers JOIN orders ON ...
) 
WHERE country = 'DE';

Better:

SELECT * 
FROM customers 
JOIN orders ON ...
WHERE country = 'DE';

Avoid SELECT *

CrateDB is a columnar database. Selecting only the columns you need reduces disk I/O and memory usage.

Bad:

SELECT * FROM customers;

Better:

SELECT customer_id, country FROM customers;

Avoid Large Result Sets

CrateDB excels at processing large datasets but is not optimized for returning large volumes of raw results. Use LIMIT or pagination when working with hundreds of thousands of rows.

See Fetching Large Result Sets from CrateDB for batching examples.

Propagate LIMIT Clauses When Possible

Apply LIMIT early in subqueries when possible to reduce intermediate processing load.

Instead of:

SELECT ... 
FROM device_data 
JOIN factory_metadata 
WHERE reading_time BETWEEN '2024-01-01' AND '2025-01-01' 
LIMIT 10;

Use:

WITH filtered_data AS (
  SELECT factory_id, device_name, reading_value
  FROM device_data
  WHERE reading_time BETWEEN '2024-12-01' AND '2025-01-01'
  LIMIT 10
)
SELECT ...
FROM filtered_data
JOIN factory_metadata ON ...;

Filter Before UNNEST

When working with arrays of objects, add a filter condition before UNNEST to minimize unnecessary data expansion.

Bad:

SELECT * 
FROM (
  SELECT UNNEST(my_array) obj 
  FROM my_table
)
WHERE obj['field1'] = 1;

Better:

SELECT * 
FROM (
  SELECT UNNEST(my_array) obj 
  FROM my_table 
  WHERE 1 = ANY(my_array['field1'])
) 
WHERE obj['field1'] = 1;

Last updated