Join optimizations
Efficient execution of joins—especially on large datasets—is critical for query performance. CrateDB implements multiple join optimization strategies to reduce execution time, network load, and memory usage.
Table of Contents
Query-Then-Fetch Optimization
Join operations on large relations—especially those using NESTED LOOP
joins—can suffer from poor performance due to their quadratic runtime complexity (O(n × m)). This is particularly true for CROSS JOIN
s, which can generate large intermediate results and overload the handler node with unnecessary data.
To reduce this burden, CrateDB uses a Query Then Fetch approach:
Filter and order data on each shard to identify relevant document IDs.
Fetch only the selected fields of the final matching documents.
Return the final result set to the client.
This reduces both the volume of data transferred and the memory footprint during join processing.
Push-Down Query Optimization
For complex queries, CrateDB's planner decomposes the query into subqueries and pushes down filtering, sorting, and limiting operations to the shard level before the join is executed.
Example query:
SELECT L.a, R.x
FROM L, R
WHERE L.id = R.id
AND L.b > 100
AND R.y < 10
ORDER BY L.a;
In this case:
Filtering and ordering are pushed to the shards holding
L
andR
.Intermediate results are broadcast to the nodes performing the join.
The workload is distributed, reducing both data movement and join execution time.

Figure: Filtering and ordering applied before the join, reducing the data volume.
Cross Join Elimination
CrateDB attempts to eliminate inefficient CROSS JOIN
s by reordering joins to convert them into more efficient INNER JOIN
s—provided the conditions in the query allow for it.
Example:
SELECT *
FROM t1 CROSS JOIN t2
INNER JOIN t3 ON t3.z = t1.x AND t3.z = t2.y;
The optimizer rewrites the join order to:
t2 → t1 → t3
So that all joins include a join condition. The CROSS JOIN
is replaced with an INNER JOIN
, improving performance.
To disable this behavior (experimental):
SET optimizer_eliminate_cross_join = false;
Last updated