CTEs vs JOINS
In some scenarios, especially with large or unevenly distributed tables, rewriting complex JOIN
s as Common Table Expressions (CTEs) can lead to more consistent and faster execution times.
This technique works by breaking down the query into smaller, focused parts—making it easier for the query planner to optimize execution. However, the effectiveness of this approach depends on the structure and distribution of your data.
Original Version (with JOIN
s)
JOIN
s)SELECT SUM(quantity)
FROM invoices
JOIN invoice_items USING (invoice_number)
JOIN products USING (product_id)
WHERE product_description = 'super cool product'
AND invoices.issue_date BETWEEN '2024-01-01' AND '2024-02-01';
This query joins three tables directly. On large datasets, this can trigger expensive join reordering or inefficient execution plans.
Improved Version (using CTEs)
WITH relevant_product_ids AS (
SELECT product_id
FROM products
WHERE product_description = 'super cool product'
),
relevant_invoice_lines AS (
SELECT invoice_number, quantity
FROM invoice_items
WHERE product_id IN (
SELECT product_id FROM relevant_product_ids
)
),
relevant_invoices AS (
SELECT invoice_number, issue_date
FROM invoices
WHERE invoice_number IN (
SELECT invoice_number FROM relevant_invoice_lines
)
)
SELECT SUM(quantity)
FROM relevant_invoices
JOIN relevant_invoice_lines USING (invoice_number)
WHERE issue_date BETWEEN '2024-01-01' AND '2024-02-01';
This version isolates each filtering step:
First, find the matching product IDs.
Then, filter invoice items by those products.
Finally, restrict invoices to the relevant subset.
By doing this:
You reduce the amount of data involved in each step.
The optimizer has fewer paths to evaluate.
The overall query becomes easier to reason about and debug.
When to Use This Pattern
Use CTE-based rewrites when:
You're joining large tables with selective filters.
The planner appears to pick inefficient join orders.
You notice high variance in execution times for similar queries.
Avoid if:
The
JOIN
strategy is already optimal.Your tables are small and well-indexed.
You need the absolute shortest execution time and the overhead of subqueries outweighs benefits.
Last updated