Large subsets
When working with very large tables and composite primary keys, you may need to retrieve thousands—or even tens of thousands—of specific records efficiently.
A common challenge: building a large WHERE
clause with multiple AND
/OR
conditions for composite primary keys can cause:
Performance degradation
Errors like
statement is too large (stack overflow while parsing)
orStackOverflowError[null]
The Solution: Use CrateDB’s _id
System Column
_id
System ColumnEvery record in CrateDB has an internal _id
column. This column is:
A unique, deterministic string representing all the primary key values of a row.
Automatically generated and indexed.
Perfect for high-performance lookups—even on large datasets.
How It Works
Let’s say your table sensor_data
uses a composite primary key:
PRIMARY KEY (machine_id, sensor_type)
Now suppose you have tens of thousands of key combinations (from another process or table) and want to fetch all matching rows from sensor_data
.
Instead of writing a massive WHERE (machine_id, sensor_type) IN (...)
clause, you can do this:
SELECT *
FROM sensor_data
WHERE _id IN (
SELECT _id FROM relevant_pk_values
);
Here’s what’s happening:
relevant_pk_values
is a staging table (temporary or persistent) that holds_id
values derived from your original key combinations.Since
_id
is index-backed and deterministic, this lookup is extremely fast—even at scale.
Why This Works
CrateDB’s
_id
encodes primary key values in a deterministic format.If two tables share the same primary key definition and values, their
_id
values will match.You avoid the complexity and limits of deeply nested
WHERE
clauses or hugeIN
lists.
Summary
_id
system column
Fast, deterministic primary key representation
WHERE _id IN (SELECT …)
Scales to tens of thousands of lookups easily
Shared PK logic across tables
Enables cross-table matching via _id
Avoids SQL parsing limits
Prevents StackOverflowError
or large query errors
Last updated