Large subsets
When working with very large tables and composite primary keys, you may need to retrieve thousands—or even tens of thousands—of specific records efficiently.
A common challenge: building a large WHERE clause with multiple AND/OR conditions for composite primary keys can cause:
Performance degradation
Errors like
statement is too large (stack overflow while parsing)orStackOverflowError[null]
The Solution: Use CrateDB’s _id System Column
_id System ColumnEvery record in CrateDB has an internal _id column. This column is:
A unique, deterministic string representing all the primary key values of a row.
Automatically generated and indexed.
Perfect for high-performance lookups—even on large datasets.
How It Works
Let’s say your table sensor_data uses a composite primary key:
PRIMARY KEY (machine_id, sensor_type)Now suppose you have tens of thousands of key combinations (from another process or table) and want to fetch all matching rows from sensor_data.
Instead of writing a massive WHERE (machine_id, sensor_type) IN (...) clause, you can do this:
SELECT *
FROM sensor_data
WHERE _id IN (
SELECT _id FROM relevant_pk_values
);Here’s what’s happening:
relevant_pk_valuesis a staging table (temporary or persistent) that holds_idvalues derived from your original key combinations.Since
_idis index-backed and deterministic, this lookup is extremely fast—even at scale.
Why This Works
CrateDB’s
_idencodes primary key values in a deterministic format.If two tables share the same primary key definition and values, their
_idvalues will match.You avoid the complexity and limits of deeply nested
WHEREclauses or hugeINlists.
Summary
_id system column
Fast, deterministic primary key representation
WHERE _id IN (SELECT …)
Scales to tens of thousands of lookups easily
Shared PK logic across tables
Enables cross-table matching via _id
Avoids SQL parsing limits
Prevents StackOverflowError or large query errors
Last updated

