Insert methods
CrateDB supports multiple ways to insert data efficiently. Depending on your setup and workload, some insert methods can significantly outperform others. Selecting the right method is a simple yet powerful way to improve insert performance.
1. Insert Statement Types
CrateDB supports three main types of INSERT
statements:
Single inserts — Simple, one-row-at-a-time inserts
UNNEST inserts — Insert multiple rows using the
UNNEST
table functionMultiple value expressions — Insert multiple rows with repeated
VALUES
expressions
Each type of insert can be used with one of three client approaches, which we’ll cover after the insert types.
1.1 Single Inserts
Single inserts are the most basic form of data insertion:
INSERT INTO my_table (column_a) VALUES ('value 1');
Pros:
Fast and efficient for low-volume inserts
Easy to write and debug
CrateDB insert internals:
Insert is applied to the primary shard
The insert is then replicated to all configured replica shards in parallel
CrateDB acknowledges the request only after all replicas are updated
Performance considerations:
Each insert carries overhead for parsing, planning, and execution
High insert rates generate significant internal network traffic
Can bottleneck large-scale insert workloads
1.2 UNNEST Inserts
Use UNNEST
to transform arrays into row sets and insert multiple rows in a single operation:
INSERT INTO my_table (id, name)
(SELECT *
FROM UNNEST(
[1, 2, 3],
['Arthur', 'Trillian', 'Marvin']));
Pros:
Dramatically reduces parsing and execution overhead
Significantly less internal network traffic
Disk flushes only once per shard when
translog.durability = 'REQUEST'
(default)Works well with parameterized prepared statements
Example using CrateDB Python client:
client.execute("""
INSERT INTO my_table (id, name)
(SELECT * FROM UNNEST(?, ?))
""", ([1, 2, 3], ["Arthur", "Trillian", "Marvin"]))
1.3 Multiple Value Expressions
You can also insert multiple rows using a list of VALUES
expressions:
INSERT INTO my_table (id, name)
VALUES (1, 'Arthur'),
(2, 'Trillian'),
(3, 'Marvin');
Pros:
Easy to read and understand
Supported in most SQL clients and drivers
Cons:
Parsing cost grows with the number of rows
Query string must be dynamically generated if the number of rows varies
Cannot use with parameterized static prepared statements easily
Compared to UNNEST
, this method is usually slower for large inserts.
2. Client Approaches
Each insert method can be used with one of three client approaches:
Standard querying
Bulk operations
Prepared statements
2.1 Standard Querying
Executes individual SQL statements without batching or reuse.
Example (Python):
client.execute("INSERT INTO my_table (column_a) VALUES (?)", ["value 1"])
Best for: Simple, one-off inserts or debugging.
2.2 Bulk Operations
CrateDB’s HTTP endpoint supports bulk operations, allowing multiple inserts in a single request.
Advantages:
One-time parsing, planning, and execution per bulk
Reduced network and coordination overhead
One disk flush per shard (with
translog.durability = 'REQUEST'
)
Important:
Combining bulk
operations with UNNEST
or multi-value VALUES
usually offers no additional performance benefit.
See Performance: Bulk Inserts for details.
2.3 Prepared Statements
Prepared statements are parsed once and can be executed multiple times, often using a binary protocol for higher efficiency.
Example (JDBC):
PreparedStatement stmt = connection.prepareStatement(
"INSERT INTO my_table (id, first_name) VALUES (?, ?)");
stmt.setString(1, "Arthur");
stmt.addBatch();
stmt.setString(1, "Trillian");
stmt.addBatch();
stmt.setString(1, "Marvin");
stmt.addBatch();
int[] results = stmt.executeBatch();
Advantages:
No repeated parsing
Low-overhead binary protocol
Uses existing connections
Best for: Inserting many rows with consistent structure using supported clients (JDBC, Python DBAPI, etc.)
3. Performance Testing
To benchmark and tune insert performance:
Choose your target insert method and client approach
Configure your test environment (e.g., cluster size, replication settings)
Run insert workloads with varying parameters (e.g.,
--bulk-size
)Measure and plot throughput over time
Re-test with different configurations (e.g., durability, indexing, shard count)
4. Summary Table
Single insert
Low-volume, simple inserts
✅
Fast for small workloads, high overhead when scaling
UNNEST
High-volume, bulk inserts
✅
Most efficient for large batches, but error-prone
Multiple value expr.
Medium-volume, human-readable
❌
Easier to read, harder to dynamically scale
Bulk HTTP insert
External clients, scripts
✅
Very efficient, preferred for automated ingest pipelines
Prepared statements
Repeated inserts in apps
✅
Most clients support batch mode, efficient execution
Last updated