Insert methods

CrateDB supports multiple ways to insert data efficiently. Depending on your setup and workload, some insert methods can significantly outperform others. Selecting the right method is a simple yet powerful way to improve insert performance.

1. Insert Statement Types

CrateDB supports three main types of INSERT statements:

Single inserts — Simple, one-row-at-a-time inserts
UNNEST inserts — Insert multiple rows using the UNNEST table function
Multiple value expressions — Insert multiple rows with repeated VALUES expressions

Each type of insert can be used with one of three client approaches, which we’ll cover after the insert types.

1.1 Single Inserts

Single inserts are the most basic form of data insertion:

INSERT INTO my_table (column_a) VALUES ('value 1');

Pros:

Fast and efficient for low-volume inserts
Easy to write and debug

CrateDB insert internals:

Insert is applied to the primary shard
The insert is then replicated to all configured replica shards in parallel
CrateDB acknowledges the request only after all replicas are updated

Performance considerations:

Each insert carries overhead for parsing, planning, and execution
High insert rates generate significant internal network traffic
Can bottleneck large-scale insert workloads

1.2 UNNEST Inserts

Use UNNEST to transform arrays into row sets and insert multiple rows in a single operation:

INSERT INTO my_table (id, name)
  (SELECT *
     FROM UNNEST(
            [1, 2, 3],
            ['Arthur', 'Trillian', 'Marvin']));

Pros:

Dramatically reduces parsing and execution overhead
Significantly less internal network traffic
Disk flushes only once per shard when translog.durability = 'REQUEST' (default)
Works well with parameterized prepared statements

Example using CrateDB Python client:

client.execute("""
  INSERT INTO my_table (id, name)
    (SELECT * FROM UNNEST(?, ?))
""", ([1, 2, 3], ["Arthur", "Trillian", "Marvin"]))

Warning: CrateDB may silently drop rows with incompatible types or invalid column names when using UNNEST. Always validate input and refer to the UNNEST reference documentation for behavior details.

1.3 Multiple Value Expressions

You can also insert multiple rows using a list of VALUES expressions:

INSERT INTO my_table (id, name)
  VALUES (1, 'Arthur'),
         (2, 'Trillian'),
         (3, 'Marvin');

Pros:

Easy to read and understand
Supported in most SQL clients and drivers

Cons:

Parsing cost grows with the number of rows
Query string must be dynamically generated if the number of rows varies
Cannot use with parameterized static prepared statements easily

Compared to UNNEST, this method is usually slower for large inserts.

2. Client Approaches

Each insert method can be used with one of three client approaches:

Standard querying
Bulk operations
Prepared statements

2.1 Standard Querying

Executes individual SQL statements without batching or reuse.

Example (Python):

client.execute("INSERT INTO my_table (column_a) VALUES (?)", ["value 1"])

Best for: Simple, one-off inserts or debugging.

2.2 Bulk Operations

CrateDB’s HTTP endpoint supports bulk operations, allowing multiple inserts in a single request.

Advantages:

One-time parsing, planning, and execution per bulk
Reduced network and coordination overhead
One disk flush per shard (with translog.durability = 'REQUEST')

Important: Combining bulk operations with UNNEST or multi-value VALUES usually offers no additional performance benefit.

See Performance: Bulk Inserts for details.

2.3 Prepared Statements

Prepared statements are parsed once and can be executed multiple times, often using a binary protocol for higher efficiency.

Example (JDBC):

PreparedStatement stmt = connection.prepareStatement(
  "INSERT INTO my_table (id, first_name) VALUES (?, ?)");

stmt.setString(1, "Arthur");
stmt.addBatch();

stmt.setString(1, "Trillian");
stmt.addBatch();

stmt.setString(1, "Marvin");
stmt.addBatch();

int[] results = stmt.executeBatch();

Advantages:

No repeated parsing
Low-overhead binary protocol
Uses existing connections

Best for: Inserting many rows with consistent structure using supported clients (JDBC, Python DBAPI, etc.)

3. Performance Testing

To benchmark and tune insert performance:

Choose your target insert method and client approach
Configure your test environment (e.g., cluster size, replication settings)
Run insert workloads with varying parameters (e.g., --bulk-size)
Measure and plot throughput over time
Re-test with different configurations (e.g., durability, indexing, shard count)

Note: Insert operations using a subquery (INSERT INTO ... SELECT) or COPY FROM are protected by overload protection, which prevents performance degradation for other queries. Refer to the Overload Protection documentation to tune these settings.

4. Summary Table

Insert Method

Best For

Supports Params

Notes

Single insert

Low-volume, simple inserts

✅

Fast for small workloads, high overhead when scaling

UNNEST

High-volume, bulk inserts

✅

Most efficient for large batches, but error-prone

Multiple value expr.

Medium-volume, human-readable

❌

Easier to read, harder to dynamically scale

Bulk HTTP insert

External clients, scripts

✅

Very efficient, preferred for automated ingest pipelines

Prepared statements

Repeated inserts in apps

✅

Most clients support batch mode, efficient execution

PreviousConfiguration tuning NextUnnesting

Last updated 4 months ago

Good evening

1. Insert Statement Types

1.1 Single Inserts

1.2 UNNEST Inserts

1.3 Multiple Value Expressions

2. Client Approaches

2.1 Standard Querying

2.2 Bulk Operations

2.3 Prepared Statements

3. Performance Testing

4. Summary Table