Insert methods

CrateDB supports multiple ways to insert data efficiently. Depending on your setup and workload, some insert methods can significantly outperform others. Selecting the right method is a simple yet powerful way to improve insert performance.

1. Insert Statement Types

CrateDB supports three main types of INSERT statements:

  • Single inserts — Simple, one-row-at-a-time inserts

  • UNNEST inserts — Insert multiple rows using the UNNEST table function

  • Multiple value expressions — Insert multiple rows with repeated VALUES expressions

Each type of insert can be used with one of three client approaches, which we’ll cover after the insert types.

1.1 Single Inserts

Single inserts are the most basic form of data insertion:

INSERT INTO my_table (column_a) VALUES ('value 1');

Pros:

  • Fast and efficient for low-volume inserts

  • Easy to write and debug

CrateDB insert internals:

  1. Insert is applied to the primary shard

  2. The insert is then replicated to all configured replica shards in parallel

  3. CrateDB acknowledges the request only after all replicas are updated

Performance considerations:

  • Each insert carries overhead for parsing, planning, and execution

  • High insert rates generate significant internal network traffic

  • Can bottleneck large-scale insert workloads

1.2 UNNEST Inserts

Use UNNEST to transform arrays into row sets and insert multiple rows in a single operation:

INSERT INTO my_table (id, name)
  (SELECT *
     FROM UNNEST(
            [1, 2, 3],
            ['Arthur', 'Trillian', 'Marvin']));

Pros:

  • Dramatically reduces parsing and execution overhead

  • Significantly less internal network traffic

  • Disk flushes only once per shard when translog.durability = 'REQUEST' (default)

  • Works well with parameterized prepared statements

Example using CrateDB Python client:

client.execute("""
  INSERT INTO my_table (id, name)
    (SELECT * FROM UNNEST(?, ?))
""", ([1, 2, 3], ["Arthur", "Trillian", "Marvin"]))

Warning: CrateDB may silently drop rows with incompatible types or invalid column names when using UNNEST. Always validate input and refer to the UNNEST reference documentation for behavior details.

1.3 Multiple Value Expressions

You can also insert multiple rows using a list of VALUES expressions:

INSERT INTO my_table (id, name)
  VALUES (1, 'Arthur'),
         (2, 'Trillian'),
         (3, 'Marvin');

Pros:

  • Easy to read and understand

  • Supported in most SQL clients and drivers

Cons:

  • Parsing cost grows with the number of rows

  • Query string must be dynamically generated if the number of rows varies

  • Cannot use with parameterized static prepared statements easily

Compared to UNNEST, this method is usually slower for large inserts.

2. Client Approaches

Each insert method can be used with one of three client approaches:

  • Standard querying

  • Bulk operations

  • Prepared statements

2.1 Standard Querying

Executes individual SQL statements without batching or reuse.

Example (Python):

client.execute("INSERT INTO my_table (column_a) VALUES (?)", ["value 1"])

Best for: Simple, one-off inserts or debugging.

2.2 Bulk Operations

CrateDB’s HTTP endpoint supports bulk operations, allowing multiple inserts in a single request.

Advantages:

  • One-time parsing, planning, and execution per bulk

  • Reduced network and coordination overhead

  • One disk flush per shard (with translog.durability = 'REQUEST')

Important: Combining bulk operations with UNNEST or multi-value VALUES usually offers no additional performance benefit.

See Performance: Bulk Inserts for details.

2.3 Prepared Statements

Prepared statements are parsed once and can be executed multiple times, often using a binary protocol for higher efficiency.

Example (JDBC):

PreparedStatement stmt = connection.prepareStatement(
  "INSERT INTO my_table (id, first_name) VALUES (?, ?)");

stmt.setString(1, "Arthur");
stmt.addBatch();

stmt.setString(1, "Trillian");
stmt.addBatch();

stmt.setString(1, "Marvin");
stmt.addBatch();

int[] results = stmt.executeBatch();

Advantages:

  • No repeated parsing

  • Low-overhead binary protocol

  • Uses existing connections

Best for: Inserting many rows with consistent structure using supported clients (JDBC, Python DBAPI, etc.)

3. Performance Testing

To benchmark and tune insert performance:

  1. Choose your target insert method and client approach

  2. Configure your test environment (e.g., cluster size, replication settings)

  3. Run insert workloads with varying parameters (e.g., --bulk-size)

  4. Measure and plot throughput over time

  5. Re-test with different configurations (e.g., durability, indexing, shard count)

Note: Insert operations using a subquery (INSERT INTO ... SELECT) or COPY FROM are protected by overload protection, which prevents performance degradation for other queries. Refer to the Overload Protection documentation to tune these settings.

4. Summary Table

Insert Method
Best For
Supports Params
Notes

Single insert

Low-volume, simple inserts

Fast for small workloads, high overhead when scaling

UNNEST

High-volume, bulk inserts

Most efficient for large batches, but error-prone

Multiple value expr.

Medium-volume, human-readable

Easier to read, harder to dynamically scale

Bulk HTTP insert

External clients, scripts

Very efficient, preferred for automated ingest pipelines

Prepared statements

Repeated inserts in apps

Most clients support batch mode, efficient execution

Last updated