Configuration tuning

This guide outlines key hardware and software configurations you can adjust to optimize CrateDB insert performance.

Hardware Recommendations

Use Solid-State Drives (SSDs)

Switching from traditional spinning disks (HDDs) to SSDs is often the most cost-effective performance upgrade you can make for insert-heavy workloads.

To assess disk performance, use the iostat command:

shCopierModifier$ iostat
...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.53    0.01    0.05    0.01    0.00   99.41
...

Pay special attention to the %iowait value—it shows how much time the CPU spends waiting on disk I/O. Lower values generally correlate with better insert performance.

CPU Considerations

Insert performance can be CPU-bound if:

  • Your table schema is complex

  • You use full-text indexing or heavy analyzers

If CPU becomes a bottleneck:

  • Switch to lighter full-text analyzers

  • Disable full-text indexing where it's not strictly needed

  • Upgrade to more powerful processors

Software Configuration

Replication Settings

Adding replicas improves durability and high availability—but impacts insert speed.

  • The largest performance cost occurs when increasing from 0 to 1 replica

  • After that, additional replicas introduce a linear increase in write load

circle-info

Avoid using 0 replicas in production. It’s only acceptable for short-lived or throwaway data loads.

For clusters with 3+ nodes, at least one replica per table is recommended.

Shards and Partitioning

Sharding and partitioning impact insert performance, especially at scale. Refer to the Sharding Guide for recommendations on:

  • Number of shards per table

  • Partitioning strategies

  • Balancing write throughput across nodes

Indexing

By default:

  • Plain indexes are created for regular columns

  • Full-text indexes are created for text columns using analyzers

Indexes improve query performance but are expensive to maintain during inserts.

To improve insert throughput:

  • Disable indexes on columns that don’t require search

  • Avoid full-text indexes unless absolutely necessary

circle-info

Disabling an index means you cannot query that column in a WHERE clause.

Primary Keys

If your data lacks a natural primary key, use CrateDB’s built-in _id system column instead of generating your own surrogate key (e.g. UUIDs). This:

  • Reduces insert-time indexing overhead

  • Eliminates an extra column to manage

Translog Durability

The translog.durability setting affects write acknowledgment behavior:

  • REQUEST (default): Flushes the translog after every operation—safer, but slower

  • ASYNC: Flushes less frequently—faster, but less durable

Use ASYNC if your application can tolerate possible data loss during node crashes.

circle-info

Consider ASYNC for high-speed ingest pipelines where durability is less critical.

Overload Protection

CrateDB automatically limits insert concurrency to prevent node overload. These limits adapt dynamically based on request round-trip time.

If you're benchmarking or stress-testing insert performance, you may want to adjust these settings:

  • overload_protection.dml.concurrent_operations

  • overload_protection.dml.concurrent_requests

  • overload_protection.dml.max_bytes_per_request

Each time you change a setting, the concurrency estimator is reset.

circle-info

See the Overload Protection reference for full details.

Refresh Interval

CrateDB uses index refreshes to make data searchable. Inserts are not visible to most queries until a refresh occurs.

  • Default: refresh_interval = 1000ms

  • Higher values = better insert throughput (less frequent refreshes)

  • Lower values = lower latency for querying recent inserts

If your use case doesn’t require immediate read-after-write visibility, increase the refresh interval for better performance:

Statistics and ANALYZE

After bulk inserts or loading large datasets, update table statistics using:

This helps CrateDB’s query planner generate more efficient execution plans.

  • Statistics are recalculated automatically over time

  • You can manually trigger them with ANALYZE

  • Throttling settings limit how much bandwidth is used for stats collection

circle-info

See the ANALYZE reference for tuning options and scheduling.

Manual Optimization

CrateDB stores data as segment files in an append-only fashion. Over time, these accumulate and may degrade performance.

The OPTIMIZE command merges and compacts segments:

You can also optimize individual partitions if needed.

Normally, CrateDB manages this automatically. Manual optimization is only recommended when:

  • You're doing sustained, high-volume inserts

  • You observe degraded performance due to too many segments

Summary: Quick Tuning Checklist

Area
Recommendation

Disk

Use SSDs and monitor %iowait with iostat

CPU

Avoid unnecessary full-text indexing; upgrade CPUs if bottlenecked

Replication

Use 1+ replicas in production, but expect a performance cost

Indexing

Turn off unnecessary column indexes

Primary Key

Use _id instead of synthetic UUIDs

Translog

Consider ASYNC mode for speed over durability

Refresh Interval

Increase refresh_interval for better insert throughput

Statistics

Run ANALYZE after major inserts

Optimization

Use OPTIMIZE for large or frequently written tables

Overload Protection

Adjust settings when benchmarking inserts

Last updated