Configuration tuning

This guide outlines key hardware and software configurations you can adjust to optimize CrateDB insert performance.

Hardware Recommendations

Use Solid-State Drives (SSDs)

Switching from traditional spinning disks (HDDs) to SSDs is often the most cost-effective performance upgrade you can make for insert-heavy workloads.

To assess disk performance, use the iostat command:

shCopierModifier$ iostat
...
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.53    0.01    0.05    0.01    0.00   99.41
...

Pay special attention to the %iowait value—it shows how much time the CPU spends waiting on disk I/O. Lower values generally correlate with better insert performance.

CPU Considerations

Insert performance can be CPU-bound if:

Your table schema is complex
You use full-text indexing or heavy analyzers

If CPU becomes a bottleneck:

Switch to lighter full-text analyzers
Disable full-text indexing where it's not strictly needed
Upgrade to more powerful processors

Software Configuration

Replication Settings

Adding replicas improves durability and high availability—but impacts insert speed.

The largest performance cost occurs when increasing from 0 to 1 replica
After that, additional replicas introduce a linear increase in write load

Avoid using 0 replicas in production. It’s only acceptable for short-lived or throwaway data loads.

For clusters with 3+ nodes, at least one replica per table is recommended.

Shards and Partitioning

Sharding and partitioning impact insert performance, especially at scale. Refer to the Sharding Guide for recommendations on:

Number of shards per table
Partitioning strategies
Balancing write throughput across nodes

Indexing

By default:

Plain indexes are created for regular columns
Full-text indexes are created for text columns using analyzers

Indexes improve query performance but are expensive to maintain during inserts.

To improve insert throughput:

Disable indexes on columns that don’t require search
Avoid full-text indexes unless absolutely necessary

Disabling an index means you cannot query that column in a WHERE clause.

Primary Keys

If your data lacks a natural primary key, use CrateDB’s built-in _id system column instead of generating your own surrogate key (e.g. UUIDs). This:

Reduces insert-time indexing overhead
Eliminates an extra column to manage

Translog Durability

The translog.durability setting affects write acknowledgment behavior:

REQUEST (default): Flushes the translog after every operation—safer, but slower
ASYNC: Flushes less frequently—faster, but less durable

Use ASYNC if your application can tolerate possible data loss during node crashes.

Consider ASYNC for high-speed ingest pipelines where durability is less critical.

Overload Protection

CrateDB automatically limits insert concurrency to prevent node overload. These limits adapt dynamically based on request round-trip time.

If you're benchmarking or stress-testing insert performance, you may want to adjust these settings:

overload_protection.dml.concurrent_operations
overload_protection.dml.concurrent_requests
overload_protection.dml.max_bytes_per_request

Each time you change a setting, the concurrency estimator is reset.

See the Overload Protection reference for full details.

Refresh Interval

CrateDB uses index refreshes to make data searchable. Inserts are not visible to most queries until a refresh occurs.

Default: refresh_interval = 1000ms
Higher values = better insert throughput (less frequent refreshes)
Lower values = lower latency for querying recent inserts

If your use case doesn’t require immediate read-after-write visibility, increase the refresh interval for better performance:

ALTER TABLE your_table SET ("refresh_interval" = '5s');

Statistics and ANALYZE

After bulk inserts or loading large datasets, update table statistics using:

ANALYZE your_table;

This helps CrateDB’s query planner generate more efficient execution plans.

Statistics are recalculated automatically over time
You can manually trigger them with ANALYZE
Throttling settings limit how much bandwidth is used for stats collection

See the ANALYZE reference for tuning options and scheduling.

Manual Optimization

CrateDB stores data as segment files in an append-only fashion. Over time, these accumulate and may degrade performance.

The OPTIMIZE command merges and compacts segments:

OPTIMIZE TABLE your_table;

You can also optimize individual partitions if needed.

Normally, CrateDB manages this automatically. Manual optimization is only recommended when:

You're doing sustained, high-volume inserts
You observe degraded performance due to too many segments

Summary: Quick Tuning Checklist

Area

Recommendation

Disk

Use SSDs and monitor %iowait with iostat

CPU

Avoid unnecessary full-text indexing; upgrade CPUs if bottlenecked

Replication

Use 1+ replicas in production, but expect a performance cost

Indexing

Turn off unnecessary column indexes

Primary Key

Use _id instead of synthetic UUIDs

Translog

Consider ASYNC mode for speed over durability

Refresh Interval

Increase refresh_interval for better insert throughput

Statistics

Run ANALYZE after major inserts

Optimization

Use OPTIMIZE for large or frequently written tables

Overload Protection

Adjust settings when benchmarking inserts

PreviousOptimizing INSERTs NextInsert methods

Last updated 4 months ago

Good evening

Hardware Recommendations

Use Solid-State Drives (SSDs)

CPU Considerations

Software Configuration

Replication Settings

Shards and Partitioning

Indexing

Primary Keys

Translog Durability

Overload Protection

Refresh Interval

Statistics and ANALYZE

Manual Optimization

Summary: Quick Tuning Checklist