Configuration tuning
This guide outlines key hardware and software configurations you can adjust to optimize CrateDB insert performance.
Hardware Recommendations
Use Solid-State Drives (SSDs)
Switching from traditional spinning disks (HDDs) to SSDs is often the most cost-effective performance upgrade you can make for insert-heavy workloads.
To assess disk performance, use the iostat
command:
shCopierModifier$ iostat
...
avg-cpu: %user %nice %system %iowait %steal %idle
0.53 0.01 0.05 0.01 0.00 99.41
...
Pay special attention to the %iowait
value—it shows how much time the CPU spends waiting on disk I/O. Lower values generally correlate with better insert performance.
CPU Considerations
Insert performance can be CPU-bound if:
Your table schema is complex
You use full-text indexing or heavy analyzers
If CPU becomes a bottleneck:
Switch to lighter full-text analyzers
Disable full-text indexing where it's not strictly needed
Upgrade to more powerful processors
Software Configuration
Replication Settings
Adding replicas improves durability and high availability—but impacts insert speed.
The largest performance cost occurs when increasing from 0 to 1 replica
After that, additional replicas introduce a linear increase in write load
Shards and Partitioning
Sharding and partitioning impact insert performance, especially at scale. Refer to the Sharding Guide for recommendations on:
Number of shards per table
Partitioning strategies
Balancing write throughput across nodes
Indexing
By default:
Plain indexes are created for regular columns
Full-text indexes are created for text columns using analyzers
Indexes improve query performance but are expensive to maintain during inserts.
To improve insert throughput:
Disable indexes on columns that don’t require search
Avoid full-text indexes unless absolutely necessary
Primary Keys
If your data lacks a natural primary key, use CrateDB’s built-in _id
system column instead of generating your own surrogate key (e.g. UUIDs). This:
Reduces insert-time indexing overhead
Eliminates an extra column to manage
Translog Durability
The translog.durability
setting affects write acknowledgment behavior:
REQUEST
(default): Flushes the translog after every operation—safer, but slowerASYNC
: Flushes less frequently—faster, but less durable
Use ASYNC
if your application can tolerate possible data loss during node crashes.
Overload Protection
CrateDB automatically limits insert concurrency to prevent node overload. These limits adapt dynamically based on request round-trip time.
If you're benchmarking or stress-testing insert performance, you may want to adjust these settings:
overload_protection.dml.concurrent_operations
overload_protection.dml.concurrent_requests
overload_protection.dml.max_bytes_per_request
Each time you change a setting, the concurrency estimator is reset.
Refresh Interval
CrateDB uses index refreshes to make data searchable. Inserts are not visible to most queries until a refresh occurs.
Default:
refresh_interval = 1000ms
Higher values = better insert throughput (less frequent refreshes)
Lower values = lower latency for querying recent inserts
If your use case doesn’t require immediate read-after-write visibility, increase the refresh interval for better performance:
ALTER TABLE your_table SET ("refresh_interval" = '5s');
Statistics and ANALYZE
After bulk inserts or loading large datasets, update table statistics using:
ANALYZE your_table;
This helps CrateDB’s query planner generate more efficient execution plans.
Statistics are recalculated automatically over time
You can manually trigger them with
ANALYZE
Throttling settings limit how much bandwidth is used for stats collection
Manual Optimization
CrateDB stores data as segment files in an append-only fashion. Over time, these accumulate and may degrade performance.
The OPTIMIZE
command merges and compacts segments:
OPTIMIZE TABLE your_table;
You can also optimize individual partitions if needed.
Normally, CrateDB manages this automatically. Manual optimization is only recommended when:
You're doing sustained, high-volume inserts
You observe degraded performance due to too many segments
Summary: Quick Tuning Checklist
Disk
Use SSDs and monitor %iowait
with iostat
CPU
Avoid unnecessary full-text indexing; upgrade CPUs if bottlenecked
Replication
Use 1+ replicas in production, but expect a performance cost
Indexing
Turn off unnecessary column indexes
Primary Key
Use _id
instead of synthetic UUIDs
Translog
Consider ASYNC
mode for speed over durability
Refresh Interval
Increase refresh_interval
for better insert throughput
Statistics
Run ANALYZE
after major inserts
Optimization
Use OPTIMIZE
for large or frequently written tables
Overload Protection
Adjust settings when benchmarking inserts
Last updated