Cluster sizing

CrateDB is a distributed SQL database designed for handling high-throughput ingest and analytical queries at scale. Whether you're processing time-series data, IoT sensor streams, or log analytics, it's essential to:

Estimate the cluster size based on data volume and expected ingest.
Measure ingest performance to ensure your cluster can handle data at the required rate.
Benchmark query performance under concurrent workloads to optimize response times.
Fine-tune cluster resources based on real-world usage patterns.

The first step is to size your cluster based on raw data volume and expected ingest rates. This serves as the starting point for deployment.
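As a rough illustration of that first step, the arithmetic can be sketched in a few lines of Python. This is a minimal back-of-the-envelope sketch, not an official CrateDB sizing formula: the function name, the parameters, and all default values (overhead factor, usable disk per node, target disk utilization) are assumptions chosen for the example and should be replaced with figures measured on your own data.

```python
import math


def estimate_cluster_size(
    daily_ingest_gb: float,
    retention_days: int,
    replicas: int = 1,                      # assumed: one replica per shard
    overhead_factor: float = 1.2,           # assumed: index/metadata overhead
    usable_disk_per_node_gb: float = 1000,  # assumed: disk available per node
    max_disk_utilization: float = 0.7,      # assumed: headroom target
) -> dict:
    """Return a rough starting-point estimate for storage and node count."""
    if daily_ingest_gb < 0 or retention_days < 0 or replicas < 0:
        raise ValueError("volumes, retention and replicas must be non-negative")
    if not 0 < max_disk_utilization <= 1:
        raise ValueError("max_disk_utilization must be in (0, 1]")

    # Raw volume kept in the cluster at steady state.
    raw_gb = daily_ingest_gb * retention_days
    # Each replica stores one additional full copy of the primary data.
    total_gb = raw_gb * (1 + replicas) * overhead_factor
    # Only part of each disk is planned for data, leaving headroom.
    capacity_per_node_gb = usable_disk_per_node_gb * max_disk_utilization
    nodes = max(1, math.ceil(total_gb / capacity_per_node_gb))

    return {"raw_gb": raw_gb, "total_gb": total_gb, "nodes": nodes}


if __name__ == "__main__":
    # Hypothetical workload: 50 GB/day kept for 90 days.
    print(estimate_cluster_size(daily_ingest_gb=50, retention_days=90))
```

The result is only a starting point for deployment. Actual on-disk size depends on the schema, indexing, and compression, so the estimate should be checked against a sample of real data once the cluster is running.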

Once the initial cluster is set up, it's crucial to test its behavior under different loads. This helps optimize resource allocation and inform scaling decisions.

In this post, we'll walk through two key benchmarking methods:

Ingest Benchmarking using nodeIngestBench to measure and optimize write performance.
Query Benchmarking using Locust to evaluate read performance under concurrent workloads.

Beyond just cluster size, several other factors influence CrateDB's performance, including:

Shard count – Too few can lead to hotspots, and too many can cause overhead.
Replication factor – Balances data redundancy vs. write performance.
Batch size – Affects ingest efficiency and query speed.
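To make these three factors more concrete, the following Python sketch shows how they can be reasoned about numerically before running a benchmark. The helper names are hypothetical and the example numbers are placeholders, not CrateDB recommendations: the sketch only computes the average shard size for a given shard count, the total number of shard copies implied by a replication factor, and groups rows into fixed-size batches of the kind a bulk insert would send.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def average_shard_size_gb(table_size_gb: float, num_shards: int) -> float:
    """Average primary shard size; very large values hint at hotspots."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    return table_size_gb / num_shards


def total_shard_copies(num_shards: int, replicas: int) -> int:
    """Primaries plus replicas; every copy adds storage and write work."""
    if num_shards <= 0 or replicas < 0:
        raise ValueError("invalid shard or replica count")
    return num_shards * (1 + replicas)


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield rows in lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Hypothetical table: 600 GB spread over 12 shards with 1 replica.
    print(average_shard_size_gb(600, 12))
    print(total_shard_copies(12, 1))
    # Hypothetical ingest: 10 rows sent in batches of 4.
    for batch in batched(range(10), 4):
        print(len(batch))
```

Varying `num_shards`, `replicas`, and `batch_size` in a sketch like this gives candidate configurations to compare in the ingest and query benchmarks, which remain the only reliable way to see their real effect.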

By systematically sizing, benchmarking, and fine-tuning your CrateDB cluster, you can ensure that it meets your workload demands efficiently and cost-effectively.

