Testing Insert Performance
To test insert performance in CrateDB, we recommend using the cr8
command-line tool. This tool is purpose-built for benchmarking CrateDB and provides a simple yet powerful interface to run insert performance tests.
Why Start Small?
Insert performance generally scales linearly with cluster size. Therefore, you should begin by testing performance on a single node. Once you establish a baseline, you can expand your tests to include larger clusters and measure scalability.
Step 1: Prepare Your Test Data
Option 1: Use Real Data
If you already have a table containing real data, it’s best to test insert performance using a representative sample of that data.
Export the data using
COPY TO
:COPY my_table TO DIRECTORY '/tmp/crate';
You should see output similar to:
COPY OK, 1000000 rows affected (... sec)
Capture the table schema:
SHOW CREATE TABLE my_table;
Use the output to create a new test table (e.g.,
my_table_test
) that mirrors the production table.
Option 2: Generate Fake Data
If you don’t have real data available, you can generate synthetic data using cr8 insert-fake-data
.
Create a test table:
CREATE TABLE my_table_test (
id INTEGER,
name TEXT
);
Adapt the schema to match your production use case as closely as possible.
Insert fake data:
cr8 insert-fake-data \
--hosts localhost:4200 \
--table my_table_test \
--num-records 1000000
This command automatically detects the table schema and inserts 1 million records of appropriate fake data. You can change the --num-records
value to match your needs.
Export the inserted fake data:
COPY my_table_test TO DIRECTORY '/tmp/crate';
Clear the table (to prepare for performance testing):
DELETE FROM my_table_test;
Step 2: Run the Insert Performance Test
Use cr8 insert-json
to replay the exported JSON data and measure insert performance.
cat /tmp/crate/my_table_*.json | cr8 insert-json \
--hosts localhost:4200 \
--table my_table_test \
--bulk-size 1000 \
--concurrency 25
Example Output
Executing inserts: bulk_size=1000 concurrency=25
1000 requests [00:35, 27.84 requests/s]
Runtime (in ms):
mean: 103.556 ± 3.957
min/max: 11.587 → 521.434
Percentile:
50: 89.764 ± 63.851 (stdev)
95: 220.739
99.9: 475.568
Step 3: Analyze and Iterate
With performance test results in hand:
Adjust the
--bulk-size
and--concurrency
values to observe how performance changes.Plot results (e.g., throughput vs. concurrency) to visualize trends.
Experiment with different node setups or configurations (e.g., refresh intervals, indexing strategies, replicas).
Repeat the test after making changes to get a comprehensive performance profile.
Additional Tips
Avoid measuring insert and data generation at the same time; always generate data first, then insert.
When benchmarking, ensure the cluster is not under additional load from other operations.
Use
--dry-run
withcr8
commands to preview what the tool will do before executing a full test.
By following this methodical approach, you'll gain a deeper understanding of your system’s insert performance under different conditions and be better equipped to tune it for production workloads.
Last updated