Testing Insert Performance
To test insert performance in CrateDB, we recommend using the cr8 command-line tool. This tool is purpose-built for benchmarking CrateDB and provides a simple yet powerful interface to run insert performance tests.
Why Start Small?
Insert performance generally scales linearly with cluster size. Therefore, you should begin by testing performance on a single node. Once you establish a baseline, you can expand your tests to include larger clusters and measure scalability.
Step 1: Prepare Your Test Data
Option 1: Use Real Data
If you already have a table containing real data, it’s best to test insert performance using a representative sample of that data.
Export the data using
COPY TO:COPY my_table TO DIRECTORY '/tmp/crate';You should see output similar to:
COPY OK, 1000000 rows affected (... sec)Capture the table schema:
SHOW CREATE TABLE my_table;Use the output to create a new test table (e.g.,
my_table_test) that mirrors the production table.
Option 2: Generate Fake Data
If you don’t have real data available, you can generate synthetic data using cr8 insert-fake-data.
Create a test table:
Adapt the schema to match your production use case as closely as possible.
Insert fake data:
This command automatically detects the table schema and inserts 1 million records of appropriate fake data. You can change the --num-records value to match your needs.
Export the inserted fake data:
Clear the table (to prepare for performance testing):
Step 2: Run the Insert Performance Test
Use cr8 insert-json to replay the exported JSON data and measure insert performance.
The flags --bulk-size and --concurrency are optional. If omitted, they default to 1000 and 25, respectively.
Example Output
Step 3: Analyze and Iterate
With performance test results in hand:
Adjust the
--bulk-sizeand--concurrencyvalues to observe how performance changes.Plot results (e.g., throughput vs. concurrency) to visualize trends.
Experiment with different node setups or configurations (e.g., refresh intervals, indexing strategies, replicas).
Repeat the test after making changes to get a comprehensive performance profile.
Additional Tips
Avoid measuring insert and data generation at the same time; always generate data first, then insert.
When benchmarking, ensure the cluster is not under additional load from other operations.
Use
--dry-runwithcr8commands to preview what the tool will do before executing a full test.
By following this methodical approach, you'll gain a deeper understanding of your system’s insert performance under different conditions and be better equipped to tune it for production workloads.
Last updated

