Testing Insert Performance

To test insert performance in CrateDB, we recommend using the cr8 command-line tool. This tool is purpose-built for benchmarking CrateDB and provides a simple yet powerful interface to run insert performance tests.

Why Start Small?

Insert performance generally scales linearly with cluster size. Therefore, you should begin by testing performance on a single node. Once you establish a baseline, you can expand your tests to include larger clusters and measure scalability.

Step 1: Prepare Your Test Data

Option 1: Use Real Data

If you already have a table containing real data, it’s best to test insert performance using a representative sample of that data.

Export the data using COPY TO:

COPY my_table TO DIRECTORY '/tmp/crate';

You should see output similar to:

COPY OK, 1000000 rows affected (... sec)

Capture the table schema:
```
SHOW CREATE TABLE my_table;
```
Use the output to create a new test table (e.g., my_table_test) that mirrors the production table.

Option 2: Generate Fake Data

If you don’t have real data available, you can generate synthetic data using cr8 insert-fake-data.

Create a test table:

CREATE TABLE my_table_test (
  id INTEGER,
  name TEXT
);

Adapt the schema to match your production use case as closely as possible.

Insert fake data:

cr8 insert-fake-data \
  --hosts localhost:4200 \
  --table my_table_test \
  --num-records 1000000

This command automatically detects the table schema and inserts 1 million records of appropriate fake data. You can change the --num-records value to match your needs.

Export the inserted fake data:

COPY my_table_test TO DIRECTORY '/tmp/crate';

Clear the table (to prepare for performance testing):

DELETE FROM my_table_test;

Step 2: Run the Insert Performance Test

Use cr8 insert-json to replay the exported JSON data and measure insert performance.

cat /tmp/crate/my_table_*.json | cr8 insert-json \
  --hosts localhost:4200 \
  --table my_table_test \
  --bulk-size 1000 \
  --concurrency 25

The flags --bulk-size and --concurrency are optional. If omitted, they default to 1000 and 25, respectively.

Example Output

Executing inserts: bulk_size=1000 concurrency=25
1000 requests [00:35, 27.84 requests/s]
Runtime (in ms):
    mean:    103.556 ± 3.957
    min/max: 11.587 → 521.434
Percentile:
    50:   89.764 ± 63.851 (stdev)
    95:   220.739
    99.9: 475.568

Step 3: Analyze and Iterate

With performance test results in hand:

Adjust the --bulk-size and --concurrency values to observe how performance changes.
Plot results (e.g., throughput vs. concurrency) to visualize trends.
Experiment with different node setups or configurations (e.g., refresh intervals, indexing strategies, replicas).
Repeat the test after making changes to get a comprehensive performance profile.

Additional Tips

Avoid measuring insert and data generation at the same time; always generate data first, then insert.
When benchmarking, ensure the cluster is not under additional load from other operations.
Use --dry-run with cr8 commands to preview what the tool will do before executing a full test.

By following this methodical approach, you'll gain a deeper understanding of your system’s insert performance under different conditions and be better equipped to tune it for production workloads.

PreviousUpdates NextOptimizing SELECTs

Last updated 4 months ago

Good evening

Why Start Small?

Step 1: Prepare Your Test Data

Option 1: Use Real Data

Option 2: Generate Fake Data

Step 2: Run the Insert Performance Test

Example Output

Step 3: Analyze and Iterate

Additional Tips