Cluster-wide settings

You can inspect the current values of all CrateDB cluster-wide settings at runtime by querying the settings column of the sys.cluster table:

SELECT settings FROM sys.cluster;

Settings documented as runtime-changeable can be modified on a running cluster with SET GLOBAL. Non-runtime settings must be configured identically on every node before startup for the cluster to behave correctly.
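
Runtime-changeable settings are modified with SET GLOBAL and restored with RESET GLOBAL. A minimal sketch (the setting and value are only illustrative):

-- persist the change across full cluster restarts
SET GLOBAL PERSISTENT "stats.jobs_log_size" = 20000;

-- or apply it only until the next full cluster restart
SET GLOBAL TRANSIENT "stats.jobs_log_size" = 20000;

-- restore the default
RESET GLOBAL "stats.jobs_log_size";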


Table of Contents

  1. Non-runtime Settings

  2. Statistics Collection

  3. Shard Limits

  4. Usage Data Collector (UDC)

  5. Graceful Stop Behavior

  6. Bulk Operations

  7. Cluster Discovery & Setup

    • Unicast, DNS, EC2

  8. Routing & Shard Allocation

  9. Shard Balancing

  10. Attribute-Based Allocation

  11. Disk-Based Allocation

  12. Recovery Limits

  13. Memory Management & Circuit Breakers

  14. Thread Pools & Overload Protection

  15. Metadata / Gateway Settings

  16. Logical Replication Settings


Non-runtime Settings

  • Must be set in each node’s configuration file before startup, with identical values on every node.

  • Crucial: inconsistent values across nodes lead to cluster malfunction.


🔢 Statistics Collection

| Setting | Default | Runtime | Description |
| --- | --- | --- | --- |
| stats.enabled | true | Yes | Enables job/operation statistics collection. May incur slight performance overhead. |
| stats.jobs_log_size | 10000 | Yes | Max number of job log entries per node. 0 disables job logging. |
| stats.jobs_log_expiration | 0s (disabled) | Yes | Time-based eviction of job log entries. Overrides the size-based limit. |
| stats.jobs_log_filter | true | Yes | Boolean expression that filters which jobs are logged, e.g. ended - started > '5 minutes'::interval (see the example below). |
| stats.jobs_log_persistent_filter | false | Yes | Filters jobs written into the CrateDB log file. Should be scoped carefully to avoid blocking I/O. |
| stats.operations_log_size | 10000 | Yes | Max operation log entries per node. 0 disables operations logging. |
| stats.operations_log_expiration | 0s (disabled) | Yes | Time-based expiry for operation log entries. |
| stats.service.interval | 24h | Yes | Frequency at which table statistics used for query planning are refreshed. |
| stats.service.max_bytes_per_sec | 40mb | Yes | Throttles ANALYZE traffic per node when gathering statistics. 0 disables throttling. |
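
For example, to log only long-running jobs, stats.jobs_log_filter can be set with a dollar-quoted expression (the 5-minute threshold is only illustrative):

SET GLOBAL TRANSIENT "stats.jobs_log_filter" = $$ended - started > '5 minutes'::interval$$;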


🛡️ Shard Limits

  • cluster.max_shards_per_node: Default 1000, runtime changeable. Caps primary + replica shards per node and thereby the overall cluster maximum (limit × number of nodes). Operations that would create shards beyond the limit are rejected (see the example below).

  • cluster.routing.allocation.total_shards_per_node: Default -1 (unlimited), runtime. Hard cap on shards per node; nodes at the limit are excluded from further allocation.
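
For example, to raise the per-node cap persistently (the value is only illustrative):

SET GLOBAL PERSISTENT "cluster.max_shards_per_node" = 2000;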


UDC (Usage Data Collector)

  • Read-only settings (cannot be changed at runtime):

| Setting | Default | Purpose |
| --- | --- | --- |
| udc.enabled | true | Toggle UDC on/off |
| udc.initial_delay | 10m | Delay before the first ping after startup |
| udc.interval | 24h | Ping interval |
| udc.url | https://udc.crate.io/ | Ping destination URL |


🚦 Graceful Stop Behavior

  • cluster.graceful_stop.min_availability: runtime, values none | primaries | full.

  • cluster.graceful_stop.timeout: runtime, maximum time to wait for shards to be reallocated before giving up.

  • cluster.graceful_stop.force: runtime; if enabled, the node shuts down even when the timeout expires before reallocation has finished.

Note: These settings are ignored if the cluster has only one node. An example configuration follows below.
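
A decommissioning setup might be configured like this; several settings can be changed in one statement (the values are only illustrative):

SET GLOBAL TRANSIENT
  "cluster.graceful_stop.min_availability" = 'primaries',
  "cluster.graceful_stop.timeout" = '2h',
  "cluster.graceful_stop.force" = false;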


🧱 Bulk Operations

  • bulk.request_timeout: Default 1m, runtime. Timeout for the internal requests issued by large COPY, INSERT, or UPDATE operations (see the example below).
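
For example, to give long-running bulk jobs more headroom and later restore the default (the 5m value is only illustrative):

SET GLOBAL TRANSIENT "bulk.request_timeout" = '5m';
RESET GLOBAL "bulk.request_timeout";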


🧭 Cluster Discovery & Setup

  • Non-runtime only:

    • discovery.seed_hosts: List of host[:port] addresses of master-eligible nodes used to seed discovery.

    • cluster.initial_master_nodes: Required for the first master election when the cluster starts up for the first time.

    • discovery.type: zen (default, multi-node) or single-node; single-node disables auto-joining.

  • Discovery Providers:

    • discovery.seed_providers: srv or ec2.

    • DNS: discovery.srv.query, discovery.srv.resolver.

    • EC2: discovery.ec2.* settings including access key, filtering by security group/tags, host type, endpoint.


🚚 Routing & Shard Allocation

| Setting | Default | Runtime | Behavior |
| --- | --- | --- | --- |
| cluster.routing.allocation.enable | all | Yes | Enables or disables shard allocation (all, primaries, new_primaries, none). |
| cluster.routing.rebalance.enable | all | Yes | Controls which shard types (all, primaries, replicas, none) may be rebalanced. |
| cluster.routing.allocation.allow_rebalance | indices_all_active | Yes | Conditions under which rebalancing is allowed to start. |
| cluster.routing.allocation.cluster_concurrent_rebalance | 2 | Yes | Number of concurrent shard rebalances allowed cluster-wide. |
| cluster.routing.allocation.node_initial_primaries_recoveries | 4 | Yes | Concurrent local primary recoveries per node. |
| cluster.routing.allocation.node_concurrent_recoveries | 2 | Yes | General recovery concurrency per node. |
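
A common maintenance pattern is to restrict allocation before restarting a node and re-enable it afterwards (a sketch using the values listed above):

-- only allow allocation of primaries for newly created tables while the node is down
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'new_primaries';

-- back to normal once the node has rejoined
SET GLOBAL TRANSIENT "cluster.routing.allocation.enable" = 'all';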


⚖️ Shard Balancing

  • cluster.routing.allocation.balance.shard, cluster.routing.allocation.balance.index, cluster.routing.allocation.balance.threshold: runtime floats controlling how aggressively the cluster balances shards across nodes and indices (see the example below).
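
For instance, raising the threshold makes the balancer less eager to move shards (the value is only illustrative):

SET GLOBAL TRANSIENT "cluster.routing.allocation.balance.threshold" = 2.0;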


🌍 Attribute-Based Allocation

  • Non-runtime:

    • cluster.routing.allocation.awareness.attributes

    • cluster.routing.allocation.awareness.force.*.values

  • Runtime:

    • cluster.routing.allocation.include.*, cluster.routing.allocation.exclude.*, cluster.routing.allocation.require.*: filter the nodes eligible for shard allocation based on custom node attributes (see the example below).
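
Assuming nodes were started with a custom attribute such as zone (the attribute name and value here are hypothetical), allocation could be restricted like this:

-- only allocate shards on nodes whose (hypothetical) zone attribute equals 'us-east-1'
SET GLOBAL TRANSIENT "cluster.routing.allocation.require.zone" = 'us-east-1';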


💾 Disk-Based Allocation

  • Runtime shard-allocation controls:

    • cluster.routing.allocation.disk.threshold_enabled

    • cluster.routing.allocation.disk.watermark.low (default 85%)

    • cluster.routing.allocation.disk.watermark.high (90%)

    • cluster.routing.allocation.disk.watermark.flood_stage (95%; triggers a read-only block)

    • cluster.routing.allocation.total_shards_per_node

  • Watermarks accept either percentages or absolute byte values, but not a mix of both; disk usage information is refreshed every 30s by default (cluster.info.update.interval). An example of adjusting the watermarks follows below.
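
To tighten the watermarks at runtime (the percentages are only illustrative):

SET GLOBAL TRANSIENT
  "cluster.routing.allocation.disk.watermark.low" = '80%',
  "cluster.routing.allocation.disk.watermark.high" = '85%',
  "cluster.routing.allocation.disk.watermark.flood_stage" = '90%';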


🔁 Recovery Limits

Runtime-changeable recovery settings:

  • indices.recovery.max_bytes_per_sec (default 40mb)

  • indices.recovery.retry_delay_state_sync, retry_delay_network

  • indices.recovery.internal_action_timeout, internal_action_long_timeout

  • indices.recovery.recovery_activity_timeout

  • indices.recovery.max_concurrent_file_chunks
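
For example, to let recoveries use more bandwidth during a planned rebuild (the value is only illustrative):

SET GLOBAL TRANSIENT "indices.recovery.max_bytes_per_sec" = '100mb';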


🧠 Memory & Circuit Breakers

  • memory.allocation.type: Default on-heap, runtime. off-heap is experimental.

  • memory.operation_limit: Default 0 (unlimited), runtime. Cluster-wide default for the maximum amount of memory, in bytes, that a single operation may consume. Changes affect new sessions only.

Circuit breakers (limits are expressed as a percentage of the JVM heap):

  • indices.breaker.query.limit (default 60%)

  • indices.breaker.request.limit (60%)

  • indices.breaker.accounting.limit (deprecated)

  • stats.breaker.log.jobs.limit (5%)

  • stats.breaker.log.operations.limit (5%)

  • indices.breaker.total.limit (95%)
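
For example, to give queries a larger share of the heap before the query circuit breaker trips (the percentage is only illustrative):

SET GLOBAL TRANSIENT "indices.breaker.query.limit" = '70%';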


⚙️ Thread Pools & Overload Protection

  • Thread pools: write, search, get, refresh, generic, logical_replication.

    • Each pool has a type, fixed or scaling, configured via thread_pool.<name>.type.

    • Fixed pools additionally expose thread_pool.<name>.size and thread_pool.<name>.queue_size.

  • Overload Protection settings (runtime; see the example below):

    • overload_protection.dml.initial_concurrency

    • overload_protection.dml.min_concurrency, overload_protection.dml.max_concurrency, overload_protection.dml.queue_size
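
The overload-protection limits can be adjusted at runtime, e.g. (the value is only illustrative):

SET GLOBAL TRANSIENT "overload_protection.dml.max_concurrency" = 20;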


🧾 Metadata / Gateway

  • cluster.info.update.interval: Runtime; how often cluster-wide disk usage information is collected (default 30s).

  • Gateway settings (non-runtime):

    • gateway.expected_data_nodes, gateway.recover_after_data_nodes

    • gateway.recover_after_time

    • gateway.expected_nodes and gateway.recover_after_nodes (deprecated in favor of the data_nodes variants)
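
cluster.info.update.interval can be changed on a running cluster, for example to refresh disk usage information more frequently (the value is only illustrative):

SET GLOBAL TRANSIENT "cluster.info.update.interval" = '10s';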


🔁 Logical Replication Settings

Runtime settings:

  • replication.logical.ops_batch_size

  • replication.logical.reads_poll_duration

  • replication.logical.recovery.chunk_size

  • replication.logical.recovery.max_concurrent_file_chunks
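
These can be tuned while subscriptions are active, e.g. (the value is only illustrative):

SET GLOBAL TRANSIENT "replication.logical.ops_batch_size" = 20000;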


✅ Summary

  • Inspect current settings at runtime via the settings column of sys.cluster.

  • Only runtime-changeable settings can be updated (via SET GLOBAL) while the cluster is running.

  • Non-runtime settings must be identical across nodes and set before node startup.

  • Be careful with log filters, logging sizes, and memory limits, especially in production environments.
