Circuit breaker
CrateDB’s circuit breakers protect cluster stability by preventing queries and background processes from exhausting a node’s memory. They do this by estimating the memory required for each operation and aborting the process before the JVM heap is overwhelmed.
What Is a Circuit Breaker?
A circuit breaker is a software safeguard designed to halt operations when resource usage crosses a critical threshold. The concept is similar to a household fuse box: if too many appliances draw power from a single line, the circuit trips to prevent damage. In a software system, the stressed resource might be memory, CPU, file descriptors, or even external services.
In CrateDB, the primary resource under pressure is RAM. Queries often run in parallel across many shards. A single complex aggregation or JOIN can allocate gigabytes of memory in milliseconds. CrateDB’s circuit breakers detect this and proactively stop the query by throwing a CircuitBreakingException, preventing an out-of-memory crash that could bring down the node.
How Circuit Breakers Work in CrateDB
Each query in CrateDB is executed as a series of logical operations. Before executing each step, CrateDB performs a best-effort estimate of the additional memory required. If the projected usage exceeds the configured limit for a circuit breaker, the operation is aborted immediately, and a CircuitBreakingException is returned.
CrateDB intentionally avoids exact memory accounting. Memory usage is complex to predict precisely, especially in distributed environments, so estimates are used instead.
This preemptive behavior protects the JVM from reaching an unrecoverable out-of-memory state that would destabilize the node or cluster.
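The estimate-then-check flow described above can be sketched in a few lines of Python. This is an illustrative toy, not CrateDB's implementation: the class names `MemoryBreaker` and `CircuitBreakingError`, the byte values, and the labels are all hypothetical, and the "estimate" is simply a number passed in by the caller.

```python
class CircuitBreakingError(Exception):
    """Raised when a reservation would push usage past the limit."""


class MemoryBreaker:
    """Toy breaker that tracks estimated bytes, not real allocations."""

    def __init__(self, name: str, limit_bytes: int) -> None:
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def reserve(self, estimated_bytes: int, label: str) -> None:
        # Check the projected total before the work is done.
        projected = self.used_bytes + estimated_bytes
        if projected > self.limit_bytes:
            raise CircuitBreakingError(
                f"Allocating {estimated_bytes} bytes for "
                f"'{self.name}: {label}' failed, breaker would use "
                f"{projected} bytes in total. Limit is {self.limit_bytes} bytes."
            )
        self.used_bytes = projected

    def release(self, estimated_bytes: int) -> None:
        # Hand the estimate back once the operation has finished.
        self.used_bytes = max(0, self.used_bytes - estimated_bytes)


breaker = MemoryBreaker("query", limit_bytes=1_000)
breaker.reserve(600, "collect")      # fits: 600 <= 1000
try:
    breaker.reserve(500, "merge")    # 600 + 500 > 1000, so this step is rejected
    tripped = False
except CircuitBreakingError:
    tripped = True
breaker.release(600)
```

The point of the sketch is the ordering: the limit is checked against the projected total before any work happens, so a rejected step leaves the tracked usage unchanged.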
Types of Circuit Breakers
CrateDB includes six types of circuit breakers, each guarding a specific component or resource:
query – Tracks memory used during query execution.
request – Covers memory used during request handling.
jobs_log – Tracks memory used when writing to the jobs log.
operations_log – Tracks memory used when writing to the operations log.
total – Also known as the parent breaker, it tracks overall memory usage across all other breakers.
accounting – Deprecated; will be removed in a future version.
The total breaker acts as a global safety net. Even if individual breakers are within limits, the total breaker can still trip if their combined usage exceeds the cluster-wide memory threshold.
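To make the parent/child relationship concrete, here is a minimal Python sketch under simplifying assumptions. The class names, the unitless limits, and the order of the two checks are hypothetical and chosen only for illustration; they do not reflect how CrateDB implements its breakers internally.

```python
class BreakerTripped(Exception):
    """Raised when a child or the parent limit would be exceeded."""


class ParentBreaker:
    """Sums the usage of every child breaker registered with it."""

    def __init__(self, limit):
        self.limit = limit
        self.children = []

    def total_used(self):
        return sum(child.used for child in self.children)


class ChildBreaker:
    def __init__(self, name, limit, parent):
        self.name = name
        self.limit = limit
        self.used = 0
        self.parent = parent
        parent.children.append(self)

    def reserve(self, estimate):
        # First check this breaker's own limit ...
        if self.used + estimate > self.limit:
            raise BreakerTripped(f"{self.name} breaker limit exceeded")
        # ... then the combined usage across all breakers.
        if self.parent.total_used() + estimate > self.parent.limit:
            raise BreakerTripped("total breaker limit exceeded")
        self.used += estimate


total = ParentBreaker(limit=100)
query = ChildBreaker("query", limit=60, parent=total)
request = ChildBreaker("request", limit=60, parent=total)

query.reserve(50)        # query: 50/60, total: 50/100
try:
    request.reserve(55)  # request alone would be 55/60, but total would be 105/100
    reason = None
except BreakerTripped as exc:
    reason = str(exc)
```

In this toy setup the `request` reservation is within its own limit, yet it is rejected because the combined usage would exceed the parent limit.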
For details on configuring breaker limits, see the cluster settings documentation.
Monitoring and Observability
To monitor circuit breaker behavior:
Use JMX metrics. Refer to the JMX Monitoring Guide, particularly the CircuitBreakers MXBean section.
For hosted deployments, follow the CrateDB Cloud monitoring documentation.
For self-managed clusters, refer to the on-prem monitoring guide, which includes setup instructions for collecting metrics and visualizing them in Grafana.
Exception Handling
When a circuit breaker is triggered, CrateDB returns a CircuitBreakingException. For example:
CircuitBreakingException[Allocating 2mb for 'query: mergeOnHandler' failed, breaker would use 976.4mb in total. Limit is 972.7mb. Either increase memory and limit, change the query or reduce concurrent query load]
Interpreting the Error
This exception indicates that allocating an estimated 2mb for the mergeOnHandler step would have raised the query breaker's tracked usage to 976.4mb, above its configured limit of 972.7mb (indices.breaker.query.limit). As a result, CrateDB aborted the operation to protect the node.
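If you collect these errors in application logs, the numbers can be pulled out of the message programmatically. The following Python sketch parses the example message shown above with a regular expression. The pattern is inferred from that single example, so treat the message layout as an assumption: it may differ between CrateDB versions, and the sketch assumes all three sizes use the same unit.

```python
import re

MESSAGE = (
    "CircuitBreakingException[Allocating 2mb for 'query: mergeOnHandler' failed, "
    "breaker would use 976.4mb in total. Limit is 972.7mb. Either increase memory "
    "and limit, change the query or reduce concurrent query load]"
)

# Pattern derived from the example message only (assumed layout).
PATTERN = re.compile(
    r"Allocating (?P<requested>[\d.]+)(?P<requested_unit>[a-z]+) "
    r"for '(?P<breaker>\w+): (?P<label>[^']+)' failed, "
    r"breaker would use (?P<projected>[\d.]+)(?P<projected_unit>[a-z]+) in total\. "
    r"Limit is (?P<limit>[\d.]+)(?P<limit_unit>[a-z]+)\."
)


def parse_breaker_message(message):
    """Return the parsed fields as a dict, or None if the layout differs."""
    match = PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("requested", "projected", "limit"):
        fields[key] = float(fields[key])
    return fields


info = parse_breaker_message(MESSAGE)
# How far the projected usage would have exceeded the limit, in the message's unit.
overshoot = round(info["projected"] - info["limit"], 1)
```

Here `info["breaker"]` identifies which breaker tripped and `info["label"]` names the operation that requested the memory, which is usually the first thing to check when deciding whether to tune the query or the limit.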
Immediate Actions
1. Optimize the Query
Poorly written or overly complex queries can trigger breakers. See the Performance tuning guide for practical tips.
2. Identify High-Memory Queries
You can identify the most memory-intensive active queries by running:
SELECT js.id,
       stmt,
       username,
       sum(used_bytes) AS sum_bytes
  FROM sys.operations op
  JOIN sys.jobs js ON op.job_id = js.id
 GROUP BY js.id, stmt, username
 ORDER BY sum_bytes DESC;
To inspect completed jobs and operations, use the sys.jobs_log and sys.operations_log system tables. Note that table access permissions apply.
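For scripted checks, the active-jobs query above can be sent to CrateDB's HTTP endpoint (`/_sql`), which accepts a JSON body with a `stmt` key and returns `cols` and `rows`. The sketch below uses only the Python standard library. The base URL `http://localhost:4200` and the helper names are assumptions for illustration; adjust the address and add authentication as your deployment requires.

```python
import json
import urllib.request

# Same statement as in the section above.
TOP_MEMORY_STMT = """
SELECT js.id, stmt, username, sum(used_bytes) AS sum_bytes
FROM sys.operations op
JOIN sys.jobs js ON op.job_id = js.id
GROUP BY js.id, stmt, username
ORDER BY sum_bytes DESC
"""


def build_request(stmt, base_url="http://localhost:4200"):
    """Build a POST request for the /_sql endpoint (base_url is an assumption)."""
    body = json.dumps({"stmt": stmt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rows_as_dicts(response_body):
    """Pair each row with the column names from the response."""
    return [dict(zip(response_body["cols"], row)) for row in response_body["rows"]]


def fetch_top_memory_jobs(base_url="http://localhost:4200"):
    """Run the statement against a live cluster; not called in this sketch."""
    with urllib.request.urlopen(build_request(TOP_MEMORY_STMT, base_url)) as resp:
        return rows_as_dicts(json.load(resp))
```

Calling `fetch_top_memory_jobs()` against a reachable node returns one dictionary per active job, ordered by estimated memory use.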
3. Scale the Cluster
If breakers continue to trip even after optimizing queries, consider scaling out your cluster to provide additional resources.
Summary
Circuit breakers are an essential safety mechanism in CrateDB, helping maintain performance and reliability under high memory pressure. By monitoring breaker metrics, tuning queries, and scaling resources as needed, you can avoid unexpected interruptions and ensure smooth cluster operation.