Circuit breaker

CrateDB’s circuit breakers protect cluster stability by preventing queries and background processes from exhausting a node’s memory. They do this by estimating the memory required for each operation and aborting the operation before the JVM heap is overwhelmed.


What Is a Circuit Breaker?

A circuit breaker is a software safeguard designed to halt operations when resource usage crosses a critical threshold. The concept is similar to a household fuse box: if too many appliances draw power from a single line, the circuit trips to prevent damage. In a software system, the stressed resource might be memory, CPU, file descriptors, or even external services.

In CrateDB, the primary resource under pressure is RAM. Queries often run in parallel across many shards. A single complex aggregation or JOIN can allocate gigabytes of memory in milliseconds. CrateDB’s circuit breakers detect this and proactively stop the query by throwing a CircuitBreakingException, preventing an out-of-memory crash that could bring down the node.


How Circuit Breakers Work in CrateDB

Each query in CrateDB is executed as a series of logical operations. Before executing each step, CrateDB performs a best-effort estimate of the additional memory required. If the projected usage exceeds the configured limit for a circuit breaker, the operation is aborted immediately, and a CircuitBreakingException is returned.

This preemptive behavior protects the JVM from reaching an unrecoverable out-of-memory state that would destabilize the node or cluster.
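The estimate-then-check behavior described above can be illustrated with a small Python model. This is a toy sketch, not CrateDB's implementation: the MemoryBreaker class, its reserve and release methods, and the byte values are all hypothetical names and numbers chosen for illustration.

```python
class CircuitBreakingException(Exception):
    """Raised when a reservation would push usage past the limit."""


class MemoryBreaker:
    """Toy breaker that tracks estimated bytes against a fixed limit."""

    def __init__(self, name: str, limit_bytes: int) -> None:
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.tripped_count = 0

    def reserve(self, estimated_bytes: int, label: str) -> None:
        """Account for an operation's estimated memory before it runs."""
        projected = self.used_bytes + estimated_bytes
        if projected > self.limit_bytes:
            # Reject the step up front instead of letting the heap fill up.
            self.tripped_count += 1
            raise CircuitBreakingException(
                f"[{self.name}] {label} would use {projected} bytes, "
                f"limit is {self.limit_bytes} bytes"
            )
        self.used_bytes = projected

    def release(self, estimated_bytes: int) -> None:
        """Return previously reserved bytes once the operation finishes."""
        self.used_bytes = max(0, self.used_bytes - estimated_bytes)


breaker = MemoryBreaker("query", limit_bytes=1_000)
breaker.reserve(600, "scan")  # fits: 600 <= 1000
try:
    breaker.reserve(500, "aggregate")  # 1100 > 1000, so this step is rejected
    outcome = "completed"
except CircuitBreakingException:
    outcome = "aborted"
```

The rejected step leaves the accounted usage unchanged, so operations that are already running are unaffected by the trip.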


Types of Circuit Breakers

CrateDB includes six types of circuit breakers, each guarding a specific component or resource:

  • query – Tracks memory used during query execution.

  • request – Covers memory used during request handling.

  • jobs_log – Tracks memory used when writing to the jobs log.

  • operations_log – Tracks memory used when writing to the operations log.

  • total – Also known as the parent breaker, it tracks overall memory usage across all other breakers.

  • accounting – Deprecated; will be removed in a future version.

The total breaker acts as a global safety net. Even if every individual breaker is within its own limit, the total breaker can still trip if their combined usage exceeds the overall memory limit (indices.breaker.total.limit).
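The relationship between the total (parent) breaker and the individual breakers can be modeled as follows. This is a simplified sketch under assumed limits; the ParentBreaker and ChildBreaker classes are hypothetical and do not mirror CrateDB's internal classes.

```python
class CircuitBreakingException(Exception):
    """Raised when either the child or the parent limit would be exceeded."""


class ParentBreaker:
    """Tracks the combined usage of all registered child breakers."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.children = []

    def total_used(self) -> int:
        return sum(child.used_bytes for child in self.children)


class ChildBreaker:
    """A breaker with its own limit that also answers to a parent."""

    def __init__(self, name: str, limit_bytes: int, parent: ParentBreaker) -> None:
        self.name = name
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.parent = parent
        parent.children.append(self)

    def reserve(self, estimated_bytes: int) -> None:
        # First check this breaker's own limit.
        if self.used_bytes + estimated_bytes > self.limit_bytes:
            raise CircuitBreakingException(f"[{self.name}] limit exceeded")
        # Then check the combined usage across all breakers.
        if self.parent.total_used() + estimated_bytes > self.parent.limit_bytes:
            raise CircuitBreakingException("[parent] combined limit exceeded")
        self.used_bytes += estimated_bytes


parent = ParentBreaker(limit_bytes=1_000)
query = ChildBreaker("query", limit_bytes=600, parent=parent)
request = ChildBreaker("request", limit_bytes=600, parent=parent)

query.reserve(500)  # query: 500 of 600, total: 500 of 1000
message = ""
try:
    request.reserve(550)  # request alone is under 600, but the total would be 1050
except CircuitBreakingException as exc:
    message = str(exc)
```

Here neither child exceeds its own limit, yet the second reservation is refused because the sum would pass the parent limit.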

For details on configuring breaker limits, see the cluster settings documentation.
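As a rough illustration of what a percentage limit means in practice, the following Python sketch resolves a percentage against a heap size and builds a statement to change a limit. The helper names are hypothetical, the 60% value and 8 GiB heap are example inputs rather than documented defaults, and the SET GLOBAL TRANSIENT form is an assumption about the statement syntax; check the cluster settings documentation for the authoritative syntax and for whether a given setting can be changed at runtime.

```python
def limit_in_bytes(limit: str, heap_bytes: int) -> int:
    """Resolve a percentage limit such as '60%' against the heap size."""
    if not limit.endswith("%"):
        raise ValueError("this sketch only handles percentage limits")
    percent = int(limit[:-1])
    if not 0 <= percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return heap_bytes * percent // 100


def set_limit_statement(setting: str, limit: str) -> str:
    """Build a statement that changes a breaker limit (assumed syntax)."""
    return f"SET GLOBAL TRANSIENT \"{setting}\" = '{limit}'"


heap = 8 * 1024**3  # example: a node with an 8 GiB heap
query_limit_bytes = limit_in_bytes("60%", heap)
statement = set_limit_statement("indices.breaker.query.limit", "60%")
```

Because the limit is resolved against the heap, the same percentage yields a different absolute ceiling on nodes with different heap sizes.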


Monitoring and Observability

To monitor circuit breaker behavior:


Exception Handling

When a circuit breaker is triggered, CrateDB returns a CircuitBreakingException. For example:

Interpreting the Error

This exception indicates that the estimated memory for mergeOnHandler exceeded the configured limit (indices.breaker.query.limit). As a result, CrateDB aborted the operation to protect the node.

Immediate Actions

1. Optimize the Query

Poorly written or overly complex queries can trigger breakers. See the Performance tuning guide for practical tips.

2. Identify High-Memory Queries

To find the most memory-intensive active queries, join the sys.jobs and sys.operations system tables and sum the memory tracked per job.
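A sketch of such a lookup, wrapped in a small Python helper that accepts any DB-API style cursor. The column names used here (id, stmt, job_id, used_bytes) are assumptions about the sys.jobs and sys.operations schemas and should be checked against the system tables reference for your CrateDB version; top_memory_jobs is a hypothetical helper name.

```python
# Sums the memory tracked per active job and lists the largest consumers first.
# Column names are assumptions; verify them against your CrateDB version.
TOP_MEMORY_JOBS_SQL = """
SELECT j.id, j.stmt, sum(o.used_bytes) AS total_used_bytes
FROM sys.jobs AS j
JOIN sys.operations AS o ON o.job_id = j.id
GROUP BY j.id, j.stmt
ORDER BY total_used_bytes DESC
LIMIT 10
"""


def top_memory_jobs(cursor):
    """Run the lookup on a DB-API cursor and return all result rows."""
    cursor.execute(TOP_MEMORY_JOBS_SQL)
    return cursor.fetchall()
```

With the crate Python client, a cursor would typically come from something like client.connect(...).cursor(); the connection details depend on your deployment.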

To inspect completed jobs and operations, use the sys.jobs_log and sys.operations_log system tables. Note that table access permissions apply.

3. Scale the Cluster

If breakers continue to trip even after optimizing queries, consider scaling out your cluster to provide additional resources.

Similar exceptions may occur for other breaker types like [request], [parent], or [jobs_log]. A [parent] exception means multiple queries or background tasks exceeded the combined total memory limit (indices.breaker.total.limit).
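When handling these errors in client code, it can help to map the bracketed breaker name to the setting that governs it. The following Python sketch assumes the error message contains the breaker name in square brackets, as in the examples above. The query and parent mappings come from this page; the request, jobs_log, and operations_log setting names are given from memory and should be verified against the cluster settings documentation. The function name breaker_from_message is hypothetical.

```python
import re

# Breaker name as it appears in brackets -> setting that controls its limit.
# The last three setting names are assumptions; verify them before relying on them.
BREAKER_SETTINGS = {
    "query": "indices.breaker.query.limit",
    "parent": "indices.breaker.total.limit",
    "request": "indices.breaker.request.limit",
    "jobs_log": "stats.breaker.log.jobs.limit",
    "operations_log": "stats.breaker.log.operations.limit",
}

# Match only known breaker names so other bracketed values are ignored.
_BREAKER_PATTERN = re.compile(r"\[(" + "|".join(BREAKER_SETTINGS) + r")\]")


def breaker_from_message(message: str):
    """Return (breaker_name, setting_name), or None if no breaker is named."""
    match = _BREAKER_PATTERN.search(message)
    if match is None:
        return None
    name = match.group(1)
    return name, BREAKER_SETTINGS[name]
```

A result naming the parent breaker points at overall load rather than a single statement, which usually changes what to investigate first.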


Summary

Circuit breakers are an essential safety mechanism in CrateDB, helping maintain performance and reliability under high memory pressure. By monitoring breaker metrics, tuning queries, and scaling resources as needed, you can avoid unexpected interruptions and ensure smooth cluster operation.
