Components of a CrateDB node
To understand how a CrateDB cluster works, it helps to first examine the core components of an individual CrateDB node.
Figure 1: Multiple interconnected CrateDB nodes forming a single database cluster. Each node has the same components.
In a CrateDB cluster, all nodes are equal peers, each running the same set of components. These components:
Interact with each other within the node,
Communicate with the same components on other nodes,
And interface with external clients.
Each node consists of the following four main components listed below.
Table of Contents
SQL Handler
The SQL Handler is responsible for processing incoming client queries. Specifically, it:
Receives client requests,
Parses and analyzes the SQL statements,
Creates an execution plan based on the parsed query (abstract syntax tree).
This is the only component that interfaces directly with clients and external systems.
CrateDB supports multiple client communication protocols:
HTTP
PostgreSQL Wire Protocol
Binary Transport Protocol (CrateDB’s internal protocol)
A typical client request includes a SQL query and its arguments, which the SQL Handler processes and prepares for execution.
Job Execution Service
The Job Execution Service manages the execution of the plan created by the SQL Handler. A job includes several execution phases and operations, which are:
Defined in the execution plan,
Distributed across involved nodes (local and/or remote) via the Binary Transport Protocol.
Each job tracks the IDs of its constituent operations. This allows CrateDB to monitor the execution of distributed queries and manage them—such as terminating them if needed.
Cluster State Service
The Cluster State Service handles everything related to cluster coordination and discovery. It has three primary responsibilities:
Cluster state management – Keeping nodes synchronized with the current state of the cluster.
Master node election – Determining which node acts as the cluster’s master.
Node discovery – Managing the joining and communication of nodes (see Multi-node setup: Clusters).
This service uses the Binary Transport Protocol for internal communication between nodes.
Data Storage
The Data Storage component handles persistent storage and retrieval of data based on the execution plan.
Key characteristics:
Data is stored in shards, with each table split across one or more nodes.
Each shard is a Lucene index physically stored on the file system.
Reads and writes operate at the shard level, allowing for parallelized, distributed performance.
Last updated