Reference architectures

These reference architectures illustrate how CrateDB can be deployed and integrated in real-world scenarios. They provide guidance on scalability, reliability, data ingestion, and analytics across diverse domains.

1. Real-Time IoT Analytics (Cloud-Native Deployment)

Use Case

Monitoring and analyzing data from a high-volume stream of IoT sensors, such as industrial machinery, smart city infrastructure, or mobility fleets.

Architecture Components

  • CrateDB Cloud (multi-node cluster)

  • Message Broker: Kafka, MQTT, or Azure Event Hubs

  • Ingestion Layer: Kafka Connect / Fluent Bit

  • Real-Time Analytics: CrateDB’s SQL interface + dashboards (e.g., Grafana)

  • Downstream Integration: REST APIs or CDC tools (Debezium)

  • Security: API keys, TLS, access roles via CrateDB Cloud
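
The ingestion step can be illustrated with a minimal Python sketch that groups sensor readings into one bulk request body for CrateDB's HTTP endpoint (`/_sql`, which accepts JSON with `stmt` and `bulk_args`). The table name `sensor_readings`, its columns, and the sample readings are assumptions made for illustration. In the architecture above, a connector such as Kafka Connect would normally do this work instead of hand-written code.

```python
import json

# Hypothetical table and columns, chosen only for this illustration.
INSERT_STMT = (
    "INSERT INTO sensor_readings (device_id, ts, temperature) "
    "VALUES (?, ?, ?)"
)


def build_bulk_payload(readings):
    """Turn a list of reading dicts into one bulk-insert request body."""
    bulk_args = [
        [r["device_id"], r["ts"], r["temperature"]]
        for r in readings
    ]
    return {"stmt": INSERT_STMT, "bulk_args": bulk_args}


# Sample readings (made up); ts is epoch milliseconds.
readings = [
    {"device_id": "pump-1", "ts": 1700000000000, "temperature": 71.5},
    {"device_id": "pump-2", "ts": 1700000000500, "temperature": 69.0},
]
payload = build_bulk_payload(readings)
body = json.dumps(payload)  # JSON body to POST to the cluster's /_sql endpoint
```

Sending many rows per request rather than one statement per reading keeps round trips low, which matters most when the sensor stream is high-volume.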

2. Edge-to-Cloud Data Pipeline

Use Case

Processing time-series and event data at the edge with periodic sync to a central CrateDB Cloud cluster for advanced analytics.

Architecture Components

  • Edge Node: Lightweight single-node CrateDB or TimescaleDB instance for local buffering

  • Sync Layer: Scheduled jobs or custom ETL using Python, Kafka, or Debezium

  • Central CrateDB Cloud Cluster: Stores harmonized data for analytics

  • Analytics/AI Layer: Python, Jupyter, or DB-native aggregations

  • Security: Encrypted transfer, RBAC for multi-tenant access
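
One common way to implement the periodic sync is a watermark: each run ships only rows newer than the last timestamp that was confirmed by the central cluster. The sketch below shows that selection logic in plain Python. The row shape `(ts, payload)`, the function name, and the batch size are hypothetical, and the actual transfer to CrateDB Cloud is left out.

```python
def next_sync_batch(rows, last_synced_ts, batch_size=1000):
    """Return (batch, new_watermark) for rows newer than the watermark.

    rows: iterable of (ts, payload) tuples read from the edge buffer.
    """
    pending = sorted(
        (r for r in rows if r[0] > last_synced_ts),
        key=lambda r: r[0],
    )
    batch = pending[:batch_size]
    # Keep the old watermark when there is nothing new to send.
    new_watermark = batch[-1][0] if batch else last_synced_ts
    return batch, new_watermark


# Made-up edge buffer contents for illustration.
edge_rows = [(100, "a"), (105, "b"), (110, "c"), (120, "d")]
batch, watermark = next_sync_batch(edge_rows, last_synced_ts=100, batch_size=2)
```

In practice the watermark should be persisted only after the central cluster acknowledges the write, so that a failed transfer is retried on the next run. Rows that share a timestamp at a batch boundary need a tie-breaker (for example a sequence number) to avoid being skipped.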

3. Unified Observability Platform

Use Case

Centralizing logs, metrics, and events for DevOps and infrastructure observability.

Architecture Components

  • CrateDB Cloud or Self-Hosted Cluster

  • Data Sources: Prometheus exporters, Fluent Bit, Logstash, OpenTelemetry

  • Ingestion Pipeline: Kafka → CrateDB, with optional transformation layer (e.g., Apache Flink)

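  • Visualization: Grafana dashboards (connected through Grafana's PostgreSQL data source, which CrateDB supports via the PostgreSQL wire protocol)

  • Storage Strategy: Tables partitioned by time and source, with retention (TTL) enforced by dropping expired partitions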

  • Access Control: SQL RBAC, token-based access for dashboards
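
The storage strategy can be sketched as a partitioned table plus a scheduled retention job. The example below is a hedged illustration: the `logs` table, its columns, and the 7-day retention window are assumptions, and the Python code only builds the statements rather than executing them. The idea is that a `DELETE` filtering on the partition column alone lets expired data be removed partition by partition.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical table; `day` is a generated column used as a partition key.
CREATE_LOGS = """
CREATE TABLE IF NOT EXISTS logs (
    ts TIMESTAMP WITH TIME ZONE NOT NULL,
    source TEXT,
    message TEXT,
    day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts)
) PARTITIONED BY (day, source)
"""


def retention_statement(now, keep_days):
    """Build a DELETE that targets partitions older than keep_days."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    cutoff = midnight - timedelta(days=keep_days)
    cutoff_ms = int(cutoff.timestamp() * 1000)  # epoch milliseconds
    return "DELETE FROM logs WHERE day < ?", [cutoff_ms]


now = datetime(2024, 5, 10, 14, 30, tzinfo=timezone.utc)
stmt, args = retention_statement(now, keep_days=7)
```

A job like this would typically run once a day from a scheduler (cron, a Kubernetes CronJob, or similar) and pass `stmt` and `args` to whichever CrateDB client the platform already uses.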

4. AI/ML Feature Store Backed by CrateDB

Use Case

Storing and serving high-volume, real-time feature data for machine learning pipelines.

Architecture Components

  • CrateDB Cluster: Stores raw data + materialized features

  • Data Pipeline: Apache Spark, dbt, or Python ETL for feature computation

  • Serving Layer: Low-latency SQL queries against CrateDB, exposed to consumers through REST or gRPC APIs

  • Model Training: Python (Pandas, Scikit-learn), notebooks, or Dataiku

  • Versioning & Time Travel: Implemented via snapshot tables or time-based partitions
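
Time travel in a feature store usually means a point-in-time lookup: return the feature value that was valid at a given moment, never a later one, so that training data does not leak future information. The following sketch shows that rule in plain Python. The history layout and function name are hypothetical; against a database the same rule could be expressed as a filter on `ts <= ?` ordered by `ts DESC` with `LIMIT 1`.

```python
from bisect import bisect_right


def feature_as_of(history, as_of_ts):
    """Return the latest feature value recorded at or before as_of_ts.

    history: list of (ts, value) tuples sorted by ts ascending.
    Returns None when no value existed yet at as_of_ts.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of_ts)
    return history[idx - 1][1] if idx > 0 else None


# Made-up feature history for one entity.
history = [(100, 0.20), (200, 0.35), (300, 0.50)]
value_at_250 = feature_as_of(history, 250)
```

Here `value_at_250` is the value written at timestamp 200, because the value written at 300 did not exist yet at the requested time.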

5. Customer 360 and Operational Analytics

Use Case

Combining real-time operational data (clickstreams, events, CRM) with historical records to drive personalized experiences and business insight.

Architecture Components

  • Ingestion: Apache Kafka / Confluent → CrateDB

  • Batch Ingest: Airbyte, dbt, or custom ETL

  • CrateDB Cluster: Serving analytical SQL with fast aggregations and joins

  • BI Integration: Tableau, Power BI, or Apache Superset

  • API Layer: REST or GraphQL to serve data to apps/portals

6. Multi-Region CrateDB Cloud Deployment

Use Case

Ensuring high availability and regional data residency compliance across Europe, North America, and Asia.

Architecture Components

  • CrateDB Cloud Clusters: One per region

  • Global Routing: DNS load balancing + application logic to route by geography

  • Data Sync: CDC-based replication, Kafka + Avro schemas, or cross-region export/import jobs

  • Failover & DR: Cloud-native replication strategy + backup/restore automation

  • Access Control: Project-level isolation, org-level roles
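
The application-side part of global routing can be sketched as a lookup from the caller's country to a regional cluster, with cross-region failover allowed only when residency rules permit it. All hostnames, the country mapping, and the function name below are hypothetical placeholders, not real CrateDB Cloud endpoints.

```python
# Hypothetical region endpoints; real hostnames come from the deployment.
REGION_ENDPOINTS = {
    "eu": "https://eu.cratedb.example.com:4200",
    "na": "https://na.cratedb.example.com:4200",
    "asia": "https://asia.cratedb.example.com:4200",
}

# Illustrative country-to-region mapping (ISO 3166-1 alpha-2 codes).
COUNTRY_REGION = {
    "DE": "eu", "FR": "eu",
    "US": "na", "CA": "na",
    "JP": "asia", "SG": "asia",
}


def resolve_endpoint(country_code, healthy_regions, allow_cross_region=False):
    """Pick the cluster endpoint for a request, honoring data residency."""
    home = COUNTRY_REGION.get(country_code.upper())
    if home is None:
        raise ValueError(f"no region configured for {country_code!r}")
    if home in healthy_regions:
        return REGION_ENDPOINTS[home]
    if allow_cross_region:
        # Fall back to the first healthy region in configuration order.
        for region in REGION_ENDPOINTS:
            if region in healthy_regions:
                return REGION_ENDPOINTS[region]
    raise RuntimeError(
        f"home region {home!r} unavailable and failover not permitted"
    )


endpoint = resolve_endpoint("de", healthy_regions={"eu", "na"})
```

Keeping `allow_cross_region` off by default reflects the residency requirement: a request is served from another region only when the caller has explicitly decided that the data involved may leave its home region.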

7. CrateDB in Kubernetes (Self-hosted or Hybrid)

Use Case

Deploying CrateDB in Kubernetes for scalability and DevOps automation.

Architecture Components

  • Kubernetes Cluster: EKS, AKS, GKE, or on-prem

  • CrateDB Helm Chart: Manages StatefulSet + persistent storage

  • Operator (optional): For lifecycle automation

  • Ingestion: Kafka or REST endpoints

  • Monitoring: Prometheus + Grafana, using the metrics CrateDB exposes via JMX (e.g., through the Prometheus JMX exporter)

  • Security: NetworkPolicies, TLS, Kubernetes secrets

8. Geospatial Analytics Platform

Use Case

Analyzing location-based data (e.g., vehicle fleets, smart infrastructure, geofencing alerts).

Architecture Components

  • CrateDB Cluster: With native geospatial functions

  • Ingestion: GPS data over REST/Kafka

  • Geospatial Indexing: GEO_POINT and GEO_SHAPE columns

  • Query Layer: Proximity search, polygon joins, real-time tracking

  • Visualization: Kepler.gl, Mapbox, or custom web dashboards

  • Alerting: CrateDB scheduled jobs + webhook integration
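
To make the proximity and geofencing idea concrete, the sketch below computes great-circle distances with the haversine formula and lists vehicles outside a circular fence. It runs client-side in plain Python for illustration only; in the architecture above the same check would more likely run inside CrateDB with its geospatial functions. The vehicle IDs and coordinates are made up, and points are written as `(longitude, latitude)` to match the order CrateDB uses for `GEO_POINT` values.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def vehicles_outside_fence(positions, center, radius_m):
    """Return IDs of vehicles farther than radius_m from the fence center."""
    lon0, lat0 = center
    return sorted(
        vid
        for vid, (lon, lat) in positions.items()
        if haversine_m(lon, lat, lon0, lat0) > radius_m
    )


# Made-up positions as (longitude, latitude).
positions = {"truck-1": (13.405, 52.52), "truck-2": (13.405, 52.60)}
alerts = vehicles_outside_fence(positions, center=(13.405, 52.52), radius_m=5000)
```

The resulting list could then be handed to the webhook integration to raise a geofencing alert.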

9. E-commerce Event and Inventory Analytics

Use Case

Tracking product views, carts, conversions, and inventory changes in real time.

Architecture Components

  • Frontend Events: Captured via JavaScript SDK → Kafka

  • Product and Inventory Data: Synced from ERP/CRM systems

  • CrateDB Cluster: Ingests and joins structured + semi-structured data

  • Reporting: Ad hoc queries, dashboards, real-time KPIs

  • Alerting: Low-stock, anomaly detection with scheduled queries
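
The alerting rules can be sketched as two small checks that a scheduled job might apply to query results: a threshold test for low stock and a simple z-score test for anomalies. This is a minimal sketch under stated assumptions: the SKU names, thresholds, and the z-score limit of 3 are illustrative, and z-scores are just one of many possible anomaly detection techniques.

```python
from statistics import mean, stdev


def low_stock(inventory, thresholds):
    """Return SKUs whose on-hand quantity is below their reorder threshold."""
    return sorted(
        sku for sku, qty in inventory.items() if qty < thresholds.get(sku, 0)
    )


def is_anomalous(history, latest, z_limit=3.0):
    """Flag latest if it lies more than z_limit sample standard
    deviations away from the mean of history."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > z_limit


# Made-up inventory levels and hourly conversion counts.
inventory = {"sku-1": 3, "sku-2": 40}
thresholds = {"sku-1": 10, "sku-2": 25}
to_reorder = low_stock(inventory, thresholds)

hourly_conversions = [100, 104, 98, 101, 97]
sudden_drop = is_anomalous(hourly_conversions, latest=60)
```

With these sample values, `sku-1` is reported as low on stock and the drop to 60 conversions is flagged, while a value such as 102 would fall within the normal range.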

10. CrateDB + LLM Integration

Use Case

Embedding CrateDB in an AI application stack as a high-performance vector store for RAG (Retrieval-Augmented Generation).

Architecture Components

  • LLM Application: LangChain or Haystack

  • CrateDB Vector Support: FLOAT_VECTOR data type for embeddings

  • Embedding Generation: OpenAI, Hugging Face, etc.

  • Similarity Search: KNN queries over FLOAT_VECTOR columns (using the KNN_MATCH function)

  • Index Management: Background jobs to update embeddings as new data arrives
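
The retrieval step of a RAG pipeline can be sketched as follows. The first function builds a KNN query using CrateDB's `knn_match` function against a `FLOAT_VECTOR` column, and the second is a brute-force reference that shows what the search returns: the `k` stored vectors closest to the query by Euclidean distance. The `documents` table, its columns, and the tiny two-dimensional vectors are assumptions for illustration; real embeddings have hundreds or thousands of dimensions, and passing `k` as a bind parameter is assumed to be accepted by the client in use.

```python
import math


def knn_statement(query_vector, k):
    """Build a KNN query for a hypothetical `documents` table."""
    stmt = (
        "SELECT id, content, _score FROM documents "
        "WHERE knn_match(embedding, ?, ?) "
        "ORDER BY _score DESC LIMIT ?"
    )
    return stmt, [query_vector, k, k]


def brute_force_knn(vectors, query_vector, k):
    """Reference implementation: the k ids closest by Euclidean distance."""
    ranked = sorted(
        vectors.items(),
        key=lambda item: math.dist(item[1], query_vector),
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up 2-dimensional "embeddings" keyed by document id.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
query = [1.0, 0.0]

stmt, args = knn_statement(query, k=2)
nearest = brute_force_knn(vectors, query, k=2)
```

The brute-force version scans every vector and is useful only for small data sets or for testing; the database-side KNN search exists to avoid exactly that full scan.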
