Reference architectures
These reference architectures illustrate how CrateDB can be deployed and integrated in real-world scenarios. They provide guidance on scalability, reliability, data ingestion, and analytics across diverse domains.
1. Real-Time IoT Analytics (Cloud-Native Deployment)
Use Case
Monitoring and analyzing data from a high-volume stream of IoT sensors, such as industrial machinery, smart city infrastructure, or mobility fleets.
Architecture Components
CrateDB Cloud (multi-node cluster)
Message Broker: Kafka, MQTT, or Azure Event Hubs
Ingestion Layer: Kafka Connect / Fluent Bit
Real-Time Analytics: CrateDB’s SQL interface + dashboards (e.g., Grafana)
Downstream Integration: REST APIs or CDC tools (Debezium)
Security: API keys, TLS, access roles via CrateDB Cloud
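The ingestion step above can be sketched as a batched write against CrateDB's HTTP endpoint (`POST /_sql`), which accepts a `stmt` together with `bulk_args` so that many rows travel in one request. This is a minimal sketch, not a production connector: the table name `doc.sensor_readings`, the column names, and the helper `build_bulk_insert` are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, Iterable, List


def build_bulk_insert(
    table: str, columns: List[str], readings: Iterable[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build a JSON body for CrateDB's HTTP endpoint using bulk_args.

    `table` and `columns` are interpolated into the statement, so they must
    come from trusted configuration, never from sensor payloads.
    """
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    # One inner list per row; fields missing from a reading become NULL.
    bulk_args = [[reading.get(name) for name in columns] for reading in readings]
    return {"stmt": stmt, "bulk_args": bulk_args}


if __name__ == "__main__":
    import json

    body = build_bulk_insert(
        "doc.sensor_readings",  # hypothetical table
        ["ts", "sensor_id", "value"],
        [{"ts": 1700000000000, "sensor_id": "pump-1", "value": 3.2}],
    )
    print(json.dumps(body))
```

The resulting body would be sent as `application/json` to the cluster's `/_sql` path. Grouping rows this way keeps the number of round trips low when a broker consumer hands over messages in batches.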
2. Edge-to-Cloud Data Pipeline
Use Case
Processing time-series and event data at the edge with periodic sync to a central CrateDB Cloud cluster for advanced analytics.
Architecture Components
Edge Node: Lightweight CrateDB or TimescaleDB (for local buffering)
Sync Layer: Scheduled jobs or custom ETL using Python, Kafka, or Debezium
Central CrateDB Cloud Cluster: Stores harmonized data for analytics
Analytics/AI Layer: Python, Jupyter, or DB-native aggregations
Security: Encrypted transfer, RBAC for multi-tenant access
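One way to picture the sync layer is a watermark-based incremental upload: the edge node remembers the timestamp of the last row the central cluster confirmed and sends only newer rows, in batches. The sketch below is an illustration under assumptions, not a prescribed design: the `ts` field, the `send` callback, and both function names are hypothetical, and timestamps are assumed to be unique per edge node.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def pending_batches(
    rows: List[Row], last_synced_ts: int, batch_size: int
) -> Iterator[List[Row]]:
    """Yield rows newer than the watermark, oldest first, in fixed-size batches."""
    pending = sorted(
        (row for row in rows if row["ts"] > last_synced_ts),
        key=lambda row: row["ts"],
    )
    for start in range(0, len(pending), batch_size):
        yield pending[start:start + batch_size]


def sync(
    rows: List[Row],
    last_synced_ts: int,
    batch_size: int,
    send: Callable[[List[Row]], None],
) -> int:
    """Send pending batches and return the new watermark.

    The watermark only advances after `send` returns without raising, so a
    failed batch is retried on the next scheduled run.
    """
    watermark = last_synced_ts
    for batch in pending_batches(rows, last_synced_ts, batch_size):
        try:
            send(batch)
        except Exception:
            break  # keep the watermark at the last confirmed batch
        watermark = batch[-1]["ts"]
    return watermark
```

In a scheduled job, `send` would wrap the actual write to the central cluster, and the returned watermark would be persisted locally between runs.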
3. Unified Observability Platform
Use Case
Centralizing logs, metrics, and events for DevOps and infrastructure observability.
Architecture Components
CrateDB Cloud or Self-Hosted Cluster
Data Sources: Prometheus exporters, Fluent Bit, Logstash, OpenTelemetry
Ingestion Pipeline: Kafka → CrateDB, with optional transformation layer (e.g., Apache Flink)
Visualization: Grafana dashboards (connected through Grafana's PostgreSQL data source, which works with CrateDB's PostgreSQL wire protocol)
Storage Strategy: Partitioned by time and source, TTL policies
Access Control: SQL RBAC, token-based access for dashboards
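The storage strategy can be sketched as a table partitioned on a generated day column plus a retention job that deletes whole days. This is a sketch under assumptions: the table `doc.logs`, its columns, and the helper `retention_statement` are hypothetical, and the retention job here stands in for a TTL policy by deleting on the partition column. Partitioning by source as well as time multiplies the number of partitions, so it is only sensible when the set of sources is small.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Hypothetical schema: `day` is derived from `ts`, and both `day` and
# `source` are used as partition columns.
CREATE_LOGS_TABLE = """
CREATE TABLE IF NOT EXISTS doc.logs (
    ts TIMESTAMP WITH TIME ZONE NOT NULL,
    source TEXT NOT NULL,
    message TEXT,
    day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts)
) PARTITIONED BY (day, source)
"""


def retention_statement(
    table: str, now: datetime, keep_days: int
) -> Tuple[str, List[int]]:
    """Return a DELETE statement and its arguments for rows older than keep_days.

    The cutoff is aligned to midnight UTC so the filter matches whole
    day partitions rather than slicing through one.
    """
    cutoff = (now.astimezone(timezone.utc) - timedelta(days=keep_days)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    cutoff_ms = int(cutoff.timestamp() * 1000)  # epoch milliseconds
    return f"DELETE FROM {table} WHERE day < ?", [cutoff_ms]
```

A scheduler would run the returned statement once a day. Filtering on the partition column keeps the cleanup cheap compared with deleting individual rows by `ts`.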
4. AI/ML Feature Store Backed by CrateDB
Use Case
Storing and serving high-volume, real-time feature data for machine learning pipelines.
Architecture Components
CrateDB Cluster: Stores raw data + materialized features
Data Pipeline: Apache Spark, dbt, or Python ETL for feature computation
Serving Layer: Low-latency CrateDB SQL queries exposed through REST or gRPC APIs
Model Training: Python (Pandas, Scikit-learn), notebooks, or Dataiku
Versioning & Time Travel: Implemented via snapshot tables or time-based partitions
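The time-travel idea can be illustrated with a point-in-time lookup: for a training example with a given timestamp, use the latest feature value recorded at or before that moment, so that no future information leaks into the training set. The sketch below shows the logic in plain Python next to an equivalent query; the table `doc.features`, its columns, and the function `feature_as_of` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical query shape for the same lookup against a feature table
# that stores one row per (entity, feature, valid_from).
FEATURE_AS_OF_SQL = """
SELECT value
FROM doc.features
WHERE entity_id = ? AND feature = ? AND valid_from <= ?
ORDER BY valid_from DESC
LIMIT 1
"""


def feature_as_of(
    history: List[Tuple[int, float]], as_of: int
) -> Optional[float]:
    """Return the latest value whose timestamp is <= as_of.

    `history` must be sorted by timestamp in ascending order. Returns None
    when no value existed yet at `as_of`.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, as_of)
    return history[idx - 1][1] if idx else None
```

With time-based partitions, the `valid_from <= ?` filter also limits how many partitions a query has to touch when the lookup window is bounded.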
5. Customer 360 and Operational Analytics
Use Case
Combining real-time operational data (clickstreams, events, CRM) with historical records to drive personalized experiences and business insight.
Architecture Components
Ingestion: Apache Kafka / Confluent → CrateDB
Batch Ingest: Airbyte, dbt, or custom ETL
CrateDB Cluster: Serving analytical SQL with fast aggregations and joins
BI Integration: Tableau, Power BI, or Apache Superset
API Layer: REST or GraphQL to serve data to apps/portals
6. Multi-Region CrateDB Cloud Deployment
Use Case
High availability and regional data residency compliance across Europe, North America, and Asia.
Architecture Components
CrateDB Cloud Clusters: One per region
Global Routing: DNS load balancing + application logic to route by geography
Data Sync: CDC-based replication, Kafka + Avro schemas, or cross-region export/import jobs
Failover & DR: Cloud-native replication strategy + backup/restore automation
Access Control: Project-level isolation, org-level roles
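The application-side part of global routing can be sketched as a lookup from a user's country to the regional cluster that is allowed to hold their data. Everything named here is a placeholder: the endpoint URLs, the country mapping, and `resolve_endpoint` are hypothetical. The sketch deliberately fails closed for unmapped countries, on the assumption that silently falling back to a default region could violate residency requirements.

```python
from typing import Dict

# Placeholder endpoints; real cluster URLs would come from configuration.
REGION_ENDPOINTS: Dict[str, str] = {
    "eu": "https://eu.cratedb.example.invalid:4200",
    "na": "https://na.cratedb.example.invalid:4200",
    "asia": "https://asia.cratedb.example.invalid:4200",
}

# Illustrative subset of an ISO 3166-1 alpha-2 country-to-region mapping.
COUNTRY_TO_REGION: Dict[str, str] = {
    "DE": "eu",
    "FR": "eu",
    "US": "na",
    "CA": "na",
    "JP": "asia",
    "SG": "asia",
}


def resolve_endpoint(country_code: str) -> str:
    """Return the cluster endpoint for a country, or raise if none is mapped."""
    region = COUNTRY_TO_REGION.get(country_code.strip().upper())
    if region is None:
        raise LookupError(f"no region configured for country {country_code!r}")
    return REGION_ENDPOINTS[region]
```

DNS load balancing would still decide which application instance a request reaches. This lookup only decides which regional cluster that instance reads from and writes to.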
7. CrateDB in Kubernetes (Self-hosted or Hybrid)
Use Case
Deploying CrateDB in Kubernetes for scalability and DevOps automation.
Architecture Components
Kubernetes Cluster: EKS, AKS, GKE, or on-prem
CrateDB Helm Chart: Manages StatefulSet + persistent storage
Operator (optional): For lifecycle automation
Ingestion: Kafka or REST endpoints
Monitoring: Prometheus + Grafana with built-in CrateDB metrics
Security: NetworkPolicies, TLS, Kubernetes secrets
8. Geospatial Analytics Platform
Use Case
Analyzing location-based data (e.g., vehicle fleets, smart infrastructure, geofencing alerts).
Architecture Components
CrateDB Cluster: With native geospatial functions
Ingestion: GPS data over REST/Kafka
Geospatial Indexing: GEO_POINT and GEO_SHAPE columns
Query Layer: Proximity search, polygon joins, real-time tracking
Visualization: Kepler.gl, Mapbox, or custom web dashboards
Alerting: CrateDB scheduled jobs + webhook integration
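A proximity search of the kind listed above can be sketched in two ways: as a query that uses CrateDB's `distance` scalar on a GEO_POINT column, and as a plain-Python haversine reference that is handy for testing alert thresholds outside the database. The table `doc.vehicles`, its columns, and the function names are hypothetical, and the mean Earth radius used below is an assumption of this sketch, so results may differ slightly from the database's.

```python
import math
from typing import Dict, List, Tuple

LonLat = Tuple[float, float]  # (longitude, latitude) in degrees

# Hypothetical query shape; parameters would be a point such as
# 'POINT (13.405 52.52)' and a radius in meters.
NEARBY_VEHICLES_SQL = """
SELECT vehicle_id, distance(location, ?) AS meters
FROM doc.vehicles
WHERE distance(location, ?) <= ?
ORDER BY meters
"""

EARTH_RADIUS_M = 6_371_000.0  # mean radius, assumed for this sketch


def haversine_m(a: LonLat, b: LonLat) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def within_radius(
    positions: Dict[str, LonLat], center: LonLat, radius_m: float
) -> List[str]:
    """Return ids of positions within radius_m of center, nearest first."""
    hits = [(haversine_m(pos, center), key) for key, pos in positions.items()]
    return [key for dist, key in sorted(hits) if dist <= radius_m]
```

Points are written longitude first to match the coordinate order used for GEO_POINT values.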
9. E-commerce Event and Inventory Analytics
Use Case
Tracking product views, carts, conversions, and inventory changes in real time.
Architecture Components
Frontend Events: Captured via JavaScript SDK → Kafka
Product and Inventory Data: Synced from ERP/CRM systems
CrateDB Cluster: Ingests and joins structured + semi-structured data
Reporting: Ad hoc queries, dashboards, real-time KPIs
Alerting: Low-stock, anomaly detection with scheduled queries
10. CrateDB + LLM Integration
Use Case
Embedding CrateDB in an AI application stack as a high-performance vector store for RAG (Retrieval-Augmented Generation).
Architecture Components
LLM Application: LangChain or Haystack
CrateDB Vector Support: VECTOR data type for embeddings
Embedding Generation: OpenAI, Hugging Face, etc.
Similarity Search: KNN queries over VECTOR columns
Index Management: Background jobs to update embeddings as new data arrives
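The retrieval step of a RAG pipeline reduces to a k-nearest-neighbor search over stored embeddings. The sketch below shows that logic as a brute-force search in plain Python, which is useful as a reference when validating results, alongside a query shape for the database. The query assumes the `FLOAT_VECTOR` column type and `KNN_MATCH` function found in recent CrateDB releases, so check the documentation for the version in use. The table `doc.documents`, its columns, and the Python function names are hypothetical, and the brute-force search is not how the database evaluates the query.

```python
import math
from typing import List, Sequence, Tuple

# Assumed query shape: parameters are the query embedding and k.
KNN_SQL = """
SELECT id, content, _score
FROM doc.documents
WHERE knn_match(embedding, ?, ?)
ORDER BY _score DESC
"""


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(
    query: Sequence[float],
    documents: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[Tuple[str, float]]:
    """Return the k documents closest to query as (doc_id, distance) pairs."""
    scored = sorted((euclidean(query, emb), doc_id) for doc_id, emb in documents)
    return [(doc_id, dist) for dist, doc_id in scored[:k]]
```

The retrieved rows would then be placed into the LLM prompt as context by the application framework.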