Reference architectures

These reference architectures illustrate how CrateDB can be deployed and integrated in real-world scenarios. They provide guidance on scalability, reliability, data ingestion, and analytics across diverse domains.

1. Real-Time IoT Analytics (Cloud-Native Deployment)

Use Case

Monitoring and analyzing data from a high-volume stream of IoT sensors, such as industrial machinery, smart city infrastructure, or mobility fleets.

Architecture Components

CrateDB Cloud (multi-node cluster)
Message Broker: Apache Kafka, an MQTT broker, or Azure Event Hubs
Ingestion Layer: Kafka Connect / Fluent Bit
Real-Time Analytics: CrateDB’s SQL interface + dashboards (e.g., Grafana)
Downstream Integration: REST APIs or CDC tools (Debezium)
Security: API keys, TLS, access roles via CrateDB Cloud
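
Whether you use a connector such as Kafka Connect or write your own, ingestion comes down to batched inserts. The sketch below shows how a custom consumer could group sensor readings into a single request against CrateDB's HTTP endpoint (`/_sql`, default port 4200). It uses the `bulk_args` form, which executes one statement once per row. The table `sensor_readings`, its columns, and the endpoint URL are hypothetical. The request is only built here, not sent. Against CrateDB Cloud you would also use HTTPS and add credentials.

```python
import json
import urllib.request

# Hypothetical endpoint and schema; adjust to your cluster and table.
ENDPOINT = "http://localhost:4200/_sql"
TABLE = "sensor_readings"
COLUMNS = ("ts", "device_id", "temperature")


def build_bulk_insert(table, columns, rows):
    """Build a CrateDB bulk-operation payload: one statement, one arg list per row."""
    placeholders = ", ".join("?" for _ in columns)
    stmt = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    return {"stmt": stmt, "bulk_args": [list(row) for row in rows]}


def make_request(endpoint, payload):
    """Prepare (but do not send) a JSON POST request for the /_sql endpoint."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Timestamps as epoch milliseconds (2024-01-01T00:00:00Z and one second later).
readings = [
    (1704067200000, "press-01", 71.2),
    (1704067201000, "press-02", 69.8),
]
payload = build_bulk_insert(TABLE, COLUMNS, readings)
request = make_request(ENDPOINT, payload)
# urllib.request.urlopen(request) would send it; for bulk requests the
# response's "results" list holds one rowcount per entry in bulk_args.
```

Sending one bulk request per batch (for example, per Kafka poll) keeps per-request overhead low compared to issuing one INSERT per reading.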

2. Edge-to-Cloud Data Pipeline

Use Case

Processing time-series and event data at the edge with periodic sync to a central CrateDB Cloud cluster for advanced analytics.

Architecture Components

Edge Node: Lightweight CrateDB or TimescaleDB (for local buffering)
Sync Layer: Scheduled jobs or custom ETL using Python, Kafka, or Debezium
Central CrateDB Cloud Cluster: Stores harmonized data for analytics
Analytics/AI Layer: Python, Jupyter, or DB-native aggregations
Security: Encrypted transfer, RBAC for multi-tenant access
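
A scheduled sync job has to remember how much data it has already shipped. The sketch below shows that bookkeeping under two assumptions: each buffered reading carries an epoch-millisecond timestamp, and the job persists a single high-water mark between runs. The `Reading` type and the `next_batch` helper are hypothetical. The actual transfer (Kafka, HTTP, or a PostgreSQL-protocol client) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    ts_ms: int       # event time, epoch milliseconds
    device_id: str
    value: float


def next_batch(buffer, watermark_ms, max_rows=1000):
    """Return readings newer than the watermark (oldest first) and the new watermark."""
    pending = sorted((r for r in buffer if r.ts_ms > watermark_ms), key=lambda r: r.ts_ms)
    batch = pending[:max_rows]
    if not batch:
        return [], watermark_ms
    # Keep rows that share the boundary timestamp in the same batch; otherwise
    # the strict ">" filter would skip them on the next run.
    last_ts = batch[-1].ts_ms
    batch += [r for r in pending[max_rows:] if r.ts_ms == last_ts]
    return batch, last_ts


buffer = [
    Reading(1000, "edge-a", 1.0),
    Reading(2000, "edge-a", 2.0),
    Reading(2000, "edge-b", 2.5),
    Reading(3000, "edge-a", 3.0),
]
batch, watermark = next_batch(buffer, watermark_ms=1000, max_rows=1)
```

Advance and persist the watermark only after the central cluster confirms the write. If a run fails after a partial write, re-sending the batch is harmless when the target table has a primary key and the job inserts with `ON CONFLICT ... DO NOTHING`.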

3. Unified Observability Platform

Use Case

Centralizing logs, metrics, and events for DevOps and infrastructure observability.

Architecture Components

CrateDB Cloud or Self-Hosted Cluster
Data Sources: Prometheus exporters, Fluent Bit, Logstash, OpenTelemetry
Ingestion Pipeline: Kafka → CrateDB, with optional transformation layer (e.g., Apache Flink)
Visualization: Grafana dashboards (via Grafana's PostgreSQL data source, which connects over CrateDB's PostgreSQL wire protocol)
Storage Strategy: Tables partitioned by time and source, with retention policies that drop expired partitions
Access Control: SQL RBAC, token-based access for dashboards
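
The storage strategy can be sketched as a time-partitioned table plus a periodic delete. The example assumes a hypothetical `logs` table partitioned by a generated `day` column. In CrateDB, a `DELETE` whose `WHERE` clause filters only on partition columns is handled by dropping whole partitions, which keeps retention cheap. Partitioning by source as well is a judgment call that works best when there are only a few sources. The statements are only generated here; any scheduler (cron, Airflow, or similar) would run the delete.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema: one partition per UTC day via a generated column.
CREATE_LOGS = """
CREATE TABLE IF NOT EXISTS logs (
    ts TIMESTAMP WITH TIME ZONE NOT NULL,
    source TEXT NOT NULL,
    level TEXT,
    message TEXT,
    attrs OBJECT(DYNAMIC),
    day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts)
) PARTITIONED BY (day)
"""

RETENTION_SQL = "DELETE FROM logs WHERE day < ?"


def retention_cutoff(now, keep_days):
    """UTC midnight `keep_days` days before today; partitions older than this expire."""
    today = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=keep_days)


def retention_args(now, keep_days):
    """Bind the cutoff as epoch milliseconds, a format CrateDB accepts for timestamps."""
    cutoff = retention_cutoff(now, keep_days)
    return [int(cutoff.timestamp() * 1000)]
```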

4. AI/ML Feature Store Backed by CrateDB

Use Case

Storing and serving high-volume, real-time feature data for machine learning pipelines.

Architecture Components

CrateDB Cluster: Stores raw data + materialized features
Data Pipeline: Apache Spark, dbt, or Python ETL for feature computation
Serving Layer: Low-latency CrateDB SQL queries, exposed to models and applications through REST or gRPC APIs
Model Training: Python (Pandas, Scikit-learn), notebooks, or Dataiku
Versioning & Time Travel: Implemented via snapshot tables or time-based partitions
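
Time travel matters most when building training sets, because each label should only see feature values that existed at that moment. The sketch below shows a point-in-time ("as of") lookup and assumes a hypothetical `user_features` table with one row per `(entity_id, ts)`. The SQL is the per-entity query against CrateDB. The pure-Python `as_of` function applies the same rule to an in-memory history, which is useful in tests.

```python
import bisect

# Hypothetical query: latest feature row for one entity at or before a timestamp.
AS_OF_SQL = (
    "SELECT features FROM user_features "
    "WHERE entity_id = ? AND ts <= ? "
    "ORDER BY ts DESC LIMIT 1"
)


def as_of(history, ts):
    """history: (ts, features) pairs sorted by ts. Return the newest features at or before ts."""
    timestamps = [t for t, _ in history]
    idx = bisect.bisect_right(timestamps, ts)
    return history[idx - 1][1] if idx else None


history = [(100, {"clicks_7d": 1}), (200, {"clicks_7d": 5}), (300, {"clicks_7d": 4})]
```

Using `<=` rather than `<` treats a feature computed at the label's exact timestamp as available. Choose whichever matches when features actually become visible in production.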

5. Customer 360 and Operational Analytics

Use Case

Combining real-time operational data (clickstreams, events, CRM) with historical records to drive personalized experiences and business insight.

Architecture Components

Ingestion: Apache Kafka / Confluent → CrateDB
Batch Ingest: Airbyte, dbt, or custom ETL
CrateDB Cluster: Serving analytical SQL with fast aggregations and joins
BI Integration: Tableau, Power BI, or Apache Superset
API Layer: REST or GraphQL to serve data to apps/portals
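
A thin API service typically runs a parameterized query and reshapes the result for the caller. The sketch below assumes hypothetical `customers` and `events` tables. The SQL combines the CRM record with recent activity. The `rows_to_dicts` helper converts a result in the shape returned by CrateDB's HTTP endpoint (`cols` plus `rows`) into JSON-friendly records. The sample response is made up for illustration.

```python
# Hypothetical profile query: CRM attributes plus activity since a cutoff.
PROFILE_SQL = """
SELECT c.customer_id, c.segment, count(e.event_id) AS recent_events, max(e.ts) AS last_seen
FROM customers c
LEFT JOIN events e
  ON e.customer_id = c.customer_id AND e.ts >= ?
WHERE c.customer_id = ?
GROUP BY c.customer_id, c.segment
"""


def profile_args(customer_id, since_ms):
    """Positional args in the order of the ? placeholders above."""
    return [since_ms, customer_id]


def rows_to_dicts(response):
    """Turn an HTTP /_sql response ({"cols": [...], "rows": [[...], ...]}) into dicts."""
    cols = response["cols"]
    return [dict(zip(cols, row)) for row in response["rows"]]


sample_response = {
    "cols": ["customer_id", "segment", "recent_events", "last_seen"],
    "rows": [["c-42", "gold", 17, 1704067200000]],
    "rowcount": 1,
}
records = rows_to_dicts(sample_response)
```

The time filter sits in the `ON` clause so that customers with no recent events still appear in the result, with `recent_events` equal to 0. A `WHERE` filter on `e.ts` would drop those customers.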

6. Multi-Region CrateDB Cloud Deployment

Use Case

High availability and regional data residency compliance across Europe, North America, and Asia.

Architecture Components

CrateDB Cloud Clusters: One per region
Global Routing: DNS load balancing + application logic to route by geography
Data Sync: CDC-based replication, Kafka + Avro schemas, or cross-region export/import jobs
Failover & DR: Cloud-native replication strategy + backup/restore automation
Access Control: Project-level isolation, org-level roles
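
The application-side routing can be as simple as a lookup from a user's residency region to a regional cluster. The sketch below uses hypothetical endpoint URLs and a deliberately small country table. It routes on the customer's registered country rather than on where the request comes from. It also fails closed when no region is configured, because sending data to an arbitrary default region would defeat the residency requirement.

```python
# Hypothetical regional cluster endpoints (one CrateDB Cloud cluster per region).
REGION_ENDPOINTS = {
    "eu": "https://eu-cluster.example.com:4200",
    "na": "https://na-cluster.example.com:4200",
    "apac": "https://apac-cluster.example.com:4200",
}

# Hypothetical and incomplete ISO 3166-1 alpha-2 country-to-region mapping.
COUNTRY_TO_REGION = {
    "DE": "eu", "FR": "eu", "NL": "eu",
    "US": "na", "CA": "na",
    "JP": "apac", "SG": "apac",
}


def endpoint_for(country_code):
    """Return the cluster endpoint that must hold data for customers in this country."""
    region = COUNTRY_TO_REGION.get(country_code.strip().upper())
    if region is None:
        # Fail closed: an unknown country must not silently land in some default region.
        raise LookupError(f"no residency region configured for {country_code!r}")
    return REGION_ENDPOINTS[region]
```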

7. CrateDB in Kubernetes (Self-hosted or Hybrid)

Use Case

Deploying CrateDB in Kubernetes for scalability and DevOps automation.

Architecture Components

Kubernetes Cluster: EKS, AKS, GKE, or on-prem
CrateDB Helm Chart: Manages StatefulSet + persistent storage
Operator (optional): For lifecycle automation
Ingestion: Kafka or REST endpoints
Monitoring: Prometheus + Grafana with built-in CrateDB metrics
Security: NetworkPolicies, TLS, Kubernetes secrets
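
The sketch below illustrates the NetworkPolicies item. It generates a Kubernetes `NetworkPolicy` as JSON, which `kubectl apply -f` accepts. The policy only lets pods labeled as CrateDB clients reach the HTTP (4200) and PostgreSQL (5432) ports. CrateDB pods may reach each other on the transport port (4300). The labels `app: crate` and `crate-client: "true"` and the namespace are hypothetical; match them to the labels your Helm chart or operator actually sets.

```python
import json

CRATE_LABELS = {"app": "crate"}            # hypothetical: labels on CrateDB pods
CLIENT_LABELS = {"crate-client": "true"}   # hypothetical: labels on allowed clients


def tcp_ports(*ports):
    return [{"protocol": "TCP", "port": p} for p in ports]


def crate_network_policy(namespace="crate"):
    """Ingress policy for CrateDB pods: clients on 4200/5432, peers on 4300."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "crate-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {"matchLabels": CRATE_LABELS},
            "policyTypes": ["Ingress"],
            "ingress": [
                {   # Applications: HTTP endpoint and PostgreSQL wire protocol.
                    "from": [{"podSelector": {"matchLabels": CLIENT_LABELS}}],
                    "ports": tcp_ports(4200, 5432),
                },
                {   # Node-to-node transport traffic inside the CrateDB cluster.
                    "from": [{"podSelector": {"matchLabels": CRATE_LABELS}}],
                    "ports": tcp_ports(4300),
                },
            ],
        },
    }


manifest = json.dumps(crate_network_policy(), indent=2)
```

Once this policy is in place, all other ingress to the CrateDB pods is denied, including a Prometheus scraper. You need to add a rule for the port your metrics exporter listens on.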

8. Geospatial Analytics Platform

Use Case

Analyzing location-based data (e.g., vehicle fleets, smart infrastructure, geofencing alerts).

Architecture Components

CrateDB Cluster: With native geospatial functions
Ingestion: GPS data over REST/Kafka
Geospatial Indexing: GEO_POINT and GEO_SHAPE columns
Query Layer: Proximity search, polygon joins, real-time tracking
Visualization: Kepler.gl, Mapbox, or custom web dashboards
Alerting: CrateDB scheduled jobs + webhook integration
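
Proximity search and geofencing map onto CrateDB's geospatial scalar functions, such as `distance` (meters between two points) and `within` (point inside a shape). The sketch below assumes a hypothetical `vehicle_positions` table whose `location` column is a `GEO_POINT`. CrateDB expects coordinates in longitude, latitude order. The queries bind WKT strings as parameters, which assumes the client passes them as text that CrateDB converts to geo values; if yours does not, add an explicit cast. The haversine helper is a client-side approximation, for example for sanity-checking alert thresholds, and is not guaranteed to match CrateDB's internal computation to the meter.

```python
import math

# Hypothetical queries against vehicle_positions(device_id, ts, location GEO_POINT).
# Assumption: WKT strings bound to ? are accepted as geo values by CrateDB.
NEARBY_SQL = (
    "SELECT device_id, distance(location, ?) AS meters "
    "FROM vehicle_positions WHERE distance(location, ?) < ? ORDER BY meters"
)
GEOFENCE_SQL = "SELECT device_id, ts FROM vehicle_positions WHERE within(location, ?)"

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def wkt_point(lon, lat):
    return f"POINT ({lon} {lat})"


def wkt_polygon(ring):
    """ring: list of (lon, lat); closed automatically if the last point differs."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    return "POLYGON ((" + ", ".join(f"{lon} {lat}" for lon, lat in ring) + "))"


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

For a 500 m radius around a point, `NEARBY_SQL` would take the arguments `[wkt_point(lon, lat), wkt_point(lon, lat), 500]`. A geofence alert job would run `GEOFENCE_SQL` with `wkt_polygon(...)` and post any rows to the webhook.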

9. E-commerce Event and Inventory Analytics

Use Case

Tracking product views, carts, conversions, and inventory changes in real time.

Architecture Components

Frontend Events: Captured via JavaScript SDK → Kafka
Product and Inventory Data: Synced from ERP/CRM systems
CrateDB Cluster: Ingests and joins structured + semi-structured data
Reporting: Ad hoc queries, dashboards, real-time KPIs
Alerting: Low-stock alerts and anomaly detection via scheduled queries
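
Both alert types reduce to a query plus a threshold that a scheduled job evaluates. The sketch below assumes hypothetical `inventory` and `products` tables with a per-product `reorder_threshold`. For anomaly detection it uses a simple z-score against recent history. This is an illustrative technique, not something CrateDB prescribes. The history values would come from an aggregation query, for example hourly conversion counts.

```python
from statistics import mean, pstdev

# Hypothetical low-stock query joining inventory levels with product metadata.
LOW_STOCK_SQL = (
    "SELECT p.sku, p.name, i.quantity "
    "FROM inventory i JOIN products p ON p.sku = i.sku "
    "WHERE i.quantity < p.reorder_threshold"
)


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` population std devs from the history mean."""
    if len(history) < 2:
        return False  # not enough data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest != mu  # flat history: any change is unusual
    return abs(latest - mu) / sigma > threshold
```

A z-score assumes a roughly stable history. Strongly seasonal metrics, such as weekday versus weekend traffic, are better compared against the same hour in previous weeks.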

10. CrateDB + LLM Integration

Use Case

Embedding CrateDB in an AI application stack as a high-performance vector store for RAG (Retrieval-Augmented Generation).

Architecture Components

LLM Application: LangChain or Haystack
CrateDB Vector Support: FLOAT_VECTOR data type for embeddings
Embedding Generation: OpenAI, Hugging Face, etc.
Similarity Search: KNN_MATCH queries over FLOAT_VECTOR columns
Index Management: Background jobs to update embeddings as new data arrives
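
The sketch below covers the vector side. It assumes a hypothetical `documents` table and an embedding size of 1536, which must match the model you use. In CrateDB, embeddings are stored in a `FLOAT_VECTOR(n)` column, searched with `knn_match(column, query_vector, k)`, and ranked by `_score`. The query vector is inlined as an array literal built from floats. The pure-Python `top_k` function performs an exact Euclidean search, which is handy for checking results on small samples. It is a brute-force stand-in, not how CrateDB evaluates the query.

```python
import heapq
import math

# Hypothetical schema; the dimension must equal the embedding model's output size.
CREATE_DOCUMENTS = """
CREATE TABLE IF NOT EXISTS documents (
    id TEXT PRIMARY KEY,
    content TEXT,
    embedding FLOAT_VECTOR(1536)
)
"""


def knn_sql(query_vector, k):
    """Build a knn_match query; float()/int() coercion keeps the inlined values numeric."""
    literal = "[" + ", ".join(repr(float(x)) for x in query_vector) + "]"
    k = int(k)
    return (
        "SELECT id, content, _score FROM documents "
        f"WHERE knn_match(embedding, {literal}, {k}) "
        f"ORDER BY _score DESC LIMIT {k}"
    )


def top_k(query, docs, k):
    """Exact nearest neighbours by Euclidean distance; docs is a list of (id, vector)."""
    nearest = heapq.nsmallest(k, docs, key=lambda doc: math.dist(query, doc[1]))
    return [doc_id for doc_id, _ in nearest]
```

In a RAG pipeline, the `content` of the top rows becomes the context passed to the LLM. Frameworks such as LangChain wrap this retrieval step behind a vector-store interface.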

Last updated 4 months ago
