Ingestr

ingestr is a versatile data I/O framework and command-line application to copy data between any source and any destination. It supports many data sources, destinations, and data loading strategies out of the box.

Adapters for CrateDB let you migrate data from proprietary enterprise data warehouses or databases to CrateDB, helping you consolidate infrastructure and reduce operational costs.


Coverage

ingestr supports migration from more than 20 databases, data platforms, analytics engines, and other services; see the sources supported by ingestr and the databases supported by SQLAlchemy.

  • Databases: Actian Data Platform, Vector, Actian X, Ingres, Amazon Athena, Amazon Redshift, Amazon S3, Apache Drill, Apache Druid, Apache Hive and Presto, Apache Solr, Clickhouse, CockroachDB, CrateDB, Databend, Databricks, Denodo, DuckDB, EXASOL DB, Elasticsearch, Firebird, Firebolt, Google BigQuery, Google Sheets, Greenplum, HyperSQL (hsqldb), IBM DB2 and Informix, IBM Netezza Performance Server, Impala, Kinetica, Microsoft Access, Microsoft SQL Server, MonetDB, MongoDB, MySQL and MariaDB, OpenGauss, OpenSearch, Oracle, PostgreSQL, Rockset, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, Snowflake, SQLite, Teradata Vantage, TiDB, YDB, YugabyteDB.

  • Brokers: Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ)

  • File formats: CSV, JSONL/NDJSON, Parquet

  • Object stores: Amazon S3, Google Cloud Storage

  • Services: Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot, Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc.


Install

uv tool install --prerelease=allow --upgrade 'cratedb-toolkit[io-ingestr]'

Configure

For hands-on examples of the configuration parameters enumerated here, see the Data Loading Examples section below.

Data source and destination

ingestr uses four parameters to address the source and destination (source URI, source table, destination URI, destination table), while ctk load table uses just two, embedding the source and destination table names into the address URLs themselves via the table query parameter.
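As a sketch with illustrative host names, credentials, and table names, the same transfer expressed both ways might look like this.

ingestr, four explicit parameters:

ingestr ingest --source-uri "postgresql://user:password@pg.example.net:5432/mydb" --source-table "public.orders" --dest-uri "crate://admin:na@cratedb.example.net:4200" --dest-table "doc.orders"

ctk load table, two parameters with embedded table names:

ctk load table "postgresql://user:password@pg.example.net:5432/mydb?table=public.orders" --cluster-url "crate://admin:na@cratedb.example.net:4200/?table=doc.orders"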

Batch size

Because the underlying framework uses dlt, you configure parameters like batch size in your .dlt/config.toml.
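For example, a .dlt/config.toml might tune batch sizes like this; the values are illustrative, see the dlt documentation for all available options:

[data_writer]
# Number of items buffered in memory before flushing to disk.
buffer_max_items = 10000
# Maximum number of items per intermediate load file.
file_max_items = 100000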

Incremental loading

ingestr supports incremental loading, which means you can choose to append, merge, or delete+insert data into the destination table using different strategies. Incremental loading ingests only the new rows from the source table into the destination table, so you don't have to load the entire table every time you run the data migration procedure.
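With the standalone ingestr command, the strategy and the tracking column are selected using command-line options; the host, table, and column names below are illustrative:

ingestr ingest --source-uri "postgresql://user:password@pg.example.net:5432/mydb" --source-table "public.orders" --dest-uri "crate://admin:na@cratedb.example.net:4200" --dest-table "doc.orders" --incremental-strategy merge --incremental-key updated_at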

Credentials

Source: Each data pipeline source configures access credentials in its own way, using individual parameters.

Destination: CrateDB, as the pipeline sink, accepts credentials uniformly within the --cluster-url CLI option. Please note that you must specify a password. If your account doesn't use a password, use an arbitrary string like na.
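For example, for a local CrateDB instance without authentication, the destination might be addressed like this, with na acting as the placeholder password; the source database and table names are illustrative:

ctk load table "sqlite:///example.db?table=demo" --cluster-url "crate://crate:na@localhost:4200/?table=testdrive.demo"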

Custom queries

ingestr provides custom queries for SQL sources by prefixing the source table name with query:. This feature can be used for partial table loading, limiting the import to a subset of the source columns, or for general data filtering and aggregation purposes.
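A sketch with illustrative table and column names; depending on your shell and the URL parser, the query portion may need to be URL-encoded:

ctk load table "postgresql://user:password@pg.example.net:5432/mydb?table=query:SELECT id, name FROM public.users WHERE active = true" --cluster-url "crate://admin:na@cratedb.example.net:4200/?table=doc.users_active"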


Data Loading Examples

Kinesis to CrateDB

Source URL template: kinesis://?aws_access_key_id=<aws-access-key-id>&aws_secret_access_key=<aws-secret-access-key>&region_name=<region-name>&table=arn:aws:kinesis:<region-name>:<aws-account-id>:stream/<stream_name>

Amazon Redshift to CrateDB

Amazon S3 to CrateDB

See the documentation about ingestr and Amazon S3 for details of the URI format, file globbing patterns, compression options, and file type hinting options.

Source URL template: s3://?access_key_id=<aws-access-key-id>&secret_access_key=<aws-secret-access-key>&table=<bucket-name>/<file-glob>

Apache Kafka to CrateDB
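Following the pattern of the Kinesis and Amazon S3 templates, a Kafka source URL might look like this; treat the parameter names as a sketch based on the ingestr Kafka source:

Source URL template: kafka://?bootstrap_servers=<host:port>&group_id=<group-id>&table=<topic-name>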

Apache Solr to CrateDB

Clickhouse to CrateDB

CrateDB to CrateDB

CSV to CrateDB

Databricks to CrateDB

DuckDB to CrateDB

EXASOL DB to CrateDB

Elasticsearch to CrateDB

GitHub to CrateDB

Google Cloud Storage to CrateDB

Google BigQuery to CrateDB

Google Sheets to CrateDB

HubSpot to CrateDB

See HubSpot entities for the labels you can use for the table parameter in the source URL.

MongoDB to CrateDB
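A sketch of the source URL, assuming the ingestr convention of addressing collections as <database>.<collection>:

Source URL template: mongodb://<username>:<password>@<host>:27017?table=<database>.<collection>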

MySQL to CrateDB

Oracle to CrateDB

PostgreSQL to CrateDB
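A sketch of the source URL, using a standard SQLAlchemy-style connection string with the table name embedded as a query parameter:

Source URL template: postgresql://<username>:<password>@<host>:5432/<database>?table=<schema>.<table>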

Salesforce to CrateDB

See Salesforce entities for the labels you can use for the table parameter in the source URL.

Slack to CrateDB

See Slack entities for the labels you can use for the table parameter in the source URL.

Snowflake to CrateDB

SQLite to CrateDB
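A sketch of the source URL, using a standard SQLAlchemy-style file path with the table name embedded as a query parameter:

Source URL template: sqlite:///<path-to-database-file>?table=<table-name>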

Teradata to CrateDB
