dbt

About

dbtarrow-up-right is a tool for transforming data in data warehouses using Python and SQL.

Due to its unique capabilities, CrateDB is an excellent warehouse choice for data transformation projects. It offers automatic indexing, fast aggregations, easy partitioning, and the ability to scale horizontally.

Setup

To start a CrateDB instance for evaluation purposes, use Docker.

docker run --rm \
  --publish=4200:4200 --publish=5432:5432 \
  --env=CRATE_HEAP_SIZE=2g crate:latest

Install the most recent version of the dbt-cratedb2arrow-up-right Python package.

pip install --upgrade 'dbt-cratedb2'

Configure

As CrateDB is compatible with PostgreSQL, the same connectivity options apply like outlined on the dbt Postgres Setuparrow-up-right documentation page.

A minimal set of dbt profile configuration options, for example within a profiles.ymlarrow-up-right file at ~/.dbt/profiles.yml.

cratedb_analytics:
  target: dev
  outputs:
    dev:
      type: cratedb
      host: localhost
      port: 5432
      user: crate
      pass: crate
      dbname: crate
      schema: doc
      search_path: doc

Please note the values for dbname, schema, and search_path in this example.

Examples

When working with dbt, you are working on behalf of a dbt project. A dbt project has a specific structurearrow-up-right, and contains a combination of SQL, Jinja, YAML, and Markdown files. In your project folder, alongside the models folder that most projects have, a folder called macros can include macro override files.

You can explore a few ready-to-run dbt projects that demonstrate usage with CrateDB:

Good to know

A few notes about advanced configuration options and general usage information.

Search Patharrow-up-right

The search_path config controls the CrateDB “search path” that dbt configures when opening new connections to the database. By default, the CrateDB search path is "doc", meaning that unqualified names will be searched for in the doc schema.

Custom Schemasarrow-up-right

By default, dbt writes the models into the schema you configured in your profile, but in some dbt projects you may need to write data into different target schemas. You can adjust the target schema using custom schemas with dbtarrow-up-right.

If your dbt project has a custom macro called generate_schema_name, dbt will use it instead of the default macro. This allows you to customize the name generation according to your needs.

Full Connection Optionsarrow-up-right

CrateDB targets should be set up using the following dbt profile configuration in your profiles.ymlarrow-up-right file, which is identical to the setup options of dbt-postgresarrow-up-right.

CrateDB’s Differences

  • CrateDB’s fixed catalog name is crate, the default schema name is doc.

  • CrateDB does not implement the notion of a database, however tables can be created in different schemas.

  • When asked for a database name, specifying a schema name (any), or the fixed catalog name crate may be applicable.

  • If a database/schema name is omitted while connecting, the PostgreSQL drivers may default to the “username”.

  • The predefined superuser on an unconfigured CrateDB cluster is called crate, defined without a password.

  • For authenticating properly, please learn about the available authentication options.

Caveats

  • Incremental materializations do not support columns using the OBJECT data type yet.

  • Incremental materializations with CrateDB currently only support the delete+insert strategy.

  • Model materializations using the “materialized view” strategy are not supported yet.

Last updated