Files
CrateDB Cloud offers a seamless UI for importing data from standard data sources. Files uploaded through the Cloud import utility have a maximum size of 1 GB.
Import from file
Supported file formats are:
CSV (all variants)
JSON (JSON Documents, JSON Arrays, and JSON-Lines)
Parquet
File Format Limitations
CSV files:
Comma, tab and pipe delimiters are supported.
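For illustration (not a required layout), the same two records could be provided comma-delimited or pipe-delimited; tab-delimited files follow the same pattern:
id,text
1,example
2,example2
id|text
1|example
2|example2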
JSON files:
The following formats are supported for JSON:
JSON Documents. Each document is inserted as a single row in the table.
{ "id":1, "text": "example" }
JSON Arrays. Each array item is inserted as a row.
[ { "id":1, "text": "example" }, { "id":2, "text": "example2" } ]
JSON-Lines. Each line is inserted as a row.
{"id":1, "text": "example"} {"id":2, "text": "example2"}
Import from S3 bucket
You can import directly from S3-compatible storage. To import a file from a bucket, provide the bucket name, the path to the file, and your S3 Access Key ID and S3 Secret Access Key. For non-AWS S3-compatible storage, you can also specify a custom endpoint. Keep in mind that, depending on your provider, you may be charged for egress traffic. S3 imports are limited to 10 GiB per file. The usual file formats are supported: CSV (all variants), JSON (JSON-Lines, JSON Arrays, and JSON Documents), and Parquet.
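As an optional pre-flight check outside of CrateDB Cloud, a short script can confirm that the object is reachable with the given credentials and stays under the 10 GiB limit before the import is started. This is only a sketch; it assumes the boto3 package is installed, and the bucket, key, credentials, and endpoint shown are placeholders:
import boto3
# Placeholders only: substitute your own bucket, object key, and credentials.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    # For non-AWS, S3-compatible storage, point at its endpoint:
    # endpoint_url="https://storage.example.com",
)
head = s3.head_object(Bucket="my-bucket", Key="data/orders.parquet")
size_gib = head["ContentLength"] / 1024 ** 3
print(f"data/orders.parquet is {size_gib:.2f} GiB")
if size_gib > 10:
    print("The file exceeds the 10 GiB per-file limit for S3 imports.")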
Import from Azure Blob Storage Container
As with other imports, Parquet, CSV, and JSON files are supported. Imports are limited to 10 GiB per file.
Importing multiple files, also known as import globbing, is supported in any S3-compatible blob storage. The steps are the same as for importing from S3, i.e. the bucket name, the path to the files, and the S3 Access Key ID/Secret Access Key.
Importing multiple files from an Azure Blob Storage container is also supported, using glob pattern matching in the path, e.g. /folder/*.parquet (see the sketch below).
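Before starting a multi-file import, it can help to preview which blobs a glob pattern would actually match. A minimal sketch, assuming the azure-storage-blob package is installed; the connection string, container name, prefix, and pattern are placeholders:
from fnmatch import fnmatch
from azure.storage.blob import ContainerClient
# Placeholders only: substitute your own connection string and container name.
container = ContainerClient.from_connection_string(
    conn_str="YOUR_AZURE_STORAGE_CONNECTION_STRING",
    container_name="my-container",
)
pattern = "folder/*.parquet"
# list_blobs() narrows the listing by prefix; fnmatch applies the glob itself.
matching = [
    blob.name
    for blob in container.list_blobs(name_starts_with="folder/")
    if fnmatch(blob.name, pattern)
]
print(f"{len(matching)} blobs match {pattern!r}")
for name in matching:
    print(name)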
Import from URL
The following data formats are supported:
CSV (all variants)
JSON (JSON-Lines, JSON Arrays and JSON Documents)
Parquet
Gzip-compressed files are also supported.
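If a file is large, gzip-compressing it before publishing it at a URL reduces transfer time. A minimal sketch using only the Python standard library; the file names are placeholders:
import gzip
import shutil
# Placeholders only: compress orders.csv into orders.csv.gz before hosting it at a URL.
with open("orders.csv", "rb") as source, gzip.open("orders.csv.gz", "wb") as target:
    shutil.copyfileobj(source, target)
print("Wrote orders.csv.gz")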
Schema evolution
Schema Evolution, available for all import types, enables automatic addition of new columns to existing tables during data import, eliminating the need to pre-define table schemas. This feature is applicable to both pre-existing tables and those created during the import process.
Note that Schema Evolution is limited to adding new columns; it does not modify existing ones. For instance, if an existing table has an ‘OrderID’ column of type INTEGER and an import is attempted with Schema Evolution enabled for data where the ‘OrderID’ column is of type STRING, the import job will fail due to the type mismatch.
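As an illustration (hypothetical data, not taken from the product), importing the following JSON-Lines into an existing table that only has ‘OrderID’ (INTEGER) and ‘Amount’ columns would add a new ‘Currency’ column automatically:
{"OrderID": 1, "Amount": 9.99, "Currency": "EUR"}
{"OrderID": 2, "Amount": 24.50, "Currency": "USD"}
A record such as {"OrderID": "A-17", "Amount": 3.10}, where ‘OrderID’ is a string, would instead make the import job fail with a type mismatch.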