Files
CrateDB Cloud offers a seamless UI for importing data from standard data sources. Files uploaded through the Cloud import utility have a maximum size of 1 GB.
Import from file
Supported file formats are:
CSV (all variants)
JSON (JSON Documents, JSON Arrays, and JSON-Lines)
Parquet
File Format Limitations
CSV files:
Comma, tab and pipe delimiters are supported.
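For illustration (not a required layout), the same two records could be provided comma-delimited or pipe-delimited; tab-delimited files follow the same pattern:
id,text
1,example
2,example2
id|text
1|example
2|example2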
JSON files:
The following formats are supported for JSON:
JSON Documents. Each document is inserted as a single row in the table.
{ "id":1, "text": "example" }
JSON Arrays. Each array item is inserted as a row.
[ { "id":1, "text": "example" }, { "id":2, "text": "example2" } ]
JSON-Lines. Each line is inserted as a row.
{"id":1, "text": "example"} {"id":2, "text": "example2"}
Import from S3 bucket
You can import directly from S3-compatible storage. To import a file from a bucket, provide the bucket name, the path to the file, and your S3 Access Key ID and S3 Secret Access Key. For non-AWS S3-compatible storage, you can also specify a custom endpoint. Keep in mind that, depending on your provider, you may be charged for egress traffic. S3 imports are limited to 10 GiB per file. The usual file formats are supported: CSV (all variants), JSON (JSON-Lines, JSON Arrays, and JSON Documents), and Parquet.
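As an optional pre-flight check outside of CrateDB Cloud, a short script can confirm that the object is reachable with the given credentials and stays under the 10 GiB limit before the import is started. This is only a sketch; it assumes the boto3 package is installed, and the bucket, key, credentials, and endpoint shown are placeholders:
import boto3
# Placeholders only: substitute your own bucket, object key, and credentials.
s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    # For non-AWS, S3-compatible storage, point at its endpoint:
    # endpoint_url="https://storage.example.com",
)
head = s3.head_object(Bucket="my-bucket", Key="data/orders.parquet")
size_gib = head["ContentLength"] / 1024 ** 3
print(f"data/orders.parquet is {size_gib:.2f} GiB")
if size_gib > 10:
    print("The file exceeds the 10 GiB per-file limit for S3 imports.")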
Import from Azure Blob Storage Container
As with other imports, Parquet, CSV, and JSON files are supported. Imports are limited to 10 GiB per file.
Importing multiple files, also known as import globbing, is supported in any S3-compatible blob storage. The steps are the same as for importing from S3, i.e. the bucket name, the path to the files, and the S3 Access Key ID/Secret Access Key.
Importing multiple files from an Azure Blob Storage container is also supported, using glob pattern matching in the path, e.g. /folder/*.parquet (see the sketch below).
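Before starting a multi-file import, it can help to preview which blobs a glob pattern would actually match. A minimal sketch, assuming the azure-storage-blob package is installed; the connection string, container name, prefix, and pattern are placeholders:
from fnmatch import fnmatch
from azure.storage.blob import ContainerClient
# Placeholders only: substitute your own connection string and container name.
container = ContainerClient.from_connection_string(
    conn_str="YOUR_AZURE_STORAGE_CONNECTION_STRING",
    container_name="my-container",
)
pattern = "folder/*.parquet"
# list_blobs() narrows the listing by prefix; fnmatch applies the glob itself.
matching = [
    blob.name
    for blob in container.list_blobs(name_starts_with="folder/")
    if fnmatch(blob.name, pattern)
]
print(f"{len(matching)} blobs match {pattern!r}")
for name in matching:
    print(name)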
Import from URL
The following data formats are supported:
CSV (all variants)
JSON (JSON-Lines, JSON Arrays and JSON Documents)
Parquet
Gzip-compressed files are also supported.
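If a file is large, gzip-compressing it before publishing it at a URL reduces transfer time. A minimal sketch using only the Python standard library; the file names are placeholders:
import gzip
import shutil
# Placeholders only: compress orders.csv into orders.csv.gz before hosting it at a URL.
with open("orders.csv", "rb") as source, gzip.open("orders.csv.gz", "wb") as target:
    shutil.copyfileobj(source, target)
print("Wrote orders.csv.gz")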
Schema evolution
Schema Evolution, available for all import types, enables automatic addition of new columns to existing tables during data import, eliminating the need to pre-define table schemas. This feature is applicable to both pre-existing tables and those created during the import process.
Note that Schema Evolution is limited to adding new columns; it does not modify existing ones. For instance, if an existing table has an ‘OrderID’ column of type INTEGER and an import is attempted with Schema Evolution enabled for data where the ‘OrderID’ column is of type STRING, the import job will fail due to the type mismatch.
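As an illustration (hypothetical data, not taken from the product), importing the following JSON-Lines into an existing table that only has ‘OrderID’ (INTEGER) and ‘Amount’ columns would add a new ‘Currency’ column automatically:
{"OrderID": 1, "Amount": 9.99, "Currency": "EUR"}
{"OrderID": 2, "Amount": 24.50, "Currency": "USD"}
A record such as {"OrderID": "A-17", "Amount": 3.10}, where ‘OrderID’ is a string, would instead make the import job fail with a type mismatch.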