Full-text data (Brian - Live)
Moved to PR #270. See preview.
The content has been updated since!
CrateDB offers native full-text search with Apache Lucene and Okapi BM25 ranking, accessible via SQL for easy modeling and querying of large-scale textual data. It features fuzzy matching, multi-language analysis, and composite indexing while fully integrating with data types like JSON, time-series, geospatial, vectors, and more for comprehensive multi-model queries. Whether for document search, catalog lookup, or content analytics, CrateDB is ideal.
Data Types & Indexing
By default, all text columns are indexed as plain (raw, unanalyzed)—efficient for equality search but not suitable for full‑text queries.
To use full-text search, add a FULLTEXT index with an optional analyzer to the text columns you want to search:
CREATE TABLE documents (
title TEXT,
body TEXT,
INDEX ft_body USING FULLTEXT(body) WITH (analyzer = 'english')
);You can also index multiple columns with composite full-text indices:
INDEX ft_all USING FULLTEXT(title, body) WITH (analyzer = 'english');For detailed options, check out the Reference Manual.
Analyzers
An analyzer splits text into searchable terms and consists of the following components:
Tokenizer - splits on whitespace/characters
Token Filters - e.g. lowercase, stemming, stop‑word removal
Char Filters - pre-processing (e.g. stripping HTML).
CrateDB offers about 50 built-in analyzers supporting more than 30 languages.
You can extend a built-in analyzer:
or create your own custom analyzer :
Learn more about the builtin analyzers and how to define your own with custom tokenizers and token filters.
Querying: MATCH Predicate & Scoring
CrateDB uses the SQL MATCH predicate to run full‑text queries against full‑text indices. It optionally returns a relevance score _score, ranked via BM25.
Basic usage:
Searching multiple indices with weighted ranking:
You can configure match options like:
using best_fields(default)fuzziness = 1(tolerate minor typos)operator = 'AND'orORslop = Nfor phrase proximity
Example: Fuzzy Search
This matches similar words like ‘James’.
Example: Multi‑language Composite Search
Use Cases & Integration
CrateDB is ideal for searching semi-structured large text data—product catalogs, article archives, user-generated content, descriptions and logs.
Because full-text indices are updated in real-time, search results reflect newly ingested data almost instantly. This tight integration avoids the complexity of maintaining separate search infrastructure.
You can combine full-text search with other data domains, for example:
This blend lets you query by text relevance, numeric filters, and spatial constraints, all in one.
Further Learning & Resources
Full-text Search in Query Capabilities: Overview of full text search features
Reference Manual:
Full-text indices: Defining indices, extending builtin analyzers, custom analyzers.
Full-text analyzers: Builtin analyzers, tokenizers, token and char filters.
Full-text search: Details about MATCH predicate arguments and options.
Hands‑On Academy Course: explore FTS on real datasets (e.g. Chicago neighborhoods).
Last updated

