Full-text analyzers

Analyzers build full-text indexes by converting field content into a stream of tokens. An analyzer consists of three stages, applied in order:

  1. Char filters – applied first; they modify or filter out characters in the raw input.

  2. Tokenizer – splits the filtered character stream into tokens.

  3. Token filters – transform, remove, or add tokens.

Analyzers can be built-in or custom, and they allow language-specific processing for better search accuracy. The sketch below illustrates how the three stages compose.
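
As a conceptual illustration only (not any particular engine's API), here is a minimal Python sketch of the pipeline; the specific char filter, tokenizer, and token filters chosen are arbitrary examples:

```python
import re

def html_strip_char_filter(text: str) -> str:
    # Char filter: runs first, rewriting the raw character stream
    # (here: stripping HTML-like tags).
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text: str) -> list[str]:
    # Tokenizer: splits the filtered character stream into tokens.
    return text.split()

def lowercase_token_filter(tokens: list[str]) -> list[str]:
    # Token filter: transforms tokens (here: lowercasing).
    return [t.lower() for t in tokens]

def stop_token_filter(tokens: list[str],
                      stopwords: set[str] = frozenset({"the", "a", "an"})) -> list[str]:
    # Token filter: removes tokens (here: a tiny illustrative stopword set).
    return [t for t in tokens if t not in stopwords]

def analyze(text: str) -> list[str]:
    text = html_strip_char_filter(text)      # 1. char filters
    tokens = whitespace_tokenizer(text)      # 2. tokenizer
    tokens = lowercase_token_filter(tokens)  # 3. token filters
    return stop_token_filter(tokens)

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```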

| Analyzer | Type | Description | Parameters |
|----------|------|-------------|------------|
| standard | standard | Standard tokenizer with the standard, lowercase, and stop token filters. Implements Unicode text segmentation (UAX #29). | stopwords, max_token_length |
| default | default | Same as standard. | |
| simple | simple | Uses the Lowercase tokenizer. | |
| plain | plain | Alias for the keyword analyzer. Cannot be extended. | |
| whitespace | whitespace | Uses the Whitespace tokenizer. | |
| stop | stop | Lowercase tokenizer with the stop token filter. | stopwords, stopwords_path |
| keyword | keyword | Produces a single token from the entire content. | |
| pattern | pattern | Splits text into tokens using a regular expression. | pattern, lowercase, flags |
| language-specific | <language> | Supports many languages (English, French, German, etc.) with configurable stopwords and optional stem exclusion. | stopwords, stopwords_path, stem_exclusion |
| snowball | snowball | Standard tokenizer with the standard, lowercase, stop, and snowball token filters. | stopwords, language |
| fingerprint | fingerprint | Lowercases, normalizes, sorts, deduplicates, and concatenates tokens. Useful for clustering. | separator, max_output_size, stopwords, stopwords_path |
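
To make the pattern analyzer's behavior concrete, here is a minimal sketch; the default pattern of \W+ (split on non-word characters) and lowercasing enabled are assumptions for illustration, and the engine's actual defaults may differ:

```python
import re

def pattern_analyze(text: str, pattern: str = r"\W+",
                    lowercase: bool = True) -> list[str]:
    # The regex marks token SEPARATORS: matching text is discarded,
    # and the remaining fragments become tokens.
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_analyze("foo,Bar.BAZ-qux"))  # ['foo', 'bar', 'baz', 'qux']
```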


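The fingerprint analyzer's steps also become clearer in code. This sketch assumes whitespace tokenization and omits the normalization step for brevity; the separator and stopwords parameters mirror the table above:

```python
def fingerprint_analyze(text: str, separator: str = " ",
                        stopwords: set[str] = frozenset()) -> str:
    # Lowercase, drop stopwords, then sort, deduplicate, and
    # concatenate everything into a single token.
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in stopwords]
    return separator.join(sorted(set(tokens)))

print(fingerprint_analyze("Zebra zebra apple Mango"))  # 'apple mango zebra'
```

Duplicate and differently-cased words collapse to one stable "fingerprint", which is why this analyzer is useful for clustering near-duplicate records.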