Full-text analyzers

Analyzers build full-text indexes by converting field content into a stream of tokens. An analyzer consists of three stages, applied in order:

  1. Char filters – applied first; they modify or filter out characters in the raw input.

  2. Tokenizer – splits the filtered character stream into tokens.

  3. Token filters – transform, remove, or add tokens.

Analyzers can be built-in or custom, and they allow language-specific processing for better search accuracy. The sketch below illustrates how the three stages compose.
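
As a conceptual illustration only (not any particular engine's API), here is a minimal Python sketch of the pipeline; the specific char filter, tokenizer, and token filters chosen are arbitrary examples:

```python
import re

def html_strip_char_filter(text: str) -> str:
    # Char filter: runs first, rewriting the raw character stream
    # (here: stripping HTML-like tags).
    return re.sub(r"<[^>]+>", "", text)

def whitespace_tokenizer(text: str) -> list[str]:
    # Tokenizer: splits the filtered character stream into tokens.
    return text.split()

def lowercase_token_filter(tokens: list[str]) -> list[str]:
    # Token filter: transforms tokens (here: lowercasing).
    return [t.lower() for t in tokens]

def stop_token_filter(tokens: list[str],
                      stopwords: set[str] = frozenset({"the", "a", "an"})) -> list[str]:
    # Token filter: removes tokens (here: a tiny illustrative stopword set).
    return [t for t in tokens if t not in stopwords]

def analyze(text: str) -> list[str]:
    text = html_strip_char_filter(text)      # 1. char filters
    tokens = whitespace_tokenizer(text)      # 2. tokenizer
    tokens = lowercase_token_filter(tokens)  # 3. token filters
    return stop_token_filter(tokens)

print(analyze("<p>The Quick Brown Fox</p>"))  # ['quick', 'brown', 'fox']
```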

| Analyzer | Type | Description | Parameters |
|----------|------|-------------|------------|
| standard | standard | Standard tokenizer with the standard, lowercase, and stop token filters. Implements Unicode text segmentation (UAX #29). | stopwords, max_token_length |
| default | default | Same as standard. | |
| simple | simple | Uses the Lowercase tokenizer. | |
| plain | plain | Alias for the keyword analyzer. Cannot be extended. | |
| whitespace | whitespace | Uses the Whitespace tokenizer. | |
| stop | stop | Lowercase tokenizer with the stop token filter. | stopwords, stopwords_path |
| keyword | keyword | Produces a single token from the entire content. | |
| pattern | pattern | Splits text into tokens using a regular expression. | pattern, lowercase, flags |
| language-specific | <language> | Supports many languages (English, French, German, etc.) with configurable stopwords and optional stem exclusion. | stopwords, stopwords_path, stem_exclusion |
| snowball | snowball | Standard tokenizer with the standard, lowercase, stop, and snowball token filters. | stopwords, language |
| fingerprint | fingerprint | Lowercases, normalizes, sorts, deduplicates, and concatenates tokens. Useful for clustering. | separator, max_output_size, stopwords, stopwords_path |
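
To make the pattern analyzer's behavior concrete, here is a minimal sketch; the default pattern of \W+ (split on non-word characters) and lowercasing enabled are assumptions for illustration, and the engine's actual defaults may differ:

```python
import re

def pattern_analyze(text: str, pattern: str = r"\W+",
                    lowercase: bool = True) -> list[str]:
    # The regex marks token SEPARATORS: matching text is discarded,
    # and the remaining fragments become tokens.
    tokens = [t for t in re.split(pattern, text) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_analyze("foo,Bar.BAZ-qux"))  # ['foo', 'bar', 'baz', 'qux']
```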


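The fingerprint analyzer's steps also become clearer in code. This sketch assumes whitespace tokenization and omits the normalization step for brevity; the separator and stopwords parameters mirror the table above:

```python
def fingerprint_analyze(text: str, separator: str = " ",
                        stopwords: set[str] = frozenset()) -> str:
    # Lowercase, drop stopwords, then sort, deduplicate, and
    # concatenate everything into a single token.
    tokens = [t.lower() for t in text.split()]
    tokens = [t for t in tokens if t not in stopwords]
    return separator.join(sorted(set(tokens)))

print(fingerprint_analyze("Zebra zebra apple Mango"))  # 'apple mango zebra'
```

Duplicate and differently-cased words collapse to one stable "fingerprint", which is why this analyzer is useful for clustering near-duplicate records.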