Full-text analyzers
Analyzers build full-text indexes by converting field content into a stream of tokens. An analyzer consists of three stages, applied in order (a minimal sketch of the pipeline follows this list):

- Char filters – applied first; they modify or filter out characters.
- Tokenizer – splits the character stream into tokens.
- Token filters – transform, remove, or add tokens.

Analyzers can be built-in or custom, and they allow language-specific processing for better search accuracy.
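To make the three-stage order concrete, here is a minimal Python sketch of the pipeline. The `html_strip`, `whitespace_tokenizer`, and `lowercase` functions are hypothetical stand-ins for real char filters, tokenizers, and token filters; only the ordering of the stages mirrors the description above.

```python
import re

def html_strip(text: str) -> str:
    """Char filter: drop HTML tags from the raw character stream."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text: str) -> list[str]:
    """Tokenizer: split the character stream into tokens."""
    return text.split()

def lowercase(tokens: list[str]) -> list[str]:
    """Token filter: transform each token."""
    return [t.lower() for t in tokens]

def analyze(text, char_filters, tokenizer, token_filters):
    for cf in char_filters:      # 1. char filters run first
        text = cf(text)
    tokens = tokenizer(text)     # 2. the tokenizer splits the stream into tokens
    for tf in token_filters:     # 3. token filters transform, remove, or add tokens
        tokens = tf(tokens)
    return tokens

print(analyze("<p>Quick Brown Fox</p>",
              [html_strip], whitespace_tokenizer, [lowercase]))
# ['quick', 'brown', 'fox']
```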
| Analyzer | Type | Description | Parameters |
|---|---|---|---|
| standard | `standard` | Standard tokenizer with the `standard`, `lowercase`, and `stop` token filters; splits text using Unicode text segmentation (UAX #29). | `stopwords`, `max_token_length` |
| default | `default` | Same as `standard`. | – |
| simple | `simple` | Uses the Lowercase tokenizer. | – |
| plain | `plain` | Alias for the `keyword` analyzer; cannot be extended. | – |
| whitespace | `whitespace` | Uses the Whitespace tokenizer. | – |
| stop | `stop` | Lowercase tokenizer plus the `stop` token filter. | `stopwords`, `stopwords_path` |
| keyword | `keyword` | Produces a single token from the entire content. | – |
| pattern | `pattern` | Splits text on a regular expression (sketched after this table). | `pattern`, `lowercase`, `flags` |
| language-specific | `<language>` | Supports many languages (English, French, German, etc.) with stopwords and optional stem exclusion. | `stopwords`, `stopwords_path`, `stem_exclusion` |
| snowball | `snowball` | Standard tokenizer with the `standard`, `lowercase`, `stop`, and `snowball` token filters. | `stopwords`, `language` |
| fingerprint | `fingerprint` | Lowercases, normalizes, sorts, deduplicates, and concatenates tokens; useful for clustering (sketched after this table). | `separator`, `max_output_size`, `stopwords`, `stopwords_path` |
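The `pattern` analyzer's behavior can be modeled in a few lines of Python. This is a rough sketch, not the engine's implementation; the parameter names mirror the table above, and the default split pattern `\W+` is an assumption.

```python
import re

def pattern_analyze(text: str, pattern: str = r"\W+",
                    lowercase: bool = True, flags: int = 0) -> list[str]:
    """Rough model of the pattern analyzer: split on a regex, optionally lowercase."""
    tokens = [t for t in re.split(pattern, text, flags=flags) if t]
    return [t.lower() for t in tokens] if lowercase else tokens

print(pattern_analyze("foo,Bar-BAZ"))               # ['foo', 'bar', 'baz']
print(pattern_analyze("a=1;b=2", pattern=r"[;=]"))  # ['a', '1', 'b', '2']
```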
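Likewise, the `fingerprint` analyzer's lowercase → normalize → sort → deduplicate → concatenate sequence can be sketched directly. This is an illustrative approximation: the normalization here is simple accent stripping, and the `max_output_size` truncation is an assumption (a real engine may instead drop outputs that exceed the limit).

```python
import re
import unicodedata

def fingerprint_analyze(text: str, separator: str = " ",
                        max_output_size: int = 255,
                        stopwords: frozenset = frozenset()) -> str:
    """Rough model of the fingerprint analyzer: lowercase, normalize,
    sort, deduplicate, then concatenate into a single token."""
    # Lowercase, then strip accents (ASCII-folding style normalization).
    text = unicodedata.normalize("NFKD", text.lower())
    text = text.encode("ascii", "ignore").decode("ascii")
    tokens = [t for t in re.split(r"\W+", text) if t and t not in stopwords]
    out = separator.join(sorted(set(tokens)))  # sort + dedupe + concatenate
    return out[:max_output_size]               # assumed truncation behavior

print(fingerprint_analyze("Café CAFE the café, a place"))
# 'a cafe place the'
```

Because identical inputs that differ only in case, accents, token order, or repetition collapse to the same fingerprint, this analyzer is well suited to clustering near-duplicate records.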