Token filters

An analyzer can have zero or more token filters. Each token filter can add, remove, or change tokens in the token stream that it receives. When an analyzer has multiple token filters, they are applied in order: the output of each token filter becomes the input of the next one until every filter has run.
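To make the chaining concrete, here is a minimal Python sketch of the idea. It is not how Elasticsearch implements token filters internally; the three filter functions, the tiny synonym table, and the name run_filter_chain are all hypothetical and exist only to illustrate one filter that changes tokens, one that removes them, and one that adds them.

```python
from typing import Callable, List

# A token filter is modeled here as a function from a token list to a token list.
TokenFilter = Callable[[List[str]], List[str]]


def change_to_lowercase(tokens: List[str]) -> List[str]:
    """Changes tokens: every token is replaced by its lowercase form."""
    return [token.lower() for token in tokens]


def remove_short_tokens(tokens: List[str]) -> List[str]:
    """Removes tokens: single-character tokens are dropped (toy rule)."""
    return [token for token in tokens if len(token) > 1]


def add_synonyms(tokens: List[str]) -> List[str]:
    """Adds tokens: a synonym is emitted right after a matching token."""
    synonyms = {"quick": "fast"}  # hypothetical synonym table
    output: List[str] = []
    for token in tokens:
        output.append(token)
        if token in synonyms:
            output.append(synonyms[token])
    return output


def run_filter_chain(tokens: List[str], filters: List[TokenFilter]) -> List[str]:
    """Feeds the output of each filter into the next one, in order."""
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


result = run_filter_chain(
    ["A", "Quick", "Fox"],
    [change_to_lowercase, remove_short_tokens, add_synonyms],
)
print(result)  # ['quick', 'fast', 'fox']
```

An empty filter list leaves the tokens untouched, which corresponds to an analyzer with zero token filters.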

Elasticsearch comes with a number of token filters, and they can be used to compose your own custom analyzers.
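As a rough illustration of composing a custom analyzer from built-in token filters, the following Python sketch builds the index settings body as a dictionary and prints it as JSON. The analyzer name my_custom_analyzer is a made-up placeholder, and the choice of the standard tokenizer is an assumption for the example; lowercase and stop are the built-in filter names. The order of the filter list is the order in which the filters are applied.

```python
import json

# Index settings that define a custom analyzer.
# "my_custom_analyzer" is a hypothetical name chosen for this example.
index_settings = {
    "settings": {
        "analysis": {
            "analyzer": {
                "my_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Token filters run in the order listed here.
                    "filter": ["lowercase", "stop"],
                }
            }
        }
    }
}

# This JSON could be sent as the request body when creating an index.
print(json.dumps(index_settings, indent=2))
```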

Some examples of built-in token filters are the following:

  • Lowercase token filter: Replaces all tokens in the input with their lowercase versions.
  • Stop token filter: Removes stopwords, that is, common words that add little meaning on their own. In English, words such as is, a, an, and the are typical stopwords. For many text search problems, it makes sense to remove such words, as they contribute little to the meaning or context of the content.
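The behavior of these two filters can be approximated in a few lines of Python. This is only a simulation under stated assumptions: the stopword set contains just the four words mentioned above (the real English stopword list in Elasticsearch is longer), and lowercase_filter and stop_filter are hypothetical helper names, not Elasticsearch APIs.

```python
from typing import FrozenSet, List

# Assumption: only the four stopwords named in the text above.
STOPWORDS: FrozenSet[str] = frozenset({"is", "a", "an", "the"})


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Replaces every token with its lowercase version."""
    return [token.lower() for token in tokens]


def stop_filter(tokens: List[str], stopwords: FrozenSet[str] = STOPWORDS) -> List[str]:
    """Drops tokens that exactly match a stopword (case-sensitive match)."""
    return [token for token in tokens if token not in stopwords]


tokens = ["The", "Fox", "is", "an", "Animal"]

lowercased = lowercase_filter(tokens)
filtered = stop_filter(lowercased)

print(lowercased)  # ['the', 'fox', 'is', 'an', 'animal']
print(filtered)    # ['fox', 'animal']

# In this sketch the order matters: without lowercasing first,
# the capitalized "The" does not match the stopword "the".
print(stop_filter(tokens))  # ['The', 'Fox', 'Animal']
```

In this simulation, placing the lowercase step before the stop step lets capitalized stopwords such as "The" be removed as well.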

You can find a list of available built-in token filters here: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html.

Thus far, we have looked at the role of character filters, tokenizers, and token filters. This sets us up to understand how some of the built-in analyzers in Elasticsearch are composed. 
