The basics of text analysis

Analyzing text data is different from analyzing other types of data, such as numbers, dates, and times. Numeric and date/time datatypes can be analyzed in a very definitive way. For example, if you are looking for all records with a price greater than or equal to 50, the result is a simple yes or no for each record: either the record qualifies for inclusion in the query's result or it doesn't. Similarly, when querying by date or time, the criteria for searching through records are clearly defined: a record either falls into the date/time range or it doesn't.
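The yes-or-no nature of such a filter can be sketched in a few lines of plain Python. This is only an illustration, not Elasticsearch code; the records and the field names (`name`, `price`) are made up for the example.

```python
# Hypothetical records; the field names are assumptions for illustration.
records = [
    {"name": "keyboard", "price": 45.0},
    {"name": "monitor", "price": 50.0},
    {"name": "laptop", "price": 899.0},
]


def price_at_least(record, minimum):
    """Return True if the record qualifies and False otherwise; nothing in between."""
    return record["price"] >= minimum


# Each record is either included in the result or it is not.
matches = [r["name"] for r in records if price_at_least(r, 50)]
print(matches)
```

Every record gets an unambiguous answer, so there is no notion of one record matching "better" than another.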

However, the analysis of text/string data is less clear-cut. A string field may hold structured values, or it may hold unstructured free text, and the two call for different kinds of analysis.

Examples of structured string fields include country codes, product codes, and non-numeric serial numbers or identifiers. The datatype of these fields may be a string, but you will often want to run exact-match queries on them.
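As a rough sketch of what exact-match querying means for such fields, the plain-Python snippet below compares the whole stored value against the query value. The records and the `product_code` and `country_code` fields are hypothetical, and this is not a description of how Elasticsearch implements it internally.

```python
# Hypothetical records with structured string fields.
products = [
    {"product_code": "AB-1001", "country_code": "US"},
    {"product_code": "AB-1002", "country_code": "DE"},
    {"product_code": "XY-2001", "country_code": "US"},
]


def exact_match(records, field, value):
    """Keep only records whose field equals the value exactly (case-sensitive)."""
    return [r for r in records if r[field] == value]


us_products = exact_match(products, "country_code", "US")
print([p["product_code"] for p in us_products])

# A differently cased or partial value does not match anything.
lowercase_hits = exact_match(products, "country_code", "us")
partial_hits = exact_match(products, "product_code", "AB")
print(lowercase_hits, partial_hits)
```

In this sketch, the value is treated as a single opaque unit: it either equals the query value or it does not.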

We will first cover the analysis of unstructured text, which is also known as full-text search.

From the previous chapter, we already understand the concepts of Elasticsearch indexes, types, and mappings within a type. All fields that are of the text type are analyzed by what is known as an analyzer.
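Conceptually, an analyzer turns the text of a field into a list of terms that can be searched. The toy Python sketch below assumes a very simple pipeline that splits on non-alphanumeric characters and lowercases each token. The function name `toy_analyze` is hypothetical, and real Elasticsearch analyzers are more configurable than this.

```python
import re


def toy_analyze(text):
    """Split text into alphanumeric tokens and lowercase each one."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return [token.lower() for token in tokens]


terms = toy_analyze("The Quick brown-fox, jumped!")
print(terms)

# A query word is put through the same steps before it is compared to the terms.
query_terms = toy_analyze("QUICK")
print(all(term in terms for term in query_terms))
```

Because both the stored text and the query go through the same steps, a search for "QUICK" can find text that contains "Quick".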

In the sections that follow, we will cover these topics:

Understanding Elasticsearch analyzers
Using built-in analyzers
Implementing autocomplete with a custom analyzer
