When working with NLP, the first task is to split the text into a list of tokens, a process called tokenization. The granularity of the resulting tokens depends on the objective; for example, each token can be one of the following (see the sketch after this list):
- A word
- A combination of words
- A sentence
- A paragraph
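
As a rough illustration, the following Python sketch tokenizes the same string at two of these granularities using simple regular expressions. The splitting rules here are deliberately simplified assumptions for demonstration only; a production system would typically use a dedicated tokenizer such as those in NLTK or spaCy.

```python
import re

text = "Tokenization splits text into units. The unit size depends on the task."

# Word-level tokens: match runs of word characters or single punctuation marks
# (a simplified rule that ignores contractions, hyphens, etc.).
word_tokens = re.findall(r"\w+|[^\w\s]", text)
# ['Tokenization', 'splits', 'text', 'into', 'units', '.', 'The', ...]

# Sentence-level tokens: split after sentence-ending punctuation followed by
# whitespace (again, a simplified heuristic).
sentence_tokens = re.split(r"(?<=[.!?])\s+", text)
# ['Tokenization splits text into units.', 'The unit size depends on the task.']

print(word_tokens)
print(sentence_tokens)
```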