Using stemming

Finding the stem of a word involves removing any prefixes or suffixes; what is left is considered to be the stem. Identifying stems is useful for tasks where finding similar words is important. For example, a search may be looking for occurrences of a word such as book. Many words contain this word, including books, booked, bookings, and bookmark. It can be useful to reduce both the search term and the words of a document to their stems and then look for matching stems. In many situations, this improves the quality of a search because different forms of the same word are matched.

A stemmer may produce a stem that is not a real word. For example, it may decide that bounties, bounty, and bountiful all have the same stem, bounti. This can still be useful for searches.
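The idea can be illustrated with a minimal sketch of a suffix-stripping stemmer. This is a toy written for this discussion, not the Porter algorithm or any library's implementation; the rule list, the `MIN_STEM_LENGTH` cutoff, and the function names `toy_stem` and `group_by_stem` are all assumptions made for the example.

```python
# Toy suffix-stripping stemmer. This is an illustration only, NOT the Porter
# algorithm. Each rule is (suffix, replacement); rules are tried in order and
# the first one that matches (and leaves a long enough stem) wins.
SUFFIX_RULES = [
    ("iful", "i"),  # bountiful -> bounti
    ("ings", ""),   # bookings  -> book
    ("ies", "i"),   # bounties  -> bounti
    ("ing", ""),
    ("ed", ""),     # booked    -> book
    ("s", ""),      # books     -> book
    ("y", "i"),     # bounty    -> bounti
]

# Assumed cutoff so that very short words such as "is" are left alone.
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Return a crude stem by stripping one known suffix, if any."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if len(candidate) >= MIN_STEM_LENGTH:
                return candidate
    return word


def group_by_stem(words):
    """Group words that share the same toy stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(group_by_stem(["book", "books", "booked", "bookings", "bookmark"]))
    print(group_by_stem(["bounties", "bounty", "bountiful"]))
```

With these rules, books, booked, and bookings all reduce to book, while bookmark is left unchanged because none of the suffixes apply to it. The three bounty words reduce to bounti, which is not a real word but still works as a shared search key.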

Similar to stemming is lemmatization. This is the process of finding a word's lemma, which is its form as found in a dictionary. This can also be useful for some searches. Stemming is frequently viewed as a more primitive technique, where the attempt to get to the root of a word involves cutting off parts of the beginning and/or ending of a token.
Lemmatization can be thought of as a more sophisticated approach, where effort is devoted to finding the morphological or lexical meaning of a token. For example, the word having has a stem of hav while its lemma is have. Also, the words was and been have different stems but the same lemma, be.
Lemmatization often uses more computational resources than stemming. Both have their place, and their utility is partially determined by the problem that needs to be solved.
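The contrast between the two approaches can be sketched in a few lines. The lemma table below is a tiny, hand-written stand-in for a real dictionary, and `crude_stem` is a deliberately naive suffix stripper; both, along with their names, are assumptions made for illustration rather than any library's behavior.

```python
# Hypothetical, hand-written lemma table. A real lemmatizer would consult a
# full dictionary and usually part-of-speech information as well.
LEMMA_TABLE = {
    "was": "be",
    "been": "be",
    "is": "be",
    "having": "have",
    "has": "have",
    "had": "have",
}


def toy_lemma(word):
    """Look up the dictionary form; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ["having", "was", "been"]:
        print(w, "stem:", crude_stem(w), "lemma:", toy_lemma(w))
```

Here the stem of having is hav, but the lookup yields have. The words was and been keep distinct stems yet map to the same lemma, be. The lookup is what gives lemmatization its accuracy, and maintaining and searching a full dictionary is one reason it tends to cost more than simple suffix stripping.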