Text-classifying techniques

Classification is concerned with taking a specific document and determining which of several predefined document groups it belongs to. There are two basic techniques for classifying text:

  • Rule-based classification
  • Supervised machine learning

Rule-based classification uses a combination of words and other attributes that are organized around expert-crafted rules. These rules can be very effective, but creating them is a time-consuming process.
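As a rough illustration of the idea, the sketch below scores a document against a handful of keyword rules and picks the category with the most matches. The categories, the patterns, and the names RULES and classify_by_rules are all hypothetical, chosen only for this example; real rule sets are usually far larger and may use attributes other than words.

```python
import re

# Hypothetical expert-written rules: each category maps to keyword patterns.
RULES = {
    "sports": [r"\bgoal\b", r"\bmatch\b", r"\bteam\b"],
    "finance": [r"\bstock\b", r"\bmarket\b", r"\binterest rate\b"],
}


def classify_by_rules(text, rules=RULES, default="unknown"):
    """Return the category with the most matching rules, or the default."""
    lowered = text.lower()
    # Count how many of each category's patterns occur in the text.
    scores = {
        category: sum(1 for pattern in patterns if re.search(pattern, lowered))
        for category, patterns in rules.items()
    }
    best = max(scores, key=scores.get)
    # If no rule matched at all, report that the document is unclassified.
    return best if scores[best] > 0 else default


print(classify_by_rules("The team scored a late goal"))
```

Even in this toy form, the cost mentioned above is visible: every new category or edge case means another rule that someone has to write and maintain by hand.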

Supervised machine learning (SML) uses a collection of annotated training documents to create a model. The model is normally called the classifier. There are many different machine learning techniques, including Naive Bayes, support vector machines (SVMs), and k-nearest neighbors.
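To make the training step concrete, here is a minimal sketch of one of these techniques, a multinomial Naive Bayes classifier with add-one smoothing, written with only the Python standard library. The four training sentences, their labels, and the names train_naive_bayes, classify, TRAINING, and MODEL are assumptions made for this illustration; a practical classifier would be trained on a much larger annotated collection and would normally come from an established library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Build a model from (text, label) pairs: the annotated training set."""
    label_counts = Counter()            # number of documents per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocabulary = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in label_counts.items():
        # Start from the log prior of the label.
        score = math.log(doc_count / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical annotated training documents.
TRAINING = [
    ("the team won the match", "sports"),
    ("a late goal decided the game", "sports"),
    ("the stock market fell sharply", "finance"),
    ("interest rates and bond prices rose", "finance"),
]
MODEL = train_naive_bayes(TRAINING)

print(classify("the team scored a goal", MODEL))
```

Unlike the rule-based approach, nothing here is written per category: adding a new category only requires adding annotated examples and retraining.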

We are not concerned with how these approaches work, but the interested reader will find innumerable sources that expand upon these and other techniques.
