Getting started with two types of data, numerical and categorical

At first glance, the features in the preceding dataset are categorical: for example, male or female, one of four age groups, one of the predefined site categories, and whether or not the user is interested in sports. Such data is different from the numerical feature data that we have worked with until now.

Categorical (also called qualitative) features represent characteristics, distinct groups, and a countable number of options. Categorical features may or may not have a logical order. For example, household income from low to medium to high is an ordinal feature, while the category of an ad is not ordinal. Numerical (also called quantitative) features, on the other hand, have mathematical meaning as a measurement and are, of course, ordered. For instance, term frequency and its tf-idf variant are discrete and continuous numerical features, respectively; the cardiotocography dataset contains both discrete (such as the number of accelerations per second and the number of fetal movements per second) and continuous (such as the mean value of long-term variability) numerical features.
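The difference between ordinal and non-ordinal categories can be sketched in a few lines of Python. The values below are hypothetical toy data, not taken from the dataset above, and the helper name income_rank is introduced only for illustration.

```python
from collections import Counter

# Hypothetical toy values, not taken from the dataset discussed above.
INCOME_ORDER = ["low", "medium", "high"]  # ordinal: the order is meaningful


def income_rank(level):
    """Return the position of an income level in its logical order."""
    return INCOME_ORDER.index(level)


incomes = ["high", "low", "medium", "low"]
ad_categories = ["auto", "travel", "auto", "fashion"]  # not ordinal

# Ordinal values can be sorted and compared through their rank.
sorted_incomes = sorted(incomes, key=income_rank)

# Non-ordinal values only support equality tests and counting.
category_counts = Counter(ad_categories)

print(sorted_incomes)
print(category_counts.most_common(1))
```

Sorting the income levels is meaningful because they have a logical order, whereas the ad categories can only be compared for equality or counted.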

Categorical features can also take on numerical values. For example, 1 to 12 can represent the months of the year, and 1 and 0 can indicate male and female. Still, these values carry no mathematical meaning.
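To see why such codes carry no mathematical meaning, consider the rough sketch below. The month observations are made up for illustration, and the indicator-vector helper one_hot_month is a hypothetical name for one common way to represent such codes, not something taken from the text.

```python
from statistics import mean

# Hypothetical observations: the month (1 to 12) in which each ad was shown.
months = [12, 1, 12, 1]

# Treated as a number, the "average month" falls between June and July,
# even though every observation is in December or January.
naive_average = mean(months)


def one_hot_month(month):
    """Return a 12-element indicator vector for a month code in 1..12."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return [1 if month == m else 0 for m in range(1, 13)]


december = one_hot_month(12)

print(naive_average)
print(december)
```

The indicator vector treats each month as its own group, so no arithmetic relationship between the codes is implied.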

Of the two classification algorithms that we learned about previously, naive Bayes and SVM, the naive Bayes classifier works for both numerical and categorical features, as the likelihoods are calculated in the same way, while SVM requires features to be numerical in order to compute margins.
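As an illustration of how naive Bayes can handle categorical values directly, the sketch below estimates likelihoods of the form P(feature = value | class) by simple counting. The samples and the function name fit_likelihoods are hypothetical, and no smoothing is applied, so this is a simplified illustration rather than a complete classifier.

```python
from collections import Counter, defaultdict

# Hypothetical toy samples: (gender, interested_in_sports) and a click label.
samples = [
    (("male", "yes"), 1),
    (("male", "no"), 0),
    (("female", "yes"), 1),
    (("female", "no"), 0),
    (("male", "yes"), 0),
]


def fit_likelihoods(data, n_features):
    """Estimate P(feature_i = value | class) by relative frequency."""
    class_counts = Counter(label for _, label in data)
    # Maps (feature index, class) to a count of each observed value.
    value_counts = defaultdict(Counter)
    for features, label in data:
        for i in range(n_features):
            value_counts[(i, label)][features[i]] += 1
    likelihoods = {
        key: {v: c / class_counts[key[1]] for v, c in counts.items()}
        for key, counts in value_counts.items()
    }
    return class_counts, likelihoods


class_counts, likelihoods = fit_likelihoods(samples, n_features=2)

# P(interested_in_sports = "yes" | click = 1)
p_sports_given_click = likelihoods[(1, 1)]["yes"]
# P(gender = "male" | click = 0)
p_male_given_no_click = likelihoods[(0, 0)]["male"]

print(p_sports_given_click, p_male_given_no_click)
```

The counting never relies on any ordering or arithmetic over the category values, which is why no numerical conversion is needed here.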

Now, if we think of predicting click or no click with naive Bayes and try to explain the model to our advertiser clients, they will find it hard to understand the prior, the likelihoods of individual attributes, and their multiplication. Is there a classifier that is easy to interpret and explain to clients, and that can also handle categorical data?

Decision tree!
