The field of Artificial Intelligence (AI) is not new. Thirty years ago, when we studied AI, there was very little understanding, robotics aside, of the future the field held. In the last decade, however, interest in artificial intelligence and machine learning has grown considerably. In the broadest sense, these fields aim to 'discover and learn something useful' about the environment. The gathered information leads to the discovery of new algorithms, which in turn raises the question: how do we process high-dimensional data and deal with uncertainty?
Machine learning aims to generate classifying expressions that are simple enough for humans to follow. They must mimic human reasoning sufficiently to provide insight into the decision process. As with statistical approaches, background knowledge may be exploited in the development phase. Statistical learning plays a key role in many areas of science, and the science of learning plays a key role in statistics, data mining, and artificial intelligence, which intersect with engineering and other disciplines.
The main difference between statistical learning and machine learning is one of emphasis: statistics emphasizes inference, whereas machine learning emphasizes prediction. When one applies statistics, the general approach is to infer the process by which the data was generated. In machine learning, one wants to know how to predict the future characteristics of the data with respect to some variable. The two fields overlap considerably, and experts on each side often argue for one view over the other. Let's leave this debate to the experts and select a few areas to discuss in this chapter. The following chapter contains more elaborate examples of machine learning.
Machine learning algorithms are broadly categorized as supervised learning, unsupervised learning, reinforcement learning, and deep learning. In supervised learning, the training data is labeled, and the labels act like a teacher supervising the learner. Unsupervised learning has no labeled training data, whereas supervised learning has completely labeled training data. Semi-supervised learning falls between the two: it makes use of unlabeled data for training in addition to labeled data.
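The contrast between supervised and unsupervised learning can be sketched in a few lines. This is a minimal illustration rather than a recommended workflow: the synthetic two-blob dataset, the choice of scikit-learn's `KNeighborsClassifier` and `KMeans`, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two well-separated 2-D blobs (synthetic data, assumed for illustration)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised model

# Supervised: the classifier is given both the points and their labels.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
supervised_pred = clf.predict([[0.2, -0.1], [4.8, 5.3]])

# Unsupervised: the clustering model is given only the points.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_

print("predicted labels:", supervised_pred)
print("cluster ids of the first and last point:", cluster_ids[0], cluster_ids[-1])
```

The cluster ids found by the unsupervised model are arbitrary integers. They group the points but carry no meaning until someone attaches one, which is exactly what the labels provide in the supervised case.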
As the context of this book is data visualization, we will only discuss a few algorithms in the following sections.
The first machine learning algorithm that we will look at is k-nearest neighbors (k-NN). k-NN does not build a model from the training data. Instead, it compares a new, unlabeled piece of data to every piece of existing data, takes the k most similar pieces (the nearest neighbors) from the known dataset, and looks at their labels (k is an integer and is usually less than 20). The following code finds and plots the k nearest neighbors of a query point:
```python
from numpy import random, argsort, sqrt
import matplotlib.pyplot as plt


def knn_search(x, data, K):
    """Return the indexes of the K nearest neighbors of x in data."""
    ndata = data.shape[1]
    K = K if K < ndata else ndata
    # Euclidean distances from the query point to every other point;
    # x has shape (2, 1) and broadcasts against the columns of data
    sqd = sqrt(((data - x) ** 2).sum(axis=0))
    idx = argsort(sqd)  # sort by increasing distance
    # return the indexes of the K nearest neighbors
    return idx[:K]


data = random.rand(2, 200)  # random dataset, one point per column
x = random.rand(2, 1)       # query point
neig_idx = knn_search(x, data, 10)

plt.figure(figsize=(12, 12))
# plotting the data and the input point
plt.plot(data[0, :], data[1, :], 'o', x[0, 0], x[1, 0], 'o',
         color='#9a88a1', markersize=20)
# highlighting the neighbors
plt.plot(data[0, neig_idx], data[1, neig_idx], 'o',
         markerfacecolor='#BBE4B4', markersize=22, markeredgewidth=1)
plt.show()
```
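To summarize the approach to k-nearest neighbors: compute the distance from the unlabeled point to every labeled point, keep the k closest points, and assign the label that is most common among them.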
To test a classifier, start with data whose labels are known, hide the labels from the classifier, and ask it for its best guess. Comparing the guesses with the hidden labels tells you how often the classifier is wrong.
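The testing idea can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used elsewhere in this book: the two-class synthetic data, the 25% hold-out split, the majority-vote rule, and the helper name `knn_classify` are all hypothetical choices. The data layout (one point per column) matches the `knn_search` listing above.

```python
from collections import Counter

import numpy as np


def knn_classify(x, data, labels, k):
    """Predict the label of x by majority vote among its k nearest neighbors.

    data has shape (n_features, n_points); labels has shape (n_points,).
    """
    dists = np.sqrt(((data - x.reshape(-1, 1)) ** 2).sum(axis=0))
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[nearest].tolist())
    return votes.most_common(1)[0][0]


rng = np.random.default_rng(0)

# Synthetic two-class data (an assumption made for illustration)
class0 = rng.normal(0.0, 0.5, size=(2, 60))
class1 = rng.normal(3.0, 0.5, size=(2, 60))
data = np.hstack([class0, class1])
labels = np.array([0] * 60 + [1] * 60)

# Hide the labels of a random 25% of the points from the classifier
perm = rng.permutation(data.shape[1])
test_idx, train_idx = perm[:30], perm[30:]

predictions = np.array([
    knn_classify(data[:, i], data[:, train_idx], labels[train_idx], k=5)
    for i in test_idx
])

# Fraction of held-out points whose guessed label differs from the true one
error_rate = float(np.mean(predictions != labels[test_idx]))
print(f"error rate on held-out points: {error_rate:.2f}")
```

Because the held-out points never take part in the vote, the error rate is an estimate of how the classifier behaves on data it has not seen.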
Regression is a statistical process for estimating the relationships among variables. More specifically, regression helps you understand how the typical value of the dependent variable changes when any one of the independent variables is varied while the others are held fixed.
Linear regression is the oldest type of regression. It can be used for interpolation, but it is often a poor fit for predictive analytics because the ordinary least-squares estimate is sensitive to outliers and to correlations among the input variables.
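The sensitivity to outliers is easy to see on a toy problem. The following sketch fits a straight line with NumPy's `polyfit` before and after corrupting a single observation. The data (ten points on y = 2x + 1) and the size of the corruption are assumptions chosen purely for illustration.

```python
import numpy as np

# Ten points lying exactly on y = 2x + 1 (synthetic data for illustration)
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Least-squares line through the clean data
slope_clean, intercept_clean = np.polyfit(x, y, deg=1)

# Corrupt a single observation and fit again
y_outlier = y.copy()
y_outlier[-1] += 50.0
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, deg=1)

print(f"clean fit:    slope={slope_clean:.2f}, intercept={intercept_clean:.2f}")
print(f"with outlier: slope={slope_outlier:.2f}, intercept={intercept_outlier:.2f}")
```

One bad point out of ten is enough to pull the fitted slope well away from the true value of 2, because least squares penalizes the squared size of each residual.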
Bayesian regression is a kind of penalized estimator and is more flexible and stable than traditional linear regression. It assumes that you have some prior knowledge about the regression coefficients, and the analysis is carried out within the framework of Bayesian inference.
We will discuss a set of methods in which the target value (y) is expected to be a linear combination of some input variables (x1, x2, …, xn), that is, a weighted sum of the inputs plus a constant term.
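In symbols, such a linear combination is usually written as shown here. The coefficient names w_0, …, w_n are an assumption, chosen to agree with the weight vector w used later in this section; the original notation is not reproduced in the text.

```latex
\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
```

Here w_0 is the constant (intercept) term and w_1, …, w_n are the weights attached to the individual inputs.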
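Now, let's take a look at the Bayesian linear regression model. A logical question one may ask is "why Bayesian?" As noted earlier, the Bayesian approach is more flexible and stable than traditional linear regression, and it lets you incorporate prior knowledge about the regression coefficients.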
First, let's take a look at a graphical model for linear regression. In this model, we are given data values D = ((x1, y1), (x2, y2), …, (xn, yn)), and our goal is to model this data with a function that maps each xi to its yi through a weight vector, up to normally distributed noise.
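One common way to write this model is given here as a reconstruction under stated assumptions, not as the original equation. The noise term ε_i and its variance σ² are assumed symbols, and the intercept is taken to be absorbed into w.

```latex
y_i = w^{\top} x_i + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
\quad\Longrightarrow\quad
Y_i \mid x_i, w \sim \mathcal{N}\!\left(w^{\top} x_i,\ \sigma^2\right), \qquad i = 1, \dots, n
```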
Here, w is a weight vector and each Yi is a normally distributed random variable. By conditioning each random variable on its observed value, Yi = yi, from the data, we can predict the corresponding y for a new input x, as shown in the following code:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from sklearn.linear_model import BayesianRidge

np.random.seed(0)
n_samples, n_features = 200, 200
X = np.random.randn(n_samples, n_features)  # Gaussian data

# Create weights with a precision of 4.
theta = 4.
w = np.zeros(n_features)

# Only keep 8 weights of interest
relevant_features = np.random.randint(0, n_features, 8)
for i in relevant_features:
    w[i] = stats.norm.rvs(loc=0, scale=1. / np.sqrt(theta))

# Gaussian noise with a precision of 50, added to the linear target
alpha_ = 50.
noise = stats.norm.rvs(loc=0, scale=1. / np.sqrt(alpha_), size=n_samples)
y = np.dot(X, w) + noise

# Fit the Bayesian Ridge Regression
clf = BayesianRidge(compute_score=True)
clf.fit(X, y)

# Plot the estimated weights against the true weights
plt.figure(figsize=(11, 10))
plt.title("Weights of the model", fontsize=18)
plt.plot(clf.coef_, 'b-', label="Bayesian Ridge estimate")
plt.plot(w, 'g-', label="Ground truth")
plt.xlabel("Features", fontsize=16)
plt.ylabel("Values of the weights", fontsize=16)
plt.legend(loc="best", prop=dict(size=12))

# Plot the histogram of the estimated weights
plt.figure(figsize=(11, 10))
plt.title("Histogram of the weights", fontsize=18)
plt.hist(clf.coef_, bins=n_features, log=True)
plt.plot(clf.coef_[relevant_features], 5 * np.ones(len(relevant_features)),
         'ro', label="Relevant features")
plt.ylabel("Features", fontsize=16)
plt.xlabel("Values of the weights", fontsize=16)
plt.legend(loc="lower left")
plt.show()
```
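The listing above fits the model and plots its weights, but it does not make a prediction for a new input. As a minimal sketch of that last step, the following example fits `BayesianRidge` on a small synthetic problem and asks for the predictive mean and standard deviation at two new points through `predict(..., return_std=True)`. The two-feature dataset, the true weights (3, -2), and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(0)

# Small synthetic problem (assumed for illustration): y = 3*x0 - 2*x1 + noise
X = rng.normal(size=(100, 2))
true_w = np.array([3.0, -2.0])
y = X @ true_w + rng.normal(scale=0.1, size=100)

model = BayesianRidge().fit(X, y)

# Predict at new inputs; return_std=True also returns the standard
# deviation of the predictive distribution at each query point
X_new = np.array([[1.0, 0.0], [0.0, 1.0]])
y_mean, y_std = model.predict(X_new, return_std=True)

for xi, m, s in zip(X_new, y_mean, y_std):
    print(f"x={xi}, predicted y={m:.2f} +/- {s:.2f}")
```

The standard deviation is what distinguishes this from an ordinary least-squares prediction: each predicted value comes with an estimate of how uncertain the model is about it.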