Many of the machine learning models we will discuss start with classification. Classification is a form of supervised learning in which labeled data is used to train a model that assigns a name, value, or category to new inputs. For example, a neural network can scan images to find pictures of a shoe. In this field, there are two variants of classification:
- Binomial (also called binary): If you are picking between exactly two categories (coffee or tea)
- Multi-class: If there are more than two options
We use the Stanford linear classifier tool to help understand the concept of hyperplanes (http://vision.stanford.edu/teaching/cs231n-demos/linear-classify/). The following diagram shows a learning system's attempts, during training, to find the best hyperplanes to divide colored balls. After several thousand iterations, the division is close to optimal, but there is still an issue in the top-right region, where the region assigned to one class includes a ball that belongs to the class at the top. Shown below is an example of a less than optimal classification.
Here, hyperplanes are used to create artificial segments. The top right shows a single ball that should be classified with the other two balls at the top but was classified to belong to the bottom-right set.
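To make the idea of dividing a space with hyperplanes more concrete, here is a minimal sketch of how a multi-class linear classifier can assign a point to a class. It assumes each class has its own weight vector and bias, and that the class with the highest score wins. The class names, weights, and biases are hypothetical values chosen for illustration; they are not taken from the Stanford tool.

```python
# Hypothetical weights and biases for three classes in a 2D feature space.
# Each class c scores a point x as w_c . x + b_c; the highest score wins.
CLASSES = {
    "red": ((1.0, 0.0), 0.0),
    "green": ((-1.0, 0.0), 0.0),
    "blue": ((0.0, 1.0), -1.0),
}


def scores(point):
    """Return the linear score of every class for a 2D point."""
    x0, x1 = point
    return {
        name: w[0] * x0 + w[1] * x1 + b
        for name, (w, b) in CLASSES.items()
    }


def classify(point):
    """Pick the class whose linear score is largest."""
    s = scores(point)
    return max(s, key=s.get)


if __name__ == "__main__":
    for p in [(2.0, 0.0), (-2.0, 0.0), (0.0, 3.0)]:
        print(p, classify(p))
```

The boundary between any two classes is the set of points where their scores are equal, which in two dimensions is a straight line. Training amounts to adjusting the weights and biases so that these lines separate the labeled points as well as possible.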
Notice in the preceding example from Stanford that the hyperplane is a straight line. This is called a linear classifier. Examples include support vector machines (which attempt to maximize the margin between the hyperplane and the nearest points of each class) and logistic regression (which can be used for binomial and multi-class fitting). The following graph shows a binomial linear classification of two data sets: circles and diamonds.
Here, a line forms a hyperplane that attempts to divide two distinct regions of variables. Notice that even the best linear boundary still misclassifies some points.
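As a rough illustration of a binomial linear classifier that cannot avoid errors, the following sketch fits a logistic regression with plain gradient descent. The data points, learning rate, and epoch count are all hypothetical and chosen only for this example. One diamond is deliberately placed inside the cluster of circles, so no straight line can separate the two sets perfectly.

```python
import math

# Hypothetical toy data: (x0, x1) points labeled 0 (circle) or 1 (diamond).
# The diamond at (1.5, 1.5) sits inside the circle cluster, so a straight
# line cannot classify every point correctly.
POINTS = [
    ((1.0, 1.0), 0), ((1.0, 2.0), 0), ((2.0, 1.0), 0),
    ((2.0, 2.0), 0), ((1.5, 0.5), 0),
    ((4.0, 4.0), 1), ((4.0, 5.0), 1), ((5.0, 4.0), 1),
    ((5.0, 5.0), 1), ((1.5, 1.5), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(points, learning_rate=0.1, epochs=5000):
    """Fit weights (w0, w1) and bias b by full-batch gradient descent."""
    w0 = w1 = b = 0.0
    n = len(points)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in points:
            # Gradient of the logistic loss for one point.
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= learning_rate * g0 / n
        w1 -= learning_rate * g1 / n
        b -= learning_rate * gb / n
    return w0, w1, b


def predict(params, point):
    """Classify by which side of the line w0*x0 + w1*x1 + b = 0 the point is on."""
    w0, w1, b = params
    return 1 if w0 * point[0] + w1 * point[1] + b > 0 else 0


def accuracy(params, points):
    return sum(predict(params, x) == y for x, y in points) / len(points)


params = train(POINTS)
print("weights and bias:", params)
print("training accuracy:", accuracy(params, POINTS))
```

The fitted line is the set of points where the predicted probability is exactly 0.5. Because the classes overlap, the accuracy stays below 100% however long the model trains.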
Nonlinear relationships are also common in machine learning, and using a linear model in those cases would cause high error rates. The following graph shows the linear curve fit versus the nonlinear. One issue with the nonlinear model is the tendency to overfit the training set. As we will see later, overfitting tends to make the machine learning tool accurate when executed on the training data, but useless in the field. The following figure is a comparison of a linear versus nonlinear classifier:
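The overfitting risk can be sketched with a small experiment. The example below is an illustration under stated assumptions, not a recommended method. It generates hypothetical noisy 2D data whose true boundary is a straight line, then compares a fixed linear rule (assumed known here rather than learned) with a 1-nearest-neighbor classifier, which stands in for a very flexible nonlinear model because it memorizes every training point. The noise level, sample sizes, and random seed are arbitrary choices.

```python
import random


def make_data(n, rng, noise=0.2):
    """Points in [-1, 1]^2. True class is 1 when x0 + x1 > 0; some labels are flipped."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1 if x[0] + x[1] > 0 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise
        data.append((x, y))
    return data


def linear_predict(x):
    # A fixed straight-line boundary (assumed known, not learned from data).
    return 1 if x[0] + x[1] > 0 else 0


def make_nearest_neighbor(train):
    """Return a predictor that copies the label of the closest training point."""
    def predict(x):
        nearest = min(
            train,
            key=lambda item: (item[0][0] - x[0]) ** 2 + (item[0][1] - x[1]) ** 2,
        )
        return nearest[1]
    return predict


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


rng = random.Random(0)
train_set = make_data(200, rng)
test_set = make_data(200, rng)
nn_predict = make_nearest_neighbor(train_set)

print("linear    train/test:", accuracy(linear_predict, train_set),
      accuracy(linear_predict, test_set))
print("nonlinear train/test:", accuracy(nn_predict, train_set),
      accuracy(nn_predict, test_set))
```

The nearest-neighbor model scores perfectly on the training set because every training point is its own nearest neighbor, noise included. On data it has not seen, it loses that advantage, while the linear rule performs about the same on both sets.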