Practical machine learning problems

What does machine learning really mean? At the very beginning of this chapter we saw several definitions of the term, along with what learning itself means. In practice, however, machine learning is defined by the problems it is used to solve. In this section, we first introduce the main classes of machine learning problems and then list some well-known, widely used examples of real-world machine learning problems. The typical classes are classification, clustering, rule extraction, and regression, each of which will be discussed.

In addition, we will discuss these problems in terms of the main taxonomy of standard machine learning problems. This is important because knowing the type of problem we might face lets us think about the data we need. It also matters because, without having seen some practical machine learning problems, you may find it hard to form ideas for your own machine learning applications. In other words, to understand the problem we first need to understand the data. The types of algorithm and their optimality are therefore discussed throughout this chapter; data manipulation, which lets us dig deeper into the problems, is covered in Chapter 3, Understanding the Problem by Understanding the Data.

Machine learning classes

The problem classes mentioned above are the standard ones for most of the problems we encounter when applying machine learning techniques in everyday work. However, knowing the ML classes alone is not enough; we also need to know what type of problem the machine is learning, since many problems are plain problem solving that does not help an ML model or agent learn anything at all.

When you think of a problem as a machine learning problem, you are, more technically, thinking of a decision problem that needs to be modeled from data. In other words, if you, as a data scientist or human expert, could answer a particular question given enough time with the available dataset, you can more or less apply a suitable machine learning technique to it. We can therefore assume that a problem solvable by ML algorithms has two main parts: the data itself, which points to specific observations of the problem, and a quantitative measure of the quality of a candidate solution. Once you have succeeded in identifying a problem as an ML problem, you can start to think about what types of problems you could formulate from it, what kind of outcome your client will ask for, and what requirements have to be satisfied. As stated in the previous section, the most frequently used machine learning classes are classification, clustering, regression, and rule extraction. We will now provide a short overview of each class.

Classification and clustering

If the experimental dataset is labeled, a class has already been assigned to each example. Examples are spam/non-spam in spam e-mail detection and fraud/non-fraud in credit card fraud identification. Classification is the task of learning from such labeled examples in order to assign a class to new ones. However, if the dataset on which the decision will be made or modeled is unlabeled, new labels need to be created manually or algorithmically. This can be difficult and can be thought of as a judgment problem. Moreover, characterizing the differences or resemblances between several groups may be computationally harder.
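As a minimal sketch of classification on labeled data, the following assigns a new e-mail to the class of its closest labeled example (a 1-nearest-neighbour rule). The two features (number of links and number of exclamation marks) and all the values are hypothetical and chosen only for illustration; they are not taken from a real dataset.

```python
from math import dist

# Hypothetical labeled training set:
# (number of links, number of exclamation marks) -> class label
training = [
    ((8.0, 5.0), "spam"),
    ((7.0, 6.0), "spam"),
    ((1.0, 0.0), "non-spam"),
    ((0.0, 1.0), "non-spam"),
]


def classify(features, labeled_examples):
    """Return the label of the closest labeled example (1-nearest neighbour)."""
    _, nearest_label = min(
        labeled_examples,
        key=lambda example: dist(example[0], features),
    )
    return nearest_label


# A new, unlabeled e-mail described by the same two hypothetical features
predicted = classify((6.0, 4.0), training)
print(predicted)
```

The labels do all the work here: without them, there would be no class to copy from the nearest example.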

Clustering, on the other hand, handles data that is unlabeled. Such data can still be divided into groups based on similarity and other measures of natural structure in the data. Organizing the pictures in a digital album by faces alone, without names, is one example: the human user still has to assign names to the groups manually. Labeling multiple image files manually raises the same computational complexity; in later chapters we will provide some examples of how Spark provides several APIs to solve these issues.
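To make the idea of grouping unlabeled data by similarity concrete, here is a small sketch of k-means, one common clustering technique, written in plain Python. The points and the starting centroids are hypothetical, and this is an illustration of the general idea rather than the Spark API covered in later chapters.

```python
from math import dist


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then update centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid)
        centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster
            else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled 2-D points forming two visible groups
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

The algorithm finds the two groups but cannot name them; as with the photo album example, giving each group a meaningful name is still left to a person.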

Rule extraction and regression

From a given dataset, propositional rules can be generated in if...then form, with an antecedent and a consequent, that define the behavior of a machine learning agent. This type of rule generation technique is commonly referred to as rule extraction. You might be wondering whether such rules exist; they do, but they are typically not directed. That is, the methods are used to discover statistically meaningful or statistically significant relationships between attributes in your data.

An example of rule extraction is mining association rules between items in business-oriented transactional databases. In non-technical terms, a practical example is discovering the association between purchases of beer and diapers, which illustrates the desire and opportunity of customers. However, situations may arise in which the extracted rules do not directly involve any prediction.
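The strength of an association rule such as "if diapers, then beer" is usually measured by its support (how often the items occur together) and its confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a handful of hypothetical transactions; the baskets are invented for illustration only.

```python
# Hypothetical market-basket transactions, one set of items per purchase
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"beer", "diapers", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the full rule divided by the support of its antecedent."""
    return support(antecedent | consequent, transactions) / support(
        antecedent, transactions
    )


# Rule under test: if diapers, then beer
rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, diapers and beer appear together in three of the five baskets, and about three quarters of the baskets containing diapers also contain beer. Nothing is being predicted; the rule simply describes a relationship found in the data.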

Now let's talk about regression, where the data is labeled with a real value, that is, a floating point value rather than a categorical label. The easiest example to understand is time series data, such as the price of a stock or currency that changes over time. For this type of data, the regression task is to predict values for new, unseen data using a regression modeling technique.
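As a rough illustration, the sketch below fits a straight line to a short price series with ordinary least squares and uses it to predict the next value. The prices are hypothetical and deliberately linear; real stock or currency prices are far noisier, and a simple line is only one of many possible regression models.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical closing prices for five consecutive days
days = [1, 2, 3, 4, 5]
prices = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(days, prices)

# Predict the real-valued price for a day not present in the data
predicted_day_6 = slope * 6 + intercept
print(predicted_day_6)
```

Unlike the classification case, the output here is a real number rather than one of a fixed set of classes.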
