Corresponding machine learning algorithms – linear and logistic regression

Notice that a criterion table tends to use nice, whole numbers that are easy to add. Obviously, this is so the criteria are convenient for physicians to use while seeing patients. What would happen if we could somehow determine the optimal point values for each factor, as well as the optimal threshold? Remarkably, the machine learning method called logistic regression does just this.
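As a hedged sketch of this idea, fitted logistic regression coefficients can be rescaled and rounded into the whole-number points of a criterion table. The factor names and coefficient values below are entirely hypothetical, for illustration only:

```python
# Hypothetical fitted logistic regression coefficients for three risk factors
# (these values are invented for illustration, not from any real model)
coefficients = {
    "age_over_65": 0.70,
    "hypertension": 0.68,
    "diabetes": 1.41,
}
intercept = -3.2  # in a real model, this would inform the decision threshold

# Scale so the smallest coefficient maps to 1 point, then round to whole numbers
smallest = min(coefficients.values())
points = {name: round(c / smallest) for name, c in coefficients.items()}
# A factor with roughly twice the coefficient earns roughly twice the points
```

The rounded points preserve the relative importance the model learned, while remaining easy for a physician to add at the bedside.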

Logistic regression is a popular statistical machine learning algorithm that is commonly used for binary classification tasks. It is a type of model known as a generalized linear model.

To understand logistic regression, we must first understand linear regression. In linear regression, the ith predicted output (ŷᵢ) is modeled as a weighted sum of the p individual predictor variables, xᵢ₁ through xᵢₚ:

ŷᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βₚxᵢₚ

The weights β (also known as coefficients) of the variables can be determined by the following equation (the normal equation), where X is the matrix of predictor values and y is the vector of observed outputs:

β = (XᵀX)⁻¹Xᵀy
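As a concrete illustration, the closed-form solution can be computed directly with a few matrix operations in NumPy; the data values here are made up for illustration:

```python
import numpy as np

# Toy dataset: 5 observations, 2 predictors (values are illustrative only)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.0, 10.1])

# Prepend a column of ones so the intercept (beta_0) is estimated as well
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X1.T @ X1) @ X1.T @ y

# Predictions are then a weighted sum of the predictors
y_hat = X1 @ beta
```

In practice, `np.linalg.lstsq` or a library routine is preferred over an explicit matrix inverse for numerical stability, but the direct form above mirrors the equation.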

Logistic regression is like linear regression, except that it applies a transformation (the logistic, or sigmoid, function) to the output variable that limits its range to be between 0 and 1. Therefore, it is well-suited to model probabilities of a positive response in classification tasks, since probabilities must also be between 0 and 1.
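The transformation in question is the logistic (sigmoid) function; a minimal NumPy sketch shows how it squeezes any real-valued linear score into the (0, 1) range:

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# A linear weighted sum can take any real value...
scores = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])

# ...but after the transformation, each value is a valid probability
probs = sigmoid(scores)
```

Note that a score of 0 maps to a probability of exactly 0.5, which is why 0 is the natural decision boundary on the linear scale.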

Logistic regression has many practical advantages. First of all, it is an intuitively simple model that is easy to understand and explain. Understanding its mechanics requires little advanced mathematics beyond high-school statistics, and the model can easily be explained to both technical and nontechnical stakeholders on a project.

Second, logistic regression is not computationally intensive, in terms of either time or memory. The coefficients are simply a list of numbers as long as the list of predictors, and determining them involves only a few matrix multiplications (see the second of the preceding equations for an example). One caveat is that the matrices may be quite large when dealing with very large datasets (for example, billions of data points), but this is true of most machine learning models.

Third, logistic regression does not require much preprocessing (for example, centering or scaling) of the variables (although transformations that move predictors toward a normal distribution can increase performance). As long as the variables are in a numeric format, that is enough to get started with logistic regression.
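As a sketch of this point (assuming scikit-learn is available), numeric predictors on very different scales can be fed to a logistic regression without any centering or scaling; the data below are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data on very different scales (e.g., age in years, income in dollars);
# values are made up for illustration only
X = np.array([[25, 20000],
              [30, 25000],
              [35, 30000],
              [55, 80000],
              [60, 90000],
              [65, 95000]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

# No centering or scaling applied; the raw numeric columns are enough to start
model = LogisticRegression(solver="liblinear").fit(X, y)

# Predicted probabilities of the positive class, each in (0, 1)
probs = model.predict_proba(X)[:, 1]
```

Scaling can still help optimization and regularization behave better on harder problems, but it is not a prerequisite for getting started.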

Finally, logistic regression, especially when coupled with regularization techniques such as lasso regularization, can have reasonably strong performance in making predictions.
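For example, scikit-learn's `LogisticRegression` supports an L1 (lasso) penalty, which tends to shrink the coefficients of uninformative predictors toward exactly zero; the synthetic data below are for illustration only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Only the first two predictors actually influence the outcome;
# the remaining eight are pure noise
logits = 2.0 * X[:, 0] - 1.5 * X[:, 1]
y = (logits + rng.normal(scale=0.5, size=200) > 0).astype(int)

# L1 (lasso) regularization: C controls the penalty strength
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
# The fitted coefficients for the noise predictors shrink toward zero,
# while the two informative predictors keep large weights
```

This built-in variable selection is a major reason regularized logistic regression remains a strong baseline.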

However, in today’s era of fast and powerful computing, logistic regression has largely been superseded by other algorithms that are more powerful, and typically more accurate. This is because logistic regression makes many major assumptions about the data and the modeling task:

  • It assumes that every predictor has a linear relationship with the outcome variable (in logistic regression, with the log-odds of the outcome). This is often not the case in real datasets; in other words, logistic regression is not strong at modeling nonlinearities in the data.
  • It assumes that all of the predictors are independent of one another. Again, this is usually not the case: two or more variables may interact to affect the prediction in a way that is more than just the linear sum of their individual effects. This can be partially remedied by adding products of predictors as interaction terms to the model, but choosing which interactions to model is not an easy task.
  • It is highly and adversely sensitive to highly correlated (multicollinear) predictor variables. In the presence of such data, logistic regression may overfit. To overcome this, there are variable selection methods, such as forward stepwise logistic regression, backward stepwise logistic regression, and best subset logistic regression, but these algorithms are imprecise and/or time-intensive.
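The interaction-term remedy mentioned above can be sketched as follows (scikit-learn assumed; the data are synthetic and built so that the outcome depends on the product of two predictors):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(400, 2))

# The outcome depends on the *product* of the two predictors (an interaction),
# so no weighted sum of the individual predictors can separate the classes
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Linear terms only: performs near chance on this problem
plain = LogisticRegression().fit(X, y)

# Add the product x1 * x2 as an explicit interaction term
X_int = np.hstack([X, (X[:, 0] * X[:, 1])[:, None]])
with_int = LogisticRegression().fit(X_int, y)
# With the interaction term, the classes become (almost) linearly separable
```

The difficulty, as noted above, is that with many predictors the number of candidate interaction terms grows quadratically, and choosing which to include is itself a hard problem.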

Finally, unlike some classifiers (for example, Naive Bayes), logistic regression is not robust to missing data.
