In the previous chapter, we learned about linear regression, one of the most basic models, which assumes a linear relationship between a predictor variable and an output variable.
In this chapter, we will discuss logistic regression in detail, including how to implement it using the statsmodels.api and scikit-learn modules.
One thing to note about the linear regression model is that the output variable is always continuous. In other words, linear regression is a good choice when one needs to predict continuous numbers. However, what if the output variable is discrete? What if we want to classify our records into two or more categories? Can we still extend the assumption of a linear relationship and use it to classify the records?
As it happens, there is a separate regression model for situations where the output variable is binary or categorical rather than continuous. This model is called logistic regression. The two regressions are similar in that both assume a linear relationship between the predictor and output variables. However, as we will see soon, in logistic regression the output variable must first undergo a transformation.
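The idea of transforming the output can be previewed with a minimal sketch. It assumes the transformation is the standard logistic (sigmoid) function; the intercept, slope, and 0.5 cutoff below are hypothetical values chosen only for illustration, not fitted from any data. The linear score can be any real number, whereas the transformed value always lies between 0 and 1 and can be read as the probability of belonging to one of two classes.

```python
import math


def linear_score(x, intercept=-4.0, slope=2.0):
    """Linear combination of one predictor (hypothetical coefficients)."""
    return intercept + slope * x


def logistic(z):
    """Standard logistic (sigmoid) function: squeezes a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def classify(x, threshold=0.5):
    """Convert the probability into a binary label using an assumed 0.5 cutoff."""
    return 1 if logistic(linear_score(x)) >= threshold else 0


# Compare the unbounded linear score with the bounded, transformed output.
for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    z = linear_score(x)
    p = logistic(z)
    print(f"x={x:.1f}  linear score={z:+.1f}  probability={p:.3f}  class={classify(x)}")
```

With these illustrative coefficients, the linear score runs from -4 to +4 over the inputs, while the probability stays strictly inside the interval from 0 to 1, which is what makes a binary decision possible.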
A few scenarios where logistic regression can be applied are as follows:
Note how the output variable in both cases is binary or categorical.
The following table contains a comparison of the two models:
| | Linear regression | Logistic regression |
|---|---|---|
| Predictor variables | Continuous numeric/categorical | Continuous numeric/categorical |
| Output variables | Continuous numeric | Categorical |
| Relationship | Linear | Linear (with some transformations) |
Before we delve into implementing and assessing the model, it is critically important to understand the mathematics that forms the foundation of the algorithm. Let us look at the mathematical concepts that make up the backbone of the logistic regression model.