This section briefly introduces the optimization algorithms that can be applied to minimize the loss function, with or without a penalty term. These algorithms are described in more detail in the Summary of optimization techniques section in Appendix A, Basic Concepts.
First, let's define the least squares problem. Minimizing the loss function consists of setting its first-order derivatives to zero, which in turn generates a system of D equations (also known as the gradient equations), D being the number of regression weights (parameters). The weights are computed iteratively by solving this system of equations with a numerical optimization algorithm.
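As a minimal sketch of this idea, the following Python snippet fits a logistic model with D = 2 weights to a small, made-up data set and checks that the D partial derivatives of the loss approach zero. Plain gradient descent stands in for the numerical optimization algorithm because it is the simplest one to write down. The function names, the toy data, the learning rate, and the iteration count are all assumptions chosen for illustration.

```python
import math

def model(x, w):
    """Logistic model f(x | w) with D = 2 weights: intercept w[0] and slope w[1]."""
    return 1.0 / (1.0 + math.exp(-(w[0] + w[1] * x)))

def loss(w, xs, ys):
    """Sum of squared residuals, with residual = expected value - model output."""
    return sum((y - model(x, w)) ** 2 for x, y in zip(xs, ys))

def gradient(w, xs, ys):
    """The D partial derivatives of the loss; the gradient equations set each to zero."""
    g = [0.0, 0.0]
    for x, y in zip(xs, ys):
        p = model(x, w)
        r = y - p                  # residual for this observation
        dp = p * (1.0 - p)         # derivative of the sigmoid w.r.t. its argument
        g[0] += -2.0 * r * dp      # contribution to dL/dw0
        g[1] += -2.0 * r * dp * x  # contribution to dL/dw1
    return g

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def fit(xs, ys, rate=0.1, iterations=20000):
    """Gradient descent (an assumed, deliberately simple optimizer)."""
    w = [0.0, 0.0]
    for _ in range(iterations):
        g = gradient(w, xs, ys)
        w = [wj - rate * gj for wj, gj in zip(w, g)]
    return w

# Hypothetical toy data: the two classes overlap, so the weights stay finite.
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ys = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]

w_start = [0.0, 0.0]
w_fit = fit(xs, ys)

print("loss at start:", loss(w_start, xs, ys))
print("loss after fit:", loss(w_fit, xs, ys))
print("gradient norm at start:", norm(gradient(w_start, xs, ys)))
print("gradient norm after fit:", norm(gradient(w_fit, xs, ys)))
```

A gradient norm close to zero after fitting indicates that the fitted weights approximately satisfy the D gradient equations. Production solvers typically rely on more sophisticated schemes than fixed-step gradient descent.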
M10: The least squares-based loss function for residuals r_i, weights w, a model f, input data x_i, and expected values y_i is defined as follows:
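Written out with the symbols named above, one common form of this definition is the following. The observation count n and the absence of a 1/2 scaling factor are assumptions.

```latex
r_i = y_i - f(x_i \mid w), \qquad
L(w) = \sum_{i=1}^{n} r_i^{2}
```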
M10: The gradient equations obtained by minimizing the loss function L, expressed with the Jacobian matrix J (refer to the Mathematics section in Appendix A, Basic Concepts), are defined as follows:
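A sketch of these equations, assuming residuals r_i = y_i - f(x_i | w), a loss L equal to the sum of the squared residuals over n observations, and Jacobian entries J_ij defined as the partial derivative of f(x_i | w) with respect to the weight w_j:

```latex
\frac{\partial L}{\partial w_j}
  = 2 \sum_{i=1}^{n} r_i \, \frac{\partial r_i}{\partial w_j}
  = -2 \sum_{i=1}^{n} r_i \, J_{ij} = 0,
\qquad j = 1, \dots, D,
\qquad J_{ij} = \frac{\partial f(x_i \mid w)}{\partial w_j}
```

In matrix form, these D equations read J^T r = 0, with r the vector of residuals.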
M11: The iterative approximation of the weights w, using a Taylor series expansion of the model f around the weights at iteration k, is defined as follows:
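Assuming a first-order expansion around the weights w^(k) at iteration k, with the Jacobian entries J_ij (the partial derivatives of f(x_i | w) with respect to w_j) evaluated at w^(k), the approximation can be sketched as:

```latex
f(x_i \mid w) \approx f(x_i \mid w^{(k)})
  + \sum_{j=1}^{D} J_{ij} \left( w_j - w_j^{(k)} \right),
\qquad
w^{(k+1)} = w^{(k)} + \Delta w^{(k)}
```

Here Δw^(k) is a label introduced for illustration: it denotes the step obtained by substituting the linearized model into the gradient equations and solving the resulting linear system.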
Logistic regression is a nonlinear model. Fitting it therefore requires a nonlinear minimization of the sum of squared residuals. The optimization algorithms for nonlinear least squares problems can be divided into two categories: