Logistic regression

Logistic regression optimizes the logit loss function with respect to w:

\min_{w} \; L(w) = \min_{w} \sum_{i=1}^{n} \log\left(1 + e^{-y_i\, w^{T} x_i}\right)

Here, y is binary (in this case, plus or minus one). While there is no closed-form solution to this minimization problem, as there was in the previous case of linear regression, the logistic loss is differentiable and admits iterative algorithms that converge very quickly.
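To make the objective concrete, here is a minimal sketch, assuming labels are encoded as +1/-1; logisticLoss is an illustrative helper of ours, not an MLlib function:

// A minimal sketch of evaluating the logistic loss over an RDD of LabeledPoint,
// assuming +1/-1 labels; logisticLoss is illustrative, not part of MLlib.
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

def logisticLoss(data: RDD[LabeledPoint], w: Array[Double]): Double =
  data.map { p =>
    // margin = y * (w . x)
    val margin = p.label * p.features.toArray.zip(w).map { case (x, wi) => x * wi }.sum
    math.log(1.0 + math.exp(-margin)) // log(1 + exp(-y * w.x))
  }.sum()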

The gradient is as follows:

\nabla_{w} L = -\sum_{i=1}^{n} \frac{y_i\, x_i}{1 + e^{y_i\, w^{T} x_i}}

Again, we can quickly concoct a Scala program that uses the gradient to converge to the value of w at which the gradient vanishes, \nabla_{w} L = 0 (we use the MLlib LabeledPoint data structure only for the convenience of reading the data):

$ bin/spark-shell 
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1-SNAPSHOT
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
SQL context available as sqlContext.

scala> import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.Vector

scala> import org.apache.spark.util._
import org.apache.spark.util._

scala> import org.apache.spark.mllib.util._
import org.apache.spark.mllib.util._

scala> val data = MLUtils.loadLibSVMFile(sc, "data/iris/iris-libsvm.txt")
data: org.apache.spark.rdd.RDD[org.apache.spark.mllib.regression.LabeledPoint] = MapPartitionsRDD[291] at map at MLUtils.scala:112

scala> var w = Vector.random(4)
w: org.apache.spark.util.Vector = (0.9515155226069267, 0.4901713461728122, 0.4308861351586426, 0.8030814804136821)

scala> for (i <- 1.to(10)) println { val gradient = data.map(p => ( - p.label / (1+scala.math.exp(p.label*(Vector(p.features.toDense.values) dot w))) * Vector(p.features.toDense.values) )).reduce(_+_); w -= 0.1 * gradient; w }
(-24.056553839570114, -16.585585503253142, -6.881629923278653, -0.4154730884796032)
(38.56344616042987, 12.134414496746864, 42.178370076721365, 16.344526911520397)
(13.533446160429868, -4.95558550325314, 34.858370076721364, 15.124526911520398)
(-11.496553839570133, -22.045585503253143, 27.538370076721364, 13.9045269115204)
(-4.002010810020908, -18.501520148476196, 32.506256310962314, 15.455945245916512)
(-4.002011353029471, -18.501520429824225, 32.50625615219947, 15.455945209971787)
(-4.002011896036225, -18.501520711171313, 32.50625599343715, 15.455945174027184)
(-4.002012439041171, -18.501520992517463, 32.506255834675365, 15.455945138082699)
(-4.002012982044308, -18.50152127386267, 32.50625567591411, 15.455945102138333)
(-4.002013525045636, -18.501521555206942, 32.506255517153384, 15.455945066194088)

scala> w *= 0.24 / 4
w: org.apache.spark.util.Vector = (-0.24012081150273815, -1.1100912933124165, 1.950375331029203, 0.9273567039716453)

Logistic regression was reduced to a single line of Scala code! The last line normalizes the weights (only the relative values matter for defining the separating plane) so that they can be compared with those obtained with MLlib in the previous chapter.
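As a side note, a more conventional way to rescale is to divide by the L2 norm; the predicted side of the plane, sign(w . x), is unchanged by any positive scaling of w. A minimal sketch using the same (now deprecated) org.apache.spark.util.Vector, with an illustrative helper name:

import org.apache.spark.util.Vector

// normalize is an illustrative helper, not a Spark API; it rescales w to unit
// L2 length, which leaves sign(w dot x), and hence the separating plane, intact.
def normalize(w: Vector): Vector = w * (1.0 / math.sqrt(w dot w))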

The Stochastic Gradient Descent (SGD) algorithm used in the actual implementation is essentially the same gradient descent, but optimized in the following ways:

  • The actual gradient is computed on a subsample of records, which may lead to faster convergence, since each iteration is cheaper, and the sampling noise helps to avoid local minima.
  • The step size, a fixed 0.1 in our case, is a monotonically decreasing function of the iteration number, for example \alpha_t = \alpha / \sqrt{t}, which may also lead to better convergence.
  • It incorporates regularization; instead of minimizing just the loss function, you minimize the sum of the loss function plus a penalty term that is a function of model complexity. I will discuss this in the following section. A sketch of how these options surface in MLlib's API follows this list.
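For comparison, here is a hedged sketch of how these three refinements can be set on MLlib's own SGD-based implementation (Spark 1.x API); the parameter values are illustrative only:

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.L1Updater

val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(100)      // iterative gradient descent, as above
  .setStepSize(0.1)           // initial step; decays with the iteration number
  .setMiniBatchFraction(0.5)  // gradient computed on a subsample of records
  .setRegParam(0.01)          // penalty on model complexity
  .setUpdater(new L1Updater)  // for example, an L1 penalty
// note: MLlib expects 0/1 labels here, not the +1/-1 encoding used above
val model = lr.run(data)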