Anomaly detection with one-class SVC

The design of the one-class SVC is an extension of the binary SVC. The main difference is that a single class contains most of the baseline (or normal) observations. A reference point, known as the SVC origin, replaces the second class. The outliers (or abnormal observations) reside beyond (or outside) the support vectors of the single class:

The visualization of the one-class SVC
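
The boundary pictured above can be expressed as a decision function: an observation is scored against the support vectors through the kernel, and the sign of the score (after subtracting an offset) gives the label. The following is a minimal sketch under stated assumptions, not the library's implementation; the Model case class, its field names, and the numeric values in the usage note are hypothetical and chosen only for illustration.

```scala
object OneClassDecision {
  type Obs = Array[Double]

  // RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)
  def rbf(gamma: Double)(x: Obs, y: Obs): Double = {
    val sqDist = x.zip(y).map { case (a, b) => (a - b) * (a - b) }.sum
    math.exp(-gamma * sqDist)
  }

  // Hypothetical container for a trained one-class model:
  // support vectors, their coefficients (alphas) and the offset rho.
  final case class Model(
    supportVectors: Seq[Obs],
    alphas: Seq[Double],
    rho: Double,
    gamma: Double
  )

  // Returns +1.0 for a baseline observation and -1.0 for an outlier,
  // using the sign of sum(alpha_i * K(sv_i, x)) - rho.
  def classify(model: Model, x: Obs): Double = {
    val score = model.supportVectors.zip(model.alphas)
      .map { case (sv, alpha) => alpha * rbf(model.gamma)(sv, x) }
      .sum - model.rho
    if (score >= 0.0) 1.0 else -1.0
  }
}
```

With two made-up support vectors at (0, 0) and (1, 1), coefficients of 0.5 each, an offset of 0.4, and a gamma of 0.5, the point (0.5, 0.5) is classified as +1 and the distant point (10, 10) as -1.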

The outlier observations are labeled -1, while the remaining training observations are labeled +1. To create a relevant test, we add five more companies that have drastically cut their dividends (ticker symbols WLT, RGS, MDC, NOK, and GM). The dataset includes the stock prices and financial metrics recorded prior to the cut in dividends.

The implementation of this test case is very similar to the binary SVC driver code, except for the following:

  • The classifier uses the Nu-SVM formulation, OneSVCFormulation
  • The labeled data is generated by assigning -1 to companies that have eliminated their dividends and +1 to all other companies

The test is executed against the resources/data/chap8/dividends2.csv dataset. First, we need to define the formulation for the one-class SVM:

class OneSVCFormulation(nu: Double) extends SVMFormulation {
  override def update(param: svm_parameter): Unit = {
    param.svm_type = svm_parameter.ONE_CLASS
    param.nu = nu
  }
}

The test code is similar to the execution code for the binary SVC. The only difference is the definition of the output labels: -1 for companies eliminating their dividends and +1 for all other companies:

val NU = 0.2
val GAMMA = 0.5
val EPS = 1e-3
val NFOLDS = 2

val extractor = relPriceChange :: debtToEquity ::
   dividendCoverage :: cashPerShareToPrice :: epsTrend ::
   dividendTrend :: List[Array[String] =>Double]()

val filter = (x: Double) => if(x == 0) -1.0 else 1.0  //43
val pfnSrc = DataSource(path, true, false, 1) |>
val config = SVMConfig(new OneSVCFormulation(NU),  //44
    new RbfKernel(GAMMA), SVMExecution(EPS, NFOLDS))

for {
  input <- pfnSrc(extractor)
  obs <- getObservations(input)
  svc <- SVM[Double](config, obs, 
             input.last.map(filter(_)).toVector)
} yield {
  show(s"${svc.toString}\naccuracy ${svc.accuracy.get}")
}

The labels (or expected values) are generated by applying a binary filter to the last field, dividendTrend (line 43). The formulation in the configuration has the OneSVCFormulation type (line 44).
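
The labeling step can be illustrated in isolation. The sketch below reuses the filter rule from the test code and applies it to a made-up dividendTrend column; the sample values are hypothetical and do not come from the dividends2.csv dataset.

```scala
object DividendLabels {
  // Same rule as the filter in the test code: a dividendTrend value of 0
  // yields the outlier label -1.0; any other value yields +1.0.
  val filter: Double => Double = x => if (x == 0.0) -1.0 else 1.0

  // Hypothetical dividendTrend column, one value per company.
  val dividendTrend: Vector[Double] = Vector(2.0, 0.0, 1.0, 3.0, 0.0)

  // Expected values passed to the classifier alongside the observations.
  val labels: Vector[Double] = dividendTrend.map(filter)
}
```

For this sample column, the second and fifth companies receive the label -1 and the others +1.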

The model is generated with an accuracy of 0.821. This level of accuracy should not be a surprise; the outliers (companies that eliminated their dividends) are added to the original dividend .csv file. These outliers differ significantly from the baseline observations (companies that have reduced, maintained, or increased their dividends) in the original input file.
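
To make the accuracy figure concrete, here is a minimal sketch that assumes accuracy means the fraction of observations whose predicted label matches the expected label. This is an assumption made for illustration; the library may compute the value differently, for example through cross-validation. The accuracy function and its arguments are hypothetical names.

```scala
object LabelAccuracy {
  // Fraction of positions where the predicted label equals the expected one.
  def accuracy(predicted: Seq[Double], expected: Seq[Double]): Double = {
    require(
      predicted.size == expected.size && expected.nonEmpty,
      "predicted and expected must be non-empty and of equal size"
    )
    val matches = predicted.zip(expected).count { case (p, e) => p == e }
    matches.toDouble / expected.size
  }
}
```

For example, three matching labels out of four observations give an accuracy of 0.75.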

When labeled observations are available, the one-class support vector machine is an excellent alternative to clustering techniques.

Note

The definition of an anomaly

The results generated by a one-class support vector classifier depend heavily on the subjective definition of an outlier. The test case assumes that the companies that eliminate their dividends have unique characteristics that set them apart, even from companies that have cut, maintained, or increased their dividends. There is no guarantee that this assumption is always valid.
