Step 3 – building the model

As mentioned previously, one problem with the proposed algorithm is the lack of information about the dataset: we do not know what the normal operation of the airplane looks like. For this reason, we implemented a simple method that flags a data point when it falls outside a range defined by the standard deviation.
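
As a point of reference, a standard-deviation rule of this kind can be sketched in a few lines. This is a minimal illustration, not the code from the earlier step: the function name `flag_outside_std`, the threshold `k`, and the sample readings are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_outside_std(values, k=2.0):
    """Return the indices of points farther than k standard deviations from the mean.

    Hypothetical helper: the name and the default k=2.0 are illustrative only.
    """
    mu = mean(values)
    sigma = stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) > k * sigma]

# Made-up readings with one obvious spike at index 5
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 25.0, 10.1, 10.0, 9.7, 10.3]
outliers = flag_outside_std(readings)
```

Because the mean and the standard deviation are computed over the whole series, a trend in the data inflates the standard deviation and can hide local anomalies.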

We can improve our algorithm using an unsupervised ML algorithm. A particularly interesting choice is the One-Class Support Vector Machine (OCSVM). The OCSVM learns a boundary around the bulk of the training data and then labels each point as good (inside the boundary) or bad (outside it). It requires a training dataset to build the boundary. We can use the full dataset for the training phase and then evaluate each point to see whether it falls into the good class. Before doing so, our dataset has to be normalized (detrended), because it exhibits a trend. To do this, we can fit an autoregressive integrated moving average (ARIMA) model and then analyze the residuals, that is, the differences between the observed values and the values the model predicts. This trick is a common way to remove trends or seasonality from a dataset.
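
To see why this helps, consider the differencing step that the "integrated" part of ARIMA performs when d=1. The sketch below uses a made-up ramp with a small wobble rather than the airplane data, and the helper name `first_difference` is hypothetical. It only illustrates trend removal; a full ARIMA fit also models the autoregressive and moving-average structure before the residuals are taken.

```python
def first_difference(series):
    """Return series[t] - series[t-1] for t = 1..n-1 (one element shorter)."""
    return [b - a for a, b in zip(series, series[1:])]

# Synthetic series: upward trend of 2 units per step plus a small wobble
wobble = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1]
trended = [2.0 * t + w for t, w in enumerate(wobble)]

diffed = first_difference(trended)

# The raw series spans a wide range because of the trend...
raw_range = max(trended) - min(trended)
# ...while the differenced series stays in a narrow band around the slope
diff_range = max(diffed) - min(diffed)
```

Once the trend is gone, the values sit in a narrow band, so a point that departs from that band is much easier to separate from normal behavior.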

The following code defines two functions: one to extract the residuals from the dataset and one to apply the OCSVM:

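import numpy as np
import pandas as pd
from sklearn import svm
# Legacy statsmodels API: since statsmodels 0.13 the class lives in
# statsmodels.tsa.arima.model and fit() no longer accepts disp
from statsmodels.tsa.arima_model import ARIMA
from matplotlib import pyplot

def ARIMA_residuals(series):
    # fit an ARIMA model with order (p=5, d=1, q=3); disp=0 silences the optimizer output
    model = ARIMA(series, order=(5,1,3))
    model_fit = model.fit(disp=0)
    print(model_fit.summary())
    # collect the residual errors and return them as a single row
    residuals = pd.DataFrame(model_fit.resid)
    return residuals.T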

def search_anomalies_OCSVM(y):
    # train on the full dataset, then score the same points
    X_train = y
    clf = svm.OneClassSVM(nu=0.005, kernel="rbf", gamma=0.01)
    clf.fit(X_train)
    anomalies = []
    X_test = y
    y_pred_test = clf.predict(X_test)  # +1 = inlier, -1 = outlier
    for i in range(len(y)):
        if y_pred_test[i] < 0:
            # store one [index, value] pair per parameter
            anomalies.append([[i, X_test[i][0]], [i, X_test[i][1]]])

    return {'anomalies': anomalies}

Now, we only need to bring these two functions together in the main code:

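# stack the two residual series into an (n_samples, 2) array
Y = np.vstack((ARIMA_residuals(Y_Param3_1), ARIMA_residuals(Y_Param1_4))).T
events_Param = search_anomalies_OCSVM(Y)
# split the anomalies into one list of [index, value] pairs per parameter
A_Param3_1 = [x[0] for x in events_Param['anomalies']]
A_Param1_4 = [x[1] for x in events_Param['anomalies']]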

The first statement of the preceding code computes the ARIMA residuals for both parameters and stacks them into a two-column array. We then apply the OCSVM to identify potential anomalies. The last two statements split the result into two separate lists, one per parameter, ready to be plotted.
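
The OCSVM step can also be tried in isolation on synthetic data, which may be useful for checking how `nu` and `gamma` behave before running it on the real residuals. In the sketch below, the two-column array stands in for the pair of residual series. The random data, the three injected outliers, and the parameter values `nu=0.05` and `gamma="scale"` are assumptions chosen for the illustration, not values from the analysis above.

```python
import numpy as np
from sklearn import svm

rng = np.random.default_rng(0)

# Stand-in for two residual series: 200 points scattered around zero
inliers = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
# Three artificial anomalies placed far from the cloud
injected = np.array([[8.0, 8.0], [-9.0, 7.0], [10.0, -10.0]])
X = np.vstack((inliers, injected))

# nu is (approximately) an upper bound on the fraction of training
# points that the model is allowed to leave outside the boundary
clf = svm.OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
clf.fit(X)

labels = clf.predict(X)                 # +1 = inlier, -1 = outlier
flagged = np.flatnonzero(labels == -1)  # row indices of suspected anomalies
```

Points labeled -1 lie outside the learned boundary; here the three injected rows (indices 200 to 202) should be among them. Raising `nu` increases the share of points the model may reject, so it is worth tuning against data where the expected anomaly rate is roughly known.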

The basic idea is to remove the global trend from the data points, converting the dataset into a cloud of points centered around zero, and then to identify the anomalies with an outlier detection algorithm such as OCSVM. The following graph shows the scatter plot of ARIMA's residuals and the outliers identified by OCSVM:

Scatter plot of ARIMA's residuals and outliers

In the time domain, the result is as follows:

Anomalies identified by ARIMA and OCSVM
