The first step is to build our simulated data. The following code generates the training dataset, which is made up of a two-column array, X_train, holding wind and power, and a one-column array, y_train.
y_train is the additional variable that labels each sample: it contains 1 when the relation between wind and power is regular and 0 when it is an anomaly:
import random

X_train = []
y_train = []
# regular data: power follows the turbine model, labeled 1
X_train.extend([[x, wind_turbine_model(x)] for x in range(0, 30)])
y_train.extend([1 for x in range(0, 30)])
# anomaly data: power is unrelated to the turbine model, labeled 0
for x in range(15, 30):
    X_train.extend([[x, 50 + x * random.random()]])
    y_train.extend([0])
To generate the regular data, we used the wind_turbine_model() function. To generate the anomaly data, we used the following formula, where random is a uniform random number between 0 and 1:
power=50 + random * wind
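The data-generation step described above can be sketched as a self-contained script. The wind_turbine_model() defined here is only a hypothetical stand-in power curve (the cut-in speed, rated speed, and rated power are assumptions, not values from the text), and the build_training_data() helper and its seed parameter are introduced purely for illustration:

```python
import random


def wind_turbine_model(wind):
    """Hypothetical stand-in power curve, NOT the model used in the text."""
    if wind < 3:       # assumed cut-in speed: no power below it
        return 0.0
    if wind >= 15:     # assumed rated speed: constant output above it
        return 1000.0
    # assumed cubic ramp between cut-in and rated speed
    return 1000.0 * ((wind - 3) / 12) ** 3


def build_training_data(seed=0):
    """Return (X_train, y_train) with 30 regular and 15 anomalous samples."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    X_train, y_train = [], []
    # regular data: power follows the model, label 1
    for wind in range(0, 30):
        X_train.append([wind, wind_turbine_model(wind)])
        y_train.append(1)
    # anomaly data: power = 50 + random * wind, label 0
    for wind in range(15, 30):
        X_train.append([wind, 50 + wind * rng.random()])
        y_train.append(0)
    return X_train, y_train


X_train, y_train = build_training_data()
print(len(X_train), sum(y_train))  # 45 samples, 30 of them regular
```

With any power curve that is well above 50 to 80 units at high wind speeds, the anomalous points fall far below the regular ones for the same wind, which is what the classifier is expected to pick up.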
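The contents of the local ./data folder are then uploaded to the default datastore of the workspace:
ds = ws.get_default_datastore()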
ds.upload(src_dir='./data', target_path='mydata', overwrite=True)
We are now ready to train our model, using the logistic regression classifier from scikit-learn:
logreg = LogisticRegression(C=1.0/args.reg, random_state=0, solver='lbfgs', multi_class='multinomial')
logreg.fit(X_train, y_train)
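The training call reads the regularization value from args.reg, whose definition is not shown in this section. A minimal sketch of one way it could be supplied, assuming the training script parses its command line with argparse; the flag name --regularization, its default of 0.01, and the parse_reg() helper are hypothetical:

```python
import argparse


def parse_reg(argv=None):
    """Parse a hypothetical --regularization flag into args.reg."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--regularization', type=float, dest='reg',
                        default=0.01, help='regularization rate (assumed flag)')
    return parser.parse_args(argv)


# Simulate running the script with "--regularization 0.5"
args = parse_reg(['--regularization', '0.5'])

# scikit-learn's C is the inverse of the regularization strength,
# so a larger args.reg means a smaller C and stronger regularization
C = 1.0 / args.reg
print(C)  # 2.0
```

Passing the value on the command line rather than hard-coding it makes it possible to submit the same script several times with different regularization rates.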
Finally, we test our model with five samples, as follows:
import numpy as np
test = logreg.predict([
    [16, wind_turbine_model(16)],
    [1, wind_turbine_model(1)],
    [25, wind_turbine_model(25)],
    [25, 50],
    [18, 250],
])
print(test)
# calculate accuracy of the prediction
acc = np.average(test == [1, 1, 1, 0, 1])
print('Accuracy is', acc)
We get an accuracy of about 80%. This is quite low, but we will not investigate the performance of our model any further. The purpose of this exercise isn't to develop an anomaly detection algorithm, but to show how to train a model with Azure ML.
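To make the accuracy figure concrete, the following sketch shows how the element-wise comparison turns into a score. The predicted values are hypothetical, chosen so that one of the five samples is misclassified; they are not the actual output of the model:

```python
import numpy as np

# Hypothetical predictions for the five test samples (assumed, not real output)
predicted = np.array([1, 1, 1, 0, 0])
# Expected labels, as used in the accuracy calculation in the text
expected = [1, 1, 1, 0, 1]

# Comparing an array with a list of the same length yields one boolean per sample
matches = predicted == expected
# Averaging booleans counts True as 1 and False as 0
acc = np.average(matches)
print(matches, acc)
```

With five samples, each misclassification costs 20 percentage points, so an accuracy of 80% corresponds to a single wrong prediction.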