Forecast results

Now that we have run a forecast, we can look in more depth at the results that are generated by the forecasting process. By the way, we can view the results of a previously created forecast at any time in the UI via one of two methods. You can click the Forecast button in the Single Metric Viewer to reveal a list of Previous Forecasts, like so:

Alternatively, you can view them in the Job Management page under the Forecasts tab for that job:

Forecast results created in Kibana have a default lifespan of 14 days, after which they are permanently deleted. If you need a different expiration period, you must create the forecast via the _forecast API endpoint, which is discussed later and is documented at https://www.elastic.co/guide/en/elasticsearch/reference/current/ml-forecast.html.
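As a rough illustration of what such an API call carries, the following Python sketch builds (but does not send) a _forecast request with a custom retention period. It assumes an unsecured cluster at http://localhost:9200, the 6.x-style _xpack/ml path (7.x and later use /_ml/anomaly_detectors/<job_id>/_forecast), and the job ID from this chapter; the duration and expires_in values are only examples, and the job must be open for a forecast request to succeed.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster; change the host and add auth as needed.
ES_HOST = "http://localhost:9200"


def build_forecast_request(job_id: str, duration: str, expires_in: str) -> urllib.request.Request:
    """Build, without sending, a _forecast request for an ML job.

    duration:   how far into the future to forecast, e.g. "10d"
    expires_in: how long to keep the forecast results, e.g. "30d"
                (results are kept for 14 days when this is omitted)
    """
    # 6.x-style path; 7.x and later use /_ml/anomaly_detectors/<job_id>/_forecast
    url = f"{ES_HOST}/_xpack/ml/anomaly_detectors/{job_id}/_forecast"
    body = json.dumps({"duration": duration, "expires_in": expires_in}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_forecast_request("a_forecast_example", "10d", "30d")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Building the request object separately from sending it keeps the example runnable without a cluster and makes the path and body easy to inspect.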

Whichever method you use to list previous forecasts, clicking the View icon takes you to the Single Metric Viewer. Notice that when you mouse over the forecast data points in the UI, the popup lists three key pieces of information about each data point: the prediction value, the upper bound, and the lower bound:

Recall that the upper and lower bounds define a 95% confidence interval. The prediction value is the most likely value (the value with the highest probability). These three values are stored in the .ml-anomalies-* results indices under the following field names:

  • forecast_prediction
  • forecast_upper
  • forecast_lower

We can query the .ml-anomalies-* indices to locate this exact point in time (remembering that timestamps are stored as epoch time in milliseconds). For example, let's execute the following query in the Dev Tools Console:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {"term": {"timestamp": "1488808800000"}},
        {"term": {"result_type": "model_forecast"}},
        {"term": {"job_id": "a_forecast_example"}}
      ]
    }
  }
}

The output would be as follows:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 20,
    "successful" : 20,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : ".ml-anomalies-shared",
        "_type" : "doc",
        "_id" : "a_forecast_example_model_forecast_i2DxbGgBITRq2rXM21p4_1488808800000_900_0_961_0",
        "_score" : 0.0,
        "_source" : {
          "job_id" : "a_forecast_example",
          "forecast_id" : "i2DxbGgBITRq2rXM21p4",
          "result_type" : "model_forecast",
          "bucket_span" : 900,
          "detector_index" : 0,
          "timestamp" : 1488808800000,
          "model_feature" : "'bucket sum by person'",
          "forecast_lower" : 11315.739312779506,
          "forecast_upper" : 23080.83486433322,
          "forecast_prediction" : 17198.287088556364
        }
      }
    ]
  }
}

Note the unique forecast_id. If multiple forecasts had been created spanning this time frame, the query would return one result per forecast, each with a different forecast_id.
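If you consume these results outside Kibana, you have to convert the epoch-millisecond timestamps yourself and keep points from different forecasts apart. The following Python sketch does both for the search response shown above (trimmed to its single hit). The helper names epoch_ms_to_utc and forecasts_by_id are hypothetical, chosen for this illustration; only the field names come from the actual results.

```python
from collections import defaultdict
from datetime import datetime, timezone

# The relevant part of the search response shown above, trimmed to one hit.
response = {
    "hits": {
        "hits": [
            {"_source": {
                "job_id": "a_forecast_example",
                "forecast_id": "i2DxbGgBITRq2rXM21p4",
                "result_type": "model_forecast",
                "timestamp": 1488808800000,
                "forecast_lower": 11315.739312779506,
                "forecast_upper": 23080.83486433322,
                "forecast_prediction": 17198.287088556364,
            }}
        ]
    }
}


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert an epoch-millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def forecasts_by_id(resp: dict) -> dict:
    """Group model_forecast points by forecast_id (one list per forecast run)."""
    grouped = defaultdict(list)
    for hit in resp["hits"]["hits"]:
        src = hit["_source"]
        grouped[src["forecast_id"]].append({
            "time": epoch_ms_to_utc(src["timestamp"]),
            "lower": src["forecast_lower"],
            "prediction": src["forecast_prediction"],
            "upper": src["forecast_upper"],
        })
    return dict(grouped)


for fid, points in forecasts_by_id(response).items():
    for p in points:
        print(fid, p["time"].isoformat(), round(p["prediction"], 2),
              f"[{p['lower']:.2f}, {p['upper']:.2f}]")
```

For this hit, the timestamp 1488808800000 corresponds to 2017-03-06 14:00:00 UTC, and the rounded values match what the popup in the Single Metric Viewer displays.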

These values match what we saw in the ML Single Metric Viewer in Kibana (allowing for rounding). This type of query maps to one of the use cases that we mentioned at the beginning of this chapter: a value-focused inquiry (we give a date, and we ask for the value). Here, we asked for the most probable value of the time series at a particular time in the future.

To satisfy the time-focused inquiry, we need to reorient the query slightly so that it returns the date (or dates) on which the predicted values meet certain criteria. To mix things up a little, we'll submit this query using Elastic SQL:

POST /_xpack/sql?format=txt
{
  "query": "SELECT timestamp FROM \".ml-anomalies-*\" WHERE job_id='a_forecast_example' AND result_type='model_forecast' AND forecast_prediction>17700 ORDER BY timestamp DESC"
}

Here, we are asking whether there are any times at which the predicted value exceeds our limit of 17,700. The response is as follows:

      timestamp        
------------------------
2017-03-06T14:45:00.000Z

In other words, we may breach the threshold of 17,700 on March 6th (5 days from now, in our fictitious example where today is March 1st, 2017) at 2:45 P.M. GMT, which is 9:45 A.M. in the Eastern time zone of the United States. This matches what is seen in the Kibana UI (which is localized to the US East Coast time zone, GMT-5):

Your results might vary slightly depending on your time zone, because we chose the end time of the analysis relative to our local time zone. You may therefore have analyzed a few hours more or less data than was analyzed in this example, so your prediction model, and therefore your highest predicted value, could be slightly different.
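Because the results indices store UTC epoch milliseconds while Kibana displays local time, it can help to do the conversion explicitly. The following Python sketch converts the 14:45 UTC breach time to US Eastern time and back to the epoch-millisecond form used in .ml-anomalies-*. It assumes a fixed UTC-5 offset, which is valid for this date only because US daylight saving time did not begin until March 12th, 2017; for general use, zoneinfo.ZoneInfo("America/New_York") handles DST transitions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: US Eastern Standard Time as a fixed UTC-5 offset. This holds for
# 2017-03-06 because US daylight saving time started on 2017-03-12.
EST = timezone(timedelta(hours=-5), "EST")

# Breach time returned by the SQL query above, in UTC.
breach_utc = datetime(2017, 3, 6, 14, 45, tzinfo=timezone.utc)
breach_local = breach_utc.astimezone(EST)

# Epoch milliseconds, the format stored in the timestamp field of the results.
breach_epoch_ms = int(breach_utc.timestamp() * 1000)

print(breach_local.isoformat())
print(breach_epoch_ms)
```

The same instant appears as 09:45 in the Eastern time zone and as 1488811500000 in the results indices, which is 45 minutes (2,700,000 ms) after the 1488808800000 timestamp queried earlier.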

This approach could be useful for capacity planning, where you could ask something like "within the next 10 days, will my capacity exceed 80%?"
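The capacity-planning question can also be answered client-side once forecast points have been retrieved. The following Python sketch is a hypothetical helper, not part of Elastic ML: it returns the earliest forecast time within a horizon whose predicted value (or, more conservatively, whose upper bound) exceeds a threshold. The capacity percentages are made-up sample data, and the field names simply mirror forecast_prediction and forecast_upper.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def first_breach(points: List[dict], threshold: float, start: datetime,
                 horizon: timedelta, use_upper: bool = False) -> Optional[datetime]:
    """Earliest forecast time in [start, start + horizon) whose predicted value
    (or upper bound, when use_upper=True) exceeds threshold; None if none does."""
    field = "forecast_upper" if use_upper else "forecast_prediction"
    end = start + horizon
    times = [p["time"] for p in points
             if start <= p["time"] < end and p[field] > threshold]
    return min(times) if times else None


def utc(*args) -> datetime:
    """Shorthand for a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


# Hypothetical forecast of capacity utilization (%), not real job output.
points = [
    {"time": utc(2017, 3, 2), "forecast_prediction": 62.0, "forecast_upper": 75.0},
    {"time": utc(2017, 3, 5), "forecast_prediction": 71.0, "forecast_upper": 84.0},
    {"time": utc(2017, 3, 8), "forecast_prediction": 79.0, "forecast_upper": 90.0},
    {"time": utc(2017, 3, 12), "forecast_prediction": 83.0, "forecast_upper": 95.0},
]

today = utc(2017, 3, 1)
# Will the most likely value exceed 80% within the next 10 days?
print(first_breach(points, 80.0, today, timedelta(days=10)))
# Could it plausibly exceed 80%, judging by the upper bound?
print(first_breach(points, 80.0, today, timedelta(days=10), use_upper=True))
```

With this sample data, the predicted value stays below 80% for the next 10 days, but the upper bound crosses 80% on March 5th. Checking the upper bound trades more false alarms for earlier warning, which may suit capacity planning better.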

If we want to see how well ML's forecast compares with the actual next 10 days of the dataset (remember, the ML job's models haven't yet seen those days), we can return to the Job Management page and start the job's datafeed so that it continues on to analyze the remainder of the data. To do so, click the Start datafeed link in the menu on the right-hand side:

Once the dialog appears, set the Search start time to Continue from 2017-02-28 23:45:00 (or whatever it shows for your local time zone) and set the Search end time to March 11th, 2017 at 12:00 AM:

Once you have done this, return to the Single Metric Viewer for the job, ensure that you are viewing the correct range of time with the Kibana time picker, and click on the Forecast button to view the previously created forecast, as described earlier in this chapter. You will now be able to see the forecast values superimposed over the actual values from the data:

As described earlier in this chapter, there will be a slight discrepancy between the Elastic ML prediction of the data and the actual value that arrives in the future. This is because the predictions are probabilistic, and with probability comes a certain level of uncertainty. However, this does not diminish the usefulness of the forecasts. Combined with the proactive alerting of Watcher (as described in Chapter 6, Alerting on ML Analysis), we could have been alerted to the possibility of a breach. This proactive notification is especially useful when users cannot track hundreds or thousands of entities individually. Multi-metric forecasting will allow us to track those entities automatically.
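To put a number on that discrepancy rather than judging it by eye, you could compare each forecast point with the actual value that later arrived for the same bucket. The following Python sketch is one hypothetical way to do this: it reports the fraction of actual values that fell inside the forecast bounds and the mean absolute error of the prediction. The sample values are made up, and the timestamps are simply three consecutive 15-minute buckets (bucket_span 900) starting at the timestamp queried earlier. If the bounds behave like a 95% interval, coverage measured over a long enough window would be expected to land near 0.95.

```python
from typing import Dict, Tuple


def forecast_accuracy(forecast: Dict[int, Tuple[float, float, float]],
                      actual: Dict[int, float]) -> Tuple[float, float]:
    """Compare forecasts with actual values keyed by epoch-ms timestamp.

    forecast: {timestamp_ms: (forecast_lower, forecast_prediction, forecast_upper)}
    actual:   {timestamp_ms: observed value}
    Returns (coverage, mean_absolute_error) over timestamps present in both.
    """
    common = sorted(forecast.keys() & actual.keys())
    if not common:
        raise ValueError("no overlapping timestamps")
    inside = 0
    abs_err = 0.0
    for ts in common:
        lower, prediction, upper = forecast[ts]
        value = actual[ts]
        if lower <= value <= upper:
            inside += 1  # actual value fell within the forecast bounds
        abs_err += abs(value - prediction)
    return inside / len(common), abs_err / len(common)


# Hypothetical values for three consecutive 15-minute buckets.
forecast = {
    1488808800000: (10000.0, 16000.0, 22000.0),
    1488809700000: (10500.0, 16500.0, 22500.0),
    1488810600000: (11000.0, 17000.0, 23000.0),
}
actual = {
    1488808800000: 17000.0,
    1488809700000: 23000.0,  # above the upper bound
    1488810600000: 16000.0,
}

coverage, mae = forecast_accuracy(forecast, actual)
print(f"coverage={coverage:.2f} mae={mae:.1f}")
```

Here two of the three actual values fall inside the bounds, so coverage is about 0.67; three points are far too few to judge calibration, so in practice you would run this over the full 10-day window.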
