Influencer results

Another lens through which to view the results is via influencers. Viewing the results this way lets us answer the question "what were the most unusual entities in my ML job, and when were they unusual?" To understand the structure and content of influencer-level results, let's query the results for a particular ML job. We will start with a job that is split on a partition field, where that same field is also the sole influencer chosen in the job configuration:

GET .ml-anomalies-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "timestamp": { "gte": "now-2y" } } },
        { "term": { "job_id": "farequote" } },
        { "term": { "result_type": "influencer" } },
        { "range": { "influencer_score": { "gte": "98" } } }
      ]
    }
  }
}

Here, the query asks for any influencer results from the last two years whose influencer_score is greater than or equal to 98. The result looks as follows:

{
  …
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": [
      {
        "_index": ".ml-anomalies-shared",
        "_type": "doc",
        "_id": "farequote_influencer_1486656000000_900_airline_64556_3",
        "_score": 0,
        "_source": {
          "job_id": "farequote",
          "result_type": "influencer",
          "influencer_field_name": "airline",
          "influencer_field_value": "AAL",
          "airline": "AAL",
          "influencer_score": 98.56065708451416,
          "initial_influencer_score": 98.56065708451416,
          "probability": 6.252543460836487e-19,
          "bucket_span": 900,
          "is_interim": false,
          "timestamp": 1486656000000
        }
      }
    ]
  }
…

Let's look at some key portions of the output:

timestamp: The timestamp of the leading edge of the time bucket, inside of which this entity was anomalous.
influencer_score: The current normalized score of the influencer, based upon the range of the influencers seen over the entirety of the job. The value of this score may fluctuate over time as new data is processed by the job and new influencers are found.
initial_influencer_score: The normalized score of the influencer at the time the bucket was first analyzed. This score, unlike the influencer_score, will not change as more data is analyzed.
influencer_field_name: The name of the influencer field being described here, in case there are multiple influencers in this anomaly.
influencer_field_value: The value of the influencer field being described here.
is_interim: A flag that signifies whether the bucket is finalized or is still waiting for all of the data within the bucket span to be received. This field is relevant for ongoing jobs that are operating in real time. For certain types of analysis, interim results may be produced even though not all of the data for the bucket has been seen yet.

In this case, the high influencer_score tells us that the data associated with one particular airline (AAL) contributed significantly to the formation of the anomaly during this bucket.
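To make the fields described above concrete, here is a minimal Python sketch that pulls them out of a search response and converts the epoch-millisecond timestamp into a readable UTC time. The function name summarize_influencers and the trimmed sample_response are illustrative assumptions, not part of any Elastic client library; the sample simply reuses the values from the hit shown earlier.

```python
from datetime import datetime, timezone


def summarize_influencers(response):
    """Return one summary dict per influencer hit in a search response."""
    summaries = []
    for hit in response.get("hits", {}).get("hits", []):
        src = hit["_source"]
        # timestamp is in epoch milliseconds and marks the leading edge of the bucket
        start = datetime.fromtimestamp(src["timestamp"] / 1000, tz=timezone.utc)
        summaries.append({
            "bucket_start": start.isoformat(),
            "bucket_span_seconds": src["bucket_span"],
            "field": src["influencer_field_name"],
            "value": src["influencer_field_value"],
            "score": src["influencer_score"],
            "initial_score": src["initial_influencer_score"],
            "is_interim": src["is_interim"],
        })
    return summaries


# Trimmed copy of the response shown earlier (only the fields used here).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "job_id": "farequote",
                    "result_type": "influencer",
                    "influencer_field_name": "airline",
                    "influencer_field_value": "AAL",
                    "influencer_score": 98.56065708451416,
                    "initial_influencer_score": 98.56065708451416,
                    "bucket_span": 900,
                    "is_interim": False,
                    "timestamp": 1486656000000,
                }
            }
        ]
    }
}

for row in summarize_influencers(sample_response):
    print(row)
```

For the sample hit, the bucket start works out to 2017-02-09 16:00 UTC, and the 900-second bucket_span means the bucket covers the following 15 minutes.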

In summary, the ML results indices hold a fair amount of detail at different levels of abstraction. This will be useful when it comes to building alerts with different levels of detail.
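As a starting point for such alerts, the influencer query from earlier in this section could be parameterized so that the job, score threshold, and lookback window vary per alert. The sketch below only builds the request body as a Python dictionary; the helper name influencer_query and its parameters are hypothetical, and sending the body to the .ml-anomalies-* indices is left to whichever client you use.

```python
import json


def influencer_query(job_id, min_score, lookback="now-2y"):
    """Build a search body for influencer results of one ML job.

    Mirrors the filter clauses of the query shown earlier: a time range,
    the job_id, the influencer result_type, and a minimum influencer_score.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": lookback}}},
                    {"term": {"job_id": job_id}},
                    {"term": {"result_type": "influencer"}},
                    {"range": {"influencer_score": {"gte": min_score}}},
                ]
            }
        }
    }


# Same job and threshold as the example query in this section.
body = influencer_query("farequote", 98)
print(json.dumps(body, indent=2))
```

Keeping the threshold and lookback as arguments makes it straightforward to define, say, a stricter alert over a short window and a looser one over a longer window from the same helper.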
