Detecting things that rarely occur

In the context of a stream of temporal information (such as a log file), the notion of something being statistically rare (occurring at a low frequency) is paradoxically both intuitive and hard to pin down. If I were asked, for example, to trawl through a log file and find a rare message, I might be tempted to label the first novel message that I saw as a rare one. But what if practically every message were novel? Are they all rare? Or is nothing rare?

For rarity to be a useful notion in a stream of events over time, we need to agree that declaring something rare must take into account the context in which it exists. If there are lots of routine things and a small number of unique things, then we can deem the unique things rare. If there are many unique things, then we will deem that nothing is rare.
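The intuition above can be sketched as a toy heuristic. This is only an illustration of the idea that rarity depends on context, not the model that the ML job actually uses; the function name find_rare and both thresholds are assumptions made up for this example.

```python
from collections import Counter


def find_rare(values, max_share=0.05, max_distinct_ratio=0.5):
    """Toy rarity check (illustrative only, not Elastic's algorithm).

    A value is called rare when it accounts for at most `max_share` of all
    observations, but only if the data has a routine background, that is,
    the ratio of distinct values to observations is at most
    `max_distinct_ratio`. Both thresholds are arbitrary assumptions.
    """
    counts = Counter(values)
    total = len(values)
    if total == 0:
        return []
    # Many unique things: there is no routine background, so nothing is rare.
    if len(counts) / total > max_distinct_ratio:
        return []
    # Lots of routine things: the seldom-seen values stand out as rare.
    return sorted(v for v, c in counts.items() if c / total <= max_share)


# Mostly routine logins with one outlier.
routine = ["New York"] * 60 + ["London"] * 39 + ["Moscow"]
# Every message is novel.
all_novel = [f"msg-{i}" for i in range(100)]

print(find_rare(routine))
print(find_rare(all_novel))
```

With the routine-heavy list, only the single outlier is reported; with the all-novel list, nothing is reported, which mirrors the two cases described above.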

When applying the rare function in an ML job, you must declare which field the rare function focuses on. This field is defined in the by_field_name box. So, for example, to find log entries that reference a rare country name, set the detector's function to rare and put the field that holds the country name in by_field_name.
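As a rough sketch, the rare-country detector described here might look like the following when written out as job configuration. The field name country_name, the bucket span, and the helper build_rare_detector are assumptions for illustration; substitute the field that actually holds the country in your own data.

```python
import json

# Hypothetical field name: use whichever field holds the country in your data.
COUNTRY_FIELD = "country_name"


def build_rare_detector(by_field):
    """Return a detector that flags rarely seen values of `by_field`."""
    return {
        "function": "rare",
        "by_field_name": by_field,
    }


detector = build_rare_detector(COUNTRY_FIELD)

# A minimal analysis_config wrapping the detector.
# The 15m bucket span is an arbitrary assumption, not a recommendation.
analysis_config = {
    "bucket_span": "15m",
    "detectors": [detector],
}

print(json.dumps(analysis_config, indent=2))
```

Note that the rare function needs no field_name; the by_field_name alone tells it which values to assess for rarity.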

A detector like this is handy for finding unexpected geographical access (for example, our admins log in from the New York and London offices almost daily, but never from Moscow!).

When looking at the results of a rarity analysis (such as rare process names running on hosts), you will see that the Anomaly Explorer looks slightly different. For more details, see https://discuss.elastic.co/t/dec-4th-2018-en-ml-rarity-analysis-with-machine-learning/158979.