Counting the categories

Once the messages have been categorized by the algorithm described previously, the next part of the process is counting. Here, we're not counting the log lines themselves (and thus the documents of an Elasticsearch index); instead, we're counting the occurrence rate of the different categories that the algorithm outputs. For example, if the example log lines from the previous section all occurred within the same bucket span, the categorization algorithm would produce the following output:

mlcategory 1: 2 
mlcategory 2: 1 

In other words, in the last bucket span interval there were two occurrences of the "Error writing file on" type of message and one occurrence of the "Opening database on host" type. It is this information that the ML job ultimately models and uses to determine whether something is unusual, as shown in the next section.
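To make the counting step concrete, here is a minimal sketch of per-bucket category counting. It is not the ML job's actual implementation. It assumes each log line has already been assigned a numeric category ID (the `mlcategory` value) and carries an epoch timestamp in seconds. The function name `count_categories_per_bucket`, the record layout, and the 300-second bucket span used in the usage example are all hypothetical.

```python
from collections import Counter, defaultdict


def count_categories_per_bucket(records, bucket_span_secs):
    """Count how often each category ID occurs within each bucket span.

    Hypothetical input: `records` is an iterable of (epoch_seconds, category_id)
    pairs, where category_id was assigned by an earlier categorization step.
    Returns a dict mapping bucket start time -> Counter({category_id: count}).
    """
    counts = defaultdict(Counter)
    for timestamp, category_id in records:
        # Align the record to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % bucket_span_secs)
        # Count the category, not the individual log line.
        counts[bucket_start][category_id] += 1
    return dict(counts)


if __name__ == "__main__":
    # Three hypothetical log lines in one bucket: two of category 1, one of category 2.
    records = [(1000, 1), (1010, 2), (1020, 1)]
    for bucket_start, per_category in count_categories_per_bucket(records, 300).items():
        for category_id, count in sorted(per_category.items()):
            print(f"bucket {bucket_start}: mlcategory {category_id}: {count}")
```

With these inputs, all three records fall into the bucket starting at 900, so the sketch reports `mlcategory 1: 2` and `mlcategory 2: 1`, mirroring the output above. The per-bucket counts, rather than the raw log lines, are what a model would then consume.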
