Types of messages that can be categorized by ML

We need to be a little rigorous in defining the kinds of message-based log lines under consideration here. We are not considering log lines, events, or documents that are completely free-form and likely written by a human (emails, tweets, comments, and so on). Such messages are too arbitrary and variable in their construction and content.

Instead, we are focused on machine-generated messages that an application emits when it encounters particular situations or exceptions. This constrains their construction and content to a relatively discrete set of possibilities, even though some parts of each message may vary. For example, let's look at the following few lines of an application log:

18/05/2016 15:16:00 S ACME6 DB Not Updated [Master] Table
18/05/2016 15:16:00 S ACME6 REC Not INSERTED [DB TRAN] Table
18/05/2016 15:16:07 S ACME6 Using: 10.16.1.63!svc_prod#uid=demo;pwd=demo
18/05/2016 15:16:07 S ACME6 Opening Database = DRIVER={SQL Server};SERVER=10.16.1.63;network=dbmssocn;address=10.16.1.63,1433;DATABASE=svc_prod;uid=demo;pwd=demo;AnsiNPW=No
18/05/2016 15:16:29 S ACME6 DBMS ERROR : db=10.16.1.63!svc_prod#uid=demo;pwd=demo Err=-11 [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. Check your network documentation.

Here, we can see a variety of messages, each with different text, but there is some structure. After the date/time stamp and the name of the server from which the message originates (here, ACME6), there is the actual meat of the message, where the application informs the outside world what is happening at that moment: whether something is being attempted or an error is occurring.
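To make that structure concrete, here is a minimal Python sketch that splits each sample line into its timestamp, server name, and message text, and then reduces the message to a rough "category" by keeping only its purely alphabetic words. This is only an illustration of the idea that machine-generated messages fall into a small set of patterns; it is not the algorithm any particular ML product uses. The names `parse_line` and `rough_category` are hypothetical, the field layout is inferred from the five sample lines alone, and the meaning of the single-letter field (`S`) is not stated in the log, so it is simply carried along as `flag`.

```python
import re
from datetime import datetime

# The five sample lines from the application log above.
LOG_LINES = [
    "18/05/2016 15:16:00 S ACME6 DB Not Updated [Master] Table",
    "18/05/2016 15:16:00 S ACME6 REC Not INSERTED [DB TRAN] Table",
    "18/05/2016 15:16:07 S ACME6 Using: 10.16.1.63!svc_prod#uid=demo;pwd=demo",
    "18/05/2016 15:16:07 S ACME6 Opening Database = DRIVER={SQL Server};"
    "SERVER=10.16.1.63;network=dbmssocn;address=10.16.1.63,1433;"
    "DATABASE=svc_prod;uid=demo;pwd=demo;AnsiNPW=No",
    "18/05/2016 15:16:29 S ACME6 DBMS ERROR : db=10.16.1.63!svc_prod#uid=demo;pwd=demo "
    "Err=-11 [Microsoft][ODBC SQL Server Driver][TCP/IP Sockets]General network error. "
    "Check your network documentation.",
]

# Assumed layout (inferred from the sample only):
# dd/mm/yyyy hh:mm:ss, a short flag, the server name, then the message text.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (\S+) (\S+) (.*)$")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp, flag, server, message = match.groups()
    return {
        "timestamp": datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"),
        "flag": flag,
        "server": server,
        "message": message,
    }


def rough_category(message):
    """Keep only purely alphabetic words, dropping variable-looking tokens.

    Tokens containing digits or embedded punctuation (IP addresses,
    connection strings, error codes) are treated as variable and discarded.
    """
    kept = []
    for token in message.split():
        word = token.strip("[]():=.,;")  # trim punctuation at the edges only
        if word.isalpha():
            kept.append(word)
    return " ".join(kept)


if __name__ == "__main__":
    for line in LOG_LINES:
        record = parse_line(line)
        print(record["server"], "|", rough_category(record["message"]))
```

With this crude rule, two "Using:" lines that point at different databases would collapse to the same category, which is the property that makes such messages suitable for categorization in the first place.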
