In this dataset, each document represents one entry of an Apache access log in the Common Log Format, with the following structure:
128.159.122.45 - - [28/Jul/1995:13:31:57 -0400] "GET /ksc.html HTTP/1.0" 200 7280
The preceding log line contains the following information:
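- The host making the request (here, its IP address)
- The client identity and authenticated user ID, both unavailable here and therefore logged as -
- The timestamp of the request
- The request line: the HTTP method, the requested resource, and the protocol version
- The HTTP response status code
- The size of the response body in bytes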
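To make the field boundaries concrete, here is a minimal Python sketch that splits one Common Log Format line into named fields with a regular expression. It is only an illustration of the line structure, not how Logstash works internally. The regular expression and the group names (host, ident, authuser, timestamp, request, status, bytes) are our own assumptions; ident and authuser capture the two - placeholders that follow the host.

```python
import re

# Hypothetical pattern for one Common Log Format line:
#   host ident authuser [timestamp] "request" status bytes
# Group names are illustrative choices, not Logstash field names.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line: str) -> dict:
    """Split one access-log line into its named fields (all kept as strings)."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a Common Log Format line: {line!r}")
    return match.groupdict()


sample = '128.159.122.45 - - [28/Jul/1995:13:31:57 -0400] "GET /ksc.html HTTP/1.0" 200 7280'
fields = parse_clf_line(sample)
print(fields["host"], fields["timestamp"], fields["request"], fields["status"], fields["bytes"])
```

Every value comes back as a string at this stage; converting the timestamp, status code, and size into proper types is a separate step.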
We'll use Logstash to transform this log file into JSON format, which we can then index into Elasticsearch.
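Logstash typically handles this step with its grok filter (for example, its bundled COMMONAPACHELOG pattern) together with a date filter. As a rough Python analogue of the end result, the hypothetical function below takes the string fields of the sample line and emits one JSON document with typed values: the timestamp normalized to ISO 8601, and the status code and byte count as integers. The JSON field names, including @timestamp, are illustrative assumptions and not necessarily the names Logstash produces.

```python
import json
from datetime import datetime


def to_json_document(fields: dict) -> str:
    """Hypothetical helper: turn the string fields of one log line into a typed JSON document."""
    # "%d/%b/%Y:%H:%M:%S %z" matches timestamps such as 28/Jul/1995:13:31:57 -0400
    ts = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    # Pad with None so requests with missing parts (e.g. no protocol) still unpack
    method, path, protocol = (fields["request"].split() + [None, None, None])[:3]
    doc = {
        # Illustrative field names only (assumption, not Logstash's output schema)
        "host": fields["host"],
        "@timestamp": ts.isoformat(),
        "method": method,
        "path": path,
        "protocol": protocol,
        "status": int(fields["status"]),
        # A "-" in the size column means no body size was logged
        "bytes": None if fields["bytes"] == "-" else int(fields["bytes"]),
    }
    return json.dumps(doc)


# String fields of the sample line, as a parser would extract them
sample_fields = {
    "host": "128.159.122.45",
    "timestamp": "28/Jul/1995:13:31:57 -0400",
    "request": "GET /ksc.html HTTP/1.0",
    "status": "200",
    "bytes": "7280",
}
print(to_json_document(sample_fields))
```

Keeping the timestamp with its UTC offset, and storing status and bytes as numbers rather than strings, is what later allows Elasticsearch to run range queries and aggregations over these fields.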