Architecture

Filebeat is made up of key components called inputs, harvesters, and spoolers. These components work together to tail files and send event data to the specified output. The input is responsible for identifying the list of files to read logs from: it is configured with one or more file paths, identifies the matching files, and starts a harvester for each file. The harvester reads the contents of a single file, line by line, so that the content can be sent to the output. The harvester is also responsible for opening and closing the file, which means the file descriptor stays open for as long as the harvester is running. Once the harvester has started for a file, it sends the content it reads (also known as the events) to the spooler. The spooler aggregates the events and sends them to the configured outputs.

Each instance of Filebeat can be configured with one or more inputs. As of Filebeat 6.0, Filebeat supports two input types: log and stdin. Later versions of Filebeat added support for more input types; as of Filebeat 7.0, the supported inputs are Log, Stdin, Redis, UDP, Docker, TCP, Syslog, and NetFlow. For example, if the input type is log, the input finds all the files on the drive that match the predefined glob paths and starts a harvester for each of them. Every input runs in its own goroutine. If the input type is stdin, it reads from standard input, and if the input type is UDP or TCP, it captures events sent over UDP or TCP.

Every time Filebeat reads a file, the harvester keeps track of the offset of the last line it read. Once a line has been sent to the output, this state is recorded in a registry file, which is flushed to disk periodically. If the output (Elasticsearch, Logstash, Kafka, or Redis) is unreachable, Filebeat keeps track of the last lines that were sent and continues reading the file once the output becomes reachable again. While Filebeat is running, each input keeps this state information in memory; if Filebeat restarts, the state is rebuilt from the registry file.

Filebeat does not consider a log line shipped until the output acknowledges the request. Because the delivery state of each line is kept in the registry file, you can rely on events being delivered to the configured outputs at least once and without data loss; an event may occasionally be delivered more than once, but it is not dropped.

(Reference: https://www.elastic.co/guide/en/beats/filebeat/7.0/images/filebeat.png)

The registry file is located at data/registry for .tar.gz and .zip archives, at /var/lib/filebeat/registry for DEB and RPM packages, and at C:\ProgramData\filebeat\registry for the Windows .zip file (if Filebeat is installed as a service).
