When you start managing an application, log collection and analysis are two important routines for keeping track of the application's status.
However, there are some difficulties when the application is managed by Docker/Kubernetes: because the log files are inside the containers, it is not easy to access them from outside a container. In addition, if the application has many pods managed by a replication controller, it is also difficult to trace in which pod an issue has happened.
One way to overcome this difficulty is to prepare a centralized log collection platform that accumulates and preserves application logs. This recipe describes one of the popular log collection platforms: ELK (Elasticsearch, Logstash, and Kibana).
First, we will prepare the Elasticsearch server. Then, the application will send its logs to Elasticsearch using Logstash. Finally, we will visualize the analysis results using Kibana.
Elasticsearch (https://www.elastic.co/products/elasticsearch) is a popular text index and analytic engine. The Kubernetes source tree provides some example YAML files; let's download them using the curl command to set up Elasticsearch:
An example YAML file is located on GitHub at https://github.com/kubernetes/kubernetes/tree/master/examples/elasticsearch.
# curl -L -O https://github.com/kubernetes/kubernetes/releases/download/v1.1.4/kubernetes.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   593    0   593    0     0   1798      0 --:--:-- --:--:-- --:--:--  1802
100  181M  100  181M    0     0  64.4M      0  0:00:02  0:00:02 --:--:--  75.5M

# tar zxf kubernetes.tar.gz
# cd kubernetes/examples/elasticsearch/
# ls
es-rc.yaml  es-svc.yaml  production_cluster  README.md  service-account.yaml
Create the ServiceAccount (service-account.yaml), and then create the Elasticsearch replication controller (es-rc.yaml) and service (es-svc.yaml) as follows:
# kubectl create -f service-account.yaml
serviceaccount "elasticsearch" created

//As of Kubernetes 1.1.4, it causes a validation error
//therefore append the --validate=false option
# kubectl create -f es-rc.yaml --validate=false
replicationcontroller "es" created

# kubectl create -f es-svc.yaml
service "elasticsearch" created
Then, you can access the Elasticsearch interface via the Kubernetes service as follows:
//Elasticsearch is exposed at 192.168.45.152 in this example
# kubectl get service
NAME            CLUSTER_IP       EXTERNAL_IP   PORT(S)             SELECTOR                  AGE
elasticsearch   192.168.45.152                 9200/TCP,9300/TCP   component=elasticsearch   9s
kubernetes      192.168.0.1      <none>        443/TCP             <none>                    110d

//access TCP port 9200
# curl http://192.168.45.152:9200/
{
  "status" : 200,
  "name" : "Wallflower",
  "cluster_name" : "myesdb",
  "version" : {
    "number" : "1.7.1",
    "build_hash" : "b88f43fc40b0bcd7f173a1f9ee2e97816de80b19",
    "build_timestamp" : "2015-07-29T09:54:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.4"
  },
  "tagline" : "You Know, for Search"
}
Now, get ready to send an application log to Elasticsearch.
Let's use a sample application, which was introduced in the Moving monolithic to microservices recipe in Chapter 5, Building a Continuous Delivery Pipeline. Prepare a Python Flask program as follows:
# cat entry.py
from flask import Flask, request
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

@app.route("/addition/<int:x>/<int:y>")
def add(x, y):
    return "%d" % (x+y)

if __name__ == "__main__":
    app.run(host='0.0.0.0')
We will use this application to send logs to Elasticsearch. Using Logstash (https://www.elastic.co/products/logstash) is the easiest way to do this, because it converts the plain text log format to the Elasticsearch (JSON) format.
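To illustrate the kind of conversion Logstash performs, here is a minimal Python sketch (not part of the recipe; the regex only roughly approximates the grok pattern shown later) that turns one Apache-style access-log line into a JSON document:

```python
import json
import re

# Rough Python equivalent of the grok fields used in this recipe:
# client IP, identity, user, timestamp, request line, status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d+) (?P<bytes>\d+|-)'
)

def to_json(line):
    """Convert a plain-text access-log line to a JSON document, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return json.dumps(match.groupdict())

sample = '10.0.0.1 - - [29/Feb/2016:10:00:00 +0000] "GET /addition/3/5 HTTP/1.1" 200 1'
print(to_json(sample))
```

Logstash does essentially this (plus buffering, retries, and metadata such as @timestamp) before shipping each document to Elasticsearch.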
Logstash needs a configuration file that specifies the Elasticsearch IP address and port number. In this recipe, Elasticsearch is managed by a Kubernetes service; therefore, the IP address and port number can be found using environment variables, as follows:
Item | Environment Variable | Example
---|---|---
Elasticsearch IP address | ELASTICSEARCH_SERVICE_HOST | 192.168.45.152
Elasticsearch port number | ELASTICSEARCH_SERVICE_PORT | 9200
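For example, these service-discovery environment variables can be read from Python inside a pod (a sketch; the fallback values are placeholders for running outside the cluster):

```python
import os

# Kubernetes injects <SERVICE>_SERVICE_HOST / <SERVICE>_SERVICE_PORT
# into every pod; fall back to placeholder values outside the cluster.
es_host = os.environ.get("ELASTICSEARCH_SERVICE_HOST", "127.0.0.1")
es_port = os.environ.get("ELASTICSEARCH_SERVICE_PORT", "9200")

es_url = "http://%s:%s/" % (es_host, es_port)
print(es_url)
```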
However, the Logstash configuration file doesn't support environment variables directly. Therefore, the Logstash configuration file uses the placeholders _ES_IP_ and _ES_PORT_ as follows:
# cat logstash.conf.temp
input {
  stdin {}
}

filter {
  grok {
    match => { "message" => "%{IPORHOST:clientip} %{HTTPDUSER:ident} %{USER:auth} \[%{DATA:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{NUMBER:response} (?:%{NUMBER:bytes}|-)" }
  }
}

output {
  elasticsearch {
    hosts => ["_ES_IP_:_ES_PORT_"]
    index => "mycalc-access"
  }
  stdout { codec => rubydebug }
}
The startup script will read the environment variables, and then replace the placeholders to set the real IP address and port number, as follows:
#!/bin/sh

TEMPLATE="logstash.conf.temp"
LOGSTASH="logstash-2.2.2/bin/logstash"

cat $TEMPLATE | sed "s/_ES_IP_/$ELASTICSEARCH_SERVICE_HOST/g" | sed "s/_ES_PORT_/$ELASTICSEARCH_SERVICE_PORT/g" > logstash.conf

python entry.py 2>&1 | $LOGSTASH -f logstash.conf
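The substitution the startup script performs with sed can also be sketched in Python, which may be easier to extend (illustrative only; the template string here is a shortened stand-in for logstash.conf.temp):

```python
import os

# Shortened stand-in for the logstash.conf.temp template
template = 'output { elasticsearch { hosts => ["_ES_IP_:_ES_PORT_"] } }'

# Replace the placeholders with the Kubernetes-injected values;
# placeholder defaults are used when running outside the cluster.
conf = (template
        .replace("_ES_IP_", os.environ.get("ELASTICSEARCH_SERVICE_HOST", "127.0.0.1"))
        .replace("_ES_PORT_", os.environ.get("ELASTICSEARCH_SERVICE_PORT", "9200")))
print(conf)
```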
Finally, prepare a Dockerfile as follows to build the sample application:
FROM ubuntu:14.04

# Update packages
RUN apt-get update -y

# Install Python Setuptools
RUN apt-get install -y python-setuptools git telnet curl openjdk-7-jre

# Install pip
RUN easy_install pip

# Bundle app source
ADD . /src
WORKDIR /src

# Download LogStash
RUN curl -L -O https://download.elastic.co/logstash/logstash/logstash-2.2.2.tar.gz
RUN tar -zxf logstash-2.2.2.tar.gz

# Add and install Python modules
RUN pip install Flask

# Expose
EXPOSE 5000

# Run
CMD ["./startup.sh"]
Let's build the sample application using the docker build command:
# ls
Dockerfile  entry.py  logstash.conf.temp  startup.sh

# docker build -t hidetosaito/my-calc-elk .
Sending build context to Docker daemon 5.12 kB
Step 1 : FROM ubuntu:14.04
 ---> 1a094f2972de
Step 2 : RUN apt-get update -y
 ---> Using cache
 ---> 40ff7cc39c20
Step 3 : RUN apt-get install -y python-setuptools git telnet curl openjdk-7-jre
 ---> Running in 72df97dcbb9a

(skip…)

Step 11 : CMD ./startup.sh
 ---> Running in 642de424ee7b
 ---> 09f693436005
Removing intermediate container 642de424ee7b
Successfully built 09f693436005

//upload to Docker Hub using your Docker account
# docker login
Username: hidetosaito
Password:
Email: [email protected]
WARNING: login credentials saved in /root/.docker/config.json
Login Succeeded

//push to Docker Hub
# docker push hidetosaito/my-calc-elk
The push refers to a repository [docker.io/hidetosaito/my-calc-elk] (len: 1)
09f693436005: Pushed
b4ea761f068a: Pushed

(skip…)

c3eb196f68a8: Image already exists
latest: digest: sha256:45c203d6c40398a988d250357f85f1b5ba7b14ae73d449b3ca64b562544cf1d2 size: 22268
Now, let's use this application on Kubernetes to send logs to Elasticsearch. First, prepare the YAML file to deploy this application using a replication controller and a service, as follows:
# cat my-calc-elk.yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: my-calc-elk-rc
spec:
  replicas: 2
  selector:
    app: my-calc-elk
  template:
    metadata:
      labels:
        app: my-calc-elk
    spec:
      containers:
      - name: my-calc-elk
        image: hidetosaito/my-calc-elk
---
apiVersion: v1
kind: Service
metadata:
  name: my-calc-elk-service
spec:
  ports:
  - protocol: TCP
    port: 5000
  type: ClusterIP
  selector:
    app: my-calc-elk
Use the kubectl command to create the replication controller and service as follows:
# kubectl create -f my-calc-elk.yaml
replicationcontroller "my-calc-elk-rc" created
service "my-calc-elk-service" created
Check the Kubernetes service to find the IP address of this application, as follows. It indicates 192.168.121.63:
# kubectl get service
NAME                  CLUSTER_IP        EXTERNAL_IP   PORT(S)             SELECTOR                  AGE
elasticsearch         192.168.101.143                 9200/TCP,9300/TCP   component=elasticsearch   15h
kubernetes            192.168.0.1       <none>        443/TCP             <none>                    19h
my-calc-elk-service   192.168.121.63    <none>        5000/TCP            app=my-calc-elk           39s
Let's access this application using the curl command as follows:
# curl http://192.168.121.63:5000/
Hello World!

# curl http://192.168.121.63:5000/addition/3/5
8
Kibana (https://www.elastic.co/products/kibana) is a visualization tool for Elasticsearch. Download Kibana, and specify the Elasticsearch IP address and port number to launch Kibana:
//Download Kibana 4.1.6
# curl -O https://download.elastic.co/kibana/kibana/kibana-4.1.6-linux-x64.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M  100 17.7M    0     0  21.1M      0 --:--:-- --:--:-- --:--:-- 21.1M

//unarchive
# tar -zxf kibana-4.1.6-linux-x64.tar.gz

//Find Elasticsearch IP address
# kubectl get services
NAME            CLUSTER_IP        EXTERNAL_IP   PORT(S)             SELECTOR                  AGE
elasticsearch   192.168.101.143                 9200/TCP,9300/TCP   component=elasticsearch   19h
kubernetes      192.168.0.1       <none>        443/TCP             <none>                    23h

//specify Elasticsearch IP address
# sed -i -e "s/localhost/192.168.101.143/g" kibana-4.1.6-linux-x64/config/kibana.yml

//launch Kibana
# kibana-4.1.6-linux-x64/bin/kibana
Then, you will see the application log. Create a chart as follows:
This cookbook doesn't cover how to configure Kibana; please visit the official page to learn about the Kibana configuration via https://www.elastic.co/products/kibana.
Now, the application log is captured by Logstash, converted into the JSON format, and then sent to Elasticsearch.
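A document indexed into mycalc-access then looks roughly like the following hand-built illustration (the field names come from the grok pattern; the exact metadata fields Logstash adds, such as @timestamp, depend on its version):

```python
import json

# Hand-built example of a converted log entry; not captured output.
doc = {
    "message": '10.0.0.1 - - [29/Feb/2016:10:00:00 +0000] "GET /addition/3/5 HTTP/1.1" 200 1',
    "clientip": "10.0.0.1",
    "verb": "GET",
    "request": "/addition/3/5",
    "response": "200",
}
print(json.dumps(doc, indent=2))
```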
Since Logstash is bundled in the application container, there is no problem when the replication controller increases the number of replicas (pods): all the application logs are captured with no configuration changes.
All the logs will be stored in Elasticsearch as follows:
This recipe covered how to integrate with the ELK stack. A centralized log collection platform is important for the Kubernetes environment, because containers are easily launched and destroyed by the scheduler, and it is not easy to know which node runs which pod. Check out the following recipes: