Organizing a Dockerfile

Even though writing a Dockerfile is pretty much the same as composing a build script, there are some more factors that we should consider to build efficient, secure, and stable images. Moreover, a Dockerfile itself is also a document. Keeping it readable makes it easier to manage.

Let's say we have an application stack that consists of application code, a database, and a cache. The initial Dockerfile of our stack could be the following:

FROM ubuntu
ADD . /proj
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y redis-server python python-pip mysql-server
ADD /proj/db/my.cnf /etc/mysql/my.cnf
ADD /proj/db/redis.conf /etc/redis/redis.conf
ADD https://example.com/otherteam/dep.tgz /tmp/
RUN tar -zxf /tmp/dep.tgz -C /usr/src
RUN pip install -r /proj/app/requirements.txt
RUN cd /proj/app ; python setup.py
CMD /proj/start-all-service.sh

The first suggestion is to make sure a container is dedicated to one thing and one thing only. This gives our system better transparency, since it helps us clarify the boundaries between components. Packing unnecessary packages is also discouraged, as it increases the image size, which slows down building, distributing, and launching the image. We'll therefore start by removing the installation and configuration of both mysql and redis from our Dockerfile.

Next, the code is moved into the container with ADD ., which means we're very likely to move the whole code repository into the container. Usually, there are lots of files that aren't directly relevant to the application, including VCS files, CI server configurations, and even build caches, and we probably don't want to pack them into an image. For this reason, it is suggested to use .dockerignore to filter out these files as well (a sample .dockerignore follows the simplified Dockerfile below). Lastly, using COPY is preferred over ADD in general, unless we want to extract an archive in one step, because the outcome of COPY is easier to predict. Now our Dockerfile is simpler, as shown in the following code snippet:

FROM ubuntu
COPY proj/app /app
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y python python-pip
ADD https://example.com/otherteam/dep.tgz /tmp/
RUN tar -zxf /tmp/dep.tgz -C /usr/src
RUN pip install -r /app/requirements.txt
RUN cd /app ; python setup.py
CMD python app.py
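
For reference, a .dockerignore that filters out VCS files, CI configuration, and build caches might look like the following; the exact entries are illustrative and depend on the repository layout:

# Illustrative .dockerignore entries; adjust to the actual repository
.git
.gitignore
.circleci/
__pycache__/
*.pyc
Dockerfile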

While building an image, the Docker engine reuses cached layers as much as possible, which notably reduces the build time. In our Dockerfile, the application code is added before the package installation steps, so any change to the code invalidates the cache and forces us to go through the whole update and dependency installation process again. To benefit from the build cache, we'll re-order the directives based on a rule of thumb: put the instructions that change least frequently first.
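
As a sketch of this idea (the file layout here is an assumption, not the final form), installing dependencies before copying the frequently changing application code keeps the expensive apt-get and pip layers cached across code edits:

# Rarely changing steps first: these cached layers survive code edits
RUN apt-get update && apt-get install -y python python-pip
COPY proj/app/requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Frequently changing step last: only this layer is rebuilt on a code change
COPY proj/app /app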

Additionally, as we've described before, any change made to the container filesystem results in a new image layer. To be more specific, ADD, RUN, and COPY create layers. Even if we delete certain files in a subsequent layer, they still occupy space in the image because they're kept in the intermediate layers. Therefore, our next step is to minimize the number of image layers by compacting multiple RUN instructions and cleaning up unused files at the end of the RUN. Moreover, to keep the Dockerfile readable, we tend to format the compacted RUN with the line continuation character (\). Although ADD can fetch a file from a remote location into the image, doing so is still not a good idea, because the downloaded file would occupy a layer of its own. Downloading files with RUN and wget/curl is more common.
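
The effect of compaction can be seen in a minimal sketch like this (the package is arbitrary):

# Two layers: the apt cache deleted in the second RUN still exists in the first layer
RUN apt-get update && apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

# One layer: the apt cache never makes it into the image
RUN apt-get update && apt-get install -y curl \
 && rm -rf /var/lib/apt/lists/*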

In addition to working with the building mechanisms of Docker, we'd also like to write a maintainable Dockerfile to make it clearer, more predictable, and more stable. Here are some suggestions:

  • Use WORKDIR instead of the inline cd, and use the absolute path for WORKDIR
  • Explicitly expose the required ports
  • Specify a tag for the base image
  • Separate and sort packages line by line
  • Use the exec form to launch an application

The first four suggestions are pretty straightforward and aim to eliminate ambiguity. The last one concerns how an application is terminated. When a stop request from the Docker daemon is sent to a running container, the main process (PID 1) receives a stop signal (SIGTERM). If the process doesn't stop within a certain period of time, the Docker daemon sends another signal (SIGKILL) to kill the container. The exec form and the shell form differ here. In the shell form, the PID 1 process is /bin/sh -c, not the application itself. Furthermore, different shells handle signals differently: some forward the stop signal to their child processes, while others do not; the default shell in Alpine Linux doesn't forward them. As a result, to stop and clean up our application properly, using the exec form is encouraged.
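
To make the contrast concrete, here is how the two forms look for the application in this example:

# Shell form: PID 1 is /bin/sh -c, which may not forward SIGTERM to python
CMD python app.py

# Exec form: PID 1 is the python process itself, so it receives SIGTERM directly
CMD ["python", "app.py"]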

Combining those principles, we have the following Dockerfile:

FROM ubuntu:18.04

RUN apt-get update && apt-get upgrade -y \
 && apt-get install -y --no-install-recommends \
    curl \
    python3.6 \
    python-pip=9.* \
 && curl -SL https://example.com/otherteam/dep.tgz \
  | tar -zxC /usr/src \
 && rm -rf /var/lib/apt/lists/*

ENTRYPOINT ["python"]
CMD ["entry.py"]
EXPOSE 5000
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . /app

There are other practices we can follow to make our Dockerfile better, including starting from a dedicated, smaller base image rather than a general-purpose distribution, using a user other than root for better security, and removing unnecessary files in the same RUN instruction that creates them.
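
A brief sketch of the first two ideas, assuming a slim Python base image and an arbitrary user name, could look like this:

# A smaller, dedicated base image instead of a general-purpose distribution
FROM python:3.6-slim
# Run as a non-root user for better security (the user name is illustrative)
RUN useradd --create-home appuser
USER appuser
WORKDIR /home/appuser/app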
