Avoiding common pitfalls

So far, we have discussed how data volumes can be used to share data between the Docker host and containers, as well as between containers. Data sharing through data volumes is a powerful and essential tool in the Docker paradigm. However, it carries a few pitfalls that must be carefully identified and eliminated. In this section, we list a few common issues associated with data sharing and the ways to overcome them.

Directory leaks

Earlier, in the data volume section, we learned that the Docker engine automatically creates directories based on the VOLUME instruction in the Dockerfile as well as the -v option of the docker run subcommand. We also saw that the Docker engine does not automatically delete these auto-generated directories, in order to preserve the state of the application(s) run inside the container. We can force Docker to remove these directories using the -v option of the docker rm subcommand. This manual deletion poses two major challenges:

  1. Undeleted directories: You may, intentionally or unintentionally, not remove the generated directory when removing the container.
  2. Third-party images: Quite often, we leverage third-party Docker images that were built with the VOLUME instruction. Likewise, we might have our own Docker images with VOLUME instructions in them. When we launch containers from such images, the Docker engine auto-generates the prescribed directories. If we are not aware that a data volume was created, we may not call the docker rm subcommand with the -v option to delete the auto-generated directory.

In both scenarios, once the associated container is removed, there is no direct way to identify the directories it left behind. Here are a few recommendations on how to avoid this pitfall:

  • Always inspect Docker images using the docker inspect subcommand and check whether any data volume is declared in the image.
  • Always run the docker rm subcommand with the -v option to remove any data volume (directory) created for the container. Even if the data volume is shared by multiple containers, it is still safe to run the docker rm subcommand with the -v option, because the directory associated with the data volume is deleted only when the last container sharing that data volume is removed.
  • If, for any reason, you choose to preserve an auto-generated directory, keep a clear record of it so that you can remove it later.
  • Implement an audit framework that finds the directories that are not associated with any container.
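
The recommendations above can be sketched in shell. This is a minimal sketch, not a complete audit framework. The helper names (image_volumes, list_all_volumes, list_used_volumes, find_orphans) are hypothetical, and the docker volume subcommand is assumed to be available, which requires Docker 1.9 or later, a newer release than the one this chapter was written against.

```shell
#!/bin/sh
# Hypothetical helpers for auditing data volumes (assumes Docker 1.9+).

# Show the volumes declared in an image, for example: image_volumes ubuntu:14.04
image_volumes() {
    docker inspect --format '{{ json .Config.Volumes }}' "$1"
}

# List the names of all volumes known to the Docker engine.
list_all_volumes() {
    docker volume ls -q
}

# List the names of volumes mounted by any container, running or stopped.
list_used_volumes() {
    ids=$(docker ps -aq)
    [ -n "$ids" ] || return 0
    docker inspect --format '{{ range .Mounts }}{{ println .Name }}{{ end }}' $ids
}

# find_orphans ALL USED
# Print every line of file ALL that does not appear in file USED.
find_orphans() {
    all_sorted=$(mktemp)
    used_sorted=$(mktemp)
    sort -u "$1" > "$all_sorted"
    sort -u "$2" > "$used_sorted"
    # comm -23 keeps only the lines unique to the first sorted input
    comm -23 "$all_sorted" "$used_sorted"
    rm -f "$all_sorted" "$used_sorted"
}

# Example wiring (needs a running Docker daemon):
#   list_all_volumes  > all.txt
#   list_used_volumes > used.txt
#   find_orphans all.txt used.txt
```

On releases that have the docker volume subcommand, docker volume ls -qf dangling=true reports unreferenced volumes directly. The list comparison is kept separate here so that the same idea can be applied to any two listings, such as directory paths collected by other means.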

The undesirable effect of data volume

As mentioned earlier, Docker enables us to declare data volumes in a Docker image using the VOLUME instruction at build time. However, data volumes should never be used to store data during the build; doing so produces an unwanted effect.

In this section, we will demonstrate the undesirable effect of using a data volume during the build by crafting a Dockerfile and then building it.

The following are the details of the Dockerfile:

  1. Build the image using Ubuntu 14.04 as the base image:
    # Use Ubuntu as the base image
    FROM ubuntu:14.04
  2. Create a /MountPointDemo data volume using the VOLUME instruction:
    VOLUME /MountPointDemo
  3. Create a file in the /MountPointDemo data volume using the RUN instruction:
    RUN date > /MountPointDemo/date.txt
  4. Display the file in the /MountPointDemo data volume using the RUN instruction:
    RUN cat /MountPointDemo/date.txt

Proceed to build an image from this Dockerfile using the docker build subcommand, as shown here:

    $ sudo docker build -t testvol .
    Sending build context to Docker daemon  2.56 kB
    Sending build context to Docker daemon
    Step 0 : FROM ubuntu:14.04
     ---> 9bd07e480c5b
    Step 1 : VOLUME /MountPointDemo
     ---> Using cache
     ---> e8b1799d4969
    Step 2 : RUN date > /MountPointDemo/date.txt
     ---> Using cache
     ---> 8267e251a984
    Step 3 : RUN cat /MountPointDemo/date.txt
     ---> Running in a3e40444de2e
    cat: /MountPointDemo/date.txt: No such file or directory
    2014/12/07 11:32:36 The command [/bin/sh -c cat /MountPointDemo/date.txt] returned a non-zero code: 1
    

In the preceding output of the docker build subcommand, notice that the build fails at step 3 because it cannot find the file created in step 2. The file created in step 2 has vanished by the time the build reaches step 3. This undesirable effect is due to the approach Docker uses to build its images, and an understanding of the image-building process unravels the mystery.

In the build process, for every instruction in a Dockerfile, the following steps are followed:

  1. Create a new container by translating the Dockerfile instruction to an equivalent docker run subcommand.
  2. Commit the newly created container to an image.
  3. Repeat step 1 and step 2, treating the newly created image as the base image for step 1.

When a container is committed, Docker saves the container's filesystem but deliberately does not save the data volume's filesystem. Therefore, any data stored in the data volume is lost in this process. So never use a data volume as storage during the build process.
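
One way to follow this advice, sketched here under assumptions, is to write the data before the VOLUME instruction, so that it is committed as part of the image's ordinary filesystem. The script below writes such a variant of the demo Dockerfile into a temporary directory. The image tag testvol-fixed and the BUILD switch are hypothetical names introduced for illustration, and the build runs only when BUILD=1 is set.

```shell
#!/bin/sh
# Sketch: a variant of the demo Dockerfile with the RUN step moved ahead of
# VOLUME, so the file lands in an image layer before the volume is declared.
workdir=$(mktemp -d)   # throwaway build context

cat > "$workdir/Dockerfile" <<'EOF'
# Use Ubuntu as the base image
FROM ubuntu:14.04
# Populate the directory while it is still part of the image filesystem
RUN mkdir -p /MountPointDemo && date > /MountPointDemo/date.txt
# Declare the data volume only after the data is in place
VOLUME /MountPointDemo
EOF

# BUILD is a hypothetical opt-in switch: build only when explicitly requested.
if [ "${BUILD:-0}" = "1" ]; then
    docker build -t testvol-fixed "$workdir"
else
    echo "Dockerfile written to $workdir/Dockerfile"
fi
```

Because the file is written before the volume exists, it is saved by the commit that follows the RUN step. When a container is later started from the image, Docker copies the existing contents of /MountPointDemo into the newly created data volume.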
