How to write a Dockerfile

Joerg Flade
18 min read · Aug 28, 2020


What is a Dockerfile?

A Dockerfile is a piece of code, similar to a script, that contains all the commands that have to be executed to create a Docker image.
Conceptually, it can be seen as the “installation instructions of an image, as code”, similar to Ansible or Puppet, but only for Docker and much simpler.

This Dockerfile is used and interpreted by the “docker build” command. The commands inside a Dockerfile follow a special DSL: https://docs.docker.com/engine/reference/builder/.
This article will only give a rough overview of the major commands.

You should also have a look at the official best practices: https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
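
For a first orientation, this is what a typical build invocation looks like (image name and tag are just examples); the trailing “.” is the build context:

docker build -t myimage:1.0 .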

FROM definition (must-have)

The “FROM” directive is the only must-have in a Dockerfile. It must be the first instruction (the only exception is an ARG directive placed before it).
More detailed instructions can be found here: https://docs.docker.com/engine/reference/builder/#from

This command defines which base image (or image) the new image depends on, or, if you want to create a base image yourself, that it starts from nothing.

Available images can be found at https://hub.docker.com/.

For new images that should be delivered to customers via a private repository, you should use a base image from your own repository instead of Docker Hub. This gives you control over those images and the possibility to back them up. Otherwise it can be hard to rebuild older versions for an audit review, for example if the public repository deletes older tags.

Syntax

The “FROM” directive uses the following syntax:

FROM [<Docker repository>/]<Image name>[:<Tag> | @<Digest>]

The Docker repository is optional.

  • If it is not given, the “docker build” command tries to resolve the image from https://hub.docker.com/. If a DNS name is given, it tries to resolve it as a repository URL.
  • The repository part ends with a trailing “/”.

The image name is mandatory.

  • It defines from which (base) image the new image inherits.

The tag (with a leading “:”) or digest (with a leading “@”) is optional.

  • If no tag or digest is given, Docker tries to resolve the “latest” tag of the image.
  • Typically, tags are used instead of digests.
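
For illustration, here are the three variants side by side (the digest value is only a placeholder, not a real hash):

# no tag or digest: Docker resolves alpine:latest
FROM alpine
# pinned by tag
FROM alpine:3.10
# pinned by an immutable digest
FROM alpine@sha256:<digest>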

Base Image

To create a base image, the FROM command MUST look like this:

FROM scratch

In most cases, it makes no sense to create your own base images, because nearly every distribution already provides its own base image. More information can be found here: https://docs.docker.com/develop/develop-images/baseimages/.

Image

“Normal” images reference an already existing image here.
In this case, the FROM directive tells the Docker builder to use this base image as a starting point (similar to a VM template, but for Docker).

The Docker builder needs access to the image that is referenced here. This means that if the image is, for example, only available in a Docker repository in Artifactory that is secured with credentials, you first have to log in to this repository before you can build!
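
A sketch of such a flow (the repository name is the example used below; the login prompts for your credentials):

docker login private-docker-repo.company.com
docker build -t mymodule:1.2.3 .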

Examples

To use a public image, you can go to Docker Hub and search for the needed image. Then you can use the image name directly, without the repository:

FROM alpine

This means for Docker:

  • Download the image “alpine” with the tag “latest” from hub.docker.com.

To nail the new image to a specific version of this image, it is always best practice to define a tag. Otherwise, the Dockerfile may stop working after some time if the latest base image has changed too much:

FROM alpine:3.10

This means for Docker:

  • Download the image “alpine” with the tag “3.10” from hub.docker.com.

To use images from private repositories like our Artifactory, it is possible to add the repository name:

FROM private-docker-repo.company.com/alpine:latest

This means for Docker:

  • Download the image “alpine” with the tag “latest” from the repository “private-docker-repo.company.com”.

To define the tag for example as an argument, it is possible to “predefine” this argument before the FROM directive:

ARG tag=1.2.3
FROM private-docker-repo.company.com/myimage:${tag}

This means for Docker:

  • Download the image “myimage” with the tag “1.2.3” from “private-docker-repo.company.com”.

“ARG” is the only allowed directive before “FROM”.

Audit questions

Can you put the FROM directive anywhere in the Dockerfile?

No, it must be the first directive. The only exception is the ARG directive.

Is it possible to control elements of the FROM directive from outside of the Dockerfile with the build command?

Yes, if the dynamic values are defined as ARG. Then it can be done with the “--build-arg” option.
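
A minimal sketch of this mechanism (names and values are examples):

# Dockerfile
ARG tag=latest
FROM alpine:${tag}

# build with a different base image tag without touching the Dockerfile
docker build --build-arg tag=3.10 -t myimage .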

Is this a correct FROM value:
FROM private-docker-repo.company.com/banking/mymodule/1.2.3

No, because it appends the tag (version number) as part of the image name. The tag has to be separated with “:”.
The correct one would be:
“FROM private-docker-repo.company.com/banking/mymodule:1.2.3”

ARG and ENV

These directives define variables. The variables can be used inside the Dockerfile with ${variable name}.
The difference between them is that the “ARG” directive is only available in the build context, while “ENV” is also available at runtime as a system environment variable, which can be used by applications for configuration.

It is always a good idea to use “ARG” for variables that are only relevant for the build and “ENV” only for runtime-relevant values.

ARG

https://docs.docker.com/engine/reference/builder/#arg

ARG defines variables that are relevant at build time only. They can have predefined default values, or the values can be set with the “docker build --build-arg” option.

ATTENTION: Do not abuse ARGs for passing credentials into the container, because these values are visible to any user of the image with the “docker history” command.

Syntax

ARG <key>[=<value>]

The key is mandatory and defines the argument name for the “docker build --build-arg” option, which is helpful for parametrized and automated image builds.
The value is optional. It sets a default, which can be overwritten with “--build-arg” during the build execution.

In contrast to ENV, the “=” between key and value cannot be replaced with a space here.

Examples

ARG MYIMAGEVERSION
FROM myimage:${MYIMAGEVERSION}
ARG user=app
ARG group=app
ARG uid=2000
ARG gid=2000
RUN addgroup -g ${gid} ${group} && adduser -h /home/${user} -u ${uid} -G ${group} -D ${user}
[...]

This Dockerfile defines an empty “MYIMAGEVERSION” argument to select the right image tag. This argument has to be passed in, for example with “docker build --build-arg MYIMAGEVERSION=1.2.3 .”.

The other arguments like “user”, “group”, “uid” and “gid” are predefined. They can be overwritten, but they can also simply be used as “define once, use multiple times” variables.
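
For example, to overwrite the predefined defaults from the build command (values are examples):

docker build --build-arg uid=3000 --build-arg gid=3000 -t myimage .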

ENV

https://docs.docker.com/engine/reference/builder/#env

ENV defines variables that are available during the build process and (the main idea behind this) during the container runtime as system environment variables.
This definition should be used to configure applications. ENV definitions can be overwritten with “docker run --env <key>=<value>” or with a .env file (“docker run --env-file=configuration.env”), which makes it easy to define “dummy” entries or default values in the Dockerfile and to inject the real data into the container.

ATTENTION: Do not use ENV variable definitions for real-world credentials. Define only a dummy entry there and set the real credentials with the “docker run” command.
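
A sketch of this pattern (file name, key and value are examples):

# content of configuration.env (kept outside of the image)
DB_PASSWORD=the-real-secret

# inject the real value at container start
docker run --rm --env-file=configuration.env myimage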

Syntax

ENV <key>=<value>

For the ENV directive, key and value are both mandatory (in contrast to ARG).

Here it is also possible to replace the “=” between key and value with a space (an older form of the syntax).

Examples

FROM myimage:1.2.3
ENV MYMODULE_OAUTH_JWT_KEY_URI=null
ENV MYMODULE_DEFAULT_USER="john doe"
[...]

This example sets the variable “MYMODULE_OAUTH_JWT_KEY_URI” to “null” and “MYMODULE_DEFAULT_USER” to “john doe” (values containing spaces have to be quoted). An application can now read those environment variables from the system.

Java example:

String defaultUser = System.getenv("MYMODULE_DEFAULT_USER");

Spring applications can also read their configuration directly from environment variables. In this case, the variable names should be written in upper case, and the separators between the fields (“.” in properties files, the nesting levels in YAML) are replaced with underscores.

Example

Spring defines:

mymodule:
  oauth:
    jwt:
      key:
        uri: <myurl>
  defaultuser: ${mymodule.default.user:test}

Now Spring looks for those environment variable combinations:

  • MYMODULE_OAUTH_JWT_KEY_URI
  • MYMODULE_DEFAULTUSER (because of the YAML structure)
  • MYMODULE_DEFAULT_USER (because of the variable in the YAML structure)
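
At container start, such a value could then be set like this (the image name is an example):

docker run --rm --env MYMODULE_DEFAULT_USER="jane doe" myimage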

Audit questions

With which directive can you ensure that the key/value of a variable is not part of the runtime?

With the ARG directive. Variables defined with ARG are not available in running containers.

How can you define recurring values (e.g. username) for the build time?

ARG username
(if the username has to be delivered through the build command)
or
ARG username=myuser
(if the username should be defined as a default)

How can you provide configuration values to the application?

ENV MYCONFIGURATION_KEY=myvalue
(this creates a system environment variable for the container with the name “MYCONFIGURATION_KEY” and the value “myvalue”)

How to provide real-world credentials to the container?

In the best case with a configuration system (e.g. config server, Vault, …) or via Kubernetes secrets.
It is also possible to pass the credentials to the application via the “ENV” directive. In this case, the credentials in the Dockerfile should only be placeholders.
The real ones have to be delivered with “docker run --env <key>=<value>” or “docker run --env-file=myconfig.env”.

RUN

https://docs.docker.com/engine/reference/builder/#run

With the “RUN” directive it is possible to execute shell commands at build time in the container, on the current “to-create” image layer.
This makes it possible to install or download things.

The result of a RUN command is a new read-only image layer. This means that it is always a good idea to aggregate shell commands into one larger RUN command (see the explanation of the second example below).
RUN normally executes with the “sh” shell. If you need to use “bash” or another shell, you can execute the command like this:

RUN /bin/bash -c 'ls -la'

In this case, bash has to be installed in the base image, which is normally not the case by default!

Syntax

RUN <command>

Examples

A simple example:

FROM alpine:latest
ARG user=app
ARG group=app
ARG uid=2000
ARG gid=2000
RUN addgroup -g ${gid} ${group} && adduser -h /${user} -u ${uid} -G ${group} -D ${user}
[...]

This command creates a new group (addgroup) and adds the user (adduser) in one image layer on an Alpine base image.

To install glibc under Alpine the run command can look like this:

FROM alpine:latest
# version and locale to install (example values; GLIBC_VERSION must match a release of the sgerrand package)
ARG GLIBC_VERSION=2.32-r0
ARG GLIBC_LANG=en_US
ARG LANG=${GLIBC_LANG}.UTF-8
# install base packages
RUN apk update && apk -U upgrade -a && \
    apk add --no-cache tar zip unzip wget procps ca-certificates
# GET GLIBC FROM SGERRAND: https://github.com/sgerrand/alpine-pkg-glibc
RUN wget -O /etc/apk/keys/sgerrand.rsa.pub https://alpine-pkgs.sgerrand.com/sgerrand.rsa.pub && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/${GLIBC_VERSION}/glibc-${GLIBC_VERSION}.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/${GLIBC_VERSION}/glibc-bin-${GLIBC_VERSION}.apk && \
    wget https://github.com/sgerrand/alpine-pkg-glibc/releases/download/${GLIBC_VERSION}/glibc-i18n-${GLIBC_VERSION}.apk && \
    apk add --no-cache glibc-${GLIBC_VERSION}.apk glibc-bin-${GLIBC_VERSION}.apk glibc-i18n-${GLIBC_VERSION}.apk && \
    rm -f /etc/apk/keys/sgerrand.* && \
    echo "export GLIBC_LANG=${LANG}" > /etc/profile.d/locale.sh && \
    echo "LANG=${LANG}" >> /etc/environment && \
    /usr/glibc-compat/bin/localedef -i ${GLIBC_LANG} -f UTF-8 ${GLIBC_LANG}.UTF-8 && \
    rm *.apk && \
    echo "Installing additional packages... done"
[...]

In this case, we create 2 layers because of the two RUN directives. The first one is a layer that contains the update of the OS packages and installs some required tools.

The second one downloads the sgerrand signing key and the glibc packages, installs glibc, removes the key again, exports some system environment variables to files in /etc, rebuilds the locale definition, and removes the installation packages. As a result, this layer contains the installed glibc, but not the downloaded packages. If we had done this in separate RUN executions, the result would be 10+ layers. The visible result from the UnionFS perspective would be the same as with one RUN directive, but all the steps would be part of the image layer chain. This means that if you looked into each layer, you would find some layers with the later-deleted packages, and the resulting image would be much bigger.

Hint: Try to combine shell commands if they all work towards one goal (install glibc, create users and groups, …).

Audit questions

Is it possible to execute commands with different shells?

Yes, but the shell has to be installed first. Then you can use, for example, RUN /bin/bash -c 'ls -la' to execute “ls -la” inside a bash shell.

Can you use RUN multiple times to execute different shell commands?

Yes, it is possible, but every RUN directive creates a new image layer. At the moment, a maximum of 127 layers is allowed. Also, if some operations need a cleanup, it is better to do everything in one RUN directive, because then the new image layer contains only the cleaned-up data instead of multiple layers which “hide” files that are deleted later, which has an impact on the final image size.
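
As a sketch of the difference (the file name and URL are placeholders):

# three RUNs create three layers; the downloaded file stays in the layer history
RUN wget -O /tmp/tool.tar.gz https://example.com/tool.tar.gz
RUN tar -xzf /tmp/tool.tar.gz -C /opt
RUN rm /tmp/tool.tar.gz

# one RUN creates one layer that never contains the temporary file
RUN wget -O /tmp/tool.tar.gz https://example.com/tool.tar.gz && \
    tar -xzf /tmp/tool.tar.gz -C /opt && \
    rm /tmp/tool.tar.gz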

ADD and COPY

ADD and COPY are very similar.
The main difference is that COPY can only copy files from the Docker host machine (the machine that builds the Docker image) into the image, while ADD can also copy files from a URL and can extract a tar file from the source directly into the destination.

This means you cannot use ADD if you want to copy a tar file (or tar.gz) into a directory “as-is”. In this case, it is much better to use COPY. If you want to extract it, then ADD is your best friend.

Both directives can only copy files from the build context, i.e. the current directory and its subdirectories. It is not possible to copy files from parent directories! But you can execute a build from the “main” directory of your project and reference a Dockerfile that is located in a subdirectory. Then the paths inside the Dockerfile must be relative to the “main” directory. This means that if the Dockerfile is located in a subdirectory called “devops”, where the “entrypoint.sh” file is also located, you need to tell Docker “COPY devops/entrypoint.sh /” (or ADD) to copy the file from there into the image.

Directory structure:

|- myproject
|  |- build
|  |  |- lib
|  |  |  |- myapplication.jar
|  |- devops
|  |  |- Dockerfile

In this case you have to execute the docker build command from the "myproject" directory:

docker build -t myimage -f devops/Dockerfile .

And the Dockerfile has to copy the application like this:

COPY build/lib/myapplication.jar /app

COPY

https://docs.docker.com/engine/reference/builder/#copy

Syntax

COPY <src> <dest>
or for multiple files:
COPY ["<src1>", "<src2>", "<src x>", "<dest>"]

In the simple case, you copy one file or directory from the source to the destination. You can also use wildcards like “*”. If you need to copy multiple files to one destination, you can write them between “[” and “]”. In this case, the last element has to be the destination.

Both directives also support “--chown=<user>:<group>” as a first argument, which makes it possible to copy the files with the right target owner, so you do not need to execute a “RUN chown” command later. Sometimes “--chown” does not work properly. Also, the user must already exist!

Examples

FROM alpine:latest
COPY myApplication*.tar.gz /app/
COPY ["libs/myfirstJar.jar", "libs/mysecondJar.jar", "/app/"]
[...]

This copies all files matching “myApplication*.tar.gz” from the local machine into the /app directory of the image. The tar file will NOT be extracted!
The second COPY copies the multiple files “myfirstJar.jar” and “mysecondJar.jar” from the local machine into the “/app” directory of the image.
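
A sketch of the “--chown” variant (assuming a user and group “app” were created earlier in the Dockerfile):

COPY --chown=app:app libs/myfirstJar.jar /app/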

ADD

https://docs.docker.com/engine/reference/builder/#add

Syntax

The syntax is the same as COPY.

As described under the main topic, ADD also accepts URLs as a source, and if the source is a local tar file, it extracts it directly into the destination (files downloaded from a URL are not extracted automatically).

Examples

FROM alpine:latest
ADD myApplication.tar.gz /app/
[...]

This copies the file with the name “myApplication.tar.gz” from the local machine into the /app directory of the image and extracts it there.
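
ADD can also download from a URL (the URL below is a placeholder); note that a remote archive is only downloaded, not extracted:

ADD https://example.com/myApplication.tar.gz /download/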

Audit questions

What happens, if I use the following directive: “ADD myfile.tar /app/”?

This will copy and extract the myfile.tar file into the /app/ directory.

How can I prevent the extraction of the tar file?

With the COPY directive, the tar file will only be copied and not extracted.

How can I change the owner directly to save a new image layer which only changes the owner?

Add the --chown=<user>:<group> flag: COPY|ADD --chown=<user>:<group> srcfile destination

ENTRYPOINT and CMD

ENTRYPOINT and CMD are directives to execute an application when the container starts.
Both are similar, but the difference is that CMD can be overwritten by the “docker run <image> <command> <args>” command, while ENTRYPOINT can only be overwritten with the special “--entrypoint” option.
In normal cases, you should provide dynamic start options as CMD and the more or less immutable start of the application as ENTRYPOINT.

Each should also exist only once in a Dockerfile, but it is possible to override the directive inherited from a base image by declaring a new one. If there are multiple directives in one Dockerfile, only the last one will be executed!
If an image already contains those directives and it is inherited via the FROM directive, it is not necessary to repeat them in the new image. They will be used from the lower layer.

It is possible to use both directives together. Then ENTRYPOINT defines the start of the static application/command and CMD provides something optional that can be overwritten. (When both are given in exec form, Docker appends the CMD values as arguments to the ENTRYPOINT.)

CMD

https://docs.docker.com/engine/reference/builder/#cmd

Without a CMD (or ENTRYPOINT), no application is running, which can result in a quick start-stop sequence for the container.

If the container should only be alive (for example slave containers for Jenkins in a Kubernetes environment), it is possible to do a dummy action there like “tail -f /dev/null”.

The CMD directive can easily be overwritten with the “docker run <image> <command>” command. The trailing arguments will overwrite the command that was defined in the Dockerfile.

CMD makes it possible to execute a command, which means it can start the application directly. But it is recommended to use a shell script that is copied into the container beside the application and to start only this script. These start scripts are often called “entrypoint.sh”.

In such a script, you can do a lot more than with a single command. It also gives you a shell script that is reusable in other deployments, for example on classic servers.

Simply put: such an “entrypoint.sh” script gives more control and has more advantages.

Please be aware that this script should use “#!/bin/sh” as the shell interpreter; otherwise you have to install bash into the image in most cases!

Syntax

CMD <command>
or
CMD ["executable command", "argument 1", "argument 2", "..."]

The first variant is the shell form (executed via “sh -c”), the second is the exec form (executed directly, without a shell).

Examples

The following example will execute the entry point script “/app/entrypoint.sh myapplication” (in exec form, i.e. without a shell). It is possible to overwrite this when you start the container with the “docker run” command.

FROM alpine:latest
[...]
CMD ["/app/entrypoint.sh", "myapplication"]

This example will simply do a “tail -f /dev/null”, which keeps the container alive without executing an application:

FROM alpine:latest
[...]
CMD tail -f /dev/null
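
Such a CMD can be replaced completely at start time, for example (the image name is an example):

docker run --rm myimage echo "overwriting the CMD"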

ENTRYPOINT

https://docs.docker.com/engine/reference/builder/#entrypoint

This directive has the same task as CMD, but as described above, ENTRYPOINT should start the application while CMD executes an exchangeable command.
That command can also start the application, but ENTRYPOINT should be the way to start it.

Example:

You have an application that has to run, and there is a command-line interface or shell script to trigger some actions or to start temporary jobs on startup (like imports).
Then you should use ENTRYPOINT to start the application and CMD for the initial jobs that are no longer needed after they have been executed.

Syntax

ENTRYPOINT <command>
or
ENTRYPOINT ["executable command", "argument 1", "argument 2", "..."]

Examples

This ENTRYPOINT example starts the application. It cannot be changed with plain “docker run” arguments (only with the “--entrypoint” option).

FROM alpine:latest
[...]
ENTRYPOINT ["/app/entrypoint.sh", "myapplication"]

This example will simply do a “tail -f /dev/null”, which keeps the container alive without executing an application, and it cannot be overwritten with plain “docker run” arguments:

FROM alpine:latest
[...]
ENTRYPOINT tail -f /dev/null

Additional examples of CMD and ENTRYPOINT together

When you have a container that should start an application and there are, for example, two simple import jobs inside, which can differ depending on the customer, you can write a Dockerfile like this (the image is called “myimage” below; image names must be lowercase):

FROM alpine:latest
[...]
CMD ["/app/importJobA.sh", "/importData/"]
ENTRYPOINT ["/app/entrypoint.sh", "myapplication"]

Now, let’s execute this image and start the container:

docker run --rm myimage

By default, the container now starts the ENTRYPOINT “/app/entrypoint.sh myapplication” with the CMD appended as additional arguments (“/app/importJobA.sh /importData/”); the entrypoint script can then execute this import job (let’s say it imports the data from this directory) and start the application.

If you want to execute “importJobB.sh” instead, you can start the container like this:

docker run --rm myimage /app/importJobB.sh /importData/

This starts the same image from the same Dockerfile, but now “/app/importJobB.sh /importData/” is passed to the entrypoint instead of “/app/importJobA.sh /importData/”. The start of the application (“entrypoint.sh”) is the same as before.

Audit questions

Is it possible to write multiple CMD or ENTRYPOINT directives in one Dockerfile to start different tasks?

No. The Dockerfile itself is valid, but only the last CMD and the last ENTRYPOINT directive will be executed!
It is a much better idea to use an “entrypoint.sh” script, which executes all necessary tasks on startup.

Is it possible to mix CMD and ENTRYPOINT in one Dockerfile?

Yes, it is. The CMD directive can easily be overwritten from outside with “docker run <image> <command> <args>”, while the ENTRYPOINT should stay stable in most cases.

USER

https://docs.docker.com/engine/reference/builder/#user

This directive executes all following commands in the context of the given user (or uid). Otherwise, applications run as root inside the container.

The user must already exist, created earlier in this Dockerfile or in a lower image layer!

Syntax

USER <user>[:<group>]
or
USER <uid>[:<gid>]

Examples

If a user “app” with the group “app” was created and the application was copied with chown to “app:app”, it makes sense to start the application as this user:

FROM alpine:latest
ARG user=app
ARG group=app
ARG uid=2000
ARG gid=2000
RUN addgroup -g ${gid} ${group} && adduser -h /home/${user} -u ${uid} -G ${group} -D ${user}
COPY --chown=app:app application.jar /home/${user}/app.jar
USER ${user}
# note: build ARGs like ${user} are not available at container runtime, so the path is written out here
ENTRYPOINT java -jar /home/app/app.jar

Audit questions

Which user will be used as the default?

root
To avoid starting applications as this user, the “USER” directive can be used to switch to another user context.

VOLUME

https://docs.docker.com/engine/reference/builder/#volume

This directive marks a directory as a mount point for an external volume.
With that, it is possible to let an application run in a container, but to back up the data or to provide import data from an external volume.

This is often used to mount for example the data directory of databases or import directories of data-processing applications.

The data is then available bi-directionally: the host machine and the container see the same data in this shared directory.

Syntax

VOLUME <mount point>
or
VOLUME ["<mount point>"]

Examples

If your application stores data in “/var/lib/data”, for example, you can define this directory as a VOLUME mount point. With that, it is possible to define the host directory that should be mounted to this point as a shared directory between container and host machine.

FROM alpine:latest
[...]
VOLUME /var/lib/data
[...]

If we want to share this container directory “/var/lib/data” with our host directory “~/app1data/”, we can start the container like this:

docker run --rm -v ~/app1data:/var/lib/data myimage

This starts a container from the image “myimage” and mounts the local directory “~/app1data” to the container directory “/var/lib/data”.

Audit questions

For which use-cases is the VOLUME directive good to use?

If it is required to share data between the host and the container, this directive should be used.
Good examples are import directories or data directories like database storage.

Conclusion

With the directives described here, it is possible to write your first Dockerfiles and to understand most existing ones. There are many more possibilities (also for the directives described here).
It also makes sense to read the official reference (https://docs.docker.com/engine/reference/builder/) to see what is possible.

Some good additional directives for more complex Dockerfiles are, for example, WORKDIR, EXPOSE, LABEL and HEALTHCHECK.

Final audit question for Dockerfiles

To check, if you have understood everything, you can try to write a Dockerfile with the following requirements:

  1. use the base image of “alpine” in the version “3.10” from the “docker” repository in the private repository “private-docker-repo.company.com”
  2. make it possible, that the version of the base image can be changed while the build process from outside without changing the Dockerfile.
    -> The container should not see this configuration!
  3. make it possible that the application can read an initial master password from the “INITIAL_MASTER_PASSWORD” environment variable.
    -> This value must be changeable at container start
  4. Update the packages of the container with “apk update” and “apk upgrade” (the base image is Alpine, so the package manager is apk)
  5. Install the following tools with “apk add”:
    - grep
    - bash
    - tar
    - unzip
    - wget
  6. copy and extract a tar file (“application.tar.gz”) into the image to the directory “/app”
  7. copy a tar file (let’s call it “myData.tar.gz”) into the image to the directory “/app/data”
    -> change the owner of the files to the user “app” with the group “mygroup”
  8. execute the following commands:
cat /app/hostEntries >> /etc/hosts

    wget http://mydomain.com/additionalpackage.zip -O /app/data/additionalpackage.zip

    unzip /app/data/additionalpackage.zip -d /app/data/

    rm /app/data/additionalpackage.zip
  9. Give the possibility to mount an external directory into the directory “/app/importDir”
  10. Start the application with the command “/app/entrypoint.sh”
  11. Make it possible that the application starts one of these import jobs on startup and that the job can be overwritten at container start:
    /app/importDataFromDirectory.sh

    /app/importDataFromServer.sh

Solution

# build argument to change the base image version from outside (not visible in the container)
ARG ALPINE_VERSION=3.10
FROM private-docker-repo.company.com/docker/alpine:${ALPINE_VERSION}
# set the environment variable (dummy value, changeable at container start)
ENV INITIAL_MASTER_PASSWORD="mypassword"
# update the packages and install the required tools in one layer (Alpine uses apk)
RUN apk update && apk upgrade && \
    apk add grep bash tar unzip wget
# copy and extract the application to /app
ADD application.tar.gz /app
# copy the tar, but do not extract it
COPY --chown=app:mygroup myData.tar.gz /app/data
# this should be done with one RUN instead of executing each command with a single RUN
# to save image layers with unneeded data
RUN cat /app/hostEntries >> /etc/hosts && \
    wget http://mydomain.com/additionalpackage.zip -O /app/data/additionalpackage.zip && \
    unzip /app/data/additionalpackage.zip -d /app/data/ && \
    rm /app/data/additionalpackage.zip
# define a mount point for the import directory
VOLUME /app/importDir
# start the application (exec form, so the CMD below is passed to the script as an argument)
ENTRYPOINT ["/app/entrypoint.sh"]
# default import job; can be overwritten with "docker run"
CMD ["/app/importDataFromDirectory.sh"]
