Docker — What it is, How Images are structured, Docker vs. VM and some tips (part 1)
Table of contents
- What is Docker?
  - Linux Kernel: Namespaces
  - Linux Kernel: Control groups
  - Linux Kernel: Union file systems
  - More detailed information
- Docker vs. VM
- Docker Image and Container
- Why use Docker?
- Does it solve any of my problems?
What is Docker?
Docker is a container engine that uses Linux kernel features to create containers on top of the operating system.
Docker itself runs as a daemon that manages the containers on a server.
To create isolated environments, it uses the following Kernel technologies:
Linux Kernel: Namespaces
Namespaces are responsible for isolating the workspace of a container. For this, Docker uses the following namespaces:
- The pid namespace: Process isolation (PID: Process ID).
- The net namespace: Managing network interfaces (NET: Networking).
- The ipc namespace: Managing access to IPC resources (IPC: InterProcess Communication).
- The mnt namespace: Managing filesystem mount points (MNT: Mount).
- The uts namespace: Isolating kernel and version identifiers (UTS: Unix Timesharing System).
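The effect is easy to observe on a Linux host. A minimal sketch (the container name “web” and the nginx image are just placeholders):

```sh
# Start any container (nginx is only an example)
docker run -d --name web nginx

# Find the PID of the container's main process on the host ...
PID=$(docker inspect -f '{{.State.Pid}}' web)

# ... and list the namespaces (pid, net, ipc, mnt, uts, ...) it lives in
sudo ls -l /proc/$PID/ns
```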
Linux Kernel: Control groups
With control groups (cgroups) it is possible to limit an application to a specific set of hardware resources.
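For example, resource limits can be set directly on “docker run”; a minimal sketch (the limits and the nginx image are arbitrary examples):

```sh
# Limit the container to 256 MB of RAM and 1.5 CPU cores (enforced via cgroups)
docker run -d --name limited --memory 256m --cpus 1.5 nginx

# Show the current resource usage of the container
docker stats --no-stream limited
```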
Linux Kernel: Union file systems
Union file systems, or UnionFS, are file systems that operate by creating layers, making them very lightweight and fast.
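Docker’s overlay2 storage driver builds on this idea. A minimal sketch of the union principle itself, using a plain overlayfs mount (the /tmp paths are made up for illustration, and root privileges are required):

```sh
mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/merged
echo "from lower" > /tmp/lower/file.txt

# Mount an overlay: a read-only lower layer plus a writable upper layer
sudo mount -t overlay overlay \
  -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/merged

# Writes in the merged view land in the upper layer; the lower layer stays untouched
echo "from upper" | sudo tee /tmp/merged/file.txt
cat /tmp/lower/file.txt    # still "from lower"
```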
More detailed information
Docker vs. VM
Unlike a VM, Docker (or rather a Docker container) does not emulate a whole computer with a BIOS and virtual hardware on which an operating system has to be installed.
It uses the host OS and starts applications with very little overhead in an environment that is independent of the concrete host Linux version.
This makes it possible to run different applications with different sets of bins and libs (like glibc, busybox, ...) on one real host OS.
The effective memory and CPU footprint of Docker is close to zero here, while the level of isolation is, from an application point of view, similar to a VM.
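This is easy to verify on a Linux host, because the container uses the host kernel directly instead of booting its own OS (a minimal sketch; the alpine image is just an example):

```sh
uname -r                          # kernel version on the host
docker run --rm alpine uname -r   # the same kernel version inside the container
```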
Docker Image and Container
Docker Image
To pack and transport an application, Docker uses a “Docker image”: a file that contains both the required isolated environment and the application itself. It is a kind of template for the runtime environment.
A Docker image contains different layers, which are all read-only. Every layer has an ID and can contain “parent IDs” of underlying images.
Every new layer is on top of the older layers and can “overwrite” files of the lower layers.
Every command in the Dockerfile definition creates a new image layer.
An image can be shared between different containers.
The base image is the image that contains the root file system (rootfs) of the Linux distribution used inside the container (not on the host). This special kind of image must start with a “FROM scratch” definition in the Dockerfile.
More information about base images: https://docs.docker.com/develop/develop-images/baseimages/
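A hypothetical sketch of such a base image, where “rootfs.tar.gz” stands for a root file system tarball exported from a minimal distribution (see the link above for real examples):

```Dockerfile
FROM scratch
ADD rootfs.tar.gz /
CMD ["/bin/sh"]
```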
As described in the picture, a final image is a layer with a name that can be used directly.
In this example, the image “apache” with the tag “1.0.0” depends on an image containing curl with the ID “3fa76543”; this is also an image, but without a name it is not directly usable.
With the help of UnionFS, those images (or layered images = layers) are aggregated into one image that a container can use to run Apache.
The container sees the files as one file system, but if a layer overwrites a file of a lower layer, the container only sees the “latest” or “highest” version (from top to bottom).
When a new image is created and pushed to a repository (Docker Hub, Artifactory, ...), Docker compares the hash of each layer with the existing ones.
If a layer already exists, this layer will only be referenced and not stored twice! This helps to save a lot of storage space.
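A hypothetical sketch of this effect: two images built on the same base image share their lower layers, so the second push only uploads what is new (“registry.example.com” and the image names are placeholders):

```sh
docker build -t registry.example.com/app-a:1.0 ./app-a
docker build -t registry.example.com/app-b:1.0 ./app-b   # uses the same FROM line as app-a

docker push registry.example.com/app-a:1.0
docker push registry.example.com/app-b:1.0   # unchanged base layers are only referenced, not uploaded again
```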
INFO
A new image layer is created for every command in the Dockerfile. Every new command creates a new layer, which overwrites the lower ones. There is a limit to how many layers UnionFS can handle (currently 127 in Docker version 19; this can change in future versions).
To get small images for better transportation to customers and faster deployments, ensure that the steps in the Dockerfile are in the correct order and that not every single command is executed in its own new layer. For example, to install glibc in an Alpine Linux base image, it makes more sense to aggregate the commands in one Dockerfile RUN instruction and execute them as “command A && command B && command C”. The result is one layer with glibc for this Alpine base image, instead of x layers that contain, for example, downloaded files which are deleted again later by an overlay.
This means: do not install/download things in one command and uninstall/delete them in a later command, because the state after every command is saved as a read-only image layer that remains part of the final container image, which can become very large in such cases (see the Dockerfile sketch below).
Docker images should be as small as possible!
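A hypothetical Dockerfile fragment to illustrate the difference (the download URL and the “mytool” tarball are made up):

```Dockerfile
# Wasteful: three layers; the tarball deleted in the last step still sits in the first layer
# RUN wget https://example.com/mytool.tar.gz
# RUN tar -xzf mytool.tar.gz -C /usr/local
# RUN rm mytool.tar.gz

# Better: one RUN command, so the temporary file never ends up in any layer
RUN wget https://example.com/mytool.tar.gz \
 && tar -xzf mytool.tar.gz -C /usr/local \
 && rm mytool.tar.gz
```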
Docker container
When Docker runs an image, it creates a container based on that image.
Because the images are read-only, the container is the runtime combination of all read-only layers from the image (which can be shared across different containers) and a writable layer for the runtime (which is unique to every container).
The content of this writable layer cannot be predefined; later it may contain logs, PIDs of the application, and so on. Whenever a new container is started, a new writable layer is created (as already described in the picture above).
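The writable layer can be inspected with “docker diff”; a minimal sketch (the container name and the nginx image are just examples):

```sh
docker run -d --name app nginx
docker exec app touch /tmp/runtime-file

# Lists the files that were added or changed on top of the read-only image layers
docker diff app
```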
This concept ensures that the environment is the same for every host system, which makes it traceable and exchangeable between development, testing, and production.
If something goes wrong in production, it is reproducible in development with the same image and the same configuration.
INFO
To keep the advantage of these concepts, it is NEVER a good idea to modify the image by overwriting, for example, the configuration in a special “production” layer or a special “test” layer.
A Docker container should always be configured via environment variables, which can be provided on the command line when executing the “docker run” command.
If somebody has to modify the image to make the application run in a way that cannot be achieved with the options of the “docker run” CLI, then the image is simply crap!
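A minimal sketch of this approach, where the image name and the variables LOG_LEVEL and DB_HOST are hypothetical values the application would read:

```sh
# Same image, different configuration per stage - no image modification needed
docker run -d -e LOG_LEVEL=debug -e DB_HOST=db.test.local    myapp:1.0.0   # test
docker run -d -e LOG_LEVEL=warn  -e DB_HOST=db.prod.internal myapp:1.0.0   # production
```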
Why use Docker?
- Docker is available on different host OSes and guarantees interoperability (Linux, macOS, Windows)
- Because of the isolation of the containers, a deployed application:
  - always runs in the same environment (base image + layered images), independent of the host OS and without the overhead of a VM
  - always has the same tools/bins/libs available, in the same versions as at the time the image was created
- A production or test problem can be reproduced on every local machine (if it is not a load problem), because of the reasons above. No “we have another Linux or OS or library version” excuse.
- The deployment is the same for the application in every stage (development/test/acceptance/production environment)
- No complex OS configuration is needed to provide the same environment across different stages
- Unlike VMs, Docker containers start in milliseconds instead of minutes, because there is no need to initialize and emulate a BIOS, boot an OS, and so on
- Containers can scale very fast and (for testing) even on a single machine. This helps to scale small(er) units with fewer resources and can help to find multi-instance problems already in the development environment
- Effective usage of available resources, because of the minimal overhead compared to a VM
- Simplified deployment, because a “docker pull && docker run” already executes the application. Only the pure application configuration or volume mounts have to be provided if needed (see the sketch below).
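A minimal sketch of such a deployment (the registry, port, and volume path are placeholders):

```sh
docker pull registry.example.com/myapp:1.0.0
docker run -d -p 8080:8080 \
  -v /srv/myapp/config:/app/config \
  registry.example.com/myapp:1.0.0
```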
Does it solve any of my problems?
No!
Docker provides a simple container that can be executed and that contains the configuration of the system, similar to a VM template (but without the disadvantages of a VM).
Every image must be configured/built by the Dockerfile developer with all the tools and environment libraries that the application needs.
With the image, this “application expert” defines the full environment and everything that has to be done, for example (see the sketch after this list):
- installing Java
- setting the right start parameters (or creating the right start script (entrypoint.sh))
- ensuring that configuration options can be provided if the application needs some configuration
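A hypothetical sketch of such an image definition for a Java application (the base image, jar path, and entrypoint script are placeholders):

```Dockerfile
FROM openjdk:11-jre-slim
COPY build/libs/app.jar /app/app.jar
COPY entrypoint.sh      /app/entrypoint.sh
ENV JAVA_OPTS=""
ENTRYPOINT ["/app/entrypoint.sh"]
```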
For the Dockerfile maintainer, the work is similar to configuring a well-prepared VM into which the application is to be integrated (there are some differences, but a lot of the same things have to be done).
The advantage is that the installation of the application with all its configuration is available as code; to create a Docker image, this code (Dockerfile + additional files) is executed by interpreting the Dockerfile.
But: for all consumers of this image (other developers, admins, hosting centers, ...), it is much easier, because they can simply download the image and start it as a container, and everything should be preconfigured.
Only if the Dockerfile creates a clean image will others have far fewer problems starting the application.