Docker is an application for managing Linux containers on top of an existing OS. It provides an abstraction layer (the Docker Engine), so any command or operation run inside a container behaves the same regardless of the OS on which Docker is set up. Docker relies solely on the host OS, so the only compatibility issue we can run into is whether the OS supports Docker at all: this greatly simplifies sharing code and transitioning from development to production.
A great advantage of containers over traditional virtual machines is that they are lightweight: they do not include a full OS, which makes them easy to share. As a matter of fact, Docker Hub is a repository of images shared by developers and contains more than 400K images. As a good practice, I always use images from the official repository.
Docker runs natively on Linux, so it is straightforward to set up on a Linux machine. In general, running Docker on anything other than Linux comes with its challenges. I focus on the basic pitfalls of working with the application on MacOS, though I also explain how to set it up on Linux distributions.
Docker can be installed easily on MacOS with homebrew:
$ brew install docker
On MacOS, we need a virtual machine to provide the Linux kernel features Docker requires, and docker-machine is the tool that manages it. VirtualBox is the virtualisation layer that allows docker-machine to run on MacOS:
$ brew cask install virtualbox
Thus (and this is important to understand, especially for networking) the host OS is not MacOS but the virtual machine itself.
On Linux (here Ubuntu), Docker can be installed using the native package manager apt-get:
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
$ sudo apt-get update
$ apt-cache policy docker-ce
$ sudo apt-get install -y docker-ce
Docker reads a Dockerfile that specifies the base image (from Docker Hub) from which to build a new image. The Docker container can be seen as an instance of the image, and the Dockerfile as the script that builds the image.
On MacOS, we first need to start the virtual machine before executing any Docker instruction:
$ brew install docker-machine
$ docker-machine create --driver virtualbox <virtual-machine-name>
VirtualBox is the driver used by docker-machine; the service can then be started:
$ brew services start docker-machine
If need be, the machine can be stopped and restarted:
$ brew services stop docker-machine
$ brew services restart docker-machine
On MacOS it is also necessary to set up the environment using the commands:
$ docker-machine env <virtual-machine-name>
$ eval "$(docker-machine env <virtual-machine-name>)"
All Docker commands on MacOS should be run in a terminal with elevated admin privileges (I am not a huge fan of this, but I figure it is just for early prototyping; the rest of the time I am on Linux).
Now we are all set up and ready to build our first image. Notice how much more work is needed to set up Docker on MacOS; however, there is also room for automation.
2. The Dockerfile
As seen previously, a Docker image can be built from a base image by executing the instructions in a Dockerfile.
These files have a specific syntax that Docker reads and executes when the build instruction is called.
The first instruction in a Dockerfile is FROM, which specifies the base image. It is best to use an image from the official repository on Docker Hub.
The Dockerfile tells Docker to build a new image starting from this base image; every subsequent instruction in the file modifies it. The FROM command is mandatory: a Dockerfile is invalid without it. For instance, to use the latest Ubuntu distribution as the base image, we use the following instruction:
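In the Dockerfile, this is a single line:

```dockerfile
# use the latest official Ubuntu image as the base
FROM ubuntu:latest
```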
You can use any image in the repository as long as you specify its tag (here ubuntu:latest).
To start a new image from scratch, the following command must be added to the Dockerfile:
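That instruction uses the reserved scratch image, an explicitly empty starting point:

```dockerfile
# start from an empty image with no base layers
FROM scratch
```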
However, I strongly advise always building your images from existing (official) images. From there we can write additional instructions to build our new image. As an example, I build here a Python scientific workspace with the following software / libraries:
- python 3.6
- numpy, scipy, pandas, matplotlib
- Jupyter notebook
Inside the Dockerfile, we can run shell commands using the RUN command. Docker executes them with /bin/sh -c by default inside the (Linux) container, regardless of the host OS:
RUN apt-get update && apt-get install -y python3 python3-pip # install python (same layer as the cache update)
RUN pip3 install numpy scipy pandas matplotlib
RUN pip3 install jupyter
These libraries are installed at build time and baked into the image, so any container started from it has them available immediately.
To add files to the container, we can use the ADD command by specifying the file path and the destination path in the Docker container:
ADD <file-path> <destination-path>
You can use either absolute or relative paths (I find absolute paths cleaner).
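For example, to copy a local requirements file into the image (the file and directory names are illustrative):

```dockerfile
# copy requirements.txt from the build context into /workspace
ADD requirements.txt /workspace/requirements.txt
```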
The WORKDIR command sets a new working directory; as a result, all of the following RUN and ADD instructions use the directory specified in the last WORKDIR definition as their default. If WORKDIR itself is defined with a relative path, it is resolved against the previous WORKDIR; if it is the first such command, the root directory / is used.
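A short sketch of how consecutive WORKDIR instructions combine (directory names are illustrative):

```dockerfile
WORKDIR /workspace      # absolute path: working directory is /workspace
WORKDIR notebooks       # relative path: resolves to /workspace/notebooks
RUN pwd                 # this RUN executes in /workspace/notebooks
```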
Since the Docker container we will eventually run is virtually isolated from the host OS, we need a way to connect the two. To do so, we specify a port of the container that the host OS will listen to. The first step is to expose this port using the EXPOSE command:
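For the Jupyter workspace above, that would be Jupyter's default port, 8888:

```dockerfile
# make port 8888 (Jupyter's default) available to the host
EXPOSE 8888
```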
Another useful command I often add to my Dockerfiles is CMD, which runs a script when the container starts. This allows, for example, launching a Jupyter notebook once the container is live. You can either write the command inline or refer to a .sh file (or something equivalent), adding that shell file to the container with an ADD beforehand.
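A minimal inline sketch for the Jupyter case (the flags are the usual ones for reaching a notebook server from outside a container; adjust as needed):

```dockerfile
# start the notebook server when the container launches;
# --ip=0.0.0.0 makes it reachable from outside the container
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--allow-root"]
```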
The two following commands are useful in practice for more advanced use, so I figured it would be interesting to mention them:
- ENV: sets environment variables in the container shell
- USER: sets the default user
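For instance (the variable and user name here are illustrative, and the user must be created before switching to it):

```dockerfile
ENV LANG=C.UTF-8        # set a locale variable in the container shell
RUN useradd -m worker   # create the user before switching to it
USER worker             # subsequent instructions and the container run as 'worker'
```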
A best practice is to keep the image as lightweight as possible: one of the attractive features of containers is their portability.
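Putting the pieces together, a complete Dockerfile for the workspace above could look like this (the paths and notebook port are illustrative):

```dockerfile
FROM ubuntu:latest

# install python and the scientific libraries
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install numpy scipy pandas matplotlib jupyter

# add local files and set the working directory
ADD notebooks /workspace/notebooks
WORKDIR /workspace/notebooks

# expose Jupyter's default port and start the server on launch
EXPOSE 8888
CMD ["jupyter", "notebook", "--ip=0.0.0.0", "--port=8888", "--allow-root"]
```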
3. Handling containers
Now that our Dockerfile is ready, we can build our custom image using the following shell command, pointing it at the directory that contains the Dockerfile (the build context):
$ docker build <path-to-directory>
This adds the new image to your local image lists. You can consult this list in the shell with the command:
$ docker images
The output is a table listing, for each image, its repository, tag, IMAGE ID, creation time and size. Images are referred to by the IMAGE ID column; Docker image IDs are 64-hexadecimal-digit strings, and only their first 12 characters are displayed. I find it easier to give my images custom names, since their lifespan is longer than that of containers, and it also helps with versioning:
$ docker tag <IMAGE ID> <image-custom-name>
The CREATED column indicates when the image was created.
Containers are created by running an image:
$ docker run -it <IMAGE ID>
The -it flags specify that we require an interactive shell into the container we just created. This is not ideal because once the terminal is closed, the container is stopped. We can also let the container run as a background process:
$ docker run -d <IMAGE ID>
You can get the list of all running containers as follows (add the -a flag to include stopped ones):
$ docker ps
We get a table with the IDs of the existing containers, their status (up or exited), the time they were created and their ports. We can also access the port mapping of a given container as follows:
$ docker port <CONTAINER ID>
The EXPOSE command only exposes a specific port of the container to the host OS; the host OS port it is bound to is assigned at random. Setting the host OS port can be done when creating the container:
$ docker run -p <host-OS-port>:<container-port> <IMAGE ID>
where the container port is the one exposed in the first place.
I remind you that the host OS port here is a port of the virtual machine, so it is handy to know the virtual machine's IP address:
$ docker-machine ip <virtual-machine-name>
So now, how do we access the container? We can start a “remote” interactive shell when running the image and specify the shell we would like to use:
$ docker run -it <IMAGE ID> /bin/bash
or, if you always use the default shell (as I do), you can replace the path by the environment variable for the default shell: $SHELL.
In a nutshell, Docker allows:
- easier and faster development
- easier and faster deployment (application bundled in the Docker image)
- easier and faster sharing (ensures the same behaviour of application regardless of the development environment)
- easier scaling of an application (application can be written as an addition of containerised micro services).
However, as it does not run natively on MacOS, there is some additional work to set up an environment, but it is worth the trouble: moving your code to production then becomes very easy. In upcoming posts I will go over how to set up a Machine Learning workspace on the cloud with Docker.