In this guide, you will learn everything you need to know about Docker Swarm and how to use it to scale and securely maintain your Docker projects.
With applications needing more and more computing resources and an uptime of nearly 100%, it becomes very hard to maintain and scale your software without some kind of management system. This is where Docker Swarm comes into play: it provides an easy way to scale and maintain your containers and services.
This guide will show you all the important concepts, commands and the structure of the configuration file. At the end of the article, it also walks through an example of deploying a real-world application.
Why care about Docker Swarm?
Before we get into the technical details of what a swarm actually is and where it can help us, let's discuss why someone would use it in the first place.
Load balancing:
Swarm has a built-in load balancer that lets you specify how to distribute services and containers across your nodes. You can also expose ports for external load balancing services.
Integrated into the Docker Engine:
Swarm is directly integrated into the Docker CLI and doesn't require any additional orchestration software or other tools to create or manage a swarm.
Scaling:
Swarm lets you define the number of tasks you want to run for each service. You can change this number with a single command, and the swarm manager adapts the running tasks to match.
Rolling updates:
Swarm lets you apply service updates incrementally: it updates a specified number of replicas at a time, so your service stays up while it is being updated.
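As a rough sketch of what scaling and rolling updates look like on the command line, the two helper functions below wrap docker service scale and docker service update. The service name web and the function names are hypothetical, and the DOCKER variable is only an assumption added here so the commands can be dry-run with DOCKER=echo.

```shell
# Hypothetical helpers; set DOCKER=echo to print the commands instead of running them.

# Change the desired number of tasks for service $1 to $2.
scale_service() {
  "${DOCKER:-docker}" service scale "$1=$2"
}

# Move service $1 to image $2, updating one task at a time
# and waiting 10 seconds between tasks.
rolling_update() {
  "${DOCKER:-docker}" service update \
    --update-parallelism 1 \
    --update-delay 10s \
    --image "$2" \
    "$1"
}
```

On a manager node, scale_service web 5 would ask the swarm for five tasks, and rolling_update web nginx:mainline would roll the new image out task by task.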
What is a Swarm?
Docker Swarm is a cluster management and orchestration tool that makes it easy to scale and manage your existing Docker services. A swarm consists of multiple Docker hosts that run in so-called swarm mode and act either as managers (managing membership and delegation) or as workers (running the services). A given Docker host can be a manager, a worker, or perform both roles.
When creating a service in a swarm, you define its desired state (number of replicas, ports of the service, network and storage resources, and more). Docker tries to maintain this desired state by restarting or rescheduling unavailable tasks and balancing the load between different nodes.
A node is an instance of the Docker engine participating in the swarm. You can run one or multiple nodes on a single device, but production deployments typically include Docker nodes distributed across multiple physical devices.
Manager nodes distribute and schedule incoming tasks onto the worker nodes, maintain the cluster state and perform orchestration and cluster management functions. Manager nodes can also optionally run services themselves, just like worker nodes.
Cluster management tasks include:
Maintaining the cluster state
Serving the swarm mode HTTP API endpoints
There should always be multiple manager nodes in your swarm because of the following reasons:
Maintaining high availability
Easily recover from a manager node failure without downtime
That is why Docker recommends running an odd number of manager nodes, chosen according to your project's availability requirements.
Note: Docker recommends a maximum of seven manager nodes for a swarm.
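The reason for the odd number becomes clearer with a little arithmetic. Swarm managers agree on the cluster state through Raft consensus, which needs a majority (quorum) of managers to be reachable. The sketch below is plain shell arithmetic with a hypothetical function name; it only illustrates the majority rule and does not talk to Docker at all.

```shell
# Print the quorum size and the number of manager failures a swarm
# can tolerate for a given number of manager nodes.
manager_quorum() {
  managers=$1
  quorum=$(( managers / 2 + 1 ))        # majority needed to agree on state
  tolerated=$(( (managers - 1) / 2 ))   # managers that may fail without losing quorum
  echo "managers=$managers quorum=$quorum tolerated_failures=$tolerated"
}

for n in 1 3 4 5 7; do
  manager_quorum "$n"
done
```

Three managers tolerate one failure, and so do four, which is why adding an even-numbered manager does not improve availability.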
Worker nodes are also instances of the Docker Engine whose sole purpose is to execute containers and services as instructed by the manager nodes.
To deploy your application to a swarm, you need at least one manager node. By default, all manager nodes are also workers. To prevent the scheduler from placing tasks on your manager node in a multi-node swarm, you need to set the availability to Drain.
A service is the definition of the tasks to execute on the nodes. It is the primary root of user interaction with the swarm.
When you create a service, you specify which container image to use and which commands to execute inside running containers. You also define other options for the service including:
the port you want to expose
CPU and memory limitations
the number of replicas of the image to run in the swarm
a rolling update policy
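The options listed above map onto flags of docker service create. The following is a minimal sketch, assuming a hypothetical service called web based on the nginx:mainline image; the concrete limits and ports are illustrative values, and the DOCKER variable is an assumption that allows a dry run with DOCKER=echo.

```shell
# Hypothetical example service; set DOCKER=echo to print the command instead of running it.
create_web_service() {
  # --publish:            expose container port 80 on port 8080 of the swarm
  # --limit-cpu/-memory:  CPU and memory limits per task
  # --replicas:           number of tasks to run
  # --update-*:           rolling update policy (one task at a time, 10s apart)
  "${DOCKER:-docker}" service create \
    --name web \
    --publish published=8080,target=80 \
    --limit-cpu 0.5 \
    --limit-memory 256M \
    --replicas 3 \
    --update-parallelism 1 \
    --update-delay 10s \
    nginx:mainline
}
```

Calling create_web_service on a manager node would create the service; docker service ls then shows how many of the three replicas are running.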
Here is an example of an HTTP server balancing its load on three replicas:
As you can see the service has three different tasks and each task invokes exactly one container. A task represents a slot where the scheduler can place a container. Once the container is live, the scheduler recognizes that the task is in a running state.
A task carries a Docker container and the command that is executed inside of the container. It is the atomic scheduling unit of the swarm. Tasks are assigned by the manager node to worker nodes according to the number of replicas set in the service.
When a service is created or updated, the orchestrator realizes the desired state by scheduling tasks. Each task is a slot that the scheduler fills by spawning a container, which is the instantiation of the task. When one of these containers fails its health check or crashes, the orchestrator creates a new replica task that spawns a new container to replace the failing one.
The diagram below shows you how swarm mode accepts services and schedules tasks for the worker nodes.
Replicated and global services:
There are two different ways you can deploy a service, replicated and global.
Replicated services specify the number of identical tasks (replicas) you want to run. These replicas are then distributed across the available nodes, and each serves the same content.
A global service is a service that runs one task on every node you have in your swarm and doesn't need a pre-specified number of tasks. Global services are usually used for monitoring agents or any other type of container that you want to run on every node.
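On the command line, the difference between the two comes down to the --mode flag of docker service create. The sketch below assumes two hypothetical services, web and node-agent; the alpine image running top merely stands in for a real monitoring agent, and DOCKER=echo can be used for a dry run.

```shell
# Hypothetical services; set DOCKER=echo to print the commands instead of running them.

# Replicated: run exactly three identical tasks somewhere in the swarm.
create_replicated_service() {
  "${DOCKER:-docker}" service create --name web --mode replicated --replicas 3 nginx:mainline
}

# Global: run one task on every node, with no replica count.
create_global_service() {
  "${DOCKER:-docker}" service create --name node-agent --mode global alpine top
}
```

Replicated is the default mode, so --mode replicated can be omitted when --replicas is given.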
Here is a visual representation of a three-service replica and a global service.
Now that you know the key concepts of Docker Swarm, we can continue by learning the basics of creating and managing a cluster.
Swarm runs on nearly any operating system and is very easy to install, so let's get into it.
Windows and Mac:
Swarm mode is part of the Docker Engine that ships with Docker Desktop for Windows and Mac, so it doesn't have to be installed separately. The installation instructions can be found in the Docker Desktop documentation.
If you are using a physical Linux machine or cloud hosting service as a host, simply follow the installation instructions provided by Docker.
Creating a Swarm:
The first step after installing Docker on your machine is creating a swarm. For that, we need to run the following command.
docker swarm init --advertise-addr <MANAGER-IP>
MANAGER-IP is the IP address that the swarm manager node uses to advertise the swarm cluster service. If you are using Docker Desktop for Mac or Docker Desktop for Windows to test a single-node swarm, simply run docker swarm init with no arguments.
Add nodes to swarm:
With the swarm cluster created, we can now add a new worker node using the docker command printed in the output of the command above.
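If that output is no longer at hand, the join command can be printed again on a manager. The sketch below wraps the relevant commands in hypothetical helper functions; the token and the manager address are placeholders you have to supply, 2377 is the default swarm management port, and DOCKER=echo gives a dry run.

```shell
# Hypothetical helpers; set DOCKER=echo to print the commands instead of running them.

# On a manager: print the full join command (including the token) for new workers.
show_worker_join_command() {
  "${DOCKER:-docker}" swarm join-token worker
}

# On the new node: join the swarm. $1 is the token, $2 the manager's IP address.
join_swarm() {
  "${DOCKER:-docker}" swarm join --token "$1" "$2:2377"
}

# On a manager: list all nodes to confirm the new worker has joined.
list_nodes() {
  "${DOCKER:-docker}" node ls
}
```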
Drain availability prevents a node from receiving new tasks from the swarm. It also stops all tasks running on the node and launches replica tasks on other nodes with active availability.
docker node update --availability drain worker1
Inspect the node:
You can inspect the node to see its availability:
docker node inspect --pretty worker1
Update it back to active:
After you have made the required changes on your node, you can set its availability back to active:
docker node update --availability active worker1
The routing mesh is a cluster-wide transport-layer (L4) load balancer that routes incoming requests for a published port to an available container, regardless of which node receives the request. It allows all the swarm nodes to accept connections on a service's published ports.
You can publish a port using the --publish flag or the short version -p:
docker service create -p 80:80 --name nginx --replicas 3 nginx:mainline
More information about how to bypass the routing mesh and configure an external load balancer can be found in the official documentation.
Logging is a very important topic for containerized applications and is also handled by Docker Swarm.
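As a small sketch of what bypassing the routing mesh can look like, the long form of --publish accepts mode=host, which binds the port directly on each node that runs a task instead of on every node in the swarm. The service name web-host and the port numbers are hypothetical, and DOCKER=echo gives a dry run.

```shell
# Hypothetical service; set DOCKER=echo to print the command instead of running it.
create_host_published_service() {
  # mode=host publishes port 8080 only on nodes that actually run a task,
  # so an external load balancer has to know which nodes those are.
  # Running the service as global puts one task on every node.
  "${DOCKER:-docker}" service create \
    --name web-host \
    --mode global \
    --publish published=8080,target=80,mode=host \
    nginx:mainline
}
```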
Get the logs of a service:
Getting the logs of a service is very similar to getting the logs of a single container.
docker service logs hellogoogle
Following the logs:
You can also get a real-time view of your logs using the --follow flag.
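A minimal sketch of following the logs, reusing the hellogoogle service name from above. The helper function is hypothetical, --tail limits the initial output to the most recent lines, and DOCKER=echo gives a dry run.

```shell
# Hypothetical helper; set DOCKER=echo to print the command instead of running it.
follow_service_logs() {
  # Show the last 20 log lines of service $1, then keep streaming new ones.
  "${DOCKER:-docker}" service logs --follow --tail 20 "$1"
}
```

Calling follow_service_logs hellogoogle streams the logs of all tasks of that service until you stop it with Ctrl+C.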
A secret is a piece of data that should not be transmitted over a network or stored unencrypted. Docker provides a service to centrally manage such data and securely transmit it to only those containers that need access to it.
Secrets can be created, inspected and removed via the command line using the following commands.
Managing a secret:
We can create a secret using the secret create command, which takes two parameters: the name of the secret and the source of the secret data (a file, or - to read it from standard input).
echo "Secret" | docker secret create my_secret -
After creating the secret, you can inspect it or list all secrets that are available in your swarm.
# List all secrets
docker secret ls
# Inspect a specific secret
docker secret inspect SECRET_NAME
Lastly, you can remove a secret using the secret rm command.
docker secret rm my_secret
Passing a secret to a service:
Secrets can be added to services on creation or while the containers are running using the following commands.
The --secret flag can be used to add a secret while creating a service.
docker service create --name="nginx" --secret="my_secret" nginx:latest
Adding and removing secrets from running services can be done using the --secret-add and --secret-rm flags of the service update command.
docker service update --secret-rm="my_secret" nginx
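The counterpart for granting a running service access to a secret might look like the sketch below. The helper function is hypothetical and takes the secret name and the service name as arguments; DOCKER=echo gives a dry run.

```shell
# Hypothetical helper; set DOCKER=echo to print the command instead of running it.
add_secret_to_service() {
  # Grant service $2 access to secret $1. Inside the service's containers
  # the secret then appears as the file /run/secrets/$1.
  "${DOCKER:-docker}" service update --secret-add "$1" "$2"
}
```

After add_secret_to_service my_secret nginx, the application can read the value from /run/secrets/my_secret instead of receiving it through an environment variable.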
Docker Stack is an extension of the existing docker-compose file format that lets you define deployment options for your swarm configuration, such as the number of replicas or the resource limits of your service.
If you have no experience with docker-compose yet I would recommend looking into this article first.
Creating a stack:
As mentioned before, Docker Stack extends the docker-compose file and lets you define some extra attributes for your swarm deployment. These attributes are defined under the deploy key in a compose file.
Important attributes include:
replicas - Defines the number of replicas for a service
update_config - Defines how the service will be updated e.g. parallelism and delay
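To make the deploy key more concrete, the sketch below writes a small compose file and wraps the deploy command. Everything in it is an illustrative assumption rather than the configuration of a real project: the file name stack-example.yml, the web service, the nginx:mainline image and the concrete values. Setting DOCKER=echo prints the deploy command instead of running it.

```shell
# Hypothetical file name; override STACK_FILE to write somewhere else.
STACK_FILE="${STACK_FILE:-stack-example.yml}"

# Write a compose file whose deploy key carries the swarm-specific options.
cat > "$STACK_FILE" <<'EOF'
version: "3.8"
services:
  web:
    image: nginx:mainline
    ports:
      - "80:80"
    deploy:
      replicas: 3            # number of tasks for this service
      update_config:
        parallelism: 1       # update one task at a time
        delay: 10s           # wait between updated tasks
      resources:
        limits:
          cpus: "0.50"
          memory: 256M
EOF

# Deploy the file as a stack named $1 (run on a manager node).
deploy_stack() {
  "${DOCKER:-docker}" stack deploy --compose-file "$STACK_FILE" "$1"
}
```

Running deploy_stack STACK_NAME on a manager node creates the services defined in the file, or updates them if the stack already exists.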
You can also list all your stacks or the tasks of a specific stack using the ls and ps commands.
# List all stacks
docker stack ls
# List the tasks of a specific stack
docker stack ps STACK_NAME
Removing a stack is similar to removing a service and can be done using the rm command.
docker stack rm STACK_NAME
Now that we have gone through the theory of Swarm, let's see some of the magic we just talked about in action. For that, we are going to deploy a NestJS GraphQL application that already includes a docker-compose file, so we can focus on the swarm configuration.
The project contains a local image that has to be stored in a registry before the file can be run as a swarm. If you don't know what a registry is or why we need one, I would recommend reading this article in the Docker documentation.
Let's start by cloning the repository with the finished boilerplate from GitHub.
As indicated by the terminal output, your services are now running. You can check this by either running docker stack ls in the command line or visiting localhost:3000/graphql. You should see something similar to this: