Deploy Spring Boot Applications with Zero Downtime on Kubernetes
1. Overview
In this article, we will learn how to deploy a Spring Boot application to a Kubernetes cluster without dropping any production traffic, including in-flight requests.
The default Kubernetes deployment strategy is RollingUpdate, which already ensures there is no protracted downtime during deployments.
However, it does not sufficiently take care of in-flight requests and processes, and that is what we will address in this article.
By default, Kubernetes forcefully terminates a container after a 30-second grace period. For a system with high traffic, this causes read-timeout or connection-reset errors for users whose requests are being processed by the abruptly terminated pod.
In my experience, deployment of new changes for such a service is typically planned for off-peak periods or late into the night. This reduces the team’s velocity and increases the cost of managing the service.
You’ll learn how to mitigate this challenge by configuring health check endpoints, extending the termination grace period, and making the Spring Boot application listen for the context-closed event.
1.1 Prerequisites
To follow this tutorial on deploying a Spring Boot application with zero downtime, you will need the following:
- A DigitalOcean account.
- A running DigitalOcean Kubernetes cluster. If you have not yet set one up, follow this product documentation on How to Create a DigitalOcean Kubernetes Cluster.
- The kubectl command-line tool, installed and configured to connect to your cluster. This process is covered during the cluster setup.
- Java and Maven installed on your local machine to build the Spring Boot application.
- Docker installed on your local machine to build and push the container image. You can follow our guide on How to Install and Use Docker on Ubuntu.
- A Docker Hub or private container registry account to store your Docker image. For instructions on setting up a private registry, refer to How to Set Up a Private Docker Registry on Ubuntu.
1.2 Key Takeaways
- The default deployment strategy for Kubernetes does not account for in-flight requests, and the default grace period is usually too small to have the desired effect.
- Change the deployment’s spec.template.spec.terminationGracePeriodSeconds to a contextually relevant value to buy more time for the app to complete its processes before being terminated.
- Configure the deployment’s health checks with relevant values, e.g. use /actuator/health/readiness for the readinessProbe and not just /actuator/health.
- Implement a listener for the ContextClosed event in Spring Boot, and perform a thread sleep for a duration <= the configured terminationGracePeriodSeconds.
- Use the right number of replicas to suit the throughput of the application.
2. Application Setup
For demonstration, we will create a Spring Boot 3.5.11 application with the help of Spring Initializr.
The application will have two endpoints for creating and listing Pet resources, and it will use an in-memory H2 database.
The pom.xml includes the Google Jib plugin, which we will use to create a Docker image for the Spring Boot application.
Listing 2.1 pom.xml
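The relevant part of the pom.xml is the Jib plugin configuration. In outline, it looks like this (the plugin version is illustrative; the <image> value matches the image name used throughout this article):

```xml
<build>
  <plugins>
    <plugin>
      <groupId>com.google.cloud.tools</groupId>
      <artifactId>jib-maven-plugin</artifactId>
      <!-- version is illustrative; use the latest stable release -->
      <version>3.4.4</version>
      <configuration>
        <to>
          <!-- name and tag of the Docker image Jib will produce -->
          <image>seunmatt/k8s-zero-downtime:1.0.1</image>
        </to>
      </configuration>
    </plugin>
  </plugins>
</build>
```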
To build a Docker image, Docker Desktop or a similar tool needs to be running. Then, we will run the following Maven command:
Listing 2.2 Terminal
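Assuming Jib’s standard goals, the build command looks like this:

```shell
# Builds the image into the local Docker daemon via the Jib Maven plugin
mvn compile jib:dockerBuild
```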
The name and tag of the resulting Docker image will be what we configured in the Google Jib plugin’s <image> element in the pom.xml.
For this demo project, the name of the Docker image will be seunmatt/k8s-zero-downtime:1.0.1.
The resulting image will be visible in your Docker Desktop or other similar tool. From the command line, we can list all the images using:
Listing 2.3 Terminal
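Listing the local images is a plain docker command; the output columns shown as comments here are illustrative:

```shell
docker images
# REPOSITORY                   TAG     IMAGE ID   CREATED          SIZE
# seunmatt/k8s-zero-downtime   1.0.1   ...        2 minutes ago    ...
```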
You can push the image to Docker Hub from Docker Desktop or via the command line:
Listing 2.4 Terminal
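Assuming you are already logged in to Docker Hub (docker login), the push command is:

```shell
docker push seunmatt/k8s-zero-downtime:1.0.1
```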
Pushing the Docker image to a remote repository is required for the next steps, as the Kubernetes deployment will need to pull the image from there.
3. Kubernetes Configurations
Let’s start by adding a Kubernetes (K8s) deployment file to our application. We will add it as a new file in the deployment
directory of the source code:
Listing 3.1 deployment.yaml
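A minimal version of the deployment, assuming the app listens on Spring Boot’s default port 8080 (the label names are illustrative; the deploymentVersion annotation is the one we will bump later to trigger a new rollout):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-zero-downtime
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8s-zero-downtime
  template:
    metadata:
      labels:
        app: k8s-zero-downtime
      annotations:
        deploymentVersion: "1"   # bumped later to simulate a change
    spec:
      containers:
        - name: k8s-zero-downtime
          image: seunmatt/k8s-zero-downtime:1.0.1
          ports:
            - containerPort: 8080
```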
We configured the deployment to have two replicas and use the Docker image we published in section 2 above: seunmatt/k8s-zero-downtime:1.0.1.
We will add a load balancer service configuration that will enable us to access the application via HTTP over the internet:
Listing 3.2 service.yaml
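A sketch of the service, assuming the selector labels used in the deployment above:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: k8s-zero-downtime
spec:
  type: LoadBalancer          # DigitalOcean provisions a public load balancer
  selector:
    app: k8s-zero-downtime    # must match the deployment's pod labels
  ports:
    - port: 80                # external HTTP port
      targetPort: 8080        # Spring Boot's default server port
```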
We will be deploying the application to DigitalOcean Kubernetes. If you don’t have an account, you can sign up using this link and get a $200 bonus credit.
From your DigitalOcean dashboard, you can create a Kubernetes cluster for your workloads and
follow the guide to set it up for kubectl access. A cluster with a single basic regular SSD node costs only $24 per month.
Once the cluster is provisioned, we will simply apply the service.yaml and deployment.yaml
files using kubectl from our local machine.
Navigate to the root of the source code and execute the following command to apply the service:
Listing 3.3 Terminal
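Assuming both files live in the deployment directory of the project:

```shell
kubectl apply -f deployment/service.yaml
```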
The output will have a created message to indicate success:
Listing 3.4 Terminal
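Assuming the service is named k8s-zero-downtime, the confirmation looks like:

```text
service/k8s-zero-downtime created
```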
We can execute kubectl get svc to get the status of the creation. The output will be similar to this:
Listing 3.5 Terminal
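An illustrative output follows; the EXTERNAL-IP shown (203.0.113.10) is a documentation placeholder, and yours will differ. It may read <pending> for a minute or two while DigitalOcean provisions the load balancer:

```text
NAME                TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)        AGE
k8s-zero-downtime   LoadBalancer   10.245.x.x     203.0.113.10   80:3xxxx/TCP   3m
kubernetes          ClusterIP      10.245.0.1     <none>         443/TCP        15m
```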
Take note of the EXTERNAL-IP of the k8s-zero-downtime service; that’s the IP we will use
to connect to the deployed Spring Boot application.
Next, we will apply the deployment file, which will deploy the application in the K8s cluster:
Listing 3.6 Terminal
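The deployment is applied the same way as the service:

```shell
kubectl apply -f deployment/deployment.yaml
```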
Just like the service creation, the output will have a created message to indicate success:
Listing 3.7 Terminal
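Assuming the deployment is named k8s-zero-downtime:

```text
deployment.apps/k8s-zero-downtime created
```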
Executing kubectl get pods --watch on the command line allows us to watch the deployment process.
Once the READY column shows 1/1, the pods are up and running:
Listing 3.8 Terminal
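Illustrative watch output; the pod name suffixes are random hashes generated by Kubernetes:

```text
NAME                                  READY   STATUS    RESTARTS   AGE
k8s-zero-downtime-<hash>-<suffix>     1/1     Running   0          40s
k8s-zero-downtime-<hash>-<suffix>     1/1     Running   0          40s
```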
Let’s confirm the app is running by invoking one of the endpoints via curl:
Listing 3.9 Terminal
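Substituting the EXTERNAL-IP noted earlier (the response body shown is illustrative — an empty list before any Pet is created):

```shell
curl -i http://<EXTERNAL-IP>/pets
# HTTP/1.1 200 OK
# Content-Type: application/json
#
# []
```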
The HTTP 200 OK response is an indication of success, and now we can proceed to test the behaviour when applying a new change.
The current version of the deployed application will terminate immediately upon receiving a SIGTERM signal from
Kubernetes.
For us to observe this, we will run an Apache JMeter load test, and while the test is still running, we will initiate a new deployment.
We will use simple load parameters: 20 users, a ramp-up period of 20 seconds, and 700 loops. This simulates 20 users submitting concurrent requests to the application.
The JMeter test will call the POST /pets endpoint to create a Pet and then call the GET /pets endpoint to list all the Pets.
We will run the JMeter test using the following command and observe the summary output on the console:
Listing 3.10 Terminal
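Assuming the test plan is saved as load-test.jmx (the file and results names are illustrative):

```shell
# -n: non-GUI mode, -t: test plan file, -l: results file
jmeter -n -t load-test.jmx -l results.csv
```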
While the test is running, we will increase the value of spec.template.metadata.annotations.deploymentVersion by 1 to simulate a change and
apply the deployment file again using kubectl apply -f deployment/deployment.yaml.
We will then observe the output of the JMeter console as the deployment is being rolled out:
Listing 3.11 Terminal
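Illustrative summariser lines (all numbers are hypothetical) — the key signal is the non-zero Err count that appears while the old pods are terminated:

```text
summary +   9500 in 00:00:30 =  316.6/s Avg:    58 Min:     2 Max:  3012 Err:   127 (1.34%) Active: 20
summary =  18500 in 00:01:00 =  308.3/s Avg:    61 Min:     2 Max:  3012 Err:   127 (0.69%)
```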
The error highlighted in the above listing occurs because the Spring Boot application terminates as soon as it receives the SIGTERM signal, regardless of the
in-flight HTTP traffic being processed.
This leads to read-timeout, connection-reset, connection-refused, or similar errors, which can create incomplete or duplicate data and an inconsistent experience for the end user.
The severity of this issue depends on the industry and the volume of user traffic. For critical systems, where this is not acceptable, engineers resort to workarounds like deploying late at night or during off-peak periods.
But what happens when an incident requires a change during peak traffic?
We either disrupt the experience for ALL our customers or wait till midnight, by which time the damage and revenue cost may be severe.
This can easily translate to a high mean-time-to-recover (MTTR), and the organisation may not be able to retain its customers and compete adequately in the marketplace.
What if there’s a way to ensure the application completes its current requests before terminating? That would grant engineering teams the ability to deploy changes at any time of the day without the fear of creating transient downtime.
4. Zero-Downtime Configurations
The goal is for our application, during a new deployment, to complete whatever request it is processing before terminating.
We will start by adding terminationGracePeriodSeconds to the deployment spec and setting its value to 60.
The termination grace period is how long the K8s control plane waits for our application to exit gracefully after the initial SIGTERM
signal. Once this period elapses, the application is forcefully terminated with SIGKILL.
By default, this value is 30 seconds. However, knowing how long it typically takes our application to process a single request, we can buy additional time by providing a custom value that suits our business environment.
It is important that we define the startupProbe, livenessProbe, and readinessProbe
under the containers key in the deployment.yaml file.
We must use the correct health check URL when defining them. Do not just use the generic /actuator/health for everything.
For example, the readinessProbe should be /actuator/health/readiness.
This enables K8s to determine the correct runtime state of our application and to know when it is safe to send new traffic to it.
Listing 4.1 deployment.yaml
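A sketch of the updated deployment. The liveness and readiness paths are Spring Boot Actuator’s Kubernetes probe endpoints; the probe thresholds, periods, and the startupProbe path are illustrative and should be tuned to the app’s actual startup and response times:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: k8s-zero-downtime
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8s-zero-downtime
  template:
    metadata:
      labels:
        app: k8s-zero-downtime
      annotations:
        deploymentVersion: "2"
    spec:
      terminationGracePeriodSeconds: 60   # matches the sleep in the listener
      containers:
        - name: k8s-zero-downtime
          image: seunmatt/k8s-zero-downtime:1.0.2
          ports:
            - containerPort: 8080
          startupProbe:
            httpGet:
              path: /actuator/health
              port: 8080
            failureThreshold: 30   # allows up to 30 x 5s for startup
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            periodSeconds: 5
```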
The last bit of change is in the Spring Boot application itself. We will add a ContextClosedListener.
This listener is invoked when the application receives a SIGTERM signal from the container runtime.
The technique here is to sleep the thread for a period equal to the configured terminationGracePeriodSeconds.
This keeps the Spring Boot application from shutting down and allows the current HTTP requests to complete processing.
Since we’ve configured the same terminationGracePeriodSeconds in the deployment config, we know that the forceful SIGKILL
will not arrive until the grace period elapses.
We also know that new requests will not reach the dying pod, since its status changes to Terminating
as soon as the SIGTERM signal is sent.
These facts work together to ensure our application completes its current process before exiting, thereby ensuring a smooth experience for the customers.
Listing 4.2 ContextClosedListener.java
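A sketch of the listener. ApplicationListener and ContextClosedEvent are the real Spring APIs; the package name, the exact guard, and the hard-coded sleep duration are illustrative, and the KUBERNETES_SERVICE_HOST check relies on the environment variable Kubernetes injects into every pod:

```java
package com.example.pets.config;

import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.core.env.Environment;
import org.springframework.stereotype.Component;

@Component
public class ContextClosedListener implements ApplicationListener<ContextClosedEvent> {

    private final Environment environment;

    public ContextClosedListener(Environment environment) {
        this.environment = environment;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // Guard: skip the delay outside Kubernetes, e.g. in the IDE
        // or while running test cases on a local machine.
        if (environment.getProperty("KUBERNETES_SERVICE_HOST") == null) {
            return;
        }
        try {
            // Hold the context open for the configured
            // terminationGracePeriodSeconds so in-flight requests
            // can complete before shutdown proceeds.
            Thread.sleep(60_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```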
Take note of the guard at the top of the listener in the snippet above. We added it to avoid delaying shutdown on your local machine while testing in the IDE or running test cases. You can remove it if you want to observe the behaviour on your machine without deploying.
We will build a new Docker image for the application and tag it as 1.0.2, following the same procedure as in section 2 above.
We will update the deployment.yaml file to use image: seunmatt/k8s-zero-downtime:1.0.2.
To assert this new zero-downtime behaviour, we will run the same JMeter test as before and observe the summary output in the console:
Listing 4.3 Terminal
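Illustrative summariser lines for the second run (all numbers are hypothetical) — this time the Err count stays at zero throughout the rollout:

```text
summary +  12000 in 00:00:30 =  400.0/s Avg:    47 Min:     2 Max:   950 Err:     0 (0.00%) Active: 20
summary =  28000 in 00:01:10 =  400.0/s Avg:    49 Min:     2 Max:   950 Err:     0 (0.00%)
```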
From the summary output, we can see that zero errors were recorded this time, unlike what we observed in section 3.
5. FAQs
- Will this not slow down deployment time? No, it won’t, provided you use the right values for the termination grace period and the health check probes. If the application requires a longer time, it may be time to re-evaluate the business logic so that a single request can be processed quickly. Moreover, it’s always a trade-off, and this may not apply to ALL your applications.
- Will this work with other platforms or frameworks? The core principle described in this article works for other applications and frameworks, too. For example, if the application’s Docker image has a bash shell, we could execute a shell script in the container’s preStop lifecycle hook that simply sleeps for a duration close to (but not exceeding) the termination grace period. We could even invoke another HTTP endpoint that sleeps before returning a response.
- Why am I still seeing downtime during rolling updates? Ensure you configure the deployment to have at least 2 replicas; depending on the number of requests per second the service handles, you can go up to 4. If the application has a slow startup, configure the container’s startupProbe to include a commensurate initial delay in seconds. Check your APM for how long it takes to complete a single request, and adjust the termination grace period accordingly, both in the deployment.yaml file and in the Spring Boot application config.
- Do I need readiness and liveness probes for zero downtime? Yes, you do. Without a properly configured readiness probe, Kubernetes cannot correctly tell when a pod is fully started and ready to accept traffic. It can prematurely assume the pod is ready and route traffic to it, leading to 503 errors. Furthermore, it will terminate old pods prematurely because it believes the new ones are ready. This leads to unexpected behaviour and disruption in service for end customers.
- Is zero downtime actually possible, or is it just theoretical? Yes, it is possible. It requires the right configurations, as discussed extensively in this article.
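The preStop approach mentioned in the FAQ can be sketched as a container-level hook in the deployment. The container name and image are illustrative, and the sleep must stay below terminationGracePeriodSeconds, since the grace period covers both the hook and SIGTERM handling:

```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: my-app            # illustrative container name
      image: my-app:1.0.0     # illustrative image
      lifecycle:
        preStop:
          exec:
            # Delay the SIGTERM so in-flight requests can finish;
            # requires a shell in the image.
            command: ["sh", "-c", "sleep 50"]
```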
This article is sponsored by DigitalOcean as part of the DigitalOcean Ripple Writers program. All technical assessments, code, and opinions are my own and are based on my hands-on experience.
6. Conclusion
The terminationGracePeriodSeconds is a crucial element of this whole setup.
Ensure that you set a value that fits your business context.
In this article, we’ve used 60 seconds; some applications might need up to 120 seconds to increase confidence. This value should be informed by how long a normal process takes to complete.
If the application takes 60 seconds to process a single request, then the terminationGracePeriodSeconds should be greater than
60 seconds.
Without the health probes and the right replica count, the ensemble may not reliably behave as expected, so take care to configure it properly.
The complete source code, including the JMeter load test file, is available on GitHub.
Thank you and Happy Coding!