
Deploy Spring Boot Applications with Zero Downtime on Kubernetes

1. Overview

In this article, we will learn how to deploy a Spring Boot application to a Kubernetes cluster without dropping any production traffic, including requests that are already in flight.

The default Kubernetes deployment strategy is RollingUpdate. This already ensures there is no protracted downtime during deployments. However, it does not sufficiently take care of in-flight requests and processes, and that is the gap we will address in this article.

By default, Kubernetes forcefully terminates a container after a 30-second grace period. For a system with high traffic, this causes read timeout or connection reset errors for any user whose request is being processed by the abruptly terminated pod.

In my experience, deployment of new changes for such a service is typically planned for off-peak periods or late into the night. This reduces the team’s velocity and increases the cost of managing the service.

You’ll learn how to mitigate this challenge by configuring health check endpoints and the termination grace period, and by making the Spring Boot application listen for the context closed event.

1.1 Prerequisites

To follow this tutorial on deploying a Spring Boot application with zero downtime, you will need the following:

  1. A DigitalOcean account.
  2. A running DigitalOcean Kubernetes cluster. If you have not yet set one up, follow this product documentation on How to Create a DigitalOcean Kubernetes Cluster.
  3. The kubectl command-line tool, installed and configured to connect to your cluster. This process is covered during the cluster setup.
  4. Java and Maven installed on your local machine to build the Spring Boot application.
  5. Docker installed on your local machine to build and push the container image. You can follow our guide on How to Install and Use Docker on Ubuntu
  6. A Docker Hub or private container registry account to store your Docker image. For instructions on setting up a private registry, refer to How to Set Up a Private Docker Registry on Ubuntu.

1.2 Key Takeaways

  1. The default deployment strategy for Kubernetes does not account for in-flight requests, and the default grace period is usually too short to have the desired effect.
  2. Change the deployment’s spec.template.spec.terminationGracePeriodSeconds to a contextually relevant value to buy more time for the app to complete its processes before being terminated.
  3. Configure the deployment’s health checks with relevant values, e.g. use /actuator/health/readiness for the readinessProbe and not just the generic /actuator/health.
  4. Implement a listener for the ContextClosedEvent in Spring Boot, and sleep the thread for a duration less than or equal to the configured terminationGracePeriodSeconds.
  5. Use a number of replicas that suits the throughput of the application.

2. Application Setup

For demonstration, we will create a Spring Boot 3.5.11 application with the help of Spring Initializr.

The application will have two endpoints for creating and listing a Pet resource, and it will use an in-memory H2 database.

The pom.xml includes the Google Jib plugin, which we will use to create a Docker image for the Spring Boot application.

Listing 2.1 pom.xml

<plugin>
    <groupId>com.google.cloud.tools</groupId>
    <artifactId>jib-maven-plugin</artifactId>
    <version>3.5.1</version>
    <configuration>
        <to>
            <image>seunmatt/${project.artifactId}:${project.version}</image>
        </to>
        <from>
            <image>eclipse-temurin:25-jdk-alpine</image>
            <platforms>
                <platform>
                    <architecture>amd64</architecture>
                    <os>linux</os>
                </platform>
            </platforms>
        </from>
        <container>
            <ports>
                <port>8080</port>
            </ports>
            <jvmFlags>
                <jvmFlag>${jvmFlag.use-container-support}</jvmFlag>
                <jvmFlag>${jvmFlag.max-ram-percentage}</jvmFlag>
            </jvmFlags>
        </container>
    </configuration>
</plugin>
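The two ${jvmFlag.*} placeholders referenced in the jvmFlags section above are assumed to be defined in the pom’s properties section. Their exact values are not shown in this article, so the following is a sketch with values typical for containerised JVMs:

```xml
<properties>
    <!-- assumed values for the Jib jvmFlags above; adjust to your environment -->
    <jvmFlag.use-container-support>-XX:+UseContainerSupport</jvmFlag.use-container-support>
    <jvmFlag.max-ram-percentage>-XX:MaxRAMPercentage=75.0</jvmFlag.max-ram-percentage>
</properties>
```

UseContainerSupport makes the JVM respect the container’s cgroup limits, and MaxRAMPercentage sizes the heap relative to the container memory limit set in the Kubernetes deployment.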

To build a Docker image, Docker Desktop or a similar tool needs to be running. Then, we will run the following Maven command:

Listing 2.2 Terminal

mvn compile jib:dockerBuild

The name and tag of the resulting Docker image will be what we configured in the Google Jib plugin’s <image> element in the pom.xml. For this demo project, the name of the Docker image will be seunmatt/k8s-zero-downtime:1.0.1.

The resulting image will be visible in your Docker Desktop or other similar tool. From the command line, we can list all the images using:

Listing 2.3 Terminal

docker images

You can push the image to Docker Hub from Docker Desktop, or via the command line:

Listing 2.4 Terminal

docker tag seunmatt/k8s-zero-downtime:1.0.1 registry.hub.docker.com/seunmatt/k8s-zero-downtime:1.0.1

docker push registry.hub.docker.com/seunmatt/k8s-zero-downtime:1.0.1

Pushing the Docker image to a remote registry is required for the next steps, as the Kubernetes deployment will need to pull the image from there.

3. Kubernetes Configurations

Let’s start by adding a Kubernetes (K8s) deployment file to our application. We will add it as a new file in the deployment directory of the source code:

Listing 3.1 deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-zero-downtime
  name: k8s-zero-downtime
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8s-zero-downtime
  template:
    metadata:
      labels:
        app: k8s-zero-downtime
      annotations:
        deploymentVersion: '1'
    spec:
      containers:
        - name: k8s-zero-downtime
          image: seunmatt/k8s-zero-downtime:1.0.1
          imagePullPolicy: Always
          resources:
            limits:
              memory: "1Gi"
            requests:
              memory: "500Mi"
              cpu: "100m"
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          startupProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5

We configured the deployment to have two replicas and use the Docker image we published in section 2 above: seunmatt/k8s-zero-downtime:1.0.1.
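The deployment relies on the default RollingUpdate strategy. If you want to make the rollout behaviour explicit, you could add a strategy block under spec; the values below are the Kubernetes defaults, shown here only for clarity:

```yaml
# Explicit rolling-update settings; 25%/25% are the Kubernetes defaults.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 25%
    maxSurge: 25%
```

With 2 replicas, 25% maxUnavailable rounds down to 0 and 25% maxSurge rounds up to 1, so Kubernetes brings one new pod up before taking an old one down.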

We will add a load balancer service configuration that will enable us to access the application via HTTP over the internet:

Listing 3.2 service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    app: k8s-zero-downtime
  name: k8s-zero-downtime
spec:
  ports:
    - name: "80"
      port: 80
      protocol: TCP
      targetPort: 8080
  selector:
    app: k8s-zero-downtime
  type: LoadBalancer

We will be deploying the application to DigitalOcean Kubernetes. If you don’t have an account, you can sign up and get a $200 bonus credit.

From your DigitalOcean dashboard, you can create a Kubernetes cluster for your workloads and follow the guide to set it up for kubectl access. A cluster with a single basic regular SSD node costs only $24 per month.

Once the cluster is provisioned, we will simply apply the service.yaml and deployment.yaml files using kubectl from our local machine.

Navigate to the root of the source code and execute the following command to apply the service:

Listing 3.3 Terminal

kubectl apply -f deployment/service.yaml

The output will have a created message to indicate success:

Listing 3.4 Terminal

service/k8s-zero-downtime created

We can execute kubectl get svc to get the status of the creation. The output will be similar to this:

Listing 3.5 Terminal

NAME                TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)        AGE
k8s-zero-downtime   LoadBalancer   10.109.12.120   129.212.160.13   80:30415/TCP   107s
kubernetes          ClusterIP      10.109.0.1      <none>           443/TCP        9m39s

Take note of the EXTERNAL-IP of the k8s-zero-downtime service; that’s the IP we will use to connect to the deployed Spring Boot application.

Next, we will apply the deployment file, which will deploy the application in the K8s cluster:

Listing 3.6 Terminal

kubectl apply -f deployment/deployment.yaml

Just like the service creation, the output will have a created message to indicate success:

Listing 3.7 Terminal

deployment.apps/k8s-zero-downtime created

Executing kubectl get pods --watch on the command line allows us to watch the deployment process. Once the READY column shows 1/1, the pods are up and running:

Listing 3.8 Terminal

NAME                                 READY   STATUS    RESTARTS   AGE
k8s-zero-downtime-55c5ddb76f-vwj65   0/1     Pending   0          0s
k8s-zero-downtime-55c5ddb76f-vwj65   0/1     Pending   0          0s
k8s-zero-downtime-55c5ddb76f-hbv89   0/1     Pending   0          0s
k8s-zero-downtime-55c5ddb76f-hbv89   0/1     Pending   0          0s
k8s-zero-downtime-55c5ddb76f-vwj65   0/1     ContainerCreating   0          0s
k8s-zero-downtime-55c5ddb76f-hbv89   0/1     ContainerCreating   0          0s
k8s-zero-downtime-55c5ddb76f-hbv89   0/1     Running             0          8s
k8s-zero-downtime-55c5ddb76f-vwj65   0/1     Running             0          8s
k8s-zero-downtime-55c5ddb76f-vwj65   0/1     Running             0          26s
k8s-zero-downtime-55c5ddb76f-vwj65   1/1     Running             0          26s
k8s-zero-downtime-55c5ddb76f-hbv89   0/1     Running             0          31s
k8s-zero-downtime-55c5ddb76f-hbv89   1/1     Running             0          31s

Let’s confirm the app is running by invoking one of the endpoints via curl:

Listing 3.9 Terminal

curl http://129.212.160.13/pets

{"status":true,"message":"Operation Successful","data":[],"code":200}%

The HTTP 200 OK response is an indication of success, and now we can proceed to test the behaviour when applying a new change.

The current version of the deployed application terminates immediately upon receiving a SIGTERM signal from the K8s control plane.

For us to observe this, we will run an Apache JMeter load test, and while the test is still running, we will initiate a new deployment.

We will use simple load parameters: 20 users, a ramp-up period of 20 seconds, and 700 loops. This simulates 20 users submitting concurrent requests to the application.

The JMeter test will call the POST /pets endpoint to create a Pet and then call the GET /pets endpoint to list all the Pets.

We will run the JMeter test using the following command and observe the summary output on the console:

Listing 3.10 Terminal

jmeter -n -t k8s_zero_downtime_test.jmx -l log.jtl -e -o ./outputs

While the test is running, we will increase the value of spec.template.metadata.annotations.deploymentVersion by 1 to simulate a change and apply the deployment file again using kubectl apply -f deployment/deployment.yaml.

We will then observe the output of the JMeter console as the deployment is being rolled out:

Listing 3.11 Terminal

WARNING: package sun.awt.X11 not in java.desktop
Mar 08 17:06:14 Creating summariser <summary>
Mar 08 17:06:14 Created the tree successfully using k8s_zero_downtime_test.jmx
Mar 08 17:06:14 Starting standalone test @ 2026 Mar 8 17:06:14 WAT (1772985974768)
Mar 08 17:06:14 Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
Mar 08 17:06:30 summary +    394 in 00:00:15 =   25.9/s Avg:   278 Min:   146 Max:  3449 Err:     0 (0.00%) Active: 15 Started: 15 Finished: 0
Mar 08 17:07:00 summary +   1271 in 00:00:30 =   42.4/s Avg:   461 Min:   141 Max:  8509 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
Mar 08 17:07:00 summary =   1665 in 00:00:45 =   36.8/s Avg:   418 Min:   141 Max:  8509 Err:     0 (0.00%)
Mar 08 17:07:30 summary +   1230 in 00:00:30 =   41.0/s Avg:   452 Min:   143 Max:  8355 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
Mar 08 17:07:30 summary =   2895 in 00:01:15 =   38.5/s Avg:   433 Min:   141 Max:  8509 Err:     0 (0.00%)
Mar 08 17:08:00 summary +   1321 in 00:00:30 =   44.0/s Avg:   460 Min:    74 Max:  8197 Err:    26 (1.97%) Active: 20 Started: 20 Finished: 0
....

The errors that appear in the listing above occur because the Spring Boot application terminates as soon as it receives the SIGTERM signal, regardless of any in-flight HTTP traffic being processed.

This leads to read timeout, connection reset, connection refused, or similar errors. It can also create incomplete or duplicate data and an inconsistent experience for the end user.

The severity of this issue depends on the industry and the volume of user traffic. For critical systems where this is not acceptable, engineers devise workarounds like deploying late at night or during off-peak periods.

What will happen in the case of an incident that requires a change during peak traffic?

It means we have to disrupt the experience for ALL our customers or wait till midnight, by which time the damage and revenue cost may be severe.

This can easily translate to a high mean-time-to-recover (MTTR), and the organisation may not be able to retain its customers and compete adequately in the marketplace.

What if there’s a way to ensure the application completes its current requests before terminating? This would grant engineering teams the ability to deploy changes at any time of day without fear of causing transient downtime.

4. Zero-Downtime Configurations

The goal is that, in case of new deployments, we want our application to complete whatever request it is processing before terminating.

We will start by adding terminationGracePeriodSeconds to the deployment spec and setting its value to 60.

The termination grace period is how long the K8s control plane waits for our application to exit gracefully after the initial SIGTERM signal. Once this period elapses, the application is forcefully terminated with SIGKILL.
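Under the hood, SIGTERM triggers the JVM’s orderly shutdown sequence, during which registered shutdown hooks run (Spring Boot registers one that closes the application context, which is what publishes the ContextClosedEvent we exploit later). A minimal plain-Java illustration of that sequence:

```java
// Demonstrates the JVM shutdown sequence that SIGTERM kicks off:
// hooks registered via addShutdownHook run after main() returns,
// or when the process receives SIGTERM.
public class ShutdownDemo {

    // the work we want to finish before the JVM exits
    static String drainMessage() {
        return "shutdown hook: draining in-flight work";
    }

    public static void main(String[] args) {
        Runtime.getRuntime().addShutdownHook(new Thread(() ->
                System.out.println(drainMessage())));
        System.out.println("main done");
        // prints "main done", then the hook line as the JVM exits
    }
}
```

Any time the hook spends working (or sleeping) delays process exit, which is exactly the lever we will pull in section 4.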

By default, this value is 30 seconds. However, knowing how long it typically takes our application to process a single request, we can buy additional time by providing a custom value that suits our business environment.
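One simple way to reason about the value is to take the slowest request duration you observe in your APM and add a safety buffer. The helper below and its buffer figure are illustrative assumptions, not an official formula:

```java
// Hypothetical helper for sizing terminationGracePeriodSeconds:
// slowest observed request duration (from your APM) plus a safety buffer.
public class GracePeriodCalculator {

    static long suggestedGracePeriodSeconds(double slowestRequestSeconds, long bufferSeconds) {
        // round the request duration up to whole seconds before adding the buffer
        return (long) Math.ceil(slowestRequestSeconds) + bufferSeconds;
    }

    public static void main(String[] args) {
        // e.g. the slowest request takes 8.5 s and we keep a 30 s buffer
        System.out.println(suggestedGracePeriodSeconds(8.5, 30)); // prints 39
    }
}
```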

It is important that we define the startupProbe, livenessProbe, and readinessProbe under the containers key in the deployment.yaml file.

We must use the correct health check URL when defining them. Do not just use the generic /actuator/health for everything; for example, the readinessProbe should use /actuator/health/readiness.

This enables K8s to determine the correct runtime state of our application and to know when to send new traffic to it.
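For the /actuator/health/liveness and /actuator/health/readiness endpoints to exist, the probe health groups must be enabled. Spring Boot enables them automatically when it detects it is running on Kubernetes; to expose them in other environments too, you can set the following in application.yaml (a sketch, assuming the standard Actuator properties):

```yaml
management:
  endpoint:
    health:
      probes:
        enabled: true   # expose /actuator/health/{liveness,readiness} everywhere
```

You can also widen the readiness group, e.g. management.endpoint.health.group.readiness.include=readinessState,db, so the pod is only marked ready once its database connection is healthy.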

Listing 4.1 deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: k8s-zero-downtime
  name: k8s-zero-downtime
spec:
  replicas: 2
  selector:
    matchLabels:
      app: k8s-zero-downtime
  template:
    metadata:
      labels:
        app: k8s-zero-downtime
      annotations:
        deploymentVersion: '9'
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: k8s-zero-downtime
          image: seunmatt/k8s-zero-downtime:1.0.2
          imagePullPolicy: Always
          resources:
            limits:
              memory: "1Gi"
            requests:
              memory: "500Mi"
              cpu: "100m"
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          startupProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: http
            initialDelaySeconds: 10
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /actuator/health/readiness
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5

The last bit of change is in the Spring Boot application itself. We will add a ContextClosedListener. This listener is invoked when the application receives a SIGTERM signal from the container runtime.

The technique here is to sleep the thread for a period equal to the configured terminationGracePeriodSeconds. This keeps the Spring Boot application from shutting down and allows the current HTTP requests to complete processing.

Since we’ve configured the same terminationGracePeriodSeconds in the deployment config, we know that the forceful SIGKILL signal will not arrive until the terminationGracePeriodSeconds has elapsed.

We also know that new requests will not reach the dying pod, since its status changes to Terminating following the first SIGTERM signal.

These facts work together to ensure our application completes its current process before exiting, thereby ensuring a smooth experience for the customers.
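As an aside, Spring Boot (2.3+) also ships a built-in graceful shutdown mode that can complement this technique: with server.shutdown set to graceful, the embedded web server stops accepting new requests on shutdown and waits for active ones to finish, up to a configurable timeout. A sketch of the relevant application.yaml properties (the 30s value is an assumption; keep it below the terminationGracePeriodSeconds):

```yaml
server:
  shutdown: graceful                  # stop accepting new requests, drain active ones
spring:
  lifecycle:
    timeout-per-shutdown-phase: 30s   # must fit within terminationGracePeriodSeconds
```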

Listing 4.2 ContextClosedListener.java

import java.time.Duration;
import java.util.Objects;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.cloud.CloudPlatform;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.core.env.Environment;
import org.springframework.stereotype.Component;

@Component
public class ContextClosedListener {

    private static final Logger logger = LoggerFactory.getLogger(ContextClosedListener.class);

    @Value("${app.lifecycle.preStopDelayInSeconds}")
    private int preStopDelayInSeconds;

    private final Environment environment;

    public ContextClosedListener(Environment environment) {
        this.environment = environment;
    }

    @EventListener(ContextClosedEvent.class)
    public void onContextClosed(ContextClosedEvent event) {

        if (Objects.isNull(CloudPlatform.getActive(environment))) {
            // we do not want to delay shutdown when running tests or in a local env.
            return;
        }

        try {
            logger.info("Delaying shutdown for {} seconds", preStopDelayInSeconds);
            Thread.sleep(Duration.ofSeconds(preStopDelayInSeconds));
        } catch (InterruptedException e) {
            // restore the interrupt flag so the shutdown sequence can proceed
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }

    }
}

Take note of the CloudPlatform.getActive(environment) check in the snippet above. We added it to prevent delaying shutdown on your local machine while testing in the IDE or running test cases. You can remove it if you want to observe the behaviour on your machine without deploying.
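The preStopDelayInSeconds value comes from the application configuration. In application.yaml it would look like the following (60 is an assumption, chosen to match the terminationGracePeriodSeconds in the deployment):

```yaml
app:
  lifecycle:
    preStopDelayInSeconds: 60   # keep in sync with terminationGracePeriodSeconds
```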

We will build a new Docker image for the application and tag it as 1.0.2, following the same procedure as in section 2 above. We will then update the deployment.yaml file to use image: seunmatt/k8s-zero-downtime:1.0.2.

To assert this new zero-downtime behaviour, we will run the same JMeter test as before and observe the summary output in the console:

Listing 4.3 Terminal

jmeter -n -t k8s_zero_downtime_test.jmx -l log.jtl -e -o ./outputs

WARNING: package sun.awt.X11 not in java.desktop
Mar 08 19:09:08 Creating summariser <summary>
Mar 08 19:09:08 Created the tree successfully using k8s_zero_downtime_test.jmx
Mar 08 19:09:08 Starting standalone test @ 2026 Mar 8 19:09:08 WAT (1772993348621)
Mar 08 19:09:08 Waiting for possible Shutdown/StopTestNow/HeapDump/ThreadDump message on port 4445
Mar 08 19:09:30 summary +    555 in 00:00:21 =   26.0/s Avg:   372 Min:   146 Max:  8148 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
Mar 08 19:10:00 summary +   1460 in 00:00:30 =   48.7/s Avg:   409 Min:   143 Max:  8928 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
Mar 08 19:10:00 summary =   2015 in 00:00:51 =   39.2/s Avg:   399 Min:   143 Max:  8928 Err:     0 (0.00%)
Mar 08 19:10:30 summary +   1221 in 00:00:30 =   40.7/s Avg:   475 Min:   140 Max:  8294 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
Mar 08 19:10:30 summary =   3236 in 00:01:21 =   39.8/s Avg:   427 Min:   140 Max:  8928 Err:     0 (0.00%)
Mar 08 19:11:00 summary +   1368 in 00:00:30 =   45.6/s Avg:   440 Min:   140 Max:  9968 Err:     0 (0.00%) Active: 20 Started: 20 Finished: 0
....
Mar 08 19:20:45 Tidying up ...    @ 2026 Mar 8 19:20:45 WAT (1772994045812)
Mar 08 19:20:46 ... end of run

From the summary output, we can see that zero errors were recorded this time, unlike the deployment observed in section 3.

5. FAQs

  1. Will this not slow down deployment time? No, it won’t, provided you use the right values for the termination grace period and the health check probes. If the application requires a longer time, it may be worth re-evaluating the business logic so a single request can be processed quickly.
    Moreover, it’s always a trade-off, and this may not apply to ALL your applications.

  2. Will this work with other platforms or frameworks? The core principle described in this article will work for other applications and frameworks, too. For example, if the application’s Docker image has a shell, we could execute a script in the container’s preStop lifecycle hook that simply sleeps (note that the termination grace period countdown includes the preStop hook, so keep the sleep within it). We could even invoke another HTTP endpoint that sleeps before returning a response.

  3. Why am I still seeing downtime during rolling updates? Ensure you configure the deployment to have at least 2 replicas; depending on the number of requests per second the service handles, you can go up to 4. If the application has a slow startup, configure the container’s startupProbe with a commensurate initial delay in seconds. Check your APM for how long it takes to complete a single request, and adjust the termination grace period accordingly. Remember to adjust it in both the deployment.yaml file and the Spring Boot application config.

  4. Do I need readiness and liveness probes for zero downtime? Yes, you do. Without a properly configured readiness probe, Kubernetes cannot correctly tell when a pod is fully started and ready to accept traffic. It can prematurely assume that a pod is ready and route traffic to it, leading to 503 errors. It will also terminate old pods prematurely because it believes the new ones are ready. This leads to unexpected behaviour and disruption of service for end customers.

  5. Is zero downtime actually possible or just theoretical? Yes, it is possible. It requires the right configurations to be put in place, as discussed extensively in this article.
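The shell-based alternative mentioned in FAQ 2 can be sketched as a preStop hook in the container spec. This assumes the image ships a shell, and the sleep value (here 50) is an assumption chosen to fit within a 60-second terminationGracePeriodSeconds, since the grace period countdown includes the preStop hook:

```yaml
lifecycle:
  preStop:
    exec:
      # delay SIGTERM delivery to the app so in-flight requests can drain
      command: ["sh", "-c", "sleep 50"]
```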

This article is sponsored by Digital Ocean as part of the DigitalOcean Ripple Writers program. All technical assessments, code, and opinions are my own and are based on my hands-on experience.

6. Conclusion

The terminationGracePeriodSeconds is crucial to this whole setup. Ensure that you set a value that applies to your business context.

In this article, we’ve used 60 seconds; some applications might need up to 120 seconds for increased confidence. This value should be informed by how long a normal process takes to complete.

If the application takes 60 seconds to process a single request, then the terminationGracePeriodSeconds should be greater than 60 seconds.

Without the health probes and the right replica count, the ensemble may not reliably behave as expected, so take care to configure it properly.

The complete source code, including the JMeter load test file is available on GitHub.

Thank you and Happy Coding!

Seun Matt

Results-driven engineer, dedicated to building elite teams that consistently achieve business objectives and drive profitability. With over 9 years of experience spanning different facets of the FinTech space, including digital lending, consumer payments, collections, and payment gateways, using Java/Spring Boot technologies, PHP, and Ruby on Rails.