StatefulSets

A StatefulSet provides a unique identity to the Pods that they manage. StatefulSet s are particularly useful when your application requires a unique network identifier or persistent storage across Pod (re)scheduling or when your application needs some guarantee about the ordering of deployment and scaling.

One of the most typical examples of using StatefulSet s is when one needs to deploy primary/secondary servers (i.e database cluster) where you need to know beforehand the hostname of each of the servers to start the cluster. Also, when you scale up and down you want to do it in a specified order (i.e you want to start the primary node first and then the secondary node).

StatefulSet requires a Kubernetes Headless Service instead of a standard Kubernetes service in order for it to be accessed. We will discuss this more below

Preparation

Namespace Setup

Make sure you are in the correct namespace:

You will need to create the myspace if you haven’t already. Check for the existence of the namespace with

kubectl get ns myspace

If the response is:

Error from server (NotFound): namespaces "myspace" not found

Then you can create the namespace with:

kubectl create ns myspace
kubectl config set-context --current --namespace=myspace

Watch Terminal

To be able to observe what’s going on, let’s open another terminal (Terminal 2) and watch what happens as we run our different jobs

  • Terminal 2

watch -n 1 "kubectl get pods -o wide \(1)
  | awk '{print \$1 \" \" \$2 \" \" \$3 \" \" \$5 \" \" \$7}' | column -t" (2)
1 the -o wide option allows us to see the node that the pod is schedule to
2 to keep the line from getting too long we’ll use awk and column to get and format only the columns we want

Multi-node (minikube)

If your cluster is running multiple nodes and you need the stateful service to be assigned to a specific node so that you can connect to the service externally, replace NODE with the name of the node you don’t want to run the stateful service on

NODE=devnation-02 (1)
kubectl taint node ${NODE} app=quarkus-statefulset:NoExecute
1 Replace this and/or repeat for all the nodes in your cluster that you don’t want the stateful set to be assigned to. See also Taints and Affinity section

StatefulSet

StatefulSet is created by using the Kubernetes StatefulSet resource:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: quarkus-statefulset
  labels:
    app: quarkus-statefulset
spec:
  serviceName: "quarkus" (1)
  replicas: 2
  template:
    metadata:
      labels:
        app: quarkus-statefulset
    spec:
      containers:
      - name: quarkus-statefulset
        image: quay.io/rhdevelopers/quarkus-demo:v1
        ports:
        - containerPort: 8080
          name: web
1 serviceName is the name of the (headless) service that governs this StatefulSet. This service must exist before the StatefulSet, and is responsible for the network identity of the set

We can predict the hostname for any member pod of a StatefulSet by using the following "formula":

StatefulSet.name + - + "ordinal index"

The "ordinal index" is a number starting from 0 for the first pod created by the StatefulSet and is incremented by one for each additional replica pod. So in this instance, the we would expect the first pod of the StatefulSet above to have the hostname:

quarkus-statefulset-0

Finally, as mentioned above, to be able to route traffic to the pods of our StatefulSet, we also need to create a headless service:

apiVersion: v1
kind: Service
metadata:
  name: quarkus (1)
  labels:
    app: quarkus-statefulset
spec:
  ports:
  - port: 8080
    name: web
  clusterIP: None (2)
  selector:
    app: quarkus-statefulset
1 Notice that this matches the serviceName field of the StatefulSet. This must match to create the dns entry
2 Setting clusterIP to None is what makes the service "headless".

Apply the following .yaml to the cluster to create the StatefulSet and the corresponding headless service we looked at above:

kubectl apply -f apps/kubefiles/quarkus-statefulset.yaml

You should then see the following in the watch terminal

  • Terminal 2

NAME                     READY   STATUS    RESTARTS   AGE
quarkus-statefulset-0   1/1     Running   0          12s

Notice that the Pod name is the serviceName with a -0, as it is the first (0 th if you will) instance. This is as we explained above

Now let’s take a look at the stateful set itself

kubectl get statefulsets
NAME                  READY   AGE
quarkus-statefulset   1/1     109s

As with deployments we can scale statefulsets

kubectl scale sts \(1)
  quarkus-statefulset --replicas=3
1 sts is the shortname of the statefulset api-resource

Then in the watch terminal see

  • Terminal 2

NAME                    READY   STATUS    RESTARTS   AGE
quarkus-statefulset-0   1/1     Running   0          95s
quarkus-statefulset-1   1/1     Running   0          2s
quarkus-statefulset-2   1/1     Running   0          1s

Notice that the name of the Pods continues to use the same nomenclature that we called out above

Also, if you check the order of events in the Kubernetes cluster, you’ll notice that the Pod name ending with -1 is created before those with higher ordinal index (e.g. with suffix of -2).

kubectl get events --sort-by=.metadata.creationTimestamp
4m4s        Normal   SuccessfulCreate          statefulset/quarkus-statefulset   create Pod quarkus-statefulset-1 in StatefulSet quarkus-statefulset successful
4m3s        Normal   Pulled                    pod/quarkus-statefulset-1         Container image "quay.io/rhdevelopers/quarkus-demo:v1" already present on machine
4m3s        Normal   Scheduled                 pod/quarkus-statefulset-2         Successfully assigned default/quarkus-statefulset-2 to kube
4m3s        Normal   Created                   pod/quarkus-statefulset-1         Created container quarkus-statefulset
4m3s        Normal   Started                   pod/quarkus-statefulset-1         Started container quarkus-statefulset
4m3s        Normal   SuccessfulCreate          statefulset/quarkus-statefulset   create Pod quarkus-statefulset-2 in StatefulSet quarkus-statefulset successful
4m2s        Normal   Pulled                    pod/quarkus-statefulset-2         Container image "quay.io/rhdevelopers/quarkus-demo:v1" already present on machine
4m2s        Normal   Created                   pod/quarkus-statefulset-2         Created container quarkus-statefulset
4m2s        Normal   Started                   pod/quarkus-statefulset-2         Started container quarkus-statefulset

Stable Network Identities

The reason we created the headless service previously was to ensure that the pods of our stateful set can be found within the cluster (see Exposing StatefulSets for reaching services from outside the cluster).

As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet[1]

We can test this by creating a pod within the cluster and doing an nslookup from within the cluster. Run the following command to create a pod in the namespace in which we can run cluster local nslookup queries

kubectl run -it --restart=Never --rm --image busybox:1.28 dns-test

From within the container, run the folllowing command to see if we can find a pod of our StatefulSet

  • Container

nslookup quarkus-statefulset-0.quarkus

This should yield the following output (though your reported IP address will vary)

Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      quarkus-statefulset-0.quarkus
Address 1: 172.17.0.3 quarkus-statefulset-0.quarkus.myspace.svc.cluster.local (1)
1 Notice that the full address is $(podname).$(governing service domain).$(namespace).svc.cluster.local

You can now exit the pod (causing it to be cleaned up) by issuing the following command:

exit

So with the help of a headless service we can find any pod of the StatefulSet by using its internal DNS name as formulated by the StatefulSet and the headless service.

Exposing StatefulSets

Given that our stateful set needed to use a headless service, you’ll notice that no external IP is assigned that we can use to access our pods from outside the cluster

kubectl describe svc quarkus-statefulset
Name:              quarkus-statefulset
Namespace:         myspace
Labels:            app=quarkus-statefulset
Annotations:       <none>
Selector:          app=quarkus-statefulset
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              web  8080/TCP
TargetPort:        8080/TCP
Endpoints:         172.17.0.3:8080,172.17.0.4:8080,172.17.0.5:8080
Session Affinity:  None
Events:            <none>

Instead, only (internal) endpoints are assigned

kubectl describe endpoints quarkus-statefulset
Name:         quarkus-statefulset
Namespace:    myspace
Labels:       app=quarkus-statefulset
              service.kubernetes.io/headless=
Annotations:  endpoints.kubernetes.io/last-change-trigger-time: 2021-07-20T04:45:21Z
Subsets:
  Addresses:          172.17.0.3,172.17.0.4,172.17.0.5
  NotReadyAddresses:  <none>
  Ports:
    Name  Port  Protocol
    ----  ----  --------
    web   8080  TCP

Events:  <none>

This kind of makes sense since the whole point of using StatefulSets is so that we can reference a specific pod by a predictable name instead of having them abstracted away by a normal (non-headless) Service. To assist with our ability to access pods by name, kubernetes exposes a label on all StatefulSet pods that we can use as a selector to our service

kubectl describe pod quarkus-statefulset-2

And the abbreviated output shows our label (highlighted)

Name:         quarkus-statefulset-2
Namespace:    myspace
Priority:     0
Node:         devnation/192.168.49.2
Start Time:   Tue, 20 Jul 2021 04:45:04 +0000
Labels:       app=quarkus-statefulset
              controller-revision-hash=quarkus-statefulset-6bf5d59699
              statefulset.kubernetes.io/pod-name=quarkus-statefulset-2
Annotations:  <none>

We can use this label as a selector for a service that targets this specific pod. Take a look at quarkus-statefulset-external-svc.yaml:

If you’re running this from within VSCode you can use CTRL+p (or CMD+p on Mac OSX) to quickly open quarkus-statefulset-external-svc.yaml

quarkus-statefulset-external-svc.yaml
apiVersion: v1
kind: Service
metadata:
  name: quarkus-statefulset-2
spec:
  type: LoadBalancer (1)
  externalTrafficPolicy: Local (2)
  selector:
    statefulset.kubernetes.io/pod-name: quarkus-statefulset-2 (3)
  ports:
  - port: 8080
    name: web
1 Indicate that this service should be exposed via LoadBalancer
2 Prevent excessive hops by routing traffic directly to the node
3 A selector that leverages the label provided automatically by the Kubernetes StatefulSet functionality

Having reviewed the service we can now create it:

kubectl apply -f apps/kubefiles/quarkus-statefulset-external-svc.yaml

Meanwhile, in the main terminal, send a request:

  • Minikube

  • Hosted

IP=$(minikube ip -p devnation)
PORT=$(kubectl get service/quarkus-statefulset-2 -o jsonpath="{.spec.ports[*].nodePort}")

If using a hosted Kubernetes cluster like OpenShift then use curl and the EXTERNAL-IP address with port 8080 or get it using kubectl:

IP=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
PORT=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.spec.ports[*].port}")
If you are in AWS, you need to get the hostname instead of ip.
IP=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.status.loadBalancer.ingress[0].hostname}")

Curl the Service:

curl $IP:$PORT

You should receive the following back

Supersonic Subatomic Java with Quarkus quarkus-statefulset-2:1 (1)
1 Notice the hostname of quarkus-statefulset-2. This is part of why we used stateful sets in the first place, so that pods would get predictable hostnames

Scale Down and Cleanup

Finally, if we scale down to two instances, the one that is destroyed is not randomly chosen, but the one started later (quarkus-statefulset-2).

kubectl scale sts quarkus-statefulset --replicas=2
  • Terminal 2

NAME                    READY   STATUS        RESTARTS   AGE
quarkus-statefulset-0   1/1     Running       0          9m22s
quarkus-statefulset-1   1/1     Running       0          7m49s
quarkus-statefulset-2   0/1     Terminating   0          7m48s

Beware when using stateful sets and services that this could break things. Remember that the service we created above referenced that exact pod in the stateful set. If you try to reach it now

curl ${IP}:${PORT}

You’ll get an error (perhaps like this one)

curl: (7) Failed to connect to 192.168.86.58 port 31834: Connection refused

Clean Up

You’ve now reached the end of this section. You can clean up all aspects of the statefulset by deleting the yaml that spawned it (as well as the external service)

kubectl delete -f apps/kubefiles/quarkus-statefulset.yaml
kubectl delete -f apps/kubefiles/quarkus-statefulset-external-svc.yaml

1. See also the official Kubernetes documentation here