StatefulSets
A StatefulSet
provides a unique identity to the Pods that they manage.
StatefulSet
s are particularly useful when your application requires a unique network identifier or persistent storage across Pod (re)scheduling or when your application needs some guarantee about the ordering of deployment and scaling.
One of the most typical examples of using StatefulSet
s is when one needs to deploy primary/secondary servers (i.e database cluster) where you need to know beforehand the hostname of each of the servers to start the cluster.
Also, when you scale up and down you want to do it in a specified order (i.e you want to start the primary node first and then the secondary node).
|
Preparation
Namespace Setup
Make sure you are in the correct namespace:
You will need to create the
If the response is:
Then you can create the namespace with:
|
kubectl config set-context --current --namespace=myspace
Watch Terminal
To be able to observe what’s going on, let’s open another terminal (Terminal 2) and watch
what happens as we run our different jobs
watch -n 1 "kubectl get pods -o wide \(1)
| awk '{print \$1 \" \" \$2 \" \" \$3 \" \" \$5 \" \" \$7}' | column -t" (2)
1 | the -o wide option allows us to see the node that the pod is schedule to |
2 | to keep the line from getting too long we’ll use awk and column to get and format only the columns we want |
Multi-node (minikube)
If your cluster is running multiple nodes and you need the stateful service to be assigned to a specific node so that you can connect to the service externally, replace NODE
with the name of the node you don’t want to run the stateful service on
NODE=devnation-02 (1)
kubectl taint node ${NODE} app=quarkus-statefulset:NoExecute
1 | Replace this and/or repeat for all the nodes in your cluster that you don’t want the stateful set to be assigned to. See also Taints and Affinity section |
StatefulSet
StatefulSet is created by using the Kubernetes StatefulSet
resource:
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
name: quarkus-statefulset
labels:
app: quarkus-statefulset
spec:
serviceName: "quarkus" (1)
replicas: 2
template:
metadata:
labels:
app: quarkus-statefulset
spec:
containers:
- name: quarkus-statefulset
image: quay.io/rhdevelopers/quarkus-demo:v1
ports:
- containerPort: 8080
name: web
1 | serviceName is the name of the (headless) service that governs this StatefulSet . This service must exist before the StatefulSet, and is responsible for the network identity of the set |
We can predict the hostname for any member pod of a StatefulSet
by using the following "formula":
The "ordinal index" is a number starting from 0
for the first pod created by the StatefulSet
and is incremented by one for each additional replica pod. So in this instance, the we would expect the first pod of the StatefulSet
above to have the hostname:
Finally, as mentioned above, to be able to route traffic to the pods of our StatefulSet, we also need to create a headless service:
apiVersion: v1
kind: Service
metadata:
name: quarkus (1)
labels:
app: quarkus-statefulset
spec:
ports:
- port: 8080
name: web
clusterIP: None (2)
selector:
app: quarkus-statefulset
1 | Notice that this matches the serviceName field of the StatefulSet . This must match to create the dns entry |
2 | Setting clusterIP to None is what makes the service "headless". |
Apply the following .yaml
to the cluster to create the StatefulSet
and the corresponding headless service we looked at above:
kubectl apply -f apps/kubefiles/quarkus-statefulset.yaml
You should then see the following in the watch terminal
Notice that the Pod name is the serviceName
with a -0
, as it is the first (0
th if you will) instance. This is as we explained above
Now let’s take a look at the stateful set itself
kubectl get statefulsets
NAME READY AGE
quarkus-statefulset 1/1 109s
As with deployments
we can scale statefulsets
kubectl scale sts \(1)
quarkus-statefulset --replicas=3
1 | sts is the shortname of the statefulset api-resource |
Then in the watch terminal see
NAME READY STATUS RESTARTS AGE
quarkus-statefulset-0 1/1 Running 0 95s
quarkus-statefulset-1 1/1 Running 0 2s
quarkus-statefulset-2 1/1 Running 0 1s
Notice that the name of the Pods continues to use the same nomenclature that we called out above
Also, if you check the order of events in the Kubernetes cluster, you’ll notice that the Pod name ending with -1
is created before those with higher ordinal index (e.g. with suffix of -2
).
kubectl get events --sort-by=.metadata.creationTimestamp
4m4s Normal SuccessfulCreate statefulset/quarkus-statefulset create Pod quarkus-statefulset-1 in StatefulSet quarkus-statefulset successful
4m3s Normal Pulled pod/quarkus-statefulset-1 Container image "quay.io/rhdevelopers/quarkus-demo:v1" already present on machine
4m3s Normal Scheduled pod/quarkus-statefulset-2 Successfully assigned default/quarkus-statefulset-2 to kube
4m3s Normal Created pod/quarkus-statefulset-1 Created container quarkus-statefulset
4m3s Normal Started pod/quarkus-statefulset-1 Started container quarkus-statefulset
4m3s Normal SuccessfulCreate statefulset/quarkus-statefulset create Pod quarkus-statefulset-2 in StatefulSet quarkus-statefulset successful
4m2s Normal Pulled pod/quarkus-statefulset-2 Container image "quay.io/rhdevelopers/quarkus-demo:v1" already present on machine
4m2s Normal Created pod/quarkus-statefulset-2 Created container quarkus-statefulset
4m2s Normal Started pod/quarkus-statefulset-2 Started container quarkus-statefulset
Stable Network Identities
The reason we created the headless service previously was to ensure that the pods of our stateful set can be found within the cluster (see Exposing StatefulSets for reaching services from outside the cluster).
As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain)
, where the governing service is defined by the serviceName
field on the StatefulSet[1]
We can test this by creating a pod within the cluster and doing an nslookup
from within the cluster. Run the following command to create a pod in the namespace in which we can run cluster local nslookup
queries
kubectl run -it --restart=Never --rm --image busybox:1.28 dns-test
From within the container, run the folllowing command to see if we can find a pod of our StatefulSet
nslookup quarkus-statefulset-0.quarkus
This should yield the following output (though your reported IP address will vary)
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: quarkus-statefulset-0.quarkus
Address 1: 172.17.0.3 quarkus-statefulset-0.quarkus.myspace.svc.cluster.local (1)
1 | Notice that the full address is $(podname).$(governing service domain).$(namespace) .svc.cluster.local |
You can now exit the pod (causing it to be cleaned up) by issuing the following command:
exit
So with the help of a headless service we can find any pod of the StatefulSet by using its internal DNS name as formulated by the StatefulSet and the headless service.
Exposing StatefulSets
Given that our stateful set needed to use a headless service, you’ll notice that no external IP is assigned that we can use to access our pods from outside the cluster
kubectl describe svc quarkus-statefulset
Name: quarkus-statefulset
Namespace: myspace
Labels: app=quarkus-statefulset
Annotations: <none>
Selector: app=quarkus-statefulset
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: None
IPs: None
Port: web 8080/TCP
TargetPort: 8080/TCP
Endpoints: 172.17.0.3:8080,172.17.0.4:8080,172.17.0.5:8080
Session Affinity: None
Events: <none>
Instead, only (internal) endpoints are assigned
kubectl describe endpoints quarkus-statefulset
Name: quarkus-statefulset
Namespace: myspace
Labels: app=quarkus-statefulset
service.kubernetes.io/headless=
Annotations: endpoints.kubernetes.io/last-change-trigger-time: 2021-07-20T04:45:21Z
Subsets:
Addresses: 172.17.0.3,172.17.0.4,172.17.0.5
NotReadyAddresses: <none>
Ports:
Name Port Protocol
---- ---- --------
web 8080 TCP
Events: <none>
This kind of makes sense since the whole point of using StatefulSets
is so that we can reference a specific pod by a predictable name instead of having them abstracted away by a normal (non-headless) Service
. To assist with our ability to access pods by name, kubernetes exposes a label on all StatefulSet
pods that we can use as a selector to our service
kubectl describe pod quarkus-statefulset-2
And the abbreviated output shows our label (highlighted)
Name: quarkus-statefulset-2
Namespace: myspace
Priority: 0
Node: devnation/192.168.49.2
Start Time: Tue, 20 Jul 2021 04:45:04 +0000
Labels: app=quarkus-statefulset
controller-revision-hash=quarkus-statefulset-6bf5d59699
statefulset.kubernetes.io/pod-name=quarkus-statefulset-2
Annotations: <none>
We can use this label as a selector for a service that targets this specific pod. Take a look at quarkus-statefulset-external-svc.yaml
:
If you’re running this from within VSCode you can use CTRL+p (or CMD+p on Mac OSX) to quickly open |
apiVersion: v1
kind: Service
metadata:
name: quarkus-statefulset-2
spec:
type: LoadBalancer (1)
externalTrafficPolicy: Local (2)
selector:
statefulset.kubernetes.io/pod-name: quarkus-statefulset-2 (3)
ports:
- port: 8080
name: web
1 | Indicate that this service should be exposed via LoadBalancer |
2 | Prevent excessive hops by routing traffic directly to the node |
3 | A selector that leverages the label provided automatically by the Kubernetes StatefulSet functionality |
Having reviewed the service we can now create it:
kubectl apply -f apps/kubefiles/quarkus-statefulset-external-svc.yaml
Meanwhile, in the main terminal, send a request:
IP=$(minikube ip -p devnation)
PORT=$(kubectl get service/quarkus-statefulset-2 -o jsonpath="{.spec.ports[*].nodePort}")
If using a hosted Kubernetes cluster like OpenShift then use curl and the EXTERNAL-IP address with port 8080
or get it using kubectl
:
IP=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.status.loadBalancer.ingress[0].ip}")
PORT=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.spec.ports[*].port}")
If you are in AWS, you need to get the hostname instead of ip.
|
IP=$(kubectl get service quarkus-statefulset-2 -o jsonpath="{.status.loadBalancer.ingress[0].hostname}")
Curl the Service:
curl $IP:$PORT
You should receive the following back
Supersonic Subatomic Java with Quarkus quarkus-statefulset-2:1 (1)
1 | Notice the hostname of quarkus-statefulset-2 . This is part of why we used stateful sets in the first place, so that pods would get predictable hostnames |
Scale Down and Cleanup
Finally, if we scale down to two instances, the one that is destroyed is not randomly chosen, but the one started later (quarkus-statefulset-2
).
kubectl scale sts quarkus-statefulset --replicas=2
NAME READY STATUS RESTARTS AGE
quarkus-statefulset-0 1/1 Running 0 9m22s
quarkus-statefulset-1 1/1 Running 0 7m49s
quarkus-statefulset-2 0/1 Terminating 0 7m48s
Beware when using stateful sets and services that this could break things. Remember that the service we created above referenced that exact pod in the stateful set. If you try to reach it now
curl ${IP}:${PORT}
You’ll get an error (perhaps like this one)
curl: (7) Failed to connect to 192.168.86.58 port 31834: Connection refused
Clean Up
You’ve now reached the end of this section. You can clean up all aspects of the statefulset by deleting the yaml that spawned it (as well as the external service)
kubectl delete -f apps/kubefiles/quarkus-statefulset.yaml
kubectl delete -f apps/kubefiles/quarkus-statefulset-external-svc.yaml