Jobs & CronJobs

Most of the time, you use Kubernetes as a platform to run long-lived processes whose purpose is to serve responses to incoming requests.

But Kubernetes also lets you run processes whose purpose is to execute some logic (e.g. update a database, run a batch process, …​) and then terminate.

Kubernetes Jobs are tasks that execute some logic once.

Kubernetes CronJobs are Jobs that are repeated following a Cron pattern.

Preparation

Namespace Setup

Make sure you are in the correct namespace:

You will need to create the myspace namespace if you haven’t already. Check for its existence with:

kubectl get ns myspace

If the response is:

Error from server (NotFound): namespaces "myspace" not found

Then you can create the namespace with:

kubectl create ns myspace
kubectl config set-context --current --namespace=myspace
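If you want to confirm which namespace your current context points at, kubectl can print it directly (this only reads your local kubeconfig; an empty result means the context has no namespace set and will default to default):

```shell
# Print the namespace of the current kubectl context
kubectl config view --minify --output 'jsonpath={..namespace}'; echo
```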

Watch Terminal

To be able to observe what’s going on, let’s open another terminal (Terminal 2) and watch what happens as we run our different jobs:

  • Terminal 2

watch -n 1 "kubectl get pods -o wide \(1)
  | awk '{print \$1 \" \" \$2 \" \" \$3 \" \" \$5 \" \" \$7}' | column -t" (2)
1 the -o wide option lets us see the node that the pod is scheduled to
2 to keep the line from getting too long, we use awk and column to select and format only the columns we want

Jobs

A Job is created using the Kubernetes Job resource. To examine one, open the whalesay-job.yaml. Here are the interesting aspects of this file:

If you’re running this from within VSCode you can use CTRL+p (or CMD+p on macOS) to quickly open whalesay-job.yaml

whalesay-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: whale-say-job (1)
spec:
  template:
    spec:
      containers:
      - name: whale-say-container
        image: docker/whalesay
        command: ["cowsay","Hello DevNation"]
      restartPolicy: Never
1 The name of the job will be used as the value of a label job-name on any pods that are spawned by this job definition.
  • Terminal 1

kubectl apply -f apps/kubefiles/whalesay-job.yaml

This should yield the following output (in successive refreshes) in Terminal 2

  • Terminal 2

NAME                 READY  STATUS             AGE  NODE
whale-say-job-m8vxt  0/1    ContainerCreating  14s  devnation-m02
NAME                 READY  STATUS     AGE  NODE
whale-say-job-m8vxt  1/1    Running    80s  devnation-m02
NAME                 READY  STATUS     AGE  NODE
whale-say-job-m8vxt  0/1    Completed  85s  devnation-m02

You can get jobs as any other Kubernetes resource:

  • Terminal 1

kubectl get jobs
NAME            COMPLETIONS   DURATION   AGE
whale-say-job   1/1           20s        36s
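Instead of polling the watch window, you can also block until the job reports completion with kubectl wait (a sketch; the 90s timeout here is an arbitrary choice, not from the original material):

```shell
# Block until the job's Complete condition becomes true, or give up after 90s
kubectl wait --for=condition=complete job/whale-say-job --timeout=90s
```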

Since the job is run by a pod, to get the output of the job execution we need only look at the pod’s logs:

kubectl logs \
  -l job-name=whale-say-job \(1)
  --tail=-1 (2)
1 This selects any pod whose job-name label (see above) is set to whale-say-job
2 --tail tells the log command how many lines from the end of the (pod’s) log to return. So that we can see all the whimsy in this job pod’s message, we set this to -1 to see all the lines[1]
 _________________
< Hello DevNation >
 -----------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Clean Up

  • Terminal 1

kubectl delete -f apps/kubefiles/whalesay-job.yaml

CronJobs

A CronJob is defined using the Kubernetes CronJob resource. The name comes from cron, the Unix utility for scheduling a batch process to run once or repeatedly. This concept has been translated into Kubernetes, as we can see in the whalesay-cronjob.yaml file:

If you’re running this from within VSCode you can use CTRL+p (or CMD+p on macOS) to quickly open whalesay-cronjob.yaml

whalesay-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: whale-say-cronjob
spec:
  schedule: "*/1 * * * *" (1)
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            job-type: whale-say (2)
        spec:
          containers:
          - name: whale-say-container
            image: docker/whalesay
            command: ["cowsay","Hello DevNation"]
          restartPolicy: Never
1 This cron expression schedules the job to run every minute.
2 Here we specify our own additional label to be applied to the jobs and pods created by the cronjob. Even though the job-name label will still exist, it will contain a generated suffix on every invocation, meaning we can’t predict its value a priori
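As a reminder of what the five cron fields mean, here is a small shell sketch that splits the schedule string from the file above into its fields (minute, hour, day-of-month, month, day-of-week):

```shell
set -f                   # disable globbing so the *s stay literal
schedule="*/1 * * * *"   # the schedule used in whalesay-cronjob.yaml
set -- $schedule         # split on whitespace into $1..$5
echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
# prints: minute=*/1 hour=* day-of-month=* month=* day-of-week=*
```

Since every field except the minute is a wildcard, and */1 means "every 1 minute", the job fires once per minute.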
  • Terminal 1

kubectl apply -f apps/kubefiles/whalesay-cronjob.yaml

But then if we look to our watch window in Terminal 2

  • Terminal 2

NAME                  READY   STATUS      RESTARTS   AGE

No pod is running yet while the CronJob sets up (the schedule is checked only about once every 10 seconds; see the warning below)

While we’re waiting for our cronjob to run, we can use Terminal 1 to watch how the cronjob is changing:

  • Terminal 1

kubectl get cronjobs -w (1)
1 the -w flag tells kubectl to watch the output (much like what we’re doing in Terminal 2) but to print a new line only when the state of the observed resource (in this case the cronjob) changes.

Here is some representative output after waiting almost 3 minutes (notice the repeated job runs)

NAME                SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
whale-say-cronjob   */1 * * * *   False     1        0s              20s (1)
whale-say-cronjob   */1 * * * *   False     0        31s             51s
whale-say-cronjob   */1 * * * *   False     1        0s              80s (2)
whale-say-cronjob   */1 * * * *   False     0        23s             103s
whale-say-cronjob   */1 * * * *   False     1        1s              2m21s
1 The first invocation took a while to start; this was not a function of the cronjob schedule
2 Notice that the next time the job is active is about 60s after the first job was active (by AGE), and the job after that follows ~60s later

You’ll notice that every time the cronjob moves to ACTIVE (see the highlights above), you should see the following in Terminal 2:

  • Terminal 2

NAME                              READY  STATUS     AGE  NODE
whale-say-cronjob-27108480-2ws6k  0/1    Completed  46s  devnation-m02

Per the official Kubernetes documentation: A cron job creates a job object about once per execution time of its schedule. We say "about" because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.

Let’s examine our cronjob using the describe subcommand. Use CTRL+c to cancel the kubectl get cronjobs -w command and replace it with the following:

kubectl describe cronjobs

You should then see something like this

Name:                          whale-say-cronjob
Namespace:                     myspace
Labels:                        <none>
Annotations:                   <none>
Schedule:                      */1 * * * *
Concurrency Policy:            Allow
Suspend:                       False
Successful Job History Limit:  3 (1)
Failed Job History Limit:      1
Starting Deadline Seconds:     <unset>
Selector:                      <unset>
Parallelism:                   <unset>
Completions:                   <unset>
Pod Template:
  Labels:  job-type=whale-say
  Containers:
   whale-say-container:
    Image:      docker/whalesay
    Port:       <none>
    Host Port:  <none>
    Command:
      cowsay
      Hello DevNation
    Environment:     <none>
    Mounts:          <none>
  Volumes:           <none>
Last Schedule Time:  Sat, 17 Jul 2021 08:06:00 +0000 (2)
Active Jobs:         whale-say-cronjob-27108486
Events:
  Type    Reason            Age    From                Message
  ----    ------            ----   ----                -------
  Normal  SuccessfulCreate  6m21s  cronjob-controller  Created job whale-say-cronjob-27108480
  Normal  SawCompletedJob   6m1s   cronjob-controller  Saw completed job: whale-say-cronjob-27108480, status: Complete
  Normal  SuccessfulCreate  5m21s  cronjob-controller  Created job whale-say-cronjob-27108481
  Normal  SawCompletedJob   4m56s  cronjob-controller  Saw completed job: whale-say-cronjob-27108481, status: Complete
  Normal  SuccessfulCreate  4m21s  cronjob-controller  Created job whale-say-cronjob-27108482
  Normal  SawCompletedJob   3m56s  cronjob-controller  Saw completed job: whale-say-cronjob-27108482, status: Complete
  Normal  SuccessfulCreate  3m21s  cronjob-controller  Created job whale-say-cronjob-27108483
  Normal  SawCompletedJob   2m48s  cronjob-controller  Saw completed job: whale-say-cronjob-27108483, status: Complete
  Normal  SuccessfulDelete  2m46s  cronjob-controller  Deleted job whale-say-cronjob-27108480
  Normal  SuccessfulCreate  2m20s  cronjob-controller  Created job whale-say-cronjob-27108484
  Normal  SawCompletedJob   104s   cronjob-controller  Saw completed job: whale-say-cronjob-27108484, status: Complete
  Normal  SuccessfulDelete  101s   cronjob-controller  Deleted job whale-say-cronjob-27108481
  Normal  SuccessfulCreate  81s    cronjob-controller  Created job whale-say-cronjob-27108485
  Normal  SawCompletedJob   54s    cronjob-controller  Saw completed job: whale-say-cronjob-27108485, status: Complete
  Normal  SuccessfulDelete  52s    cronjob-controller  Deleted job whale-say-cronjob-27108482
  Normal  SuccessfulCreate  21s    cronjob-controller  Created job whale-say-cronjob-27108486
  Normal  SawCompletedJob   1s     cronjob-controller  Saw completed job: whale-say-cronjob-27108486, status: Complete
1 Kubernetes cleans up jobs after a certain amount of time
2 Notice that the Last Schedule Time shows the last time a job was executed.
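The history limits and concurrency policy shown in the describe output are settable fields on the CronJob spec. A hedged sketch of how they could be tuned (the limit values shown are the Kubernetes defaults; the Forbid policy is an illustrative change, not what the original file uses):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: whale-say-cronjob
spec:
  schedule: "*/1 * * * *"
  successfulJobsHistoryLimit: 3  # keep the 3 most recent successful jobs (default)
  failedJobsHistoryLimit: 1      # keep only the most recent failed job (default)
  concurrencyPolicy: Forbid      # skip a run if the previous one is still active
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: whale-say-container
            image: docker/whalesay
            command: ["cowsay","Hello DevNation"]
          restartPolicy: Never
```

The history limits explain the SuccessfulDelete events above: the controller prunes old jobs once more than the limit have completed.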

It is important to notice that a CronJob creates a Job (which, in turn, creates pods) whenever the schedule fires:

  • Terminal 1

kubectl get jobs

With example output after the cronjob has been around for more than 3 minutes:

NAME                         COMPLETIONS   DURATION   AGE
whale-say-cronjob-27108487   1/1           19s        2m37s
whale-say-cronjob-27108488   1/1           20s        97s
whale-say-cronjob-27108489   1/1           21s        37s

Finally, we can see the effect of the job history by fetching the logs for all our jobs

  • Terminal 1

kubectl logs \
  -l job-type=whale-say \(1)
  --tail=-1
1 This time we’re looking to get the logs of anything created with the label job-type (our custom label from above) set to whale-say
NOTE

It would be less specific, but we could find our whale job logs without a custom label by matching on the existence of a label rather than its value, like this:

kubectl logs -l job-name --tail=-1

This basically states that we should match any pod that has a label named job-name, regardless of its value

 _________________
< Hello DevNation >
 -----------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/
 _________________
< Hello DevNation >
 -----------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/
 _________________
< Hello DevNation >
 -----------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

Clean Up

  • Terminal 1

kubectl delete -f apps/kubefiles/whalesay-cronjob.yaml

1. Normally --tail is set to -1 by default, but that’s only when requesting logs from a single specific resource. When there is the potential to return multiple resources’ logs (as is the case here, when we’re asking for logs by label), the number of lines returned from each resource’s log is limited to 10 by default