Jobs & CronJobs: Deep Dive

This document explains how the Job controller manages run-to-completion workloads, why the different completion modes exist, and when to use CronJobs for scheduled work. It connects the demo manifests to the broader batch processing model in Kubernetes.


Deployments run pods forever. If a pod exits, the Deployment replaces it immediately. The goal is “keep N replicas running at all times.”

Jobs flip this model. A Job runs pods until they succeed. When a pod exits with status 0, the Job counts it as a completion. When enough pods have succeeded, the Job is done. Pods that exit with non-zero status are failures, and the Job may retry them.

This distinction matters. Deployments are for services. Jobs are for tasks.


The Job controller runs inside kube-controller-manager. It watches Job objects and their owned pods. On every reconciliation cycle:

  1. Count active pods. How many pods are currently running?
  2. Count succeeded pods. How many have exited 0?
  3. Count failed pods. How many have exited non-zero?
  4. Decide. Should it create more pods? Should it mark the Job as complete? Should it mark the Job as failed?

The controller creates pods up to the parallelism limit. It stops creating new pods once the number of successful exits reaches completions.
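The decision step reduces to arithmetic over those three counts. A minimal shell sketch of the top-up logic, using made-up counts for a hypothetical Job with completions=5 and parallelism=2 (this is an illustration, not the controller's actual code):

```shell
# Hypothetical status snapshot for a Job with completions=5, parallelism=2
completions=5
parallelism=2
active=1      # pods currently running
succeeded=3   # pods that exited 0

# Completions still needed to finish the Job
remaining=$((completions - succeeded))

# Top up to the parallelism limit, but never past the remaining work
want=$parallelism
if [ "$remaining" -lt "$want" ]; then
  want=$remaining
fi
to_create=$((want - active))

echo "controller should create $to_create more pod(s)"
```

With 3 successes recorded, only 2 completions remain, so the controller runs 2 pods (not the full parallelism of 2 plus extras) and needs to create just 1 more alongside the 1 already active.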


The demo’s simplest Job calculates digits of pi:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculator
  namespace: jobs-demo
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.38-slim
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 128Mi
      restartPolicy: Never
  backoffLimit: 3

Key observations:

  • No completions or parallelism specified. Both default to 1. One pod must succeed once.
  • restartPolicy: Never. Job pods cannot use Always. They must use Never or OnFailure. With Never, a failed pod stays around for log inspection. With OnFailure, the kubelet restarts the container in the same pod.
  • backoffLimit: 3. The Job retries up to 3 times before marking itself as failed.

In the default NonIndexed completion mode, each successful pod counts as one completion. Pods are interchangeable. The Job does not care which pod finishes first. It simply counts successes.

The demo’s parallel job uses this mode:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
  namespace: jobs-demo
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              TASK_ID=$((RANDOM % 1000))
              echo "Worker $(hostname) processing task $TASK_ID"
              sleep $((RANDOM % 5 + 1))
              echo "Task $TASK_ID completed"
      restartPolicy: Never
  backoffLimit: 3

With completions: 5 and parallelism: 2, the controller runs up to 2 pods at a time. When a pod succeeds, it starts another one. This continues until 5 total successes are recorded.

Indexed completion mode was introduced in Kubernetes 1.21. Each pod gets a unique index from 0 to completions - 1. The index is available in the JOB_COMPLETION_INDEX environment variable.

spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed

Each index must succeed exactly once. If index 4 fails, the controller retries index 4 specifically, not just “any pod.”

Indexed mode is useful when each task processes a distinct chunk of data:

# Inside the pod
CHUNK=$JOB_COMPLETION_INDEX
process_data --chunk=$CHUNK --total-chunks=10

Completions = 1, Parallelism = 1 (Default)


This is a single-task job. One pod runs. If it succeeds, the Job is done. If it fails, the controller retries (up to backoffLimit).

Completions Unset, Parallelism >= 1 (Work Queue)


If you omit completions entirely but set parallelism, the Job enters work queue mode. Pods run in parallel and the Job considers itself complete when any one pod exits successfully. The remaining pods are terminated.

This pattern works with external work queues (RabbitMQ, Redis). Each pod pulls tasks from the queue. When the queue is empty, one pod exits 0, and the Job finishes.
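The queue-draining loop can be sketched with the queue simulated as a local file; a real worker would pop tasks from Redis or RabbitMQ instead, and the file path and task names here are purely illustrative:

```shell
# Simulate a shared work queue as a file, one task per line
queue=$(mktemp)
printf 'task-a\ntask-b\ntask-c\n' > "$queue"

# Pop and process tasks until the queue is empty
while task=$(head -n 1 "$queue") && [ -n "$task" ]; do
  echo "processing $task"
  tail -n +2 "$queue" > "$queue.tmp" && mv "$queue.tmp" "$queue"
done

# Queue drained: exit 0 so the Job controller records a success
echo "queue empty, exiting 0"
```

The first worker to find the queue empty exits 0, which is exactly the signal the Job controller uses to finish a work-queue Job.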


The demo’s flaky job shows retry behavior:

apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-job
  namespace: jobs-demo
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 60
  template:
    spec:
      containers:
        - name: flaky
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              ROLL=$((RANDOM % 3))
              if [ "$ROLL" -eq 0 ]; then
                echo "Success on attempt"
                exit 0
              else
                echo "Failed on attempt (rolled $ROLL)"
                exit 1
              fi
      restartPolicy: Never

backoffLimit: 3 means the Job tolerates 3 failures before giving up. If 4 pods fail, the Job is marked as Failed and no more pods are created.

The Job controller uses exponential backoff between retries. The delay starts at 10 seconds and doubles: 10s, 20s, 40s, 80s, up to a cap of 6 minutes. This prevents a broken job from flooding the cluster with pods.

The backoff resets when a pod runs long enough (currently about 10 minutes) before failing. This distinguishes between immediate crashes and long-running work that occasionally fails.
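The doubling-with-cap delay sequence is easy to reproduce. A sketch of the schedule described above, not the controller's actual code:

```shell
# Retry delays: start at 10s, double each time, cap at 6 minutes (360s)
delay=10
cap=360
for attempt in 1 2 3 4 5 6 7; do
  echo "retry $attempt: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
done
```

The sequence produced is 10s, 20s, 40s, 80s, 160s, 320s, 360s, after which every further retry waits the full 6 minutes.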

A hard timeout for the entire Job. If the Job has not completed after this many seconds, all active pods are terminated and the Job is marked as Failed.

In the demo, activeDeadlineSeconds: 60 means the flaky job has one minute total to produce a successful pod, regardless of how many retries remain.

This is a safety net. Without it, a Job with a high backoffLimit and slow exponential backoff could run for hours.

For finer control, you can define rules based on exit codes or container status:

spec:
  podFailurePolicy:
    rules:
      - action: Ignore
        onExitCodes:
          containerName: worker
          operator: In
          values: [42]
      - action: FailJob
        onExitCodes:
          containerName: worker
          operator: In
          values: [1, 2, 3]
  • Ignore: Do not count this failure against backoffLimit.
  • FailJob: Immediately fail the entire Job.
  • Count: Count it normally (default behavior).

This lets you distinguish between transient failures (retry) and permanent failures (stop immediately). Note that podFailurePolicy requires the pod template to set restartPolicy: Never.
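The rule set above behaves like a switch on the worker's exit code. A sketch of that mapping (the real evaluation happens inside the Job controller):

```shell
# Map a container exit code to the action the rules above would take
classify() {
  case "$1" in
    42)    echo "Ignore"  ;;  # transient: not counted against backoffLimit
    1|2|3) echo "FailJob" ;;  # permanent: fail the entire Job at once
    *)     echo "Count"   ;;  # default: count against backoffLimit
  esac
}

classify 42
classify 2
classify 7
```

Rules are evaluated in order, and the first match wins; any exit code no rule matches falls through to the default Count behavior.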


Completed and failed Jobs leave behind pods and Job objects. Over time, these accumulate. The ttlSecondsAfterFinished field tells the TTL controller to clean up automatically:

spec:
  ttlSecondsAfterFinished: 3600

One hour after the Job completes (or fails), the TTL controller deletes the Job object and all its pods. This is the recommended approach for automated cleanup.

Without this field, you must delete Jobs manually or build a garbage collection process.


CronJobs create Jobs on a schedule. The demo runs a health report every 2 minutes:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-reporter
  namespace: jobs-demo
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: reporter
              image: busybox:1.36
              command:
                - /bin/sh
                - -c
                - |
                  echo "=== Health Report ==="
                  echo "Time: $(date -u)"
                  echo "Hostname: $(hostname)"
                  echo "Uptime: $(cat /proc/uptime | cut -d' ' -f1)s"
                  echo "Memory: $(cat /proc/meminfo | head -1)"
                  echo "===================="
          restartPolicy: OnFailure

The schedule field uses standard five-field cron syntax:

┌───────────── minute (0-59)
│ ┌─────────── hour (0-23)
│ │ ┌───────── day of month (1-31)
│ │ │ ┌─────── month (1-12)
│ │ │ │ ┌───── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *

Common patterns:

Schedule       Meaning
*/2 * * * *    Every 2 minutes
0 * * * *      Every hour at minute 0
0 2 * * *      Daily at 2:00 AM
0 0 * * 1      Every Monday at midnight
0 0 1 * *      First day of every month

By default, CronJobs use the kube-controller-manager’s timezone (usually UTC). Since Kubernetes 1.27, you can set a timezone explicitly:

spec:
  schedule: "0 2 * * *"
  timeZone: "America/New_York"

This runs the job at 2:00 AM Eastern, accounting for daylight saving time transitions. Without timeZone, 2:00 AM UTC might be midnight or 1:00 AM in your local zone, depending on the season.


The concurrencyPolicy field controls what happens when a new scheduled run triggers while the previous one is still active.

Allow (the default). Multiple Jobs can run concurrently. If the 2:00 PM run is still going at 2:02 PM, the 2:02 run starts anyway. This can cause resource contention.

The demo uses Forbid:

concurrencyPolicy: Forbid

If the previous Job is still running, the new scheduled run is skipped entirely. The CronJob controller logs that it skipped the run. This prevents overlapping work.

Replace. The previous Job is terminated and a new one starts. Use this when you only care about the most recent run. For example, a cache warm-up job where stale results from the previous run are no longer useful.


CronJobs retain a configurable number of completed and failed Jobs:

successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1

This keeps the last 3 successful Jobs and 1 failed Job. Older ones are deleted automatically. Setting these to 0 means no history is retained. This keeps the namespace clean but makes it harder to debug past failures.


You can pause a CronJob without deleting it:

spec:
  suspend: true

Or via kubectl:

kubectl patch cronjob health-reporter -n jobs-demo -p '{"spec":{"suspend":true}}'

While suspended, no new Jobs are created on schedule. Existing running Jobs are not affected. Resume by setting suspend: false.

For regular Jobs (not CronJobs), Kubernetes 1.21 introduced the suspend field:

apiVersion: batch/v1
kind: Job
spec:
  suspend: true

A suspended Job pauses pod creation. Active pods are deleted. Resuming the Job recreates the pods. This is useful for quota management or manual gating.


Process a large dataset by splitting it into chunks:

apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  template:
    spec:
      containers:
        - name: processor
          image: my-processor:latest
          command: ["process-chunk"]
          env:
            - name: CHUNK_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']

Each of the 100 pods processes one chunk. 10 pods run in parallel at any time.
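Inside each pod, the index maps to a data range with simple arithmetic. A sketch assuming a hypothetical dataset of 10,000 records split across the 100 chunks above (the record count and variable names are illustrative):

```shell
TOTAL_RECORDS=10000
TOTAL_CHUNKS=100
JOB_COMPLETION_INDEX=7   # set by the controller in a real indexed pod

# Each index owns a contiguous, non-overlapping slice of the data
chunk_size=$((TOTAL_RECORDS / TOTAL_CHUNKS))
start=$((JOB_COMPLETION_INDEX * chunk_size))
end=$((start + chunk_size - 1))

echo "index $JOB_COMPLETION_INDEX owns records $start-$end"
```

Because every index from 0 to 99 must succeed exactly once, every record is processed exactly once even when individual pods fail and are retried.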

Multiple workers pull from a shared queue:

spec:
  parallelism: 5
  # No completions field - work queue mode
  template:
    spec:
      containers:
        - name: worker
          image: my-worker:latest
          env:
            - name: QUEUE_URL
              value: "redis://queue-server:6379"

Workers pull tasks from Redis. When the queue is empty, a worker exits 0. The Job controller sees a success and terminates remaining workers.

Run a migration once and never again:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-v42
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      containers:
        - name: migrate
          image: my-app:v42
          command: ["python", "manage.py", "migrate"]

backoffLimit: 0 means no retries. If the migration fails, you want to investigate manually, not retry blindly. ttlSecondsAfterFinished: 86400 cleans up after 24 hours.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16
              command: ["pg_dump", "-h", "db-server", "-U", "backup", "production"]

Runs at 3:00 AM UTC daily. Forbid ensures a slow backup does not overlap with the next one. activeDeadlineSeconds: 3600 kills the backup if it takes longer than an hour.


Kubernetes 1.26+ uses finalizers for accurate Job tracking. The controller adds a batch.kubernetes.io/job-tracking finalizer to each pod. This prevents the pod from being garbage collected before the controller counts it.

Without this mechanism, there was a race condition: a pod could complete and be deleted before the controller noticed, causing the Job to create extra pods. The finalizer approach eliminates this.


For CronJobs, startingDeadlineSeconds controls how late a Job can start:

spec:
  schedule: "0 * * * *"
  startingDeadlineSeconds: 600

If the controller misses the scheduled time (because the controller was down or the cluster was overloaded), it will still create the Job as long as fewer than 600 seconds have passed. After 600 seconds, the run is skipped.

If the controller was down for a long time and more than 100 missed schedules have accumulated, the CronJob is not started at all and logs an error. This prevents a flood of Jobs after a controller restart.
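The per-run decision reduces to comparing lateness against the deadline. A sketch with hypothetical timestamps (the epoch value and variable names are made up for illustration):

```shell
starting_deadline=600
scheduled_at=1700000000            # hypothetical epoch of the missed slot
now=$((scheduled_at + 450))        # controller wakes up 450s late

late=$((now - scheduled_at))
if [ "$late" -lt "$starting_deadline" ]; then
  decision="start"
else
  decision="skip"
fi
echo "$decision (was ${late}s late)"
```

At 450 seconds late the run still starts; had the controller woken up 600 or more seconds after the slot, that run would be skipped.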


Both are valid for Jobs. The difference:

  • Never: Failed pods stay around for inspection. The Job controller creates a new pod for each retry. You end up with multiple completed/failed pods.
  • OnFailure: The kubelet restarts the container within the same pod. You get fewer pods but lose access to logs from previous attempts.

The demo uses Never for the failing job so you can inspect each attempt’s logs individually. The CronJob uses OnFailure because there is less need to debug individual attempts.


The demo manifests illustrate the full spectrum of Job behavior:

  1. simple-job.yaml: Single task, single pod, one completion.
  2. parallel-job.yaml: 5 completions with 2 parallel workers.
  3. failing-job.yaml: Demonstrates backoffLimit and activeDeadlineSeconds.
  4. cronjob.yaml: Scheduled execution with Forbid concurrency and history limits.

Each manifest strips away complexity to focus on one concept.