Jobs & CronJobs: Deep Dive

This document explains how the Job controller manages run-to-completion workloads, why the different completion modes exist, and when to use CronJobs for scheduled work. It connects the demo manifests to the broader batch processing model in Kubernetes.


Deployments run pods forever. If a pod exits, the Deployment replaces it immediately. The goal is “keep N replicas running at all times.”

Jobs flip this model. A Job runs pods until they succeed. When a pod exits with status 0, the Job counts it as a completion. When enough pods have succeeded, the Job is done. Pods that exit with non-zero status are failures, and the Job may retry them.

This distinction matters. Deployments are for services. Jobs are for tasks.


The Job controller runs inside kube-controller-manager. It watches Job objects and their owned pods. On every reconciliation cycle:

  1. Count active pods. How many pods are currently running?
  2. Count succeeded pods. How many have exited 0?
  3. Count failed pods. How many have exited non-zero?
  4. Decide. Should it create more pods? Should it mark the Job as complete? Should it mark the Job as failed?

The controller creates pods up to the parallelism limit. It stops creating new pods once the number of successful exits reaches completions.
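The decision step reduces to arithmetic over those three counts. A minimal shell sketch of the top-up logic, using made-up counts for a hypothetical Job with completions=5 and parallelism=2 (this is an illustration, not the controller's actual code):

```shell
# Hypothetical status snapshot for a Job with completions=5, parallelism=2
completions=5
parallelism=2
active=1      # pods currently running
succeeded=3   # pods that exited 0

# Completions still needed to finish the Job
remaining=$((completions - succeeded))

# Top up to the parallelism limit, but never past the remaining work
want=$parallelism
if [ "$remaining" -lt "$want" ]; then
  want=$remaining
fi
to_create=$((want - active))

echo "controller should create $to_create more pod(s)"
```

With 3 successes recorded, only 2 completions remain, so the controller runs 2 pods (not the full parallelism of 2 plus extras) and needs to create just 1 more alongside the 1 already active.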


The demo’s simplest Job calculates digits of pi:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculator
  namespace: jobs-demo
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.38-slim
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 128Mi
      restartPolicy: Never
  backoffLimit: 3

Key observations:

  • No completions or parallelism specified. Both default to 1. One pod must succeed once.
  • restartPolicy: Never. Job pods cannot use Always. They must use Never or OnFailure. With Never, a failed pod stays around for log inspection. With OnFailure, the kubelet restarts the container in the same pod.
  • backoffLimit: 3. The Job retries up to 3 times before marking itself as failed.

In the default NonIndexed completion mode, each successful pod counts as one completion. Pods are interchangeable. The Job does not care which pod finishes first. It simply counts successes.

The demo’s parallel job uses this mode:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
  namespace: jobs-demo
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              TASK_ID=$((RANDOM % 1000))
              echo "Worker $(hostname) processing task $TASK_ID"
              sleep $((RANDOM % 5 + 1))
              echo "Task $TASK_ID completed"
      restartPolicy: Never
  backoffLimit: 3

With completions: 5 and parallelism: 2, the controller runs up to 2 pods at a time. When a pod succeeds, it starts another one. This continues until 5 total successes are recorded.

Indexed completion mode was introduced in Kubernetes 1.21. Each pod gets a unique index from 0 to completions - 1. The index is available in the JOB_COMPLETION_INDEX environment variable.

spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed

Each index must succeed exactly once. If index 4 fails, the controller retries index 4 specifically, not just “any pod.”

Indexed mode is useful when each task processes a distinct chunk of data:

# Inside the pod
CHUNK=$JOB_COMPLETION_INDEX
process_data --chunk=$CHUNK --total-chunks=10

Completions = 1, Parallelism = 1 (Default)


This is a single-task job. One pod runs. If it succeeds, the Job is done. If it fails, the controller retries (up to backoffLimit).

Completions Unset, Parallelism >= 1 (Work Queue)


If you omit completions entirely but set parallelism, the Job enters work queue mode. Pods run in parallel and the Job considers itself complete when any one pod exits successfully. The remaining pods are terminated.

This pattern works with external work queues (RabbitMQ, Redis). Each pod pulls tasks from the queue. When the queue is empty, one pod exits 0, and the Job finishes.
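The queue-draining loop can be sketched with the queue simulated as a local file; a real worker would pop tasks from Redis or RabbitMQ instead, and the file path and task names here are purely illustrative:

```shell
# Simulate a shared work queue as a file, one task per line
queue=$(mktemp)
printf 'task-a\ntask-b\ntask-c\n' > "$queue"

# Pop and process tasks until the queue is empty
while task=$(head -n 1 "$queue") && [ -n "$task" ]; do
  echo "processing $task"
  tail -n +2 "$queue" > "$queue.tmp" && mv "$queue.tmp" "$queue"
done

# Queue drained: exit 0 so the Job controller records a success
echo "queue empty, exiting 0"
```

The first worker to find the queue empty exits 0, which is exactly the signal the Job controller uses to finish a work-queue Job.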


The demo’s flaky job shows retry behavior:

apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-job
  namespace: jobs-demo
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 60
  template:
    spec:
      containers:
        - name: flaky
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              ROLL=$((RANDOM % 3))
              if [ "$ROLL" -eq 0 ]; then
                echo "Success on attempt"
                exit 0
              else
                echo "Failed on attempt (rolled $ROLL)"
                exit 1
              fi
      restartPolicy: Never

backoffLimit: 3 means the Job tolerates 3 failures before giving up. If 4 pods fail, the Job is marked as Failed and no more pods are created.

The Job controller uses exponential backoff between retries. The delay starts at 10 seconds and doubles: 10s, 20s, 40s, 80s, up to a cap of 6 minutes. This prevents a broken job from flooding the cluster with pods.

The backoff resets when a pod runs long enough (currently about 10 minutes) before failing. This distinguishes between immediate crashes and long-running work that occasionally fails.
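The doubling-with-cap delay sequence is easy to reproduce. A sketch of the schedule described above, not the controller's actual code:

```shell
# Retry delays: start at 10s, double each time, cap at 6 minutes (360s)
delay=10
cap=360
for attempt in 1 2 3 4 5 6 7; do
  echo "retry $attempt: wait ${delay}s"
  delay=$((delay * 2))
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
done
```

The sequence produced is 10s, 20s, 40s, 80s, 160s, 320s, 360s, after which every further retry waits the full 6 minutes.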

A hard timeout for the entire Job. If the Job has not completed after this many seconds, all active pods are terminated and the Job is marked as Failed.

In the demo, activeDeadlineSeconds: 60 means the flaky job has one minute total to produce a successful pod, regardless of how many retries remain.

This is a safety net. Without it, a Job with a high backoffLimit and slow exponential backoff could run for hours.

For finer control, you can define rules based on exit codes or container status:

spec:
  podFailurePolicy:
    rules:
      - action: Ignore
        onExitCodes:
          containerName: worker
          operator: In
          values: [42]
      - action: FailJob
        onExitCodes:
          containerName: worker
          operator: In
          values: [1, 2, 3]
  • Ignore: Do not count this failure against backoffLimit.
  • FailJob: Immediately fail the entire Job.
  • Count: Count it normally (default behavior).

This lets you distinguish between transient failures (retry) and permanent failures (stop immediately). Note that podFailurePolicy requires the pod template to set restartPolicy: Never.
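The rule set above behaves like a switch on the worker's exit code. A sketch of that mapping (the real evaluation happens inside the Job controller):

```shell
# Map a container exit code to the action the rules above would take
classify() {
  case "$1" in
    42)    echo "Ignore"  ;;  # transient: not counted against backoffLimit
    1|2|3) echo "FailJob" ;;  # permanent: fail the entire Job at once
    *)     echo "Count"   ;;  # default: count against backoffLimit
  esac
}

classify 42
classify 2
classify 7
```

Rules are evaluated in order, and the first match wins; any exit code no rule matches falls through to the default Count behavior.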


Completed and failed Jobs leave behind pods and Job objects. Over time, these accumulate. The ttlSecondsAfterFinished field tells the TTL controller to clean up automatically:

spec:
  ttlSecondsAfterFinished: 3600

One hour after the Job completes (or fails), the TTL controller deletes the Job object and all its pods. This is the recommended approach for automated cleanup.

Without this field, you must delete Jobs manually or build a garbage collection process.


CronJobs create Jobs on a schedule. The demo runs a health report every 2 minutes:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-reporter
  namespace: jobs-demo
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: reporter
              image: busybox:1.36
              command:
                - /bin/sh
                - -c
                - |
                  echo "=== Health Report ==="
                  echo "Time: $(date -u)"
                  echo "Hostname: $(hostname)"
                  echo "Uptime: $(cat /proc/uptime | cut -d' ' -f1)s"
                  echo "Memory: $(cat /proc/meminfo | head -1)"
                  echo "===================="
          restartPolicy: OnFailure

The schedule field uses standard five-field cron syntax:

┌───────────── minute (0-59)
│ ┌─────────── hour (0-23)
│ │ ┌───────── day of month (1-31)
│ │ │ ┌─────── month (1-12)
│ │ │ │ ┌───── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *

Common patterns:

Schedule       Meaning
*/2 * * * *    Every 2 minutes
0 * * * *      Every hour at minute 0
0 2 * * *      Daily at 2:00 AM
0 0 * * 1      Every Monday at midnight
0 0 1 * *      First day of every month

By default, CronJobs use the kube-controller-manager’s timezone (usually UTC). Since Kubernetes 1.27, you can set a timezone explicitly:

spec:
  schedule: "0 2 * * *"
  timeZone: "America/New_York"

This runs the job at 2:00 AM Eastern, accounting for daylight saving time transitions. Without timeZone, 2:00 AM UTC might be midnight or 1:00 AM in your local zone, depending on the season.


The concurrencyPolicy field controls what happens when a new scheduled run triggers while the previous one is still active.

Allow (the default). Multiple Jobs can run concurrently. If the 2:00 PM run is still going at 2:02 PM, the 2:02 run starts anyway. This can cause resource contention.

The demo uses Forbid:

concurrencyPolicy: Forbid

If the previous Job is still running, the new scheduled run is skipped entirely. The CronJob controller logs that it skipped the run. This prevents overlapping work.

Replace. The previous Job is terminated and a new one starts. Use this when you only care about the most recent run. For example, a cache warm-up job where stale results from the previous run are no longer useful.


CronJobs retain a configurable number of completed and failed Jobs:

successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1

This keeps the last 3 successful Jobs and 1 failed Job. Older ones are deleted automatically. Setting these to 0 means no history is retained. This keeps the namespace clean but makes it harder to debug past failures.


You can pause a CronJob without deleting it:

spec:
  suspend: true

Or via kubectl:

kubectl patch cronjob health-reporter -n jobs-demo -p '{"spec":{"suspend":true}}'

While suspended, no new Jobs are created on schedule. Existing running Jobs are not affected. Resume by setting suspend: false.

For regular Jobs (not CronJobs), Kubernetes 1.21 introduced the suspend field:

apiVersion: batch/v1
kind: Job
spec:
  suspend: true

A suspended Job pauses pod creation. Active pods are deleted. Resuming the Job recreates the pods. This is useful for quota management or manual gating.


Process a large dataset by splitting it into chunks:

apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  template:
    spec:
      containers:
        - name: processor
          image: my-processor:latest
          command: ["process-chunk"]
          env:
            - name: CHUNK_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']

Each of the 100 pods processes one chunk. 10 pods run in parallel at any time.
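Inside each pod, the index maps to a data range with simple arithmetic. A sketch assuming a hypothetical dataset of 10,000 records split across the 100 chunks above (the record count and variable names are illustrative):

```shell
TOTAL_RECORDS=10000
TOTAL_CHUNKS=100
JOB_COMPLETION_INDEX=7   # set by the controller in a real indexed pod

# Each index owns a contiguous, non-overlapping slice of the data
chunk_size=$((TOTAL_RECORDS / TOTAL_CHUNKS))
start=$((JOB_COMPLETION_INDEX * chunk_size))
end=$((start + chunk_size - 1))

echo "index $JOB_COMPLETION_INDEX owns records $start-$end"
```

Because every index from 0 to 99 must succeed exactly once, every record is processed exactly once even when individual pods fail and are retried.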

Multiple workers pull from a shared queue:

spec:
  parallelism: 5
  # No completions field - work queue mode
  template:
    spec:
      containers:
        - name: worker
          image: my-worker:latest
          env:
            - name: QUEUE_URL
              value: "redis://queue-server:6379"

Workers pull tasks from Redis. When the queue is empty, a worker exits 0. The Job controller sees a success and terminates remaining workers.

Run a migration once and never again:

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-v42
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      containers:
        - name: migrate
          image: my-app:v42
          command: ["python", "manage.py", "migrate"]

backoffLimit: 0 means no retries. If the migration fails, you want to investigate manually, not retry blindly. ttlSecondsAfterFinished: 86400 cleans up after 24 hours.

apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16
              command: ["pg_dump", "-h", "db-server", "-U", "backup", "production"]

Runs at 3:00 AM UTC daily. Forbid ensures a slow backup does not overlap with the next one. activeDeadlineSeconds: 3600 kills the backup if it takes longer than an hour.


Kubernetes 1.26+ uses finalizers for accurate Job tracking. The controller adds a batch.kubernetes.io/job-tracking finalizer to each pod. This prevents the pod from being garbage collected before the controller counts it.

Without this mechanism, there was a race condition: a pod could complete and be deleted before the controller noticed, causing the Job to create extra pods. The finalizer approach eliminates this.


For CronJobs, startingDeadlineSeconds controls how late a Job can start:

spec:
  schedule: "0 * * * *"
  startingDeadlineSeconds: 600

If the controller misses the scheduled time (because the controller was down or the cluster was overloaded), it will still create the Job as long as fewer than 600 seconds have passed. After 600 seconds, the run is skipped.

If the controller was down for a long time and more than 100 missed schedules have accumulated, the CronJob is not started at all and logs an error. This prevents a flood of Jobs after a controller restart.
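The per-run decision reduces to comparing lateness against the deadline. A sketch with hypothetical timestamps (the epoch value and variable names are made up for illustration):

```shell
starting_deadline=600
scheduled_at=1700000000            # hypothetical epoch of the missed slot
now=$((scheduled_at + 450))        # controller wakes up 450s late

late=$((now - scheduled_at))
if [ "$late" -lt "$starting_deadline" ]; then
  decision="start"
else
  decision="skip"
fi
echo "$decision (was ${late}s late)"
```

At 450 seconds late the run still starts; had the controller woken up 600 or more seconds after the slot, that run would be skipped.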


Both are valid for Jobs. The difference:

  • Never: Failed pods stay around for inspection. The Job controller creates a new pod for each retry. You end up with multiple completed/failed pods.
  • OnFailure: The kubelet restarts the container within the same pod. You get fewer pods but lose access to logs from previous attempts.

The demo uses Never for the failing job so you can inspect each attempt’s logs individually. The CronJob uses OnFailure because there is less need to debug individual attempts.


The demo manifests illustrate the full spectrum of Job behavior:

  1. simple-job.yaml: Single task, single pod, one completion.
  2. parallel-job.yaml: 5 completions with 2 parallel workers.
  3. failing-job.yaml: Demonstrates backoffLimit and activeDeadlineSeconds.
  4. cronjob.yaml: Scheduled execution with Forbid concurrency and history limits.

Each manifest strips away complexity to focus on one concept.