Jobs & CronJobs: Deep Dive
This document explains how the Job controller manages run-to-completion workloads, why the different completion modes exist, and when to use CronJobs for scheduled work. It connects the demo manifests to the broader batch processing model in Kubernetes.
Jobs vs Long-Running Workloads
Deployments run pods forever. If a pod exits, the Deployment replaces it immediately. The goal is “keep N replicas running at all times.”
Jobs flip this model. A Job runs pods until they succeed. When a pod exits with status 0, the Job counts it as a completion. When enough pods have succeeded, the Job is done. Pods that exit with non-zero status are failures, and the Job may retry them.
This distinction matters. Deployments are for services. Jobs are for tasks.
The Job Controller Internals
The Job controller runs inside kube-controller-manager. It watches Job objects and their owned pods. On every reconciliation cycle:
- Count active pods. How many pods are currently running?
- Count succeeded pods. How many have exited 0?
- Count failed pods. How many have exited non-zero?
- Decide. Should it create more pods? Should it mark the Job as complete? Should it mark the Job as failed?
The controller creates pods up to the `parallelism` limit. It stops creating new pods once `completions` successful exits have been recorded.
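The decision step above can be sketched as plain arithmetic. This is a deliberate simplification for NonIndexed mode with both `completions` and `parallelism` set; `decide` and its inputs are hypothetical, not a controller API:

```shell
# Simplified model of one reconcile decision. The real controller also
# tracks pods via finalizers and pod conditions; this only mirrors the
# counting logic described above.
decide() {
  active=$1 succeeded=$2 failed=$3 parallelism=$4 completions=$5 backoff_limit=$6
  if [ "$failed" -gt "$backoff_limit" ]; then
    echo "mark-failed"                      # too many failures: give up
  elif [ "$succeeded" -ge "$completions" ]; then
    echo "mark-complete"                    # enough successes: done
  elif [ $((active + succeeded)) -lt "$completions" ] && [ "$active" -lt "$parallelism" ]; then
    echo "create-pod"                       # work remains and there is headroom
  else
    echo "wait"                             # at the parallelism limit
  fi
}

decide 1 0 0 2 5 3   # -> create-pod (1 of 5 done or running, room for a 2nd pod)
```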
Simple Job Anatomy
The demo’s simplest Job calculates digits of pi:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-calculator
  namespace: jobs-demo
spec:
  template:
    spec:
      containers:
        - name: pi
          image: perl:5.38-slim
          command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 128Mi
      restartPolicy: Never
  backoffLimit: 3
```

Key observations:
- No `completions` or `parallelism` specified. Both default to 1. One pod must succeed once.
- `restartPolicy: Never`. Job pods cannot use `Always`. They must use `Never` or `OnFailure`. With `Never`, a failed pod stays around for log inspection. With `OnFailure`, the kubelet restarts the container in the same pod.
- `backoffLimit: 3`. The Job retries up to 3 times before marking itself as failed.
Completion Modes
NonIndexed (Default)
Each successful pod counts as one completion. Pods are interchangeable. The Job does not care which pod finishes first. It simply counts successes.
The demo’s parallel job uses this mode:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
  namespace: jobs-demo
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
        - name: worker
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              TASK_ID=$((RANDOM % 1000))
              echo "Worker $(hostname) processing task $TASK_ID"
              sleep $((RANDOM % 5 + 1))
              echo "Task $TASK_ID completed"
      restartPolicy: Never
  backoffLimit: 3
```

With `completions: 5` and `parallelism: 2`, the controller runs up to 2 pods at a time. When a pod succeeds, it starts another one. This continues until 5 total successes are recorded.
Indexed Completion Mode
Introduced in Kubernetes 1.21. Each pod gets a unique index from 0 to `completions - 1`. The index is available in the `JOB_COMPLETION_INDEX` environment variable.
```yaml
spec:
  completions: 10
  parallelism: 3
  completionMode: Indexed
```

Each index must succeed exactly once. If index 4 fails, the controller retries index 4 specifically, not just “any pod.”
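A common use of the index is mapping it to a data range. A minimal sketch, assuming 1000 rows split evenly across 10 indexed pods (`TOTAL_ROWS` and the even split are illustrative choices, not anything the Job controller provides):

```shell
# Map a completion index to a half-open row range [START, END).
TOTAL_ROWS=1000
COMPLETIONS=10
INDEX=${JOB_COMPLETION_INDEX:-4}        # injected by the controller in Indexed mode
ROWS_PER_CHUNK=$((TOTAL_ROWS / COMPLETIONS))
START=$((INDEX * ROWS_PER_CHUNK))
END=$((START + ROWS_PER_CHUNK))
echo "pod $INDEX processes rows [$START, $END)"
```

With `JOB_COMPLETION_INDEX=4` this prints `pod 4 processes rows [400, 500)`; each of the 10 indices covers a disjoint slice of the data.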
Indexed mode is useful when each task processes a distinct chunk of data:
```shell
# Inside the pod
CHUNK=$JOB_COMPLETION_INDEX
process_data --chunk=$CHUNK --total-chunks=10
```

Completions = 1, Parallelism = 1 (Default)
This is a single-task job. One pod runs. If it succeeds, the Job is done. If it fails, the
controller retries (up to backoffLimit).
Completions Unset, Parallelism >= 1 (Work Queue)
If you omit `completions` entirely but set `parallelism`, the Job enters work queue mode. Pods
run in parallel and the Job considers itself complete when any one pod exits successfully.
The remaining pods are terminated.
This pattern works with external work queues (RabbitMQ, Redis). Each pod pulls tasks from the queue. When the queue is empty, one pod exits 0, and the Job finishes.
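A worker’s main loop can be sketched like this. A real worker would pop from Redis or RabbitMQ; here a plain temp file stands in for the queue so the loop is self-contained, and `pop_task` is a hypothetical helper:

```shell
# Work-queue worker sketch: pull tasks until the queue is empty, then exit 0.
QUEUE=$(mktemp)
printf 'task-1\ntask-2\ntask-3\n' > "$QUEUE"

pop_task() {
  task=$(head -n1 "$QUEUE")
  [ -n "$task" ] || return 1              # empty queue: signal "no more work"
  tail -n +2 "$QUEUE" > "$QUEUE.tmp" && mv "$QUEUE.tmp" "$QUEUE"
  echo "$task"
}

COUNT=0
while task=$(pop_task); do
  echo "processing $task"
  COUNT=$((COUNT + 1))
done
echo "queue drained after $COUNT tasks"   # a real worker would now exit 0,
                                          # marking the Job as complete
```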
Failure Handling
backoffLimit
The demo’s flaky job shows retry behavior:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: flaky-job
  namespace: jobs-demo
spec:
  backoffLimit: 3
  activeDeadlineSeconds: 60
  template:
    spec:
      containers:
        - name: flaky
          image: busybox:1.36
          command:
            - /bin/sh
            - -c
            - |
              ROLL=$((RANDOM % 3))
              if [ "$ROLL" -eq 0 ]; then
                echo "Success on attempt"
                exit 0
              else
                echo "Failed on attempt (rolled $ROLL)"
                exit 1
              fi
      restartPolicy: Never
```

`backoffLimit: 3` means the Job tolerates 3 failures before giving up. If 4 pods fail, the Job is marked as Failed and no more pods are created.
Exponential Backoff
The Job controller uses exponential backoff between retries. The delay starts at 10 seconds and doubles: 10s, 20s, 40s, 80s, up to a cap of 6 minutes. This prevents a broken job from flooding the cluster with pods.
The backoff resets when a pod runs long enough (currently about 10 minutes) before failing. This distinguishes between immediate crashes and long-running work that occasionally fails.
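The delay schedule described above can be computed directly. A sketch, with `backoff_delay` as a hypothetical helper (the real controller’s timing also includes jitter):

```shell
# Retry delay after the Nth consecutive failure: 10s base, doubling, 360s cap.
backoff_delay() {
  failures=$1
  delay=10
  i=1
  while [ "$i" -lt "$failures" ]; do
    delay=$((delay * 2))
    [ "$delay" -gt 360 ] && delay=360     # cap at 6 minutes
    i=$((i + 1))
  done
  echo "$delay"
}

for n in 1 2 3 4 5 6 7; do
  printf 'failure %d -> wait %ds\n' "$n" "$(backoff_delay "$n")"
done
# failures 1..7 wait 10, 20, 40, 80, 160, 320, 360 seconds
```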
activeDeadlineSeconds
A hard timeout for the entire Job. If the Job has not completed after this many seconds, all active pods are terminated and the Job is marked as Failed.
In the demo, activeDeadlineSeconds: 60 means the flaky job has one minute total to produce
a successful pod, regardless of how many retries remain.
This is a safety net. Without it, a Job with a high backoffLimit and slow exponential
backoff could run for hours.
Pod Failure Policy (Kubernetes 1.26+)
For finer control, you can define rules based on exit codes or container status:
```yaml
spec:
  podFailurePolicy:
    rules:
      - action: Ignore
        onExitCodes:
          containerName: worker
          operator: In
          values: [42]
      - action: FailJob
        onExitCodes:
          containerName: worker
          operator: In
          values: [1, 2, 3]
```

- `Ignore`: Do not count this failure against `backoffLimit`.
- `FailJob`: Immediately fail the entire Job.
- `Count`: Count it normally (default behavior).
This lets you distinguish between transient failures (retry) and permanent failures (stop immediately).
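The rule evaluation for the example policy above reduces to a case analysis on the exit code. A toy classifier (`classify_exit` is illustrative, not a Kubernetes API):

```shell
# Mirror the example podFailurePolicy: 42 is ignored, 1-3 fail the Job,
# everything else counts against backoffLimit.
classify_exit() {
  case "$1" in
    42)    echo "Ignore"  ;;   # known-transient code from the worker
    1|2|3) echo "FailJob" ;;   # known-permanent codes: stop retrying
    *)     echo "Count"   ;;   # default: normal backoff accounting
  esac
}

classify_exit 42    # -> Ignore
classify_exit 137   # -> Count (e.g. OOM-killed container)
```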
TTL-After-Finished
Completed and failed Jobs leave behind pods and Job objects. Over time, these accumulate. The
ttlSecondsAfterFinished field tells the TTL controller to clean up automatically:
```yaml
spec:
  ttlSecondsAfterFinished: 3600
```

One hour after the Job completes (or fails), the TTL controller deletes the Job object and all its pods. This is the recommended approach for automated cleanup.
Without this field, you must delete Jobs manually or build a garbage collection process.
CronJob Scheduling
CronJobs create Jobs on a schedule. The demo runs a health report every 2 minutes:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-reporter
  namespace: jobs-demo
spec:
  schedule: "*/2 * * * *"
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: reporter
              image: busybox:1.36
              command:
                - /bin/sh
                - -c
                - |
                  echo "=== Health Report ==="
                  echo "Time: $(date -u)"
                  echo "Hostname: $(hostname)"
                  echo "Uptime: $(cat /proc/uptime | cut -d' ' -f1)s"
                  echo "Memory: $(cat /proc/meminfo | head -1)"
                  echo "===================="
          restartPolicy: OnFailure
```

Cron Syntax
The `schedule` field uses standard five-field cron syntax:
```
┌───────────── minute (0-59)
│ ┌───────────── hour (0-23)
│ │ ┌───────────── day of month (1-31)
│ │ │ ┌───────────── month (1-12)
│ │ │ │ ┌───────────── day of week (0-6, Sunday=0)
│ │ │ │ │
* * * * *
```

Common patterns:
| Schedule | Meaning |
|---|---|
| `*/2 * * * *` | Every 2 minutes |
| `0 * * * *` | Every hour at minute 0 |
| `0 2 * * *` | Daily at 2:00 AM |
| `0 0 * * 1` | Every Monday at midnight |
| `0 0 1 * *` | First day of every month |
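Matching a single cron field is simple modular arithmetic. A toy matcher, assuming only the `*`, `N`, and `*/N` forms (real cron also supports ranges and lists, which this sketch omits):

```shell
# Does one cron field match a given value?
field_matches() {
  field=$1 value=$2
  case "$field" in
    '*')   return 0 ;;                                  # wildcard matches anything
    '*/'*) step=${field#\*/}
           [ $((value % step)) -eq 0 ] ;;               # step: match every Nth unit
    *)     [ "$field" -eq "$value" ] ;;                 # literal value
  esac
}

field_matches '*/2' 14 && echo "minute 14 matches */2"  # prints: minute 14 matches */2
```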
Timezone Handling
By default, CronJobs use the kube-controller-manager’s timezone (usually UTC). Since Kubernetes 1.27, you can set a timezone explicitly:
```yaml
spec:
  schedule: "0 2 * * *"
  timeZone: "America/New_York"
```

This runs the job at 2:00 AM Eastern, accounting for daylight saving time transitions. Without `timeZone`, 2:00 AM UTC might be midnight or 1:00 AM in your local zone, depending on the season.
Concurrency Policies
The `concurrencyPolicy` field controls what happens when a new scheduled run triggers while the
previous one is still active.
Allow (Default)
Multiple Jobs can run concurrently. If the 2:00 PM run is still going at 2:02 PM, the 2:02 run starts anyway. This can cause resource contention.
Forbid
The demo uses `Forbid`:
```yaml
concurrencyPolicy: Forbid
```

If the previous Job is still running, the new scheduled run is skipped entirely. The CronJob controller logs that it skipped the run. This prevents overlapping work.
Replace
The previous Job is terminated and a new one starts. Use this when you only care about the most recent run. For example, a cache warm-up job where stale results from the previous run are no longer useful.
History Limits
CronJobs retain a configurable number of completed and failed Jobs:
```yaml
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 1
```

This keeps the last 3 successful Jobs and 1 failed Job. Older ones are deleted automatically. Setting these to 0 retains no history at all, which keeps the namespace clean but makes it harder to debug past failures.
Suspend and Resume
You can pause a CronJob without deleting it:
```yaml
spec:
  suspend: true
```

Or via kubectl:
```shell
kubectl patch cronjob health-reporter -n jobs-demo -p '{"spec":{"suspend":true}}'
```

While suspended, no new Jobs are created on schedule. Existing running Jobs are not affected.
Resume by setting suspend: false.
For regular Jobs (not CronJobs), Kubernetes 1.21 introduced the suspend field:
```yaml
apiVersion: batch/v1
kind: Job
spec:
  suspend: true
```

A suspended Job pauses pod creation. Active pods are deleted. Resuming the Job recreates the pods. This is useful for quota management or manual gating.
Real-World Patterns
Fan-Out Processing
Process a large dataset by splitting it into chunks:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor
spec:
  completions: 100
  parallelism: 10
  completionMode: Indexed
  template:
    spec:
      containers:
        - name: processor
          image: my-processor:latest
          command: ["process-chunk"]
          env:
            - name: CHUNK_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
```

Each of the 100 pods processes one chunk. 10 pods run in parallel at any time.
Work Queue Pattern
Multiple workers pull from a shared queue:
```yaml
spec:
  parallelism: 5
  # No completions field - work queue mode
  template:
    spec:
      containers:
        - name: worker
          image: my-worker:latest
          env:
            - name: QUEUE_URL
              value: "redis://queue-server:6379"
```

Workers pull tasks from Redis. When the queue is empty, a worker exits 0. The Job controller sees a success and terminates remaining workers.
Database Migrations
Run a migration once and never again:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate-v42
spec:
  backoffLimit: 0
  ttlSecondsAfterFinished: 86400
  template:
    spec:
      containers:
        - name: migrate
          image: my-app:v42
          command: ["python", "manage.py", "migrate"]
      restartPolicy: Never
```

`backoffLimit: 0` means no retries. If the migration fails, you want to investigate manually, not retry blindly. `ttlSecondsAfterFinished: 86400` cleans up after 24 hours. Note that `restartPolicy: Never` is required here; Job templates cannot use the default `Always`.
Scheduled Backups
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "0 3 * * *"
  timeZone: "UTC"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      activeDeadlineSeconds: 3600
      template:
        spec:
          containers:
            - name: backup
              image: postgres:16
              command: ["pg_dump", "-h", "db-server", "-U", "backup", "production"]
          restartPolicy: Never
```

Runs at 3:00 AM UTC daily. `Forbid` ensures a slow backup does not overlap with the next one.
activeDeadlineSeconds: 3600 kills the backup if it takes longer than an hour.
Job Tracking with Finalizers
Kubernetes 1.26+ uses finalizers for accurate Job tracking. The controller adds a
batch.kubernetes.io/job-tracking finalizer to each pod. This prevents the pod from being
garbage collected before the controller counts it.
Without this mechanism, there was a race condition: a pod could complete and be deleted before the controller noticed, causing the Job to create extra pods. The finalizer approach eliminates this.
Starting Deadline Seconds
For CronJobs, `startingDeadlineSeconds` controls how late a Job can start:
```yaml
spec:
  schedule: "0 * * * *"
  startingDeadlineSeconds: 600
```

If the controller misses the scheduled time (because the controller was down or the cluster was overloaded), it will still create the Job as long as fewer than 600 seconds have passed. After 600 seconds, the run is skipped.
If the controller was down for a long time and more than 100 missed schedules have accumulated, the CronJob is not started at all and logs an error. This prevents a flood of Jobs after a controller restart.
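The lateness check itself is a single comparison. A sketch with hypothetical timestamps (epoch seconds chosen for illustration):

```shell
# Should a missed run still be started?
SCHEDULED_AT=3600        # when the run should have fired (epoch seconds)
NOW=3900                 # controller comes back 300s later
DEADLINE=600             # startingDeadlineSeconds

LATE=$((NOW - SCHEDULED_AT))
if [ "$LATE" -le "$DEADLINE" ]; then
  DECISION="start"       # still within the deadline: create the Job late
else
  DECISION="skip"        # too late: this run is lost
fi
echo "$DECISION late run (${LATE}s after schedule)"
```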
restartPolicy: Never vs OnFailure
Both are valid for Jobs. The difference:
- Never: Failed pods stay around for inspection. The Job controller creates a new pod for each retry. You end up with multiple completed/failed pods.
- OnFailure: The kubelet restarts the container within the same pod. You get fewer pods but lose access to logs from previous attempts.
The demo uses Never for the failing job so you can inspect each attempt’s logs individually.
The CronJob uses OnFailure because there is less need to debug individual attempts.
Connection to the Demo
The demo manifests illustrate the full spectrum of Job behavior:
- simple-job.yaml: Single task, single pod, one completion.
- parallel-job.yaml: 5 completions with 2 parallel workers.
- failing-job.yaml: Demonstrates backoffLimit and activeDeadlineSeconds.
- cronjob.yaml: Scheduled execution with Forbid concurrency and history limits.
Each manifest strips away complexity to focus on one concept.
Further Reading
- Kubernetes Jobs documentation
- CronJob documentation
- Indexed Jobs KEP
- Pod Failure Policy KEP
- TTL After Finished