
Probes & Lifecycle: Deep Dive

This document explains how the kubelet executes probes, how the three probe types interact, the full pod shutdown sequence, and common misconfigurations that cause production incidents.

The kubelet runs probes, not the API server. Each node’s kubelet is responsible for probing every container on that node. The kubelet runs probes in a dedicated goroutine per container per probe type.

A pod with all three probes configured (startup, liveness, readiness) has three independent goroutines checking that container. These goroutines run on their own timers. They do not coordinate with each other.

The kubelet performs probes from inside the node. For HTTP probes, the request comes from the node’s IP, not from another pod. For exec probes, the kubelet invokes the command inside the container’s namespace using the container runtime API.

The kubelet sends an HTTP GET request to the specified path and port. Any status code between 200 and 399 counts as success. Anything else is failure.

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3
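The success rule can be captured in a one-line predicate (illustrative only, not kubelet source):

```python
def http_probe_ok(status_code: int) -> bool:
    # Mirrors the kubelet's rule: any 2xx or 3xx status passes.
    return 200 <= status_code < 400
```

A 503 from an overloaded health endpoint fails the probe; a 302 passes it.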

The kubelet handles redirects in a limited way. A 301 or 302 response counts as success (it falls within 200-399). The kubelet follows a redirect only when it points to the same host; a redirect to a different hostname is reported as a success (with a ProbeWarning event) and is not followed. If your health endpoint redirects off-host, that redirect counts as a passing probe even though the target was never checked.

You can set custom HTTP headers:

livenessProbe:
  httpGet:
    path: /health
    port: 8080
    httpHeaders:
    - name: Authorization
      value: Bearer token123

The host defaults to the pod IP. You can override it with host, but this is rarely needed and can break things if set incorrectly.

The kubelet runs a command inside the container. Exit code 0 means success. Any other exit code means failure.

livenessProbe:
  exec:
    command: ["test", "-f", "/tmp/healthy"]
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3

The demo’s liveness-fail pod uses an exec probe to check for a file:

containers:
- name: app
  image: busybox:1.36
  command:
  - /bin/sh
  - -c
  - |
    touch /tmp/healthy
    sleep 30
    echo "Simulating crash - removing health file"
    rm /tmp/healthy
    sleep infinity
  livenessProbe:
    exec:
      command: ["test", "-f", "/tmp/healthy"]
    initialDelaySeconds: 5
    periodSeconds: 5
    failureThreshold: 3

After 30 seconds, the health file disappears. The liveness probe fails three consecutive times (at 5-second intervals), and the kubelet restarts the container.

Exec probes create a new process inside the container on every check. For high-frequency probes, this adds CPU overhead. The spawned process inherits the container’s resource limits.

One subtle issue: if the exec command hangs (for example, connecting to a database that is stalled), it will block until the timeoutSeconds expires. The default timeout is 1 second.
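The semantics can be approximated with subprocess (a sketch; the real kubelet execs through the container runtime API, not on the host):

```python
import subprocess

def exec_probe(command: list, timeout_seconds: float = 1.0) -> bool:
    # Exit code 0 passes; any other exit code, or a hang past
    # timeoutSeconds, fails the probe.
    try:
        completed = subprocess.run(command, timeout=timeout_seconds,
                                   capture_output=True)
        return completed.returncode == 0
    except subprocess.TimeoutExpired:
        return False
```

Note how a hung command consumes the full timeout before the check fails, which is why slow exec probes delay failure detection.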

The kubelet attempts a TCP connection to the specified port. If the connection succeeds (TCP handshake completes), the probe passes. If the connection is refused or times out, the probe fails.

livenessProbe:
  tcpSocket:
    port: 5432
  periodSeconds: 10

TCP probes are useful for non-HTTP services like databases, message brokers, or custom TCP servers. They confirm the port is open but say nothing about whether the service is actually processing requests.
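The entire check boils down to a connect attempt (a sketch of the semantics, not kubelet code):

```python
import socket

def tcp_probe(host: str, port: int, timeout_seconds: float = 1.0) -> bool:
    # Passes if the TCP handshake completes; the connection is
    # closed immediately, so nothing is sent to the service.
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False
```

This is exactly why a TCP probe can pass against a database that accepts connections but is too overloaded to answer queries.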

Since Kubernetes v1.27 (stable), the kubelet can perform native gRPC health checks. The target service must implement the gRPC Health Checking Protocol.

livenessProbe:
  grpc:
    port: 50051
    service: "my.package.MyService"  # Optional, defaults to ""
  periodSeconds: 10

If service is empty, the probe checks overall server health. If specified, it checks the health of that particular service. The gRPC response status SERVING means success. Everything else means failure.

Every probe type accepts the same timing parameters:

| Parameter | Default | What It Controls |
| --- | --- | --- |
| initialDelaySeconds | 0 | Wait this long after container start before the first probe |
| periodSeconds | 10 | Time between consecutive probes |
| timeoutSeconds | 1 | Max time to wait for a probe response |
| successThreshold | 1 | Consecutive successes to consider healthy (must be 1 for liveness and startup) |
| failureThreshold | 3 | Consecutive failures to consider unhealthy |

The effective time from first failure to action is:

failureThreshold * periodSeconds

For the demo’s healthy-app liveness probe:

livenessProbe:
  httpGet:
    path: /
    port: 80
  initialDelaySeconds: 0
  periodSeconds: 10
  failureThreshold: 3

Time from first failure to container restart: 3 * 10 = 30 seconds.

For readiness probes, successThreshold matters. A pod removed from Service endpoints must pass successThreshold consecutive checks before it is added back. The default of 1 means a single success re-adds the pod.
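Back-of-the-envelope helpers for these two windows (hypothetical functions; the parameter names mirror the probe fields):

```python
def detection_window(period_seconds: int, failure_threshold: int) -> int:
    # failureThreshold * periodSeconds: time from the first failed
    # probe until the kubelet acts (restart, or endpoint removal).
    return failure_threshold * period_seconds

def readd_window(period_seconds: int, success_threshold: int = 1) -> int:
    # Consecutive successes a readiness probe needs before the
    # pod is put back into Service endpoints.
    return success_threshold * period_seconds
```

For the healthy-app config above, `detection_window(10, 3)` gives the 30 seconds quoted earlier.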

The startup probe exists to solve one problem: slow-starting containers.

Before startup probes existed, developers set large initialDelaySeconds on liveness probes to give containers time to start. This created a blind spot where a truly stuck container would not be detected for a long time.

The startup probe works differently from liveness and readiness:

  1. While the startup probe is active, the kubelet disables both liveness and readiness probes.
  2. The startup probe runs until it succeeds once or exceeds its failure threshold.
  3. On success, the startup probe is permanently disabled and liveness/readiness take over.
  4. On failure (exceeding the threshold), the container is killed and restarted.
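The gating logic amounts to a tiny state machine. A sketch of the rules above (not kubelet code):

```python
def active_probes(startup_configured: bool, startup_succeeded: bool) -> set:
    # Until a configured startup probe succeeds once, it is the only
    # probe the kubelet runs; afterwards it is permanently disabled
    # and liveness/readiness take over.
    if startup_configured and not startup_succeeded:
        return {"startup"}
    return {"liveness", "readiness"}
```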

The demo’s slow-start app shows this clearly:

containers:
- name: app
  startupProbe:
    httpGet:
      path: /
      port: 8080
    failureThreshold: 20
    periodSeconds: 2
  livenessProbe:
    httpGet:
      path: /
      port: 8080
    periodSeconds: 10
  readinessProbe:
    exec:
      command: ["test", "-f", "/tmp/healthy"]
    periodSeconds: 3

The startup probe allows up to 20 * 2 = 40 seconds for the app to start. During this window, liveness and readiness are completely paused. Once the startup probe passes, the liveness probe kicks in with a tight 10-second interval.

This gives you the best of both worlds: tolerance for slow starts and fast detection of runtime failures.

A pod moves through defined phases:

| Phase | Meaning |
| --- | --- |
| Pending | Accepted by the API server; waiting to be scheduled or pulling images |
| Running | At least one container is running or starting |
| Succeeded | All containers terminated with exit code 0 |
| Failed | All containers terminated, at least one with a non-zero exit |
| Unknown | Node communication lost |

Within the Running phase, individual containers can be in states: Waiting, Running, or Terminated. A pod can be Running but have a container in Waiting (for example, during a restart after a liveness failure).

When a pod is deleted, the following happens in order:

The API server sets deletionTimestamp on the pod, and the pod enters the Terminating state. The endpoints controller begins removing the pod’s IP from all Service endpoints.

If defined, the preStop hook runs. The demo uses this to simulate removing from a load balancer:

lifecycle:
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - |
        echo "[$(date)] preStop hook: removing from load balancer"
        sleep 3

The preStop hook runs before SIGTERM is sent. This is the place to drain connections, deregister from service discovery, or close resources.

The hook can be:

  • An exec command (runs inside the container)
  • An httpGet request (calls an endpoint on the container)

After the preStop hook completes, the kubelet sends SIGTERM to the container’s PID 1. The application should handle this signal to perform graceful shutdown:

command:
- /bin/sh
- -c
- |
  cleanup() {
    echo "[$(date)] Received SIGTERM, starting graceful shutdown..."
    echo "[$(date)] Draining connections..."
    sleep 5
    echo "[$(date)] Saving state..."
    sleep 2
    echo "[$(date)] Shutdown complete."
    exit 0
  }
  trap cleanup TERM
  echo "[$(date)] Application started (PID $$)"
  while true; do
    sleep 1
  done

A common misconception is that the terminationGracePeriodSeconds timer starts when SIGTERM is sent. It does not. The grace period starts when the kubelet begins the termination sequence, so the preStop hook and the SIGTERM handling share the same window. If your preStop hook takes 20 seconds and your grace period is 30, the application has only 10 seconds to handle SIGTERM before SIGKILL.

spec:
  terminationGracePeriodSeconds: 30

The default is 30 seconds. Set this based on how long your application needs to drain.
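The arithmetic is worth making explicit (hypothetical helper; the parameter names mirror the pod spec):

```python
def sigterm_budget(grace_period_seconds: int, prestop_seconds: int) -> int:
    # preStop and SIGTERM handling share one grace-period window;
    # whatever preStop consumes is no longer available to the app.
    return max(grace_period_seconds - prestop_seconds, 0)
```

A budget of 0 means the process goes straight from preStop timeout to SIGKILL with no chance to clean up.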

If the process is still running after the grace period, the kubelet sends SIGKILL. This cannot be caught or handled. The process is immediately terminated.

There is a well-known race condition in Kubernetes shutdown. The endpoints controller and the kubelet operate independently. When a pod enters Terminating:

  1. The endpoints controller removes the pod from Service endpoints.
  2. The kubelet starts the preStop hook.
  3. kube-proxy on each node updates its iptables rules.

Steps 1 and 2 happen concurrently. Step 3 takes additional time. During this window, traffic may still be routed to the pod even though it is shutting down.

The fix: use a preStop hook with a short sleep (3-5 seconds) before beginning shutdown. This gives kube-proxy time to update its routing rules.

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 5"]

Kubernetes supports two container lifecycle hooks:

postStart runs immediately after the container is created, concurrently with the container’s entrypoint. There is no guarantee that postStart finishes before the container starts accepting traffic.

lifecycle:
  postStart:
    exec:
      command: ["/bin/sh", "-c", "echo started > /tmp/startup"]

If postStart fails, the container is killed. But because it runs asynchronously with the container entrypoint, the container might have already started doing work.

preStop runs before SIGTERM and blocks it until the hook completes (or the grace period expires). This is the correct place for graceful shutdown preparation.

preStop and SIGTERM are complementary. Use preStop for infrastructure work (deregister, drain). Use SIGTERM handlers for application work (close database connections, flush caches).

1. Liveness Probe Checks External Dependencies

The single most common probe misconfiguration. If your liveness probe checks a database or external API, and that dependency goes down, the kubelet restarts your container. This does not fix the external dependency. It creates a cascade of restarts across all pods.

Rule: Liveness probes should check only the container’s own internal state. Can the process respond? Is it deadlocked? That is all.

Use readiness probes for dependency checks. A failing readiness probe removes the pod from traffic without killing it.

2. Missing Startup Probe for Slow Starters

Without a startup probe, the liveness probe runs during startup. If the app takes 60 seconds to start and the liveness probe allows 30 seconds before restarting, the container never starts. It enters a restart loop.

Setting a high initialDelaySeconds on the liveness probe is the old workaround. The startup probe is better because it does not create a blind spot for runtime failures.

3. Identical Liveness and Readiness Probes

If your liveness and readiness probes check the same endpoint with the same thresholds, a failing probe both removes the pod from traffic AND restarts it simultaneously. The readiness removal is pointless because the container is about to be killed anyway.

Make them different. Readiness can be more sensitive (lower threshold). Liveness should have a higher tolerance for transient failures.

4. Overly Aggressive Probe Timing

Setting periodSeconds: 1 and failureThreshold: 1 means a single 1-second hiccup restarts the container. This causes unnecessary restarts during brief load spikes or GC pauses.

Reasonable defaults for production:

livenessProbe:
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 3
readinessProbe:
  periodSeconds: 5
  failureThreshold: 1
  timeoutSeconds: 3

5. No SIGTERM Handling

Many applications do not handle SIGTERM by default. The process receives the signal and does nothing. After 30 seconds (the default grace period), SIGKILL forces an abrupt termination. In-flight requests are dropped. Database transactions are left open.

Most web frameworks require explicit SIGTERM handling. FastAPI, Flask, Express, Spring Boot all have their own shutdown hooks. Configure them.

6. preStop Hook Longer Than the Grace Period

If the preStop hook takes longer than terminationGracePeriodSeconds, SIGTERM is never sent. The process goes straight from preStop hook timeout to SIGKILL. The application gets no chance to clean up.

Keep preStop hooks short. Increase the grace period if needed.

A full probe stack for a typical web service:

startupProbe:
  httpGet:
    path: /health/startup
    port: 8080
  failureThreshold: 30
  periodSeconds: 2  # Allows up to 60s for startup
livenessProbe:
  httpGet:
    path: /health/live
    port: 8080
  periodSeconds: 15
  failureThreshold: 3
  timeoutSeconds: 5
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 2
  successThreshold: 1
  timeoutSeconds: 3

Use three different health endpoints:

  • /health/startup: Checks that initialization is complete (migrations run, caches warmed).
  • /health/live: Checks the process is alive (not deadlocked, not OOM).
  • /health/ready: Checks dependencies (database connected, downstream services reachable).
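A minimal sketch of such a server, using only the standard library (the STATE dict is a hypothetical stand-in for real application state):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application state the handlers consult.
STATE = {"started": False, "db_connected": False}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health/startup":
            ok = STATE["started"]
        elif self.path == "/health/live":
            # Liveness only asks: is the process responding at all?
            ok = True
        elif self.path == "/health/ready":
            # Readiness also checks dependencies.
            ok = STATE["started"] and STATE["db_connected"]
        else:
            ok = False
        self.send_response(200 if ok else 503)
        self.end_headers()

    def log_message(self, fmt, *args):
        # Keep probe traffic out of the logs.
        pass
```

Keeping the three checks separate is what lets a database outage fail readiness (pod leaves the Service) without failing liveness (pod is not restarted).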
For a database such as PostgreSQL:

startupProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  failureThreshold: 60
  periodSeconds: 5  # Allows up to 5 minutes for recovery
livenessProbe:
  exec:
    command: ["pg_isready", "-U", "postgres"]
  periodSeconds: 30
  failureThreshold: 5
  timeoutSeconds: 10

Databases need more tolerance. A PostgreSQL vacuum or WAL replay can make the server temporarily unresponsive. Aggressive liveness probes would kill it during recovery.

For a background worker:

livenessProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - "kill -0 $(cat /var/run/worker.pid)"
  periodSeconds: 30
  failureThreshold: 3

For processes without HTTP endpoints, use exec probes that check the process is alive or that a heartbeat file was recently updated.
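The heartbeat-file variant can be sketched as (hypothetical helper; the worker is assumed to touch the file on every loop iteration):

```python
import os
import time

def heartbeat_fresh(path: str, max_age_seconds: float = 60.0) -> bool:
    # Passes if the heartbeat file exists and was modified recently;
    # a missing or stale file fails the probe.
    try:
        return (time.time() - os.path.getmtime(path)) <= max_age_seconds
    except OSError:
        return False
```

Unlike `kill -0`, this catches a worker that is alive but stuck, since a deadlocked loop stops touching the file.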

Beyond probes, Kubernetes supports readiness gates. These are custom conditions that must be true for a pod to be considered ready. External controllers set these conditions via the pod status API.

spec:
  readinessGates:
  - conditionType: "my-custom-condition"

Until an external controller sets my-custom-condition to True, the pod remains not-ready even if all readiness probes pass. This is used by service meshes and load balancer controllers to add their own readiness logic.

| Probe | When It Runs | Failure Action | Use For |
| --- | --- | --- | --- |
| Startup | Before liveness/readiness | Kill + restart | Slow-starting containers |
| Liveness | After startup passes | Kill + restart | Deadlock detection |
| Readiness | After startup passes | Remove from Service | Dependency health |

The probes and lifecycle hooks together form the contract between Kubernetes and your application. Get them right and your services handle failures gracefully. Get them wrong and a single dependency blip cascades into a cluster-wide outage.