kubectl Debug: Deep Dive

This document explains ephemeral containers, process namespace sharing, debug profiles, the --copy-to mechanism, node-level debugging, and systematic workflows for troubleshooting common Kubernetes failure modes.

Ephemeral containers are a special container type that can be added to a running pod. They are declared in the pod spec's ephemeralContainers field (API type EphemeralContainer) and differ from regular containers in important ways.

| Feature | Regular Container | Ephemeral Container |
|---|---|---|
| Defined at creation | Yes | No, added to running pods |
| Resource limits | Required (with quota) | Optional |
| Port mappings | Yes | No |
| Liveness/readiness probes | Yes | No |
| Restart policy | Applies | Never restarts |
| Lifecycle hooks | Yes | No |
| Shows in kubectl get pods | Yes | Only with -o yaml |

Ephemeral containers cannot be removed once added. They run until they exit. If you exit the shell, the container stops. The pod is not modified in any other way.
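Because they persist in the spec even after exiting, you can list a pod's ephemeral containers and their states at any time. A sketch, with `<pod>` as a placeholder:

```
kubectl get pod <pod> -o jsonpath='{.spec.ephemeralContainers[*].name}'
kubectl get pod <pod> -o jsonpath='{.status.ephemeralContainerStatuses[*].state}'
```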

When you run kubectl debug -it <pod> --image=busybox, kubectl sends a PATCH request to the pod’s ephemeralContainers subresource:

PATCH /api/v1/namespaces/{ns}/pods/{pod}/ephemeralcontainers

The payload adds a new entry to spec.ephemeralContainers:

ephemeralContainers:
- name: debugger-abc123
  image: busybox:1.36
  stdin: true
  tty: true
  targetContainerName: app  # Share process namespace with this container

The kubelet pulls the debug image and starts the container in the pod’s existing network and IPC namespaces.

An ephemeral container shares the pod’s:

  • Network namespace: Same IP, same ports, same DNS
  • IPC namespace: Same shared memory segments
  • Volume mounts: Only if explicitly configured (not automatic)

It does NOT share by default:

  • PID namespace: Processes are isolated unless shareProcessNamespace is true or targetContainerName is set
  • Filesystem: The debug container has its own root filesystem from its image

Process namespace sharing is the key feature that makes ephemeral containers useful for debugging.

When you specify --target=<container> in kubectl debug, the ephemeral container joins the target container’s process namespace:

kubectl debug -it deploy/distroless-app -n debug-demo \
  --image=busybox:1.36 \
  --target=app

With process namespace sharing:

  • ps aux in the debug container shows processes from the target container
  • You can inspect /proc/<pid>/root/ to see the target container’s filesystem
  • You can send signals to the target container’s processes
  • You can read /proc/<pid>/environ to see environment variables
  • You can use strace -p <pid> to trace system calls (if capabilities allow)

Without --target, the debug container only sees its own processes.
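A typical session inside a `--target` debug container might look like this (a sketch, assuming the target's main process is PID 1 in the shared namespace):

```
ps aux                               # shows the target container's processes
tr '\0' '\n' < /proc/1/environ       # env vars of the target's main process
ls /proc/1/root/                     # the target container's root filesystem
```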

The pod spec also supports process namespace sharing at the pod level:

spec:
  shareProcessNamespace: true

When this is set, all containers in the pod share a single PID namespace. Every container can see every other container's processes. PID 1 is the pause process (the pod's infrastructure container), not your application.

This is useful for sidecar patterns but changes application behavior. Some applications expect to be PID 1 and behave differently when they are not.
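You can verify who holds PID 1 from inside any container of such a pod. A sketch, with `<pod>` as a placeholder:

```
kubectl exec <pod> -c app -- cat /proc/1/cmdline
# With shareProcessNamespace: true this typically prints /pause
# rather than your application's command line.
```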

Recent versions of kubectl support debug profiles (the --profile flag) that configure the security context of debug containers.

The general profile (the default) applies no restrictions. The debug container runs with whatever security context the runtime provides. This is the most permissive profile.

The baseline profile applies baseline Pod Security Standards to the debug container:

  • No privileged mode
  • No host namespaces
  • Default capabilities only

The restricted profile applies restricted Pod Security Standards:

  • Non-root user
  • No privilege escalation
  • Capabilities dropped
  • Seccomp RuntimeDefault

This may limit what you can do in the debug container. Tools that need root access (tcpdump, strace) will not work.

The netadmin profile adds NET_ADMIN and NET_RAW capabilities. Useful for network debugging:

kubectl debug -it <pod> --image=nicolaka/netshoot --profile=netadmin

With netadmin, you can run:

  • tcpdump to capture packets
  • iptables to inspect firewall rules
  • ss / netstat to inspect connections
  • ip to inspect routing
# Default (general) profile
kubectl debug -it <pod> --image=busybox:1.36
# Restricted profile (for security-hardened namespaces)
kubectl debug -it <pod> --image=busybox:1.36 --profile=restricted
# Network admin profile
kubectl debug -it <pod> --image=nicolaka/netshoot --profile=netadmin
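To confirm what a profile actually produced, inspect the security context of the injected ephemeral container. A sketch:

```
kubectl get pod <pod> -o jsonpath='{.spec.ephemeralContainers[*].securityContext}'
```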

The --copy-to flag creates a copy of the target pod instead of adding an ephemeral container. This is essential for debugging crashed containers.

Adding an ephemeral container to a pod in CrashLoopBackOff is of little use. The app container keeps crashing and restarting, so there is no stable process to target, and any evidence inside the container is lost on each restart.

The demo’s crashing pod:

apiVersion: v1
kind: Pod
metadata:
  name: crash-loop
  namespace: debug-demo
spec:
  containers:
  - name: app
    image: busybox:1.36
    command:
    - /bin/sh
    - -c
    - |
      echo "Starting..."
      echo "Loading config from /config/app.yaml"
      if [ ! -f /config/app.yaml ]; then
        echo "ERROR: Config file not found!"
        exit 1
      fi

This pod crashes because /config/app.yaml does not exist. You need to get inside to investigate.

kubectl debug crash-loop -n debug-demo -it \
  --copy-to=crash-debug \
  --container=app \
  -- /bin/sh

This:

  1. Creates a new pod called crash-debug with the same spec as crash-loop.
  2. Overrides the command of the app container with /bin/sh.
  3. Attaches an interactive terminal.

The copy has the same volumes, environment variables, and image as the original. But because the command is overridden to /bin/sh, the container starts a shell instead of crashing.

Inside the copy, you can investigate:

ls /config/ # See what files exist
cat /config/app.yaml # Check if config was supposed to be mounted
env # Check environment variables
mount # Check volume mounts

You can also change the image in the copy:

kubectl debug crash-loop -n debug-demo -it \
  --copy-to=crash-debug \
  --image=ubuntu:22.04 \
  --container=app \
  -- bash

This replaces the container's image with Ubuntu, giving you a full shell and apt to install tools such as curl and dig. The volumes and env vars from the original pod are preserved.

The copied pod is a new pod. It gets a new IP address. It is not behind the same Service. It is a diagnostic tool, not a live replacement.
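You can confirm the copy is outside the Service. A sketch, assuming a Service named app fronts the original pod:

```
kubectl get pod crash-debug -n debug-demo -o wide   # note the new IP
kubectl get endpoints app -n debug-demo             # the copy's IP is absent
```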

Remember to clean up copies:

kubectl delete pod crash-debug -n debug-demo

kubectl debug node/ creates a pod on a specific node that runs in the host's network, PID, and IPC namespaces and has access to the host filesystem (it is not privileged by default; add --profile=sysadmin if you need a fully privileged pod):

kubectl debug node/minikube -it --image=busybox:1.36

The debug pod mounts the host root filesystem at /host:

# Inside the debug pod
chroot /host ps aux # Host processes
chroot /host df -h # Host disk usage
chroot /host journalctl -u kubelet # Kubelet logs

The chroot /host command changes the root directory to the host filesystem. After chroot, you are effectively running commands on the host.

Check kubelet status:

chroot /host systemctl status kubelet
chroot /host journalctl -u kubelet --no-pager | tail -50

Check disk pressure:

chroot /host df -h
chroot /host du -sh /var/lib/kubelet
chroot /host du -sh /var/lib/containers

Check container runtime:

chroot /host crictl ps
chroot /host crictl images
chroot /host crictl logs <container-id>

Check network:

chroot /host ip addr
chroot /host ip route
chroot /host iptables -t nat -L -n

CrashLoopBackOff means the container starts, crashes, and keeps restarting with increasing backoff delays (10s, 20s, 40s, up to 5 minutes).

1. Check logs:

kubectl logs crash-loop -n debug-demo
kubectl logs crash-loop -n debug-demo --previous

--previous shows logs from the last crash. Without it, you might see logs from the current (still-starting) instance.

2. Check events:

kubectl describe pod crash-loop -n debug-demo

Look at the Events section. Common messages:

  • Back-off restarting failed container: The container exited with non-zero
  • Error: ImagePullBackOff: Cannot pull the image (wrong name, auth failure)
  • OOMKilled: Container exceeded memory limit

3. Check exit code:

kubectl get pod crash-loop -n debug-demo -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'

Common exit codes:

| Code | Meaning |
|---|---|
| 0 | Success (but containers should not exit in a Deployment) |
| 1 | General error (application error) |
| 2 | Misuse of shell builtins |
| 126 | Command not executable |
| 127 | Command not found |
| 128+N | Killed by signal N (e.g., 137 = SIGKILL = 128+9) |
| 137 | OOMKilled or SIGKILL |
| 139 | Segfault (SIGSEGV = 128+11) |
| 143 | SIGTERM (128+15, graceful shutdown) |
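The 128+N convention can be decoded mechanically. A minimal sketch (the exit code value is hard-coded here for illustration):

```shell
# Decode a container exit code: >128 means "killed by signal (code - 128)".
EXIT_CODE=137
if [ "$EXIT_CODE" -gt 128 ]; then
  echo "killed by signal $((EXIT_CODE - 128))"
else
  echo "application exited with status $EXIT_CODE"
fi
```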

4. Use kubectl debug --copy-to:

kubectl debug crash-loop -n debug-demo -it \
  --copy-to=debug-crash \
  --container=app \
  -- /bin/sh

ImagePullBackOff means the kubelet cannot pull the container image.

Wrong image name or tag:

kubectl describe pod <pod> | grep "Image:"
kubectl describe pod <pod> | grep "Failed"

Authentication failure:

kubectl get pod <pod> -o jsonpath='{.spec.imagePullSecrets}'
kubectl get secret <pull-secret> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d

Registry not reachable:

kubectl debug node/minikube -it --image=busybox:1.36
# Inside: wget -O- https://registry.example.com/v2/

Rate limiting (Docker Hub): Docker Hub limits pulls to 100 per 6 hours for anonymous users. Use authenticated pulls or mirror images locally.
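One way to avoid registry rate limits is to mirror the image into a registry you control. A sketch; registry.internal.example is a placeholder:

```
docker pull busybox:1.36
docker tag busybox:1.36 registry.internal.example/busybox:1.36
docker push registry.internal.example/busybox:1.36
```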

A pod stays in Pending when the scheduler cannot find a node.

kubectl describe pod <pod> -n <namespace>

Check the Events section for:

Insufficient resources:

0/3 nodes are available: 3 Insufficient cpu

The cluster does not have enough free CPU. Either add nodes, reduce requests, or delete other workloads.

Node affinity/selector mismatch:

0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector

The pod requires a node label that no node has. Check nodeSelector or nodeAffinity in the pod spec.

Taints and tolerations:

0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate

All nodes have taints that the pod does not tolerate. Add tolerations or remove taints.

PVC not bound:

0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims

The PVC is waiting for a PersistentVolume. Check PVC status and StorageClass provisioner.
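To dig into an unbound claim, a sketch (placeholders in angle brackets):

```
kubectl get pvc -n <namespace>               # STATUS should be Bound
kubectl describe pvc <claim> -n <namespace>  # events explain why binding failed
kubectl get storageclass                     # is a provisioner configured?
```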

ResourceQuota exceeded:

exceeded quota: compute-quota

The namespace has hit its resource quota. Free up resources or increase the quota.
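To see what the quota allows versus what is consumed, a sketch using the quota name from the error above:

```
kubectl get resourcequota -n <namespace>
kubectl describe resourcequota compute-quota -n <namespace>  # shows Used vs Hard
```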

The container was killed by the kernel OOM killer because it exceeded its memory limit.

kubectl describe pod <pod> | grep -A 5 "Last State"

Look for:

Last State:   Terminated
  Reason:     OOMKilled
  Exit Code:  137

Fix options:

  1. Increase the container’s memory limit
  2. Fix the memory leak in the application
  3. Use a memory profiler inside a debug container
kubectl debug -it <pod> --image=alpine --target=app -- sh
# Inside: watch grep -i vm /proc/1/status

This shows the target container’s virtual memory stats in real time.

Use nicolaka/netshoot with --profile=netadmin for network debugging. Inside: nslookup for DNS, curl -v for HTTP, nc -zv for TCP, tcpdump for packet capture, ip route for routing.
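A typical netshoot session might look like this (a sketch; service names and ports are placeholders):

```
nslookup my-service.my-namespace.svc.cluster.local
curl -v http://my-service:8080/healthz
nc -zv my-service 8080
tcpdump -i eth0 -n port 8080
```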

Key kubectl logs flags: -f for streaming, --previous for last crash, --since=1h for time-based, --tail=100 for line limits, --timestamps for timing, --all-containers for multi-container pods.
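These flags compose. A sketch with placeholder names:

```
kubectl logs my-pod -c app --previous --tail=50              # last crash, last 50 lines
kubectl logs deploy/my-app --all-containers --since=1h --timestamps
```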

Use kubectl top pods and kubectl top nodes (requires metrics-server) for current consumption.

For images like registry.k8s.io/pause:3.9 (no shell), ephemeral containers are the only option. Use --target=app to share the process namespace, then inspect via /proc/1/root/ (filesystem), /proc/1/environ (env vars), and /proc/1/cmdline (command).
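Inside the ephemeral container with --target=app, a sketch (assuming the target's main process is PID 1 in the shared namespace):

```
tr '\0' '\n' < /proc/1/environ       # environment variables
tr '\0' ' '  < /proc/1/cmdline; echo # command line
ls /proc/1/root/                     # the target's root filesystem
```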

| Image | Size | Use Case |
|---|---|---|
| busybox:1.36 | ~4 MB | Basic shell, file operations |
| alpine:3.19 | ~7 MB | Shell + package manager (apk) |
| nicolaka/netshoot | ~350 MB | Full network debugging toolkit |
| curlimages/curl | ~15 MB | HTTP debugging |
| ubuntu:22.04 | ~75 MB | General purpose with apt |
| registry.k8s.io/e2e-test-images/agnhost | ~30 MB | Kubernetes-aware debugging |

Choose the smallest image that has the tools you need. In production clusters with image pull restrictions, pre-pull debug images or use an internal registry.

Ephemeral containers can bypass the pod's original security posture. A pod that itself runs with a restricted security context can have a debug container injected with the general profile that runs as root with full capabilities (unless Pod Security Admission enforces the restricted standard on the namespace, in which case the patch is rejected).

This is by design. Debugging requires elevated access. But it means:

  1. RBAC on pods/ephemeralcontainers controls who can debug.
  2. Audit logs capture debug container creation.
  3. Debug containers in production should be time-limited and reviewed.

Lock down ephemeral container creation with RBAC:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
rules:
- apiGroups: [""]
  resources: ["pods/ephemeralcontainers"]
  verbs: ["patch"]
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]

Only users bound to this role can create debug containers; others can only view pods. Note that attaching interactively also requires create on the pods/attach subresource.
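Binding the role is then a one-liner. A sketch; the user name is a placeholder:

```
kubectl create rolebinding alice-debugger \
  --role=pod-debugger --user=alice -n debug-demo
```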