kubectl Debug: Deep Dive
This document explains ephemeral containers, process namespace sharing, debug profiles, the --copy-to mechanism, node-level debugging, and systematic workflows for troubleshooting common Kubernetes failure modes.
Ephemeral Containers
Ephemeral containers are a special container type that can be added to a running pod. They are declared in the pod spec's `ephemeralContainers` list and differ from regular containers in important ways.
How They Differ from Regular Containers
| Feature | Regular Container | Ephemeral Container |
|---|---|---|
| Defined at creation | Yes | No, added to running pods |
| Resource limits | Required (with quota) | Optional |
| Port mappings | Yes | No |
| Liveness/readiness probes | Yes | No |
| Restart policy | Applies | Never restarts |
| Lifecycle hooks | Yes | No |
| Shows in `kubectl get pods` | Yes | Only with `-o yaml` |
Ephemeral containers cannot be removed once added. They run until they exit. If you exit the shell, the container stops. The pod is not modified in any other way.
The API Behind kubectl debug
When you run kubectl debug -it <pod> --image=busybox, kubectl sends a PATCH request to the pod's ephemeralContainers subresource:
```
PATCH /api/v1/namespaces/{ns}/pods/{pod}/ephemeralcontainers
```

The payload adds a new entry to `spec.ephemeralContainers`:
```yaml
ephemeralContainers:
  - name: debugger-abc123
    image: busybox:1.36
    stdin: true
    tty: true
    targetContainerName: app  # Share process namespace with this container
```

The kubelet pulls the debug image and starts the container in the pod's existing network and IPC namespaces.
What Ephemeral Containers Can Access
An ephemeral container shares the pod's:
- Network namespace: Same IP, same ports, same DNS
- IPC namespace: Same shared memory segments
- Volume mounts: Only if explicitly configured (not automatic)
It does NOT share by default:
- PID namespace: Processes are isolated unless `shareProcessNamespace` is true or `targetContainerName` is set
- Filesystem: The debug container has its own root filesystem from its image
Process Namespace Sharing
Process namespace sharing is the key feature that makes ephemeral containers useful for debugging.
targetContainerName
When you specify --target=<container> in kubectl debug, the ephemeral container joins the target container's process namespace:
```
kubectl debug -it deploy/distroless-app -n debug-demo \
  --image=busybox:1.36 \
  --target=app
```

With process namespace sharing:
- `ps aux` in the debug container shows processes from the target container
- You can inspect `/proc/<pid>/root/` to see the target container's filesystem
- You can send signals to the target container's processes
- You can read `/proc/<pid>/environ` to see environment variables
- You can use `strace -p <pid>` to trace system calls (if capabilities allow)
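These `/proc` interfaces are standard Linux procfs, so you can preview them without a cluster. A small sketch inspecting the current shell's own entry (`/proc/self`) the same way you would inspect a target PID:

```shell
# Resolving a process's root/ shows its filesystem view.
ls /proc/self/root/ > /dev/null && echo "root filesystem is visible"
# environ is NUL-separated; translate NULs to newlines to read it.
tr '\0' '\n' < /proc/self/environ | head -3
```

In a debug container with `--target`, replace `self` with the target PID you find via `ps aux`.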
Without --target, the debug container only sees its own processes.
Pod-Level Process Sharing
The pod spec also supports process namespace sharing at the pod level:
```yaml
spec:
  shareProcessNamespace: true
```

When this is set, all containers in the pod share a single PID namespace. Every container can see every other container's processes. PID 1 is the pause container, not your application.
This is useful for sidecar patterns but changes application behavior. Some applications expect to be PID 1 and behave differently when they are not.
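For completeness, a minimal sketch of a pod using pod-level sharing (the pod and container names are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-pid-demo   # illustrative name
spec:
  shareProcessNamespace: true
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
    - name: sidecar
      image: busybox:1.36
      # From this container, `ps aux` also shows the app container's sleep process.
      command: ["sleep", "3600"]
```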
Debug Profiles
Kubernetes v1.30+ supports debug profiles that configure the security context of debug containers.
general (default)
No restrictions. The debug container runs with whatever security context the runtime provides. This is the most permissive and the default.
baseline
Applies baseline Pod Security Standards to the debug container:
- No privileged mode
- No host namespaces
- Default capabilities only
restricted
Applies restricted Pod Security Standards:
- Non-root user
- No privilege escalation
- Capabilities dropped
- Seccomp RuntimeDefault
This may limit what you can do in the debug container. Tools that need root access (tcpdump, strace) will not work.
netadmin
Adds NET_ADMIN and NET_RAW capabilities. Useful for network debugging:
```
kubectl debug -it <pod> --image=nicolaka/netshoot --profile=netadmin
```

With netadmin, you can run:
- `tcpdump` to capture packets
- `iptables` to inspect firewall rules
- `ss`/`netstat` to inspect connections
- `ip` to inspect routing
Using Profiles
```
# Default (general) profile
kubectl debug -it <pod> --image=busybox:1.36

# Restricted profile (for security-hardened namespaces)
kubectl debug -it <pod> --image=busybox:1.36 --profile=restricted

# Network admin profile
kubectl debug -it <pod> --image=nicolaka/netshoot --profile=netadmin
```

The --copy-to Mechanism
The --copy-to flag creates a copy of the target pod instead of adding an ephemeral container. This is essential for debugging crashed containers.
Why Copies Are Needed
You can technically add an ephemeral container to a pod stuck in CrashLoopBackOff, but it rarely helps: the app container keeps crashing and restarting, so there is no stable process to target or inspect. A copy with an overridden command stays up long enough to investigate.
The demo’s crashing pod:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crash-loop
  namespace: debug-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command:
        - /bin/sh
        - -c
        - |
          echo "Starting..."
          echo "Loading config from /config/app.yaml"
          if [ ! -f /config/app.yaml ]; then
            echo "ERROR: Config file not found!"
            exit 1
          fi
```

This pod crashes because /config/app.yaml does not exist. You need to get inside to investigate.
How --copy-to Works
```
kubectl debug crash-loop -n debug-demo -it \
  --copy-to=crash-debug \
  --container=app \
  -- /bin/sh
```

This:
- Creates a new pod called `crash-debug` with the same spec as `crash-loop`.
- Overrides the command of the `app` container with `/bin/sh`.
- Attaches an interactive terminal.
The copy has the same volumes, environment variables, and image as the original. But because the command is overridden to /bin/sh, the container starts a shell instead of crashing.
Inside the copy, you can investigate:
```
ls /config/             # See what files exist
cat /config/app.yaml    # Check if config was supposed to be mounted
env                     # Check environment variables
mount                   # Check volume mounts
```

Copy with Image Override
You can also change the image in the copy:
```
kubectl debug crash-loop -n debug-demo -it \
  --copy-to=crash-debug \
  --image=ubuntu:22.04 \
  --container=app \
  -- bash
```

This replaces the container's image with Ubuntu, giving you access to tools like apt, curl, dig, etc. The volumes and env vars from the original pod are preserved.
Copy Limitations
The copied pod is a new pod. It gets a new IP address. It is not behind the same Service. It is a diagnostic tool, not a live replacement.
Remember to clean up copies:
```
kubectl delete pod crash-debug -n debug-demo
```

Node-Level Debugging
kubectl debug node/<name> creates a debugging pod on a specific node that runs in the host namespaces and has access to the host filesystem:
```
kubectl debug node/minikube -it --image=busybox:1.36
```

What the Node Debug Pod Gets
The debug pod mounts the host root filesystem at /host:
```
# Inside the debug pod
chroot /host ps aux                 # Host processes
chroot /host df -h                  # Host disk usage
chroot /host journalctl -u kubelet  # Kubelet logs
```

The chroot /host command changes the root directory to the host filesystem. After chroot, you are effectively running commands on the host.
Common Node Debugging Tasks
Check kubelet status:
```
chroot /host systemctl status kubelet
chroot /host journalctl -u kubelet --no-pager | /usr/bin/tail -50
```

Check disk pressure:
```
chroot /host df -h
chroot /host du -sh /var/lib/kubelet
chroot /host du -sh /var/lib/containers
```

Check container runtime:
```
chroot /host crictl ps
chroot /host crictl images
chroot /host crictl logs <container-id>
```

Check network:
```
chroot /host ip addr
chroot /host ip route
chroot /host iptables -t nat -L -n
```

Troubleshooting CrashLoopBackOff
CrashLoopBackOff means the container starts, crashes, and keeps restarting with increasing backoff delays (10s, 20s, 40s, up to 5 minutes).
Diagnostic Steps
1. Check logs:
```
kubectl logs crash-loop -n debug-demo
kubectl logs crash-loop -n debug-demo --previous
```

--previous shows logs from the last crash. Without it, you might see logs from the current (still-starting) instance.
2. Check events:
```
kubectl describe pod crash-loop -n debug-demo
```

Look at the Events section. Common messages:
- `Back-off restarting failed container`: The container exited with non-zero
- `Error: ImagePullBackOff`: Cannot pull the image (wrong name, auth failure)
- `OOMKilled`: Container exceeded memory limit
3. Check exit code:
```
kubectl get pod crash-loop -n debug-demo \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
```

Common exit codes:
| Code | Meaning |
|---|---|
| 0 | Success (but containers should not exit in a Deployment) |
| 1 | General error (application error) |
| 2 | Misuse of shell command |
| 126 | Command not executable |
| 127 | Command not found |
| 128+N | Killed by signal N (e.g., 137 = SIGKILL = 128+9) |
| 137 | OOMKilled or SIGKILL |
| 139 | Segfault (SIGSEGV = 128+11) |
| 143 | SIGTERM (128+15, graceful shutdown) |
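The 128+N convention is easy to verify in any POSIX shell, no cluster required:

```shell
# Reproduce common exit codes locally with throwaway subshells.
sh -c 'exit 1';                echo "app error -> $?"   # 1
sh -c 'nosuchcmd' 2>/dev/null; echo "not found -> $?"   # 127
sh -c 'kill -9 $$';            echo "SIGKILL   -> $?"   # 137 = 128 + 9
```

The same arithmetic applies in reverse: when a pod reports 137, subtract 128 to recover the signal (9, SIGKILL).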
4. Use kubectl debug --copy-to:
```
kubectl debug crash-loop -n debug-demo -it \
  --copy-to=debug-crash \
  --container=app \
  -- /bin/sh
```

Troubleshooting ImagePullBackOff
The kubelet cannot pull the container image.
Common Causes
Wrong image name or tag:
```
kubectl describe pod <pod> | grep "Image:"
kubectl describe pod <pod> | grep "Failed"
```

Authentication failure:
```
kubectl get pod <pod> -o jsonpath='{.spec.imagePullSecrets}'
kubectl get secret <pull-secret> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d
```

Registry not reachable:
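To see what a well-formed `.dockerconfigjson` should decode to, you can round-trip a sample locally; the registry name and credentials below are made up:

```shell
# Encode a sample dockerconfigjson the way a Secret stores it (base64, no newlines)...
sample='{"auths":{"registry.example.com":{"username":"ci-bot","password":"REDACTED"}}}'
encoded=$(printf '%s' "$sample" | base64 | tr -d '\n')
# ...then decode it the same way the kubectl pipeline does.
printf '%s' "$encoded" | base64 -d
```

If your decoded secret is missing the registry hostname the pod pulls from, the kubelet falls back to anonymous pulls and fails.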
```
kubectl debug node/minikube -it --image=busybox:1.36
# Inside:
wget -O- https://registry.example.com/v2/
```

Rate limiting (Docker Hub): Docker Hub limits pulls to 100 per 6 hours for anonymous users. Use authenticated pulls or mirror images locally.
Troubleshooting Pending Pods
A pod stays in Pending when the scheduler cannot find a node that satisfies its requirements.
Diagnostic Steps
```
kubectl describe pod <pod> -n <namespace>
```

Check the Events section for:
Insufficient resources:
```
0/3 nodes are available: 3 Insufficient cpu
```

The cluster does not have enough free CPU. Either add nodes, reduce requests, or delete other workloads.
Node affinity/selector mismatch:
```
0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector
```

The pod requires a node label that no node has. Check nodeSelector or nodeAffinity in the pod spec.
Taints and tolerations:
```
0/3 nodes are available: 3 node(s) had taints that the pod didn't tolerate
```

All nodes have taints that the pod does not tolerate. Add tolerations or remove taints.
PVC not bound:
```
0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims
```

The PVC is waiting for a PersistentVolume. Check PVC status and StorageClass provisioner.
ResourceQuota exceeded:
```
exceeded quota: compute-quota
```

The namespace has hit its resource quota. Free up resources or increase the quota.
Troubleshooting OOMKilled
The container was killed by the kernel OOM killer because it exceeded its memory limit.
Diagnostic Steps
```
kubectl describe pod <pod> | grep -A 5 "Last State"
```

Look for:
```
Last State:  Terminated
  Reason:    OOMKilled
  Exit Code: 137
```

Fix options:
- Increase the container’s memory limit
- Fix the memory leak in the application
- Use a memory profiler inside a debug container
```
kubectl debug -it <pod> --image=alpine --target=app -- sh
# Inside:
watch grep -i vm /proc/1/status
```

This shows the target container's virtual memory stats in real time.
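The `Vm*` fields in `/proc/<pid>/status` are standard Linux procfs, so you can preview the format against any local process (here, the reading process itself via `/proc/self`):

```shell
# VmPeak, VmSize, VmRSS, etc. for the current process.
grep -i '^Vm' /proc/self/status
```

In the debug container, VmRSS is the number to watch against the pod's memory limit, since the cgroup accounts resident memory.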
Common Debugging Workflows
Network Connectivity
Use nicolaka/netshoot with --profile=netadmin for network debugging. Inside: nslookup for DNS, curl -v for HTTP, nc -zv for TCP, tcpdump for packet capture, ip route for routing.
Application Logs
Key kubectl logs flags: -f for streaming, --previous for last crash, --since=1h for time-based, --tail=100 for line limits, --timestamps for timing, --all-containers for multi-container pods.
Resource Usage
Use kubectl top pods and kubectl top nodes (requires metrics-server) for current consumption.
Distroless Containers
For images like registry.k8s.io/pause:3.9 (no shell), ephemeral containers are the only option. Use --target=app to share the process namespace, then inspect via /proc/1/root/ (filesystem), /proc/1/environ (env vars), and /proc/1/cmdline (command).
Useful Debug Images
| Image | Size | Use Case |
|---|---|---|
| busybox:1.36 | ~4 MB | Basic shell, file operations |
| alpine:3.19 | ~7 MB | Shell + package manager (apk) |
| nicolaka/netshoot | ~350 MB | Full network debugging toolkit |
| curlimages/curl | ~15 MB | HTTP debugging |
| ubuntu:22.04 | ~75 MB | General purpose with apt |
| registry.k8s.io/e2e-test-images/agnhost | ~30 MB | Kubernetes-aware debugging |
Choose the smallest image that has the tools you need. In production clusters with image pull restrictions, pre-pull debug images or use an internal registry.
Security Considerations
Ephemeral containers bypass the pod's original security posture. A pod running with restricted PSS can have a debug container injected with the general profile that runs as root with all capabilities.
This is by design. Debugging requires elevated access. But it means:
- RBAC on `pods/ephemeralcontainers` controls who can debug.
- Audit logs capture debug container creation.
- Debug containers in production should be time-limited and reviewed.
Lock down ephemeral container creation with RBAC:
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-debugger
rules:
  - apiGroups: [""]
    resources: ["pods/ephemeralcontainers"]
    verbs: ["patch"]
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
```

Only users bound to this role can create debug containers. Others can only view pods.
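The Role has no effect until it is bound to someone; a hypothetical binding for an on-call user (the subject name is illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-debugger-binding   # illustrative name
subjects:
  - kind: User
    name: oncall@example.com   # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-debugger
  apiGroup: rbac.authorization.k8s.io
```

Because the Role grants only `patch` on the subresource and read on pods, bound users can inject debug containers but cannot delete or modify workloads.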