
DaemonSet: Deep Dive

This document explains how DaemonSets schedule pods on every node, why their update strategies differ from Deployments, and when to use tolerations, node selectors, and host access. It connects the demo manifests to production patterns for logging, monitoring, and networking.


A DaemonSet ensures that every node (or a selected subset) runs exactly one copy of a pod. When a new node joins the cluster, the DaemonSet controller automatically schedules a pod on it. When a node is removed, the pod is garbage collected.

This is fundamentally different from Deployments and StatefulSets, which scale by replica count regardless of node topology. A Deployment with 3 replicas might land all 3 pods on the same node. A DaemonSet always produces exactly one pod per qualifying node.


Before Kubernetes 1.12, the DaemonSet controller bypassed the default scheduler entirely. It set the nodeName field directly on the pod spec, which forced the pod onto a specific node without going through the scheduler.

This caused problems. The pod skipped all scheduler predicates (resource checks, affinity rules, taints). It could land on nodes that were already overcommitted.

Since Kubernetes 1.12, DaemonSet pods go through the default scheduler. The DaemonSet controller creates pods with a nodeAffinity that targets a specific node:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchFields:
        - key: metadata.name
          operator: In
          values: ["node-1"]

The scheduler then evaluates this pod like any other. It checks resource availability, taints, and other constraints. If the node cannot accommodate the pod, the pod stays Pending.

This approach integrates DaemonSets with the scheduler’s priority and preemption system. A high-priority DaemonSet pod can preempt lower-priority pods on a node.


Taints are node-level markers that repel pods. Tolerations are pod-level declarations that allow a pod to run on tainted nodes.

The demo’s node-monitor uses a toleration to run on control-plane nodes:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-monitor
  namespace: daemonset-demo
spec:
  selector:
    matchLabels:
      app: node-monitor
  template:
    metadata:
      labels:
        app: node-monitor
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      containers:
      - name: monitor
        image: busybox:1.36
        command: ["/bin/sh", "/scripts/monitor.sh"]
        volumeMounts:
        - name: scripts
          mountPath: /scripts
        - name: host-log
          mountPath: /var/log/containers
          readOnly: true

Control-plane nodes typically carry this taint:

node-role.kubernetes.io/control-plane:NoSchedule

Without a matching toleration, no regular pod can be scheduled there. The node-monitor’s toleration says: “I accept nodes tainted with node-role.kubernetes.io/control-plane, regardless of the taint value.”

The operator: Exists means the toleration matches the key regardless of value. The effect: NoSchedule means it only tolerates the NoSchedule effect, not NoExecute.

Effect            Behavior
NoSchedule        New pods are not scheduled. Existing pods stay.
PreferNoSchedule  Scheduler avoids the node but does not guarantee it.
NoExecute         New pods are not scheduled. Existing pods are evicted.

The DaemonSet controller automatically adds several tolerations to pods it creates:

  • node.kubernetes.io/not-ready:NoExecute (tolerate not-ready nodes)
  • node.kubernetes.io/unreachable:NoExecute (tolerate unreachable nodes)
  • node.kubernetes.io/disk-pressure:NoSchedule
  • node.kubernetes.io/memory-pressure:NoSchedule
  • node.kubernetes.io/pid-pressure:NoSchedule
  • node.kubernetes.io/unschedulable:NoSchedule

These are added automatically because DaemonSets need to run on every node, even nodes under pressure. A monitoring agent is most useful precisely when a node is having problems.
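Agents that must run on absolutely every node often go further and add a blanket toleration that matches all taints, not just the ones the controller injects. A minimal sketch:

```yaml
spec:
  template:
    spec:
      tolerations:
      # A toleration with no key and operator Exists matches every taint,
      # including NoExecute taints, so the pod is never repelled or evicted.
      - operator: Exists
```

Production log-collector and CNI manifests commonly ship exactly this blanket form.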


Node Selection: nodeSelector vs Node Affinity


nodeSelector is the simplest way to restrict a DaemonSet to specific nodes. It uses label matching:

spec:
  template:
    spec:
      nodeSelector:
        disk: ssd

Only nodes with the label disk=ssd get a pod. This is an AND operation. All specified labels must match.

Node affinity is a more expressive alternative. It supports the In, NotIn, Exists, DoesNotExist, Gt, and Lt operators:

spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values: ["linux"]
              - key: node.kubernetes.io/instance-type
                operator: In
                values: ["m5.large", "m5.xlarge"]

Node affinity also supports preferredDuringSchedulingIgnoredDuringExecution, which expresses a preference without making it a hard requirement.

Use nodeSelector for simple label matches. Use node affinity when you need OR logic (multiple nodeSelectorTerms), negative matching (NotIn, DoesNotExist), or soft preferences.
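To illustrate the OR semantics: multiple nodeSelectorTerms are ORed together, while matchExpressions within one term are ANDed. A sketch (the disk and monitoring labels are hypothetical):

```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            # Term 1: nodes labeled with SSD-backed storage ...
            - matchExpressions:
              - key: disk
                operator: In
                values: ["ssd"]
            # ... OR Term 2: any node carrying a monitoring label at all.
            - matchExpressions:
              - key: monitoring
                operator: Exists
```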


The demo’s log-collector uses a rolling update:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: daemonset-demo
spec:
  selector:
    matchLabels:
      app: log-collector
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: busybox:1.36

When you update the pod template (change the image, add an env var, etc.), the DaemonSet controller rolls out the change one node at a time. It terminates the old pod on a node, waits for it to be fully gone, then creates the new pod.

maxUnavailable: 1 means at most 1 node at a time lacks a running DaemonSet pod during the rollout. You can increase this to speed up large rollouts:

rollingUpdate:
  maxUnavailable: 25%

On a 100-node cluster, up to 25 nodes can be updating simultaneously.

Starting in Kubernetes 1.22, DaemonSet RollingUpdate supports maxSurge:

rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0

With maxSurge: 1, the controller creates the new pod before deleting the old one. This means a node temporarily runs two DaemonSet pods. The old pod is removed only after the new one is Ready.

This is useful for zero-downtime updates of DaemonSet workloads. Without maxSurge, there is always a gap between the old pod terminating and the new pod starting.

Note: maxSurge and maxUnavailable cannot both be zero. At least one must be positive.

With OnDelete, the DaemonSet controller does not roll out template changes automatically. You must manually delete each old pod to trigger its replacement with the new template:

spec:
  updateStrategy:
    type: OnDelete

This gives you full control over when each node gets updated. It is useful for critical infrastructure like CNI plugins where a bad update could take down networking.


Host Access: hostPath, hostNetwork, hostPID


DaemonSet pods often need access to the host system. The demo uses hostPath to read container logs from the node filesystem:

volumes:
- name: host-log
  hostPath:
    path: /var/log/containers
    type: Directory

A hostPath volume mounts a file or directory from the host node’s filesystem into the pod. Common use cases:

Path                   Purpose
/var/log               Node and container logs
/var/log/containers    Container log files
/sys                   Kernel parameters and hardware info
/proc                  Process information
/etc/machine-id        Unique node identifier
/var/run/docker.sock   Container runtime socket (legacy)

The type field validates the path:

Type               Behavior
""                 No check (default)
DirectoryOrCreate  Creates the directory if it does not exist
Directory          Must be an existing directory
FileOrCreate       Creates the file if it does not exist
File               Must be an existing file
Socket             Must be an existing Unix socket
CharDevice         Must be an existing character device
BlockDevice        Must be an existing block device

spec:
  template:
    spec:
      hostNetwork: true

With hostNetwork: true, the pod uses the host’s network namespace. It shares the node’s IP address and can bind to host ports directly. This is used by CNI plugins and some monitoring agents that need to see all network traffic on the node.
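One caveat: a pod on the host network inherits the node’s DNS configuration by default. If it still needs to resolve in-cluster Service names, set dnsPolicy as well. A small sketch:

```yaml
spec:
  template:
    spec:
      hostNetwork: true
      # Without this, a hostNetwork pod uses the node's resolv.conf
      # and cannot resolve cluster Service names.
      dnsPolicy: ClusterFirstWithHostNet
```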

spec:
  template:
    spec:
      hostPID: true

With hostPID: true, the pod shares the host’s PID namespace. It can see all processes on the node. This is useful for monitoring agents that need to inspect process trees or send signals to host processes.

All three (hostPath, hostNetwork, hostPID) break pod isolation. A pod with hostPath can read sensitive files. A pod with hostNetwork can bind to any port. A pod with hostPID can see all processes.

In production, these should be paired with:

  • SecurityContext: Run as non-root, drop capabilities.
  • PodSecurityAdmission: Use restricted or baseline profiles.
  • Read-only mounts: Set readOnly: true on hostPath volume mounts.
  • RBAC: Restrict which ServiceAccounts can create pods with host access.
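Putting these together, a hardened pod spec might look like the following sketch (the user ID, container name, and capability settings are illustrative; adjust them to what the agent actually needs):

```yaml
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534            # "nobody"; illustrative
      containers:
      - name: agent                 # placeholder name
        image: busybox:1.36
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]           # drop everything, add back only what is needed
        volumeMounts:
        - name: host-log
          mountPath: /var/log/containers
          readOnly: true            # host data is read-only inside the pod
      volumes:
      - name: host-log
        hostPath:
          path: /var/log/containers
          type: Directory
```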

The demo’s log-collector mounts /var/log/containers as read-only:

volumeMounts:
- name: varlog
  mountPath: /var/log/containers
  readOnly: true

DaemonSet pods compete for resources with application pods. On a busy node, a DaemonSet pod without resource requests might get evicted or starved.

Both demo DaemonSets set conservative resource requests:

resources:
  requests:
    cpu: 25m
    memory: 16Mi
  limits:
    cpu: 50m
    memory: 32Mi

This reserves a small slice of the node for monitoring. The low limits prevent a runaway monitoring script from consuming excessive resources.

In production, set requests based on observed usage. A Fluentd log collector might need 200m CPU and 256Mi memory. An under-provisioned Fluentd will drop logs under load.
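As a sketch, a production log collector’s resource block might look like this. The limit figure is illustrative headroom, not a recommendation; derive real numbers from observed usage:

```yaml
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    # Leave headroom above the request so a burst of log traffic
    # does not immediately OOM-kill the collector.
    memory: 512Mi
```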


Log Collection (Fluentd, Fluent Bit, Vector)


The most common DaemonSet use case. A log collector runs on every node, reads container logs from the node filesystem, and ships them to a central system (Elasticsearch, Loki, CloudWatch).

Key design points:

  • Mount /var/log and /var/lib/docker/containers (or /var/log/pods for CRI-based runtimes).
  • Use readOnly: true for safety.
  • Set memory limits carefully. Log collectors can buffer large volumes of data.
  • Tolerate all taints so logs are collected from every node, including control-plane nodes.
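These points translate into a pod spec roughly like the following sketch (paths assume a CRI-based runtime; busybox stands in for a real collector such as Fluent Bit or Vector):

```yaml
spec:
  template:
    spec:
      tolerations:
      - operator: Exists            # collect logs from every node, including tainted ones
      containers:
      - name: collector
        image: busybox:1.36         # placeholder for fluent-bit / vector
        volumeMounts:
        - name: pod-logs
          mountPath: /var/log/pods
          readOnly: true            # never write to host log directories
      volumes:
      - name: pod-logs
        hostPath:
          path: /var/log/pods
          type: Directory
```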

Monitoring Agents (Prometheus Node Exporter, Datadog Agent)


A monitoring agent collects node-level metrics: CPU, memory, disk, network. It exposes a metrics endpoint that Prometheus scrapes.

Key design points:

  • Mount /proc and /sys for system metrics.
  • Use hostNetwork: true if the agent needs to see node-level network metrics.
  • Use hostPID: true if the agent needs to see all processes.
  • Set appropriate resource requests. Monitoring should not compete with application pods.
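A sketch combining these points, in the style of a node-exporter deployment (busybox and the mount points under /host are illustrative):

```yaml
spec:
  template:
    spec:
      hostNetwork: true             # expose metrics on the node IP
      hostPID: true                 # see all host processes
      containers:
      - name: exporter
        image: busybox:1.36         # placeholder for a real exporter
        volumeMounts:
        - name: proc
          mountPath: /host/proc     # agent reads host metrics from here
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
      volumes:
      - name: proc
        hostPath: { path: /proc }
      - name: sys
        hostPath: { path: /sys }
```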

Container Network Interface plugins run as DaemonSets. They configure networking for every pod on the node.

Key design points:

  • Use hostNetwork: true because the CNI plugin manages the network itself.
  • Use OnDelete update strategy because a broken CNI update can take down all networking on the node.
  • Set priorityClassName: system-node-critical to ensure the CNI pod is never evicted.
  • Tolerate all taints, including NoExecute.
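In manifest form, these design points reduce to a short sketch like this:

```yaml
spec:
  updateStrategy:
    type: OnDelete                  # operator controls when each node updates
  template:
    spec:
      hostNetwork: true             # the plugin manages pod networking itself
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists            # matches every taint, including NoExecute
```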

CSI node plugins run as DaemonSets. They handle mounting and unmounting volumes on each node.

Key design points:

  • Mount the host’s /var/lib/kubelet directory.
  • Use privileged: true security context for mount operations.
  • Set priorityClassName: system-node-critical.
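A sketch of the kubelet-directory mount; Bidirectional mount propagation is what makes mounts performed inside the plugin container visible to the kubelet on the host (container name and image are placeholders):

```yaml
spec:
  template:
    spec:
      containers:
      - name: csi-node
        image: example.com/csi-driver:placeholder
        securityContext:
          privileged: true          # required for mount(2) on the host
        volumeMounts:
        - name: kubelet-dir
          mountPath: /var/lib/kubelet
          # Mounts created inside this container propagate back to the host.
          mountPropagation: Bidirectional
      volumes:
      - name: kubelet-dir
        hostPath:
          path: /var/lib/kubelet
          type: Directory
```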

DaemonSet vs Deployment with Anti-Affinity


You could approximate DaemonSet behavior with a Deployment that has pod anti-affinity and a replica count matching the node count. But this is fragile:

  • You must manually adjust the replica count when nodes are added or removed.
  • Pod anti-affinity in preferred mode is only a scheduling hint, not a guarantee.
  • The scheduler does not understand “one per node” as a first-class concept.

DaemonSets handle node topology natively. They track node membership and reconcile automatically. Use them whenever you need exactly one pod per node.


The DaemonSet controller runs inside kube-controller-manager. On each reconciliation:

  1. List all nodes. Determine which nodes qualify (based on nodeSelector, affinity, taints).
  2. List all DaemonSet pods. Find pods owned by this DaemonSet.
  3. Compare. For each qualifying node, check if a pod exists.
  4. Create missing pods. If a qualifying node has no pod, create one.
  5. Delete extra pods. If a non-qualifying node has a pod (label changed, taint added), delete it.
  6. Handle updates. If the pod template has changed and the update strategy is RollingUpdate, replace pods according to maxUnavailable and maxSurge.

The controller also watches for node events (new node, node deletion, label changes) to trigger reconciliation immediately.


DaemonSet pods should use priorityClassName to ensure they are not evicted by application pods:

spec:
  template:
    spec:
      priorityClassName: system-node-critical

Built-in priority classes:

Priority Class           Value       Purpose
system-cluster-critical  2000000000  Cluster-level infrastructure
system-node-critical     2000001000  Node-level infrastructure

DaemonSet pods with system-node-critical priority can preempt lower-priority pods. This ensures critical node infrastructure (logging, monitoring, networking) always has resources.
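By default, the system-* classes can only be used by pods in the kube-system namespace. For DaemonSets elsewhere, teams often define their own class; a sketch (the name and value are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-high              # illustrative name
value: 1000000                      # high, but below the system-* classes
globalDefault: false                # only pods that name this class get it
description: "Priority for node agents that should outrank application pods."
```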


DaemonSets maintain revision history, similar to Deployments. You can roll back to a previous version:

# Check revision history
kubectl rollout history daemonset/node-monitor -n daemonset-demo
# Roll back to the previous revision
kubectl rollout undo daemonset/node-monitor -n daemonset-demo
# Roll back to a specific revision
kubectl rollout undo daemonset/node-monitor --to-revision=2 -n daemonset-demo

The controller applies the old pod template and rolls out the change using the configured update strategy.


The demo manifests illustrate two common patterns:

  1. node-monitor: A monitoring agent that tolerates control-plane taints and mounts host paths for log counting. It shows how to access node-level information from a pod.

  2. log-collector: A log shipper that uses RollingUpdate with maxUnavailable: 1 and mounts /var/log/containers read-only. It shows the minimal setup for collecting container logs.

Both DaemonSets run one pod per node. On a single-node minikube cluster, you see one pod each. Adding a second node with minikube node add demonstrates automatic scheduling.


If your DaemonSet pod is missing from a node, check the node’s taints:

kubectl describe node <node-name> | grep Taints

Add matching tolerations to the DaemonSet pod spec.

DaemonSet pods without resource requests can be evicted under memory pressure. Always set resource requests, even if they are small.

Some host paths require root access. If the container runs as non-root, it may get permission denied errors when reading /proc or /sys. Use securityContext.runAsUser: 0 or adjust file permissions.

Using RollingUpdate for a CNI plugin can be dangerous. If the new version has a bug, nodes lose networking as the old pod is replaced. Use OnDelete for critical infrastructure and test updates manually.