DaemonSet: Deep Dive
This document explains how DaemonSets schedule pods on every node, why their update strategies differ from Deployments, and when to use tolerations, node selectors, and host access. It connects the demo manifests to production patterns for logging, monitoring, and networking.
What DaemonSets Guarantee
A DaemonSet ensures that every node (or a selected subset) runs exactly one copy of a pod. When a new node joins the cluster, the DaemonSet controller automatically schedules a pod on it. When a node is removed, the pod is garbage collected.
This is fundamentally different from Deployments and StatefulSets, which scale by replica count regardless of node topology. A Deployment with 3 replicas might land all 3 pods on the same node. A DaemonSet always produces exactly one pod per qualifying node.
How DaemonSet Scheduling Works
The Old Way (Pre-1.12)
Before Kubernetes 1.12, the DaemonSet controller bypassed the default scheduler entirely. It
set the nodeName field directly on the pod spec, which forced the pod onto a specific node
without going through the scheduler.
This caused problems. The pod skipped all scheduler predicates (resource checks, affinity rules, taints). It could land on nodes that were already overcommitted.
The Current Way
Since Kubernetes 1.12, DaemonSet pods go through the default scheduler. The DaemonSet
controller creates pods with a nodeAffinity that targets a specific node:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchFields:
            - key: metadata.name
              operator: In
              values: ["node-1"]
```

The scheduler then evaluates this pod like any other. It checks resource availability, taints, and other constraints. If the node cannot accommodate the pod, the pod stays Pending.
This approach integrates DaemonSets with the scheduler’s priority and preemption system. A high-priority DaemonSet pod can preempt lower-priority pods on a node.
Tolerations and Taints
Taints are node-level markers that repel pods. Tolerations are pod-level declarations that allow a pod to run on tainted nodes.
The demo’s node-monitor uses a toleration to run on control-plane nodes:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-monitor
  namespace: daemonset-demo
spec:
  selector:
    matchLabels:
      app: node-monitor
  template:
    metadata:
      labels:
        app: node-monitor
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: monitor
          image: busybox:1.36
          command: ["/bin/sh", "/scripts/monitor.sh"]
          volumeMounts:
            - name: scripts
              mountPath: /scripts
            - name: host-log
              mountPath: /var/log/containers
              readOnly: true
```

How Taints and Tolerations Interact
Control-plane nodes typically carry this taint:
```
node-role.kubernetes.io/control-plane:NoSchedule
```

Without a matching toleration, no regular pod can be scheduled there. The node-monitor’s
toleration says: “I accept nodes tainted with node-role.kubernetes.io/control-plane,
regardless of the taint value.”
The operator: Exists means the toleration matches the key regardless of value. The
effect: NoSchedule means it only tolerates the NoSchedule effect, not NoExecute.
Common Taint Effects
| Effect | Behavior |
|---|---|
| NoSchedule | New pods are not scheduled. Existing pods stay. |
| PreferNoSchedule | Scheduler avoids the node but does not guarantee it. |
| NoExecute | New pods are not scheduled. Existing pods are evicted. |
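For NoExecute taints, a pod-level toleration can also bound how long the pod stays once the taint appears. A hedged sketch (the key here is one of the standard node-condition taints; tolerationSeconds is illustrative):

```yaml
# Sketch: tolerate a NoExecute taint for up to 5 minutes before eviction.
# Omitting tolerationSeconds means the pod tolerates the taint indefinitely.
tolerations:
  - key: node.kubernetes.io/unreachable
    operator: Exists
    effect: NoExecute
    tolerationSeconds: 300
```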
DaemonSet-Specific Tolerations
The DaemonSet controller automatically adds several tolerations to pods it creates:
- node.kubernetes.io/not-ready:NoExecute (tolerate not-ready nodes)
- node.kubernetes.io/unreachable:NoExecute (tolerate unreachable nodes)
- node.kubernetes.io/disk-pressure:NoSchedule
- node.kubernetes.io/memory-pressure:NoSchedule
- node.kubernetes.io/pid-pressure:NoSchedule
- node.kubernetes.io/unschedulable:NoSchedule
These are added automatically because DaemonSets need to run on every node, even nodes under pressure. A monitoring agent is most useful precisely when a node is having problems.
Node Selection: nodeSelector vs Node Affinity
nodeSelector
The simplest way to restrict a DaemonSet to specific nodes. It uses label matching:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        disk: ssd
```

Only nodes with the label disk=ssd get a pod. This is an AND operation. All specified labels
must match.
Node Affinity
A more expressive alternative. Node affinity supports In, NotIn, Exists, DoesNotExist,
Gt, and Lt operators:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/os
                    operator: In
                    values: ["linux"]
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values: ["m5.large", "m5.xlarge"]
```

Node affinity also supports preferredDuringSchedulingIgnoredDuringExecution, which
expresses a preference without making it a hard requirement.
When to Use Which
Use nodeSelector for simple label matches. Use node affinity when you need OR logic
(multiple nodeSelectorTerms), negative matching (NotIn, DoesNotExist), or soft
preferences.
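To make the OR case concrete, here is a hedged sketch (the label keys disk and legacy are assumptions for illustration). Terms within nodeSelectorTerms are ORed; expressions within a single term are ANDed:

```yaml
# Sketch: a node qualifies if it is labeled disk=ssd, OR if it does not
# carry the "legacy" label at all. Neither condition is expressible with
# a plain nodeSelector.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: disk
              operator: In
              values: ["ssd"]
        - matchExpressions:
            - key: legacy
              operator: DoesNotExist
```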
Update Strategies
Section titled “Update Strategies”RollingUpdate (Default)
The demo’s log-collector uses a rolling update:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: daemonset-demo
spec:
  selector:
    matchLabels:
      app: log-collector
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: busybox:1.36
```

When you update the pod template (change the image, add an env var, etc.), the DaemonSet controller rolls out the change one node at a time. It terminates the old pod on a node, waits for it to be fully gone, then creates the new pod.
maxUnavailable: 1 means at most 1 node at a time lacks a running DaemonSet pod during the
rollout. You can increase this to speed up large rollouts:
```yaml
rollingUpdate:
  maxUnavailable: 25%
```

On a 100-node cluster, this allows up to 25 nodes to be updated simultaneously.
maxSurge
Starting in Kubernetes 1.22, DaemonSet RollingUpdate supports maxSurge:
```yaml
rollingUpdate:
  maxSurge: 1
  maxUnavailable: 0
```

With maxSurge: 1, the controller creates the new pod before deleting the old one. This
means a node temporarily runs two DaemonSet pods. The old pod is removed only after the new
one is Ready.
This is useful for zero-downtime updates of DaemonSet workloads. Without maxSurge, there is
always a gap between the old pod terminating and the new pod starting.
Note: maxSurge and maxUnavailable cannot both be zero. At least one must be positive.
OnDelete
Section titled “OnDelete”With OnDelete, the DaemonSet controller never automatically replaces pods. You must manually
delete each pod to trigger its replacement:
```yaml
spec:
  updateStrategy:
    type: OnDelete
```

This gives you full control over when each node gets updated. It is useful for critical infrastructure like CNI plugins where a bad update could take down networking.
Host Access: hostPath, hostNetwork, hostPID
DaemonSet pods often need access to the host system. The demo uses hostPath to read container
logs from the node filesystem:
```yaml
volumes:
  - name: host-log
    hostPath:
      path: /var/log/containers
      type: Directory
```

hostPath
Mounts a file or directory from the host node’s filesystem into the pod. Common use cases:
| Path | Purpose |
|---|---|
| /var/log | Node and container logs |
| /var/log/containers | Container log files |
| /sys | Kernel parameters and hardware info |
| /proc | Process information |
| /etc/machine-id | Unique node identifier |
| /var/run/docker.sock | Container runtime socket (legacy) |
The type field validates the path:
| Type | Behavior |
|---|---|
| "" | No check (default) |
| DirectoryOrCreate | Creates the directory if it does not exist |
| Directory | Must be an existing directory |
| FileOrCreate | Creates the file if it does not exist |
| File | Must be an existing file |
| Socket | Must be an existing Unix socket |
| CharDevice | Must be an existing character device |
| BlockDevice | Must be an existing block device |
hostNetwork
```yaml
spec:
  template:
    spec:
      hostNetwork: true
```

The pod uses the host’s network namespace. It shares the node’s IP address and can bind to host ports directly. This is used by CNI plugins and some monitoring agents that need to see all network traffic on the node.
hostPID
```yaml
spec:
  template:
    spec:
      hostPID: true
```

The pod shares the host’s PID namespace. It can see all processes on the node. This is useful for monitoring agents that need to inspect process trees or send signals to host processes.
Security Implications
All three (hostPath, hostNetwork, hostPID) break pod isolation. A pod with hostPath
can read sensitive files. A pod with hostNetwork can bind to any port. A pod with hostPID
can see all processes.
In production, these should be paired with:
- SecurityContext: Run as non-root, drop capabilities.
- PodSecurityAdmission: Use restricted or baseline profiles.
- Read-only mounts: Set readOnly: true on hostPath volume mounts.
- RBAC: Restrict which ServiceAccounts can create pods with host access.
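A hedged sketch of what that pairing can look like in practice (container and volume names are illustrative, not from the demo manifests):

```yaml
# Sketch: hardening a pod that mounts a host path. Runs as an unprivileged
# user, forbids privilege escalation, and drops all Linux capabilities.
spec:
  template:
    spec:
      containers:
        - name: agent
          image: busybox:1.36
          securityContext:
            runAsNonRoot: true
            runAsUser: 65534          # "nobody"
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
          volumeMounts:
            - name: host-log
              mountPath: /var/log/containers
              readOnly: true
      volumes:
        - name: host-log
          hostPath:
            path: /var/log/containers
            type: Directory
```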
The demo’s log-collector mounts /var/log/containers as read-only:
```yaml
volumeMounts:
  - name: varlog
    mountPath: /var/log/containers
    readOnly: true
```

Resource Requests and Limits
DaemonSet pods compete for resources with application pods. On a busy node, a DaemonSet pod without resource requests might get evicted or starved.
Both demo DaemonSets set conservative resource requests:
```yaml
resources:
  requests:
    cpu: 25m
    memory: 16Mi
  limits:
    cpu: 50m
    memory: 32Mi
```

This reserves a small slice of the node for monitoring. The low limits prevent a runaway monitoring script from consuming excessive resources.
In production, set requests based on observed usage. A Fluentd log collector might need 200m CPU and 256Mi memory. An underfunded Fluentd will drop logs under load.
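A production-scale sketch along those lines (the numbers here are illustrative assumptions, not vendor recommendations; derive yours from observed usage):

```yaml
# Sketch: resource sizing for a heavier log collector such as Fluentd.
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 512Mi   # memory limit guards against unbounded log buffering
```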
Production Patterns
Section titled “Production Patterns”Log Collection (Fluentd, Fluent Bit, Vector)
The most common DaemonSet use case. A log collector runs on every node, reads container logs from the node filesystem, and ships them to a central system (Elasticsearch, Loki, CloudWatch).
Key design points:
- Mount /var/log and /var/lib/docker/containers (or /var/log/pods for CRI-based runtimes).
- Use readOnly: true for safety.
- Set memory limits carefully. Log collectors can buffer large volumes of data.
- Tolerate all taints so logs are collected from every node, including control-plane nodes.
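The host mounts from those points can be sketched as a pod-spec fragment (volume names are illustrative, and busybox stands in for a real collector image):

```yaml
# Sketch: host mounts for a log collector on a CRI-based runtime.
volumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: pods
    hostPath:
      path: /var/log/pods
containers:
  - name: collector
    image: busybox:1.36   # stand-in for fluent-bit, vector, etc.
    volumeMounts:
      - name: varlog
        mountPath: /var/log
        readOnly: true
      - name: pods
        mountPath: /var/log/pods
        readOnly: true
```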
Monitoring Agents (Prometheus Node Exporter, Datadog Agent)
A monitoring agent collects node-level metrics: CPU, memory, disk, network. It exposes a metrics endpoint that Prometheus scrapes.
Key design points:
- Mount /proc and /sys for system metrics.
- Use hostNetwork: true if the agent needs to see node-level network metrics.
- Use hostPID: true if the agent needs to see all processes.
- Set appropriate resource requests. Monitoring should not compete with application pods.
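The points above can be sketched as a pod-spec fragment (the image tag, mount paths, and resource numbers are assumptions for illustration):

```yaml
# Sketch: a node metrics agent with host access and read-only host mounts.
spec:
  template:
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: exporter
          image: prom/node-exporter:v1.7.0   # assumed tag; pin your own
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
```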
CNI Plugins (Calico, Cilium, Flannel)
Container Network Interface plugins run as DaemonSets. They configure networking for every pod on the node.
Key design points:
- Use hostNetwork: true because the CNI plugin manages the network itself.
- Use the OnDelete update strategy because a broken CNI update can take down all networking on the node.
- Set priorityClassName: system-node-critical to ensure the CNI pod is never evicted.
- Tolerate all taints, including NoExecute.
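Combined, those settings look roughly like this fragment (a generic sketch, not the manifest of any particular plugin):

```yaml
# Sketch: CNI-style DaemonSet spec fragment.
spec:
  updateStrategy:
    type: OnDelete
  template:
    spec:
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
        - operator: Exists   # bare Exists tolerates every taint, incl. NoExecute
```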
Storage Drivers (CSI Node Plugins)
CSI node plugins run as DaemonSets. They handle mounting and unmounting volumes on each node.
Key design points:
- Mount the host’s /var/lib/kubelet directory.
- Use a privileged: true security context for mount operations.
- Set priorityClassName: system-node-critical.
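A sketch of that fragment (container and volume names are illustrative). Bidirectional mount propagation lets mounts made inside the container become visible to the kubelet on the host:

```yaml
# Sketch: CSI node plugin pod spec fragment.
spec:
  template:
    spec:
      priorityClassName: system-node-critical
      containers:
        - name: csi-node
          securityContext:
            privileged: true   # required for mount/unmount syscalls
          volumeMounts:
            - name: kubelet-dir
              mountPath: /var/lib/kubelet
              mountPropagation: Bidirectional
      volumes:
        - name: kubelet-dir
          hostPath:
            path: /var/lib/kubelet
            type: Directory
```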
DaemonSet vs Deployment with Anti-Affinity
You could approximate DaemonSet behavior with a Deployment that has pod anti-affinity and a replica count matching the node count. But this is fragile:
- You must manually adjust the replica count when nodes are added or removed.
- Pod anti-affinity is a scheduling hint, not a guarantee (in preferred mode).
- The scheduler does not understand “one per node” as a first-class concept.
DaemonSets handle node topology natively. They track node membership and reconcile automatically. Use them whenever you need exactly one pod per node.
DaemonSet Controller Internals
The DaemonSet controller runs inside kube-controller-manager. On each reconciliation:
- List all nodes. Determine which nodes qualify (based on nodeSelector, affinity, taints).
- List all DaemonSet pods. Find pods owned by this DaemonSet.
- Compare. For each qualifying node, check if a pod exists.
- Create missing pods. If a qualifying node has no pod, create one.
- Delete extra pods. If a non-qualifying node has a pod (label changed, taint added), delete it.
- Handle updates. If the pod template has changed and the update strategy is RollingUpdate, replace pods according to maxUnavailable and maxSurge.
The controller also watches for node events (new node, node deletion, label changes) to trigger reconciliation immediately.
Priority and Preemption
DaemonSet pods should use priorityClassName to ensure they are not evicted by application
pods:
```yaml
spec:
  template:
    spec:
      priorityClassName: system-node-critical
```

Built-in priority classes:
| Priority Class | Value | Purpose |
|---|---|---|
| system-cluster-critical | 2000000000 | Cluster-level infrastructure |
| system-node-critical | 2000001000 | Node-level infrastructure |
DaemonSet pods with system-node-critical priority can preempt lower-priority pods. This
ensures critical node infrastructure (logging, monitoring, networking) always has resources.
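For DaemonSets that merit priority but are not node-critical, you can define a custom PriorityClass. A hedged sketch (the name and value are assumptions for illustration):

```yaml
# Sketch: a custom priority class that outranks ordinary application pods
# without competing with the built-in system-* classes.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: daemonset-high
value: 100000
globalDefault: false
description: "For DaemonSets that should not be evicted by application pods."
```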
Rollback
DaemonSets maintain revision history, similar to Deployments. You can roll back to a previous version:
```shell
# Check revision history
kubectl rollout history daemonset/node-monitor -n daemonset-demo

# Roll back to the previous revision
kubectl rollout undo daemonset/node-monitor -n daemonset-demo

# Roll back to a specific revision
kubectl rollout undo daemonset/node-monitor --to-revision=2 -n daemonset-demo
```

The controller applies the old pod template and rolls out the change using the configured update strategy.
Connection to the Demo
The demo manifests illustrate two common patterns:
- node-monitor: A monitoring agent that tolerates control-plane taints and mounts host paths for log counting. It shows how to access node-level information from a pod.
- log-collector: A log shipper that uses RollingUpdate with maxUnavailable: 1 and mounts /var/log/containers read-only. It shows the minimal setup for collecting container logs.
Both DaemonSets run one pod per node. On a single-node minikube cluster, you see one pod each.
Adding a second node with minikube node add demonstrates automatic scheduling.
Common Pitfalls
Missing Tolerations
If your DaemonSet pod is missing from a node, check the node’s taints:
```shell
kubectl describe node <node-name> | grep Taints
```

Add matching tolerations to the DaemonSet pod spec.
Resource Starvation
DaemonSet pods without resource requests can be evicted under memory pressure. Always set resource requests, even if they are small.
hostPath Permissions
Some host paths require root access. If the container runs as non-root, it may get permission
denied errors when reading /proc or /sys. Use securityContext.runAsUser: 0 or adjust
file permissions.
Update Strategy Mismatch
Using RollingUpdate for a CNI plugin can be dangerous. If the new version has a bug, nodes
lose networking as the old pod is replaced. Use OnDelete for critical infrastructure and
test updates manually.