# Kubernetes Pod Scheduling and Node Affinity

## Scheduling Workloads
Control exactly where your pods run in the cluster using Node Affinity, Taints, and Tolerations. This guide explains how to target specific nodes (like an ARM64 worker) while ensuring high availability.
## Concepts

### Node Labels

Nodes can be labeled with key-value pairs. Standard labels include:

- `kubernetes.io/arch`: `amd64`, `arm64`
- `kubernetes.io/os`: `linux`
- `node-role.kubernetes.io/worker`: `worker`
You can add custom labels:

```bash
kubectl label node <node-name> disk=ssd
```
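To confirm a label was applied, you can list node labels or filter by them (a quick sketch using standard `kubectl` flags):

```bash
# Show all labels on every node
kubectl get nodes --show-labels

# List only the nodes carrying the custom label
kubectl get nodes -l disk=ssd
```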
### Node Affinity

Affinity allows you to constrain which nodes your pod is eligible to be scheduled on.
- **Required** (`requiredDuringSchedulingIgnoredDuringExecution`): The pod must run on a matching node. If no match is found, the pod stays `Pending` (see the sketch after this list).
- **Preferred** (`preferredDuringSchedulingIgnoredDuringExecution`): The scheduler tries to find a matching node. If none is found, it schedules the pod anywhere.
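The preferred form appears in the full Deployment example later in this guide; for contrast, here is a minimal sketch of the required form, assuming a hypothetical pod that must land on an ARM64 node:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: arm-only-pod            # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values:
                  - arm64
  containers:
    - name: app
      image: my-image:latest    # placeholder image
```

If no `arm64` node exists, this pod stays `Pending` rather than falling back to another node.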
## Zero Downtime Strategy

When moving workloads or updating deployments, ensure zero downtime by configuring the `RollingUpdate` strategy correctly.
- `maxUnavailable: 0`: Ensure no old pods are killed until new ones are Ready.
- `maxSurge: 1`: Allow creating one extra pod above the desired replica count during updates.
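With these settings, you can watch an update proceed without any loss of capacity (assuming the Deployment below, named `my-app`):

```bash
# Blocks until the new ReplicaSet is fully rolled out and Ready
kubectl rollout status deployment/my-app
```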
## Example: Preferring a Specific Worker Node

This configuration encourages the pod to run on any node labeled with `node-role.kubernetes.io/worker=worker` (for example, a dedicated worker), but allows it to run on the control plane if the worker is down.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app          # required; must match the pod template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: node-role.kubernetes.io/worker
                    operator: In
                    values:
                      - worker
      containers:
        - name: my-app
          image: my-image:latest
```
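After applying the manifest, the `-o wide` output shows which node the pod actually landed on (the file name here is illustrative):

```bash
kubectl apply -f my-app-deployment.yaml
kubectl get pods -o wide    # the NODE column shows placement
```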
## Taints and Tolerations

Taints repel pods from nodes. A pod must have a matching toleration to be scheduled on a tainted node.
Example Taint:
```bash
kubectl taint nodes node1 key=value:NoSchedule
```

Example Toleration:
```yaml
tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"
```
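To undo a taint later, append `-` to the same taint spec:

```bash
kubectl taint nodes node1 key=value:NoSchedule-
```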
## Best Practices: Keep it Flexible

Avoid pinning every application to a specific node.
Adding `affinity` or `nodeSelector` to everything creates rigid constraints that fight against Kubernetes’ ability to self-heal: if a pinned node goes down, its pods cannot be rescheduled elsewhere.
### When to use Affinity
- **Hardware requirements**: The app needs a specific GPU, USB device, or CPU architecture (ARM64 vs. AMD64) present only on certain nodes.
- **Performance/cost**: Offload non-critical apps to a cheaper or slower node.
- **Data gravity**: Kubernetes handles this for you. If a pod uses a PVC bound to a specific node, Kubernetes naturally schedules the pod there (see the sketch after this list).
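The mechanism behind that last point is volume node affinity: a local PersistentVolume declares which node holds the data, and any pod claiming it is scheduled there automatically. A minimal sketch, with hypothetical names and paths:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-pv               # hypothetical volume
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  local:
    path: /mnt/disks/ssd1      # hypothetical path on node-1
  nodeAffinity:                # the scheduler follows this, not your pod spec
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
```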
### The “Filebrowser Lesson” (Anti-Pattern)

Do not force a pod to a node if it shares dependencies with other pods.
- **Scenario**: Filebrowser shared a volume with Jellyfin.
- **Constraint 1**: Jellyfin must run on `node-1` (GPU).
- **Constraint 2**: Filebrowser’s volume is therefore locked to `node-1`.
- **Error**: Forcing Filebrowser to `node-2` caused a deadlock because the volume could not follow it.
- **Fix**: Remove the affinity. Let Kubernetes co-locate them naturally (see the sketch below).
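A minimal sketch of the fix, using hypothetical names: Filebrowser carries no affinity block at all and simply mounts the shared claim, so the volume’s node affinity pulls it onto the same node as Jellyfin.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: filebrowser
spec:
  replicas: 1
  selector:
    matchLabels:
      app: filebrowser
  template:
    metadata:
      labels:
        app: filebrowser
    spec:
      # No nodeAffinity here: the scheduler follows the volume instead.
      containers:
        - name: filebrowser
          image: filebrowser/filebrowser:latest
          volumeMounts:
            - name: media
              mountPath: /srv
      volumes:
        - name: media
          persistentVolumeClaim:
            claimName: media-pvc   # hypothetical claim shared with Jellyfin
```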