Kubernetes Pod Scaling: A Comprehensive Guide
Overview
Kubernetes provides multiple mechanisms for scaling pods to meet varying workload demands. This document explores the different scaling approaches, their components, and how they work together to maintain application availability and performance.
Types of Pod Scaling
1. Manual Scaling
Direct manipulation of replica counts through kubectl or API calls.
kubectl scale deployment my-app --replicas=5
2. Horizontal Pod Autoscaling (HPA)
Automatically scales the number of pod replicas based on metrics.
3. Vertical Pod Autoscaling (VPA)
Adjusts CPU and memory requests/limits for containers.
4. Cluster Autoscaling
Scales the number of nodes in the cluster based on pod resource requirements.
Horizontal Pod Autoscaler (HPA) Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Metrics API   │    │ HPA Controller  │    │   Deployment    │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │   CPU/Mem   │ │◄───┤ │  Decision   │ │────┤ │ Replica Set │ │
│ │   Metrics   │ │    │ │   Engine    │ │    │ │             │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │   Custom    │ │    │ │    Scale    │ │    │ │    Pods     │ │
│ │   Metrics   │ │    │ │ Calculator  │ │    │ │  (N pods)   │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         ▲                      │                      │
         │                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Metrics Server  │    │ Scaling Events  │    │  Pod Lifecycle  │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ • Scale Up      │    │ • Creating      │
│ │   kubelet   │ │    │ • Scale Down    │    │ • Running       │
│ │  cAdvisor   │ │    │ • Stabilization │    │ • Terminating   │
│ └─────────────┘ │    └─────────────────┘    └─────────────────┘
└─────────────────┘
HPA Scaling Algorithm
Target Replica Calculation
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
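A minimal Python sketch of this formula (the function name and sample values are illustrative; the real controller additionally applies a tolerance band and min/max replica clamping):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric):
    # desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
    return math.ceil(current_replicas * (current_metric / target_metric))

# 4 replicas averaging 90% CPU against a 60% target -> 6 replicas
print(desired_replicas(4, 90, 60))   # 6
# 5 replicas at 30% against a 60% target -> shrink to 3
print(desired_replicas(5, 30, 60))   # 3
```

Note that the ceiling rounds up, so the algorithm prefers slight over-provisioning to under-provisioning.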
Scaling Decision Flow
┌─────────────────┐
│ Collect Metrics │
└─────────┬───────┘
          │
          ▼
┌─────────────────┐      No      ┌─────────────────┐
│ Metrics Valid?  │──────────────┤  Skip Scaling   │
└─────────┬───────┘              └─────────────────┘
          │ Yes
          ▼
┌─────────────────┐
│    Calculate    │
│ Target Replicas │
└─────────┬───────┘
          │
          ▼
┌─────────────────┐     Yes      ┌─────────────────┐
│     Within      │──────────────┤    No Action    │
│   Tolerance?    │              │    Required     │
└─────────┬───────┘              └─────────────────┘
          │ No
          ▼
┌─────────────────┐      No      ┌─────────────────┐
│    Cooldown     │──────────────┤    Wait for     │
│  Period Over?   │              │    Cooldown     │
└─────────┬───────┘              └─────────────────┘
          │ Yes
          ▼
┌─────────────────┐
│ Apply Scaling   │
│    Decision     │
└─────────────────┘
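The flow above can be condensed into one function. This is a deliberate simplification with illustrative names (the real controller evaluates each metric separately and uses stabilization windows rather than a single cooldown flag), using the default 10% tolerance:

```python
import math

TOLERANCE = 0.10  # default value of --horizontal-pod-autoscaler-tolerance

def scaling_decision(current_replicas, current_metric, target_metric,
                     metrics_valid=True, cooldown_over=True):
    # Step 1: invalid or missing metrics -> skip scaling entirely
    if not metrics_valid:
        return current_replicas, "skip scaling"
    ratio = current_metric / target_metric
    # Step 2: within tolerance of the target -> no action required
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas, "no action required"
    # Step 3: still inside a cooldown window -> wait
    if not cooldown_over:
        return current_replicas, "wait for cooldown"
    # Step 4: apply the replica calculation
    return math.ceil(current_replicas * ratio), "apply scaling"

print(scaling_decision(4, 90, 60))  # (6, 'apply scaling')
print(scaling_decision(4, 63, 60))  # within tolerance -> (4, 'no action required')
```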
HPA Configuration Examples
Basic CPU-based HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Multi-metric HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: advanced-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  - type: Pods
    pods:
      metric:
        name: requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 50
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
      - type: Pods
        value: 4
        periodSeconds: 60
      selectPolicy: Max
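With several metrics configured, the controller computes a desired replica count for each metric independently and acts on the largest result. A sketch of that combination rule (values hypothetical):

```python
import math

def desired_for_metric(current_replicas, current, target):
    # Per-metric application of the standard HPA formula
    return math.ceil(current_replicas * current / target)

def combine_metrics(current_replicas, metrics):
    # metrics: list of (current_value, target_value) pairs;
    # the highest per-metric result wins
    return max(desired_for_metric(current_replicas, c, t) for c, t in metrics)

# CPU at 90% vs 60% target, memory at 70% vs 80%, RPS at 120 vs 100
print(combine_metrics(3, [(90, 60), (70, 80), (120, 100)]))  # 5 (driven by CPU)
```

Taking the maximum means any single saturated metric can trigger a scale-up, while scale-down requires every metric to be below target.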
Vertical Pod Autoscaler (VPA) Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ VPA Recommender │    │   VPA Updater   │    │  VPA Admission  │
│                 │    │                 │    │   Controller    │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │  Resource   │ │    │ │     Pod     │ │    │ │   Webhook   │ │
│ │  Analysis   │ │    │ │  Eviction   │ │    │ │  Injection  │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │   History   │ │    │ │  Resource   │ │    │ │  Request/   │ │
│ │  Tracking   │ │    │ │   Updates   │ │    │ │ Limit Patch │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌───────────────────────────────────────────────────────────────┐
│                     Pod Resource Updates                      │
│                                                               │
│    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│    │     Pod     │     │     Pod     │     │     Pod     │    │
│    │  CPU: 100m  │     │  CPU: 200m  │     │  CPU: 150m  │    │
│    │  Mem: 128Mi │     │  Mem: 256Mi │     │  Mem: 192Mi │    │
│    └─────────────┘     └─────────────┘     └─────────────┘    │
└───────────────────────────────────────────────────────────────┘
VPA Configuration Example
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"  # Off, Initial, Auto
  resourcePolicy:
    containerPolicies:
    - containerName: web-container
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2000m
        memory: 2Gi
      controlledResources: ["cpu", "memory"]
      controlledValues: RequestsAndLimits
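The minAllowed/maxAllowed bounds act as a clamp on whatever the recommender proposes. A sketch of that behavior (function name hypothetical, values in CPU millicores matching the policy above):

```python
def clamp_recommendation(recommended, min_allowed, max_allowed):
    # VPA keeps the applied request between the policy bounds
    return max(min_allowed, min(recommended, max_allowed))

# recommendations against the 100m..2000m bounds from the policy above
print(clamp_recommendation(50, 100, 2000))    # 100  -> raised to minAllowed
print(clamp_recommendation(3500, 100, 2000))  # 2000 -> capped at maxAllowed
print(clamp_recommendation(500, 100, 2000))   # 500  -> within bounds, unchanged
```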
Cluster Autoscaler Architecture
┌───────────────────────────────────────────────────────────────┐
│                      Cluster Autoscaler                       │
│                                                               │
│    ┌─────────────┐     ┌─────────────┐     ┌─────────────┐    │
│    │  Scale Up   │     │ Scale Down  │     │ Node Group  │    │
│    │  Decision   │     │  Decision   │     │ Management  │    │
│    │             │     │             │     │             │    │
│    │ • Pending   │     │ • Under-    │     │ • AWS ASG   │    │
│    │   Pods      │     │   utilized  │     │ • GCP MIG   │    │
│    │ • Resource  │     │   Nodes     │     │ • Azure     │    │
│    │   Requests  │     │ • Grace     │     │   VMSS      │    │
│    │             │     │   Period    │     │             │    │
│    └─────────────┘     └─────────────┘     └─────────────┘    │
└───────────────────────────────────────────────────────────────┘
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Add Nodes    │    │  Remove Nodes   │    │ Cloud Provider  │
│                 │    │                 │    │       API       │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │    Node     │ │    │ │   Drain &   │ │    │ │  Instance   │ │
│ │ Provisioning│ │    │ │  Terminate  │ │    │ │ Management  │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Scaling Event Timeline
Time:  0s          30s         60s         90s         120s        150s        180s
       │           │           │           │           │           │           │
       ▼           ▼           ▼           ▼           ▼           ▼           ▼
       ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
       │Load     │ │Metrics  │ │HPA      │ │Pods     │ │Metrics  │ │Scale    │
       │Increase │ │Exceed   │ │Triggers │ │Starting │ │Stabilize│ │Complete │
       │         │ │Target   │ │Scale Up │ │         │ │         │ │         │
       └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
        Phase 1:    Phase 2:    Phase 3:    Phase 4:      Phase 5:
        Detection   Decision    Execution   Provisioning  Stabilization
Scaling Policies and Behaviors
Scale-Up Policies
scaleUp:
  stabilizationWindowSeconds: 0
  policies:
  - type: Percent
    value: 100          # Double the replicas
    periodSeconds: 15   # Every 15 seconds
  - type: Pods
    value: 4            # Add max 4 pods
    periodSeconds: 60   # Every minute
  selectPolicy: Max     # Choose the larger increase
Scale-Down Policies
scaleDown:
  stabilizationWindowSeconds: 300  # 5-minute stabilization
  policies:
  - type: Percent
    value: 50           # Reduce by 50%
    periodSeconds: 60   # Every minute
  - type: Pods
    value: 2            # Remove max 2 pods
    periodSeconds: 60   # Every minute
  selectPolicy: Min     # Choose the smaller reduction
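Each policy defines the maximum change allowed per period, and selectPolicy arbitrates between them. A sketch of the scale-down case (simplified: it ignores the rolling event history the controller actually tracks across periodSeconds):

```python
import math

def max_scale_down(current_replicas, policies, select_policy="Min"):
    """policies: list of ('Percent'|'Pods', value) pairs.
    Returns the lowest replica count the policies allow in one period."""
    floors = []
    for kind, value in policies:
        if kind == "Percent":
            floors.append(current_replicas - math.floor(current_replicas * value / 100))
        else:  # 'Pods'
            floors.append(current_replicas - value)
    # Min = most conservative change (highest remaining count),
    # Max = most aggressive change (lowest remaining count)
    return max(floors) if select_policy == "Min" else min(floors)

# 10 replicas, 50%/min and 2 pods/min policies, selectPolicy: Min
print(max_scale_down(10, [("Percent", 50), ("Pods", 2)]))  # 8 (remove only 2)
```

With selectPolicy: Max the same policies would allow dropping straight to 5 replicas in one period, so Min is the safer default for scale-down.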
Resource Metrics Flow
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    cAdvisor     │    │ Metrics Server  │    │     HPA/VPA     │
│                 │    │                 │    │                 │
│ ┌─────────────┐ │    │ ┌─────────────┐ │    │ ┌─────────────┐ │
│ │  Container  │ │────┤ │ Aggregation │ │────┤ │  Decision   │ │
│ │   Metrics   │ │    │ │  & Storage  │ │    │ │   Making    │ │
│ └─────────────┘ │    │ └─────────────┘ │    │ └─────────────┘ │
│                 │    │                 │    │                 │
│ • CPU Usage     │    │ • 15s samples   │    │ • Scale Up/Down │
│ • Memory Usage  │    │ • Pod averages  │    │ • Resource Adj  │
│ • Network I/O   │    │ • API exposure  │    │ • Policy Apply  │
│ • Disk I/O      │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Custom Metrics Integration
External Metrics (e.g., Queue Length)
metrics:
- type: External
  external:
    metric:
      name: sqs_messages_visible
      selector:
        matchLabels:
          queue: "worker-queue"
    target:
      type: AverageValue
      averageValue: "30"
Object Metrics (e.g., Ingress RPS)
metrics:
- type: Object
  object:
    metric:
      name: requests_per_second
    describedObject:
      apiVersion: networking.k8s.io/v1
      kind: Ingress
      name: main-route
    target:
      type: Value
      value: "1000"
Best Practices for Pod Scaling
1. Resource Requests and Limits
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
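Note that HPA's Utilization targets are measured against the request, not the limit: with the 100m request above, a 70% target corresponds to only 70m of actual CPU use. A quick check (values hypothetical):

```python
def utilization_percent(usage_millicores, request_millicores):
    # HPA's averageUtilization compares observed usage to the pod's request
    return 100 * usage_millicores / request_millicores

# 70m of usage against the 100m request above is 70% utilization,
# even though the limit (500m) is far from exhausted
print(utilization_percent(70, 100))  # 70.0
```

This is why missing resource requests break the HPA entirely: without a request, utilization is undefined.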
2. Readiness and Liveness Probes
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
3. Pod Disruption Budgets
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
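During scale-downs and node drains, the PDB caps voluntary evictions so that at least minAvailable pods stay healthy; the arithmetic is simply (helper name hypothetical):

```python
def allowed_disruptions(current_healthy, min_available):
    # A PDB permits a voluntary eviction only while more than
    # minAvailable pods would remain healthy afterwards
    return max(0, current_healthy - min_available)

# 5 healthy pods with minAvailable: 2 -> up to 3 voluntary evictions
print(allowed_disruptions(5, 2))  # 3
# at exactly minAvailable, further voluntary evictions are blocked
print(allowed_disruptions(2, 2))  # 0
```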
Troubleshooting Scaling Issues
Common HPA Problems
- Missing Resource Requests
  # Check if pods have resource requests
  kubectl describe deployment myapp
- Metrics Server Issues
  # Verify metrics server is running
  kubectl get pods -n kube-system -l k8s-app=metrics-server
  # Check if metrics are available
  kubectl top nodes
  kubectl top pods
- HPA Status Check
  # Check HPA status and events
  kubectl describe hpa myapp-hpa
  kubectl get hpa myapp-hpa -o yaml
Scaling Decision Logs
# View metrics server logs (a frequent source of HPA metric failures)
kubectl logs -n kube-system deployment/metrics-server
# Note: the HPA controller itself runs inside kube-controller-manager;
# its logs live on the control plane and may be inaccessible on managed clusters
# Check scaling events
kubectl get events --sort-by=.metadata.creationTimestamp
Performance Considerations
HPA Performance Tuning
- Metric Collection Interval: Default 15 seconds
- Sync Period: The controller re-evaluates targets every 15 seconds by default (--horizontal-pod-autoscaler-sync-period); scale-down is further damped by a default 5-minute stabilization window
- Tolerance: Default ±10% to prevent thrashing
- Stabilization Windows: Prevent rapid fluctuations
VPA Performance Impact
- Recommendation Quality: Improves with history; the recommender keeps roughly 8 days of usage samples by default
- Resource Overhead: Additional memory usage for tracking container histories
- Pod Restarts: Applying new requests/limits evicts and recreates pods (in-place resize is available only in newer Kubernetes releases)
Cluster Autoscaler Timing
- Scale-up Decision: 10-30 seconds
- Node Provisioning: 2-5 minutes (cloud provider dependent)
- Scale-down Grace Period: 10 minutes default
- Node Removal: 30-60 seconds
Monitoring and Observability
Key Metrics to Monitor
- HPA Metrics
- Target vs Current replica count
- Scaling events frequency
- Metric collection latency
- Resource Utilization
- CPU/Memory usage patterns
- Request vs limit ratios
- Node resource availability
- Application Performance
- Response time during scaling
- Error rates during pod transitions
- Queue lengths and processing delays
Grafana Dashboard Queries
# Current vs Desired Replicas
kube_horizontalpodautoscaler_status_current_replicas{hpa="myapp-hpa"}
kube_horizontalpodautoscaler_status_desired_replicas{hpa="myapp-hpa"}
# Pod CPU/Memory Usage
rate(container_cpu_usage_seconds_total[5m])
container_memory_working_set_bytes
This comprehensive guide covers the essential aspects of Kubernetes pod scaling, providing both theoretical understanding and practical implementation details for effective autoscaling in production environments.