In previous labs, you deployed a single Pod and a Deployment with a fixed number of replicas, and scaled the Deployment manually. In this lab, you will deploy a HorizontalPodAutoscaler to scale the Deployment automatically when a certain condition is met, such as elevated CPU or memory usage.
To begin, create a dedicated directory for this lab and switch into it:
cd ~
mkdir random-facts-app-autoscaling && cd random-facts-app-autoscaling
Create a new namespace called random-facts-app-autoscaling with the label lab=random-facts-app-autoscaling.
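If you prefer to create the namespace imperatively rather than with a manifest, the following commands are one way to do it:
# Create the namespace and add the lab label
kubectl create namespace random-facts-app-autoscaling
kubectl label namespace random-facts-app-autoscaling lab=random-facts-app-autoscaling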
Create a Deployment manifest for the application with the following contents:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: random-facts-app
  namespace: random-facts-app-autoscaling
  labels:
    lab: random-facts-app-autoscaling
spec:
  replicas: 1
  selector:
    matchLabels:
      lab: random-facts-app-autoscaling
  template:
    metadata:
      labels:
        lab: random-facts-app-autoscaling
    spec:
      containers:
      - name: random-facts-app
        image: us-central1-docker.pkg.dev/<YOUR_PROJECT_ID>/<YOUR_REGISTRY_NAME>/random-facts-app:1.0
        ports:
        - containerPort: 5000
        resources:
          requests:
            cpu: 0.1
            memory: 256M
Apply it to your cluster and validate the pods are running with the kubectl get pods command.
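Assuming you saved the manifest as deployment.yaml (the filename here is just an example), the commands would look like this:
# Apply the Deployment and check the pods in the lab namespace
kubectl apply -f deployment.yaml
kubectl get pods -n random-facts-app-autoscaling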
Create a Service of type LoadBalancer. We will need a public IP address to send traffic to the application and trigger a scale-up.
apiVersion: v1
kind: Service
metadata:
  name: random-facts-app-service
  namespace: random-facts-app-autoscaling
  labels:
    lab: random-facts-app-autoscaling
spec:
  selector:
    lab: random-facts-app-autoscaling
  ports:
  - name: http
    port: 5000
    protocol: TCP
    targetPort: 5000
  type: LoadBalancer
Apply your Service manifest and then use the kubectl get service command to retrieve the External IP.
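Assuming the manifest was saved as service.yaml (again, an example filename), the commands would look like this:
# Apply the Service and retrieve its external IP
kubectl apply -f service.yaml
kubectl get service -n random-facts-app-autoscaling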
Note: If it says <pending>, give it a few moments for GCP to assign a public IP.
NAME                       TYPE           CLUSTER-IP     EXTERNAL-IP          PORT(S)                AGE
random-facts-app-service   LoadBalancer   10.43.149.39   <YOUR_EXTERNAL_IP>   5000:<NODE_PORT>/TCP   18m
Open a new tab in your browser and navigate to http://<YOUR_EXTERNAL_IP>:5000.
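If you prefer to check from the command line instead, curl works as well:
curl http://<YOUR_EXTERNAL_IP>:5000/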
Configure Autoscaling #
Next, we will configure the Deployment to scale up/down to maintain an average CPU utilization of 15% across all pods.
Create the HorizontalPodAutoscaler manifest for the application with the following contents:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: random-facts-app-autoscaler
  namespace: random-facts-app-autoscaling
  labels:
    lab: random-facts-app-autoscaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: random-facts-app
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 15
Save your manifest and apply it to the cluster.
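Assuming the manifest was saved as hpa.yaml (an example filename), applying it looks like this:
kubectl apply -f hpa.yaml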
Use the kubectl get hpa command to view the status of the HorizontalPodAutoscaler. Initially, the pods will not have been running for long, so the Kubernetes metrics server (which collects utilization metrics for all workloads in the cluster) will not yet have any statistics for the app.
You will see something similar to the following:
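Scoped to the lab namespace, the command is:
kubectl get hpa -n random-facts-app-autoscaling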
NAME                          REFERENCE                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
random-facts-app-autoscaler   Deployment/random-facts-app   <unknown>/15%   1         3         0          7s
After a few minutes, <unknown> should be replaced with an actual percentage. The 15% shown in the TARGETS column is the target utilization we specified in the manifest, and the HPA controller will automatically adjust the number of replicas to maintain that target.
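If you would like more detail on the controller's decisions, kubectl describe hpa shows the current metric values and any scaling events:
kubectl describe hpa random-facts-app-autoscaler -n random-facts-app-autoscaling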
Generate Traffic and Observe Autoscaler Behaviour #
Next, we will use the Apache HTTP server benchmarking tool (ab) to generate traffic. This will push the app's CPU usage above the utilization target and ultimately cause the app to scale up.
In your current Cloud Shell terminal, run the following command to watch your autoscaler:
watch kubectl get hpa -n random-facts-app-autoscaling
In a new Cloud Shell terminal, run the following command to watch your pods:
watch kubectl get pods -n random-facts-app-autoscaling
Finally, in a third Cloud Shell terminal, run the following ab command to generate traffic.
ab -n 100000000 -c 10 http://<YOUR_EXTERNAL_IP>:5000/
The above command sends 100,000,000 GET requests to your app using 10 concurrent connections. Ensure that there is a trailing / at the end of the URL; if it is omitted, ab will report an invalid URL error.
Important: If the ab command is not found or is failing due to firewall restrictions, reinstall the ab tool and use kubectl port-forward as a workaround.
# Reinstall the ab tool
sudo apt update
sudo apt install apache2-utils -y
# Then, use port forwarding and a modified ab command
kubectl port-forward service/random-facts-app-service 5000:http --namespace random-facts-app-autoscaling
ab -n 100000000 -c 10 http://localhost:5000/
While the ab load generator is running, switch back and forth between the two terminals where you are watching the HorizontalPodAutoscaler and the Pods. Notice that CPU utilization in the TARGETS column spikes and that two new pods are deployed in an attempt to keep up with the traffic.
Scaling Down #
To scale the app down, we can simply stop ab from generating traffic, reduce the number of requests, or raise the target utilization. When traffic stops, the HorizontalPodAutoscaler will detect that CPU utilization is below 15% and will begin to terminate pods automatically. However, this happens after a delay, which reduces the chance of flapping the replica count (the number of replicas changing rapidly, potentially causing instability).
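The length of this delay is governed by the autoscaler's scale-down stabilization window, which defaults to five minutes. If you want to experiment with it, the optional behavior block below (not required for this lab) could be added under spec in the HorizontalPodAutoscaler manifest to shorten the window:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60   # default is 300 seconds (5 minutes)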
If the ab command is still running, stop it and close that Cloud Shell terminal.
In the Cloud Shell where you were watching the HorizontalPodAutoscaler, notice that the CPU utilization metric has decreased below the 15% threshold.
In the Cloud Shell where you were watching the Pods, notice that the pods start to terminate if they haven't already. Eventually, you will see just one pod running, which matches the minReplicas value set in the autoscaling configuration above.
NAME                                READY   STATUS    RESTARTS   AGE
random-facts-app-6f9984d959-dvrv7   1/1     Running   0          27m
Clean Up #
Before moving on to the next lab, run the following command to delete your Service:
kubectl delete service random-facts-app-service -n random-facts-app-autoscaling