GCP GKE Auto Scaling
GCP GKE Auto Scaling
- Decrease number of replicas for a Deployment with Horizontal Pod Autoscaler
- Decrease CPU request of a Deployment with Vertical Pod Autoscaler
- Decrease number of nodes used in cluster with Cluster Autoscaler
- Automatically create an optimized node pool for workload with Node Auto Provisioning
- Test the autoscaling behavior against a spike in demand
- Overprovision your cluster with Pause Pods
Welcome to Cloud Shell! Type "help" to get started.
Your Cloud Platform project in this session is set to qwiklabs-gcp-01-696b4c0e98b8.
Use “gcloud config set project [PROJECT_ID]” to change to a different project.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud config set compute/zone us-west1-a
Updated property [compute/zone].
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud container clusters create scaling-demo --num-nodes=3 --enable-vertical-pod-autoscaling
Default change: VPC-native is the default mode during cluster creation for versions greater than 1.21.0-gke.1500. To create advanced routes based clusters, please pass the `--no-enable-ip-alias` flag
Note: Your Pod address range (`--cluster-ipv4-cidr`) can accommodate at most 1008 node(s).
Creating cluster scaling-demo in us-west1-a... Cluster is being health-checked (master is healthy)...done.
Created [https://container.googleapis.com/v1/projects/qwiklabs-gcp-01-696b4c0e98b8/zones/us-west1-a/clusters/scaling-demo].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-west1-a/scaling-demo?project=qwiklabs-gcp-01-696b4c0e98b8
kubeconfig entry generated for scaling-demo.
NAME: scaling-demo
LOCATION: us-west1-a
MASTER_VERSION: 1.27.8-gke.1067004
MASTER_IP: 35.203.170.65
MACHINE_TYPE: e2-medium
NODE_VERSION: 1.27.8-gke.1067004
NUM_NODES: 3
STATUS: RUNNING
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ cat php-apache.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
replicas: 3
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
ports:
- port: 80
selector:
run: php-apache
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl apply -f php-apache.yaml
deployment.apps/php-apache created
service/php-apache created
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Scale pods with Horizontal Pod Autoscaling
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 3/3 3 3 72s
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
horizontalpodautoscaler.autoscaling/php-apache autoscaled
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
This autoscale
command will configure a Horizontal Pod Autoscaler that will maintain between 1 and 10 replicas of the pods controlled by the php-apache
deployment. The cpu-percent
flag specifies 50% as the target average CPU utilization of requested CPU over all the pods. HPA will adjust the number of replicas (via the deployment) to maintain an average CPU utilization of 50% across all pods.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 1%/50% 1 10 3 114s
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Under the Targets column you see 1%/50%.
This means that the pods within your deployment are currently at 1% of their target average CPU utilization. This is to be expected as the php-apache
app is receiving no traffic right now.
Also, take note of the Replicas column. To start with, the value will be 3. This number will be changed by the autoscaler as the number of required pods changes.
In this case, the autoscaler will scale the deployment down to the minimum number of pods indicated when you run the autoscale
command. Horizontal Pod Autoscaling takes 5-10 minutes and will require shutting down or starting new pods depending on which way it’s scaling.
Scale size of pods with Vertical Pod Autoscaling
Vertical Pod Autoscaling frees you from having to think about what values to specify for a container’s CPU and memory requests. The autoscaler can recommend values for CPU and memory requests and limits, or it can automatically update the values.
Vertical Pod Autoscaling should not be used alongside Horizontal Pod Autoscaling on CPU or memory. Both autoscalers will try to respond to changes in demand on the same metrics and conflict. However, VPA on CPU or memory can be used with HPA on custom metrics to avoid overlap.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud container clusters describe scaling-demo | grep ^verticalPodAutoscaling -A 1
verticalPodAutoscaling:
enabled: true
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Vertical Pod Autoscaling can be enabled on an existing cluster with
gcloud container clusters update scaling-demo --enable-vertical-pod-autoscaling
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl create deployment hello-server --image=gcr.io/google-samples/hello-app:1.0
deployment.apps/hello-server created
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment hello-server
NAME READY UP-TO-DATE AVAILABLE AGE
hello-server 1/1 1 1 9s
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Assign a CPU resource request of 450m to the deployment:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl set resources deployment hello-server --requests=cpu=450m
deployment.apps/hello-server resource requirements updated
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
Containers:
hello-app:
Container ID: containerd://b1acb5451e2d957019664b6025aff416b3a7d7e4dc6d28ff6d5cb5a1bbeb38d6
Image: gcr.io/google-samples/hello-app:1.0
Image ID: gcr.io/google-samples/hello-app@sha256:b1455e1c4fcc5ea1023c9e3b584cd84b64eb920e332feff690a2829696e379e7
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 11 Apr 2024 04:09:17 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 450m
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-g4v99 (ro)
Conditions:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
create a manifest for you Vertical Pod Autoscaler:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ cat hello-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: hello-server-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: hello-server
updatePolicy:
updateMode: "Off"
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
The above generates a manifest for a Vertical Pod Autoscaler targeting the hello-server
deployment with an Update Policy of Off. A VPA can have one of three different update policies which can be useful depending on your application:
- Off: this policy means VPA will generate recommendations based on historical data which you can manually apply.
- Initial: VPA recommendations will be used to create new pods once and then won’t change the pod size after.
- Auto: pods will regularly be deleted and recreated to match the size of the recommendations.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl apply -f hello-vpa.yaml
verticalpodautoscaler.autoscaling.k8s.io/hello-server-vpa created
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get vpa
NAME MODE CPU MEM PROVIDED AGE
hello-server-vpa Off 6s
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl describe vpa hello-server-vpa
Name: hello-server-vpa
Namespace: default
Labels: <none>
Annotations: <none>
API Version: autoscaling.k8s.io/v1
Kind: VerticalPodAutoscaler
Metadata:
Creation Timestamp: 2024-04-11T04:11:32Z
Generation: 1
Resource Version: 7499
UID: 56e63a5e-f38c-477b-b48a-002fa7504cea
Spec:
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: hello-server
Update Policy:
Update Mode: Off
Events: <none>
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
After some time
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl describe vpa hello-server-vpa
Name: hello-server-vpa
Namespace: default
Labels: <none>
Annotations: <none>
API Version: autoscaling.k8s.io/v1
Kind: VerticalPodAutoscaler
Metadata:
Creation Timestamp: 2024-04-11T04:11:32Z
Generation: 2
Resource Version: 7862
UID: 56e63a5e-f38c-477b-b48a-002fa7504cea
Spec:
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: hello-server
Update Policy:
Update Mode: Off
Status:
Conditions:
Last Transition Time: 2024-04-11T04:12:12Z
Message: Some containers have a small number of samples
Reason: hello-app
Status: True
Type: LowConfidence
Last Transition Time: 2024-04-11T04:12:12Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: hello-app
Lower Bound:
Cpu: 1m
Memory: 2097152
Target:
Cpu: 2m
Memory: 3145728
Uncapped Target:
Cpu: 2m
Memory: 3145728
Upper Bound:
Cpu: 1180m
Memory: 1595932672
Events: <none>
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
- Lower Bound: this is the lower bound number VPA looks at for triggering a resize. If your pod utilization goes below this, VPA will delete the pod and scale it down.
- Target: this is the value VPA will use when resizing the pod.
- Uncapped Target: if no minimum or maximum capacity is assigned to the VPA, this will be the target utilization for VPA.
- Upper Bound: this is the upper bound number VPA looks at for triggering a resize. If your pod utilization goes above this, VPA will delete the pod and scale it up.
You’ll notice VPA is recommending the CPU request for this container be set to 25m instead of the previous 100m as well as giving you a suggested number for how much memory should be requested. At this point, these recommendations can be manually applied to the hello-server
deployment.
Vertical Pod Autoscaling bases its recommendations on historical data from the container. In practice, it’s recommended to wait at least 24 hours to collect recommendation data before applying any changes.
Update the manifest to set the policy to Auto and apply the configuration:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ sed -i 's/Off/Auto/g' hello-vpa.yaml
kubectl apply -f hello-vpa.yaml
verticalpodautoscaler.autoscaling.k8s.io/hello-server-vpa configured
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ cat hello-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: hello-server-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: hello-server
updatePolicy:
updateMode: "Auto"
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
In order to resize a pod, Vertical Pod Autoscaler will need to delete that pod and recreate it with the new size. By default, to avoid downtime, VPA will not delete and resize the last active pod. Because of this, you will need at least 2 replicas to see VPA make any changes.
Scale hello-server
deployment to 2 replicas:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl scale deployment hello-server --replicas=2
deployment.apps/hello-server scaled
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
hello-server 2/2 2 2 11m
php-apache 1/1 1 1 17m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get vpa
NAME MODE CPU MEM PROVIDED AGE
hello-server-vpa Auto 2m 3145728 True 8m17s
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get pods -w
NAME READY STATUS RESTARTS AGE
hello-server-7c7f99c596-n9mj5 1/1 Running 0 11m
hello-server-7c7f99c596-r7wvg 1/1 Running 0 70s
php-apache-69f9bc5fd5-2d6h6 1/1 Running 0 18m
hello-server-7c7f99c596-n9mj5 1/1 Running 0 11m
hello-server-7c7f99c596-n9mj5 1/1 Terminating 0 11m
hello-server-7c7f99c596-n9mj5 1/1 Terminating 0 11m
hello-server-7c7f99c596-cvkmx 0/1 Pending 0 0s
hello-server-7c7f99c596-cvkmx 0/1 Pending 0 0s
hello-server-7c7f99c596-cvkmx 0/1 ContainerCreating 0 0s
hello-server-7c7f99c596-n9mj5 0/1 Terminating 0 11m
hello-server-7c7f99c596-n9mj5 0/1 Terminating 0 11m
hello-server-7c7f99c596-n9mj5 0/1 Terminating 0 11m
hello-server-7c7f99c596-n9mj5 0/1 Terminating 0 11m
hello-server-7c7f99c596-cvkmx 1/1 Running 0 2s
This is a sign that your VPA is deleting and resizing your pods.
HPA results
By this point, your Horizontal Pod Autoscaler will have most likely scaled your php-apache
deployment down.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 1%/50% 1 10 1 19m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
You’ll see that your php-apache
deployment has been scaled down to 1 pod.
VPA results
Now, the VPA should have resized your pods in the hello-server
deployment.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
Containers:
hello-app:
Container ID: containerd://15adff5ee96b6b39332baa723b1653fe9dc0eccafa86156d7bd10868749e3ce3
Image: gcr.io/google-samples/hello-app:1.0
Image ID: gcr.io/google-samples/hello-app@sha256:b1455e1c4fcc5ea1023c9e3b584cd84b64eb920e332feff690a2829696e379e7
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 11 Apr 2024 04:21:03 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 2m
memory: 3145728
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2f7zw (ro)
Conditions:
Containers:
hello-app:
Container ID: containerd://95b0eb44f3c159a4df4a96fc80d016c36b66fc90f54db068497412905182a130
Image: gcr.io/google-samples/hello-app:1.0
Image ID: gcr.io/google-samples/hello-app@sha256:b1455e1c4fcc5ea1023c9e3b584cd84b64eb920e332feff690a2829696e379e7
Port: <none>
Host Port: <none>
State: Running
Started: Thu, 11 Apr 2024 04:19:30 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 2m
memory: 3145728
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-q449t (ro)
Conditions:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Cluster autoscaler
The Cluster Autoscaler is designed to add or remove nodes based on demand. When demand is high, cluster autoscaler will add nodes to the node pool to accommodate that demand. When demand is low, cluster autoscaler will scale your cluster back down by removing nodes. This allows you to maintain high availability of your cluster while minimizing superfluous costs associated with additional machines.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud beta container clusters update scaling-demo --enable-autoscaling --min-nodes 1 --max-nodes 5
Updating scaling-demo...done.
Updated [https://container.googleapis.com/v1beta1/projects/qwiklabs-gcp-01-696b4c0e98b8/zones/us-west1-a/clusters/scaling-demo].
To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-west1-a/scaling-demo?project=qwiklabs-gcp-01-696b4c0e98b8
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
When scaling a cluster, the decision of when to remove a node is a trade-off between optimizing for utilization or the availability of resources. Removing underutilized nodes improves cluster utilization, but new workloads might have to wait for resources to be provisioned again before they can run.
You can specify which autoscaling profile to use when making such decisions. The currently available profiles are:
-
Balanced: The default profile.
-
Optimize-utilization: Prioritize optimizing utilization over keeping spare resources in the cluster. When selected, the cluster autoscaler scales down the cluster more aggressively. It can remove more nodes, and remove nodes faster. This profile has been optimized for use with batch workloads that are not sensitive to start-up latency.
Switch to the
optimize-utilization
autoscaling profile so that the full effects of scaling can be observed:tudent_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud beta container clusters update scaling-demo \ --autoscaling-profile optimize-utilization Updating scaling-demo...working Updating scaling-demo...done. Updated [https://container.googleapis.com/v1beta1/projects/qwiklabs-gcp-01-696b4c0e98b8/zones/us-west1-a/clusters/scaling-demo]. To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-west1-a/scaling-demo?project=qwiklabs-gcp-01-696b4c0e98b8 student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
By default, most of the system pods from these deployments will prevent cluster autoscaler from taking them completely offline to reschedule them. Generally, this is desired because many of these pods collect data used in other deployments and services. For example, metrics-agent being temporarily down would cause a gap in data collected for VPA and HPA, or the fluentd pod being down could create a gap in your cloud logs.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
event-exporter-gke 1/1 1 1 37m
konnectivity-agent 3/3 3 3 37m
konnectivity-agent-autoscaler 1/1 1 1 37m
kube-dns 2/2 2 2 37m
kube-dns-autoscaler 1/1 1 1 37m
l7-default-backend 1/1 1 1 37m
metrics-server-v0.5.2 1/1 1 1 37m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
For the purpose of this lab, you will apply Pod Disruption Budgets to your kube-system
pods which will allow cluster autoscaler to safely reschedule them on another node. This will give enough room to scale your cluster down.
Pod Disruption Budgets (PDB) define how Kubernetes should handle disruptions like upgrades, pod removals, running out of resources, etc. In PDBs, you can specify the max-unavailable
and/or the min-available
number of pods a deployment should have.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl create poddisruptionbudget kube-dns-pdb --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1
kubectl create poddisruptionbudget prometheus-pdb --namespace=kube-system --selector k8s-app=prometheus-to-sd --max-unavailable 1
kubectl create poddisruptionbudget kube-proxy-pdb --namespace=kube-system --selector component=kube-proxy --max-unavailable 1
kubectl create poddisruptionbudget metrics-agent-pdb --namespace=kube-system --selector k8s-app=gke-metrics-agent --max-unavailable 1
kubectl create poddisruptionbudget metrics-server-pdb --namespace=kube-system --selector k8s-app=metrics-server --max-unavailable 1
kubectl create poddisruptionbudget fluentd-pdb --namespace=kube-system --selector k8s-app=fluentd-gke --max-unavailable 1
kubectl create poddisruptionbudget backend-pdb --namespace=kube-system --selector k8s-app=glbc --max-unavailable 1
kubectl create poddisruptionbudget kube-dns-autoscaler-pdb --namespace=kube-system --selector k8s-app=kube-dns-autoscaler --max-unavailable 1
kubectl create poddisruptionbudget stackdriver-pdb --namespace=kube-system --selector app=stackdriver-metadata-agent --max-unavailable 1
kubectl create poddisruptionbudget event-pdb --namespace=kube-system --selector k8s-app=event-exporter --max-unavailable 1
poddisruptionbudget.policy/kube-dns-pdb created
poddisruptionbudget.policy/prometheus-pdb created
poddisruptionbudget.policy/kube-proxy-pdb created
poddisruptionbudget.policy/metrics-agent-pdb created
poddisruptionbudget.policy/metrics-server-pdb created
poddisruptionbudget.policy/fluentd-pdb created
poddisruptionbudget.policy/backend-pdb created
poddisruptionbudget.policy/kube-dns-autoscaler-pdb created
poddisruptionbudget.policy/stackdriver-pdb created
poddisruptionbudget.policy/event-pdb created
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
In each of these commands, you are selecting a different kube-system deployment pod based on a label defined in its creation and specifying that there can be 1 unavailable pod for each of these deployments. This will allow the autoscaler to reschedule the system pods.
With the PDBs in place, your cluster should scale down from three nodes to two nodes in a minute or two.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-scaling-demo-default-pool-0e73e0ba-3s7q NotReady <none> 42m v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-hd9j Ready <none> 42m v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-q83h Ready <none> 42m v1.27.8-gke.1067004
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-scaling-demo-default-pool-0e73e0ba-3s7q NotReady <none> 42m v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-hd9j Ready <none> 42m v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-q83h Ready <none> 42m v1.27.8-gke.1067004
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
You set up automation that scaled your cluster down from three nodes to two nodes!
Thinking about the costs, as a result of scaling down your nodepool, you will be billed for less machines during periods of low demand on your cluster. This scaling could be even more dramatic if you were fluctuating from high demand to low demand periods during the day.
It’s important to note that, while Cluster Autoscaler removed an unnecessary node, Vertical Pod Autoscaling and Horizontal Pod Autoscaling helped reduce enough CPU demand so that the node was no longer needed. Combining these tools is a great way to optimize your overall costs and resource usage.
So, the cluster autoscaler helps add and remove nodes in response to pods needing to be scheduled. However, GKE specifically has another feature to scale vertically, called node auto-provisioning.
Node Auto Provisioning
Node Auto Provisioning (NAP) actually adds new node pools that are sized to meet demand. Without node auto provisioning, the cluster autoscaler will only be creating new nodes in the node pools you’ve specified, meaning the new nodes will be the same machine type as the other nodes in that pool. This is perfect for helping optimize resource usage for batch workloads and other apps that don’t need extreme scaling, since creating a node pool that is specifically optimized for your use case might take more time than just adding more nodes to an existing pool.
-
Enable Node Auto Provisioning:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ gcloud container clusters update scaling-demo \ --enable-autoprovisioning \ --min-cpu 1 \ --min-memory 2 \ --max-cpu 45 \ --max-memory 160 Updating scaling-demo...done. Updated [https://container.googleapis.com/v1/projects/qwiklabs-gcp-01-696b4c0e98b8/zones/us-west1-a/clusters/scaling-demo]. To inspect the contents of your cluster, go to: https://console.cloud.google.com/kubernetes/workload_/gcloud/us-west1-a/scaling-demo?project=qwiklabs-gcp-01-696b4c0e98b8 student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
In the command, you specify a minimum and maximum number for your CPU and memory resources. This is for the entire cluster.
NAP can take a little bit of time and it’s also highly likely it won’t create a new node pool for the scaling-demo cluster at its current state.
Test with larger demand
So far, you’ve analyzed how HPA, VPA, and cluster autoscaler can help save resources and costs while your application has low demand. Now, you’ll look at how these tools handle availability for increased demand.
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
If you don't see a command prompt, try pressing enter.
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 1%/50% 1 10 1 48m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 173%/50% 1 10 1 48m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Wait and rerun the command until you see your target above 100%.
Now, monitor how your cluster handles the increased load by periodically running this command:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment php-apache
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 2/4 4 2 50m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
After a few minutes, you will see a few things happen.
- First, your
php-apache
deployment will automatically be scaled up by HPA to handle the increased load. - Then, cluster autoscaler will need to provision new nodes to handle the increased demand.
- Finally, node auto provisioning will create a node pool optimized for the CPU and memory requests of your cluster’s workloads. In this case, it should be a high cpu, low memory node pool because the load test is pushing the cpu limits.
Wait until your php-apache
deployment is scaled up to 7 replicas and your nodes tab looks similar to this:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get deployment
NAME READY UP-TO-DATE AVAILABLE AGE
hello-server 2/2 2 2 45m
php-apache 6/6 6 6 52m
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-scaling-demo-default-pool-0e73e0ba-ctx4 Ready <none> 116s v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-hd9j Ready <none> 53m v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-pwnr Ready <none> 54s v1.27.8-gke.1067004
gke-scaling-demo-default-pool-0e73e0ba-q83h Ready <none> 53m v1.27.8-gke.1067004
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Your cluster efficiently scaled up to meet a higher demand! However, take note of the amount of time it took to handle this spike in demand. For many applications, losing availability while provisioning new resources can be an issue.
Optimize larger loads
When scaling up for larger loads, horizontal pod autoscaling will add new pods while vertical pod autoscaling will resize them depending on your settings. If there’s room on an existing node, it might be able to skip pulling the image and immediately start running the application on a new pod. If you’re working with a node that hasn’t deployed your application before, a bit of time might be added if it needs to download the container images before running it.
So, if you don’t have enough room on your existing nodes and you’re using the cluster autoscaler, it could take even longer. Now it needs to provision a new node, set it up, then download the image and start up pods. If the node auto-provisioner is going to create a new node pool like it did in your cluster, there will be even more time as you provision the new node pool first, and then go through all the same steps for the new node.
In order to handle these different latencies for autoscaling, you’ll probably want to over-provision a little bit so there’s less pressure on your apps when autoscaling-up. This is really important for cost-optimization, because you don’t want to pay for more resources than you need, but you also don’t want your apps’ performance to suffer.
An efficient strategy to overprovision a cluster with Cluster Autoscaling is to use Pause Pods.
Pause Pods are low priority deployments which are able to be removed and replaced by high priority deployments. This means you can create low priority pods which don’t actually do anything except reserve buffer space. When the higher-priority pod needs room, the pause pods will be removed and rescheduled to another node, or a new node, and the higher-priority pod has the room it needs to be scheduled quickly.
Create a manifest for a pause pod:
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ cat pause-pod.yaml
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: overprovisioning
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
run: overprovisioning
template:
metadata:
labels:
run: overprovisioning
spec:
priorityClassName: overprovisioning
containers:
- name: reserve-resources
image: k8s.gcr.io/pause
resources:
requests:
cpu: 1
memory: 4Gi
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ kubectl apply -f pause-pod.yaml
priorityclass.scheduling.k8s.io/overprovisioning created
deployment.apps/overprovisioning created
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$
Observe how a new node is created, most likely in a new node pool, to fit your newly created pause pod. Now, if you were to run the load test again, when you needed an extra node for your php-apache
deployment, it could be scheduled on the node with your pause pod while your pause pod is instead put on a new node. This is excellent because your dummy pause pods allow your cluster to provision a new node in advance so that your actual application can scale up much faster. If you were expecting higher amounts of traffic, you could add more pause pods, but it’s considered best practice to not add more than one pause pod per node.
In this lab, you configured a cluster to automatically and efficiently scale up or down based on its demand. Horizontal Pod Autoscaling and Vertical Pod Autoscaling provided solutions for automatically scaling your cluster’s deployments while Cluster Autoscaler and Node Auto Provisioning provided solutions for automatically scaling your cluster’s infrastructure.
As always, knowing which of these tools to use will depend on your workload. Careful use of these autoscalers can mean maximizing availability when you need it while only paying for what you need during times of low demand. When thinking about costs, this means you are optimizing your resource usage and saving money.
## Summary
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$ history
1 gcloud config set compute/zone us-west1-a
2 gcloud container clusters create scaling-demo --num-nodes=3 --enable-vertical-pod-autoscaling
3 cat << EOF > php-apache.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: php-apache
spec:
selector:
matchLabels:
run: php-apache
replicas: 3
template:
metadata:
labels:
run: php-apache
spec:
containers:
- name: php-apache
image: k8s.gcr.io/hpa-example
ports:
- containerPort: 80
resources:
limits:
cpu: 500m
requests:
cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
name: php-apache
labels:
run: php-apache
spec:
ports:
- port: 80
selector:
run: php-apache
EOF
4 cat php-apache.yaml
5 kubectl apply -f php-apache.yaml
6 kubectl get deployment
7 kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
8 kubectl get deployment
9 kubectl get deployment
10 kubectl get hpa
11 kubectl get hpa
12 kubectl get deployment
13 gcloud container clusters describe scaling-demo | grep ^verticalPodAutoscaling -A 1
14 kubectl create deployment hello-server --image=gcr.io/google-samples/hello-app:1.0
15 kubectl get deployment hello-server
16 kubectl set resources deployment hello-server --requests=cpu=450m
17 kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
18 cat << EOF > hello-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: hello-server-vpa
spec:
targetRef:
apiVersion: "apps/v1"
kind: Deployment
name: hello-server
updatePolicy:
updateMode: "Off"
EOF
19 cat hello-vpa.yaml
20 kubectl apply -f hello-vpa.yaml
21 kubectl get vpa
22 kubectl describe vpa hello-server-vpa
23 kubectl describe vpa hello-server-vpa
24 sed -i 's/Off/Auto/g' hello-vpa.yaml
25 kubectl apply -f hello-vpa.yaml
26 cat hello-vpa.yaml
27 kubectl scale deployment hello-server --replicas=2
28 kubectl get deployment
29 kubectl get pva
30 kubectl get vpa
31 kubectl get pods -w
32 kubectl get hpa
33 kubectl describe pod hello-server | sed -n "/Containers:$/,/Conditions:/p"
34 gcloud beta container clusters update scaling-demo --enable-autoscaling --min-nodes 1 --max-nodes 5
35 gcloud beta container clusters update scaling-demo --autoscaling-profile optimize-utilization
36 kubectl get deployment -n kube-system
37 kubectl create poddisruptionbudget kube-dns-pdb --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1
38 kubectl create poddisruptionbudget prometheus-pdb --namespace=kube-system --selector k8s-app=prometheus-to-sd --max-unavailable 1
39 kubectl create poddisruptionbudget kube-proxy-pdb --namespace=kube-system --selector component=kube-proxy --max-unavailable 1
40 kubectl create poddisruptionbudget metrics-agent-pdb --namespace=kube-system --selector k8s-app=gke-metrics-agent --max-unavailable 1
41 kubectl create poddisruptionbudget metrics-server-pdb --namespace=kube-system --selector k8s-app=metrics-server --max-unavailable 1
42 kubectl create poddisruptionbudget fluentd-pdb --namespace=kube-system --selector k8s-app=fluentd-gke --max-unavailable 1
43 kubectl create poddisruptionbudget backend-pdb --namespace=kube-system --selector k8s-app=glbc --max-unavailable 1
44 kubectl create poddisruptionbudget kube-dns-autoscaler-pdb --namespace=kube-system --selector k8s-app=kube-dns-autoscaler --max-unavailable 1
45 kubectl create poddisruptionbudget stackdriver-pdb --namespace=kube-system --selector app=stackdriver-metadata-agent --max-unavailable 1
46 kubectl create poddisruptionbudget event-pdb --namespace=kube-system --selector k8s-app=event-exporter --max-unavailable 1
47 kubectl get nodes
48 kubectl get nodes -w
49 kubectl get nodes
50 kubectl get nodes
51 kubectl get nodes
52 kubectl get nodes
53 kubectl get nodes
54 kubectl get nodes
55 kubectl get nodes
56 kubectl get nodes
57 kubectl get nodes
58 kubectl get nodes
59 kubectl get nodes
60 kubectl get nodes
61 kubectl get nodes
62 kubectl get nodes
63 kubectl get nodes
64 kubectl get nodes
65 gcloud container clusters update scaling-demo --enable-autoprovisioning --min-cpu 1 --min-memory 2 --max-cpu 45 --max-memory 160
66 kubectl get hpa
67 kubectl get hpa
68 kubectl get deployment php-apache
69 kubectl get deployment php-apache
70 kubectl get deployment
71 kubectl get nodes
72 kubectl get deployment
73 kubectl get deployment
74 kubectl get deployment
75 kubectl get nodes
76 cat << EOF > pause-pod.yaml
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
name: overprovisioning
value: -1
globalDefault: false
description: "Priority class used by overprovisioning."
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: overprovisioning
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
run: overprovisioning
template:
metadata:
labels:
run: overprovisioning
spec:
priorityClassName: overprovisioning
containers:
- name: reserve-resources
image: k8s.gcr.io/pause
resources:
requests:
cpu: 1
memory: 4Gi
EOF
77 cat pause-pod.yaml
78 kubectl apply -f pause-pod.yaml
79 kubectl get nodes
80 history
student_01_fa9e4f85d2dc@cloudshell:~ (qwiklabs-gcp-01-696b4c0e98b8)$