Hello, readers! In this article, we will take a detailed look at an interesting Kubernetes concept: the Horizontal Pod Autoscaler.
So, let us begin! 🙂
The Need for Autoscaling in Applications & Infrastructure
Before diving deep into the concept of Horizontal Pod Autoscaling, it is worth understanding why we need it in the first place. So, let us ponder over the necessity.
Consider the following example. Suppose you are building an application that enables the employees of a company to view:
- their payslips
- their past salaries
- their annual base income, calculated with yearly appraisals included
Now, the application is ready, in production, and hosted as containers on Kubernetes. When do you think the application will be used the most?
The answer is month-end: salaries are processed in the last week of every month, so that is when maximum traffic hits the application.
To handle such traffic spikes, we need a system in place that balances the load and scales the pods in the Kubernetes cluster, so that heavy traffic causes no downtime at all.
In this scenario, the concept of Autoscaling can help.
Autoscaling, as the name suggests, enables us to scale the underlying infrastructure as load or traffic changes. When huge traffic hits the application, we can scale up the infrastructure; along the same lines, when traffic subsides and we wish to release resources, we can scale it back down.
Types of Scaling
Scaling comes in two types:
- Horizontal scaling: here, we add capacity to the existing infrastructure by adding more machines.
- Vertical scaling: in this case, we add capacity by adding more CPU and RAM to the existing machines.
With autoscaling, behind the scenes, we increase the capacity of the architecture either by adding resources to an existing server (vertical) or by spinning up new servers to support the existing infrastructure (horizontal).
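In Kubernetes terms, the HPA automates the horizontal variant: it adjusts the replica count that you could otherwise change by hand. As a point of reference, manually scaling a Deployment (here, a hypothetical Deployment named nginx) looks like this:

```shell
# Manually set the replica count of a Deployment to 5.
# The HPA automates exactly this decision, based on observed metrics.
kubectl scale deployment nginx --replicas=5
```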
Let us now understand the concept of Autoscaling specific to the Kubernetes Architecture.
How does Kubernetes Horizontal Pod Autoscaler work?
To implement Autoscaling, Kubernetes provides us with the Horizontal Pod Autoscaler (HPA).
The HPA scales the number of pods in a ReplicaSet, Deployment, or ReplicationController based on observed CPU utilization (or, with the autoscaling/v2 API, on custom application metrics).
The HPA acts as a companion configuration for the ReplicaSet or Deployment. Its controller periodically records resource-usage metrics, and when the observed value crosses the configured threshold, the HPA adjusts the number of pods according to the specification.
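Under the hood, the controller uses the scaling formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A minimal sketch of that arithmetic in shell, with illustrative numbers of our own choosing:

```shell
# HPA scaling formula: desired = ceil(current * currentMetric / targetMetric)
current_replicas=3   # pods currently running (illustrative value)
current_cpu=90       # observed average CPU utilization, in percent
target_cpu=60        # targetCPUUtilizationPercentage from the HPA spec

# Integer ceiling division: (a + b - 1) / b
desired=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))
echo "desired replicas: $desired"   # prints: desired replicas: 5
```

So 3 pods at 90% CPU against a 60% target scale out to ceil(4.5) = 5 pods.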
Having understood the concept of HPA, let us implement the same through a live example.
We can configure an HPA in Kubernetes in the following ways:
- an HPA YAML manifest
- the kubectl autoscale imperative command
Example: Horizontal Autoscaling in Pods
First, let us implement HPA through a YAML manifest.
Enable autoscaling using an HPA YAML file (here named nginx-deploy-hpa.yaml):
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 8
  targetCPUUtilizationPercentage: 60
Here, we have configured an HPA for the nginx Deployment so that the pods are scaled out, up to a maximum of 8, as soon as average CPU utilization crosses 60%. Apply the manifest and check the HPA:
kubectl apply -f nginx-deploy-hpa.yaml
kubectl get hpa
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/60%    1         8         3          21s
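A note on prerequisites: the HPA controller reads CPU figures from the Kubernetes metrics API, typically served by the metrics-server add-on. If the TARGETS column shows <unknown> instead of a percentage, the cluster is most likely missing a metrics provider. A quick way to check:

```shell
# If metrics-server (or another metrics API provider) is running,
# this prints per-pod CPU and memory usage; otherwise it errors out.
kubectl top pods
```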
Enable autoscaling with autoscale imperative command:
We can also configure an HPA for a ReplicaSet or Deployment using the following imperative command:
kubectl autoscale deployment nginx --cpu-percent=60 --min=1 --max=8
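Either way, once the HPA exists, you can watch it react by generating some load. A rough sketch, assuming the nginx Deployment is exposed through a Service named nginx (an assumption on our part; adjust the URL to match your Service):

```shell
# Run a temporary pod that requests the service in a loop (Ctrl+C to stop);
# --rm deletes the pod when the loop is interrupted.
kubectl run load-generator --rm -it --image=busybox --restart=Never \
  -- /bin/sh -c "while true; do wget -q -O- http://nginx; done"

# In another terminal, watch the HPA raise the replica count as CPU climbs.
kubectl get hpa nginx --watch
```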
With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.
For more such posts related to Kubernetes, stay tuned with us.
Till then, Happy Learning! 🙂