Scaling microservices

There are two aspects to scaling a microservice with Kubernetes: the number of pods backing a particular microservice, and the total capacity of the cluster. You can scale a microservice explicitly by updating the number of replicas in its deployment, but that requires constant vigilance on your part. For services whose request volume varies widely over time (for example, business hours versus off hours, or weekdays versus weekends), keeping up manually takes a lot of effort. Kubernetes provides horizontal pod autoscaling, which, based on CPU, memory, or custom metrics, can scale your service up and down automatically.
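For instance, explicit scaling of a deployment named nginx is a one-liner with kubectl (the commands below assume a running cluster and are shown for illustration):

```shell
# Set the replica count of the nginx deployment directly.
kubectl scale deployment nginx --replicas=5

# Verify the new replica count.
kubectl get deployment nginx
```

This works, but someone (or something) has to keep issuing it as load changes, which is exactly the burden the autoscaler removes.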

Here is how to scale our nginx deployment, currently fixed at three replicas, between 2 and 5 replicas, depending on the average CPU usage across all instances:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
  namespace: default
spec:
  maxReplicas: 5
  minReplicas: 2
  targetCPUUtilizationPercentage: 90
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx

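Assuming the manifest above is saved as nginx-hpa.yaml (an illustrative filename), it can be applied and inspected with kubectl; the kubectl autoscale command creates an equivalent autoscaler imperatively:

```shell
# Create the HPA from the manifest (requires a running cluster).
kubectl apply -f nginx-hpa.yaml

# Equivalent imperative form, no manifest needed:
# kubectl autoscale deployment nginx --min=2 --max=5 --cpu-percent=90

# Inspect the autoscaler's targets and current replica count.
kubectl get hpa nginx
```

Note that the HPA needs a metrics source (typically the metrics-server add-on) to read pod CPU usage; without it, the target column shows `<unknown>`.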
The outcome is that Kubernetes watches the CPU utilization of the pods that belong to the nginx deployment. The HPA controller samples metrics periodically (every 15 seconds, by default). When the average CPU utilization exceeds the 90% target, it adds replicas, up to the maximum of 5, until utilization drops back below the target. The HPA can scale down too (scale-down decisions are stabilized over a five-minute window by default, to avoid thrashing), but it always maintains a minimum of two replicas, even if CPU utilization is zero.
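The scaling decision follows the documented HPA rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. A minimal sketch of that arithmetic (the function name and structure are illustrative, not part of Kubernetes):

```python
import math

def desired_replicas(current_replicas: int,
                     current_cpu_percent: float,
                     target_cpu_percent: float = 90.0,
                     min_replicas: int = 2,
                     max_replicas: int = 5) -> int:
    """Sketch of the HPA rule: desired = ceil(current * usage / target),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_cpu_percent / target_cpu_percent)
    return max(min_replicas, min(max_replicas, desired))

# Three replicas averaging 120% CPU against a 90% target: scale up to 4.
print(desired_replicas(3, 120.0))  # 4

# Idle pods never drop below the configured minimum of 2.
print(desired_replicas(3, 0.0))  # 2
```

Working through the numbers this way makes it clear why maxReplicas matters: a sustained spike can only ever push the deployment to 5 replicas, never beyond the bound you set.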