Autoscaling

ReadyMage autoscaling explained with a case study

ReadyMage hosting is built on Kubernetes, which allows it to offer autoscaling to its customers.

In ReadyMage, your infrastructure components (MySQL, front-end, back-end, Varnish, and ElasticSearch) run in pods. Pods, in turn, run on nodes, which are virtual AWS servers.

Pods are the smallest, most basic deployable objects in Kubernetes. A Pod represents a single instance of a running process in your cluster.

Why is autoscaling needed?

Imagine you have a production instance whose server resources are sized to handle only normal traffic. When a sales period starts and the traffic increases, we want the website to keep handling it without downtime. At the same time, we want to keep the default resources low so that you do not pay for resources that are not used. This is where autoscaling helps.

In terms of the ReadyMage infrastructure: we do not want the resource limits of a single pod to be permanently larger than needed, but we still want the server to handle high load by assigning more resources automatically.

Real-life case study

On June 21st, beautyworksonline, hosted on ReadyMage, received more traffic than usual due to unexpected promotion.

After a big blog post by a popular blogger, beautyworksonline received three times its usual traffic, with around 350 concurrent users on the website.

At 8:00 PM, when the traffic went up, the ReadyMage infrastructure created pods one by one until the total CPU capacity could handle the incoming traffic.

Once the traffic returned to normal after 1:00 AM, the unused pods were scaled down.

How does it work?

ReadyMage provides horizontal autoscaling for front-end and back-end pods.

Horizontal scaling allows you to automatically increase or decrease the number of running pods as your application’s usage changes.

When your consumed resources hit the HPA (Horizontal Pod Autoscaling) limit (the green line on the graph above), our infrastructure creates a new replica of the pod for the same infrastructure component and distributes the traffic equally between all pods.

The HPA limit is calculated programmatically from the requested resource amount that is set for the pod (the yellow line on the graph above). In ReadyMage, the HPA limit is 75% of this amount. It is deliberately lower for safety: if the replica pod takes longer to start, your current pod still has some resources available while waiting for it.

In the example of the CPU consumption above, the total limit is 0.8 CPU and the HPA limit is 0.6 CPU. At 1:05 AM, the consumed resources hit the HPA limit, and a new replica of the same pod was created.
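
The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only, not ReadyMage's implementation: the function names are hypothetical, CPU is expressed in millicores (1 CPU = 1000 millicores, the usual Kubernetes unit) to avoid floating-point rounding, and treating "hit" as "greater than or equal to" is an assumption.

```python
# Illustrative sketch; names are hypothetical, not a ReadyMage API.

HPA_PERCENT = 75  # the 75% figure stated in the text


def hpa_threshold_millicores(pod_limit_millicores: int,
                             percent: int = HPA_PERCENT) -> int:
    """Return the HPA limit as a percentage of the amount set for the pod."""
    return pod_limit_millicores * percent // 100


def should_add_replica(consumed_millicores: int,
                       pod_limit_millicores: int) -> bool:
    """Assumption: a replica is added once consumption reaches the HPA limit."""
    return consumed_millicores >= hpa_threshold_millicores(pod_limit_millicores)


if __name__ == "__main__":
    # 0.8 CPU = 800 millicores, as in the example above.
    print(hpa_threshold_millicores(800))   # 600, i.e. 0.6 CPU
    print(should_add_replica(450, 800))    # False
    print(should_add_replica(600, 800))    # True
```

With a pod limit of 0.8 CPU the sketch gives 600 millicores, matching the 0.6 CPU HPA limit in the example.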

For scaling down, the traffic for front-end and back-end pods is checked every 5 minutes. If the total consumed CPU can be handled by fewer pods, we scale down the allocated resources by killing the unused pods. So when the traffic is back to normal, the resources return to their default state and you do not pay for more than you need.

Referring to the same example, at 1:10 AM the additional pod was killed, and the resource allocation returned to normal.
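
One way to read "can be handled by fewer pods" is as a small calculation: find the smallest number of pods for which the evenly distributed load stays below the HPA limit of each pod. The sketch below assumes exactly that reading; the function name, the minimum of one pod, and the use of millicores are assumptions for illustration, not ReadyMage's actual logic.

```python
# Illustrative sketch; names and the min_pods default are hypothetical.

def pods_needed(total_consumed_millicores: int,
                hpa_threshold_millicores: int,
                min_pods: int = 1) -> int:
    """Smallest pod count whose per-pod share is strictly below the HPA limit.

    Assumes traffic is spread equally across pods, as described in the text.
    """
    if hpa_threshold_millicores <= 0:
        raise ValueError("threshold must be positive")
    # Smallest integer n with total / n < threshold.
    needed = total_consumed_millicores // hpa_threshold_millicores + 1
    return max(min_pods, needed)


if __name__ == "__main__":
    threshold = 600  # 0.6 CPU, the HPA limit from the example
    print(pods_needed(900, threshold))  # 2: one pod alone would exceed 0.6 CPU
    print(pods_needed(400, threshold))  # 1: the extra pod is no longer needed
```

If two pods are running and the total consumption falls to 400 millicores, the sketch reports that one pod is enough, which corresponds to the additional pod being removed.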

For scaling up, there is no stabilization window: when the metrics indicate that the target should be scaled up, it is scaled up immediately.
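
The difference in timing between scaling up and scaling down can be illustrated with a toy simulation. Everything here is an assumption made for the sake of the example: one load sample per minute, one replica added per sample while the load is too high, scale-down decisions only on every fifth minute, and hypothetical names throughout. It is not a model of the actual ReadyMage or Kubernetes controller.

```python
# Toy simulation; sampling rate, names, and step sizes are assumptions.

SCALE_DOWN_INTERVAL_MIN = 5  # the 5-minute scale-down check from the text


def simulate(load_by_minute, threshold_millicores, default_pods=1):
    """Return the pod count after each one-minute load sample."""
    pods = default_pods
    history = []
    for minute, total in enumerate(load_by_minute):
        # Smallest pod count keeping the per-pod share below the HPA limit.
        needed = max(default_pods, total // threshold_millicores + 1)
        if needed > pods:
            pods += 1  # scale up right away, one replica at a time
        elif needed < pods and minute % SCALE_DOWN_INTERVAL_MIN == 0:
            pods = needed  # scale down only at the periodic check
        history.append(pods)
    return history


if __name__ == "__main__":
    load = [500, 700, 700, 500, 500, 500, 500]  # millicores, one per minute
    print(simulate(load, threshold_millicores=600))
    # [1, 2, 2, 2, 2, 1, 1]
```

In this run the second pod appears as soon as the load crosses 600 millicores, but it is removed only at the next 5-minute check, even though the load dropped two minutes earlier.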
