Let's start from a developer's perspective. As devs, we write code that has to get built and deployed somewhere, and eventually that code has to run on some server. This server ‘serves’ the content or data to the user. It also has to store that data somewhere, so it needs some sort of storage mechanism: a database, the server's own disk, etc.
Now let's look at the same thing from a user's perspective. To access our application, a user needs to communicate with that server, typically by sending a request from their browser. Our server could be any sort of server: frontend, backend, etc.
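To make the request/response idea concrete, here is a minimal sketch using only Python's standard library. The names `HelloHandler` and `fetch_once` and the response text are made up for illustration, and a small client stands in for the user's browser; a real application would sit behind a proper web server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello from the server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging to keep the example quiet


def fetch_once():
    """Start a local server, send it one request, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # This client plays the role of the user's browser.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read().decode())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()
```

Calling `fetch_once()` sends a single request to a single server, which is the setup the rest of this section builds on.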
But what if we have a lot of users, all making requests to the server at the same time, and a single computer cannot handle all of those requests on its own? First, we can figure out the bottleneck of our server. Is it the CPU? The RAM? Once we identify the bottleneck, we can upgrade the CPU, the RAM, or the computer itself so that it can handle that many users. This kind of upgrade is known as ‘vertical scaling’: taking a single resource and making it better. But it has limitations too. A single machine can only be upgraded so far, so it cannot handle infinite requests.
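As a rough sketch of that reasoning, the snippet below picks the busiest resource as the bottleneck and checks whether one machine can be upgraded far enough. The function names, the numbers in the usage note, and the assumption that throughput grows linearly with CPU cores are all simplifications for illustration, not measurements.

```python
def find_bottleneck(utilization):
    """Return the name of the most heavily used resource.

    `utilization` maps a resource name to the fraction in use (0.0 to 1.0).
    """
    return max(utilization, key=utilization.get)


def cores_for_load(peak_rps, rps_per_core, max_cores):
    """Return the cores needed to serve `peak_rps` requests per second.

    Assumes (hypothetically) that each core handles `rps_per_core` requests
    per second and that throughput scales linearly with cores. Returns None
    when the load needs more cores than the largest available machine has.
    """
    cores_needed = -(-peak_rps // rps_per_core)  # ceiling division on integers
    return cores_needed if cores_needed <= max_cores else None
```

With these made-up numbers, `find_bottleneck({"cpu": 0.95, "ram": 0.40})` returns `"cpu"`, and `cores_for_load(4000, 500, 64)` returns `8`, so a bigger machine is enough. `cores_for_load(50000, 500, 64)` returns `None`: no single machine in this model is large enough, which is the ceiling that vertical scaling runs into.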
To get past that limit, we can use a different technique called horizontal scaling. This is when we take our server and make copies of it. The benefit is that as we get more users, they don't all have to talk to a single server; each one can talk to one of the copies, so we can handle more requests at the same time. The problem here is