Part 2 — Scalability in Distributed systems

Akash Agarwal
10 min readDec 29, 2020


What is Scalability

In terms of Wikipedia

Scalability is a property of a system to handle a growing amount of work by adding resources to the system. In layman terms, scalability is basically, increasing or decreasing the operational capacity of systems / servers in order to make an efficient and optimized use of resources.

For example, if we realize that our application’s server is getting slower with an increase in user traffic and our CPU is not able to handle that much load, then we ideally have two options. First, we can add up more resources to an existing system by increasing its RAM, Disk, Processor, or any other required hardware. Second, we can add more parallel servers/systems and redirect all the user traffic efficiently among them in order to distribute the processing load. Similarly, if after a certain period of time, our application faces a downfall of requests, then we may remove those additional resources temporarily and save up on our operational costs. This process of upscaling and downscaling the application system, as and when required, is known as scalability in distributed systems.

There are two types of Scalability approach — Vertical scaling (aka scale-up) and horizontal scaling (aka scale-out).

Vertical scaling

When new resources are added to an existing machine in order to meet the application’s throughput or performance expectations, this kind of scaling is known as vertical scaling.

These resources include Physical memory like RAM or Hard-disk, Processing power, number of CPUs/cores, Logical processors, kernel optimizations, etc.

For example, to efficiently handle network throughput, we can tune in our existing server by,

  • Increasing/decreasing the RAM memory (8 Gigabyte, 16 Gigabyte, etc.)
  • Changing server’s hard-disk from traditional HDD (~approx 700–800 rpm) to SSD (~approx 1500 rpm)
  • Increasing processor capacity (Quadcore — 4 cores, Octacore — 8 cores, or even up to 32 cores now like AMD Ryzen Threadripper 1950x).

There are several benefits of using vertical scaling

  • Faster and easier to scale in the beginning for relatively smaller applications.
  • Cost-Effective, as we need to just add new hardware resources on the same system. But it actually depends, because many hardware resources like RAM or high-end processors are even costlier than adding a new commodity machine.

At the same time, there are multiple limitations to vertical scaling.

  • First and foremost, there is always an upper limit up to which we can scale our machines. If our requirement exceeds that limit, we are left with none option other than going for horizontal scaling.
  • Secondly, as stated earlier, sometimes adding more cutting edge hardware to an existing system end up being costlier than adding another machine with commodity hardware. This is simply because even today some devices like the latest RAM or SSDs are much more expensive than their commodity siblings.
  • Thirdly, there will always be downtime while adding any new hardware to an existing machine.

Vertical scaling is most commonly used in middle-range applications that are projected towards small or midsized companies and there is always a risk of downtime or server failure if we are using a single machine.

So now, let’s talk about what is horizontal scaling.

Horizontal scaling

Horizontal scaling simply means, adding more physical servers or processing units to an existing pool of resources while connecting them using all the principles of the distributed systems.

It involves growing the number of machines in a cluster with the purpose of increasing the processing power of the current system while distributing all the traffic load among them efficiently.

There are several benefits associated with horizontal scaling.

  • First and foremost, it distributes the role of each server in the system that in turn increases resilience and Robustness. Such that, now the application does not have a single point of failure and has more distributed backup nodes.
  • It increases the computing power along with the use of commodity hardware that can help reduce the software’s operational cost. (Not true in all cases though)
  • It provides a kind of dynamic scaling, adding or removing new machine images to an existing cluster can be done easily without any downtime.
  • Enables infinite scaling and can use an endless number of instances in order to handle the ever-growing load.

But, there are several impediments to this approach as well.

  • Though we can use commodity hardware while scaling horizontally, still, many times, setting up a new server machine costs way more than vertical scaling. It also brings along with it some extra infrastructure requirements like,
    - Adding a Load balancer (Software or Hardware) to distribute traffic between these distributed machines in the cluster.
    - Extra networking equipment such as router etc.
    - Licensing costs for running supporting software on different machines.
  • It complicates the architectural design and requires additional human resources in order to manage these infrastructure operations.

Numerous other complexities that befall along with distributed systems do apply with horizontal scaling and have already been discussed in Part 1 of this series. For more details, you may refer to this link — Part1 of this series.

Design decisions

Before going ahead with any of the above approaches for scaling our system, we must identify what we want to accomplish and what is our application’s ultimate goal.
Do we need to increase performance? If yes then what are the parameters based on which we measure performance? Is it our application’s throughput serving efficiency that we want to improve? Or do we want to reduce our application’s response time for any particular query? Do we need to make our system more available and Fault-tolerant?
Along with these, there are multiple other such questions that we need to ask ourselves while scaling our application. These terminologies look complex at first glance, and yes they are, but once we start focussing on the bigger picture, all the dots will eventually connect.

So, How do these scalability principles apply while scaling a database ?
Is this all the knowledge that we need to have or is there something else that we should know about ?

Let’s try to understand that.

Scaling databases in distributed database

By now we should be clear with all the scalability principles. So, let us try to relate these in order to scale any database.

In layman terms, scaling a database simply means distributing the load or data across multiple servers with the aim of storing and processing the database queries more efficiently, at the same time, making our system much more Robust and Reliable.

There can be two main reasons for scaling any database,

  • First, to make data more Available and Fault-Tolerant.
    For example, if we have a database server running on a single machine, we will encounter a single point of failure. In such a case, if that single-machine fails, we might lose all our data. So, to avoid that, we can add one more machine that will be a replica of the original machine. This will make our data more Robust and now, we can also evenly distribute query requests among two servers that are having the same instance of our database.
  • Second, To handle huge data (or even Big Data), that cannot be contained on a single machine due to the upper limit of disk storage capacity. As we know, a database size for an application can range from a few Megabytes to multiple Terrabytes and Petabytes. Storing all this data on a single machine is neither feasible nor possible. Hence we need to divide our data into multiple machines. Based on our application’s design, we may choose any algorithm for such data distribution, like
    Consistent hashing
    — Alphabet-based partitioning
    (like information of users whose name ranges from A-E will be stored on machine-1, F-J on machine-2, etc.)
    Schema-based partitioning (like user accounts schema can be stored on machine-1, products schema can be stored on machine-2, and so on)

This method of data distribution is known as Sharding or Partitioning.

I hope, till now we gained a fair amount of knowledge regarding scalability and its different types, but, What are the various principles that we should keep in mind while partitioning our database.

Let’s try to understand that with the help of a scenario !!!

Assume, we have an application that is using a single-server database for processing all client queries. We may call this as a master server. This design would work fine for a mid-scale application whose data size may not grow much and it won’t encounter many concurrent requests. But there are a few other problems.
Problem? There is a single point of failure. We may have a chance of losing all our application’s data. Moreover, if in due time, the user base increases, then there can be an increase in concurrent requests as well. Handling such a load would be a challenge for a single CPU.

So how can we handle this?

We can add another machine that will be a replica of the master server. We may call it a backup server. This backup server would just be a standalone machine that has no processing role as such, instead, it would just keep replicating master data such that, if at any point in time, the master server fails then we may redirect requests to this backup server and replace it with master in order to continue serving requests with very minimal downtime (i.e. the time taken to switch the server). This approach will make our system design more Robust, Fault-tolerant, and highly Available. But there is still some problem.
Problem? We are incurring additional server cost just for the backup purpose and are not even utilizing its processing power. Moreover, it still didn’t solve the problem of an increase in concurrent requests that may not be efficiently handled by a single server.

So how can we solve this?

We can connect this backup server to our cluster and can redirect all the read requests to this backup server and all the write requests to the master server. This way we will be able to distribute the load between both the servers while at the same time we may also utilize both of their cores by implementing parallel processing. This backup server can now be called as a Slave and this kind of database distribution is also known as Master-Slave replication.
But, are there still any problems with this approach?
Problems? We may notice that there can be some increase in latency here, since now, to make our data strongly consistent, we have to write the same data in two different servers to ensure that the same copy of data is present in both servers at all time. We can trade-off this high probability of strong consistency by having an asynchronous replication of data into the slave server. This may reduce our write latency but will make our data eventually consistent.

Let’s try to understand this with the help of an example.

Suppose there is a write request and data is written in master. Now, master is supposed to send a request to the slave to replicate the same data. But before it could even send that request, master went down. Now there can be two possibilities.
First, If it's a temporary failure, then we might have some Inconsistency in both databases for quite some time since the latest data won’t be able to replicate in the slave machine, and users who might be trying to read that specific data during this breakdown phase, won’t be able to find it. But, once the master machine is back up, the data replication will resume its state and everything will be fine.
The second possibility can be, if it is a permanent failure, we may lose the latest data because the master server will be permanently down and there will be no way to identify the level, up to which data was synchronized between both servers.

This way, though we have ensured data Availability and Partitioning, we are trading-off with data consistency. Ahhh…!!! Tradeoffs Tradeoffs and more Tradeoffs. So what we can infer from this.

In distributed systems, no matter what design approach we follow, there will always be a trade-off. But this very specific trade-off stated above is a well-known one in computer science and is popularly known as a CAP Theorem.

What is a CAP Theorem (aka Brewer’s Theorem)

CAP theorem states that a distributed database system can only guarantee two out of these three characteristics i.e. Consistency, Availability, and Partition Tolerance.


A system is said to be consistent if all nodes in a database cluster have the same data at the same time. In a consistent system, we should be able to read the most recently written value from any of the available nodes in that cluster at the same time.


A system is said to be available if it ensures to remain operational 100% of the time. Every request should get a successful response regardless of the individual server state. This means even if one of the existing servers is down there should always be some other server holding the same instance of the database and should be ready to process incoming requests without incurring any downtime. This does not guarantee that the response contains the most recent writes.

Partition Tolerance

A system is said to be partition tolerant if it does not fail regardless of if messages are dropped or delayed between nodes in a system. Partitioning the database has now become more of a necessity than a choice because of the massive storage requirement of applications nowadays. It is made possible by sufficiently replicating records across multiple nodes/servers.

I hope I was able to simplify many of the technicalities related to scalability in this post. But, before we end this, let’s try to understand in short about a term that we have been using quite often in this series, Load Balancer.

So, what is a Load Balancer?

Load Balancer is a software program or a hardware device that is responsible to distribute the load (service requests) among all existing servers/nodes/computers in a cluster with aim of making overall processing more efficient. The load distribution is completely based on functional requirements and can follow several algorithms like Round robin, Actual CPU load based, consistent hashing, DNS delegation, Client-side random balancing, etc. It may also take into consideration the request’s origin geolocation to serve it faster from the nearest geographically located data center. Along with load distribution, load balancers are also responsible for a couple of other tasks like,

  • Health check, it pings each node in the cluster and exchanges heartbeats, using which it registers the health state of all those nodes to ensure they are up and running.
  • Maintaining machine states, their current load, requests being served, and tracks if the machine is idle, etc.
  • Caching, it caches requests/responses to improve efficiency.

This is a very broad overview of a load balancer. There are a few more details related to it, that we will be discussing further in this blog series.

Thanks a lot !!!



Akash Agarwal

Right-brained techie, Passionate coder, Senior fullstack developer. Still makes silly mistakes daily. Incase u wanna connect-