In my previous post on relational and non-relational databases, we saw some fundamental differences between these database tools. In this post, let's look at their scalability.
Scalability is the ability of a system to accommodate rapidly growing incoming data without performance problems. There are two methods of scaling: vertical and horizontal.
Relational database tools typically rely on vertical scaling, also known as scale-up. This is the method of increasing the power of a single system by adding more CPU, memory and disk space. To handle rapid incoming data, the single production server is upgraded rather than multiplied, so there is always one production server to which all applications and users connect. A cluster can still be created by adding nodes and replicating the data across them. Because of the ACID properties, all nodes must hold the same set of data, and data synchronization becomes complicated when the cluster has many nodes. Such a setup mainly helps read scaling, since reads can be served from any copy while every write still has to reach all of them.
The benefit of this scaling method is the tight integration of the data and its consistency across the nodes in a cluster. All nodes hold the same set of data, and if there is a problem with the production server, the applications are automatically connected to another node. Such a cluster is known as a failover cluster.
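The failover behaviour can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the `Node` and `FailoverCluster` classes are hypothetical names invented here, each node is an in-memory dictionary holding a full copy of the data, and a real database would detect failures and promote a standby in far more involved ways.

```python
class Node:
    """A hypothetical database node holding a full copy of the data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)  # every node keeps the same set of rows
        self.online = True

    def read(self, key):
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return self.data[key]


class FailoverCluster:
    """Sends each request to the first node that is still reachable."""

    def __init__(self, nodes):
        self.nodes = list(nodes)  # nodes[0] plays the production server

    def read(self, key):
        for node in self.nodes:
            try:
                return node.read(key)
            except ConnectionError:
                continue  # fall through to the next node
        raise ConnectionError("no node in the cluster is reachable")


rows = {"order:1": "shipped"}
primary = Node("primary", rows)
standby = Node("standby", rows)
cluster = FailoverCluster([primary, standby])

primary.online = False            # simulate a production server failure
result = cluster.read("order:1")  # served by the standby node instead
```

The application only talks to `cluster`, so it does not need to know which node answered.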
Most non-relational database tools are designed for horizontal scaling, also known as scale-out. This is the method of adding more computers to the network to handle rapid incoming data. It is easy to add more nodes to the cluster as the data grows. The data is split automatically and processed across the nodes of the cluster, which makes this a distributed data environment. The Hadoop Distributed File System (HDFS) is a classic example of such an environment.
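One common way to split data across nodes is to hash each key and use the result to pick a node. The sketch below assumes this simple hash-modulo scheme purely for illustration; the function names `shard_for` and `distribute` are made up here, and real systems often use consistent hashing or range partitioning instead, because plain modulo moves many keys whenever the node count changes.

```python
import hashlib


def shard_for(key, num_nodes):
    """Return the index of the node that should store this key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(keys, num_nodes):
    """Group keys by the node they are assigned to."""
    shards = {i: [] for i in range(num_nodes)}
    for key in keys:
        shards[shard_for(key, num_nodes)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)  # data spread over three nodes
four_nodes = distribute(keys, 4)   # the same data after adding a fourth node
```

Each node ends up holding only a fraction of the keys, so adding a node lowers the share every node has to store and process.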
The benefit of this scaling technique is that, since the data is split and replicated across nodes, the application can still get the data from other nodes if any one node goes offline, which keeps the data available. This method is very useful in cases where no JOINs are required between data held on different nodes. It also helps in separating data and keeping it in different geographical locations.
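To make the availability argument concrete, here is a minimal sketch in which every key is written to two nodes and reads skip any node that is offline. The `ReplicatedStore` class, its `replicas` parameter and the "next node in the ring" placement rule are assumptions chosen for brevity, not the behaviour of any particular product.

```python
import hashlib


class ReplicatedStore:
    """Toy key-value store that keeps each key on several nodes."""

    def __init__(self, num_nodes, replicas=2):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.online = [True] * num_nodes
        self.replicas = replicas

    def _owners(self, key):
        # First owner is chosen by hash; the other copies go to the next nodes.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        for idx in self._owners(key):
            self.nodes[idx][key] = value

    def get(self, key):
        for idx in self._owners(key):
            if self.online[idx]:
                return self.nodes[idx][key]
        raise ConnectionError("all replicas for this key are offline")


store = ReplicatedStore(num_nodes=4, replicas=2)
store.put("sensor:42", 21.5)

first_owner = store._owners("sensor:42")[0]
store.online[first_owner] = False  # one node goes offline
value = store.get("sensor:42")     # still answered by the second copy
```

The read only fails when every node holding a copy of the key is down at the same time, which is why a higher replica count improves availability at the cost of extra storage.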
Both scaling techniques have advantages and disadvantages, and a good environment can mix the two to get the best of scale-up and scale-out. For example, a scaled-up single server can handle the read and write workload that requires ACID properties, while a scaled-out cluster holds historical data distributed across several nodes for data mining.
In the next post, I will write about the proper usage of these tools and some of the largest customers using them.