In my previous article I gave a short summary of the meaning of Database Scalability. In this article I’ll describe the most common architectures used to scale to Big Data sizes.
Over the past 30 years there have been three primary architectures used to deploy a database:-

- Shared Memory
- Shared Disk
- Shared Nothing
Each option has different features and constraints and, depending upon the database solution, different scalability options.
The diagram below illustrates the key hardware components of the simplest solution – Shared Memory.
In this solution, users connect to a single database server, typically with locally attached disks. This works well for many databases, and is the traditional (and sensible) option for most applications. Scalability options are, however, limited to Scale Up: adding CPUs, memory or disks.
The main benefits and drawbacks of a Shared Memory solution are:-
For most requirements, this may well be the simplest and cheapest solution. Increasingly, cloud deployment should be considered, even for small to medium size systems, as this can help control capital costs and provide a managed, relatively painless upgrade path.
The diagram below illustrates the Shared Disk Architecture, in which users are transparently routed (perhaps using load balancing) to a database server in a cluster of machines. As the name implies, the disk system is shared by all nodes in the cluster (and is therefore a potential bottleneck), and is typically provided by a separate NAS or SAN disk system.
Under the hood, this is a relatively complex solution, as the nodes are closely networked to act as a single machine. This can create performance and scalability challenges when, for example, a data item is simultaneously changed on two different nodes and the data is repeatedly “pinged” between the two. This deployment can therefore require some application changes to sensibly partition the workload across the servers. At its extreme, it’s possible (for example) to deploy a data warehouse using one node for ETL processing and the other dedicated to servicing user read queries.
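To make the idea concrete, here’s a minimal sketch of application-level workload routing on a two-node Shared Disk cluster. The node names and the connect() helper are hypothetical; real clusters (Oracle RAC, for example) typically achieve the same effect through service names or connection-pool configuration.

```python
# Hypothetical sketch: route each workload type to a dedicated node in a
# two-node shared disk cluster, so ETL and user queries don't contend.
NODES = {
    "etl":    "node1.cluster.internal",   # batch ETL loads
    "report": "node2.cluster.internal",   # user read queries
}

def connect(workload: str) -> str:
    """Return a connection string aimed at the node reserved for this workload."""
    host = NODES.get(workload, NODES["report"])   # default to the read node
    return f"db://{host}:1521/warehouse"

print(connect("etl"))      # batch jobs land on node1
print(connect("report"))   # interactive queries land on node2
```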
The main benefits and drawbacks of a Shared Disk solution are:-
As an on-premises upgrade path from a single database server, this provides a useful roadmap for growth. It’s sensible, however, to limit cluster size to around three nodes for most solutions before considering re-platforming to a larger machine.
The diagram below illustrates the Shared Nothing Architecture whereby data is spread (sharded) across a number of independently running machines that act as a single cluster. Unlike the Shared Disk solution, each node is responsible for only a part of the data. In effect, nothing is shared.
Unlike the Shared Disk solution, which relies upon hardware for high availability, this solution can automatically replicate data to other nodes, providing a significant element of built-in high availability. These principles are applied to a range of solutions including Hadoop HDFS, NoSQL databases like Cassandra, and MPP column data stores including Vertica and Greenplum.
Unlike the Shared Disk solution, which can be deployed with little upfront design, an MPP database potentially needs careful attention to data sharding and co-location to maximise scalability and performance. If, for example, most queries are within a geographic region, it makes sense to shard data across the cluster by region instead of business area. Once deployed, changing the sharding and data partitioning strategy is no small operation on a multi-terabyte system.
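As a rough illustration of that sharding decision, here’s a minimal Python sketch of hash-based data placement, assuming a four-node cluster and region as the shard key. The node names and records are hypothetical; an MPP or NoSQL database performs this placement for you once the distribution key is chosen.

```python
import hashlib

# Hypothetical four-node cluster; a real database manages placement itself.
NODES = ["node0", "node1", "node2", "node3"]

def node_for(shard_key: str) -> str:
    """Map a shard key value to the node that owns it."""
    digest = int(hashlib.md5(shard_key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Sharding by region keeps each region's rows co-located on one node,
# so a query restricted to a single region touches a single machine.
for region, amount in [("EMEA", 100), ("EMEA", 250), ("APAC", 75)]:
    print(region, amount, "->", node_for(region))
```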
The main benefits and drawbacks of a Shared Nothing database are:-
As I’ve described before, NoSQL databases can provide nearly unlimited linear scalability for high velocity workloads on a Shared Nothing architecture.
Cassandra in particular has been demonstrated to scale to nearly 300 nodes, processing over 200,000 writes per second during a Netflix benchmark test.
Of course there are significant downsides.
NoSQL databases address a very narrow use case: in many cases, single-key lookup operations where the data is guaranteed to be retrieved from a single node. If a Cassandra or Riak database is deployed to 50 nodes and the workload doubles, simply adding another 50 nodes will solve the problem, providing the same response time but to double the user population.
This principle breaks down, however, if we use secondary indexes that cut across the sharding key. Access data via a secondary index on MongoDB, Cassandra or Riak, and you’ll potentially execute the query against every node in the cluster. Run several of these concurrently, and you’ll quickly hit the scalability limits of your entire system.
In summary, when queries run across multiple shared nothing nodes in the cluster, scalability will drop.
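To illustrate the difference, here’s a small sketch (reusing the same hypothetical hash placement as the sharding example above) contrasting a shard-key lookup, which touches exactly one node, with a query on a secondary attribute, which must be scattered to every node.

```python
import hashlib

NODES = [f"node{i}" for i in range(50)]         # e.g. a 50-node cluster

def node_for(key: str) -> str:
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def nodes_touched(query: dict) -> list:
    """Which nodes must this (hypothetical) query visit?"""
    if "customer_id" in query:                   # customer_id is the shard key
        return [node_for(query["customer_id"])]  # exactly one node
    return list(NODES)                           # secondary index: scatter/gather

print(len(nodes_touched({"customer_id": "C-1001"})))  # 1  -> scales linearly
print(len(nodes_touched({"surname": "Smith"})))       # 50 -> hits every node
```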
If you’ve ever seen the above page while using LinkedIn, you’ve experienced the limits of Eventual Consistency used by NoSQL databases. In short, you’re not guaranteed to find the data (or the latest version of it) when you need it.
Relational databases work within ACID transactions: once you press save, the server only responds when the change is durable and guaranteed to be consistently available to everyone. NoSQL databases, on the other hand, tend to compromise on consistency in favour of speed and availability.
When data is written, the database can return control immediately, wait until the change is written to a majority (a quorum) of replica copies, or (slowest of all) wait until it’s written to every copy. This introduces application complexity, as you need to handle these unexpected (and sometimes temporary) data quality issues. You also need to accept that you cannot guarantee 100% data accuracy in the event of hardware or system failure.
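As a concrete example, the sketch below uses the DataStax Python driver for Cassandra to choose between those write guarantees per statement. The contact point, keyspace and users table are hypothetical; the same trade-off exists in Riak, MongoDB and friends, just spelled differently.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical cluster, keyspace and table.
session = Cluster(["10.0.0.1"]).connect("demo")

fast = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ONE)      # return after one replica acks

quorum = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM)   # wait for a majority of replicas

slow = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.ALL)      # wait for every replica

session.execute(quorum, (42, "Alice"))           # the usual compromise
```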
Of course, the bigger the cluster, the more hardware failures. For example, given a hard disk mean time between failure (MTBF) of 10-50 years, on a cluster with 10,000 disks you should expect an average of one failure per day.
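For a rough sense of that arithmetic, here’s the back-of-an-envelope calculation (a sketch assuming failures are independent and spread evenly across the MTBF):

```python
DISKS = 10_000

for mtbf_years in (10, 50):
    mtbf_days = mtbf_years * 365
    print(f"MTBF {mtbf_years} years -> ~{DISKS / mtbf_days:.1f} failures/day")

# MTBF 10 years -> ~2.7 failures/day
# MTBF 50 years -> ~0.5 failures/day
```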
NewSQL databases attempt to provide the high availability, massive scalability and millisecond latency of NoSQL databases, but with the flexibility of a relational database and full ACID transactions. For strict OLTP workloads (short-lived, fast transactions) with the potential for massive scale, these provide a compelling solution, with potential throughput reaching millions of transactions per second.
Some, like NuoDB, CockroachDB and Google Spanner, are designed for elastic cloud-based deployment and separate data processing from storage nodes. This gives the potentially significant benefit of being able to scale each tier independently.
Finally, VoltDB, from the Michael Stonebraker stable (Ingres, Postgres and Vertica), is a horizontally scalable, fully ACID-compliant database built for streaming applications. Designed to capture, process and analyse sensor data in near real time (milliseconds), it provides strong transactional guarantees, and even survived the remarkably challenging Jepsen test of distributed system safety in the face of hardware failure.
As recently as ten years ago, the Shared Memory/Disk/Nothing architectures were the only serious options available from the big three vendors: Oracle, Microsoft and IBM.
The massive explosion of data volumes handled by Google, Amazon and LinkedIn led to the development of NoSQL databases, which provide fast key/value lookup operations to complement the entirely batch-oriented serial processing available in Hadoop.
The subsequent explosion of open source development means there are now nearly 300 databases available, including key-value, document, graph and text search engines, with specialised solutions for every use case.
It’s unlikely every database will survive, but it’s good to see the 30-year-old relational model completely reinvented in the NewSQL databases while remaining largely transparent to the developer. The relational database is not dead yet.
Thanks for reading this far - there was a lot to say. If you found this interesting, share it with colleagues or press the like button below. You may also be interested in the next article in the sequence.
After 30 years as a freelance Data Warehouse Architect, I'm turning my attention to Big Data including Hadoop, NoSQL and NewSQL.