Building a database in the 2020s

Author: Ed Huang (h@pingcap.com), Cofounder, CTO, PingCAP/TiDB

It's been a long time since I wrote anything, so I'll share what I've been thinking about more recently. Just consider it a record of work.

Let's start with an important question: If we were to redesign a new database today from the ground up, what would the architecture look like?

Before I get into the details, let me share my understanding of what developers today (and in the future) expect from a database.

  • Cost-effective: the ultimate pay-as-you-go. Whether the database is deployed in a private data center or on a public cloud, the ability to exploit the elasticity of the infrastructure is the key to saving costs. The software should charge only for what I actually consume.
  • Ease of use matters. The simpler it is to use, the better, preferably with all the infrastructure details hidden. A low mental burden means a smooth on-boarding experience. In the context of databases, I think SQL is still the easiest interface to build applications with, and a modern CLI experience is also important for developers nowadays.
  • Unified. This needs a little explanation. If the point above is from a DX (developer experience) angle, then this one is about simplifying the architecture. We have invented so many data processing technologies over the years that even a simple application, say an online ordering system for a coffee shop, may require several kinds of databases (RDBMS / data warehouse / message queue / ...). Another headache is scale: you will probably need a completely different infrastructure stack at a different scale, even though the product barely changes from an end-user perspective.

A modern database architecture would likely aim to simplify and unify the underlying technology, making it easier for developers to build and maintain applications. This could involve using a single database technology that is versatile enough to handle a wide range of data types and workloads, as well as having the ability to scale seamlessly as the needs of the application change. Additionally, the architecture would aim to provide a consistent and intuitive developer experience, allowing developers to focus on building the features and functionality of their applications rather than worrying about the underlying technology.

Mapping these trends and pain points onto system design:

  • Cloud-native design is important: not only the separation of storage and compute, but the separation of everything that can be separated. What we are building is no longer software but a service, and users do not care about what sits behind the service, as discussed below.
  • Think of Serverless as the final product form (note: Serverless is not a technology, but a product form), and work backwards from it to plan the evolution path.
  • For the new generation of databases, HTAP is a mandatory technology route (even AWS Aurora provides zero-ETL integration with Redshift). A prediction: databases of the future will provide HTAP capability.

With these assumptions in mind, TiDB has spent the past year re-architecting the database to run as a cloud service, leveraging the advantages of cloud infrastructure to improve performance and scalability. The recent release of the TiDB Serverless Tier is an important milestone in this process, representing the first iteration of the new engine. From a technical and engineering perspective, this work involved significant redesign and restructuring of the underlying technology, as well as careful testing and optimization to ensure that the database delivers the performance and reliability expected of a modern database.

Here are some takeaways from this journey:

Be a service, not software

When you build a new database today, it's obvious you should offer a cloud service around it (and soon, a Serverless one). Many people (especially database kernel developers) underestimate the complexity of building a cloud service; the classic arguments are: 'Isn't it just automated deployment on the cloud?' or 'Basically, it's a Kubernetes Operator?'... Actually, it's not. Building a database as a cloud service involves more than just automating deployment on the cloud. It requires a deeper understanding of the needs of the users and the applications that will use the database, as well as the ability to design and implement a robust and scalable architecture that can support a wide range of workloads. This cognitive shift towards thinking of the database as a service rather than just software is essential to deliver a user-friendly experience that meets the needs of modern applications on the cloud. Additionally, building a cloud service requires expertise in areas such as cloud infrastructure (basically, cloud services like S3/EBS as the building blocks), automation, and DevOps, which can be challenging for traditional database developers who may not have experience in these areas.

Software has traditionally been developed with the assumption of a relatively predictable and deterministic environment. This is particularly true for single-machine software, where all the resources of the computer are available to the program and the execution environment is well-defined. Even with the rise of distributed systems, many software designs still follow this model, using remote procedure calls (RPCs) to connect multiple computers while still assuming a relatively predictable and controlled environment. However, cloud computing introduces new challenges and complexities, particularly when it comes to managing and scaling resources:

  • Diverse and almost unlimited resources are provided in the form of Service APIs, and the scheduling and allocation of resources can be done by code, which is a revolutionary change.
  • All resources are explicitly priced, so the direction of program optimization changes from the one-dimensional goal of extracting the best possible performance (because the hardware cost has been paid in advance) to a dynamic problem: spending as little money as possible to get the job done.

These changes in assumptions bring about technical changes. I think a database on the cloud should, first of all, be a network of multiple autonomous microservices. These microservices are not necessarily on different machines; they may physically sit on one machine, but they need to be remotely accessible, and they should be stateless (no side effects) to facilitate rapid elastic scaling. Let's look at some examples:

First, storage-compute separation has been talked about a lot in recent years 🙂. On the cloud, compute is much more expensive than storage, and if compute and storage are tied together, there is no way to take advantage of cheap storage. Also, for some specific requests, the demand for compute is likely to be completely out of proportion to the physical resources of the storage nodes (think heavy OLAP requests with reshuffles and distributed aggregations). In addition, for distributed databases, scaling speed is an important user experience indicator. When storage and compute are separated, scaling can in principle be extremely fast, because scaling out becomes: 1. start new compute nodes, 2. warm up the cache; and vice versa for scaling in.
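
To make the "start a node, warm the cache" point concrete, here is a minimal Go sketch of a stateless compute node on top of a shared storage tier. This is an illustration under assumptions, not TiDB code; SharedStorage, ComputeNode, and ScaleOut are hypothetical names.

```go
package scaleout

// SharedStorage stands in for the durable, shared tier (e.g. S3 or a
// shared storage service). Compute nodes own no durable data.
type SharedStorage interface {
	ReadBlock(id string) ([]byte, error)
}

// ComputeNode is stateless apart from its cache, so adding or removing one
// never requires moving or rebalancing durable data.
type ComputeNode struct {
	storage SharedStorage
	cache   map[string][]byte
}

// ScaleOut adds capacity: start a node, warm its cache with the hottest
// blocks, and it is ready to serve. Scaling in is just the reverse:
// drain traffic and throw the node (and its cache) away.
func ScaleOut(s SharedStorage, hotBlocks []string) (*ComputeNode, error) {
	n := &ComputeNode{storage: s, cache: make(map[string][]byte)}
	for _, id := range hotBlocks {
		b, err := s.ReadBlock(id)
		if err != nil {
			return nil, err
		}
		n.cache[id] = b
	}
	return n, nil
}
```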

Second, for databases, some internal components can be separated out as services, e.g., DDL becomes DDL-as-a-Service. Traditional database DDL sometimes affects online performance: when adding an index, for example, data backfill is unavoidable, which causes jitter on the storage nodes serving the OLTP load. If we think about DDL, we see that it is a global, episodic, recomputable, offline, reloadable module, and if there is a shared storage tier (e.g. S3), this type of module is ideal for stripping out into a stateless service that shares data with the OLTP storage engine via S3 (see the sketch after the list below). The benefits are clear:

  • Virtually no performance impact on the online workload.
  • Cost reduction due to on-demand architecture.
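
Below is a rough Go sketch of what such a stateless backfill service could look like, assuming the only shared medium is an S3-like object store holding an immutable table snapshot. The interfaces and names (ObjectStore, BackfillJob, RunBackfill) are illustrative assumptions, not TiDB's actual DDL implementation.

```go
package ddlservice

import (
	"context"
	"fmt"
)

// ObjectStore abstracts the shared storage tier (e.g. S3) that both the
// OLTP engine and the DDL workers can read and write.
type ObjectStore interface {
	List(ctx context.Context, prefix string) ([]string, error)
	Get(ctx context.Context, key string) ([]byte, error)
	Put(ctx context.Context, key string, data []byte) error
}

// BackfillJob describes an "add index" backfill over a table snapshot.
// Because the input is an immutable snapshot on shared storage, the job is
// recomputable: a crashed worker can simply be restarted.
type BackfillJob struct {
	TableSnapshotPrefix string // e.g. a snapshot prefix frozen at some MVCC timestamp
	IndexOutputPrefix   string // where the finished index data is published
}

// RunBackfill reads snapshot files from shared storage, builds index data,
// and writes the result back; it never touches the serving OLTP nodes.
func RunBackfill(ctx context.Context, store ObjectStore, job BackfillJob,
	buildIndex func(fileData []byte) []byte) error {

	files, err := store.List(ctx, job.TableSnapshotPrefix)
	if err != nil {
		return fmt.Errorf("list snapshot: %w", err)
	}
	for i, f := range files {
		data, err := store.Get(ctx, f)
		if err != nil {
			return fmt.Errorf("read %s: %w", f, err)
		}
		out := buildIndex(data) // hypothetical per-file index construction
		key := fmt.Sprintf("%spart-%06d", job.IndexOutputPrefix, i)
		if err := store.Put(ctx, key, out); err != nil {
			return fmt.Errorf("write %s: %w", key, err)
		}
	}
	return nil
}
```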

There are many similar examples: logging (low CPU usage but high storage requirements), the compaction process of an LSM-Tree storage engine, data compression, meta information, connection pooling, CDC, etc. are all well suited to being separated out. In the new cloud-native version of TiDB, we use Spot Instances for Remote Compaction of the storage engine, and the cost reduction is remarkable.
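
As an illustration of how compaction can ride on preemptible capacity once the SST files live on shared storage, here is a hedged Go sketch: a compaction task is retried on a fresh worker whenever a Spot Instance is reclaimed. CompactionTask, SpotWorker, and ErrPreempted are made-up names; the real scheduling in TiDB is more involved.

```go
package compaction

import (
	"context"
	"errors"
)

// ErrPreempted is returned when the spot instance running a task is reclaimed.
var ErrPreempted = errors.New("spot instance reclaimed")

// CompactionTask names the input SSTs on shared storage and where the merged
// output should go. The task has no side effects until the final output is
// published, so it can be retried safely.
type CompactionTask struct {
	InputKeys []string
	OutputKey string
}

// SpotWorker is anything that can run a task and may disappear at any time.
type SpotWorker interface {
	Run(ctx context.Context, t CompactionTask) error
}

// RunWithRetry keeps resubmitting the task to freshly acquired spot workers
// until it succeeds or a non-preemption error occurs.
func RunWithRetry(ctx context.Context, acquire func() (SpotWorker, error),
	task CompactionTask) error {
	for {
		if err := ctx.Err(); err != nil {
			return err
		}
		w, err := acquire()
		if err != nil {
			return err
		}
		err = w.Run(ctx, task)
		if err == nil {
			return nil
		}
		if !errors.Is(err, ErrPreempted) {
			return err
		}
		// Preempted: the partial work only produced unreferenced objects on
		// shared storage, so we simply try again on another instance.
	}
}
```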

Another important issue to think about when designing a cloud database is QoS (Quality of Service), which in practice means:

  • WCU (Write Capacity Unit) and RCU (Read Capacity Unit) need to be defined as units of control, because without such a definition there is no way to allocate and schedule resources, or even to price them (a minimal token-bucket sketch follows this list).
  • Multi-tenancy is a must. Tenants must be able to share hardware and even cluster resources; large tenants can also have dedicated resources (the single-tenant model is a specialization of multi-tenancy). This brings a set of problems: how to avoid the noisy-neighbor problem? How to design throttling policies? How to keep shared metadata services from being overwhelmed, and how to deal with extreme hotspots? These questions deserve a separate series, and there are many more challenges, so I won't list them all. The good news is that many of these lessons are described in detail in this year's DynamoDB paper from AWS, so just refer to that paper.
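
To make the RCU/WCU idea concrete, here is a minimal per-tenant token-bucket sketch in Go. The unit definitions in the comments (how many bytes one RCU or WCU covers) and all names are assumptions for illustration, not the accounting of any particular system.

```go
package qos

import (
	"math"
	"sync"
	"time"
)

// bucket is a simple token bucket: at most `capacity` tokens, refilled at
// `rate` tokens per second.
type bucket struct {
	capacity, tokens, rate float64
	last                   time.Time
}

func (b *bucket) take(n float64, now time.Time) bool {
	elapsed := now.Sub(b.last).Seconds()
	b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.rate)
	b.last = now
	if b.tokens < n {
		return false // caller should throttle, queue, or reject the request
	}
	b.tokens -= n
	return true
}

type tenantBuckets struct {
	read, write *bucket
}

// TenantLimiter tracks one read bucket and one write bucket per tenant.
type TenantLimiter struct {
	mu       sync.Mutex
	tenants  map[string]*tenantBuckets
	rcu, wcu float64 // provisioned RCUs / WCUs per second per tenant
}

func NewTenantLimiter(rcu, wcu float64) *TenantLimiter {
	return &TenantLimiter{tenants: map[string]*tenantBuckets{}, rcu: rcu, wcu: wcu}
}

func (l *TenantLimiter) get(tenant string) *tenantBuckets {
	if t, ok := l.tenants[tenant]; ok {
		return t
	}
	now := time.Now()
	t := &tenantBuckets{
		read:  &bucket{capacity: l.rcu, tokens: l.rcu, rate: l.rcu, last: now},
		write: &bucket{capacity: l.wcu, tokens: l.wcu, rate: l.wcu, last: now},
	}
	l.tenants[tenant] = t
	return t
}

// AllowRead charges `units` RCUs (e.g. 1 RCU per 4 KB read, by assumption).
func (l *TenantLimiter) AllowRead(tenant string, units float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.get(tenant).read.take(units, time.Now())
}

// AllowWrite charges `units` WCUs (e.g. 1 WCU per 1 KB written, by assumption).
func (l *TenantLimiter) AllowWrite(tenant string, units float64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.get(tenant).write.take(units, time.Now())
}
```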

What cloud services can be relied on

Another important topic is: which services on the cloud can you rely on? For a third-party database vendor, a consistent cross-cloud experience is your natural advantage, and depending too deeply and tightly on a specific cloud's services will cost you that flexibility. So you need to be very careful when choosing dependencies. Here are some principles:

  • Rely on interfaces and protocols, not specific implementations. You can do whatever you want inside a service, but the interfaces exposed to other services should be generic and not make too many assumptions, simply to minimize the mental burden on the caller (old wisdom from the UNIX era). A good example is VPC Peering vs. PrivateLink: following this principle, PrivateLink is the clear choice, because VPC Peering tends to expose more details to the users.
  • If there is an industry standard, follow it (S3, POSIX file systems). There is object storage on every cloud, and every cloud's object storage API is more or less compatible with the S3 protocol, which is good.
  • The only exception is security. If there is no way to build a good cross-cloud abstraction for security, don't reinvent the wheel. Use what the cloud provides, such as key management and IAM; don't invent it yourself.

Here are a few examples. For Cloud-Native TiDB, we made the following choices when selecting dependencies:

Storage: S3 is the key. As mentioned above, every cloud has an object storage service speaking the S3 protocol. The benefit of using tiered storage like an LSM-Tree in the database is the ability to leverage different levels of storage media behind one set of APIs, e.g. local disks for hot data in the upper levels and S3 for data in the lower levels, with the compaction process moving upper-level data down to S3. This is the basis of TiDB's storage-compute separation, and only after the data is in S3 can optimizations such as Remote Compaction be unlocked. The problem is that S3's high latency means it cannot sit on the main read/write path (an upper-level cache miss can introduce extremely high long-tail latency), but I am fairly optimistic about this:

  • If we consider the 100% local cache scenario, it degenerates to the classic shared-nothing design, which I think is fine for supporting the most extreme OLTP scenarios (see TiKV today), and the extra cost is only the storage cost on S3, which is very low.
  • The caching and hotspot problems are well solved if the sharding is fine-grained enough (i.e., with a small, flexible sharding granularity).
  • Tiered storage can also include EBS (distributed block storage) as a secondary cache to further shave peaks (softening the latency spikes caused by local cache misses); a sketch of this read path follows this list.
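
A small Go sketch of the read path this tiering implies: try the local cache, then an optional EBS-backed secondary cache, and only then fall back to S3, promoting the block on the way back. The interfaces (BlockCache, ObjectStore, TieredReader) are illustrative assumptions, not TiDB's real storage API.

```go
package tiered

import (
	"context"
	"fmt"
)

// BlockCache is any cache tier (local NVMe, EBS-backed, ...).
type BlockCache interface {
	Get(id string) ([]byte, bool)
	Put(id string, data []byte)
}

// ObjectStore is the durable bottom tier (e.g. S3): cheap, but high latency.
type ObjectStore interface {
	Get(ctx context.Context, key string) ([]byte, error)
}

// TieredReader checks caches from fastest to slowest before touching S3.
type TieredReader struct {
	local BlockCache // hot data on local disk
	ebs   BlockCache // optional secondary cache to absorb local-cache misses
	s3    ObjectStore
}

func (r *TieredReader) ReadBlock(ctx context.Context, id string) ([]byte, error) {
	if b, ok := r.local.Get(id); ok {
		return b, nil
	}
	if r.ebs != nil {
		if b, ok := r.ebs.Get(id); ok {
			r.local.Put(id, b) // promote to the fastest tier
			return b, nil
		}
	}
	b, err := r.s3.Get(ctx, id)
	if err != nil {
		return nil, fmt.Errorf("read %s from object store: %w", id, err)
	}
	// Populate both cache tiers so the next miss is cheaper.
	if r.ebs != nil {
		r.ebs.Put(id, b)
	}
	r.local.Put(id, b)
	return b, nil
}
```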

In a 2020 talk about building a cloud-native database, I made the point that how well S3 could be leveraged would be the key. I think this point is still valid today.

Compute: containers and Kubernetes. Like S3, every cloud has a managed Kubernetes service; just as Linux is the operating system of a machine, K8s is the operating system of the cloud. Once storage and compute are separated, compute is relatively easier to manage (compared to the old days), but it is still sometimes painful: for example, managing resource pools, since Serverless clusters need fast launch (or wake-up from hibernation) and creating new nodes from zero is too slow, so some reserved resources are needed; or using Spot Instances to handle compaction tasks, where, if a Spot Instance is reclaimed, you need to quickly find another one to continue the work; and the same story applies to load balancing and the microservice mesh... Although S3 solves the hardest problem, state, the scheduling of these pure compute nodes is still painful. If you choose to build your own wheels, you will most likely end up reinventing a K8s. So why not just embrace it?

On the cloud, there is also a big design question: is the file system a good abstraction? This question is about which layer of abstraction should shield the application from the cloud infrastructure. Before S3 became popular, the big distributed storage systems, especially Google's (BigTable, Spanner, etc.), all chose a distributed file system as their foundation (I think there is a deep trace of Plan 9 here; after all, many of the infra gurus inside Google came from Bell Labs 😄). So the question is: if we have S3, do we still need a file system layer on top? I have not thought it through completely, but I lean towards yes. The reason is still caching: with a file system layer, you can cache based on file access heat to speed up scale-out warm-up. Another benefit is that many UNIX tools can be reused directly on top of a file system, which reduces the complexity of operations and maintenance. A small sketch of the heat-based idea follows.
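
As a tiny illustration of the "cache by file heat" argument, here is a sketch that counts per-file accesses and picks the hottest files to prefetch during warm-up. It assumes a file-granular layer over the object store; HeatTracker and HottestFiles are made-up names, and the code is deliberately not concurrency-safe.

```go
package fsheat

import "sort"

// HeatTracker counts accesses per file path; a file-system-level layer can
// record this cheaply on every open/read.
type HeatTracker struct {
	hits map[string]int
}

func NewHeatTracker() *HeatTracker {
	return &HeatTracker{hits: make(map[string]int)}
}

// Touch records one access to a file.
func (h *HeatTracker) Touch(path string) {
	h.hits[path]++
}

// HottestFiles returns up to n file paths ordered by access count; a new
// compute node can prefetch exactly these during warm-up instead of
// guessing at block granularity.
func (h *HeatTracker) HottestFiles(n int) []string {
	paths := make([]string, 0, len(h.hits))
	for p := range h.hits {
		paths = append(paths, p)
	}
	sort.Slice(paths, func(i, j int) bool {
		return h.hits[paths[i]] > h.hits[paths[j]]
	})
	if len(paths) > n {
		paths = paths[:n]
	}
	return paths
}
```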

End-user experience is the direction of optimization

One of the things I mentioned in my keynote at TiDB DevCon this year was: how do databases on the cloud fit into the modern developer experience? This is a very interesting topic, because the database has been around for so many years and is still almost the same: SQL is still the king. On the other hand, the applications and tools developers use now are very different from those of decades ago. As an old programmer from the UNIX era, seeing the dazzling development tools and concepts used by the younger generation of developers, I can only marvel that each generation is better than the last. Although SQL is still the standard for operating on data, can database software do more to integrate into these modern application development experiences? I believe some database vendors are doing a great job on that, like Supabase. In the context of TiDB Cloud, we have also done some work on this:

Serverless, as many people think of it, is a technical term, but I think it's not. Serverless is more about defining, from a user-experience perspective, what a better product on the cloud looks like. Or maybe that's the way it should be: why should users care how many nodes you have? Why do I need to care about your database's internal configuration? Why do I have to wait another half hour after I click launch? ...and so on. In the past, our industry seemed to take these things for granted; when you actually think about them, they are all quite ridiculous. For example: suppose you go to buy a car, and the seller first hands you an engine repair guide and tells you to read it before you can go on the road; the car does not run fast, and then they tell you about an engine tuning configuration you need to adjust; and every time you start the car, you have to wait. Isn't that strange? For Serverless products, the biggest implications in terms of user experience are three things:

  • The configuration is hidden, reducing the user's mental burden
  • Extremely fast startup time, which broadens the usage scenarios and improves ease of use
  • Scale-to-Zero, which reduces the cost of use in most scenarios (when there are significant peaks and valleys that you can't predict), and can even be free on a small scale.

With these three points, the database can be well embedded into other application development frameworks, which is the basis for building a larger ecosystem.

In addition to Serverless, the modern developer experience (DX) includes many other key elements, such as:

  • Modern CLI: the CLI is much more efficient for developers than the GUI, and easier to automate in combination with other tools via shell scripts.
  • Unified development experience. This can be difficult for developers who are used to working on a single machine and may not have experience with distributed systems and cloud infrastructure. To address this, vendors should provide tools and services that offer a unified development experience, allowing developers to develop and test their software on their local machine and then deploy and run it on the cloud with minimal changes. Nobody wants to touch a server every day; don't make people SSH somewhere for what can be done locally, especially for cloud services. How to develop and debug cloud applications is currently an area with many pain points.
  • Example/demo/scaffolding code: the new generation of PLG-oriented service providers, such as Vercel and Supabase, are really good at this, and it makes a lot of sense. For ordinary CRUD applications the code framework looks similar, so providing quick-start examples lets developers experience the value of your product faster and also helps them build their applications faster.

Challenges of the future

What I described above is basically no man's land, and it's hard to foresee all the challenges in advance. This section concludes with a list of some interesting, though certainly incomplete, challenges that I hope will inspire you:

  • When the data of different tenants' storage engines is on S3, you could theoretically unlock a larger market based on data sharing and exchange (imagine Google Docs). Or, on top of S3 + MVCC, you could theoretically achieve Git-like version control of data: imagine an experience as smooth as git checkout <branch>, except that you are switching to a snapshot of your database (I know there are already database products like this on the cloud). This will bring a lot of new application scenarios and unique value; a tiny sketch of the branching idea follows this list.

  • New business model. There is potential for a new business model for database providers, especially in partnership with cloud vendors who can offer the necessary infrastructure and support. By providing DBaaS as a product, database providers can offer customers the convenience and flexibility of a cloud-based database without requiring them to manage the infrastructure themselves in their own cloud.

  • People. For a database vendor, the demand for R&D and product used to be almost limited to database kernel development. As a database cloud vendor, you focus not only on the engine but also on operations and the development of cloud services, and the skills and technical stack required of those engineers are completely different from those of database kernel developers. This is bound to involve big organizational changes. How to make that transition well? That's a problem.
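
Going back to the first bullet, here is a toy Go sketch of the Git-like idea: if every branch is just a named pointer to an MVCC snapshot timestamp on shared storage, then creating and checking out a branch copies no data at all. BranchStore and its methods are hypothetical, purely to illustrate the shape of the feature.

```go
package branching

import (
	"errors"
	"sync"
)

// Timestamp is an MVCC read timestamp: a snapshot of all data on shared
// storage as of that point in time.
type Timestamp uint64

// BranchStore keeps branches as nothing more than named pointers to MVCC
// snapshots, so "checkout" copies no data at all.
type BranchStore struct {
	mu       sync.Mutex
	branches map[string]Timestamp
}

func NewBranchStore(mainTS Timestamp) *BranchStore {
	return &BranchStore{branches: map[string]Timestamp{"main": mainTS}}
}

// Branch creates a new branch from an existing one, git-branch style.
func (s *BranchStore) Branch(from, name string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	ts, ok := s.branches[from]
	if !ok {
		return errors.New("unknown branch: " + from)
	}
	s.branches[name] = ts // zero-copy: just record the snapshot timestamp
	return nil
}

// Checkout returns the snapshot timestamp that reads on this branch should use.
func (s *BranchStore) Checkout(name string) (Timestamp, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	ts, ok := s.branches[name]
	if !ok {
		return 0, errors.New("unknown branch: " + name)
	}
	return ts, nil
}
```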

Well, there will always be problems and challenges. But I would say the process of building this system is also the process of understanding it, just as Richard Feynman said:

“What I cannot create, I do not understand.”