Why I don’t use NoSQL databases for new projects?
July 29th, 2022
I don't use NoSQL databases because of one thing: Practicality.
I've used most of the major NoSQL DBs professionally for many years; I still use them on some of my freelance projects. I like using Dynamodb and love single table design. I like the simplicity of Firestore. But while they have great features, in practice, I haven't found a valid reason to use them over an SQL instance like MySQL (or Postgres) when starting a new project.
A Mysql DB always ends up being the better choice when building a startup or a small project because it checks all the boxes one would need at a young product stage. This is ironic because NoSQL is generally considered a better choice for early small projects, being considered more "flexible" and more effortless. There’s this perception that SQL databases are slow, hard to learn, difficult to scale, which I disagree with. I think SQL databases are still the king when it comes to "practical" use cases and building new applications where querying patterns are not established yet.
Today, we mainly compare Firestore and Dynamodb with Mysql in the context of a small to mid-scale project building modern web 2.0 apps. This opinion is not for large established apps having scalability issues (in that case, I think sprinkling some NoSQL will help). But the arguments here will apply to other NoSQL databases too.
It's hard to satisfy all access patterns in NoSQL databases when you look at a project holistically.
Most of the tutorials/conferences/books tell you how easy it is to model your app with NoSQL, but I think they just don't show an accurate picture about building proper web apps. They often just look at one side of the app and try to model it in NoSQL, but modeling your database when only looking at half of your app is a dangerous thing. It's misleading!
For instance, they'll show you how easy it is to build an e-commerce application on NoSQL dbs like Firestore or Dynamodb. Users can view their profile and list out their orders etc., and it's so easy to structure your data in a NoSQL way so you can satisfy all access/query patterns for your customers.
But ask yourself, is that the only "view" you have of your data in an e-commerce app? Are your "customers" the only stakeholders? Most often, NO. Logistics teams, inventory mgmt, supplier, warehousing, and general admin teams also have to perform many operations on the data and need a dedicated view into that database. In addition, there are cron jobs and batch scripts that frequently need to look at different segments of data to perform all kinds of aggregations.
Even if you don't have that many stakeholders, you almost always have at least two. Your end-users and your admin panel. Because without an admin view into your project, you have no idea how it's running. Basic operations like banning/unbanning users, content moderation, changing some users' data, viewing your KPIs become very hard and often ignored.
It might be okay for a hobby project to directly go to the Firestore console and change data by hand, but it gets too painful and dangerous to do so in serious projects.
While it's straightforward (and performant) to build a NoSQL db schema that can show a customer their profile and past orders, it gets progressively harder to fit in all stakeholders' access patterns. For example, admin panels are generally higher level; they tend to query big picture stuff, aggregated metrics, bulk operations, and look at various disjointed datasets in one view. NoSQL databases have trouble satisfying these kinds of queries. I'm not saying you can't do it; it's just more complex. What is a simple SELECT ... JOIN in MySQL becomes some db gymnastics done by the dev.
If you are in Firestore land, you will have to run weird batch scripts to aggregate or denormalize data across different collections or use triggers to do the same. Dynamodb is not any better either. While it's incredibly fast and single table design is excellent, trying to fit these access patterns means creating extra GSIs and doing weird modeling to fit your data in the columnar format, or running dynamodb streams to collect/parse datasets. And it still doesn't mean that you will be able to do any random data query. Every time you have a new access use case, you will have to do some trickery to achieve it.
As of MongoDB, while on the surface it might seem like it supports are a lot more querying features than the rest, I think it has its problems. Hosting/Maintenance is complicated. Aggregation breaks down very quickly on larger datasets, and tbh if I have to go that route, using an SQL db is much better imo. That's why I'm not gonna talk about MongoDB in this post.
Don't get me wrong, I'm not saying that NoSQL databases suck. They are great technologies. But when you start looking designing a project from a holistic perspective where you consider ALL your stakeholders (end-users, admin team, etc.), you will realize that you need many more ways to query your data than these databases allow.
The querying features are pretty limited, and soon, you will start to hit those limitations and be forced to restructure your app to match the database. I think this is totally backward; a database should be the one that should fit your app's needs, not the other way around.
NoSQL databases are more "performant," but at what cost?
In the short run, NoSQL dbs are generally more cost-effective (thanks to many free tiers) and performant, but in practice, what startups and new projects need is not performance or scalability. Instead, they need the ability to move fast, stay flexible, and access the data in any shape or form they want, on-demand.
If you go with NoSQL, you are basically sacrificing querying flexibility for performance. Saving dev time is what you should be saving, not extra milliseconds on your API requests. The amount of dev time you will save by not doing db gymnastics every time you have a new access pattern is worth far more.
NoSQL advocates keep saying how fast NoSQL dbs are but try benchmarking SQL databases like MySQL or Postgres once and see for yourself. The point at which you need to worry about performance comes much much later in your project's life.
Also, just for reference, the smallest MySQL db instance t4g.micro can handle millions of rows of data and have around 40-50 concurrent db connections, which is more than enough if you design your app decently well.
NoSQL databases are schemaless, so they are more "flexible."
Another argument I see is, "oh, it's so easy to just be able to put any data payload in my db, I don't have to think about the schema. This makes it so much more flexible than doing db migrations or designing tables in MySQL."
I strongly disagree. There is no such thing as "schemaless" data. You ALWAYS have a schema either explicitly or implicitly. If your database is schemaless, your app's code has to do the schema-related stuff.
While yes, you can technically put anything you like into a db document and change the format whenever you want. But once you go live in production, you can't really do this in practice. Even if NoSQL databases are schemaless, once you have live data, you will still need to do some sort of migration when you change the data structure of your app entities. Your code will have to be more defensive and constantly check if some property exists, then do that else do this. You will have to put some "version" fields on all your objects and then perform different logic based on them to support legacy documents which couldn’t be upgraded. Either this or you will have to run some migration scripts to change old data into the new format. And even after doing all this, there will still be app crashes sometimes because the code was expecting some property but got undefined, etc.
Also, since your database never complains about putting random format, it's common to accidentally change the data. Maybe some upstream function didn't pass all the fields to the db, and now the db has incomplete data. Downstream functions start crashing because they were expecting fields that weren't there. Happens all the time.
Having a strict schema in SQL dbs makes it more flexible and worry-free. While it may sound scary to learn SQL, it's a one-time pain and a lifetime of joy. There are great tools to handle db schemas and migrations. They make it easy to write schemas and can perform ACID-compliant schema changes with no downtime to your app.
I use knexjs in all my projects and set up automated db migrations running on CI pipelines. I don't have to think about this "schema change" operation. It just happens automatically via CI while still guaranteeing data integrity during migration. As a result, my code can be less defensive because it can trust the database.
Does that mean I don't worry about data structure? Of course not. I still have null checks in critical parts of my apps. I still have to plan the data restructuring, and there are ways to do it well and ways to do it poorly; it's just a normal part of building applications.
Local development with Firestore/Dynamodb is painful
Yes, there are local emulators for these, but they're painful and full of limitations. Running MySQL or Postgres instance is trivial, and I have a docker-compose script that boots up a local instance during dev.
On top of that, countless SQL GUI clients are available for desktop. It's nice to be able to connect to a local db (or even production) and view data easily. NoSQL emulators don't even come close.
Giving read-only access to contractors/third party is trivial
This might not apply to everyone, but it's handy if it does.
Often you want to have a read-only view of production data, maybe for debugging some issues or running some analysis. Creating read-only users in SQL is trivial. I always create read-only credentials for all my projects and use those to connect to db quickly when I need to. Often I share these read-only keys with contractors/freelancers working with me.
Try doing this in Firestore/Ddb, and let me know when you succeed.
But SQL databases are hard to manage/maintain, you say
Are you sure? I deploy all my infrastructure using AWS CDK. It takes less than 20 lines of code to deploy an RDS database instance. Yes, it might take some effort to do the initial setup, but it's a one-time thing. I learned it once and never had to think or write it again. I simply copy-paste the scripts into new projects, and off I go. I'd much rather put some effort initially and then have a joyful coding and querying forever than to have a quick db setup and then a lifetime of pain, limitations, and suffering when querying data.
But RDS databases are expensive, you say
Yes, I agree. The cheapest MySQL instance on RDS would cost you around $25/month. But if you run a proper production-grade db, it'll cost you at least $100/month. Compared to that, Firestore and Dynamodb are practically free. So this is one area where I agree; if cost is a significant factor for you, you should use NoSQL and save cash. But you will be paying in extra dev time.
I still prefer to pay up for RDS mainly because it's a fixed cost that gets marginally reduced every time I create a new project because I share a single instance across all my projects. Remember: one db instance can have many MySQL databases.
If I'm running 1 project for $100/month, that's expensive, but if I run 10 projects, that's just $10/month for database per project. So this actually incentivizes me to go out and create more projects. Also, paying up once frees me from worrying about the db cost of every project or having to go out window shopping for a new database every time I have a new project idea.
And btw I don't actually pay the full price for RDS thanks to reserved instance discounts. Which can give you a lot of discounts reducing the cost further.
Having the same db workflow on every project is liberating.
When starting a new project, I run a script that quickly creates two dbs (one prod, one staging) in my shared rds instance. Plus, it creates prod and read-only MySQL user credentials and gets me up and running super fast. I have a template project that sets up ci/cd, db migrations, and schema files which I can just copy past and just start working on the new app.
This allows me to quickly focus on solving my app problem instead of worrying about "modeling" the database.
Being able to query ANYTHING in any shape or form is incredible.
You might think you don’t need this kind of querying ability right now but I promise you’ll wish it down the road if/when you start to add more features to your project.
With NoSQL databases that’s a one-way street, you can’t go backwards and gain more queryability in exchange for slow speed. Firestore/Ddb by design do not allow you to write slow queries. With SQL dbs you can, you can start with total unoptimized queries that can’t scale but are very flexible and keep churning out new queries for whatever feature your project needs. Then later when things start to chug, you can move some data around (denormalize) or create indexes to optimize the slow bits. It’s a two way street you can go whichever direction you like whenever you want. You are in control of how optimal/unoptimal you want your DB to be.
Being able to query absolutely anything in any shape of form is an incredible power. Yes some queries will run slow but as long as the frequency of those slow queries is small it doesn’t matter (Like batch jobs). But this power allows you to quickly satisfy business use cases instead of performing db magic everytime you need a new query.
So when do I use a NoSQL db?
I still use NoSQL databases in a limited capacity, mostly because the contract project I'm working on already uses one or when I need to optimize some parts of my queries. I sync the data from SQL to Dynamodb and the change my APIs to query that instead, giving me good speed boost.
But generally, I start with a MySQL db and then sprinkle some NoSQL on top for the slowest parts of my app. Apart from that, a MySQL db with a Redis cache is good enough to optimize most of the slow queries.