Why I don't use NoSQL databases for new projects.

2022, Jul 29 (10 days ago)

    I don’t use NoSQL databases because of one thing: Practicality.

    I’ve used several major NoSQL DBs professionally for many years; I still use them on some of my freelance projects. I like using Dynamodb and love single table design. I like the simplicity of Firestore. But while they have great features, in practice, I haven’t found a valid reason to use them over an SQL instance like MySQL (or Postgres) when starting a new project.

    A Mysql DB always ends up being the better choice when building a startup or a small project because it checks all the boxes one would need at a young product stage. This is ironic because NoSQL is generally considered a better choice for early small projects, being considered more “flexible” and more effortless. There’s this perception that SQL databases are slow, hard to learn, difficult to scale, which I disagree with. SQL databases are still the king for “practical” use cases and building new applications where querying patterns are not established.

    Today, we mainly compare Firestore and Dynamodb with Mysql in the context of a small to mid-scale project building modern web 2.0 apps. This opinion is not for large established apps having scalability issues (in that case, I think sprinkling some NoSQL will help). But the arguments here will apply to other NoSQL databases too.


    It’s hard to satisfy all access patterns in NoSQL databases when you look at a project holistically.

    Most of the tutorials/conferences/books tell you how easy it is to model your app with NoSQL, but I think they don’t accurately picture building proper web apps. They often look at just one side of the app and try to model it in NoSQL, but modeling your database when only looking at half of your app is a dangerous thing. It’s misleading! For instance, they’ll show you how easy it is to build an e-commerce application on NoSQL dbs like Firestore or Dynamodb. Users can view their profile and list out their orders etc., and it’s so easy to structure your data in a NoSQL way to satisfy all access/query patterns for your customers. But ask yourself, is that the only “view” you have of your data in an e-commerce app? Are your “customers” the only stakeholders? Most often, NO. Logistics teams, inventory mgmt, supplier, warehousing, and general admin teams also have to perform many operations on the data and need a dedicated view into that database. In addition, there are cron jobs and batch scripts that frequently need to look at different segments of data to perform all kinds of aggregations. Even if you don’t have that many stakeholders, you almost always have at least two. Your end-users and your admin panel. Because without an admin view into your project, you have no idea how it’s running. Basic operations like banning/unbanning users, content moderation, changing some users’ data, viewing your KPIs become very hard and often ignored. It might be okay for a hobby project to directly go to the Firestore console and change data by hand, but it gets too painful and dangerous to do so in serious projects. While it’s straightforward (and performant) to build a NoSQL db schema that can show a customer their profile and past orders, it gets progressively harder to fit in all stakeholders’ access patterns. For example, admin panels are generally higher level; they tend to query big picture stuff, aggregated metrics, bulk operations, and look at various disjointed datasets in one view. NoSQL databases have trouble satisfying these kinds of queries. I’m not saying you can’t do it; it’s just more complex. What is a simple SELECT … JOIN in MySQL becomes some db gymnastics done by the dev. If you are in Firestore land, you will have to run weird batch scripts to aggregate or denormalize data across different collections or use triggers to do the same. Dynamodb is not any better either. While it’s incredibly fast and single table design is excellent, trying to fit these access patterns means creating extra GSIs and doing weird modeling to fit your data in the columnar format, or running dynamodb streams to collect/parse datasets. And it still doesn’t mean that you will be able to do any random data query. Every time you have a new access use case, you will have to do some trickery to achieve it. As for MongoDB, while on the surface, it might seem like it supports are a lot more querying features than the rest, I think it has its problems. Hosting/Maintenance is complicated. Aggregation breaks down very quickly on larger datasets, and tbh if I have to go that route, using an SQL db is much better imo. That’s why I’m not going to talk about MongoDB in this post. Don’t get me wrong, I’m not saying that NoSQL databases suck. They are great technologies. But when you start designing a project from a holistic perspective where you consider ALL your stakeholders (end-users, admin team, etc.), you will realize that you need many more ways to query your data than these databases allow. The querying features in Firestore/Ddb are pretty limited, and soon, you will start to hit those limitations and be forced to restructure your app to match the database. You will try to ask for a workaround on Stackoverflow, and you will get a response like “You are not supposed to do that” or “That’s not how it works, that’s by design.” I think this is totally backward. A database should be the one that should fit your app’s needs, not the other way around.

    NoSQL databases are more “performant,” but at what cost?

    NoSQL dbs generally tend to be more cost-effective (thanks to many free tiers) and performant, but in practice, what startups and new projects need is not performance or scalability. Instead, they need the ability to move fast, stay flexible, and access the data in any shape or form they want, on-demand. Flexibility as in how quickly you can transform data not in how easy it is to create a new Firebase account. If you go with NoSQL, you are sacrificing querying flexibility for performance. Saving dev time is what you should be saving, not extra milliseconds on your API requests. The amount of dev time you will save by not doing db gymnastics every time you have a new access pattern is worth far more. NoSQL advocates keep saying how fast NoSQL dbs are but try benchmarking SQL databases like MySQL or Postgres once and see for yourself. They are incredibly fast in every metric worth measuring. The point at which you need to worry about performance comes much much later in your project’s life. Also, just for reference, the smallest MySQL db instance t4g.micro can handle millions of rows of data and have around 40–50 concurrent db connections, which is decent enough if you design your app decently well and considering that it’s a tiny machine.

    NoSQL databases are schemaless, so they are more “flexible.”

    Another argument I see is, “oh, it’s so easy to be able to put any data payload in my db, I don’t have to think about the schema. This makes it so much more flexible than doing db migrations or designing tables in MySQL.” I strongly disagree. There is no such thing as “schemaless” data. You ALWAYS have a schema either explicitly or implicitly. If your database is schemaless, your app’s code has to do the schema-related stuff. While yes, you can technically put anything you like into a db document and change the format whenever you want. But once you go live in production, you can’t do this in practice. Even if NoSQL databases are schemaless, once you have live data, you will still need to do some sort of migration when you change the data structure of your app entities. Your code will have to be more defensive and constantly check if some property exists, then do that else do this. You will have to put some “version” fields on all your objects and then perform different logic based on them to support legacy documents that couldn’t be upgraded. Either this or you will have to run some migration scripts to change old data into the new format. And even after doing all this, there will still be app crashes sometimes because the code was expecting some property but got undefined, etc. Also, since your database never complains about putting random format, it’s common to change/delete the data accidentally. Maybe some upstream function didn’t pass all the fields to the db, and now the db has incomplete data, or some referenced document was deleted without updating the parent. Downstream functions started crashing because they were expecting fields/docs that weren’t there. Happens all the time. Having a strict schema in SQL dbs makes it more flexible and worry-free. While it may sound scary to learn SQL, it’s a one-time pain and a lifetime of joy. There are great tools to handle db schemas and migrations. They make it easy to write schemas and perform ACID-compliant schema changes with no downtime to your app. I use knexjs in all my projects, and my db migrations run on CI pipelines. I don’t have to think about this “schema change” operation. It just happens automatically via CI while still guaranteeing data integrity during migration. As a result, my code can be less defensive because it can trust the database. Does that mean I don’t worry about data structure? Of course not. I still have null checks in critical parts of my apps. I still have to plan the data restructuring, and there are ways to do it well and ways to do it poorly; it’s just a normal part of building applications.

    Local development with Firestore/Dynamodb is painful.

    Yes, there are local emulators for these, but they are painful and full of limitations. Running MySQL or Postgres instance is trivial, and I have a docker-compose script that boots up a local instance during dev. On top of that, countless SQL GUI clients are available for desktop. It’s nice to be able to connect to a local db (or even production) and view data easily. NoSQL emulators don’t even come close. Giving read-only access to contractors/third party is trivial This might not apply to everyone, but it’s handy if it does. Often you want to have a read-only view of production data, maybe for debugging some issues or running some analysis or just security. Creating read-only users in SQL is trivial. I always create read-only credentials for all my projects and use those to connect to db quickly when I need to. Often I share these read-only keys with contractors/freelancers working with me. Try doing this in Firestore/Ddb, and let me know when you succeed.

    But SQL databases are hard to manage/maintain, you say

    Are you sure? I deploy all my infrastructure using AWS CDK. It takes less than 20 lines of code to deploy an RDS database instance. Yes, it might take some effort to do the initial setup, but it’s a one-time thing. I learned it once and never had to think or write it again. I simply copy-paste the scripts into new projects, and off I go. I’d much rather put some effort initially and then have a joyful coding and querying forever than to have a quick db setup and then a lifetime of pain, limitations, and suffering when querying data.
    #Myth (kinda)

    RDS databases are expensive

    Yes, I agree. The cheapest MySQL instance on RDS would cost you around $25/month. But if you run a proper production-grade db, it’ll cost you at least $100/month. Compared to that, Firestore and Dynamodb are practically free. So this is one area where I agree; if cost is a significant factor for you, you should use NoSQL and save cash. But you will be paying in extra dev time. I still prefer to pay up for RDS mainly because it’s a fixed cost that gets marginally reduced every time I create a new project because I share a single instance across all my projects. When a project becomes popular and resource heavy (Like mono.fm did) I move it to a dedicated database at that point. Remember: one db instance can have many MySQL databases. If I’m running 1 project for $100/month, that’s expensive, but if I run 10 projects, that’s just $10/month for database per project. So this actually incentivizes me to go out and create more projects. Also, paying up once frees me from worrying about the db cost of every project or having to go out window shopping for a new database every time I have a new project idea. And btw I don’t actually pay the full price for RDS thanks to reserved instance discounts. Which can give you a lot of discounts reducing the cost further.

    Having the same db workflow on every project is liberating.

    When starting a new project, I run a script that quickly creates two dbs (one prod, one staging) in my shared RDS instance. Plus, it creates prod and read-only MySQL user credentials and gets me up and running super fast. I have a template project that sets up CI/CD, db migrations, and schema files which I can just copy paste and start working on the new app. This allows me to quickly focus on solving my app problem instead of worrying about “modeling” the database.

    Being able to query ANYTHING in any shape or form is incredible.

    You might think you don’t need this kind of querying ability right now but I promise you’ll wish it down the road if/when you start to add more features to your project. With NoSQL databases that’s a one-way street, you can’t go backwards and gain more query-ability in exchange for slower speed. Firestore/Ddb by design do not allow you to write slow queries. Being able to query absolutely anything in any shape of form is an incredible power. Yes some queries will run slow on large datasets but as long as the frequency of those slow queries is small it doesn’t matter (Like batch jobs). But this power allows you to quickly satisfy business use cases instead of performing db magic every time you need a new query.

    So when do I use a NoSQL database?

    I still use NoSQL databases in a limited capacity, mostly because the contract project I’m working on already uses one. Or when I need to optimize some parts of my queries, in that case I sync the data from SQL to Dynamodb and the change my APIs to query that instead, giving me good speed boost. But generally, I start with a MySQL db and then sprinkle some NoSQL on top for the slowest parts of my app. Apart from that, a MySQL db with a Redis cache is good enough to optimize most of the slow queries.

    Message from me

    I regularly build web apps from scratch and share my learnings with everyone.

    I write about things that are beyond beginner-101 level that you don't often find on the internet. There's ton of coding, design, marketing, and startup tips in there.

    If you'd like, I can send you some useful tips and articles via email.