Simple automated DB migrations and seeding

December 20th, 2022

sqldatabases
This article is a work-in-progress.

I like to write down all my ideas as "drafts" and slowly improve them over time. Things written here may not be well organized or accurate at the moment. Beware!!

One of biggest hurdles when using any SQL database is the act of changing database schema aka DB schema migration. This is probably one of the major reason people run to NoSQL databases because you don’t have to do any of it there. I don’t find SQL migrations hard because I utilize libraries like knex.js and some organizational tricks to make the whole process super simple and enjoyable.

Here is how my setup looks like in practice.

I git pull my code and run yarn dev, this automatically runs a series of db migration scripts to initialize my database and I can start coding on my app immediately. If I need, I can also inject dummy data into my db for testing with a single command. When I have to alter my db schema I write a bit of code to tell knexjs how to change the db, and thats it, I restart my app and boom the db has the new changes.

When I’m done changing schema, I push to Github, which is then picked up by CI that deploys the new version of app. When that new app starts, it automatically runs the db migration scripts so that my production database gets properly initialized with correct database schema.

The whole workflow is so smooth, I don’t even have to think about database schema changes at all. The only time I touch the migration files is when I’m adding new entities in my app or changing existing ones, maybe once a month or so. This flow works very nicely in practice.

What if you don’t use knexjs or NodeJS or Javascript?

Ofcourse not everyone is able to use knexjs or NodeJS but the principle here still applies. I’m not a big fan of ORMs but if you are using one they usually have some support for db migrations. And most other programming languages have some way of running scripts before launching your main app. The concepts discussed here can be applied with some effort.

And yes, you will have to learn knexjs and SQL to be able to write these migrations scripts, but it takes maybe a few hours to take a crash course for both. I’m not teaching knexjs or SQL in this post, I’m only sharing how to use them in a real world scenario to simplify your workflow without losing your mind.

The tldr here is that you let knexjs handle the migration and seeding work and you just setup some scripts using package.json to automatically call knexjs at relevant times (whenever you think migrations or seeds should run)

Where to keep all migration and seed scripts?

I keep a folder called db-data to neatly organize all migration and seed scripts. The main database connection logic is kept in lib/db/index.js and the connection config data is in lib/db/config.js

//Root directory
//...
db-data/             //<---------- Organizes all migration and seed scripts
  migrations/
  seeds/
lib/
  db/
    index.js  //<---------- Main db logic file
    config.js //<---------- Knex connection config file
    utils.js
  //...
//...

Using pre scripts to auto run migrations

This is how my package.json scripts look like (It’s a custom nextjs app).

// package.json
"scripts": {
    "predev": "yarn; yarn db:migrate",
    "dev": "node server.js",
    "prestart": "yarn db:migrate",
    "start": "node server.js",

    "db:make": "knex migrate:make --knexfile ./lib/db/config.js",
    "db:migrate": "knex migrate:latest --knexfile ./lib/db/config.js",
    "db:seed": "knex seed:run --knexfile ./lib/db/config.js",
    //......
  },

I use predev to run yarn install and migrate scripts. Which means running yarn dev will automatically install new packages (very useful when working in teams where package.json changes frequently) and also runs the latest migrations.

prestart is the same thing, just running on the server before yarn start. Which basically means I don’t have to worry about running migrations manually on the server. When the app starts it automatically runs the migrations. And since knexjs migrations are safe to rerun and also supports concurrent locking there is no worry of db conflicts or race conditions.

Who said SQL migrations are difficult? It is practically invisible for me.

Creating new migrations:

In SQL databases, you often have to create new tables to store data or change existing ones. We do this by running SQL scripts. But creating them by hand and running them in an automatic fashion is difficult, mainly because you have to keep track of which scripts you already ran so you don’t execute them again etc. Knexjs solves all these issues.

When I need to alter db schema I run yarn db:make my_migration_name which creates a blank migration script in the db folder, which I later populate with relevant migration data.

This is how my migrations folder look like for droppped.com project. This folder gets populated with files over time as I add/update db schema. You can see I added some admin, users tables and then later as the project evolved I added domain info and some other stuff. Note that knex adds a unique timestamp as prefix in filename. Also note that this file content should never change once you run it on a database.

db-data/
  migrations/
    20201203110548_add_admins.js
    20201203110549_add_users.js
    20201203110550_add_domains.js
    20210131091658_add_job_history.js
    20210202120059_add_date_columns_domains.js
    20210202124741_add_domain_links.js
    20210203013936_add_import_stats.js
    20210204022919_add_updatedat_to_domain_links.js

and here is how the ...add_domains.js file look like.

// db/20201203110550_add_domains.js

exports.up = function (knex) {
  return knex.schema.createTable('domains', function (table) {
    table.string('domain').primary()
    table.string('tld').notNullable()
    table.string('tweetId')
    table.integer('clicks').defaultTo(0)
    table.date('expiryDate')
    table
      .enu('shortlistStatus', ['pending', 'shortlisted', 'rejected'])
      .defaultTo('pending')

    table.bool('isArchived').defaultTo(false)
  })
}

exports.down = function () {}

This is basically creating a domains table with domain as the primary key, I’m also storing the tweetId and some other things relevant to this dataset. It might look complicated but its a very typical knexjs migration file. You can read knex.js docs on how to write such files.

Also notice I don’t use down migrations, I think down migrations are pretty useless (if not dangerous) and redundant these days.

Seeding the db

Having some seed data in your local database is extremely useful for testing and debugging. But setting the seed up can be painful so devs often just ignore it. Using knexjs’ seed function I can easily seed my dev database with lots of data so I’m not staring at blank screen during testing.

Also as the project grows you will have to test a lot more edge cases (like billing users, banned users, app usage etc.) so creating test data items for those can be useful. Having this seed setup in place helps a lot. Trust me.

When I need to seed the db, I just run yarn db:seed and knexjs will run all the scripts found in db-data/seeds folder.

This is how my seeds folder looks like for a new project.

db-data/
  migrations/
  seeds/
    01_admin.js
    02_users.js

Most of my apps have an admin panel and some concept of users, so I always have at least these two files. This folder gets populated over time as I add more test entities and things to seed.

And this is how a typical seed file looks like for knexjs.

// db-data/seeds/01_admin.js
const utils = require('../../lib/db/utils')

exports.seed = async function (knex) {
  utils.ensureIsLocal(knex)

  await knex('admins').del()

  const items = [
    {
      email: 'test@email.com',
      name: 'Kashif',
    },
  ]

  await knex.batchInsert('admins', items)
}

Here I’m seeding my local database with an admin, so I can login to the admin panel using test@email.com account.

Hot tip: create a utility function called ensureIsLocal to make sure that you are not running the seeds on remote databases. Because you could accidentally delete production data or fill it with junk info.

// lib/db/utils.js

exports.ensureIsLocal = knex => {
  const { host } = knex.client.connectionSettings
  if (!host.startsWith('localhost')) {
    throw Error('Trying to seed a remote DB, very dangerous, not allowed...')
  }
}

Hi,
I'm Kashif 👋

I'm the founder of NameGrab, Mono, and OneDomain
I've been in working tech for about 11 years, and I love building startups from scratch and sharing my thoughts here.

Find me on Twitter