SaaS Design Patterns: Cron jobs using API routes

January 3rd, 2023

SaaS Patterns
This article is a work-in-progress.

I like to write down all my ideas as "drafts" and slowly improve them over time. Things written here may not be well organized or accurate at the moment. Beware!!

Cron jobs are one of the most common patterns used in web apps. Today I want to show you how to design a cron job system for your app that simplifies management and maintenance and brings a lot of other benefits.

Some use cases where I’ve used cron jobs in my services:

  • Aggregate KPIs and other metrics
  • Send scheduled emails to users
  • Cleanup unverified users
  • Generate reports
  • Generate usage metrics and billing invoices
  • Download some files from different locations and do some parsing on them

There are several challenges when running them:

Cron jobs typically operate on the same data and business logic as your main app, so generally we’d like to execute them in a similar environment. That means recreating the entire app environment to run cron jobs on a schedule, and then properly closing and destroying resources afterwards.

A cron job system should also let us trigger jobs on demand. This is very useful for debugging or for handling urgent cases. But it means the system has to support a dual trigger design, i.e. on a schedule, and via some script runner or web endpoint.

Monitoring and logging cron jobs is painful but important. You want to make sure that crons run successfully, or at least that you are notified in case of an error, with detailed logs for debugging.

We’d also like to be able to see the history of past jobs and their run status for audit and debugging.
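One way to get both the failure logs and the run history is to wrap every job in a small recorder. This is only a rough sketch under my own assumptions: `runHistory`, `recordRun` and the record fields are made-up names, and the in-memory array stands in for whatever database table you would really use.

```javascript
// Hypothetical in-memory store; a real app would likely write to a database table.
const runHistory = [];

// Runs `jobFn`, records how it went, and re-throws on failure so the caller
// (an HTTP handler, for example) can still report the error.
async function recordRun(jobName, jobFn) {
  const run = {
    jobName,
    startedAt: new Date(),
    finishedAt: null,
    status: 'running',
    result: null,
    error: null,
  };
  runHistory.push(run);
  try {
    run.result = await jobFn();
    run.status = 'success';
    return run.result;
  } catch (err) {
    run.status = 'failed';
    run.error = err.stack || String(err); // keep details for debugging
    throw err;
  } finally {
    run.finishedAt = new Date();
  }
}
```

Whatever triggers the job would call something like `recordRun('generate-invoices', job)`, and the history can later be listed for audits.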

These might sound like overengineered problems (YAGNI), but when building large applications you’ll hit these requirements sooner or later. In my experience it’s better to learn and build it once and then just reuse it across projects.

The traditional approach to cron jobs is to simply have scripts in your codebase that do the job when triggered via the CLI. However, this complicates execution, as we now have to recreate a production-like environment somewhere and then trigger the CLI commands. This is doable, and people generally do it via CI/CD and Docker-based deployments, but it’s a pain to manage.

I recommend a REST API endpoint based approach, where each cron job is simply a REST endpoint in your app. So, for example, say your app's APIs look like this:

/users/…

/billing/invoices/…

You simply add a new endpoint for each cron job, e.g.

/crons/cleanup-mailing-list

/crons/generate-invoices

To run these cron jobs you can use any scheduler available to you. I run my apps on AWS so I use EventBridge, which allows me to trigger any HTTP endpoint on a given schedule.

What to put inside these endpoints?

Each endpoint contains all the logic necessary to finish that cron job. For example, when the /crons/generate-invoices endpoint is hit, it loads all paid users from the db, loops through them to get each user's monthly usage data, generates invoices via Stripe or any other external API, and then updates the database with the latest invoice data.
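The invoice flow described above could be sketched roughly like this. The four dependency functions (`listPaidUsers`, `getMonthlyUsage`, `createInvoice`, `saveInvoice`) are hypothetical placeholders for your own db and billing code, not real library calls, and returning a 500 when any user fails is just one possible choice.

```javascript
// Builds an Express-style handler for the invoice cron.
// All four dependencies are hypothetical and injected so they can be faked.
function makeGenerateInvoicesHandler({ listPaidUsers, getMonthlyUsage, createInvoice, saveInvoice }) {
  return async (req, res) => {
    const summary = { processed: 0, failed: 0, errors: [] };
    const users = await listPaidUsers();

    for (const user of users) {
      try {
        const usage = await getMonthlyUsage(user.id);
        const invoice = await createInvoice(user, usage); // e.g. a call to a billing API
        await saveInvoice(user.id, invoice);
        summary.processed += 1;
      } catch (err) {
        // One bad user should not abort the whole run.
        summary.failed += 1;
        summary.errors.push({ userId: user.id, message: err.message });
      }
    }

    // Assumption: signal failure to the scheduler if anything went wrong.
    res.status(summary.failed ? 500 : 200).json(summary);
  };
}
```

It would be mounted with something like `app.get('/crons/generate-invoices', makeGenerateInvoicesHandler(deps))`, where `deps` is your own object of those four functions.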

The response can return a summary of the cron execution, like the number of users processed, any errors, warnings etc. If you think that blocking an HTTP request for a long-running operation like a cron job is bad, you can always return the response quickly (like 202 Accepted) and then continue doing the cron job work in the background. I use a blocking web request that stays open until the cron is done if the job takes less than 4-5 minutes. For anything longer I use a non-blocking response. I prefer the blocking one, firstly because it’s not that big of a deal performance-wise: you are not hitting cron endpoints hundreds of times a second (crons usually run every few minutes at most), so each individual request can take longer and still won’t crash your load balancer or server. But note that you might have to tweak your load balancer timeout settings to make sure cron requests are not abandoned if they take too long.
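For the non-blocking variant, a minimal sketch could look like this. `runInBackground` is a name I made up, and the logging is just `console` for illustration.

```javascript
// Wraps a job so the HTTP response returns immediately with 202 Accepted
// while the job keeps running in the same process.
function runInBackground(jobName, jobFn) {
  return (req, res) => {
    res.status(202).json({ job: jobName, status: 'accepted' });

    // Deliberately not awaited: the work continues after the response is sent.
    Promise.resolve()
      .then(() => jobFn())
      .then((result) => console.log(`[cron] ${jobName} finished`, result))
      .catch((err) => console.error(`[cron] ${jobName} failed`, err));
  };
}
```

Note that the caller only learns that the job was accepted, not whether it succeeded, so the outcome has to be checked through logs or some other channel.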

Another reason to prefer blocking requests is that they give immediate feedback about the success or failure of the job. Non-blocking background jobs might not finish due to a server reboot or crash, and you’ll never know about it.

Also, by blocking the request you can do some pretty neat tricks, like streaming the stdout of the cron job into the HTTP response (using something like res.write()), which allows me to view the cron job logs in real time as it executes. This has been a big time-saver for me in the past.
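Here is a rough sketch of the streaming idea. It assumes the job accepts a `log` function instead of writing to stdout directly; `streamJobLogs` and that `log` callback are my own hypothetical names.

```javascript
// Streams a job's log lines into the HTTP response as they are produced.
// `jobFn` is a hypothetical job that receives a `log(message)` function.
function streamJobLogs(jobFn) {
  return async (req, res) => {
    res.setHeader('Content-Type', 'text/plain; charset=utf-8');
    const log = (message) => res.write(`${new Date().toISOString()} ${message}\n`);

    try {
      await jobFn(log);
      log('DONE');
    } catch (err) {
      // The status code may already have gone out with the first chunk,
      // so the failure is reported in the body instead.
      log(`FAILED: ${err.message}`);
    } finally {
      res.end();
    }
  };
}
```

When triggering this by hand, `curl -N` turns off curl's output buffering so the lines show up as they arrive.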

How to secure cron endpoints?

Understandably, you must be panicking at the thought of exposing your private cron endpoints to the public. Does that mean anyone who knows the endpoint can trigger it?

The answer is no. We restrict cron endpoints using a secret token. It’s very easy to put a middleware on all cron routes to ensure that the caller must pass a secret token via a query param (like ?cronToken=supersecret) or via some header (like x-cron-token: supersecret), or else the request is immediately rejected.

In practice it looks like this for an Express.js app.

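const express = require('express')
const app = express()

// Rejects any request to a cron route that does not carry the shared secret.
const requireCronTokenMiddleware = (req, res, next) => {
      const expected = process.env.CRON_TOKEN
      // Also reject when CRON_TOKEN is unset, otherwise a request with no
      // token would pass (undefined === undefined)
      if(!expected || req.query.cronToken !== expected){
          return res.status(401).json({ error: 'Unauthorized' })
      }
      //or use whatever logic to verify the req
      //like request headers or IP address
      //or session cookies
      next()
}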


app.use('/crons', requireCronTokenMiddleware)
app.get('/crons/generate-invoices', ....)
app.get('/crons/some-other-cron', ....)
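The header variant mentioned above could be sketched like this. Comparing with `crypto.timingSafeEqual` is my own addition, not something the middleware above does, and `tokensMatch` and `requireCronTokenHeader` are made-up names. Both values are hashed first only so that the two buffers have the same length, which `timingSafeEqual` requires.

```javascript
const crypto = require('crypto');

// Compares two strings without leaking how many leading characters match.
function tokensMatch(provided, expected) {
  if (typeof provided !== 'string' || typeof expected !== 'string' || expected === '') {
    return false; // missing header or unset CRON_TOKEN never matches
  }
  const a = crypto.createHash('sha256').update(provided).digest();
  const b = crypto.createHash('sha256').update(expected).digest();
  return crypto.timingSafeEqual(a, b);
}

const requireCronTokenHeader = (req, res, next) => {
  // Express's req.get() reads a request header case-insensitively.
  if (!tokensMatch(req.get('x-cron-token'), process.env.CRON_TOKEN)) {
    return res.status(401).json({ error: 'Unauthorized' });
  }
  next();
};
```

A header keeps the secret out of URLs, which tend to end up in access logs, so I would lean towards it when the scheduler supports custom headers.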

Hi,
I'm Kashif 👋

I'm the founder of NameGrab, Mono, and OneDomain
I've been working in tech for about 12 years, and I love building startups from scratch and sharing my thoughts here.

Find me on Twitter