Splitting the monolith

Let's project ourselves into a future where Runnerly, as implemented previously, is being used by a lot of people. Features are added, bugs are fixed, and the database is steadily growing.

The first problem we're facing is the background process that creates reports and calls Strava. Since we now have thousands of users, these tasks consume most of the server's resources, and users are experiencing slowdowns on the frontend.

It's becoming obvious that we need to run them on separate servers. Since the monolithic application uses Celery and Redis, that's not hard: we can dedicate a couple of new servers to the background jobs.
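To make that concrete, here's a minimal sketch of how the Celery application could point at a Redis broker shared between the web servers and the dedicated worker servers. The module name, Redis host, and database numbers are placeholders, not Runnerly's actual settings:

    from celery import Celery

    # Both the web servers (which queue tasks) and the dedicated worker
    # servers (which execute them) point at the same Redis broker.
    background = Celery(
        'runnerly.background',
        broker='redis://redis-host:6379/0',   # placeholder broker URL
        backend='redis://redis-host:6379/1',  # optional result backend
    )

    # On a dedicated worker server, the process would be started with:
    #   celery -A runnerly.background worker --loglevel=info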

But the biggest concern with doing this is that the Celery worker code needs to import the Flask application code to operate, so the deployment dedicated to the background workers has to include the whole Flask app. That also means that every time something changes in the app, we'll need to update the Celery workers as well to avoid regressions.

It also means that on a server whose only role is to pump data out of Strava, we'll have to install every dependency the Flask application has. If we use Bootstrap in our templates, we'll have to deploy it on the Celery worker server!

This dependency issue raises the question: "Why does the Celery worker need to be part of the Flask application in the first place?" That design was excellent when we started to code Runnerly, but it has become obvious that it's fragile.

The interactions Celery has with the application are very specific. The Strava worker needs to:

  • Get the Strava tokens
  • Add new runs

Instead of using the Flask app code, the Celery worker code could be entirely independent and interact with the database directly.
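Here is a sketch of what such an independent worker could look like, assuming SQLAlchemy for the direct database access. The table and column names, the connection URL, and the fetch_strava_runs() helper are illustrative assumptions, not Runnerly's actual schema or Strava client:

    from celery import Celery
    from sqlalchemy import create_engine, text

    # Note: no Flask imports anywhere -- the worker stands on its own.
    celery_app = Celery('strava_worker', broker='redis://redis-host:6379/0')
    engine = create_engine('postgresql://runnerly@db-host/runnerly')

    def fetch_strava_runs(token):
        # Placeholder for the real Strava API call.
        return []

    @celery_app.task
    def refresh_runs():
        with engine.begin() as conn:
            # Get the Strava tokens...
            users = conn.execute(text('SELECT id, strava_token FROM users')).all()
            for user_id, token in users:
                # ...and add the new runs.
                for run in fetch_strava_runs(token):
                    conn.execute(
                        text('INSERT INTO runs (user_id, title, distance) '
                             'VALUES (:uid, :title, :distance)'),
                        {'uid': user_id, 'title': run['title'],
                         'distance': run['distance']},
                    )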

Having the Celery worker act as a separate microservice is a great first step toward splitting our monolithic app; let's call it the Strava Service. The worker in charge of building reports can be split out the same way to run on its own as the Reports Service. Each of these Celery workers can then focus on performing a single task.

The biggest design decision here is whether these new microservices call the database directly or go through an HTTP API that acts as an intermediary between the services and the database.

Direct database calls seem like the simplest solution, but they introduce another problem. Since the original Flask app, the Strava Service, and the Reports Service would all share the same database, every time something changes in it, they are all impacted.

An intermediate layer that exposes to each service only the information it needs to do its job reduces this database dependency problem. If the HTTP API is well designed, its contract can remain compatible even when the database schema changes.

As far as the Strava and Reports microservices are concerned, they are Celery workers, so we don't have to design an HTTP API for them: they pick up work from the Redis broker and then interact with the service that wraps the database calls. Let's call this new intermediary the Data Service.
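Here's a minimal sketch of the Data Service idea, assuming Flask on the service side; the endpoint paths and payload fields are illustrative assumptions, not a finalized API:

    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route('/strava_tokens')
    def strava_tokens():
        # Behind this endpoint, the Data Service queries the database;
        # the workers only ever see the JSON contract, never the tables.
        return jsonify([{'user_id': 1, 'strava_token': '...'}])

    @app.route('/runs', methods=['POST'])
    def add_run():
        run = request.get_json()
        # ...persist the run; the schema can change freely as long as
        # this contract stays compatible.
        return jsonify(run), 201

On the worker side, the direct SQL calls from the earlier sketch would then be replaced by plain HTTP calls, for example requests.get('http://data-service/strava_tokens') and requests.post('http://data-service/runs', json=run).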
