Write Migrations Last
by Paweł Świątkowski
22 Nov 2022
When starting work on a new feature (let's say, adding comments to a restaurant listing app), our muscle memory often tells us to start by generating a migration. Most tutorials reinforce this habit. Personally, I think that in a professional environment this is not the best approach, and I'd like to show you why.
On the Nature of Migrations
Migrations are a very particular kind of code in our repositories. They differ considerably from almost everything else there. Here are some traits that make them different:
- They are immutable. Migrations are the only code that should never change, except perhaps when you modify your linter settings. The rest of the code is subject to change: you add methods, remove unused things, improve style. But you don't do that in migrations, because a migration that has already run will not run again, so editing it would have no effect.
- They stay in the code forever. Unless you regularly squash or delete old migrations (which requires relying on some kind of schema dump), migrations accumulate in your migrations directory over time. The directory becomes hard to navigate, and the files clutter search results, all without bringing any real value. Sure, you can look up which columns a table was created with, but then you have to crawl through all the more recent migrations to check whether it was altered later.
- [Optional] If you apply migrations to a clean database on every run of the test suite (some teams swear by that), that step keeps getting longer. Even a simple migration takes time. Multiply those few milliseconds by a thousand migrations and you have a significant wait before you can run the tests.
- They require special handling in the deployment pipeline, which sometimes leads to a special ceremony around them. An example might be a rule that a pull request can include either a migration or a code change, but never both. It's a wise rule, by the way. You should consider it.
All of these factors distinguish migration code and necessitate special handling. I would even say that migrations are an extremely non-agile part of the code. With an outlier like that, perhaps we should put more thought into how we handle them.
The Problem with the “Classic Approach”
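As I mentioned before, we often reach for a migration as the first step of building a new feature. Take this comments feature, for example. It's a no-brainer to start with a schema: a comments table that references the restaurant, with non-null nickname and email columns for the author and a column for the comment text.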
If we want to push some work forward quickly, keep PRs small, or follow the rule of keeping code changes and migrations separate, we could probably create a PR at this point, get it approved really quickly, merge it, and move on to actually implementing the feature.
And this is when things start to happen…
You work tirelessly on a model, a controller, and a form. It all fits together, so you spin up a quick preview or staging environment and show it to the product owner. After playing with it for a bit, they make an obvious remark:
Why, as a signed-in user, do I have to fill in my nickname and email? It should be taken from my user data and actually link to my profile.
Well, if it's obvious, why didn't they specify it in the requirements in the first place, right? The thing is, in my experience, people are really bad at predicting anything beyond a simple happy path. Even an experienced PO often forgets about some cases, and this only becomes apparent once they get something tangible to play with.
You quickly fix that by creating a second migration, which removes the non-null constraints and adds an author_id reference.
“Okay,” you might ask, “but why not just alter the first migration? After all, it's not deployed to production yet.” That might be true. Depending on your setup, it might not be deployed anywhere yet, or only to a staging environment without being promoted to production. But if you mutate the migration file, you need to manually roll it back, make your changes, and migrate again. Every developer in the company who has already pulled main has to do that too. On top of that, you have to reset every staging or preview environment out there. If people can spin them up on request, there might be quite a few of them.
Anyway, you fix the data model, create a PR, get it approved and merged, and voilà.
In the second round of product review, a VP of product shows up and says that the competition also has a “thumbs up/thumbs down” feature for comments. We need that too. Slightly irritated, you create a third migration adding this feature and make the corresponding changes to the rest of the code.
In the last stage of finishing the feature, a UX writer from the agency you just hired barges in saying that comments are too boring. You need something snazzier, like “bites”. Faced with a choice between code vocabulary that no longer matches the product vocabulary and yet another change, you decide to write a fourth migration to do the renaming.
The feature is now complete and deployed, and everyone celebrates. But we have four new migrations as a result. Could this have been avoided? What's the alternative?
The approach I sometimes use is to reverse the flow. Don't worry about the database, or storage in general; focus on the feature instead. Start coding from the entity.
The entity is a data structure that holds the business logic but is not necessarily connected to the storage layer. In the Rails context, this might be a PORO or a class that includes ActiveModel::Model. In ROM, entities are built in. In Elixir, this could be a struct or an embedded schema. Every technology has its own way to represent an entity.
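As a rough illustration, here is what a storage-free entity for the comments feature might look like as a plain Ruby object. The class name, its attributes, and the editable_by? and score methods are hypothetical, chosen only for this sketch. Nothing here touches a database, so the shape can keep changing while the product discussion is still going on.

```ruby
# Hypothetical comment entity: a plain Ruby object (PORO) with no persistence.
# Attribute and method names are illustrative assumptions, not a fixed API.
class Comment
  attr_reader :id, :restaurant_id, :author_id, :body, :votes

  def initialize(id:, restaurant_id:, author_id:, body:, votes: [])
    @id = id
    @restaurant_id = restaurant_id
    @author_id = author_id
    @body = body
    # Each vote is assumed to be +1 (thumbs up) or -1 (thumbs down).
    @votes = votes
  end

  # Business rule: only a signed-in author may edit their own comment.
  def editable_by?(user_id)
    !user_id.nil? && user_id == author_id
  end

  # Net score from thumbs-up/thumbs-down votes.
  def score
    votes.sum
  end
end

comment = Comment.new(id: 1, restaurant_id: 7, author_id: 42, body: "Great pierogi!", votes: [1, 1, -1])
comment.editable_by?(42)  # => true
comment.editable_by?(nil) # => false
comment.score             # => 1
```

At this stage, a late requirement like thumbs up/thumbs down costs an attribute and a method rather than another migration.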
Remember what I said earlier about how eye-opening it is to play with something tangible? For us developers, this applies to written code as well. Just by turning a vague idea into concrete code, you might spot potential issues, difficulties, or opportunities for improvement. With them in mind, you can ask more questions and end up with a better feature in the first iteration.
When you are done modelling, plug this entity into the views and just hardcode some values in the controller. This way you can easily submit the comment list for review and, if your product team is cooperative, the new and edit forms too. You just need to explain to them that this is mock data and that nothing they submit will be saved. Or, if you want, you can use something like Phoenix LiveView to simulate simple in-memory storage, or store the data in Redis or some other ephemeral or schemaless storage. This is, however, quite a lot of work that usually goes to waste. Just using mock data almost always works.
Remember to make your mocks a bit dynamic. If you detect a signed-in user, create a mock comment written by this user, so you can show that the “Edit” button appears. You can also hardcode a restaurant to have no comments at all to show the empty state.
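A minimal sketch of such dynamic mock data in plain Ruby follows. MockComment, MockComments, for_restaurant, and EMPTY_RESTAURANT_ID are hypothetical names invented for this example, and the signed-in user is assumed to be a simple hash. In a Rails app, the controller would call something like this where the real query will eventually go.

```ruby
# Hypothetical mock data source, used only while the feature is under review.
# Nothing is persisted: every call builds fresh, hardcoded values.
MockComment = Struct.new(:restaurant_id, :author_id, :author_name, :body, keyword_init: true)

module MockComments
  # An arbitrary restaurant id hardcoded to have no comments,
  # so reviewers can see the empty state.
  EMPTY_RESTAURANT_ID = 999

  # current_user is assumed to be a hash like { id: 42, name: "Ewa" }, or nil.
  def self.for_restaurant(restaurant_id, current_user: nil)
    return [] if restaurant_id == EMPTY_RESTAURANT_ID

    comments = [
      MockComment.new(restaurant_id: restaurant_id, author_id: 1, author_name: "Ala", body: "Best soup in town."),
      MockComment.new(restaurant_id: restaurant_id, author_id: 2, author_name: "Olek", body: "Too salty for me.")
    ]

    # With a signed-in user, add a comment "written" by them,
    # so the Edit button shows up during the review.
    if current_user
      comments << MockComment.new(
        restaurant_id: restaurant_id,
        author_id: current_user[:id],
        author_name: current_user[:name],
        body: "A comment I wrote myself."
      )
    end

    comments
  end
end

MockComments.for_restaurant(5).size                                        # => 2
MockComments.for_restaurant(5, current_user: { id: 42, name: "Ewa" }).size # => 3
MockComments.for_restaurant(MockComments::EMPTY_RESTAURANT_ID)             # => []
```

Keeping all the mocks behind one method makes them easy to find and delete once real storage exists.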
When you're done with product discussions, you will need a little more time to write the actual migration, plug the storage into the entity, and write a repository or whatever else your ORM requires. But you end up with only one migration, representing the state of the feature after the consultations and after everyone has seen it in action.
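One way to plug storage in is a repository that hands the views the same kind of entity they already consume. The sketch below assumes that design; CommentRepository, for_restaurant, and the row keys are hypothetical. To keep it runnable without a database, an array of hashes stands in for the rows a query would return. With ActiveRecord or ROM, the row source would be a relation instead.

```ruby
# Hypothetical entity; in a real app this is the one built during modelling.
Comment = Struct.new(:id, :restaurant_id, :author_id, :body, keyword_init: true)

# Hypothetical repository: translates storage rows into entities.
# `rows` stands in for whatever the ORM or query layer returns.
class CommentRepository
  def initialize(rows)
    @rows = rows
  end

  def for_restaurant(restaurant_id)
    @rows
      .select { |row| row[:restaurant_id] == restaurant_id }
      .map { |row| to_entity(row) }
  end

  private

  # The only place that knows about column names.
  def to_entity(row)
    Comment.new(
      id: row[:id],
      restaurant_id: row[:restaurant_id],
      author_id: row[:author_id],
      body: row[:body]
    )
  end
end

rows = [
  { id: 1, restaurant_id: 7, author_id: 42, body: "Great pierogi!" },
  { id: 2, restaurant_id: 8, author_id: 43, body: "Closed on Mondays." }
]
repo = CommentRepository.new(rows)
repo.for_restaurant(7).map(&:body) # => ["Great pierogi!"]
```

If the views only ever see Comment objects, swapping the mock data for a repository like this should not require touching them.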
But Isn’t This Too Much?
I'm not gonna lie: the approach of writing a migration straight away might be slightly faster, require less cooperation from the customer or product side, and result in less code being thrown away. I know that last point is a mental block for some programmers; writing code you know will eventually be deleted feels so bad that they avoid it like the plague. If you are one of these people, you may decide to challenge this instinct or just stick to “the old way.” After all, sometimes the trade-offs are not worth it.
But speaking of trade-offs, the “entities approach” may give you some additional benefits I haven't mentioned yet:
- You actually have entities! Many people want to add entities but struggle with knowing when in the process to introduce them. Having data structures that are disconnected from storage is generally beneficial: it's easier to re-model the business logic, and testing them can be an order of magnitude faster. Try it, and you may fall in love with sub-second test runs of the whole business logic.
- It lets you think outside of the database. I started noticing a pattern a few years ago: when talking about business concepts, people immediately think in terms of indices, foreign keys, and database types. This can be limiting. Sometimes one business entity might be supported by two or three database tables. And sometimes one table might support multiple entities (even without STI!).
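To make the last point concrete, here is a small sketch of a single business entity assembled from two hypothetical tables, comment rows and user rows, in line with the product owner's remark that the author's nickname should come from the user's data. The row shapes and the build_comments helper are illustrative assumptions; in practice the join would usually happen in SQL or the ORM, and the entity would not care either way.

```ruby
# Hypothetical entity: what the business calls "a comment" includes the
# author's nickname, even though it lives in a different table.
Comment = Struct.new(:id, :body, :author_nickname, keyword_init: true)

# Stand-ins for rows from two tables (assumed shapes).
COMMENT_ROWS = [
  { id: 1, author_id: 42, body: "Great pierogi!" },
  { id: 2, author_id: 43, body: "Closed on Mondays." }
].freeze

USER_ROWS = [
  { id: 42, nickname: "ala" },
  { id: 43, nickname: "olek" }
].freeze

# Builds entities from both tables; the entity itself has no idea
# how many tables were involved. Raises KeyError for an unknown author.
def build_comments(comment_rows, user_rows)
  nicknames = user_rows.map { |user| [user[:id], user[:nickname]] }.to_h

  comment_rows.map do |row|
    Comment.new(
      id: row[:id],
      body: row[:body],
      author_nickname: nicknames.fetch(row[:author_id])
    )
  end
end

build_comments(COMMENT_ROWS, USER_ROWS).map(&:author_nickname) # => ["ala", "olek"]
```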
That's it for my alternative approach to building features. Given the problem of a growing number of migrations that it solves and the additional benefits it brings, I believe it is at least worth a shot. If you don't like it, don't force it. After all, in software engineering, it always depends.