Seeds-driven development

by Paweł Świątkowski
14 Sep 2020

In my career as a web developer, I have created my fair share of what I call content-driven websites. These work differently from the e-commerce sites or Twitter-like social platforms we usually talk about: content is not provided by users, who usually only consume it. The content creators are a small group of people with access to some kind of admin panel (sometimes it is really only one person!).

“Well, why not just use WordPress?” is a question I’ve heard a lot. The answer: the content is always pretty specific. Aside from the usual title and body, it has a bunch of properties or relations to other data entities. Modelling it in a classic CMS, such as WordPress or Drupal, is possible, but many times I found it hard and unintuitive. For this reason, creating a completely new web application might be viable. Also, the content has to be very queryable: users expect to be able to filter on many things, sort them, etc. - so there is a certain level of interaction. Because of that, static generators won’t cut it.

Some real examples of things I’ve built:

Anything including sports results. You have teams, matches, leagues, matchdays, friendly tournaments, goals, etc. It is a complex data model with a lot of relations and searching in it must be flexible.
Legends (as in dragons and fairies). You may think it’s very plain data - a title and the legend’s text, but it wasn’t. Legends are attached to places (sometimes more than one), they have different versions, and some have a historical background. And they are grouped in collections such as “legends about the devil dropping a stone while flying, trying to destroy a church” (no, I haven’t made this up).
A regular news site, but each news item was attached to a department. Departments were stored in a tree-like hierarchy. The user should be able to select a department and see all the news for that department and every department below it.
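
To make the last example concrete, here is a minimal sketch of the "selected department and all below" lookup. The department names, the news titles and the helper names `subtree` and `news_for` are all made up for illustration; a real site would most likely do this in the database rather than in memory.

```python
# Hypothetical departments: each one maps to its parent (None for the root).
PARENT = {
    "company": None,
    "engineering": "company",
    "backend": "engineering",
    "frontend": "engineering",
    "sales": "company",
}

# Hypothetical news items as (title, department) pairs.
NEWS = [
    ("New office opened", "company"),
    ("API v2 released", "backend"),
    ("Quarterly targets met", "sales"),
]


def subtree(root):
    """Return the given department and every department below it."""
    children = {}
    for dept, parent in PARENT.items():
        children.setdefault(parent, []).append(dept)

    result, stack = set(), [root]
    while stack:
        dept = stack.pop()
        result.add(dept)
        stack.extend(children.get(dept, []))
    return result


def news_for(root):
    """Return titles of news attached to the department or its descendants."""
    wanted = subtree(root)
    return [title for title, dept in NEWS if dept in wanted]
```

Selecting "engineering" here also picks up news attached to "backend", which is exactly the kind of query that makes a flat title-and-body model fall short.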

Working on this kind of project can be fun, but there are some challenges, which I see as a recurring pattern. As you might have guessed, you need an admin panel to input the data. And it has to be a pretty complex one. On the other hand, the content is not useful until you display it to the end user, allowing them to search, filter, see related entries and more. This creates a chicken-and-egg problem: you can start with an admin panel and have it ready to input data that will not be displayed in a proper way, or you can start with a feature-rich front end but end up without any content, because no one will be able to add it.

I stumbled upon a solution that I find interesting while working on the legends project. This one was actually for myself, so I was the one doing the coding and creating the content. I decided to go with a seed-driven approach: along with creating a front end, I started creating a bunch of seed files with the content. These were stored in the repository, alongside the code, and the project was re-seeded upon every deployment. This yielded some interesting results:

I didn’t have to care about the admin panel so I did not get distracted too much from the core of the project.
While entering real data, not just Lorem Ipsum placeholders, I discovered a lot about the data itself - for example that attaching a legend to only one place is not enough.
I had a backup for free. This is my biggest worry in an early stage of a project - how much should I care about boring stuff such as backups? At what point should I stop doing them manually and automate? Here I could take that cognitive load off my head - my content was safely kept in a git repository and it wasn’t going anywhere.
I could launch the website with more than a hundred actual entries present as soon as I felt it was ready, without having to go from phase one (coding) to phase two (adding data).
I could easily have the same set of real data on many computers (in development), without having to copy databases or whatnot.
This also forced me to use slugs from day one - since things get re-seeded often, I could not rely on ids in the database. But the good news is: I shouldn’t do that anyway!
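
The re-seeding idea and the reliance on slugs can be sketched roughly like this. It is only an illustration under my own assumptions, not the code from the legends project: the table layout, the sample entries and the `reseed` and `places_for` helpers are hypothetical, and the seed data is inlined here where a real project would load it from seed files in the repository.

```python
import sqlite3

# Hypothetical seed data; in a real project this would be read from
# seed files kept in the repository next to the code.
PLACES = [
    {"slug": "old-town", "name": "Old Town"},
    {"slug": "castle-hill", "name": "Castle Hill"},
]
LEGENDS = [
    {
        "slug": "devils-stone",
        "title": "The Devil's Stone",
        "places": ["old-town", "castle-hill"],  # more than one place
    },
]


def reseed(conn):
    """Throw away all content and rebuild it from the seed data."""
    conn.executescript(
        """
        DROP TABLE IF EXISTS legend_places;
        DROP TABLE IF EXISTS legends;
        DROP TABLE IF EXISTS places;
        CREATE TABLE places (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, name TEXT);
        CREATE TABLE legends (id INTEGER PRIMARY KEY, slug TEXT UNIQUE, title TEXT);
        CREATE TABLE legend_places (legend_id INTEGER, place_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO places (slug, name) VALUES (:slug, :name)", PLACES)
    for legend in LEGENDS:
        cur = conn.execute(
            "INSERT INTO legends (slug, title) VALUES (?, ?)",
            (legend["slug"], legend["title"]),
        )
        for place_slug in legend["places"]:
            # Relations are resolved by slug, never by a hard-coded database id.
            conn.execute(
                "INSERT INTO legend_places (legend_id, place_id) "
                "SELECT ?, id FROM places WHERE slug = ?",
                (cur.lastrowid, place_slug),
            )
    conn.commit()


def places_for(conn, legend_slug):
    """Return the slugs of all places a legend is attached to."""
    rows = conn.execute(
        "SELECT p.slug FROM places p "
        "JOIN legend_places lp ON lp.place_id = p.id "
        "JOIN legends l ON l.id = lp.legend_id "
        "WHERE l.slug = ? ORDER BY p.slug",
        (legend_slug,),
    )
    return [row[0] for row in rows]
```

Because `reseed` rebuilds everything from scratch, running it on every deployment always gives the same content, and nothing outside the database (URLs, links between entries) depends on ids that may change between runs.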

What about projects where someone else is creating the data? First of all, it’s not like other people are stupid and cannot use the YAML format when you tell them to. In the rare cases when that’s not possible, you have to take care of converting their “free format” to your seeds format. I still think it’s a better option than constantly fixing the admin panel when your front end is not ready.

I will certainly use this approach in the future for my content-based projects. But probably not only for them. Having real-looking data instead of dummy text can really help discover things in the development phase, not a few hours after the launch.

Updates

2022.02.07: Josef Strzibny wrote an interesting piece on re-using Rails fixtures (usually used for testing) as a seeding device for development. It is a very interesting approach, with which you can also test on real data.
