Contesting Contexts: Who Owns the Schemas?
by Paweł Świątkowski
23 Jun 2023
This is the second post about challenging common dogmas around Phoenix contexts, with architectural scalability, larger applications and bigger dev teams in mind. In the first part I mostly replied to the post written by Peter Ullrich about some general (mis)conceptions about structuring the code using contexts. Today I want to focus on one particular part - Ecto schemas - and their place in the project structure.
Spoiler: the answer to the question in the title is “it depends”.
What are schemas?
Schemas are Ecto’s way to connect the database world, built with tables and records, to the Elixir world, built with modules and structs. A schema contains a definition of a data translation - mapping an Elixir module to a database table, and mapping all the columns and their types to struct fields and Elixir types. As such, it is a kind of primitive. Someone with a Java background might call them DTOs (data transfer objects), although of course they are not objects.
It is very common to see a `changeset` function added to a schema module. Some Ecto conventions actually expect this function to be defined. A changeset in principle tells how to map unsafe user input to a safe schema struct. But changesets also include validations - from the simplest ones about some fields being required, through more complex ones about min/max values, to quite complex custom validations.
Because of this last trait, it’s quite easy (and common, and unfortunate) to treat them as keepers of the business invariants.
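For illustration, here is a minimal schema with a `changeset` function of the kind described above (the module and field names are just an example, not taken from any particular codebase):

```elixir
defmodule Hello.Catalog.Product do
  use Ecto.Schema
  import Ecto.Changeset

  # The mapping: `products` table columns to struct fields and Elixir types.
  schema "products" do
    field :title, :string
    field :price, :decimal
  end

  # Maps unsafe user input to the struct and runs validations,
  # from "required" up to numeric constraints.
  def changeset(product, attrs) do
    product
    |> cast(attrs, [:title, :price])
    |> validate_required([:title])
    |> validate_number(:price, greater_than: 0)
  end
end
```

Note how the last two validation lines are already drifting from "data mapping" toward business rules - which is exactly the trap.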
But putting aside the question of business validation, some interesting questions remain about schemas. Given we organize our codebase with contexts:
- Where should the schema files be? In the contexts or outside of them?
- If inside, what about accessing the data in that table from other contexts?
- What about relationships between two or more schemas, modeled with `belongs_to` or `has_many` macros?
I’ll try to show three possible approaches to answering these questions, with their strengths and weaknesses. We will start with the most familiar one…
The Phoenix Way
To answer our questions, it would be natural to reach for the Phoenix documentation. And, in fact, we will find a lot of hints there - more precisely, in the section about contexts (I’m using the documentation of Phoenix 1.7.6 in this post). Starting with the first code listing:
$ mix phx.gen.html Catalog Product products title:string \
description:string price:decimal views:integer
We see here a generation of a `Catalog` context with a `Product` schema mapping to the `products` database table, which has the columns listed in the rest of the command. So the answer seems to be quite clear: the `Product` schema should live in the `Catalog` context.
Later in the tutorial we’ll create a `ShoppingCart` context which will contain `Cart` and `CartItem` schemas. In the `CartItem` schema we are going to see lines like these:
belongs_to :cart, Hello.ShoppingCart.Cart
belongs_to :product, Hello.Catalog.Product
The first one is a no-brainer, but the second one is… interesting. It couples the `CartItem` schema with the `Product` schema from a different context. Fetching a cart with items, we will also get products from the `Catalog` context, and either some functions in the `ShoppingCart` context need to know how to handle `Catalog.Product`, or they will need to call more functions from the `Catalog` context which know how to work with `Product`.
Truth is, the tutorial mentions that this is only one possible approach:
With that in mind, we have two options. One is to expose APIs on the `Catalog` context that allows us to efficiently fetch product data for use in the `ShoppingCart` system, which we would manually stitch together. Or we can use database joins to fetch the dependent data. Both are valid options given your tradeoffs and application size, but joining data from the database when you have a hard data dependency is just fine for a large class of applications and is the approach we will take here.
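The join-based option could look roughly like this - a sketch only, with function names and the `user_id` column assumed rather than taken from the tutorial:

```elixir
# Inside the Hello.ShoppingCart context module.
import Ecto.Query

# Fetch a cart together with its items and - via the cross-context
# association - the Catalog products those items point to.
def get_cart(user_id) do
  Hello.Repo.one(
    from c in Hello.ShoppingCart.Cart,
      where: c.user_id == ^user_id,
      left_join: i in assoc(c, :items),
      left_join: p in assoc(i, :product),
      preload: [items: {i, product: p}]
  )
end
```

Convenient, certainly - but every caller of this function now receives `Catalog.Product` structs from inside the `ShoppingCart` context.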
I understand they picked this one for simplicity - after all, this is a tutorial about the concept of contexts, not a lesson in project architecture in general. However, I personally feel that this decision is almost always wrong for a non-trivial application. Allowing contexts to freely rely on other contexts quickly creates a really tangled spaghetti. A change in one context avalanches into other contexts.
Imagine you change the `description` field of `Product` to `description_en` and add `description_es`, because now your e-commerce will be bilingual (this is hardly the best way to handle translated content, but trust me - I’ve seen it done like that). Now everything that relies on a specific shape of the `Product` struct might break. And there are probably functions relying on that in the `ShoppingCart` context. Then if we have an `Orders` context relying on `ShoppingCart`, it might transitively rely on `Product` too (but it’s harder to follow now). If we have `Payments` relying on `Orders`… Well, I think you get my point.
Okay, I criticized the approach taken by the Phoenix tutorial. But what are the alternatives?
Different ~~strokes~~ schemas for different ~~folks~~ contexts
One approach that I usually promote is to keep your shit together. What I mean by that is that cross-context dependencies are generally not allowed, and referencing schemas belonging to a different context as a relationship is certainly not allowed. If you need to access some data stored in the `products` table in a `ShoppingCart` schema (and you will), create a `ShoppingCart.Product` schema.
This usually raises the “but what about duplication and DRY?!” question.
But get this. You probably don’t need all the columns from the `products` table in the `ShoppingCart` context. You’d need the name, the price, maybe the description and image_url. It’s less likely that you need category, tags, or sku. It’s almost impossible that you need added_by or which wholesale_dealer you can order it from. You are not duplicating whatever lies in `Catalog.Product`. Instead, you carefully cherry-pick which columns you really need, creating as light a coupling as possible. You can also add some `ShoppingCart`-specific virtual fields that do not make sense in the `Catalog` context.
And the remaining question is about changesets and validation (I was actually asked about it not too long ago on Twitter).
My answer to that is that you should probably write to a given table from only one context anyway. Only `Catalog` changes products. Only `ShoppingCart` changes `Cart`. You may have small deviations from this rule for some kinds of micro-optimizations (like keeping a count of how many times the product was added to a cart as a column in the `products` table), but that should be a rare exception.
This way you don’t need to worry that you will miss an update to a validation in one context when you add another - because there won’t be any duplicated changeset logic to miss.
Hanami and ROM way
I promised a third alternative. I must confess that I haven’t really used it “in the wild”. The idea comes from my work on a project built with Hanami and ROM. They are both Ruby projects, but are pretty different from what you might recall from working with Ruby on Rails. In short, Hanami has the concept of “slices”, which are similar to Phoenix contexts. And ROM does not use the active record pattern, instead promoting the repository pattern and mapping database data to regular entity objects (they are actually sometimes called “structs” - sounds familiar?) that do not have any “database magic” attached.
Aside from repositories and entities/structs, ROM also defines the concept of a relation. This is a mapping definition between a database schema and object-oriented Ruby. An example of a relation from my experimental forum application looks like this:
module Persistence
module Relations
class Messages < ROM::Relation[:sql]
schema(:messages) do
attribute :text, Types::String
attribute :posted_at, Types::DateTime
associations do
belongs_to :threads, as: :thread
belongs_to :profiles, as: :author
end
end
end
end
end
All it says is that there’s a `Messages` relation that maps to the `messages` database table and that it has associations to the `threads` and `profiles` relations (via the respective `thread_id` and `profile_id` foreign keys, by naming convention). Wait… Doesn’t that sound familiar? It’s almost the same description I gave of an Ecto schema before.
Relations are then used by repositories and an example repository could look like this:
module Discussion
module Repositories
class Thread < Palaver::Repository[:threads]
def create_message(thread:, author:, content:)
messages # <-- this is how relations are referenced
.changeset(:create, text: content, posted_at: DateTime.now)
.associate(thread)
.associate(author)
.commit
end
end
end
end
Here’s the thing:
While the relations are set up once globally for the whole application (after all, we have just one database connected to it), repositories are specifically scoped to a given slice. In other words, what data we have is global, but what we can do with that data is specific to a slice. Or in the Phoenix case - a context.
Because this is obviously where I’m going with this: define schemas only once, outside of contexts (ugh, I knew it would happen). But keep the schemas as dumb as possible. They are just “bags for data”, nothing more. In the contexts you define how to use that data - by writing queries and changesets, and executing them with a repo.
This way you map the database in one place - and have one place to look up the definitions. But then the schemas cannot really contain any kind of business logic. No changesets, no validations, no virtual fields, no helper functions - all these things are specific to the context in which you use the data, and as a result they all have to sit in the context. This approach almost certainly requires you to define context-specific entity structs. Similar to ROM entities, they would know nothing about the database, about preloads, etc. But they would contain a specific subset of the data available in the database, along with some logic.
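A hedged sketch of how this could look in Elixir - all module names here are my own invention, not from any framework convention: a dumb, globally defined schema, and a context that owns the changeset and query logic.

```elixir
defmodule Persistence.Schemas.Product do
  use Ecto.Schema

  # A pure "bag for data": mapping only - no changesets,
  # no validations, no helper functions.
  schema "products" do
    field :title, :string
    field :description, :string
    field :price, :decimal
    field :views, :integer
  end
end

defmodule Catalog do
  import Ecto.Changeset
  import Ecto.Query

  alias Persistence.Schemas.Product

  # The context, not the schema, knows how to validate and persist.
  def create_product(attrs) do
    %Product{}
    |> cast(attrs, [:title, :description, :price])
    |> validate_required([:title, :price])
    |> Hello.Repo.insert()
  end

  def popular_products(min_views) do
    Hello.Repo.all(from p in Product, where: p.views >= ^min_views)
  end
end
```

Another context would reuse the same `Persistence.Schemas.Product` mapping but bring its own changesets and queries - much like ROM’s global relations used by slice-scoped repositories.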
This is almost certainly the cleanest approach, and at the same time the one requiring the most work - defining entities and so on. It might not be for everyone, but I think it can work quite well if you really want to separate database concerns from business logic. I can’t wait to try it myself on some bigger project, although I have a feeling I will have to wait some more time.
Anyway, thanks for reading. I will probably have some more things to say in the future.