This is not a regular finished-and-refined article, just a quick note. A brain-dump, if you will. Read at your own risk.

Contesting Contexts: Who Owns the Schemas?

by Paweł Świątkowski
23 Jun 2023

This is the second post about challenging some common dogmas around Phoenix contexts, with architectural scalability, larger applications and bigger dev teams in mind. In the first part I mostly replied to a post written by Peter Ullrich about some general (mis)conceptions about structuring code using contexts. Today I want to focus on one particular part - Ecto schemas - and their place in the project structure.

Spoiler: the answer to the question in the title is “it depends”.

What are schemas?

Schemas are Ecto’s way to connect the database world, built with tables and records, to the Elixir world, built with modules and structs. A schema contains a definition of a data translation: it maps an Elixir module to a database table and maps all the columns and their types to struct fields and Elixir types. As such, it is a kind of primitive. Someone with a Java background could call them DTOs (data transfer objects), although of course they are not objects.

It is very common to see a changeset function added to a schema module. Some Ecto conventions actually expect this function to be defined. A changeset, in principle, tells how to map unsafe user input to a safe schema struct. But changesets also include validations - from the simplest ones about fields being required, through more complex min/max checks, to quite complex custom validations.

Because of this last trait, it’s quite easy (and common, and unfortunate) to treat them as keepers of the business invariants.

But putting aside the question of business validation, some interesting questions remain about schemas. Given we organize our codebase with contexts:

  • Where should the schema files be? In the contexts or outside of them?
  • If inside, what about accessing the data in that table from other contexts?
  • What about relationships between two or more schemas, modeled with belongs_to or has_many macros?

I’ll try to show three possible approaches to answering these questions, along with their strengths and weaknesses. We will start with the most familiar one…

The Phoenix Way

To answer our questions, it would be natural to reach for the Phoenix documentation. And, in fact, we will find a lot of hints there. More precisely, in the section about contexts (I’m using the documentation of Phoenix 1.7.6 in this post). Starting with the first code listing:

$ mix phx.gen.html Catalog Product products title:string \
description:string price:decimal views:integer

We see here the generation of a Catalog context with a Product schema mapping to the products database table, whose columns are listed in the rest of the command. So the answer seems to be quite clear: the Product schema should live in the Catalog context.

Later in the tutorial we’ll create a ShoppingCart context which will contain the Cart and CartItem schemas. In the CartItem schema we are going to see lines like these:

belongs_to :cart, Hello.ShoppingCart.Cart
belongs_to :product, Hello.Catalog.Product

The first one is a no-brainer, but the second one is… interesting. It couples the CartItem schema with the Product schema from a different context. Fetching a cart with items, we will also get products from the Catalog context, and either some functions in the ShoppingCart context need to know how to handle Catalog.Product, or they will need to call more functions from the Catalog context which know how to work with Product.

Truth is, the tutorial mentions that this is only one possible approach:

With that in mind, we have two options. One is to expose APIs on the Catalog context that allows us to efficiently fetch product data for use in the ShoppingCart system, which we would manually stitch together. Or we can use database joins to fetch the dependent data. Both are valid options given your tradeoffs and application size, but joining data from the database when you have a hard data dependency is just fine for a large class of applications and is the approach we will take here.
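The first of those two options - exposing an API on Catalog and manually stitching the data together in ShoppingCart - could be sketched roughly like this (the function names here are my own invention, not from the tutorial):

```elixir
defmodule Hello.Catalog do
  import Ecto.Query

  # Public API: other contexts ask Catalog for product data
  # instead of joining against its tables.
  def products_by_ids(ids) when is_list(ids) do
    Hello.Repo.all(from p in Hello.Catalog.Product, where: p.id in ^ids)
  end
end

defmodule Hello.ShoppingCart do
  # Stitch cart items together with product data fetched
  # through Catalog's public API - no cross-context join.
  def cart_with_products(cart_id) do
    cart =
      Hello.ShoppingCart.Cart
      |> Hello.Repo.get!(cart_id)
      |> Hello.Repo.preload(:items)

    products =
      cart.items
      |> Enum.map(& &1.product_id)
      |> Hello.Catalog.products_by_ids()
      |> Map.new(&{&1.id, &1})

    {cart, products}
  end
end
```

It is more code than a join, but the only contact point between the two contexts is a single public function.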

I understand they picked this one for simplicity - after all, this is a tutorial about the concept of contexts, not a lesson in project architecture in general. However, I personally feel that this decision is almost always wrong for a non-trivial application. Allowing a context to freely rely on other contexts quickly creates really tangled spaghetti. A change in one context avalanches into other contexts.

Imagine you change the description field of Product to description_en and add description_es, because now your e-commerce will be bilingual (it’s hardly the best way to handle translated content, but trust me - I’ve seen it done like that). Now everything that relies on a specific shape of the Product struct might break. And there are probably functions relying on that in the ShoppingCart context. Then, if we have an Orders context relying on ShoppingCart, it might transitively rely on Product too (but it’s harder to follow now). If we have Payments relying on Orders… Well, I think you get my point.

Okay, I have criticized the approach taken by the Phoenix tutorial. But what are the alternatives?

Different strokes schemas for different folks contexts

One approach that I usually promote is to keep your shit together. What I mean by that is that cross-context dependencies are generally not allowed, and referencing schemas belonging to a different context as a relationship is certainly not allowed. If you need to access some data stored in the products table in the ShoppingCart context (and you will), create a ShoppingCart.Product schema.

This usually raises the but what about duplication and DRY?! question.

But get this: you probably don’t need all the columns from the products table in the ShoppingCart context. You’d need the name, the price, maybe description and image_url. It’s less likely that you need category, tags, or sku. It’s almost impossible that you need added_by or which wholesale_dealer you can order it from. You are not duplicating whatever lies in Catalog.Product. Instead, you carefully cherry-pick the columns you really need, creating as light a coupling as possible. You can also add some ShoppingCart-specific virtual fields that do not make sense in the Catalog context.
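Such a context-local schema could look something like this (a sketch; the exact field list is an assumption based on the examples above):

```elixir
defmodule Hello.ShoppingCart.Product do
  use Ecto.Schema

  # Maps the same "products" table as Hello.Catalog.Product,
  # but only the columns this context actually cares about.
  schema "products" do
    field :title, :string
    field :price, :decimal
    field :image_url, :string

    # A ShoppingCart-specific virtual field that would make
    # no sense in the Catalog context.
    field :quantity_in_cart, :integer, virtual: true
  end
end
```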

And the remaining question is about changesets and validation (I was actually asked about it not too long ago on Twitter).

My answer to that is that you should probably write to a table from only one context anyway. Only Catalog changes products. Only ShoppingCart changes carts. You may have small deviations from this rule for some kinds of micro-optimizations (like keeping a count of how many times a product was added to a cart as a column in the products table), but that should be a rare exception.

This way you don’t need to worry about missing an update to a validation in one context when you change another - because there won’t be any duplicated changeset logic to miss.
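Under this single-writer rule, the changeset for products lives only in Catalog; the read-only ShoppingCart.Product schema above simply doesn’t define one. A sketch, reusing the fields from the generator invocation earlier (the validations are mine, for illustration):

```elixir
defmodule Hello.Catalog.Product do
  use Ecto.Schema
  import Ecto.Changeset

  schema "products" do
    field :title, :string
    field :description, :string
    field :price, :decimal
    field :views, :integer
  end

  # The only write path for the products table: Catalog owns it,
  # so there is exactly one place where the validations live.
  def changeset(product, attrs) do
    product
    |> cast(attrs, [:title, :description, :price, :views])
    |> validate_required([:title, :price])
    |> validate_number(:price, greater_than: 0)
  end
end
```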

Hanami and ROM way

I promised a third alternative. I must confess that I haven’t really used it “in the wild”. The idea comes from my work on a project built with Hanami and ROM. They are both Ruby projects, but they are pretty different from what you might recall from working with Ruby on Rails. In short, Hanami has the concept of “slices”, which are similar to Phoenix contexts. And ROM does not use the active record pattern, instead promoting the repository pattern and mapping database data to regular entity objects (they are actually sometimes called “structs” - sounds familiar?) that do not have any “database magic” attached.

Aside from repositories and entities/structs, ROM also defines the concept of a relation. This is a mapping definition between a database schema and object-oriented Ruby. An example of a relation from my experimental forum application looks like this:

module Persistence
  module Relations
    class Messages < ROM::Relation[:sql]
      schema(:messages) do
        attribute :text, Types::String
        attribute :posted_at, Types::DateTime
        
        associations do
          belongs_to :threads, as: :thread
          belongs_to :profiles, as: :author
        end
      end
    end
  end
end

All it says is that there’s a Messages relation that maps to the messages database table and that it has associations to the threads and profiles relations (via the respective thread_id and profile_id columns, by naming convention). Wait… Doesn’t it sound familiar? It’s almost the same description I gave to an Ecto schema before.

Relations are then used by repositories, and an example repository could look like this:

module Discussion
  module Repositories
    class Thread < Palaver::Repository[:threads]
      def create_message(thread:, author:, content:)
        messages    # <-- this is how relations are referenced
          .changeset(:create, text: content, posted_at: DateTime.now)
          .associate(thread)
          .associate(author)
          .commit
      end
    end
  end
end

Here’s the thing:

While the relations are set up once, globally, for the whole application (after all, we have just one database connected to it), repositories are specifically scoped to a given slice. In other words, what data we have is global, but what we can do with that data is specific to a slice. Or, in Phoenix’s case - to a context.

Because this is obviously where I’m going: define schemas only once, outside of contexts (ugh, I knew it would happen). But keep the schemas as dumb as possible. They are just “bags for data”, nothing more. In the contexts you define how to use that data - by writing queries and changesets, and executing them with a repo.

This way you map the database in one place - and have one place to look up the definition. But then the schemas cannot really contain any kind of business logic. No changesets, no validations, no virtual fields, no helper functions - all of these things are specific to the context in which you use the data, and as a result they all have to sit in the context. This approach almost certainly requires you to define context-specific entity structs. Similar to ROM entities, they would know nothing about the database, about preloads, etc. But they will contain a specific subset of the data available in the database, along with some logic.
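In Ecto terms, the split could look roughly like this (all module names here are hypothetical, mirroring the ROM example above):

```elixir
# Global, context-free mapping - just a "bag for data",
# analogous to a ROM relation.
defmodule MyApp.Schemas.Product do
  use Ecto.Schema

  schema "products" do
    field :title, :string
    field :price, :decimal
  end
end

defmodule MyApp.Catalog do
  import Ecto.Query
  import Ecto.Changeset

  # A context-specific entity struct with no database knowledge,
  # analogous to a ROM entity.
  defmodule ProductEntity do
    defstruct [:title, :price]
  end

  # Queries live in the context and map rows to entities...
  def cheap_products(max_price) do
    from(p in MyApp.Schemas.Product, where: p.price <= ^max_price)
    |> MyApp.Repo.all()
    |> Enum.map(&%ProductEntity{title: &1.title, price: &1.price})
  end

  # ...and so do changesets, built directly on the dumb schema.
  def create_product(attrs) do
    %MyApp.Schemas.Product{}
    |> cast(attrs, [:title, :price])
    |> validate_required([:title])
    |> MyApp.Repo.insert()
  end
end
```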

This is almost certainly the cleanest approach, and at the same time the one requiring the most work, with defining entities etc. It might not be for everyone, but I think it can work quite well if you really want to separate database concerns from business logic. I can’t wait to try it myself on some bigger project, although I have a feeling I will have to wait some more time.

Anyway, thanks for reading. I will probably have some more things to say in the future.