ActiveRecord, over-optimization and POLS

by Paweł Świątkowski
28 Jul 2022

I recently learned that ActiveRecord implements an “interesting” optimization. When you pass a condition to where that could not ever be satisfied, mostly by passing an empty array, it skips the database call completely, returning empty result set. An example of this might be:

Order.where(customer_id: []).to_a

Just by looking at it, we can see that it won’t return anything. So it is pretty cool to have such optimization in place, right? Well, I think it’s actually wrong. Here’s why.

It does not really optimize

The code above would translate into a following SQL query:

SELECT orders.* from orders where 1=0;

Now, the thing is that database can handle itself in such situations. No need to take over its responsibility in a patronizing way by the framework. This is a sample query plan for a relatively large table:

app_dev=# explain analyze select * from ahoy_events where 1=0;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.002..0.002 rows=0 loops=1)
   One-Time Filter: false
 Planning Time: 0.087 ms
 Execution Time: 0.015 ms

As you can see, the size of the table does not matter, because the planner immediately detects that there is a condition that always evaluates to false, so there’s no need to look into the table at all. The whole operation takes around 0.1 ms. So, unless you are connected to a database on the other side of the globe and latency is killing you, this optimization does not optimize anything in a meaningful way.

It violates POLS

POLS stands for “principle of least surprise”. Among DRY, SOLID or KISS, which are widely known acronyms, POLS is a little forsaken. However it is very important rule. It says that the code should do what we intuitively think it would do, unless there’s a very good reason to do otherwise. We often use the rule intuitively, writing new code in a way that matches the rest of the codebase. It might not be the best way to solve the problem, but it is consistent with how we usually solve such problems. And we often value it more than “being slightly better”.

But what does POLS has to do with an optimization I’m talking about?

Wherever I see a code like SomeModel.where(column: value).to_a I expect it to make a call to the database. With all the consequences of it, such as:

Failing when database is not reachable
Failing when there is no table backing the model in question
Failing when a non-existing column is referenced

All these three assumptions are broken with this optimization in place. Not only we have not optimized, but we also have the code that behaves counterintuitively.

Syntax errors based on data

Last but not least, this code can really blow up unexpectedly. Our life is not easy with dynamic interpreted language. It’s often that we don’t see the code failing unless a specific code path is executed, which might be long after deploying the code in question to production. However, the ActiveRecord “optimization” adds a whole new level to unexpected failures. Consider this code:

Order.where(customer_id: vip_customer_ids).maximum(:order_vaule) # note the typo

This is a totally valid case - we want to take the orders places by our VIP users and find out what the highest total value was. The list of VIP users comes from somewhere and at first it’s empty, because we don’t have any VIPs yet.

As you see, I’ve made a typo. This typo would easily blow up in the first moment we run this code, we would catch it and fix it. Except not, thanks to the wonderful optimization we have in Rails. The query won’t ever make it into the database so the database won’t tell us that there is no such column. Now, it’s Saturday evening and we have our first VIP signed in. Let’s say that the list of VIPs is fetched periodically from Airtable, where out sales department manages its data. And once the data is fetched… BOOM, some of your pages or API endpoints stop working, because of 500 error. vip_customer_ids stops being empty and the calls start to actually reach the database. Now, the typo we’ve made in the code is clearly visible, but only now because the data somewhere changed.

This is just another variant of POLS violation. We wouldn’t generally expect the change in some database to cause failures in the code. But we’ve just witnessed that. Of course, one can try to dismiss it with the notion that a proper test suite should catch it. And it’s true, however no test suite is perfect (even, or maybe especially, when line coverage is close to 100%) and things like that tend to slip through it.

Summary

This concludes my little rant about harmful over-optimization in ActiveRecord. It does not optimize and it can make our lives more miserable. But it is there, and probably not going away, so at least YE BE WARNED.

And for some context, here’s the PR that initially added this. Here’s a recent one, which added this behaviour to calculations too. Oh, and by the way, it already broke stuff for people before.