Yesterday we hit a pretty interesting issue with our PostgreSQL database when, all of a sudden, queries started taking an unacceptably long time. It was interesting because we had not changed anything and there was no peak in incoming data. Yet the system became really unresponsive and we had to investigate what happened. So, here’s the story.
First of all, let me give you a sneak peek of what the database looks like. We have user profiles in one table, containing data such as gender, age, name or phone number. In many other tables we have information about user activity, such as logging into wifi, coupon usage etc. For the purpose of this example, let’s say we have this:
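A minimal sketch of such a schema might look like the following (the table and column names here are hypothetical, chosen just for illustration):

```sql
-- Hypothetical schema for illustration; the real one had more columns
CREATE TABLE profiles (
    id     bigserial PRIMARY KEY,
    gender text,
    age    integer,
    name   text,
    phone  text
);

CREATE TABLE coupon_usages (
    id         bigserial PRIMARY KEY,
    profile_id bigint REFERENCES profiles (id),
    used_at    timestamptz NOT NULL
);
```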
Bonus: you have about 15 million profile records and maybe 40 million coupon usage events.
One of the things we need to query is: “find me all women that used fewer than 2 coupons in the last month”. What we did was what we thought to be the standard approach:
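In SQL terms, that approach looked roughly like this (table and column names are hypothetical): filter the profiles, and exclude ids returned by a subquery via NOT IN.

```sql
-- Illustrative only: women with fewer than 2 coupon usages in the last month,
-- expressed by excluding the "2 or more" group with NOT IN
SELECT p.*
FROM profiles p
WHERE p.gender = 'female'
  AND p.id NOT IN (
    SELECT cu.profile_id
    FROM coupon_usages cu
    WHERE cu.used_at >= now() - interval '1 month'
    GROUP BY cu.profile_id
    HAVING count(*) >= 2
  );
```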
Of course, our query was “a bit” more complicated, but you get the idea. This worked pretty well. Until it stopped.
After some manual investigation (EXPLAIN ANALYZE gave me nothing of value) I found out that when the subquery returns more than some number between 60k and 61k ids (your mileage will probably vary), the query goes from an acceptable 1.5 seconds to more than 300 seconds. That was not good.
After doing some reading I realized that PostgreSQL is not really good at handling many values in a NOT IN clause. However, what it is really good at is JOINs. All I needed was to figure out how to use joins in our query. Here’s what I ended up with:
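Roughly, the rewritten query looked like this (again with hypothetical table and column names): left join the same subquery and keep only rows where nothing matched.

```sql
-- Illustrative only: same filter expressed as LEFT JOIN + IS NULL (an anti-join)
SELECT p.*
FROM profiles p
LEFT JOIN (
    SELECT cu.profile_id
    FROM coupon_usages cu
    WHERE cu.used_at >= now() - interval '1 month'
    GROUP BY cu.profile_id
    HAVING count(*) >= 2
) heavy_users ON heavy_users.profile_id = p.id
WHERE p.gender = 'female'
  AND heavy_users.profile_id IS NULL;
```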
Left joining a subquery turns out to be pretty fast. To get NOT IN semantics we need to check that there was nothing to join (so the joined column will be NULL). This way we get stable less-than-two-second query times for all kinds of intervals, be it one month or two years.