Eager loading Rails ActiveStorage Variants Almost Killed My Site

For the past couple of years, I have been running the technical side of the annual Icelandic Book Catalogue, which is not only a website as its printed edition has been delivered to Icelandic households for the past century and marks the beginning of the Christmas book sales season for many.

The project has been an interesting exercise in web application performance and what I call “sensible scalability” as both visitor traffic and data entry are highly seasonal. It also relates to my print design background as it provides XML data for use in InDesign using a PowerShell based utility.

Now that it is the off-season and I was upgrading the Ruby on Rails version for the site from 7.0 to 8.0, which was long-overdue, I came across a sudden, but serious performance bottleneck that was choking the system.

Suddenly, rendering a list of books, which usually took less than 200 ms was taking up to 1200 ms and important endpoints that 3rd parties use were timing out completely and maxing out server resources.

Our Heroku dynos were crashing. I was burning the midnight oil and there were no changes to my own codebase that could explain this.

A Bit of Technical Background

If you are a Rails developer or someone who has used any other MVC framework with a good ORM library instead of writing your SQL statements manually like a caveman, you’ve come across the concept of n+1 queries, which are generally considered to be something to avoid if performance and scalability are important to you.

The general idea is that you really don’t want to iterate through database results in your app and then send in another database query for each results for relational data. You want to fetch as much information as you can in as few queries as possible to save overhead and reduce the time to takes to fetch and present your data.

If you need a refresher, an ORM is a part of the way we query the database in MVC frameworks and as the Rails docs put it, it’s a part of the M of MVC.

Rails’ ORM library is called ActiveRecord and it is how we work with the model structure and its relation to the database as it writes database queries for you and facilitates things like complex joins across tables using Ruby instead of low level SQL.

So, instead of writing your SQL queries manually like SELECT * FROM books, you can use ActiveRecord to generate the same query using Book.all in Ruby — and while it may not sound impressive at this point, once you consider data scopes, foreign keys and anything else beyond a simple single-table query, it can get very powerful very quickly and can juggle and generate SQL queries that are beyond any mortal’s comprehension.

In a book catalogue like the one I’ve been working on in the past couple of years, you can set up scopes, joins and includes — so if I call Book.current.for_web.by_category(1) from Ruby, it will generate a couple of large SQL queries that fetch all books in a specific category of online edition of the current edition of the catalogue along with eager loading relational data such as authors, their roles, different editions of the book and their ISBN numbers, publisher information etc.

So calling Book.current.for_web.by_category(1) in my Rails app will result in those two seriously chonky SQL queries hitting the database server instead of multiple smaller ones, which has serious positive impact on speed and performance:

SELECT DISTINCT "books"."title" AS alias_0, "books"."id" FROM "books" LEFT OUTER JOIN "publishers" ON "publishers"."id" = "books"."publisher_id" LEFT OUTER JOIN "book_categories" ON "book_categories"."book_id" = "books"."id" LEFT OUTER JOIN "categories" ON "categories"."id" = "book_categories"."category_id" LEFT OUTER JOIN "book_authors" ON "book_authors"."book_id" = "books"."id" LEFT OUTER JOIN "authors" ON "authors"."id" = "book_authors"."author_id" LEFT OUTER JOIN "author_types" ON "author_types"."id" = "book_authors"."author_type_id" LEFT OUTER JOIN "book_editions" ON "book_editions"."book_id" = "books"."id" LEFT OUTER JOIN "book_edition_categories" ON "book_edition_categories"."book_edition_id" = "book_editions"."id" LEFT OUTER JOIN "editions" ON "editions"."id" = "book_editions"."edition_id" WHERE "book_editions"."edition_id" IN (SELECT "editions"."id" FROM "editions" WHERE "editions"."online" = TRUE ORDER BY "editions"."id" DESC) AND "book_edition_categories"."for_web" = TRUE AND "book_edition_categories"."category_id" = 1;
SELECT "books"."id" AS t0_r0, "books"."source_id" AS t0_r1, "books"."pre_title" AS t0_r2, "books"."title" AS t0_r3, "books"."post_title" AS t0_r4, "books"."slug" AS t0_r5, "books"."description" AS t0_r6, "books"."long_description" AS t0_r7, "books"."publisher_id" AS t0_r8, "books"."created_at" AS t0_r9, "books"."updated_at" AS t0_r10, "books"."uri_to_buy" AS t0_r11, "books"."uri_to_sample" AS t0_r12, "books"."uri_to_audiobook" AS t0_r13, "books"."title_noshy" AS t0_r14, "books"."title_hypenated" AS t0_r15, "books"."country_of_origin" AS t0_r16, "books"."original_title" AS t0_r17, "books"."cover_image_srcsets" AS t0_r18, "books"."sample_pages_srcsets" AS t0_r19, "books"."original_language" AS t0_r20, "publishers"."id" AS t1_r0, "publishers"."source_id" AS t1_r1, "publishers"."name" AS t1_r2, "publishers"."slug" AS t1_r3, "publishers"."url" AS t1_r4, "publishers"."created_at" AS t1_r5, "publishers"."updated_at" AS t1_r6, "publishers"."email_address" AS t1_r7, "publishers"."kennitala" AS t1_r8, "publishers"."is_member" AS t1_r9, "publishers"."schema_type" AS t1_r10, "categories"."id" AS t2_r0, "categories"."source_id" AS t2_r1, "categories"."origin_name" AS t2_r2, "categories"."slug" AS t2_r3, "categories"."rod" AS t2_r4, "categories"."created_at" AS t2_r5, "categories"."updated_at" AS t2_r6, "categories"."group" AS t2_r7, "categories"."name" AS t2_r8, "categories"."book_count" AS t2_r9, "categories"."book_count_web" AS t2_r10, "categories"."book_count_print" AS t2_r11, "book_authors"."id" AS t3_r0, "book_authors"."source_id" AS t3_r1, "book_authors"."book_id" AS t3_r2, "book_authors"."author_id" AS t3_r3, "book_authors"."author_type_id" AS t3_r4, "book_authors"."created_at" AS t3_r5, "book_authors"."updated_at" AS t3_r6, "authors"."id" AS t4_r0, "authors"."source_id" AS t4_r1, "authors"."firstname" AS t4_r2, "authors"."lastname" AS t4_r3, "authors"."slug" AS t4_r4, "authors"."created_at" AS t4_r5, "authors"."updated_at" AS t4_r6, "authors"."is_icelandic" AS t4_r7, "authors"."order_by_name" AS t4_r8, "authors"."name" AS t4_r9, "authors"."added_by_id" AS t4_r10, "authors"."schema_type" AS t4_r11, "author_types"."id" AS t5_r0, "author_types"."source_id" AS t5_r1, "author_types"."name" AS t5_r2, "author_types"."slug" AS t5_r3, "author_types"."rod" AS t5_r4, "author_types"."created_at" AS t5_r5, "author_types"."updated_at" AS t5_r6, "author_types"."abbreviation" AS t5_r7, "author_types"."plural_name" AS t5_r8, "author_types"."schema_role" AS t5_r9, "book_editions"."id" AS t6_r0, "book_editions"."book_id" AS t6_r1, "book_editions"."edition_id" AS t6_r2, "book_editions"."created_at" AS t6_r3, "book_editions"."updated_at" AS t6_r4, "book_edition_categories"."id" AS t7_r0, "book_edition_categories"."book_edition_id" AS t7_r1, "book_edition_categories"."category_id" AS t7_r2, "book_edition_categories"."for_web" AS t7_r3, "book_edition_categories"."for_print" AS t7_r4, "book_edition_categories"."created_at" AS t7_r5, "book_edition_categories"."updated_at" AS t7_r6, "book_edition_categories"."invoiced" AS t7_r7, "book_edition_categories"."dk_invoice_number" AS t7_r8, "editions"."id" AS t8_r0, "editions"."title" AS t8_r1, "editions"."original_title_id_string" AS t8_r2, "editions"."created_at" AS t8_r3, "editions"."updated_at" AS t8_r4, "editions"."print_date" AS t8_r5, "editions"."closing_date" AS t8_r6, "editions"."opening_date" AS t8_r7, "editions"."online_date" AS t8_r8, "editions"."cover_image_srcsets" AS t8_r9, "editions"."is_legacy" AS t8_r10, "editions"."year" AS t8_r11, "editions"."online" AS t8_r12, "editions"."open_to_web_registrations" AS t8_r13, "editions"."open_to_print_registrations" AS t8_r14 FROM "books" LEFT OUTER JOIN "publishers" ON "publishers"."id" = "books"."publisher_id" LEFT OUTER JOIN "book_categories" ON "book_categories"."book_id" = "books"."id" LEFT OUTER JOIN "categories" ON "categories"."id" = "book_categories"."category_id" LEFT OUTER JOIN "book_authors" ON "book_authors"."book_id" = "books"."id" LEFT OUTER JOIN "authors" ON "authors"."id" = "book_authors"."author_id" LEFT OUTER JOIN "author_types" ON "author_types"."id" = "book_authors"."author_type_id" LEFT OUTER JOIN "book_editions" ON "book_editions"."book_id" = "books"."id" LEFT OUTER JOIN "book_edition_categories" ON "book_edition_categories"."book_edition_id" = "book_editions"."id" LEFT OUTER JOIN "editions" ON "editions"."id" = "book_editions"."edition_id" WHERE "book_editions"."edition_id" IN (SELECT "editions"."id" FROM "editions" WHERE "editions"."online" = TRUE ORDER BY "editions"."id" DESC) AND "book_edition_categories"."for_web" = TRUE AND "book_edition_categories"."category_id" = 1 AND "books"."id" IN (3, 454, 301, 431, 88, 383, 191, 27, 271, 122, 403) ORDER BY "books"."title" ASC

(No, I don’t expect anyone in their right mind to put in the energy to analyse or pick apart those SQL queries in any other way than seeing how big and chonky they are.)

My Approach

As a contractor, I try my best not to do things off-the-clock because I have to sell and bill for the work that I do for my clients — but I must admit that pushing the best possible performance out of my web apps without relying on an external caching service is a serious passion (and distraction) of mine.

For most of my work, I strife to keep 500 ms (that’s half a second) as the absolute maximum time that it takes to render a view and with model structures that almost always end up more complicated than originally intended, eager loading to prevent n+1 queries and being sure to index the database properly are the two most important parts to that.

For simple content such as our Cookie and Privacy Policy page, 250 ms (a quarter of a second) should then always be achievable, even on WordPress sites like this one, which renders the main landing page in well under 100 ms and a blog post in something like 125.

And to make it clear, most issues with making performant and scalable web apps and sites is rarely about the framework, platform or the CMS involved being scalable or not — it is about having a grasp of the inner workings of it and knowing how to avoid unnecessary overhead — and with that in mind, almost any web development framework that runs from a database can scale.

Digging for the Causes

Now, an almost tenfold increase in loading times on the production instance, not to mention timeout issues in our JSON endpoints is a serious issue.

I had not changed anything in the app itself, so my first thought was to see if the database was properly indexed — and because migrations had been run during the upgrade, which may have changed something, it was worth the look. However, the indexes were correctly defined, just like they had been.

I do run the pg_prewarm PostgreSQL extension as a cron job every morning to make sure that the indexes are loaded into memory, but checking on it, it was working as expected. No issues detected.

Downgrading and upgrading other dependencies lead nowhere, so it must have been a change in Rails itself that was causing this, but how and where?

Admittedly, I disconnect the site from monitoring services during the low season so now was the time to connect it to AppSignal to see what was really going on. (And admittedly, if you’re not using a monitoring service like AppSignal and New Relic, performance analysis is mostly guesswork).

I did not have to wait long to for the results to come in. Let’s have a look at this screenshot:

A screenshot from the AppSignal performance interface, showing two SQL queries adding about 700 ms to a timeline of 1.2 seconds.

See the two long blue lines? Remember the two chonky SQL queries from above?

They had become much bigger and insanely slow for some reason and having a further look, they were taking much longer to execute, causing a significant delay — but how?

What those queries and others ActiveRecord calls that were killing the site had in common is that they were supposed to be eager loading information about the ActiveStorage attachment each book has such as the front cover, sample pages and audiobook previews.

For the uninitiated, ActiveStorage is how Ruby on Rails keeps file attachments that are then associated with ActiveRecord records, using ActiveRecord. Each attachment can then have variations such as different sized images or different file formats and may be stored in the cloud.

Adding .with_attached_cover_image to the method chain is supposed to add the necessary joins to the resulting database query to eager load information about the image of the front cover of every book, to save on the number of queries.

This was a deliberate choice for my Rails app as it was supposed to save time and overhead, resulting in much quicker processing times (and AppSignal would stop telling me my code was bad) — but why was it suddenly grinding down to a halt, to the point of breaking the site?

You See, They Made it Better

This all led me down the rabbit hole of looking at ActiveRecord release notes for versions 8.0 down to 7.1 to see what was going on — and there it was in version 7.1!

Fix variants not included when eager loading multiple records containing a single attachment

When using the with_attached_#{name} scope for a has_one_attached relation, attachment variants were not eagerly loaded.

Russell Porter

This means that the feature was “fixed” so that all information about every single variant of each ActiveStorage object is now also eager loaded from the database, which is what caused the SQL queries to explode like this.

The thing is I only needed to eager load basic information like if a book has a picture of its cover attached at all or how many sample pages it had. What I did not intend to eager load was information about every single variant to every image attached to every book, because I was actually lazy loading each image variant like so:

# Load the srcset for browsers that support the WebP format
book.cover_image_srcsets['webp']
=> "https://cdn.bokatidindi.is/an8kcwm4chce8ztydmm3f1hq6374 150w, https://cdn.bokatidindi.is/j8gyh6jqtjkmgmjzwalyd23iztm0 260w, https://cdn.bokatidindi.is/q8ei2b2169hie958h2fg7x8it379 550w, https://cdn.bokatidindi.is/3incottl2rdioxwdhhrdm3u748b4 1200w, https://cdn.bokatidindi.is/f5da1w2yh8wr7r0y8nwh862ixzsi 1600w"

Because we’re using responsive images, each book’s front cover image has 11 different variants — 5 different sizes for both JPEG and WebP and a single TIFF image that we use for the printed edition — in addition to 7 different image sizes for each sample page in JPEG and WebP as well as the audio preview.

Now, if each book has on average 2 sample pages, we would be dealing with 25 variants per book and if we’re loading out 18 results per page, that’s 450 variants in total — and for our maximum 100 search results we don’t need cover images for each results, we may be dealing with 2500 variants, most if which we would not be using at all. And that’s a lot.

Now, add the fact that deleting variants wasn’t really a thing until version 7.1.

I had known that the variant information wasn’t included in the eager loading feature back in Rails version 7.0 and fetching them one-by-one was making things extremely slow, so the image variants for each front cover are actually lazy loaded (but not eager loaded) using static attributes for each book called cover_image_srcsets in order to improve processing times. (This also maps their URLs to their proper location on our CDN instead of linking straight to the cloud bucket storage.)

In other words, the ActiveStorage variant API was (and still is) overly slow and bloated for what it is supposed to achieve and my intention was to bypass it in a smart and efficient way while also using what was then a rudimentary but relatively efficient way to eager load some basic information like if there is a picture of the cover attached.

So by “fixing” the ActiveStorage API and eager loading all the variants if I call Book.current.for_web.by_category(@category.id).with_attached_cover_image in version 7.1, the Ruby on Rails team managed to add bloat and inefficiencies that were totally unexpected.

The Fix

Sometimes we need to accept that not everything is perfect, so I went ahead and did a project-wide search-and-replace for any .with_attached_cover_image statement and removed each and every single one of them.

This will result in more n+1 queries on the site but it has already resulted in a considerable performance improvement and it is as close enough to the original performance figures.

Two line graphs from AppSignal, showing a substantial performance improvement between July 19 and 20.

Looking at he AppSignal graphs for the site, we can see that the mean response time went from about 600 ms at its peak for one controller action that use .with_attached_cover_image statement, down to 70-170 ms.

The Lesson

While Ruby on Rails does facilitate best practices in general and in particular the prevention of n+1 queries, which can be a considerable performance bottleneck, depending on catch-all solutions may break things in unexpected ways.

Furthermore, reading the release notes and browsing the source code for the framework or CMS we use — or any other dependency for that matter — is probably the best way to trace the unexpected issues we are dealing with during upgrades as not everything makes it to blog posts, social media or official documentation.

Lastly, we need to be more eager to perform periodical updates for client projects. Regular periodical maintenance may not add extra features to your app or improve performance, but developers need to have the confidence to educate clients about the importance of incremental framework and dependency upgrades instead of rushing in at the end-of-life.