Web caching: what is the right time-to-live for cached pages?

I have been having performance problems with my blog, which forced me to spend time digging into the issue. Some friends of mine advocate that I should just “pay someone”, and they are no doubt right that it would be the economical and strategic choice. Sadly, though I am eager to pay experts, I am also a nerd and I like to find out why things fail.

My blog uses something called WordPress, a popular blogging platform written in PHP. WordPress is fast enough for small sites, but the minute your site gets serious traffic, it tends to struggle. To speed things up, I tried a WordPress plugin called WP Super Cache. What the plugin does is materialize pages as precomputed HTML. It should make sites super fast.

There is a caveat to such plugins: by the time your blog is under so much stress that PHP scripts cannot run without crashing, no plugin is likely to save you.

I also use an external service called Cloudflare. Cloudflare acts as a separate cache, possibly serving pre-rendered pages to people worldwide. Cloudflare is what is keeping my blog alive right now.

After I reported that by default (without forceful rules) Cloudflare did very little caching, a Cloudflare engineer got in touch. He told me that my pages were served to Cloudflare with a time-to-live of 3 seconds. That is, my server instructs Cloudflare to throw away cached pages after three seconds.

I traced back the problem to what is called an htaccess file on my server:

<IfModule mod_expires.c>
  ExpiresActive On
  ExpiresByType text/html A3
</IfModule>

The mysterious “A3” means “expires 3 seconds after the access time”.

How did this instruction get there? It gets written by WP Super Cache; I checked.

I work on software performance, but I am not an expert on Web performance. However, this feels like a very short time-to-live.

I am puzzled by this decision by the authors of WP Super Cache.
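For reference, lengthening the time-to-live only takes changing the A3 value in that same htaccess block. The one-hour figure below is my own illustration, not a recommendation:

```apache
<IfModule mod_expires.c>
  ExpiresActive On
  # "A" means relative to the access time: A3600 expires the page
  # one hour after it was fetched, instead of three seconds.
  ExpiresByType text/html A3600
</IfModule>
```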

Daniel Lemire, "Web caching: what is the right time-to-live for cached pages?," in Daniel Lemire's blog, February 1, 2019.

Published by Daniel Lemire, a computer science professor at the University of Quebec (TELUQ).

6 thoughts on “Web caching: what is the right time-to-live for cached pages?”

  1. I have some experience building and managing quite large WordPress sites. Using WP Super Cache is usually a bad idea, especially when dedicated pieces of really nice software are already available.

    The reason behind the “A3” setting is that WP Super Cache stores “rendered” HTML pages. These pages are served without executing WordPress or any PHP code, so any HTTP header must be set by the web server. If a post is modified, the HTML version must be regenerated. The short time-to-live is there to prevent client caching and to force your readers’ browsers to always fetch the content from your server. This is not a very good idea.

    If you have the time, try Varnish. Varnish is a really fast caching HTTP reverse proxy, and there are some nice plugins to make it work with WordPress.
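A middle ground between the plugin's choice and no client caching at all: HTTP distinguishes the browser lifetime (max-age) from the shared-cache lifetime (s-maxage). A generic Apache sketch of the idea follows; the one-hour s-maxage is my own illustration, and it only works if the CDN copy is purged whenever content changes:

```apache
<IfModule mod_headers.c>
  # Browsers revalidate after 3 seconds, but shared caches such as
  # Cloudflare may keep the page for an hour (s-maxage). This assumes
  # the CDN cache is purged whenever a post or comment changes.
  <FilesMatch "\.html$">
    Header set Cache-Control "max-age=3, s-maxage=3600"
  </FilesMatch>
</IfModule>
```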

  2. In my experience, the right TTL for a web resource is basically “forever” (or very close to it) or “never” (or very close to it).

    The forever case is suitable when you identify the resource by a hash of its contents – when you want to change it, you publish a new file with a different name, and change the pages that refer to it to use the new file name.
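The “forever” scheme above can be sketched in a few lines of Python. This is a generic illustration; the helper is my own, not from any particular tool:

```python
import hashlib

def hashed_name(name: str, content: bytes) -> str:
    """Build a cache-busting file name from a hash of the content.

    Because the name changes whenever the content changes, the file
    itself can be served with an essentially infinite time-to-live.
    """
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = name.rpartition(".")
    return f"{stem}.{digest}.{ext}" if dot else f"{name}.{digest}"

# Any edit to the CSS produces a new file name, so pages simply
# reference the new name and old cached copies become irrelevant.
print(hashed_name("style.css", b"body { color: black; }"))
```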

    The never case is suitable for top-level resources whose names must be stable, e.g. a blog post URL. You want a very short TTL so that changes become visible promptly.

    However, this is in tension with the fact that you want to serve things quickly. Often the solution here is to cache things internally, but serve them to the world with a short TTL. WP Super Cache may be doing this. Presumably, because it’s hooked into the WP architecture, it can invalidate its internal cache when you revise a post or receive a new comment. Thus, when the 3 second external TTL expires, the cost of re-rendering the page is still relatively cheap, because it’s just checking to see if its internal copy is stale, determining that it’s not, and serving it from a file cache (which hopefully is itself warm in the kernel’s buffer cache). Particularly good systems will serve a stale page and regenerate the page in the background to ensure consistently low latencies.
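The internal-cache pattern described above can be modeled with a toy sketch. This illustrates the idea only; it is not WP Super Cache's actual code:

```python
import time

class MicroCache:
    """Toy single-entry cache: serve a stored page while it is fresh,
    re-render only when the entry is stale or has been invalidated."""

    def __init__(self, render, ttl=3.0):
        self.render = render        # function producing the HTML
        self.ttl = ttl              # time-to-live, in seconds
        self.page = None
        self.stamp = -float("inf")
        self.renders = 0

    def invalidate(self):
        """Called when a post is edited or a comment arrives."""
        self.page = None

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        if self.page is None or now - self.stamp > self.ttl:
            self.page = self.render()
            self.stamp = now
            self.renders += 1
        return self.page

cache = MicroCache(lambda: "<html>post</html>", ttl=3.0)
# 1000 requests within the same instant still cause a single render.
for _ in range(1000):
    cache.get(now=0.0)
print(cache.renders)  # -> 1
```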

    I don’t have personal knowledge of how WP Super Cache works, but I’ve done high-performance web stuff with other tools and this is the commonly accepted approach. I wouldn’t be too surprised if WP Super Cache doesn’t do this, or if it’s easy to misconfigure it such that it does it poorly, though.

  3. I used to set my web cache to 7 days for HTML, and a month for CSS, JS, images, etc. Google recommends a year. Now my website is hosted on GitHub Pages, which is surprisingly fast and has good defaults. I use Hugo to generate static HTML, which works like a charm for blogs and other content that changes infrequently. You may want to look into this solution, even if you are going the self-hosted route.

    Check out Google PageSpeed Insights; it gives you concrete suggestions for performance optimisations, along with code snippets and other how-tos. For example, running it on your website shows that, on a mobile device, you could save up to 2.7 seconds by deferring the loading of non-critical CSS and JS, and another 1.5 seconds by not serving CSS that is not used by your website.

  4. Hey Daniel, I was reading your post on computing remainders and decided to see what else you’ve written. Anyways, I used to maintain one of the busiest websites on the internet for a living, and while I’m merely a consultant now, I may still remember a thing or two about how the internet works.

    The three second cache is what’s called a microcache. If a page gets slashdotted (remember that term?), even though it may be receiving thousands of requests per second (say 300,000 per minute), your web server will only render the page 20 times in that minute, with it being served from cache the remainder of the time (whether that be a CDN, Apache’s cache, a Varnish or NGINX cache, etc.). Anyways, if your performance is how you want it to be, my hat is off to you! If not, feel free to shoot me an email and I’d be more than happy to offer some suggestions tailored to your needs from my experience (for free, of course- it would be an honor to work with anyone that’s bested the GCC, and you’d be doing the implementation anyways).
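The arithmetic behind the microcache claim above is easy to check: with a 3-second TTL, the origin renders at most once per expiry window, so at most 60/3 = 20 renders per minute, no matter the load.

```python
ttl_seconds = 3
requests_per_minute = 300_000

# With a TTL of 3 seconds, the origin re-renders at most once per
# expiry window: 60 / 3 = 20 renders per minute, regardless of traffic.
max_renders_per_minute = 60 // ttl_seconds
cache_hit_ratio = 1 - max_renders_per_minute / requests_per_minute

print(max_renders_per_minute)    # -> 20
print(f"{cache_hit_ratio:.5%}")  # -> 99.99333%
```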

  5. What’s the old joke about cache invalidation being one of the hardest problems in computer science? Honestly, particularly for content-driven tools like WordPress, ETags are probably the right tool, not TTL-based cache expiry. I don’t know why frameworks don’t embrace them more.
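ETag-based revalidation works roughly as follows; this is generic HTTP logic, not tied to WordPress or any framework, and the helper names are my own:

```python
import hashlib

def etag_for(body: bytes) -> str:
    # A strong ETag derived from the content itself.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match):
    """Return (status, payload): 304 with no body when the client's
    cached copy is still current, 200 with the body otherwise."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, b""  # client copy is fresh: skip the transfer
    return 200, body

page = b"<html>post</html>"
status, _ = respond(page, None)            # first visit: full page
assert status == 200
status, _ = respond(page, etag_for(page))  # revalidation: no transfer
assert status == 304
```

The page is still re-checked on every request, but an unchanged page costs only a tiny 304 response instead of a full transfer.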
