Load Testing the WP Super Cache Plugin with Kernl

So you have a WordPress site and you are expecting a spike of traffic to it. Do you know when your site will fall over? How many users can be browsing it at once and still have a good experience? If you can’t answer these questions then you need to load test your site! Through the course of this article I’ll use a new load testing tool by Kernl to test enabling WP Super Cache on Kernl’s blog. Kernl WordPress load testing doesn’t require any coding or load testing experience and by the end you’ll know how to test any performance optimizations of your WordPress site.

What is Kernl.us?

Kernl.us is a WordPress developer tool service. It does a lot of different things to help WordPress developers be more productive including:

What we’re going to focus on is the WordPress load testing portion of Kernl. Most WordPress developers never really consider load testing for a number of different reasons. Maybe they think that their site can handle lots of load already or perhaps they don’t know know where to start with load testing. Luckily Kernl can help with both of those problems.

So lets get started! The Kernl blog runs on a $5/month Digital Ocean droplet. It has 1GB of RAM and 1 CPU. It runs your typical LAMP setup (Linux, Apache, MySQL, PHP 7). In general this setup is not known to scale well out of the box and requires quite a bit of tweaking to be performant. For this blog post though we’re only going to add a caching plugin and see how that changes performance.

Getting Started

Before starting any WordPress performance optimizations we need to know what our current WordPress performance characteristics look like. Kernl makes this a 2 step process:

  1.  Create a template – Kernl uses the WP JSON API to fetch your site’s layout. It then creates a load test template that you can tweak based on the data that it fetched. A template simply tells Kernl what routes it should test and also how frequently it should visit a given route.
  2. Start a load test – Once you have a template you can start a load test. The load test reads the template, spins up the load testing infrastructure, and then reports back the status of the load test.

For each load test we’re going to throw traffic at https://blog.kernl.us using the following parameters:

  • 200 users
  • 10 minute duration
  • 2 users per second spawn rate
  • Traffic will be produced from Digital Ocean‘s London data center. Kernl’s blog is hosted in Digital Ocean’s NYC3 (New York City) data center.

The Template

The template that we used for this blog post looks like this:

Kernl WordPress Load Test Template
Kernl WordPress Load Test Template

As you can see its pretty straight forward. Each route it accompanied by a multiplier. The multiplier simply tells the load test software how much traffic it should send to a given route relative to the other routes.  So in the example above the “/” route has a 3x multiplier. That means it will receive 3x as much traffic as any of the other 1x routes.

Baseline Load Test

Load Test Baseline - No Cache
Load Test Baseline – No Cache (Requests)

For the baseline load test we can see that within 30 seconds or so we hit our peak throughput. After that some portion of the LAMP stack starts to get overwhelmed and can only handle processing 2 requests per second. 2 requests per second is not great performance.

Load Test Baseline - No Cache
Load Test Baseline – No Cache (Failures)

You can see from the failure graph that at right around the time the requests graph slows down to 2 requests/s failures start to occur. If you start to see your failure graph climb like this you’ll know that you’ve reached your maximum capacity.

The last and most important part of the baseline load test is the request distribution graph. This graph tells us a lot about how users experience our site when it is under load.

Load Test Baseline - No Cache
Load Test Baseline – No Cache (Distribution)

This graph can be a little confusing to read at first, but isn’t bad at all once you get the hang of it. Read it like this:

  1. Pick a column. We’ll use the “90%” column.
  2. Now read the value of the column. The value is milliseconds.
  3. Combine your knowledge! 90% of requests were completed in 110000 milliseconds (110 seconds).

In the context of this load test this is a really bad user experience. If you look at the 50% column you can see that it’s hovering around 70000 milliseconds (70 seconds). This means that 50% of all traffic in the load test took at least 70 seconds to complete 🙁

Cache Enabled Load Test

Now that the baseline load test is complete we can enable a caching plugin and see how much better it makes our site perform! For this test we’re using the excellent WP Super Cache plugin. To make sure we were comparing performance in an “apples to apples” manner, we’re going to use the exact same Kernl load test configuration as we did for the first test.

Load Test With Cache Enabled (Requests)
Load Test With Cache Enabled (Requests)

WOW! WP Super Cache made a huge difference in the throughput that the Kernl blog could handle. We maxed out at around 20 requests/s sustained which is a 10x improvement over our baseline sustained requests. And what about request failures?

Load Test With Cache Enabled (Failures)
Load Test With Cache Enabled (Failures)

Once again, pretty fantastic results. Through the entire load test only a single request failed. To put that number in to perspective a bit: 12,174 requests were made during the 10 minute load test. Only one failed. Thats a failure rate of 0.008%!

High throughput and low failures are great, but what about the distribution? If the user has to wait for 20 seconds for the page to load the site may as well be down. Lets check out the graph.

Load Test With Cache Enabled (Distribution)
Load Test With Cache Enabled (Distribution)

As you can see the distribution with caching enabled tells a very different story than without caching enabled. All requests finished in under 3 seconds and 50% of requests finish in under 2 seconds. Not perfect performance but certainly usable by an end user. Its also worth remembering a few things:

  • This is a $5/month droplet on Digital Ocean.
  • No server tuning was done.
  • No WP Super Cache tuning was done.
  • 20 requests/s is about 1.7 million requests per day.

Next time you need to test performance improvements to your WordPress site, be sure to check out Kernl so that you can be confident in your site’s ability to handle traffic.

0 to 1 Million: Scaling my side project to 1 million requests a day

In the Beginning

In late 2014 I decided that I needed a side project.  There were some technologies that I wanted to learn, and in my experience building an actual project was the best way to do that.  As I sat on my couch trying to figure out what to build, I remembered an idea I had back when I was still a junior dev doing WordPress development.  The idea was that people building commercial plugins and themes should be able to use the automated update system that WordPress provides.  There were a few self-managed solutions out there for this, but I thought building a SaaS product would be a good way to learn some new tech.

Getting Started

My programming history in 2014 looked something like: LAMP (PHP, MySQL, Apache) -> Ruby on Rails -> Django.  In 2014 Node.js was becoming extremely popular and MongoDB had started to become mature.  Both of these technologies interested me, so I decided to use them on this new project.  As to not get too overwhelmed with learning things, I decided to use Angular for fronted since I was already familiar with it.

A few months after getting started, I finally deployed https://kernl.us for the world to see.  To give you an idea of the expectations I had for this project, I deployed it to a $5/month Digital Ocean droplet.  That means everything (Mongo, Nginx, Node) was on a single $5 machine.  For the next month or two, this sufficed since my traffic was very low.

The First Wave

In December of 2014 things started to get interesting with Kernl.  I had moved Kernl out of a closed alpha and into beta, which led to a rise in sign ups.  Traffic steadily started to climb, but not so high that it couldn’t be handled by a single $5 droplet.

Around December 5th I had a customer with a large install base start to use Kernl.  As you can see the graph scale completely changes.  Kernl went from ~2500 requests per day, to over 2000 requests per hour.  That seems like a lot (or it did at the time), but it was still well within what a single $5 droplet could handle.  After all, thats less that 1 request per second.

Scaling Up

Through the first 3 months of 2015 Kernl experienced steady growth.  I started charging for it in February, which helped fuel further growth as it made customers feel more comfortable trusting it with something as important as updates.  Starting in March, I noticed that resource consumption on my $5 droplet was getting a bit out of hand.  Wanting to keep costs low (both in my development time and actual money) I opted to scale Kernl vertically to a $20 per month droplet.  It had 2GB of RAM and 2 cores, which seemed like plenty.  I knew that this wasn’t a permanent solution, but it was the lowest friction one at the time.

During the ‘Scaling Up’ period that Kernl went through, I also ran into issues with Apache.  I started out by using Apache as a reverse proxy because I was familiar with it, but it started to fall over on me when I would occasionally receive requests rates of about 20/s.  Instead of tweaking Apache, I switched to using Nginx and have yet to run in to any issues with it.  I’m sure Apache can handle far more that 20 requests/s, but I simply don’t know enough about tweaking it’s settings to make that happen.

SCaling Out & Increasing Availability

For the rest of 2015 Kernl saw continued steady growth.  As Kernl grew and customers started to rely on it for more than just updates (Bitbucket / Github push-to-build), I knew that it was time to make things far more reliable and resilient than they currently were.  Over the course of 6 months, I made the following changes:

  • Moved file storage to AWS S3 – One thing that occasionally brought Kernl down or resulted in dropped connections was when a large customer would push an update out.  Lots of connections would stay open while the files were being download, which made it hard for other requests to get through without timing out.  Moving uploaded files to S3 was a no-brainer, as it makes scaling file downloads stupid-simple.
  • Moved Mongo to Compose.io – One thing I learned about Mongo was that managing a cluster is a huge pain in the ass.  I tried to run my own Mongo cluster for a month, but it was just too much work to do correctly.  In the end, paying Compose.io $18/month was the best choice.  They’re also awesome at what they do and I highly recommend them.
  • Moved Nginx to it’s own server – In the very beginning, Nginx lived on the same box as the Node application.  For better scaling (and separation of concerns) I moved Nginx to it’s own $5 droplet.  Eventually I would end up with 2 Nginx servers when I implemented a floating ip address.
  • Added more Node servers – With Nginx living on it’s own server, Mongo living on Compose.io, and files being served off of S3, I was able to finally scale out the Node side of things.  Kernl currently has 3 Node app servers, which handle requests rates of up to 170/second.

Final Thoughts

Over the past year I’ve wondered if taking the time to build things right the first time through would have been worth it.  I’ve come to the conclusion that optimizing for simplicity is probably what kept me interested in Kernl long enough to make it profitable.  I deal with enough complication in my day job, so having to deal with it in a “fun” side project feels like a great way to kill passion.