What’s New With Kernl – July 2016

With summer in full swing here in the United States, development on Kernl has been slowing down to accommodate schedules that are much busier than during the rest of the year.  This doesn’t mean we haven’t been busy, though.

Features

Infrastructure, Bugs, and Miscellaneous

  • When the server throws a 500 error, it now renders the correct template.  Prior to this fix, Kernl would render a 404 page instead, which made it very hard to tell when you encountered an actual problem.
  • We now have a robots.txt file!
  • Kernl’s Mongo infrastructure has been moved to Compose.io.  Having a professional DBA manage Kernl’s database helps me sleep easier at night and provides customers with a more performant and stable backend.
  • The landing page for Kernl was taking over 1 second to load for many people.  Caching was added, and we now have the number down to under 100ms on average.

What’s next?

July is a busy month outside of Kernl, so I don’t expect much to get done.  The current plan is to take it easy in July and then come back with renewed vigor in August.

What’s New With Kernl – June 2016

The past month of work on Kernl has seen a lot of great infrastructure improvements as well as a few customer facing features that I’m pretty excited about.

Customer Facing Features

  • Direct Uploads to AWS S3 – When Kernl was originally created, all file uploads were stored directly on Kernl’s servers.  As we grew, this became an unsustainable solution, so the process changed to use Kernl’s servers as temporary holding space before putting the file on S3.  This month we made the process even better by having files upload directly to S3 (see the sketch after this list).  For you, this means faster uploads and less time waiting to get updates out to your customers.
  • Expiring Purchase Codes – You can now create purchase codes that expire on a specific date.  This allows you to sell your updates over time, instead of having to give them away for free for the life of the plugin or theme.
  • Max Download Purchase Code Flag – You can configure a purchase code to only allow a certain number of update downloads.  This will help resolve any issues with customers sharing purchase codes amongst themselves or across multiple installations.
  • JS Cache Busting – As customer-facing features get rolled out, Kernl automatically busts the client-side JavaScript cache for https://kernl.us.  This should help prevent confusion and remove the need for any sort of “hard refresh” when new features are released.
  • plugin_update_check.php Bug Fixes – There was an edge-case bug where some code in this file would collide with an old version of the WP-Updates plugin update check file.  This happened when a customer had your plugin installed alongside a really old version of somebody else’s plugin.  This update takes care of that collision permanently.
  • Client-side JS Errors – A few minor miscellaneous bug fixes were performed on the front-end of Kernl.
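
Kernl’s implementation isn’t public, but direct-to-S3 uploads are commonly built on presigned requests: the server hands the client a short-lived, signed upload target, and the file then bypasses the application servers entirely.  A minimal sketch of that pattern in Python with boto3 (the bucket name and object key below are hypothetical):

import boto3

s3 = boto3.client("s3")

# Generate a short-lived signed POST so the client can upload the file
# straight to S3 without it ever touching the app servers.
presigned = s3.generate_presigned_post(
    Bucket="example-kernl-uploads",     # hypothetical bucket
    Key="plugins/my-plugin/1.0.1.zip",  # hypothetical object key
    ExpiresIn=300,                      # signature valid for 5 minutes
)

# The client then POSTs the file to presigned["url"], including the
# signed form fields from presigned["fields"] in the request body.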

Infrastructure

  • MongoDB – The month started off with Kernl’s database moving to its own server.  This was a temporary step aimed at making the move to a highly available setup easier.
  • Mongo Replica Sets – After the first MongoDB move, the next step was to make the setup highly available.  Kernl now runs a three-member replica set (one primary and two secondaries).  In the event that the primary database goes down, Kernl automatically fails over to one of the secondaries with no downtime (see the connection sketch below the diagram).
  • Memcache – Memcache was moved to its own server to make it easier to increase the number of items that Kernl caches over time.  This piece of the setup doesn’t need to be highly available; if it goes down for some reason, Kernl will continue to operate fine.
  • Nginx – Nginx serves as both the front door to the application and the load balancer between the app servers.  It was moved to its own server, which allows it to scale up when we need additional capacity.  In the future (hopefully soon), we’ll use a floating IP address to give this portion of the infrastructure the ability to fail over to a backup Nginx server.
  • Multiple App Servers – Kernl’s app servers can now scale horizontally.  We’re currently running 3 app servers which Nginx load balances traffic to.  This setup allows us to add app servers easily as our traffic grows.
  • Automated Deployment – Kernl can now be deployed with a single command.
A rough drawing of how Kernl is architected now.
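
Replica-set failover is handled by the MongoDB driver rather than by application code, so whatever language the app is written in, the idea is the same.  A minimal sketch in Python with pymongo (the host names, replica set name, and database name are hypothetical):

from pymongo import MongoClient

# Connect to all three members; the driver discovers which member is
# primary and transparently re-routes operations after a failover.
client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com"
    "/?replicaSet=rs0"  # hypothetical replica set name
)

db = client["kernl"]  # hypothetical database name
plugin = db.plugins.find_one({"slug": "my-plugin"})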

What’s Next?

  • Caching the repository list that you see when you set up CI builds.
  • Get a rich text editor set up on the installation and description fields.
  • Theme change logs.
  • Wrap up infrastructure work.
  • Sign in / Sign up with BitBucket & GitHub.
  • Slack Integration.
  • HipChat Integration.

What’s New With Kernl – May 2016

Since last month I’ve been working hard on getting a few features out the door.  They are:

  • All new files are now hosted on S3 – Part of the work to make Kernl highly available is getting files hosted elsewhere.  When you upload a new version to Kernl, or push a change via webhooks, the deliverable now lives on S3.  Migrating existing versions was a bit more complicated, so that’s going to be a task for May.
  • SSL and domain renewed – The SSL cert and domain for Kernl were renewed this month.  This should have been a completely transparent change.
  • Editable version fields – For plugins, you can now edit a few fields on a version once it has been created.  This was a prerequisite for getting changelogs implemented nicely.
  • Plugin changelog API – You can now programmatically add, get, and remove changelog entries from your plugins (see the illustrative sketch after this list).  Documentation on this feature is available at https://kernl.us/documentation/api#changelog-api and full examples are available at https://github.com/vital101/Kernl-API-Examples.
  • Plugin changelog tab – The changelog tab in the plugin detail update window is now populated automatically and looks like the wordpress.org version.
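
The real endpoint paths, payload format, and authentication scheme are in the documentation and examples linked above.  Purely as an illustration of the shape of such an API (the base URL, endpoints, fields, and token below are hypothetical, not Kernl’s actual API), managing changelogs from Python might look like:

import requests

API = "https://kernl.us/api/v1"  # hypothetical base path
headers = {"Authorization": "Bearer <your-token>"}  # hypothetical auth

# Add a changelog entry to a plugin (hypothetical endpoint and fields).
requests.post(
    API + "/plugins/<plugin-id>/changelog",
    headers=headers,
    json={"version": "1.0.1", "note": "Fixed the update-check timeout."},
)

# Fetch the existing entries (hypothetical endpoint).
entries = requests.get(
    API + "/plugins/<plugin-id>/changelog",
    headers=headers,
).json()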

So what’s on the backlog for May?

  • Moving the legacy version files to S3.
  • Moving the database to its own server + adding a replica.
  • Moving Memcache to its own server.
  • Analytics

What’s New With Kernl – April 2016

Over the past 4 months we’ve been making a lot of progress on many different fronts with Kernl.   After 4 new features, 5 feature updates, 3 infrastructure changes, and numerous bug fixes, Kernl is better than ever.  Check out the detailed info below, and leave a comment or reach out if you have questions.

New Features
  • Purchase Code API – A long-requested feature has been the ability to add and remove purchase codes from Kernl via an API.  This has always been supported, but there wasn’t any documentation or examples of how to do it.  We now have detailed documentation for the Purchase Code API available at https://kernl.us/documentation/api.
  • WebHook Build Log – For customers using BitBucket and GitHub integration, it could be frustrating to figure out why your build failed.  To help with that, we added a WebHook Build Log on the continuous integration page.  It can be found at https://kernl.us/app/#/dashboard/continuous-integration.
  • Envato Purchase Code Validation – Another often requested feature was the ability to validate against Envato purchase codes.  You can read about how to use and enable this functionality at https://kernl.us/documentation#envato-purchase-code-validation.
  • Caching – Since the beginning of the year, Kernl’s traffic has more than doubled and isn’t showing any signs of slowing down.  To keep response times and server load down, update check results are now cached for 10 seconds.  What this means for you is that after you upload a new version or make any changes in the ‘edit’ modal, Kernl will take a maximum of 10 seconds to reflect those changes on the update check API endpoints.  (A sketch of the general caching pattern follows this list.)
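
Kernl’s server code isn’t public, but the short-TTL pattern described above is straightforward.  A minimal sketch in Python with the python-memcached client (the key layout and lookup function are hypothetical):

import memcache

mc = memcache.Client(["127.0.0.1:11211"])

def cached_update_check(plugin_id):
    key = "update-check:{0}".format(plugin_id)  # hypothetical key layout
    result = mc.get(key)
    if result is None:
        # Cache miss: do the real work, then cache it for 10 seconds.
        result = do_update_check(plugin_id)  # hypothetical expensive lookup
        mc.set(key, result, time=10)
    return result
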
Feature Updates
  • PHP Update Check File Timeout – In the plugin_update_check.php and theme_update_check.php files that you include in your plugins and themes, the timeout value for fetching data from Kernl is set really high by default (10 seconds).  If you want the update check to fail fast in the event that Kernl is down, you can now configure this value using the remoteGetTimeout property.  Depending on how close your clients’ servers are to Kernl and how fast Kernl responds, you could likely lower this value significantly, though you should exercise caution when doing so.  The documentation has been updated here and here to reflect the change.  You will also need to publish a new build with the updated PHP files.
  • Email Notification Settings – You can now enable/disable email notifications from Kernl.  There are two types: General and Build.  General email notifications are all emails Kernl sends to you that aren’t build emails.  Build notifications are the emails you receive when a webhook event from BitBucket or GitHub triggers a build.  You can modify these settings in your profile.
  • Failed Build Email Notifications – You will now receive an email notification when your BitBucket/GitHub webhook push fails to build in an unexpected way.  For instance, if the version number in your kernl.version file doesn’t follow semantic versioning, the build will fail and send you an email notification.
  • Indeterminate Spinner for Version Uploads – Depending on the size of your deliverable and the speed of your connection, the Kernl interface didn’t give a lot of great feedback when you were uploading a file.  An indeterminate spinner now shows while your file is being uploaded.  Copy has also been updated to reflect that this action can take a little while.
  • Filterable Select on Repository Select Drop Downs – When trying to select a repository for continuous integration, it could be a real pain if you had lots of repositories.  A filterable select field is now in place that allows you to search large lists easily.
Infrastructure Changes
  • Capacity Increases – In mid-March we had about 4 minutes of downtime in the wee hours of the morning while we upgraded our server capacity.  Current capacity should hold until we double or triple our traffic levels.
  • Mandrill to SendGrid Migration – Since the beginning of Kernl, we used Mandrill as our transactional email provider.  As I’m sure some of you know, Mandrill sort of screwed its customers by making their free low-volume plan cost $30 per month.  Since this isn’t really something we wanted (or needed) to pay for, we migrated to SendGrid.
  • Apache to Nginx Migration – As our traffic numbers started to rise, Apache started to fall over on us.  A migration to Nginx as our reverse proxy was high in the backlog, so instead of tweaking Apache we just did a quick migration to Nginx.  With the default configuration and no tweaking at all, load levels dropped from 1–1.5 to 0.3–0.7.  *high five nginx*
What’s Next?
  • Multi-tier Server Architecture – Kernl started out as a fun side project.  As a side project, keeping things simple as long as possible is almost always the right choice.  Now that Kernl has a growing number of paying customers, and those customers have lots of paying customers, it’s time for Kernl’s server architecture to grow as well.  Over the next month or two, we’ll be teasing apart Kernl’s current infrastructure to support horizontal scaling and automatic failover in case any node in the stack goes down.
  • Better Change Log Support – The current change log support on Kernl is… meh, at best.  A big goal for the next month or two is to get better change log support out the door.
  • Analytics – Having some insights into your clients has always been a goal of Kernl.  Doing this efficiently and cost-effectively is tough, but we’re 60% of the way there already.  Infrastructure work has higher priority right now, but the goal is still to get analytics out the door in the next few months.
  • Bug Fixes – As always, bug fixes come first.
Other News

When you log in to Kernl, near the top you see a few boxes with general stats in them.  The ‘update pings’ stat is going to be off for a while until the new analytics work is complete.  This is because the naive way we currently calculate update pings isn’t compatible with how we cache.  The ‘update downloads’ stat is still accurate, since we do not cache the download endpoints.

Kernl Goes Beta!

In May of this year I launched the Kernl alpha with hopes that WordPress developers would be interested in it.  And interested they were!  Six months later, Kernl has over 65 users from all around the globe and a host of new capabilities to make WordPress plugin and theme development easier.  For instance, since launch we’ve added:

  • Continuous Integration with BitBucket
  • Continuous Integration with GitHub
  • Purchase Code Validation

But new features aren’t all that make a service great.  For people to trust in something it must be reliable, and that’s what the beta phase of Kernl is all about: improving reliability.  We’ve reached a point where we feel Kernl provides enough value to the WordPress community to allow us to take some time to refactor code and add a lot more tests.

What does this mean for you?  Not much.  If we do our job right, you won’t notice anything.  The beta is still free, and everyone will get a big “heads up” before we start charging for the service.

Thank you to all of the alpha users who have made this possible.  Without you Kernl wouldn’t be where it is today.

Continuous Deployment of WordPress Plugins Using Kernl

One of the problems I’ve always had with WordPress plugin development is doing it in a modern build pipeline. I really wanted to be able to merge a branch into master, build the zip file, and push the update out to my clients. For the longest time I wasn’t able to do this, so I built Kernl to enable a more modern development approach to WordPress plugin development.

What is Continuous Deployment?

Continuous Deployment (or Continuous Delivery) is a software development strategy where you ship code frequently. Your pipeline is fully automated, so as soon as some event is triggered on your version control repository, the deploy process starts. For me, that event is when I merge a pull request into master.

What is Kernl?

Kernl started out as a way to provide private plugin and theme updates for WordPress, which grew out of my frustration at having to update clients manually every time a small bug was patched. Once I had the updates working manually, the next step was automating everything. This is where “push to build” came in.

How Push To Build Works

Getting Started with Kernl

Getting push-to-build updates on your plugin or theme is pretty easy to set up with Kernl.

  1. Go to https://kernl.us and sign up. After you’ve logged in, click “Continuous Integration”.
  2. Now connect BitBucket.  This will authorize Kernl to access your BitBucket account so that it can enable push-to-build functionality.
  3. The next step is adding a WebHook to BitBucket.  This tells BitBucket to send a message to Kernl after every code push.  To do this, go to your repository settings, scroll down to “Integrations” and click “WebHooks”.  Set the new Webhook to point at https://kernl.us/api/v1/repositories/bitbucket/webhook.
  4. In order for Kernl to know when to build a new version of your plugin, it looks for a file named kernl.version in the root directory of your repository.  Go ahead and add this file now and commit it.  The kernl.version file should contain a semantic version that looks like “1.0.1”.
  5. Next, you need to add a plugin.  In Kernl, click “Plugins” on the left and then click “Add Plugin” on the upper-right.  Fill out the name, slug, and description fields, then scroll to the bottom.  You should now be able to select from a list of repositories from your BitBucket account.  You can also choose what branch Kernl should make its builds from.  The default is master, but it can be anything that you want.  Select a repository now and press “Save”.
  6. Next, you need to add the first version to Kernl manually.  Click the “versions” button for the plugin you just created, and then click “Add Version”.  The most important part of the process here is to make sure that the version number in Kernl, kernl.version, and your plugin match.  If you put 1.0.0 in the kernl.version file, make sure that it matches in your plugin’s main file, as well as in Kernl when you upload the first version.  If this still isn’t clear, check out the example plugin on BitBucket or the version-matching sketch after this list.  The kernl.version file should contain one line, and on that line will be your version.  Once you have the versions figured out, zip up the plugin as if you were going to distribute it and upload it to Kernl.
  7. That’s it!  Distribute this copy of the plugin to your clients and they’ll receive private updates whenever you upload a new copy or push a new version to your BitBucket repository.
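
To make the version matching concrete, here is what the two files might look like for a hypothetical plugin at version 1.0.0.  The header is the standard WordPress plugin header in your plugin’s main PHP file:

<?php
/**
 * Plugin Name: My Example Plugin
 * Version: 1.0.0
 */

And kernl.version, a single line at the root of the repository:

1.0.0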

Pushing a New Version

With all the boilerplate setup complete, getting a new update out to your clients is super easy.  Follow the steps below and you’ll be good to go.

  1. Make code changes.  Whatever change you want to push out, go ahead and make it.
  2. Update your plugin’s version.  This is typically in the comment block at the top of your plugin’s main PHP file.
  3. Update the kernl.version file.  This should match the version in your plugin’s main file.
  4. Commit your changes.
  5. Push to the branch you specified in your plugin setup on Kernl.  If you didn’t specify a branch, that means you’ll need to push to master.
  6. Done.  If all went well, you’ll receive an email from Kernl that lets you know about the new version that was pushed.  You can also verify that the plugin was built by visiting Kernl and looking in the version list for your plugin.

Plugin build email

If you’ve ever wanted to modernize your WordPress development pipeline, I highly suggest you check out Kernl.  Automatic updates triggered by changes in your repository will save you tons of time and get bug fixes and updates out to your clients faster.

Using the Django Per-Site Cache with the Nginx HTTP Memcached Module

For a long time I’ve thought that the most interesting problems in my field are in scalability.  Some people might be more into slick interfaces and fast animations, but for me, scalability has continued to be my passion.  For a while, though, it was a unicorn: that unattainable thing I wanted to work on but couldn’t find anywhere to do.  That is, until I started work at Future US.

Future is a media company.  Originally they started in old media, focusing heavily on gaming and tech magazines.  Eventually the internet became prominent in everyday life, and more of their old media properties made the transition to the web.  The one that really matters to me, though, is PC Gamer.  I’ve been a huge fan of PC Gamer since I was about 7 years old.  I still have fond memories of getting demo discs in the mail with my subscription.

When I was hired at Future, it was to help facilitate the move of PC Gamer from its existing platform (WordPress) to Django.  Future had experienced success moving other properties to Django, so it made sense to do it with PC Gamer.  When it eventually came time to implement our caching layer, we thought about a lot of different ways that it could be done.  Varnish came up as an option, but we decided against it since nobody on the team had experience configuring it (and people elsewhere in the organization had experienced issues with it).  Eventually we settled on having Nginx serve pages directly from Memcache.  For us, this method works great because PC Gamer doesn’t have a lot of interaction (it’s almost completely consumption on the user’s end).  Anything that does require back-and-forth with the server is handled via JavaScript, which makes full page caching super easy to do.

The high-level architecture for PC Gamer.

So how does it all work?  The image above describes PC Gamer’s server architecture from a high level.  It’s pretty basic and works quite well for us.  We end up having two types of requests: cache hits and cache misses.  The flow for a cache hit is: request -> load balancer -> nginx -> memcache -> your browser.  The flow for a cache miss is: request -> load balancer -> nginx -> application server (django) -> (store page in cache) -> your browser.

Since we’re basically running a static site, deciding what content to cache is easy: EVERYTHING!

Cache all the things!

Luckily for us, Django already has a nice way of doing this: the per-site cache.  But it is not without its issues.  First of all, the cache keys it creates are insane.  We needed something a little simpler for our setup, so that Nginx could build the cache key of the current request on the fly.
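
For context, turning on Django’s per-site cache is just configuration; these are standard Django settings (the memcache address and TTL are examples):

# settings.py
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",  # example memcache host
    }
}

MIDDLEWARE_CLASSES = (
    "django.middleware.cache.UpdateCacheMiddleware",     # must be first
    # ... the rest of your middleware ...
    "django.middleware.cache.FetchFromCacheMiddleware",  # must be last
)

CACHE_MIDDLEWARE_SECONDS = 600  # example page TTL
CACHE_MIDDLEWARE_KEY_PREFIX = ""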

How It Works

The meat and potatoes of overriding Django’s per-site cache key comes in the `_generate_cache_key` function.

import hashlib

from django.conf import settings

def _generate_cache_key(request, method, headerlist, key_prefix):
    # Hash what Nginx hashes ($host$request_uri); method/headerlist are ignored.
    if key_prefix is None:
        key_prefix = settings.CACHE_MIDDLEWARE_KEY_PREFIX
    cache_key = key_prefix + request.get_host() + request.get_full_path()
    return hashlib.md5(cache_key).hexdigest()

To make things easier for Nginx to understand, we just take the host plus request URI and MD5 it.  Simple!
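
The snippet above doesn’t show how the override gets wired in.  One way to do it (an assumption on my part, not necessarily how our setup did it) is to monkey-patch django.utils.cache, the module the cache middleware looks the function up in:

from django.utils import cache as cache_utils

# Swap in the simplified key function before the cache middleware runs,
# e.g. from an AppConfig.ready() hook or early in settings.
cache_utils._generate_cache_key = _generate_cache_key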

On the Nginx side of things, the setup is equally simple.

        set            $combined_string "$host$request_uri";
        set_by_lua     $memcached_key "return ngx.md5(ngx.arg[1])" $combined_string;
 
        # 404 for cache miss
        # 502 for memcached down
        error_page     404 502 504 = @fallback;
 
        memcached_pass {{ cache.private_ip }}:11211;

All this setup does is take the MD5 of the host + request URI and then check whether that cache key exists in memcache.  If it does, we serve the content stored at that key; if it doesn’t, we fall back to our Django application servers and they generate the page.

That’s it.  Seriously.  It’s simple, extremely fast, and works for us.  Your mileage may vary, but if you have relatively simple caching requirements, I highly suggest looking into this method before reaching for something like Varnish.  It could help you remove quite a bit of complexity from your setup.

Getting around memory limitations with Django and multi-processing

I’ve spent the last few weeks writing a data migration for a large high-traffic website and have had a lot of fun trying to squeeze every bit of processing power out of my machine. While playing around locally, I can cluster the migration so it executes on fractions of the queryset. For instance:

./manage.py run_my_migration --cluster=1/10
./manage.py run_my_migration --cluster=2/10
./manage.py run_my_migration --cluster=3/10
./manage.py run_my_migration --cluster=4/10

All this does is take the queryset that is generated in the migration and chop it up into tenths (a sketch of the slicing follows below). No big deal. The part that is a big deal is that the queryset contains 30,000 rows. In itself that isn’t a bad thing, but a lot of memory- and CPU-heavy operations happen on each row. I was finding that when I tried to run the migration on our Rackspace Cloud servers, the machine would exhaust its memory and terminate my processes. This was a bit frustrating, because presumably the operating system should be able to make use of swap and just deal with it. I tried to make the clusters smaller, but was still running into issues. Even more frustrating was that this happened at irregular intervals: sometimes it took 20 minutes and sometimes it took 4 hours.
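
The slicing itself isn’t shown above; here is a minimal sketch of how a management command might carve up the queryset (the model name is hypothetical):

import math

def cluster_queryset(queryset, part, total):
    # Return the `part`-th of `total` roughly equal slices (1-indexed).
    count = queryset.count()
    size = int(math.ceil(count / float(total)))
    start = (part - 1) * size
    return queryset[start:start + size]

# e.g. --cluster=2/10 on a 30,000-row queryset -> rows 3000-5999
rows = cluster_queryset(MyModel.objects.all(), 2, 10)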

Threading & Multi-processing

My solution to the problem utilized the clustering ability I already had built into the program. If I could break the migration down into 10,000 small migrations, then I should be able to get around any memory limitations. My plan was as follows:

  1. Break down the migration into 10,000 clusters of roughly 3 rows a piece.
  2. Execute 3 clustered migrations concurrently.
  3. Start the next migration after one has finished.
  4. Log the state of the migration so we know where to start if things go poorly.

One of the issues with doing concurrency work in Python is the global interpreter lock (GIL). It makes writing code a lot easier, but doesn’t allow Python threads to execute in parallel on multiple CPU cores. However, it’s easy to skirt around if you spawn new processes, like I did.

Borrowing some thread pooling code here, I was able to get a pretty sweet script running in no time at all.

import sys
import os.path
 
from util import ThreadPool
 
def launch_import(cluster_start, cluster_size, python_path, command_path):
    import subprocess
 
    command = python_path
    command += " " + command_path
    command += "{0}/{1}".format(cluster_start, cluster_size)
 
    # Open completed list.
    completed = []
    with open("clusterlog.txt") as f:
        completed = f.readlines()
 
    # Check to see if we should be running this command.
    if command+"\n" in completed:
        print "lowmem.py ==> Skipping {0}".format(command)
    else:
        print "lowmem.py ==> Executing {0}".format(command)
        proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        output = proc.stdout.read() # Capture the output, don't print it.
 
        # Log completed cluster
        logfile = open('clusterlog.txt', 'a+')
        logfile.write("{0}\n".format(command))
        logfile.close()
 
 
if __name__ == '__main__':
 
    # Simple command line args checking
    try:
        lowmem, clusters, pool_size, python_path, command_path = sys.argv
    except ValueError:  # wrong number of arguments
        print "Usage: python lowmem.py <clusters> <pool_size> <path/to/python> <path/to/manage.py>"
        sys.exit(1)
 
    # Initiate log file.
    if not os.path.isfile("clusterlog.txt"):
        logfile = open('clusterlog.txt', 'w+')
        logfile.close()
 
    # Build in some extra space.
    print "\n\n"
 
    # Initiate the thread pool
    pool = ThreadPool(int(pool_size))
 
    # Start adding tasks
    for i in range(1, int(clusters) + 1):  # clusters are 1-indexed; include the last one
        pool.add_task(launch_import, i, clusters, python_path, command_path)
 
    pool.wait_completion()

Utilizing the code above, I can now run a command like:

python lowmem.py 10000 3 /srv/www/project/bin/python "/srv/www/project/src/manage.py import --cluster=" &

That command breaks the queryset up into 10,000 parts and runs the import 3 clusters at a time. This has done a great job of keeping the memory footprint of the import low while still getting some concurrency, so it doesn’t take forever.