Kernl Goes Beta!


In May of this year I launched the Kernl alpha, hoping that WordPress developers would be interested in it.  And interested they were!  Six months later, Kernl has over 65 users from all around the globe and a host of new capabilities that make WordPress plugin and theme development easier.  For instance, since launch we've added:

  • Continuous Integration with BitBucket
  • Continuous Integration with GitHub
  • Purchase Code Validation

But new features aren't all that make a service great.  For people to trust in something it must be reliable, and that's what the beta phase of Kernl is all about: improving reliability.  We've reached a point where Kernl provides enough value to the WordPress community that we can take some time to refactor code and add a lot more tests.

What does this mean for you?  Not much.  If we do our job right you won't notice anything.  The beta is still free, and everyone will get a big "heads up" before we start charging for the service.

Thank you to all of the alpha users who have made this possible.  Without you Kernl wouldn’t be where it is today.


Continuous Deployment of WordPress Plugins Using Kernl

One of the problems I've always had with WordPress plugin development is fitting it into a modern build pipeline. I really wanted to be able to merge a branch into master, build the zip file, and push the update out to my clients automatically. For the longest time I wasn't able to do this, so I built Kernl to enable a more modern approach to WordPress plugin development.

What is Continuous Deployment?

Continuous Deployment (or Continuous Delivery) is a software development strategy where you ship code frequently. Your pipeline is fully automated, so as soon as some event on your version control repository is triggered the deploy process starts. For me, that event is when I merge a pull request into master.

What is Kernl?

Kernl started out as a way to provide private plugin and theme updates for WordPress, which grew out of my frustration at having to update clients manually every time a small bug was patched. Once I had the updates working manually, the next step was automating everything. This is where “push to build” came in.

How Push To Build Works

Getting Started with Kernl

Getting push-to-build updates on your plugin or theme is pretty easy to set up with Kernl.

  1. Go to Kernl and sign up.  After you've logged in, click "Continuous Integration".
  2. Now connect BitBucket.  This will authorize Kernl to access your BitBucket account so that it can enable push-to-build functionality.
  3. The next step is adding a WebHook to BitBucket.  This tells BitBucket to send a message to Kernl after every code push.  To do this, go to your repository settings, scroll down to "Integrations", and click "WebHooks".  Set the new webhook to point at the webhook URL that Kernl provides.
  4. In order for Kernl to know when to build a new version of your plugin, it looks for a file named kernl.version in the root directory of your repository.  Go ahead and add this file now and commit it.  The kernl.version should contain a semantic version that looks like “1.0.1”.
  5. Next, you need to add a plugin.  In Kernl, click "Plugins" on the left and then click "Add Plugin" on the upper-right.  Fill out the name, slug, and description fields, then scroll to the bottom.  You should now be able to select from a list of repositories from your BitBucket account.  You can also choose which branch Kernl should build from.  The default is master, but it can be anything you want.  Select a repository now and press "Save".
  6. Next, you need to add the first version to Kernl manually.  Click the "versions" button for the plugin you just created, and then click "Add Version".  The most important part of this process is making sure that the version number in Kernl, in kernl.version, and in your plugin all match.  If you put 1.0.0 in the kernl.version file, make sure it matches in your plugin's main file, as well as in Kernl when you upload the first version.  If this still isn't clear, check out the example plugin on BitBucket, or the sketch just after this list.  The kernl.version file should contain one line, and on that line will be your version.  Once you have the versions figured out, zip up the plugin as if you were going to distribute it and upload it to Kernl.
  7. That's it!  Distribute this copy of the plugin to your clients and they'll receive private updates whenever you upload a new copy or push a new version to your BitBucket repository.
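To make the version matching in step 6 concrete, here is a minimal sketch of the two files that have to agree (the plugin name and file names are illustrative):

<?php
/*
Plugin Name: My Example Plugin
Description: Demonstrates Kernl version matching.
Version: 1.0.0
*/

And kernl.version, a single line containing the same version string:

1.0.0

When BitBucket notifies Kernl of a push, Kernl reads kernl.version and builds that version of the plugin from your repository.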

Pushing a New Version

With all the boilerplate setup complete, getting a new update out to your clients is super easy.  Follow the steps below and you’ll be good to go.

  1. Make code changes.  Whatever change you want to push out, go ahead and make it.
  2. Update your plugin's version.  This is typically in the comment document block at the top of your plugin's main file.
  3. Update the kernl.version file.  This should match the version in your plugin's main file.
  4. Commit your changes.
  5. Push to the branch you specified in your plugin setup on Kernl.  If you didn’t specify a branch, that means you’ll need to push to master.
  6. Done.  If all went well, you'll receive an email from Kernl letting you know about the new version that was pushed.  You can also verify that the plugin was built by visiting Kernl and checking the version list for your plugin.
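In practice, a release looks something like this from the command line (a sketch, assuming master is your build branch):

# Bump the version in the plugin header and in kernl.version, then:
git add .
git commit -m "Bump version to 1.0.1"
git push origin master

Kernl then picks up the webhook from BitBucket and builds the new version.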


If you’ve ever wanted to modernize your WordPress development pipeline, I highly suggest you check out Kernl.  Automatic updates triggered by changes in your repository will save you tons of time and get bug fixes and updates out to your clients faster.

Using the Django Per-Site Cache with the Nginx HTTP Memcached Module

For a long time I thought that the most interesting problems in my field were in scalability. Some people are more into slick interfaces and fast animations, but for me, scalability has continued to be my passion. For a while, though, it was a unicorn: that unattainable thing I wanted to work on but could never find a place to do it. That is, until I started work at Future US.

Future is a media company. They started in old media, focusing heavily on gaming and tech magazines. As the internet became prominent in everyday life, more of their old-media properties made the transition to the web. The one that really matters to me, though, is PC Gamer. I've been a huge fan of PC Gamer since I was about 7 years old; I still have fond memories of getting demo discs in the mail with my subscription.

When I was hired at Future, it was to help facilitate the move of PC Gamer from its existing platform (WordPress) to Django. Future had already moved other properties to Django successfully, so it made sense to do the same with PC Gamer. When it came time to implement our caching layer, we considered a lot of different approaches. Varnish came up as an option, but we decided against it since nobody on the team had experience configuring it (and people elsewhere in the organization had run into issues with it). Eventually we settled on having Nginx serve pages directly from Memcached. This method works great for us because PC Gamer doesn't have a lot of interaction (it's almost entirely consumption on the user's end). Anything that does require back-and-forth with the server is handled via JavaScript, which makes full-page caching super easy to do.

[Diagram: the high-level architecture for PC Gamer.]

So how does it all work? The diagram above describes PC Gamer's server architecture at a high level. It's pretty basic and works quite well for us. We end up having two types of requests: cache hits and cache misses. The flow for a cache hit is: request -> load balancer -> nginx -> memcache -> your browser. The flow for a cache miss is: request -> load balancer -> nginx -> application server (django) -> (store page in cache) -> your browser.

Since we’re basically running a static site, deciding what content to cache is easy: EVERYTHING!

Cache all the things!

Luckily for us, Django already has a nice way of doing this: the per-site cache. But it's not without its issues. First of all, the cache keys it creates are insane. We needed something simpler, so that Nginx could rebuild the cache key of the current request on the fly.
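For context, turning on the per-site cache in Django is just middleware plus cache settings. A sketch (Django 1.x-era MIDDLEWARE_CLASSES shown; the Memcached location is illustrative):

# settings.py
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',     # must be first
    'django.middleware.common.CommonMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',  # must be last
)

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

CACHE_MIDDLEWARE_SECONDS = 300      # how long a page stays cached
CACHE_MIDDLEWARE_KEY_PREFIX = ''    # keep keys bare so Nginx can rebuild them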

How It Works

The meat and potatoes of overriding Django's per-site cache key is the `_generate_cache_key` function.

import hashlib
from django.conf import settings

def _generate_cache_key(request, method, headerlist, key_prefix):
    if key_prefix is None:
        key_prefix = settings.CACHE_MIDDLEWARE_KEY_PREFIX
    # get_absolute_uri is our helper returning host + request URI,
    # mirroring the "$host$request_uri" string that Nginx hashes.
    cache_key = key_prefix + get_absolute_uri(request)
    return hashlib.md5(cache_key).hexdigest()

To make things easy for Nginx to recompute, we just take the URL and MD5 it. Simple!
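Django doesn't expose a setting for swapping this function out, so one way to wire in the override (a sketch; this assumes a monkey-patch applied once at startup) is:

# Run this at startup, e.g. from settings or an app's __init__.
# get_cache_key() and learn_cache_key() in django.utils.cache both call
# _generate_cache_key() internally, so patching the module global is enough.
from django.utils import cache as cache_utils
cache_utils._generate_cache_key = _generate_cache_key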

On the Nginx side of things, the setup is equally simple.

        set            $combined_string "$host$request_uri";
        set_by_lua     $memcached_key "return ngx.md5(ngx.arg[1])" $combined_string;
        # 404 for cache miss
        # 502 for memcached down
        error_page     404 502 504 = @fallback;
        memcached_pass {{ cache.private_ip }}:11211;

All this setup does is take the MD5 of the host + request URI and check whether that cache key exists in Memcached. If it does, we serve the content stored at that key; if it doesn't, we fall back to our Django application servers and they generate the page.
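The @fallback location is just a proxy to the Django application servers; a sketch (the upstream name is illustrative):

location @fallback {
    # Cache miss or Memcached outage: let Django render the page.
    # Django's cache middleware stores the result for the next request.
    proxy_pass http://django_backend;
}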

That's it. Seriously. It's simple, extremely fast, and works for us. Your mileage may vary, but if you have relatively simple caching requirements, I highly suggest looking into this method before reaching for something like Varnish. It could remove quite a bit of complexity from your setup.

Getting around memory limitations with Django and multi-processing

I've spent the last few weeks writing a data migration for a large, high-traffic website, and I've had a lot of fun trying to squeeze every bit of processing power out of my machine. When running locally, I can cluster the migration so that it executes on fractions of the queryset. For instance:

./manage.py run_my_migration --cluster=1/10
./manage.py run_my_migration --cluster=2/10
./manage.py run_my_migration --cluster=3/10
./manage.py run_my_migration --cluster=4/10

All this does is take the queryset generated in the migration and chop it up into tenths (the slicing is sketched below). No big deal. The part that is a big deal is that the queryset contains 30,000 rows, and a lot of memory- and CPU-heavy operations happen on each row. I was finding that when I ran the migration on our Rackspace Cloud servers, the machine would exhaust its memory and terminate my processes. This was frustrating, because presumably the operating system should be able to make use of swap and just deal with it. I tried making the clusters smaller, but still ran into issues. Even more frustrating, the failures happened at irregular intervals: sometimes after 20 minutes, sometimes after 4 hours.
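The migration command itself isn't shown in this post, but the slicing behind --cluster looks roughly like this (a sketch; the names are hypothetical):

def get_cluster(queryset, cluster_arg):
    # "--cluster=2/10" means: take the 2nd of 10 roughly equal slices.
    n, m = [int(x) for x in cluster_arg.split("/")]
    total = queryset.count()
    size = total / m  # Python 2 integer division
    start = (n - 1) * size
    end = total if n == m else start + size  # last slice absorbs the remainder
    return queryset[start:end]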

Threading & Multi-processing

My solution to the problem utilized the clustering ability I already had built into the program. If I could break the migration down into 10,000 small migrations, then I should be able to get around any memory limitations. My plan was as follows:

  1. Break down the migration into 10,000 clusters of roughly 3 rows a piece.
  2. Execute 3 clustered migrations concurrently.
  3. Start the next migration after one has finished.
  4. Log the state of the migration so we know where to start if things go poorly.

One of the issues with doing concurrent work in Python is the global interpreter lock (GIL). It makes writing code a lot easier, but it doesn't allow Python threads to run in parallel. However, it's easy to skirt around if you spawn new processes instead, like I did.

Borrowing some thread pooling code from here, I was able to get a pretty sweet script running in no time at all.

import sys
import os.path
import subprocess

from util import ThreadPool

def launch_import(cluster_start, cluster_size, python_path, command_path):
    command = python_path
    command += " " + command_path
    command += "{0}/{1}".format(cluster_start, cluster_size)

    # Open completed list.
    completed = []
    if os.path.isfile("clusterlog.txt"):
        with open("clusterlog.txt") as f:
            completed = f.readlines()

    # Check to see if we should be running this command.
    if command + "\n" in completed:
        print " ==> Skipping {0}".format(command)
        return

    print " ==> Executing {0}".format(command)
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    output, errors = proc.communicate()  # Capture the output, don't print it.

    # Log the completed cluster.
    logfile = open('clusterlog.txt', 'a+')
    logfile.write(command + "\n")
    logfile.close()

if __name__ == '__main__':
    # Simple command line args checking.
    try:
        lowmem, clusters, pool_size, python_path, command_path = sys.argv
    except ValueError:
        print "Usage: python lowmem.py <clusters> <pool_size> <path/to/python> <path/to/manage.py command>"
        sys.exit(1)

    # Initiate the log file.
    if not os.path.isfile("clusterlog.txt"):
        open('clusterlog.txt', 'w+').close()

    # Build in some extra space.
    print "\n\n"

    # Initiate the thread pool.
    pool = ThreadPool(int(pool_size))

    # Start adding tasks; each task runs one clustered import.
    for i in range(1, int(clusters) + 1):
        pool.add_task(launch_import, i, clusters, python_path, command_path)

    # Wait for all queued tasks to finish.
    pool.wait_completion()

Utilizing the code above, I can now run a command like:

python lowmem.py 10000 3 /srv/www/project/bin/python "/srv/www/project/src/manage.py import --cluster=" &

This breaks the queryset up into 10,000 parts and runs the import three clusters at a time. It has done a great job of keeping the memory footprint of the import low while still providing enough concurrency that it doesn't take forever.

Two Frameworks

The past couple of months have found me working diligently on work stuff, but also consistently dropping an hour a day on my current side project.  It just so happens that the side project and my actual work share the same language (Python) and framework (Django).  This has been nice because it's given my brain a moment to relax with regard to learning new material, but at the same time I feel stagnant.

Django is my framework of choice.  I know it inside and out, can bend it to my will, and can work extremely fast in it.  However, I'm not blind to the fact that the popularity of the old monolithic frameworks (Rails, Django, Cake, etc.) for new projects is waning.  People these days are starting new projects with a service-oriented architecture in mind.  They're using Node.js with Express on the back end for an API, and Angular on the front end to create a nice single-page app.  I've done this sort of development extensively before, but I'm out of practice.  So I've come to a fork in the road.

Over the years I've come to realize that I can only hold two frameworks in my mind at one time.  It doesn't matter whether or not they're written in different languages (those actually seem to stick with me more easily for some reason); two frameworks is the max I can handle.  So my choices are as follows: 1) learn Android, or 2) get good at Node.

I've made one Android app before, when I worked at a marketing firm.  It was fun, and I enjoyed not doing web stuff for once.  I found Java overly verbose, but as long as you stayed within the "modern Java" lines it was fine.  As for Node, I already know it; I'm just out of practice.  I feel like it would be valuable to become an expert in, but sometimes I feel burnt out on the web.

After a lot of deliberation, I think I'm going to move forward with Android development by making an Android app for RedemFit.  It'll give me a chance to break out of web development for a while, and hopefully it will become something I enjoy as much as the web.

Learning Clojure: Part 2

In part 1 of my “Learning Clojure” series, I created a simple program to calculate salary based on how many years someone worked. For this post, I’m going to be attempting something a bit more complicated.

Project Gutenberg

One of my favorite websites in the entire world is Project Gutenberg (PG). PG is an archive of books that have passed into the public domain, which makes it a great resource for text-mining data. I use it almost every time I need some words to parse, and you should too! So why does this matter right now? I'm glad you asked.


Given how simple the last program was, I decided to take this one up a notch. It's going to involve fetching a file, writing it to disk, reading the file, and processing command-line args. In order, here's what the program needs to do:

  1. Validate command-line args – We'll accept two arguments: the word that should be counted, and a URL that points to a .txt file hosted on my web server (Project Gutenberg doesn't like crawlers, apparently) for processing.
  2. Download the file – It could be large and the download might fail. We'll need to be careful here.
  3. Split the file into a vector – Split the file on whitespace and load it into a vector.
  4. Print – Print to standard output how many words were found. If none were found, make that known.

The Program

The full source code is available online. It looks pretty simple, but I did expand my Clojure knowledge quite a bit with this one. Some of the things I did:

  • Used a 3rd party library
  • Messed around with vectors (split word data) and sequences (args).
  • Wrote to a file.
  • Refactored constantly

I do want to highlight one bit of code that I wrote, because it's pretty straightforward but does a lot of stuff.

(defn process
    [url word]
    ;; get-source-file, write-file, log, and filename are helpers/defs
    ;; defined elsewhere in the project.
    (write-file (get-source-file url))
    (log "Info" "Processing File...")
    ;; Split the downloaded text on whitespace, then count occurrences of word.
    (let [data (cljstr/split (slurp filename) #"\s+")]
        (log "Result" (str (count (filter #{word} data)) " occurrences of '" word "'"))))

My next program needs to be more complicated from a data perspective, so that I’m forced to use things like “map”, “reduce”, and other functional elements on data sets.

Learning Clojure: Part 1

Clojure is a functional programming language based on Lisp and written to run on top of the JVM. I’ve tried learning it in the past, but have failed mostly due to biting off more than I could chew. But not this time! I’m taking my time, reading lots of code, and doing 1st year computer science assignments with it. I figure this worked well when I first learned how to program, so it will probably work well now.

The Return to Trivial Programs

After spending the past 6 years neck-deep in non-trivial professional programming, I'm returning to trivial toy programs to learn Clojure. My first task is to write a program that takes user input from the terminal and calculates a salary for the number of years the user inputs. More specifically:

  • Starting salary is $1000
  • Salary doubles every year
  • Validate input to make sure it is a number.
  • Write history to a file called salary_history.txt
  • Use the format: [years_working]:$[salary]

All in all it's pretty straightforward. I could currently write this program in a handful of different languages (Python, PHP, Java, JavaScript [Node], Ruby), but I'm struggling with one bit of the Clojure implementation.

(ns salary.core)

(defn get-integer
    "Parses the given string as an integer, else returns false."
    [input]
    (try
        (#(Integer/parseInt %) input)
        (catch Exception e false)))

;; Incomplete.  Will eventually write to a file.
(defn output
    "Takes the console input and error message and outputs them to file and console."
    [console-input message]
    (println (str console-input ": " message)))

;; ????? WTF DO I DO HERE
(defn calculate-salary
    [years salary])

(defn -main
    [& args]
    (println "How many years do you want to work?")
    (let [user-input (read-line)]
        (let [years (get-integer user-input)]
            (if years
                (calculate-salary (- years 1) 1000)
                (output user-input "This is NOT an integer.")))))

The Python implementation of calculate salary would look something like this:

def calculate_salary(years):
    salary = 1000
    for i in range(years-1):
        salary = salary * 2
    return salary

But in Clojure, things are a bit more complicated. In Clojure, values are immutable: I can't just loop over the years and keep doubling the salary while storing it in the same variable. I need to use recursion. Or reduce. Or map. Hell, I don't know. I need to use something functional, lest the Clojure experts laugh at me. I need a function that doubles whatever value comes into it and then returns, and I need to call that function up to N times (where N is the number of years the person enters).

Any ideas?

With the help of Ryan (below), I came up with:

(defn calculate-salary
    [years salary]
    (if (= years 0)
        salary
        (calculate-salary (- years 1) (* salary 2))))
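For what it's worth, the same computation can also be written without explicit recursion. A sketch using iterate:

;; Build the lazy sequence salary, salary*2, salary*4, ...
;; and take the value after `years` doublings.
(defn calculate-salary
    [years salary]
    (nth (iterate #(* % 2) salary) years))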

Shuttering Side Projects

Over the past few years I've slowly accumulated some big side projects. They weren't done for clients, just for myself. At some point, maintaining these side projects stopped being fun and started hindering the creative juices. I have other things I want to work on, but having these zombie side projects around feels too much like an albatross around my neck.

After much deliberation, I've decided to shut down two of my large side projects: BookCheaply and Smooth Bulletin. I really believe both of these projects could do someone some good, but they were learning projects for me and I don't see them moving forward anymore. Effective immediately, I'm disabling their Apache configs, backing up their databases, tarring it all together, and putting it somewhere safe. I'll keep access to the Git repos, but eventually I'll clone a copy of those too and archive them. If I don't get these projects out of the way completely, I feel like I'll be tempted to keep working on them.

Shuttering these projects marks a transition for me: moving from Python and Django for side projects to Node.js, Express, and Angular. While I'm apprehensive about abandoning my go-to stack for side projects, I'm excited to learn the nooks and crannies of Node (and I still use Python/Django for my day job anyway).

Here’s to the future!
