Github Auto Pull on Post Receive

There are many different approaches to handling the development, source control, deployment and testing of mid-scale to large-scale web applications. Internally things may differ (you can work with waterfall, Scrum, agile or other methodologies), but externally the setup is pretty much the same: one or more local development servers, a testing server (I like to call it the playground) and your production server, where your website is live and available to the whole world.

Today’s post is about automatically updating the code on your testing server directly from source control. This may sound tempting, but you should never do anything like this on your production servers. Production updates might be covered in one of our next blog posts, but production is one place where you’re better off deploying manually rather than automating it.

Source Control

You’ve probably dealt with version control before, whether it’s Subversion, Git or Mercurial. The nice thing about these systems is that they have hooks, similar to WordPress actions. The one we’re going to look at today is known as post-receive or post-commit, depending on the version control system you’re using. Quoting from the Git hooks manual, post-receive:

This hook is invoked by git-receive-pack on the remote repository, which happens when a git push is done on a local repository. It executes on the remote repository once after all the refs have been updated.

In other words, you can carry out some specific work as soon as a push is received by your git repository. In today’s tutorial we’ll run a pull command on our testing server whenever that happens. With this setup in your workflow, whenever a set of commits is pushed to your git repository, the changes will be (almost) immediately reflected on your playground server.

You should get familiar with Github, as that’s what we’ll use today; as you will see later, Github can request a URL of your choice whenever a push is received. If you’re running Subversion or Mercurial the principles are similar, but read the relevant documentation before messing around with your repositories, and make sure you’re experimenting on a sandbox repository rather than a real-life project.
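For comparison, if you host the repository yourself instead of on Github, the same idea could be expressed directly as a Git post-receive hook. Git feeds the hook one line per updated ref on standard input, in the form `<old-value> <new-value> <ref-name>`. The sketch below is a hypothetical hook body and not part of the Github setup in this post; it creates the same kind of empty signal file the rest of the tutorial relies on, and both the signal file path and the choice of the master branch are assumptions.

```shell
#!/bin/sh
# Hypothetical hooks/post-receive body for a self-hosted repository (not used with Github).
# Git writes one "<old-value> <new-value> <ref-name>" line per updated ref to stdin.

# Assumed location of the signal file that the testing server watches for.
SIGNAL_FILE=${SIGNAL_FILE:-/home/kovshenin/git-pull-requests/example.org}

handle_push() {
    while read -r _old _new refname; do
        # Only react to pushes to master (an assumption; change the branch as needed).
        if [ "$refname" = "refs/heads/master" ]; then
            : > "$SIGNAL_FILE"   # create (or truncate) the empty signal file
        fi
    done
}
```

In a real hook file you would end the script with a line that simply calls `handle_push`, so it consumes the ref list Git passes in.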

Shell Scripts, Linux Cron & PHP

By now you have probably realized that this is not a beginner’s topic, which is why I have to mention this. Understanding the theory of how it’s done is one thing, but if you’re about to start making changes to your repositories, hooking, tweaking and writing code for this, please make sure you’re familiar with writing and executing shell scripts, dealing with file and directory permissions in Linux, comfortable editing your system cron files (without any GUI) and, of course, have some knowledge of file I/O in PHP. It goes without saying that you need SSH access to your hosting account and the privileges to edit certain system configuration files. I’ll be assuming that you’re running a VPS with sudo/root access. I am.

To sum it all up, our goals for today are:

  1. Create a flexible file structure for dealing with code updates
  2. Create a shell script that will pull new code from the repository
  3. Execute that shell script using your system cron
  4. Create a PHP script that will signal when a post-receive was issued
  5. Have Github call that PHP script whenever a post-receive is issued

File Structure, User Permissions and Cron

As I mentioned earlier, you should be familiar with file permissions, owners and groups. The tricky part is that Github can request a URL you have specified, but the PHP script at that URL will most likely be executed by your Apache or nginx user, typically www-data. Letting the www-data user carry out your code updates is a major security risk, so we won’t give it any extra privileges to do that. Instead, we will delegate the update job to a different Linux user with read and write access to your source code.

We’ll do that with a special directory that is readable and writable by any user on the Linux system, including www-data. Upon receiving a post-receive request, the PHP script will simply create an empty file in that directory, signalling that there’s something new in our git repository and a git pull is required.

We’ll have a cron job running every few minutes that checks whether that specific file exists in the special directory and, if it does, removes the file and runs the update on behalf of a user with the right access. Simple as that. Right?

I’ll be working on my Linux box (my playground server) under my own username with sudo access; you should use your own. Here’s what my directory structure looks like:

  • /home/kovshenin/ is my home directory
  • /home/kovshenin/git-pull-requests/ is that special read and write directory
  • /home/kovshenin/www/example.org/ is the www root of my project which is under source control

Normally I’d browse to /home/kovshenin/www/example.org/ and run git pull to grab the latest code from the repository, but I won’t have to do that any longer.

Creating the Pull Request File

In this step we’re going to create a PHP script that creates a new file in our pull requests directory. Github’s post-receive hook will call this script, so make sure it’s publicly accessible. Supposing my project is available at example.org, I’d put the script in /home/kovshenin/www/example.org/github-post-receive.php to make it available to Github at example.org/github-post-receive.php (we’ll deal with that part later).

The contents of the file are quite simple. As I mentioned, we just need to create an empty file:

<?php
// Signal the cron job that a pull is needed by creating an empty file.
$file = fopen( '/home/kovshenin/git-pull-requests/example.org', 'w+' );
fclose( $file );

We’re using the fopen function in w+ mode, which opens the file for reading and writing, truncates it to zero length and creates it if it doesn’t exist (plain w would work just as well here). Since we won’t be writing anything into the file, we close it immediately. Don’t forget the opening php tag at the top of the script file ;)

At this stage, in theory, whenever a push is issued to our git repository, the github-post-receive.php script is executed by www-data, creating a new file called example.org in the git-pull-requests directory. Let’s give it a test drive by setting up the post-receive hook on Github.

Post Receive on Github

Setting this up is really easy with Github’s interface. All you have to do is browse to your project’s Admin screen and locate Post-Receive URLs under Service Hooks.

(Screenshot: Github Post-Receive Service Hook Configuration)

You will have an option to specify one or more URLs for post-receive. That’s where you enter http://example.org/github-post-receive.php, replacing example.org with your own project domain, of course. Note that Github will send a POST request to that page with a payload describing the repository and the latest commits.

You can test the service hook from Github, but it’s better to save your settings, make a minor commit and push it to your repository. At that point, a new file called example.org should appear inside the git-pull-requests directory on your playground server. Right? Congratulations!
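You can also imitate Github’s request yourself while debugging. Since github-post-receive.php never reads the payload, an empty POST should be enough to exercise it. The function below is a hypothetical helper, not something Github provides; it assumes curl is installed and that the script is reachable at the URL you pass in.

```shell
# Hypothetical helper: send an empty POST to the post-receive URL, roughly as Github would
# (Github also sends a payload, which our PHP script ignores).
simulate_post_receive() {
    url=$1
    # -f: fail on HTTP errors, -sS: quiet but still show errors, -d '': POST with an empty body
    curl -fsS -d '' "$url"
}
```

For example, `simulate_post_receive http://example.org/github-post-receive.php && ls -l /home/kovshenin/git-pull-requests/` should show a fresh example.org file if the web side is working.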

Setting up Your Cron Job

Now that a pull request file is being created, you’ll need a cron job that fires a shell script to check whether that file exists and, if so, runs git pull in your source directory. Where your cron files live depends on your Linux distribution and version. I’m using Ubuntu Server, which has a neat directory called /etc/cron.d/ where I can create a new cron configuration file. I’ll call it git-pull-requests, and here’s its content:

*/10 * * * * kovshenin /home/kovshenin/git-pull-requests/fetch.sh

You should read up on cron if you’re unfamiliar with the snippet above. I’m basically running the fetch.sh shell script (which we haven’t written yet) located in my git-pull-requests directory, once every ten minutes, on behalf of the kovshenin user. Also note that you need sudo privileges to edit files in /etc/cron.d/.

Once that is done, restart your cron daemon (on Ubuntu, cron normally notices changes in /etc/cron.d/ on its own, but a restart doesn’t hurt):

$ sudo service cron restart

That’s it for time scheduling. We now know that our fetch.sh shell script will be fired once every ten minutes and we also know that the script doesn’t yet exist, so let’s go ahead and create it.
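To see what `*/10` means in practice: in the minute field it matches every minute from 0 to 59 that is a multiple of ten, so the job runs at :00, :10, :20 and so on, and a push can wait up to roughly ten minutes before it is picked up. The tiny helper below is hypothetical and only mimics that step arithmetic; cron itself does not use it.

```shell
# Hypothetical helper: list the minutes matched by "*/STEP" in cron's minute field (0-59).
cron_step_minutes() {
    step=$1
    minute=0
    out=""
    while [ "$minute" -le 59 ]; do
        out="$out $minute"
        minute=$((minute + step))
    done
    echo "${out# }"   # strip the leading space
}
```

Running `cron_step_minutes 10` prints `0 10 20 30 40 50`, which are the minutes of each hour at which fetch.sh fires.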

The Fetch Shell Script

Don’t confuse the script’s name with the git fetch command; it does something different. I didn’t call it pull for a reason: the shell script will not pull unless there’s a signal to do so, and by signal I mean the existence of the example.org file in our requests directory.

So change to your git-pull-requests directory and create a new file called fetch.sh. We’ll use a simple if statement to check whether the file exists and, if it does, remove it and run git pull on our source code.

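#!/bin/sh
# Only pull when github-post-receive.php has left a signal file.
if [ -f /home/kovshenin/git-pull-requests/example.org ]
then
    # Remove the signal first, so a push arriving during the pull triggers the next run.
    rm -f /home/kovshenin/git-pull-requests/example.org
    cd /home/kovshenin/www/example.org && git pull
fi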

Save the file and make sure it’s executable. Also make sure that your git-pull-requests directory has mode 0777, so that any user on your Linux server can read and write to it. Here’s how you do both:

$ chmod +x /home/kovshenin/git-pull-requests/fetch.sh
$ chmod 0777 /home/kovshenin/git-pull-requests
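One caveat with 0777: since fetch.sh lives in a directory everyone can write to, www-data could delete or replace it, and cron would then run the replacement as kovshenin. A hedged alternative is to set the sticky bit (mode 1777). In a sticky directory only a file’s owner, the directory’s owner or root can delete or rename entries, so www-data can still create its signal file and kovshenin, as the directory owner, can still remove it. Moving fetch.sh out of the shared directory is another option. The helper below is hypothetical and assumes GNU stat, as found on Ubuntu.

```shell
# Hypothetical helper: make a directory world-writable but sticky (mode 1777),
# then print its resulting mode in octal using GNU stat.
make_signal_dir() {
    dir=$1
    mkdir -p "$dir"
    chmod 1777 "$dir"
    stat -c '%a' "$dir"
}
```

With this post’s layout, `make_signal_dir /home/kovshenin/git-pull-requests` should print `1777`.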

That’s it for our shell script, nice and clean. If you’re managing several projects on your testing server, you can use the same fetch script for all of them; just give each project its own signal file name and add a matching if block.
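Instead of one if block per project, one way to generalize fetch.sh is to treat every file in the requests directory as a signal whose name matches a project directory under www. The sketch below assumes that naming convention as well as the two directory variables; it skips fetch.sh itself and any signal without a matching project directory.

```shell
#!/bin/sh
# Hypothetical multi-project variant of fetch.sh.
# Assumed convention: a signal file named NAME means "git pull in $WWW_DIR/NAME".
SIGNAL_DIR=${SIGNAL_DIR:-/home/kovshenin/git-pull-requests}
WWW_DIR=${WWW_DIR:-/home/kovshenin/www}

# Pull the latest code for one project, in a subshell so the cd doesn't leak.
update_project() {
    (cd "$1" && git pull)
}

process_signals() {
    for signal in "$SIGNAL_DIR"/*; do
        [ -f "$signal" ] || continue               # no matches, or not a regular file
        name=$(basename "$signal")
        [ "$name" = "fetch.sh" ] && continue       # never treat the script itself as a signal
        [ -d "$WWW_DIR/$name" ] || continue        # ignore signals with no matching project
        rm -f "$signal"                            # remove first, so a push during the pull re-signals
        update_project "$WWW_DIR/$name"
    done
}
```

To use it, end the script with a call to `process_signals` and keep the same cron line as before.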

Testing the Monster Out

Okay, so we’ve basically set everything up. We’ve got a github-post-receive.php script that signals when an update is needed by creating an empty file called example.org in our pull requests directory. We’ve got a cron job set to run our fetch script. We’ve got the fetch script set to look for signal files and carry out the updates. Where do we start testing?

The easiest way is to run your shell script manually and see if it generates any errors. Then create a fake signal file called example.org, run the script again, and check that it runs git pull and cleans up the signal file. Second, recheck your Github post-receive hook: push something to the repository and see if a signal file was created. Run your fetch script and check whether the source got updated and the signal file was deleted.
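The signal-file part of that checklist can be scripted. This hypothetical helper creates a fake signal file, runs the fetch command once and reports whether the file was consumed. It does not verify the git pull itself, so still check `git log` in the project directory afterwards.

```shell
# Hypothetical test helper: raise a fake signal, run the fetch command once,
# and check that the signal file was cleaned up.
# Usage: check_fetch SIGNAL_FILE COMMAND [ARGS...]
check_fetch() {
    signal=$1
    shift
    : > "$signal"                      # fake signal, as github-post-receive.php would create
    "$@" || { echo "FAIL: $* exited with an error"; return 1; }
    if [ -e "$signal" ]; then
        echo "FAIL: signal file still present"
        return 1
    fi
    echo "OK: signal file consumed"
}
```

With this post’s layout that would be `check_fetch /home/kovshenin/git-pull-requests/example.org /home/kovshenin/git-pull-requests/fetch.sh`, run as the kovshenin user so the pull happens with the same permissions cron uses.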

The last test is to remove all the signal files, create a few commits, push them to your git repository, then sit and wait. Your testing server should update itself within ten minutes of your push. Congratulations!

Conclusion

We’re pretty much done here, as you can see. Again, please note that this is not a technique you would use on your production servers. Production servers should be updated manually, usually by checking out tagged releases with git checkout rather than pulling from the master branch.

If you’ve got quite a few team members working on one project, you should probably create a testing/playground branch and point your automatic updates at that instead of the master branch. That way, you and your team members can commit and push as often as you need to, then merge into the playground branch to get a single update on your testing server. This also reduces the load on the testing server when pushes are very frequent.
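Pointing the updates at a playground branch mostly means changing the pull command in fetch.sh. A minimal sketch, assuming the remote is called origin and the branch playground (both are assumptions), and that the working copy on the testing server is already checked out on that branch, since git pull merges into whatever branch is currently checked out:

```shell
# Hypothetical helper: update a working copy from one specific branch on origin.
# The subshell keeps the directory change local to this call.
pull_branch() {
    dir=$1
    branch=$2
    (cd "$dir" && git pull origin "$branch")
}
```

In fetch.sh, the line `cd /home/kovshenin/www/example.org && git pull` would then become `pull_branch /home/kovshenin/www/example.org playground`.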

After fine-tuning all the scripts, setting up more auto-update channels for your other projects on the same playground server and thoroughly testing how it all works, you should be able to forget about doing anything manually on the testing server. Even loading a database fixture can be part of the update process; it’s just a matter of tweaking your scripts.

So what do you think about this whole process? Is it worth the trouble of setting it up? Or are you comfortable pulling code updates to your testing server manually? Or perhaps via FTP? ;) Tell us what you think in the comments section below! Hope you enjoyed the read, and stay tuned to our RSS feed for more geeky stuff.