Google Analytics API in WordPress

I’m pretty sure you all know what Google Analytics is, but not everybody is aware that it comes with a Data Feed API which can be used to grab different sorts of valuable information from your Analytics profile.

In this post we’ll talk about the API, authentication and firing requests to the API using the WordPress HTTP methods, about understanding and parsing the response and finally, we’ll go through some examples of how the techniques can be used to get interesting information from your Analytics account straight into your WordPress site.

Getting Started

Before you get started, you should be familiar with the WordPress HTTP API, with functions like wp_remote_get and wp_remote_retrieve_body. You should have a good idea about what XML is and how to use the SimpleXML library (PHP 5) to parse the response from Google, but if you don’t, no worries: it’s easy enough to copy and paste your way through.

The Google Analytics Data API

Google currently provides two APIs for Analytics, the Management API and the Data Export API. Both are part of Google Labs, meaning they’re in an active development stage and can be changed (or even deprecated) from time to time. So there’s no real guarantee that the things we’ll write today will keep working one year from now.

If you look at the getting started page in the API docs, you’ll see that you need a Google Account (which you hopefully have) and Google Analytics set up with a profile for your WordPress site that’s got some data (that’s your GA-XXX string). Also take good note of the Quota section and make sure you’re caching the results wherever possible. We’ll talk about caching at the end of this tutorial.

Authorization

Before we can fire requests at the Google Analytics API, we need to authorize ourselves. Google currently offers three authorization methods: ClientLogin, AuthSub and OAuth. You can learn more about all three methods in the Authentication and Authorization for Google APIs documents.

We’ll use ClientLogin in this tutorial for simplicity, which is not as safe as the others but allows you to get an authentication token by providing a Google Account login and password to Google’s ClientLogin URL. The token can then be used in our requests to the Analytics API. To get a token we’ll use cURL from our command line.

You shouldn’t have any problems executing this command on any of the major Linux distributions (having cURL installed) or Mac OS X. Windows might cause some trouble, but if you manage to get cURL working, you can drop the awk part and simply ignore the returned SID arguments from Google. Here’s the command where you’ll have to replace the e-mail and password with your own, and make sure that the Google Account has got access to your Analytics profiles:

curl https://www.google.com/accounts/ClientLogin -s \
 -d Email=yourname@gmail.com \
 -d Passwd=password \
 -d accountType=GOOGLE \
 -d service=analytics | awk '/Auth=.*/'

The command should return an authentication string that looks something like this:

Auth=DQAAAMQAAAARqOjsWH2S-RR2J0jNRWS2KjmqC0xkCPkUNSAOdmwAy5bGAre9dwAe ...

[Screenshot: Google Account Authentication with cURL]

Copy the full authentication string and store it in a variable somewhere in your code. I used a variable called $auth_string to store my Google Account authentication token. We’ll use that later when firing requests at the Analytics API.
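If awk isn’t available (on Windows, for example), the same filtering can be done in PHP once you have the raw ClientLogin response. This is only a sketch: my_ga_extract_auth_line is a made-up helper name and the sample response is shortened and invented, but the one-pair-per-line layout (SID, LSID, Auth) is what ClientLogin returns.

```php
<?php
// Hypothetical helper: pull the "Auth=..." line out of a ClientLogin response body.
// The body is plain text with one KEY=value pair per line (SID, LSID, Auth).
function my_ga_extract_auth_line( $body ) {
    foreach ( preg_split( '/\r\n|\r|\n/', trim( $body ) ) as $line ) {
        if ( strpos( $line, 'Auth=' ) === 0 ) {
            return $line; // Same "Auth=..." form that $auth_string holds
        }
    }
    return null; // No token found, for example after a failed login
}

// Invented sample response, shortened for readability
$sample      = "SID=abc\nLSID=def\nAuth=xyz123\n";
$auth_string = my_ga_extract_auth_line( $sample );
```

Inside WordPress you would probably feed it the body of a POST request to the ClientLogin URL, but pasting the token by hand as described above works just as well for experiments.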

Making Requests

Before making any requests, we should decide where and when to fire them. It really depends on what kind of data you’d like to get from the API. For example, if you’d like to get the number of visits to the particular post somebody’s viewing, you should go for the wp action and run a check for is_single before querying Google. If you simply want to see some figures inside your admin panel you can use the admin_init action, and so on.

We’ll use the init hook here as a one-size-fits-all choice and for brevity, but note that it’s probably not the best place depending on the purpose, since init is fired upon every request, so we’ll see the results in any case. Let’s create a function in your theme’s functions.php file (or a plugin) that would run during init:

add_action( 'init', 'my_google_analytics_experiments' );
function my_google_analytics_experiments() {
    // Rest of the code goes here
}

If you’re using PHP 5.3 or above, you can even write that cleaner and more l33t, like this:

add_action( 'init', function() {
    // Rest of the code goes here
} );

Looks like JavaScript, eh? Don’t use that in public themes and plugins though ;) What we’ll do next is make sure our authentication token is available, create the request headers and query arguments, fire a request to the Google Analytics API and simply output the response as Google Analytics returns it. So inside our function we’ve got something like this:

$auth_string = 'Auth=DQAAAMQAAACgjdMY-JrKkN6uwWv ...'; // Replace with your Auth token
$headers = array(
    'GData-Version' => 2,
    'Authorization' => 'GoogleLogin ' . $auth_string
);

// These will be different for different requests
$query_args = array(
    'ids' => 'ga:12345', // Replace this with your GA profile
    'metrics' => 'ga:visits',
    'start-date' => date( 'Y-m-d', mktime( 0, 0, 0, date( 'm' ) - 1, date( 'd' ),   date( 'Y' ) ) ),
    'end-date' => date( 'Y-m-d' ),
    'max-results' => 50
);

So these are the authentication string, the headers and the query arguments to the Analytics API. Query arguments are somewhat tricky, especially when it comes to memorizing them. In this case we’re getting the number of visits between the start and end date, where the end date is set to “today” while the start date is set to one month ago. Don’t forget to replace the ids value with your Google Analytics profile ID.
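The nested date and mktime calls are a bit hard to read, so here is a sketch of the same “one month back” range with DateTimeImmutable. The helper name my_ga_last_month_range is made up, and DateTimeImmutable needs PHP 5.5 or newer, which is an assumption about your server rather than something this tutorial requires.

```php
<?php
// Hypothetical helper: returns start/end dates in the Y-m-d format used above.
// $today is passed in so the result is easy to verify with a fixed date.
function my_ga_last_month_range( DateTimeImmutable $today ) {
    return array(
        'start-date' => $today->modify( '-1 month' )->format( 'Y-m-d' ),
        'end-date'   => $today->format( 'Y-m-d' ),
    );
}

$range = my_ga_last_month_range(
    new DateTimeImmutable( '2011-05-15', new DateTimeZone( 'UTC' ) )
);
// $range['start-date'] is '2011-04-15', $range['end-date'] is '2011-05-15'
```

Both this and the mktime version roll over at month ends (one month before March 31 lands in early March, since February 31 doesn’t exist), which is usually harmless for a rough monthly figure.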

[Screenshot: Data Feed Query Explorer]

You can use Google’s Data Feed Query Explorer to get some insight into which query variables are available and how they work, as well as the data feed reference to learn about metrics and dimensions, sorting and filtering. To finally create the request and grab the response, we use http_build_query, wp_remote_get and wp_remote_retrieve_body:

$url = 'https://www.google.com/analytics/feeds/data?' . http_build_query( $query_args );
$response = wp_remote_get( $url, array( 'headers' => $headers, 'sslverify' => false ) );

// Parse the response
if ( ! is_wp_error( $response ) ) {
    $body = wp_remote_retrieve_body( $response );
    // $body will (hopefully) contain some XML data
}

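The sslverify bit turns off SSL certificate verification, which makes this work on local setups that lack an up-to-date CA bundle. It also removes the protection against man-in-the-middle attacks, so leave it out (or set it to true) in production. We can now print_r the $body variable to see what exactly was retrieved from Google and refer to the Account Feed Response section to see what each part of the XML means. We can also add a prettyprint key with the value of true to the query arguments to get human readable XML, which is perfect for debugging and learning.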
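To see what http_build_query actually produces, here’s a small sketch with fixed dates and the placeholder profile ID from above. One detail worth knowing: a boolean true is serialized as 1, so this example passes prettyprint as the string 'true'. Whether the API would also accept 1 is not something we’ve checked.

```php
<?php
$query_args = array(
    'ids'         => 'ga:12345',   // Placeholder profile ID, as above
    'metrics'     => 'ga:visits',
    'start-date'  => '2011-04-15', // Fixed dates so the output is predictable
    'end-date'    => '2011-05-15',
    'max-results' => 50,
    'prettyprint' => 'true',       // A string, since boolean true would become 1
);

// Pass '&' explicitly so the separator doesn't depend on php.ini settings
$query = http_build_query( $query_args, '', '&' );
$url   = 'https://www.google.com/analytics/feeds/data?' . $query;

echo $url, "\n";
```

Note how the colons in ga:12345 and ga:visits come out URL-encoded as %3A, which is what you want in a query string.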

Parsing the Response

At this point, our $body variable will contain XML retrieved from Google. It’s quite easy to transform that XML into a useful object, especially if you’re familiar with SimpleXML in PHP 5. So to get a SimpleXML object we would do something like this:

$xml = new SimpleXMLElement( $body );

And then use print_r on the $xml object to see what’s available. The tricky part though is that Google uses its own namespaces for things like metrics, dimensions and so on, and we won’t have those out of the box and are required to explicitly ask for them with the children method. For example, to grab the dxp namespace for each entry (kept in a variable called $dpx here) we would write the following:

// Loop through the XML entries
foreach ( $xml->entry as $entry ) {

    // Grab the dxp namespace (Analytics)
    $dpx = $entry->children( 'http://schemas.google.com/analytics/2009' );
}

Of course you can use the print_r function to see what’s available in $dpx, but we wrote a little snippet that parses the dxp metrics and dimensions for you and provides them in easily accessible associative arrays. Put this inside the foreach loop right after you get the $dpx object:

// We'll populate these arrays with metrics and dimensions
$metrics = array();
$dimensions = array();

// Loop through each metric
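foreach ( $dpx->metric as $metric ) {
    $attributes = $metric->attributes();
    $value = (string) $attributes[ 'value' ];
    $type = (string) $attributes[ 'type' ];
    // Only cast to types settype() understands; anything else stays a string
    if ( in_array( $type, array( 'integer', 'float' ), true ) ) {
        settype( $value, $type );
    }
    $metrics[ (string) $attributes[ 'name' ] ] = $value;
}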

// Loop through each dimension
foreach ( $dpx->dimension as $dimension ) {
    $attributes = $dimension->attributes();
    $dimensions[ (string) $attributes[ 'name' ] ] = (string) $attributes[ 'value' ];
}

Quite tricky eh? We basically get all the returned metrics, loop through them and assign each one to the $metrics array. Since metric types can be different, we make use of the type attribute that Google gave us and the settype function to convert the value to the related type. I believe this will not work with dates and other complex types, but it does with numeric values, and numeric is fine for most of the values we’d want to retrieve from the API.
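If you’d like to try the parsing without hitting the API, here’s a self-contained sketch that runs the same loops over a tiny hand-written feed. The XML is invented and heavily trimmed (real responses carry many more elements), and unlike the snippet above it only casts values whose type is integer or float, which is our own precaution rather than anything Google prescribes.

```php
<?php
// Invented, heavily trimmed feed for illustration only
$body = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dxp="http://schemas.google.com/analytics/2009">
  <entry>
    <title>ga:pagePath=/2011/05/hello-world-1/</title>
    <dxp:dimension name="ga:pagePath" value="/2011/05/hello-world-1/"/>
    <dxp:metric name="ga:pageviews" type="integer" value="42"/>
  </entry>
</feed>
XML;

$xml  = new SimpleXMLElement( $body );
$rows = array();

foreach ( $xml->entry as $entry ) {
    // Ask for the Analytics namespace explicitly
    $dpx = $entry->children( 'http://schemas.google.com/analytics/2009' );
    $row = array( 'metrics' => array(), 'dimensions' => array() );

    foreach ( $dpx->metric as $metric ) {
        $attributes = $metric->attributes();
        $value      = (string) $attributes['value'];
        $type       = (string) $attributes['type'];
        // Cast numeric types only; everything else stays a string
        if ( in_array( $type, array( 'integer', 'float' ), true ) ) {
            settype( $value, $type );
        }
        $row['metrics'][ (string) $attributes['name'] ] = $value;
    }

    foreach ( $dpx->dimension as $dimension ) {
        $attributes = $dimension->attributes();
        $row['dimensions'][ (string) $attributes['name'] ] = (string) $attributes['value'];
    }

    $rows[] = $row;
}

var_dump( $rows[0]['metrics']['ga:pageviews'] ); // int(42)
```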

The dimensions part is similar except for the type casting bit. After these two loops you can inspect the $metrics and $dimensions arrays to see what’s available for each entry. Also note that you can have a different number of entries for different types of queries as you’ll see in our examples, but before we get to them, let’s take a look at the full code for our function hooked to init, just to make sure we didn’t miss anything:

function my_google_analytics_experiments() {
    // Authentication string from ClientLogin
    $auth_string = 'Auth=DQAAAMQAAACgjdMY-JrKkN6uwWva4V ...'; // Your authentication token

    // Headers remain the same with each request
    $headers = array(
        'GData-Version' => 2,
        'Authorization' => 'GoogleLogin ' . $auth_string
    );

    // Query arguments may change
    $query_args = array(
        'ids' => 'ga:12345', // Replace with your Google Analytics profile ID
        'metrics' => 'ga:visits',
        'start-date' => date( 'Y-m-d', mktime( 0, 0, 0, date('m') - 1, date( 'd' ),   date( 'Y' ) ) ),
        'end-date' => date( 'Y-m-d' ),
        'max-results' => 50
    );

    // Format the URL and fire the GET request
    $url = 'https://www.google.com/analytics/feeds/data?' . http_build_query( $query_args );
    $response = wp_remote_get( $url, array( 'headers' => $headers, 'sslverify' => false ) );

    // Parse the response
    if ( ! is_wp_error( $response ) ) {
        $body = wp_remote_retrieve_body( $response );
        $xml = new SimpleXMLElement( $body );

        // Loop through the XML entries
        foreach ( $xml->entry as $entry ) {

            // Grab the dxp namespace (Analytics)
            $dpx = $entry->children( 'http://schemas.google.com/analytics/2009' );

            // We'll populate these arrays with metrics and dimensions
            $metrics = array();
            $dimensions = array();

            // Loop through each metric
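            foreach ( $dpx->metric as $metric ) {
                $attributes = $metric->attributes();
                $value = (string) $attributes['value'];
                $type = (string) $attributes['type'];
                // Only cast to types settype() understands; anything else stays a string
                if ( in_array( $type, array( 'integer', 'float' ), true ) ) {
                    settype( $value, $type );
                }
                $metrics[ (string) $attributes[ 'name' ] ] = $value;
            }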

            // Loop through each dimension
            foreach ( $dpx->dimension as $dimension ) {
                $attributes = $dimension->attributes();
                $dimensions[ (string) $attributes[ 'name' ] ] = (string) $attributes[ 'value' ];
            }

            // Let's see what we got!
            print_r( $entry ); // Entry title, description, url, etc.
            print_r( $metrics ); // All the metrics
            print_r( $dimensions ); // All the dimensions
        }
    }
}

Not that complex after all, is it? If you’re coding for production use, make sure you add an else branch to the is_wp_error check as a fallback, and of course don’t forget about caching, which we’ll talk about later. But now, let’s get on to some examples!
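One more failure case is worth a quick sketch before moving on: new SimpleXMLElement throws an exception when the body isn’t valid XML, which can happen if Google answers with a plain-text error instead of a feed. The helper below (my_ga_parse_feed is a made-up name) returns null in that case so the calling code can fall back gracefully.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null when the body
// is not well-formed XML (for instance a plain-text error message).
function my_ga_parse_feed( $body ) {
    $previous = libxml_use_internal_errors( true ); // Collect errors quietly
    $xml      = simplexml_load_string( $body );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );        // Restore the old setting

    return ( false === $xml ) ? null : $xml;
}

$ok  = my_ga_parse_feed( '<feed><entry/></feed>' ); // SimpleXMLElement
$bad = my_ga_parse_feed( 'Token invalid' );         // null
```

In the full function above you could swap the new SimpleXMLElement( $body ) line for a call like this and simply return early when the result is null.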

Examples

There are a lot of different things we can get from Google Analytics since most of the data is available. It’s up to your imagination how you want to structure that data, filter it and sort it. You can also refer to the Common Queries docs for some other good examples.

Number of Visits Today

You can use this query to show off the total number of visits today. It’s not real time since Google Analytics isn’t, but in most cases you’ll get a figure that lags by 30 minutes to an hour or so, the same as you would see when switching the date to “today” in Google Analytics in your browser. Here are our query variables for the total number of visits for today:

$query_args = array(
    'ids' => 'ga:12345',
    'metrics' => 'ga:visits',
    'start-date' => date( 'Y-m-d' ),
    'end-date' => date( 'Y-m-d' ),
    'max-results' => 1 // We'll have one result anyway
);

You can further filter these by search visitors, direct visitors, referral visitors and other stuff you can think of. Good to have this somewhere in the WordPress admin panel too if you’re obsessed with analytics, as it would save you a few minutes every time you want to check your traffic.
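As a rough sketch of such filtering, the helper below adds a filters argument on the traffic medium. It assumes the ga:medium dimension with values like organic or referral, which is how Analytics usually labels search and referral traffic; double-check the names against the data feed reference before relying on them. The function name my_ga_visits_today_args is made up.

```php
<?php
// Hypothetical helper building the "visits today" query, optionally
// narrowed down to a single traffic medium.
function my_ga_visits_today_args( $profile_id, $today, $medium = null ) {
    $args = array(
        'ids'         => 'ga:' . $profile_id,
        'metrics'     => 'ga:visits',
        'start-date'  => $today,
        'end-date'    => $today,
        'max-results' => 1,
    );

    if ( null !== $medium ) {
        // Assumed dimension name and value, see the note above
        $args['filters'] = 'ga:medium==' . $medium;
    }

    return $args;
}

// Visits from search engines today (12345 is a placeholder profile ID)
$search_visits_args = my_ga_visits_today_args( '12345', date( 'Y-m-d' ), 'organic' );
```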

Monthly Pageviews by URL

If your visitor has landed on a certain post, you might want to show them some metrics for that particular URL. The tricky part is formatting the URL in a way the Google Analytics API understands: it expects not a full URL but a page path, or pagePath.

Also note that this cannot be used during init because the current page permalink is not available at that time. You can switch the action to wp which works perfectly well.

$permalink = parse_url( get_permalink() );
$pagePath = $permalink[ 'path' ];

$query_args = array(
    'ids' => 'ga:12345',
    'metrics' => 'ga:pageviews',
    'filters' => 'ga:pagePath==' . $pagePath,
    'start-date' => date( 'Y-m-d', mktime( 0, 0, 0, date('m') - 1, date( 'd' ),   date( 'Y' ) ) ),
    'end-date' => date( 'Y-m-d' ),
    'max-results' => 1
);

Note the double equal sign and look at Filter Operators for more options including filters by regular expression. Again, you can further filter this by search, direct and referral visitors, and stick it into your admin panel too, perhaps via a custom column in your WordPress admin.
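To make the page path idea concrete, here’s what parse_url pulls out of a sample permalink. The domain, slug and query string are placeholders. Keep in mind that Analytics may record paths differently depending on your profile settings (with query strings, for example), so compare against what you see in your reports.

```php
<?php
// example.com and the post slug are placeholders
$permalink = 'http://example.com/2011/05/hello-world-1/?replytocom=7#comments';

// PHP_URL_PATH returns just the path component as a string
$page_path = parse_url( $permalink, PHP_URL_PATH );

$filter = 'ga:pagePath==' . $page_path;
echo $filter, "\n"; // ga:pagePath==/2011/05/hello-world-1/
```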

Popular Posts this Month

This example will return more than one entry so make good use of the max-results value. There’s also quite a tricky part in figuring out the correct filter to use, and this is where regular expressions come in. Let’s take a look at this query array:

$query_args = array(
    'ids' => 'ga:12345',
    'dimensions' => 'ga:pagePath',
    'metrics' => 'ga:pageviews',
    'sort' => '-ga:pageviews',
    'start-date' => date( 'Y-m-d', mktime( 0, 0, 0, date('m') - 1, date( 'd' ),   date( 'Y' ) ) ),
    'end-date' => date( 'Y-m-d' ),
    'max-results' => 20
);

It will return the twenty page paths with the most page views during the month, but the problem is that if you inspect the results, you’ll see that your home page is included, as well as all of your static pages, category archives, etc. There’s no easy way to get rid of them since Google doesn’t really know which page is_single, which is_page and which one is_home or is_archive. You’ll have to write a regular expression to match your post URLs.

This depends on the permalink structure you have set up and if you’re good at regular expressions, you’ll be able to figure out yours in a few minutes. We’re using the /year/month/slug-id/ structure here on Theme.fm and our regular expression looks like so:

^/[0-9]{4}/[0-9]{2}/.+-[0-9]+/$

Four digits for the year, slash, two digits for the month, slash, one or more of any characters for the slug, dash, one or more digits for the post ID, and a trailing slash. Here’s how we then use the regular expression as a filter for ga:pagePath:

$query_args = array(
    'ids' => 'ga:12345',
    'dimensions' => 'ga:pagePath',
    'metrics' => 'ga:pageviews',
    'sort' => '-ga:pageviews',
    'filters' => 'ga:pagePath=~^/[0-9]{4}/[0-9]{2}/.+-[0-9]+/$',
    'start-date' => date( 'Y-m-d', mktime( 0, 0, 0, date('m') - 1, date( 'd' ),   date( 'Y' ) ) ),
    'end-date' => date( 'Y-m-d' ),
    'max-results' => 20
);
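Before sending a pattern like this to Google, it can be handy to sanity-check it locally. The sketch below runs the same expression through preg_match against a few invented paths. PHP’s regex engine is not necessarily identical to the one Analytics uses, so treat a local match as a hint rather than a guarantee.

```php
<?php
// Same pattern as the filter, wrapped in # delimiters for preg_match()
$pattern = '#^/[0-9]{4}/[0-9]{2}/.+-[0-9]+/$#';

// Invented paths for illustration
$paths = array(
    '/2011/05/hello-world-123/', // a post
    '/',                         // home page
    '/about/',                   // static page
    '/category/news/',           // category archive
    '/2011/05/',                 // monthly archive
);

$posts = array();
foreach ( $paths as $path ) {
    if ( preg_match( $pattern, $path ) === 1 ) {
        $posts[] = $path;
    }
}

print_r( $posts ); // Only the post path is left
```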

Voila! You can now inspect your $dimensions and $metrics arrays to see the page paths and their page views on every iteration of the loop. You can furthermore append the page paths to the domain to create a URL and use the url_to_postid function to grab a valid post ID for each entry.

$path = $dimensions[ 'ga:pagePath' ];
$url = home_url( $path );
$post_id = url_to_postid( $url );

Collect a few of those post IDs and use them in a WP_Query to render your monthly popular posts in a sidebar widget or elsewhere. It’s also easy enough to transform this into a weekly popular one or perhaps most tweeted or searched. Play around with filters, you’ll enjoy it :)
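Here’s one way the collecting step could look, as a sketch. The helper my_ga_popular_post_ids and the row layout are made up for illustration, and the resolver is passed in as a callback so the snippet also runs outside WordPress with a stand-in. It relies on url_to_postid returning 0 for paths that don’t belong to a post.

```php
<?php
// Hypothetical helper. $rows is a list of array( 'path' => ..., 'pageviews' => ... )
// gathered in the entry loop; $resolve maps a path to a post ID (0 = not found).
function my_ga_popular_post_ids( array $rows, $resolve, $limit = 5 ) {
    $ids = array();
    foreach ( $rows as $row ) {
        $post_id = (int) call_user_func( $resolve, $row['path'] );
        if ( $post_id > 0 && ! in_array( $post_id, $ids, true ) ) {
            $ids[] = $post_id; // Rows arrive sorted, so the order is kept
        }
        if ( count( $ids ) >= $limit ) {
            break;
        }
    }
    return $ids;
}

// Stand-in resolver with invented IDs so the sketch runs without WordPress
$fake_resolver = function ( $path ) {
    $map = array( '/2011/05/hello-world-1/' => 1, '/2011/04/second-post-2/' => 2 );
    return isset( $map[ $path ] ) ? $map[ $path ] : 0;
};

$rows = array(
    array( 'path' => '/2011/05/hello-world-1/', 'pageviews' => 120 ),
    array( 'path' => '/2011/03/deleted-post-9/', 'pageviews' => 80 ),
    array( 'path' => '/2011/04/second-post-2/', 'pageviews' => 45 ),
);

$post_ids = my_ga_popular_post_ids( $rows, $fake_resolver ); // array( 1, 2 )
```

Inside WordPress the resolver would wrap url_to_postid( home_url( $path ) ), and the resulting IDs could go into a WP_Query through its post__in argument.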

Caching

I mentioned caching a couple of times earlier in this post, but since caching is out of the scope of this tutorial, we won’t cover much of it. Basically, keeping the Google API quotas in mind and of course website performance, you don’t want to fire requests every time a page loads. That would drastically slow down the website.

Instead, query the API from time to time and store the results in a database. You can use Transients or post meta, or even Memcache depending on the context. If you’re gathering statistics for each of your 500 blog posts, you could also move the whole thing to a scheduled task running a couple of times a day, but still keep in mind your API quotas and keep caching where possible.
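A minimal sketch of the Transients approach could look like the wrapper below. The name my_ga_cached is made up, and the function needs WordPress loaded since get_transient and set_transient are WordPress functions. The fetch callback stands for whatever code fires the request and parses the response.

```php
<?php
// Hypothetical wrapper around the Transients API. It needs WordPress loaded,
// since get_transient() and set_transient() are WordPress functions.
function my_ga_cached( $key, $ttl, $fetch ) {
    $cached = get_transient( $key );
    if ( false !== $cached ) {
        return $cached; // Still fresh, no API request needed
    }

    $fresh = call_user_func( $fetch );
    if ( null !== $fresh && false !== $fresh ) {
        set_transient( $key, $fresh, $ttl ); // $ttl is in seconds
    }

    return $fresh;
}
```

You’d call it with something like my_ga_cached( 'my_ga_visits_today', 3600, 'my_fetch_visits_today' ), where my_fetch_visits_today is your own function returning the parsed figure. Failed fetches (null or false) are not stored, so the next page load gets another try.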

Recap and Conclusion

That’s about it folks. Today we’ve covered the Google Analytics Data Feed API. We talked about authentication and making requests to the API using WordPress HTTP functions and making sense of the responses from Google using the SimpleXML PHP library. We’ve also shown a few interesting examples that you can use and build upon to get the metrics you like as well as converting Google’s pagePath to a totally valid post ID in WordPress.

Hope you enjoyed our post and learned something new today. Feel free to ask us questions, post your thoughts or just say “hi” using our comment box below. Thank you so much for reading and keep in mind that we really appreciate you sharing our posts too! Thanks again and stay tuned!