WordPress Internals: The Rewrite API

In today’s edition of WordPress Internals we are going to dig into the WordPress Rewrite API. Exploring its inner workings and mechanisms should allow us to understand WordPress better, why it does what it does and how it does it.

Have you ever wondered how the heck WordPress takes /2011/10/15/wordpress-rewrites-ftw and renders that exact page? Surely there’s no 2011 folder, and a file called wordpress-rewrites-ftw does not exist inside the document root. So what’s going on?

All front-end requests for pages are usually handled by WordPress’s index.php, which takes care of initializing all of WordPress and returning the requested page in a nice and clean response. We did a three-part series detailing this process, and if you haven’t seen it yet, feel free to read it, as it will greatly help in today’s journey. Through a series of calls, index.php includes and requires the bits that transform something like /tag/wordpress/ into data that can be interpreted by the data-fetching parts of WordPress, like WP_Query’s get_posts() method or the query_posts() function. These bits cannot handle /tag/wordpress/ as a query: they take either an array of arguments or a query string, and /tag/wordpress/ is not a query string.

Rewriting

This is where WP_Rewrite comes in. Defined in wp-includes/rewrite.php, one of its key responsibilities is to handle the transformation of one request into another. It keeps track of these maps, generates them based on your Settings/Permalinks setup and even generates the Apache rewrite rules to go along with it all.

As we already mentioned, index.php handles all front-facing page requests (the ones that are part of what your visitors usually see). No file other than index.php can process a request for a WordPress page, which is exactly why your Apache configuration files probably contain the following lines:

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L] 

…or why your nginx rolls with:

if (!-e $request_filename) {
    rewrite ^(.*)$ /index.php last;
}

These web-server rewrites hand any request for a non-existing file or directory to index.php for further handling, and guess what – /tag/wordpress/ is a non-existing directory, so index.php gets to process this one. Of course, if you don’t have those rewrite rules, you’re probably using one of the index.php-prefixed permalink structures such as http://wordpress.lo/index.php/2011/10/15/sample-post/, where the request /2011/10/15/sample-post is already being passed to index.php.

When a /2011/10/15/sample-post/ request comes in, index.php gets to process it, and it all goes down when the wp() function is called. This function calls the $wp->main() method which goes through several steps to map the human-readable, search-engine friendly request to one that can be used and comprehended internally. $wp->main() calls $wp->parse_request() internally and this is where the magic begins. So go ahead and open up wp-includes/class-wp.php and find line 120.

The method begins by parsing any extra arguments that exist in the request. For example, /2011/10/15/sample-post?paragraph=2 has paragraph=2 as an extra argument; these are parsed and kept for later use. Next, the $wp_rewrite global instance of the WP_Rewrite class is loaded. This is the main class that contains all the fancy methods for working with rewrite maps. It is defined, along with lots of useful wrapper functions, in wp-includes/rewrite.php.
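To make the “extra arguments” idea concrete, here is a minimal plain-PHP sketch. It is not the code WordPress itself runs; it only shows how a request like the one above splits into a path and an array of extra arguments. The variable names are made up for the illustration.

```php
<?php
// Sketch only: split a request into its path and its extra query arguments.
$request = '/2011/10/15/sample-post?paragraph=2';

// parse_url() separates the path from the query string.
$path  = parse_url( $request, PHP_URL_PATH );  // '/2011/10/15/sample-post'
$query = parse_url( $request, PHP_URL_QUERY ); // 'paragraph=2'

// parse_str() turns the query string into an associative array.
$extra_args = array();
parse_str( (string) $query, $extra_args );

var_dump( $path, $extra_args );
```

The path is what gets matched against the rewrite rules later on, while the extra arguments ride along separately.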

All the stored rewrite rules are fetched with $rewrite = $wp_rewrite->wp_rewrite_rules();. WP_Rewrite holds simple key-value pairs that map parts of the original request to parts of the rewritten request, and its wp_rewrite_rules() method fetches these from the database using the get_option() function. If there are no rules stored in the database, the rewrite_rules() method is called, which generates them so they can be stored in the database.
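The fetch-or-generate behaviour described above can be sketched without WordPress at all. Everything prefixed with demo_ below is hypothetical, and a plain array stands in for the options table, so treat this as an illustration of the idea rather than the actual implementation.

```php
<?php
// Hypothetical stand-in for the options table (WordPress uses get_option() here).
$demo_options = array();

function demo_generate_rules() {
    // In WordPress this step would build the full rule set from the permalink settings.
    return array(
        'tag/([^/]+)/?$' => 'index.php?tag=$matches[1]',
    );
}

function demo_fetch_rules( array &$options ) {
    // Reuse the stored rules when they exist...
    if ( ! empty( $options['rewrite_rules'] ) ) {
        return $options['rewrite_rules'];
    }
    // ...otherwise generate them once and store them for the next request.
    $options['rewrite_rules'] = demo_generate_rules();
    return $options['rewrite_rules'];
}

$first  = demo_fetch_rules( $demo_options ); // generated and stored
$second = demo_fetch_rules( $demo_options ); // served from the store
```

The point of the pattern is that the expensive generation step only runs when the store is empty, which is also why flushing the rules forces a rebuild.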

Here’s how they look inside there…

The Permalinks custom structure in my WordPress environment is set to /%year%/%monthnum%/%day%/%postname%/ and this is what part of my rewrite rule set looks like:

category/(.+?)/page/?([0-9]{1,})/?$ => index.php?category_name=$matches[1]&paged=$matches[2]
category/(.+?)/?$ => index.php?category_name=$matches[1]
tag/([^/]+)/?$ => index.php?tag=$matches[1]
page/?([0-9]{1,})/?$ => index.php?&paged=$matches[1]
([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/?$ => index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]
(.+?)(/[0-9]+)?/?$ => index.php?pagename=$matches[1]&page=$matches[2]

The $wp_rewrite->wp_rewrite_rules() method returns an associative array. To see your rewrite rules, add the line echo '<pre>'; var_dump($wp_rewrite->wp_rewrite_rules()); echo '</pre>'; inside your theme or at the very end of index.php. By the way, if you think that’s brainf*ck, the left-hand sides (the keys) are actually Perl-compatible regular expressions (PCRE).

The original request is taken from the $_SERVER['REQUEST_URI'] server global and stored. After lots of cleaning up (removal of trailing slashes, etc.), at line 187 we reach the // Look for matches. comment. The request is preg_match’ed against the regular expression of each rewrite rule in turn until a match is found, and the matching rule is stored inside $this->matched_rule.
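Here is a rough sketch of that “first rule that matches wins” loop, using a few of the rules from the listing above. The demo_find_rule() helper is hypothetical and much simpler than what WordPress actually does, but the mechanics are the same: clean the request, then try each pattern in order.

```php
<?php
// A few rules copied from the listing above (pattern => query template).
$rules = array(
    'category/(.+?)/?$'  => 'index.php?category_name=$matches[1]',
    'tag/([^/]+)/?$'     => 'index.php?tag=$matches[1]',
    '(.+?)(/[0-9]+)?/?$' => 'index.php?pagename=$matches[1]&page=$matches[2]',
);

// Hypothetical helper: return the first rule whose pattern matches the request.
function demo_find_rule( array $rules, $request ) {
    $request = trim( $request, '/' ); // drop leading and trailing slashes
    foreach ( $rules as $pattern => $query ) {
        // '#' is used as the delimiter because the patterns contain slashes.
        if ( preg_match( '#^' . $pattern . '#', $request, $matches ) ) {
            return array(
                'rule'    => $pattern,
                'query'   => $query,
                'matches' => $matches,
            );
        }
    }
    return null; // nothing matched
}

$found = demo_find_rule( $rules, '/tag/wordpress/' );
print_r( $found );
```

Because the loop stops at the first hit, the order of the rules matters: the catch-all pagename rule has to sit at the bottom or it would swallow everything.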

Match and Map

/2011/10/15/wordpress-rewrites-ftw matches (after cleaning up the slash in the front) against the ([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/([^/]+)(/[0-9]+)?/?$ rule: four digits, followed by a slash, 1 or 2 digits, another slash, 1 or 2 digits, another slash, then everything up to the next slash, and then possibly a slash followed by some digits. Quite a rule, huh? And yes, the request matches the regular expression, so it’s chosen, and the $matches variable contains all the matched groups (the ones in parentheses). Besides $matches[0], which holds the whole matched string, $matches would contain:

$matches[1] => '2011'
$matches[2] => '10'
$matches[3] => '15'
$matches[4] => 'wordpress-rewrites-ftw'
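You can check this yourself in plain PHP. The snippet below is only an illustration with made-up variable names; it runs the rule against the cleaned-up request the same way the matching step does, with the pattern anchored at the start.

```php
<?php
$rule    = '([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/([^/]+)(/[0-9]+)?/?$';
$request = ltrim( '/2011/10/15/wordpress-rewrites-ftw', '/' );

// Anchor the rule at the start of the request and collect the groups.
$matched = preg_match( '#^' . $rule . '#', $request, $matches );

print_r( $matches );
// $matches[0] is the whole matched string, [1] to [4] are the groups listed above.
// $matches[5] is absent because the optional page group did not take part in the match.
```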

See where this is going? The year, month, day and post name have been saved as matches, and the mapped query for the whole rewrite match is index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&name=$matches[4]&page=$matches[5]. Yes, you guessed it: the matched values have to be substituted into the query to form one that you’ve probably seen dozens of times. A whole class has been written to do this. It’s defined in wp-includes/class-wp.php and is, surprisingly, not part of the WordPress Rewrite API. The static method WP_MatchesMapRegex::apply($query, $matches) is called with the rewrite match (index.php?year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&name=$matches[4]&page=$matches[5]) and the array of matches.


The class uses preg_replace_callback() to replace occurrences of the $_pattern = '(\$matches\[[1-9]+[0-9]*\])' pattern with values from the $matches array. The result is a year=2011&monthnum=10&day=15&name=wordpress-rewrites-ftw string (with an empty page= at the end, since the optional fifth group did not match). And you’ve probably written such strings when querying for posts, haven’t you?
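The substitution step is easy to reproduce. What follows is a hypothetical stand-in, not the WP_MatchesMapRegex class itself: demo_apply_matches() is a made-up name, and I use / delimiters plus an extra capture group for the index to keep the callback short. It should still give a fair idea of what happens.

```php
<?php
// Hypothetical stand-in for the substitution step; not the WordPress class itself.
function demo_apply_matches( $query, array $matches ) {
    return preg_replace_callback(
        '/\$matches\[([1-9]+[0-9]*)\]/', // finds $matches[1], $matches[2], ...
        function ( $m ) use ( $matches ) {
            $index = (int) $m[1];
            // Groups that did not take part in the match become empty strings.
            return isset( $matches[ $index ] ) ? urlencode( $matches[ $index ] ) : '';
        },
        $query
    );
}

$matches = array( '2011/10/15/wordpress-rewrites-ftw', '2011', '10', '15', 'wordpress-rewrites-ftw' );
$query   = 'year=$matches[1]&monthnum=$matches[2]&day=$matches[3]&name=$matches[4]&page=$matches[5]';

echo demo_apply_matches( $query, $matches ), "\n";
// prints year=2011&monthnum=10&day=15&name=wordpress-rewrites-ftw&page=
```

Note the single quotes around the query template: in double quotes PHP would try to interpolate $matches right away, which is not what we want here.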

What a process! You may have noticed that most of what was done was conducted by the main WP object: it cleaned up the initial request, figured out what went where, requested the rewrite rules, matched, rewrote and continued to do other intricate operations mostly by itself. $wp_rewrite’s role was limited to fetching the rules. The rewriting was done by WP_MatchesMapRegex, which most probably has limited use outside the WordPress flow of things.

Experiments

Now that we understand how mapping and rewriting works in WordPress internally, we are ready to explore how a developer would interface with the Rewrite API. Adding a rewrite rule is as simple as calling the wrapper function add_rewrite_rule() and then flush_rewrite_rules(). The former simply adds a regular expression and its destination expression before or after the existing list of rewrites. If the destination does not start with index.php, it is considered an external one and will not be added to the rewrite rules stack but rather to the Apache configuration file as a mod_rewrite rule.
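Why does “before or after” matter? Because the first matching rule wins. The sketch below mimics the choice with two hypothetical helpers (demo_add_rule() and demo_first_match()) and a made-up shop/product rule. It is not how WordPress stores its rules internally, just a way to see the effect of the position.

```php
<?php
// Hypothetical helper mimicking the "top"/"bottom" choice; not the WordPress implementation.
function demo_add_rule( array $rules, $regex, $query, $after = 'bottom' ) {
    $new = array( $regex => $query );
    // PHP's array union keeps the left-hand order, so this controls precedence.
    return ( 'top' === $after ) ? $new + $rules : $rules + $new;
}

// Hypothetical helper: return the query template of the first matching rule.
function demo_first_match( array $rules, $request ) {
    foreach ( $rules as $regex => $query ) {
        if ( preg_match( '#^' . $regex . '#', $request ) ) {
            return $query;
        }
    }
    return null;
}

// An existing catch-all rule, like the pagename rule in the listing earlier.
$rules = array( '(.+?)(/[0-9]+)?/?$' => 'index.php?pagename=$matches[1]&page=$matches[2]' );

// 'shop' and 'product' are made-up names for the illustration.
$top    = demo_add_rule( $rules, 'shop/([^/]+)/?$', 'index.php?product=$matches[1]', 'top' );
$bottom = demo_add_rule( $rules, 'shop/([^/]+)/?$', 'index.php?product=$matches[1]', 'bottom' );

echo demo_first_match( $top, 'shop/teapot' ), "\n";    // the new rule is tried first
echo demo_first_match( $bottom, 'shop/teapot' ), "\n"; // the catch-all gets there first
```

With the rule at the bottom, the greedy catch-all matches first and the new rule never gets a chance, which is why custom rules usually want to go on top.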

Time for an example. There are better ways of accomplishing what we’re about to do without the needless intricacies, but for the sake of what we’re learning I’ll do it the WordPress Rewrite way. Let’s say Bob is a photographer who photographs dogs. He’s got this fancy custom post type called ‘Doggy’, with a number of custom fields: size, hair, color. We know that these can be filtered by providing the keys as arguments when querying for posts, like hair=fluffy&color=ginger&size=medium, right? How would we go about helping Bob have links like http://bobsdogs.lo/doggy/medium/ginger/fluffy? Exactly, rewriting should help to an extent.

When a custom post type is created new rules are added for that post type automatically, so that /doggy/ will be rewritten to post_type=doggy, convenient, is it not? (You may have to flush the rewrite rules for it to work). So that’s half done, however none of the custom fields are rewritten, so that will have to be taken care of.

After adding the custom post type, we have to add taxonomies for the dog attributes and set query_var to true in order to be able to query by those attributes. Finally, we add a rewrite rule like so: add_rewrite_rule( 'doggy/([^/]+)/([^/]+)/([^/]+)', 'index.php?post_type=doggy&size=$matches[1]&color=$matches[2]&hair=$matches[3]', 'top' ); which matches three sets of something and maps them to size, color and hair.


Here is the full draft code that I placed in my theme functions.php:

add_action ( 'init', function() {

    /* Register the doggy post type */
    register_post_type( 'doggy', array(
        'labels' => array(
            'name' => __( 'Dogs' ),
            'singular_name' => __( 'Doggy' )
        ),
        'public' => true,
        'has_archive' => true,
        'supports' => array( 'title', 'editor', 'custom-fields' )
    ) );

    /* Register the taxonomies */
    register_taxonomy( 'hair', 'doggy', array(
        'query_var' => true,
        'labels' => array( 'name' => 'Hair' )
    ) );
    register_taxonomy( 'color', 'doggy', array(
        'query_var' => true,
        'labels' => array( 'name' => 'Color' )
    ) );
    register_taxonomy( 'size', 'doggy', array(
        'query_var' => true,
        'labels' => array( 'name' => 'Size' )
    ) );

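    /* Add the rewrite rule at the top of the list so it is tried first */
    add_rewrite_rule( 'doggy/([^/]+)/([^/]+)/([^/]+)', 'index.php?post_type=doggy&size=$matches[1]&color=$matches[2]&hair=$matches[3]', 'top' );

    /* Draft only: flushing on every request is expensive, do it once after the rules change */
    flush_rewrite_rules();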
} );

This is very draft code; do not use it as is. The regular expression is crappy, too. But you get the point: the important lines are the last two. And yes, that is my dog Plushka :)
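To see one reason why the regular expression is not great, compare it with a variant anchored at the end. The anchored version is my assumption of what a tidier rule might look like, not something taken from WordPress, and demo_matches() is a made-up helper that tests a rule from the start of the request.

```php
<?php
$loose  = 'doggy/([^/]+)/([^/]+)/([^/]+)';    // the draft rule from above
$strict = 'doggy/([^/]+)/([^/]+)/([^/]+)/?$'; // assumption: same rule, anchored at the end

// Hypothetical helper: does the request match when tested from its start?
function demo_matches( $regex, $request ) {
    return 1 === preg_match( '#^' . $regex . '#', $request );
}

var_dump( demo_matches( $loose, 'doggy/medium/ginger/fluffy' ) );           // bool(true)
var_dump( demo_matches( $loose, 'doggy/medium/ginger/fluffy/and/more' ) );  // bool(true), extra segments slip through
var_dump( demo_matches( $strict, 'doggy/medium/ginger/fluffy/' ) );         // bool(true)
var_dump( demo_matches( $strict, 'doggy/medium/ginger/fluffy/and/more' ) ); // bool(false)
```

Without the trailing /?$ the rule happily matches anything that merely starts with three segments after doggy/, which can shadow more specific rules added later.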

The rest of the functions exposed by the Rewrite API allow you to add feed types that can be matched by the automatically generated rules, endpoints (appended to the end of a query), rewrite tags (like %year%) that can be used in your permalink structures, and lots of other advanced and obscure stuff that we aren’t going to explore today. I say “structures” because you can have more than one by using add_permastruct().
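Rewrite tags are easier to picture once you see them expanded. The sketch below swaps each tag of my permalink structure for a regular expression fragment, which yields the front part of the date rule we matched earlier. The demo_structure_to_regex() helper is hypothetical and covers only these four tags; the real rule generation does a lot more.

```php
<?php
// Tag-to-regex pairs, limited to the four tags used in this article's structure.
$tags = array(
    '%year%'     => '([0-9]{4})',
    '%monthnum%' => '([0-9]{1,2})',
    '%day%'      => '([0-9]{1,2})',
    '%postname%' => '([^/]+)',
);

// Hypothetical helper: expand a permalink structure into a regex fragment.
function demo_structure_to_regex( $structure, array $tags ) {
    $structure = trim( $structure, '/' );
    return str_replace( array_keys( $tags ), array_values( $tags ), $structure );
}

$regex = demo_structure_to_regex( '/%year%/%monthnum%/%day%/%postname%/', $tags );
echo $regex, "\n";
// prints ([0-9]{4})/([0-9]{1,2})/([0-9]{1,2})/([^/]+)
```

A custom tag would simply be one more entry in that map, paired with the query variable its captured value should end up in.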

Conclusion

This is a very long post and we thank you for joining us on this journey. Hopefully the article was helpful and clear. We appreciate your leaving us your comment or question using the form below. Did you find any particularly interesting uses for the WordPress Rewrite API? Have you added any cool rewrite rules in your WordPress? We’d love to know!
