WordPress Internals: The Cron

Last time, we dived into the abyss of WordPress Rewrites; this time we invite you on a journey through time, as managed by the WordPress Cron. The WordPress Cron API is responsible for providing an interface for adding events to the schedule and managing them, and for spawning the main wp-cron.php file when it’s time to run them.

Its latter function is particularly useful when a crontab is unavailable or not set up. It is enabled by default, which means that everyone gets roughly crontab-like functionality out of the box.

We shall be looking at the default behavior of the WordPress Cron. Every time a request comes in for a WordPress page, that is, every time WordPress boots up, the Cron attempts to run. When WordPress starts, it includes a file called wp-includes/default-filters.php which, among lots of other actions, hooks wp_cron() to the sanitize_comment_cookies action with add_action( 'sanitize_comment_cookies', 'wp_cron' ); and later on in the bootup process that action is fired off with do_action( 'sanitize_comment_cookies' ); (wp-settings.php line 214). The wp_cron() function is defined in wp-includes/cron.php, which contains the functions that make up the WordPress Cron API.
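To make the hook wiring concrete, here is a toy action registry in plain PHP. It is a hypothetical illustration only (toy_add_action(), toy_do_action() and toy_wp_cron() are made-up names), not the real Plugin API, which also handles priorities and arguments:

```php
<?php
// Toy stand-ins for add_action()/do_action(); hypothetical names, with no
// priorities or arguments, unlike the real WordPress Plugin API.
$toy_actions = [];
$toy_log = [];

function toy_add_action(string $hook, callable $callback): void {
    global $toy_actions;
    $toy_actions[$hook][] = $callback; // remember the callback for this hook
}

function toy_do_action(string $hook): void {
    global $toy_actions;
    foreach ($toy_actions[$hook] ?? [] as $callback) {
        call_user_func($callback); // fire every callback registered so far
    }
}

function toy_wp_cron(): void {
    global $toy_log;
    $toy_log[] = 'wp_cron() ran';
}

// Roughly what default-filters.php does early in bootup...
toy_add_action('sanitize_comment_cookies', 'toy_wp_cron');
// ...and what wp-settings.php triggers later on.
toy_do_action('sanitize_comment_cookies');
```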

wp_cron()’s purpose is to determine whether to try to spawn the cron (wp-cron.php) or not. First, it guards against an infinite loop by returning if the current request is for wp-cron.php itself and, more importantly, it returns early if DISABLE_WP_CRON is defined to be true. wp_cron() then gets the list of scheduled events from the database by calling the _get_cron_array() function.

Eventkeeping

The scheduled events are stored as a serialized associative array with a structure that looks something like this:

[Diagram: the structure of the scheduled events array]

The top-level key is a timestamp denoting the time when the events under it are (or were) supposed to occur. Each timestamp entry can contain a number of hook names. Each hook name, in turn, maps md5 hashes to event details: the schedule name and interval for recurring events, and the arguments for the action. The md5 key (the hash of the serialized argument array) keeps apart events with the same hook name that occur at the same time but with a different set of arguments. There can be many timestamps, which can have many hooks, which can have many sets of different arguments. Quite complex, but I hope the representation diagram above helps.
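Here is the same structure written out as a PHP array, a sketch based on the description above. The timestamps, the hypothetical my_hypothetical_hook hook and its arguments are made up for illustration; only the nesting and the md5-of-serialized-arguments keys follow the article:

```php
<?php
// Events are keyed by the md5 hash of their serialized argument array.
function sketch_event_key(array $args): string {
    return md5(serialize($args));
}

// timestamp => hook name => md5(serialize(args)) => event details
$cron = [
    1300000000 => [
        'wp_version_check' => [
            sketch_event_key([]) => [
                'schedule' => 'twicedaily', // recurring event
                'args'     => [],
                'interval' => 43200,        // 12 hours, in seconds
            ],
        ],
    ],
    1300003600 => [
        // Same hook, same time, different arguments: two separate entries.
        'my_hypothetical_hook' => [
            sketch_event_key([42]) => [
                'schedule' => false,        // single (non-recurring) event
                'args'     => [42],
            ],
            sketch_event_key([43]) => [
                'schedule' => false,
                'args'     => [43],
            ],
        ],
    ],
];
```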

If you actually go ahead and dump the data returned by get_option('cron'), you’ll notice an extra version key. Cron event entries were not always structured the way they are now; the current format (version 2) has been in use since WordPress 2.1.0, and whenever _get_cron_array() finds no ‘version’ key, it converts the events to version 2 using _upgrade_cron_array(). _get_cron_array() also strips the version key, which is why you’ll never see it in the array it returns.
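The version handling boils down to something like the following sketch. sketch_get_cron_array() and sketch_upgrade_cron_array() are hypothetical stand-ins: the placeholder upgrade only tags the array, whereas the real _upgrade_cron_array() converts old entries to the version 2 layout:

```php
<?php
// Placeholder for _upgrade_cron_array(): the real function rewrites
// pre-2.1 entries into the version 2 layout; this one only tags the array.
function sketch_upgrade_cron_array(array $cron): array {
    $cron['version'] = 2;
    return $cron;
}

// Mirrors the behavior described above: upgrade when 'version' is
// missing, then strip the key before handing the events back.
function sketch_get_cron_array(array $raw_option): array {
    if (!isset($raw_option['version'])) {
        $raw_option = sketch_upgrade_cron_array($raw_option);
    }
    unset($raw_option['version']);
    return $raw_option;
}

$raw = [1300000000 => ['wp_version_check' => []], 'version' => 2];
$events = sketch_get_cron_array($raw);
// $events now only has the timestamp key 1300000000.
```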

When events are scheduled and rescheduled, this complicated event array is sorted by timestamp. This allows wp_cron() to compare just the very first entry with the current time() to decide whether to go on or not: the first entry has the smallest timestamp, so it is the first one to be executed when its time arrives. Next, for each due hook, wp_cron() looks through the registered schedules for an entry named after the hook that contains a callback key. You can add your own schedule types besides the three default ones (hourly, twicedaily and daily) by hooking into the cron_schedules filter. Before moving on to spawning the actual cron, this obscure and undocumented check is executed: if ( isset($schedules[$hook]['callback']) && !call_user_func( $schedules[$hook]['callback'] ) ).
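Because the array is kept sorted by timestamp, checking whether anything is due only requires looking at the first key. A minimal sketch in plain PHP, with a hypothetical sketch_anything_due() helper and made-up timestamps:

```php
<?php
// True if the earliest scheduled timestamp has already arrived.
function sketch_anything_due(array $crons, int $now): bool {
    $keys = array_keys($crons);
    return isset($keys[0]) && $keys[0] <= $now;
}

$crons = [
    1300007200 => ['hook_b' => []],
    1300003600 => ['hook_a' => []],
];
ksort($crons); // earliest timestamp first, as scheduling keeps it

var_dump(sketch_anything_due($crons, 1300000000)); // bool(false)
var_dump(sketch_anything_due($crons, 1300003600)); // bool(true)
```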

It appears that you can actually add a schedule type, keyed by the name of the hook, with a callback that is passed to PHP’s standard call_user_func() function. If the callback returns false, the cron is not spawned for that hook and wp_cron() moves on to the next one; otherwise the cron is spawned. Yet as soon as the cron is spawned there’s no going back (see the break 2; on line 281), so there is no guarantee that your callback will ever be executed: if an earlier due hook has no callback registered in the schedules, or has one that returns true, the cron is spawned right away and the remaining hooks never go through the check. Ideas, anyone? In what scenarios would this be used and useful? The line appears as early as WordPress 2.1.
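The following is a simplified re-creation of that loop, written to test the reasoning above rather than copied from core; sketch_should_spawn() and the hook names are hypothetical. It shows that once a hook without a vetoing callback is reached, the cron is spawned and later hooks, callbacks and all, are never looked at:

```php
<?php
// Returns true as soon as the cron would be spawned. $checked collects the
// hooks that were examined, to show which ones the loop never reached.
function sketch_should_spawn(array $crons, array $schedules, int $now, array &$checked): bool {
    foreach ($crons as $timestamp => $cronhooks) {
        if ($timestamp > $now) {
            break; // sorted by timestamp: nothing after this is due
        }
        foreach ($cronhooks as $hook => $events) {
            $checked[] = $hook;
            if (isset($schedules[$hook]['callback'])
                && !call_user_func($schedules[$hook]['callback'])) {
                continue; // the callback returned false: skip this hook
            }
            return true; // core calls spawn_cron() here and does "break 2"
        }
    }
    return false;
}

$calls = [];
$schedules = [
    'picky_hook' => ['callback' => function () use (&$calls) { $calls[] = 'picky'; return false; }],
    'late_hook'  => ['callback' => function () use (&$calls) { $calls[] = 'late'; return false; }],
];
$crons = [
    1300000000 => ['picky_hook' => [], 'plain_hook' => [], 'late_hook' => []],
];

$checked = [];
$spawn = sketch_should_spawn($crons, $schedules, 1300000060, $checked);
// $spawn is true, $checked is ['picky_hook', 'plain_hook'] and $calls is
// ['picky']: late_hook's callback never ran.
```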

Spawning the Cron

Anyway, back on track to spawning the cron. spawn_cron( $local_time ); starts by checking, in various ways, whether a cron is already in progress, and exits if something somewhere has raised the $flag (cron in progress) or if the previous cron ran less than a minute ago; this makes sure that the cron is spawned no more than once per minute. Next, after raising the flag itself so that the current cron process is reserved as the only one running, it branches off into one of two possible scenarios:
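The once-per-minute guard can be sketched as follows. In core the flag is a timestamp kept in a transient; here it is a plain integer passed in and out, and sketch_cron_lock() is a hypothetical name:

```php
<?php
// Returns the new flag value if a cron run may start at $now, or null if
// the previous run started less than 60 seconds ago.
function sketch_cron_lock(int $flag, int $now): ?int {
    if ($flag + 60 > $now) {
        return null; // cron already running, or ran within the last minute
    }
    return $now; // raise the flag: this run is now the reserved one
}

$flag = 0;                                  // no cron has run yet
$first = sketch_cron_lock($flag, 1000);     // 1000: lock acquired
$flag = $first ?? $flag;
$second = sketch_cron_lock($flag, 1030);    // null: only 30 seconds later
$third = sketch_cron_lock($flag, 1060);     // 1060: a minute has passed
```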

  • If ALTERNATE_WP_CRON is defined to true (great explanation from Otto), a 302 redirect to the same page, with an extra doing_cron query key added, is sent to the visitor right away, and then the wp-cron.php file is included into the current request. The redirect keeps the visitor from waiting for wp-cron.php to finish processing each task: while the first request is busy processing the scheduled tasks on the server, the browser follows the redirect and issues a second request, in which the cron does not run, so the visitor is greeted with the requested page in no time.
  • Otherwise, the wp-cron.php file is requested via the wp_remote_post() WordPress HTTP API function without blocking the flow, with a timeout of 0.01 seconds; the request goes to the wp-cron.php?doing_cron URL using the PHP fsockopen() function. The minuscule timeout lets the script continue execution without waiting for any response, showing the originally requested page to the visitor while wp-cron.php does its thing in a separate request behind the scenes.
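For the first scenario, the redirect target is simply the current URL with the extra key appended. A plain-PHP sketch, assuming a hypothetical sketch_cron_redirect_url() helper; it keeps any existing query string but, unlike a full URL helper, ignores fragments:

```php
<?php
// Appends an empty doing_cron query key to a request URI, keeping any
// existing query string. Fragments (#...) are not handled in this sketch.
function sketch_cron_redirect_url(string $request_uri): string {
    $separator = (strpos($request_uri, '?') === false) ? '?' : '&';
    return $request_uri . $separator . 'doing_cron';
}

echo sketch_cron_redirect_url('/hello-world/'), "\n"; // /hello-world/?doing_cron
echo sketch_cron_redirect_url('/?p=1'), "\n";         // /?p=1&doing_cron
```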

By the way, since version 2.8 the doing_cron GET argument is not used by wp-cron.php in either case, and I’ve no idea why it’s still there. Since 2.8 the WordPress cron mechanism has been using the Transients API to keep track of whether a cron is already running or not; before that, an extra GET variable had to denote this lock. I guess they forgot to remove it, or it’s just another mystery. Ideas?

So as you can see, neither the wp_cron() nor the spawn_cron() WordPress function actually fires any of the scheduled hooks (except for those obscure callbacks inside the schedules array). These functions merely check and double-check whether it’s time to run the cron, and make sure that only one cron process goes ahead. The scheduled actions are fired from within wp-cron.php, so let’s open it up in our favorite editors to see what it’s all about.

wp-cron.php

First off, ignore_user_abort() (a standard PHP function) makes sure that the script continues to execute even if the requesting side has closed the connection. If wp-cron.php is requested on its own, outside of the original request (case 2), then wp-load.php from the WordPress root is required to boot up WordPress; otherwise WordPress has already been booted up (case 1). Next, just as inside wp_cron(), the cron events are fetched from the database, and for each and every event that is due, do_action_ref_array($hook, $v['args']); is executed, which fires off your custom registered action hooks along with any registered default ones (like wp_version_check).

Each executed scheduled event is handled with wp_reschedule_event() and wp_unschedule_event(), depending on its type (recurring or not). Recurring events are first rescheduled, which adds them again at a future timestamp derived from the schedule interval; then every executed event, recurring or not, is unset from its old timestamp and the updated event array is flushed to the database.
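Putting the last two paragraphs together, the processing loop can be modeled in plain PHP. This is a sketch under stated assumptions, not wp-cron.php itself: actions live in a plain $actions array instead of the WordPress hook registry, sketch_run_due_events() is a hypothetical name, and the rule for a recurring event’s next timestamp (step forward one interval at a time from the old timestamp until it lies in the future) is this sketch’s own choice:

```php
<?php
// Runs every due event in $crons (sorted by timestamp) and returns the
// updated events array. $actions maps hook names to lists of callables.
function sketch_run_due_events(array $crons, array $actions, int $now): array {
    $remaining = $crons;
    foreach ($crons as $timestamp => $cronhooks) {
        if ($timestamp > $now) {
            break; // nothing further is due yet
        }
        foreach ($cronhooks as $hook => $events) {
            foreach ($events as $key => $event) {
                if ($event['schedule'] !== false) {
                    // Recurring: add a copy at the next future slot
                    // (assumes a positive interval).
                    $interval = max(1, (int) $event['interval']);
                    $next = $timestamp + $interval;
                    while ($next <= $now) {
                        $next += $interval;
                    }
                    $remaining[$next][$hook][$key] = $event;
                }
                // Every executed event is removed from its old timestamp.
                unset($remaining[$timestamp][$hook][$key]);
                if (empty($remaining[$timestamp][$hook])) {
                    unset($remaining[$timestamp][$hook]);
                }
                if (empty($remaining[$timestamp])) {
                    unset($remaining[$timestamp]);
                }
                // Fire the hook, passing the stored arguments positionally.
                foreach ($actions[$hook] ?? [] as $callback) {
                    call_user_func_array($callback, $event['args']);
                }
            }
        }
    }
    ksort($remaining); // keep the array sorted for the next check
    return $remaining;
}

$log = [];
$actions = [
    'ping' => [function ($who) use (&$log) { $log[] = "ping $who"; }],
];
$crons = [
    100 => ['ping' => [
        md5(serialize(['a'])) => ['schedule' => 'hourly', 'args' => ['a'], 'interval' => 3600],
        md5(serialize(['b'])) => ['schedule' => false, 'args' => ['b']],
    ]],
    5000 => ['ping' => [md5(serialize(['c'])) => ['schedule' => false, 'args' => ['c']]]],
];
$after = sketch_run_due_events($crons, $actions, 200);
// $log is ['ping a', 'ping b']; array_keys($after) is [3700, 5000]
```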

The wp-cron.php file can instead be requested by the regular crontab running on the server rather than on every page request (by means of php-cli, curl or even wget), which should shave a bit off the final response time of regular page requests.

Do remember to define( 'DISABLE_WP_CRON', true ); in your wp-config.php file, though. This is quite useful if you have dozens of requests per minute hitting your WordPress site: that’s dozens of saved database queries for the list of events. But on the whole, the WordPress Cron internals seem to be very efficient: one database query for the list of schedules, a couple of lightweight tests to determine whether to spawn the cron or not, a couple of additional queries to set up the transient flags, and a non-blocking request for the main wp-cron.php file. Exquisite!

Experiments In The Unknown

How to add events to the WordPress schedule is documented quite thoroughly; the best place to start is the WP-Cron Functions page of the WordPress Codex. However, I can’t get my mind off that mysterious ‘callback’ key that is searched for inside the schedules and apparently executed, so let’s see what it’s all about.

We’ll start off by adding a new schedule type to the schedule type stack. You’d be tempted to do this in your theme’s functions.php file, but that file is never included before wp_cron() kicks off, so we’ll have to hook into cron_schedules from inside a plugin. Also note that it’s pointless to hook the code to the `init` action, as it too is fired long after `wp_cron()` is called. The latest action you can hook into and still make it in time appears to be `plugins_loaded` (check the WordPress Action Reference for a quick map of when actions are fired during bootup). Create a new plugin file inside your wp-content/plugins directory, call it the_cron.php, and hook in a new mystery schedule:

<?php
    /* Plugin Name: The Cron Experiments */

    add_filter( 'cron_schedules', function($schedule) {
        $schedule['mystery_hook_callback'] = array(
            'callback' => 'mystery_function'
        );

        return $schedule;
    } );
?>

Two things to notice. First, I’m using PHP 5.3+ closures (anonymous functions), so if you’re running an older version, use a regular named function instead. Second, if you read about the cron_schedules filter on its Codex page, you may have seen the following comment:

Be sure to add your schedule to the passed array, as shown in the example. If you simply return only your own schedule array then you will potentially delete schedules created by other plugins.

Yet ever since the filter was introduced (WordPress 2.1), wp_get_schedules() has always merged the filter’s return value with its own $schedules stack (line 324 in wp-includes/cron.php: return array_merge( apply_filters( 'cron_schedules', array() ), $schedules );), so we’re going to feel a little brave here and assume that nothing bad can happen. Besides, cron_schedules does not pass the original schedules stack to the filter; an empty array is passed instead… sadly, our WordPress Internals series wasn’t around when the author of that Codex page wrote that scary comment.

Thanks to Otto for pointing out the blunder: the filter value is passed from plugin to plugin, so if we don’t add our schedules to the array we receive and instead return just our own additions, the scheduler will only get our additions and none of those added previously by other plugins.
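Otto’s point is easy to demonstrate with a toy filter chain in plain PHP. toy_apply_filters() is a hypothetical stand-in that mimics only the property that matters here, namely that each callback receives whatever the previous callback returned; the weekly schedule is a made-up addition from some other plugin:

```php
<?php
// Hypothetical stand-in for apply_filters(): each callback gets the value
// returned by the previous one.
function toy_apply_filters(array $callbacks, array $value): array {
    foreach ($callbacks as $callback) {
        $value = $callback($value);
    }
    return $value;
}

// Some other plugin adds its own schedule and passes everything along.
$other_plugin = function (array $schedules): array {
    $schedules['weekly'] = ['interval' => 604800, 'display' => 'Once Weekly'];
    return $schedules;
};

// Ignores its input: whatever earlier callbacks added is lost.
$careless = function (): array {
    $schedule = [];
    $schedule['mystery_hook_callback'] = ['callback' => 'mystery_function'];
    return $schedule;
};

// Adds to its input: earlier additions survive.
$careful = function (array $schedule): array {
    $schedule['mystery_hook_callback'] = ['callback' => 'mystery_function'];
    return $schedule;
};

$lost = toy_apply_filters([$other_plugin, $careless], []);
$kept = toy_apply_filters([$other_plugin, $careful], []);
// array_keys($lost): ['mystery_hook_callback']
// array_keys($kept): ['weekly', 'mystery_hook_callback']
```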

See how we added a schedule that doesn’t conform to the documented standard? No interval, no display name. Remember that while digging around we noticed that the following is executed before the spawn_cron() function is called: if ( isset($schedules[$hook]['callback']) && !call_user_func( $schedules[$hook]['callback'] ) ). So we are hoping to trigger the callback by scheduling an event whose hook name ($hook) is a key in the $schedules array.

Two more things are required for our experiment: the mystery_function() itself, and a scheduled mystery_hook_callback event, which we don’t even need to register as an action. Let’s add the latter by scheduling a single event with wp_schedule_single_event(time(), 'mystery_hook_callback'). As for mystery_function(), let’s just have it echo something and exit. The final plugin code would look something like this:

<?php
    /* Plugin Name: The Cron Experiments */

    function mystery_function() {
        echo 'I\'ve been executed outside of the wp-cron.php context!';
        exit();
    }

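    /* time() makes sure it's scheduled to run with the next cron.
       Note that this line runs every time the plugin file is loaded. */
    wp_schedule_single_event(time(), 'mystery_hook_callback');

    /* Accept the schedules added by earlier filters and add ours to them */
    add_filter( 'cron_schedules', function( $schedule ) {
        $schedule['mystery_hook_callback'] = array(
            'callback' => 'mystery_function'
        );

        return $schedule;
    } );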
?>

Activate the plugin and run.

Bizarre! Since we’re exiting, the event is never removed or rescheduled. Don’t refresh too often, and don’t forget that every time the plugin loads, an event is scheduled. If mystery_function() returns true, the cron will be spawned right after it finishes; if it returns false, spawning the cron will be skipped for this particular hook.

By changing mystery_function() a bit, you can have it show a message in the request context, something like ‘Spawning the cron…’ at the top of the page, and make sure it returns true, otherwise the cron may not be spawned at all. That way you can tell that the cron has been spawned with the current request.

Another crazy thing to try is an event that fires once every minute and, through the callback, reveals a coupon code (‘Hey, the secret code is CRON!’) to the lucky request; the event is then rescheduled to run again no sooner than a minute later, when another visitor is presented with the secret message.
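Stripped of the WordPress plumbing, the coupon idea amounts to a gate that opens at most once a minute. A plain-PHP sketch, where sketch_secret_for_request() is a hypothetical name and $next_reveal stands in for the event’s timestamp in the cron array:

```php
<?php
// Returns the secret to the first request at or after $next_reveal and
// pushes the next reveal a minute into the future; everyone else gets null.
function sketch_secret_for_request(int &$next_reveal, int $now): ?string {
    if ($now < $next_reveal) {
        return null; // the event isn't due yet
    }
    $next_reveal = $now + 60; // "reschedule" the event
    return 'Hey, the secret code is CRON!';
}

$next_reveal = 1000;
$a = sketch_secret_for_request($next_reveal, 1000); // the lucky request
$b = sketch_secret_for_request($next_reveal, 1010); // null: too soon
$c = sketch_secret_for_request($next_reveal, 1060); // lucky again
```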

I can’t really understand why this behavior has been present in wp_cron() all this time while being undocumented and unused by WordPress Core. How can it be put to good use when there’s no guarantee that a wp_cron() execution will trigger the callback? An event that is scheduled 1 second before a callback-routed event will go on to spawn the cron, which will process all the events in the generic manner. Any ideas?

Conclusion

Well, thank you for joining us on another great journey into the WordPress Core, where we try to understand how WordPress works from the inside. We will be glad to hear your thoughts, questions, ideas, etc., so feel free to use the comment box below. If you want us to do a WordPress Internals episode on some specific part of WordPress (we like obscure and undocumented ones), feel free to drop us a line through Twitter, and don’t forget to stay tuned for our latest thoughts and ideas.