Speedier PHP Execution in WordPress 6.3

In this write-up, we talk about recent performance improvements that we made in WordPress 6.3, sharing both our findings and our journey.

While this post is mostly about performance improvements at the code level, we want to emphasize that when we write code, we first want it to be readable, correct, and secure, and only after that, performant.

In any case, digging into both PHP and WordPress core internals builds a useful awareness of how things work, whether at the architecture level, the function level, or elsewhere.

Code performance 101

Code performance optimization is about modifying existing code to use fewer resources while keeping the original behavior unchanged.

Here’s a basic textbook example that sums all integers from 0 to $n:

function sum_to_n( $n ) {
  $sum = 0;
  for ( $i = 0; $i <= $n; $i++ ) {
    $sum += $i;
  }
  return $sum;
}

This function takes O(n) steps (see Big O Notation), meaning it needs about $n loop iterations to calculate the sum. Optimizing it involves using the mathematical identity 0 + 1 + ... + n = n(n + 1) / 2:

function sum_to_n_faster( $n ) {
  return $n * ( $n + 1 ) / 2;
}

Now, sum_to_n_faster takes a single step regardless of $n, so we made an improvement here: we took a function and converted it from linear to constant running time.
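Since an optimization has to keep the original behavior unchanged, it helps to check the two versions against each other. The following is a minimal sketch of such a check; the helper sums_agree_up_to is a hypothetical name introduced only for this illustration.

```php
<?php
// Original linear-time version from the text.
function sum_to_n( $n ) {
	$sum = 0;
	for ( $i = 0; $i <= $n; $i++ ) {
		$sum += $i;
	}
	return $sum;
}

// Closed-form version from the text.
function sum_to_n_faster( $n ) {
	return $n * ( $n + 1 ) / 2;
}

// Hypothetical helper: true when both versions return identical
// results (same value and type) for every input from 0 to $max.
function sums_agree_up_to( $max ) {
	for ( $n = 0; $n <= $max; $n++ ) {
		if ( sum_to_n( $n ) !== sum_to_n_faster( $n ) ) {
			return false;
		}
	}
	return true;
}

var_dump( sums_agree_up_to( 1000 ) );
```

The strict comparison works here because $n * ( $n + 1 ) is always even, so the division yields an integer rather than a float.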

In practice, however, it’s usually trickier, and optimizing things like this is not always straightforward. Still, there are several questions we can ask ourselves that may help:

  • Where do we start?
    • What are the most frequently called functions?
    • What are the slowest functions?
    • What is the slowest behavior of the application in general?
  • What are we optimizing, and at which layer?
    • App layer? HTTP layer? Code layer? Caching?
  • How do we measure it?
  • What are the trade-offs?
    • Resources: Do we trade space (how much memory it takes during execution) or time (how much time it takes to execute)? (Space and time is an interesting general philosophical concept)
    • What is the impact of the proposed improvement?
      • How does it affect backward compatibility? How does it affect version compatibility?

Saving processing cycles by optimizing foreach

In WordPress Trac ticket #58457, we worked on optimizing the WP_Theme_JSON::append_to_selector method. The gist of this improvement is that we move a conditional check whose result does not change during the loop one level up, outside the loop. So, instead of doing something like:

$foo = input...
foreach ( $x as $y ) {
  if ( $foo ) {
    a( $y );
  } else {
    b( $y );
  }
}

We are doing this:

$foo = input...
if ( $foo ) {
  foreach ( $x as $y ) {
    a( $y );
  }
} else {
  foreach ( $x as $y ) {
    b( $y );
  }
}

The proposed code is more verbose than the original, but the benefit outweighs the cost: the condition is now evaluated once per call instead of once per loop iteration, and this function is called about 1104 times per request.
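To make the pattern concrete, here is a small runnable sketch. The two functions and their parameters are hypothetical and are not the code from the ticket; they only show that hoisting the check keeps the output identical while testing the flag once.

```php
<?php
// Hypothetical example: add $extra before or after every item,
// depending on a flag that never changes while the loop runs.

// Check inside the loop: $prepend is tested on every iteration.
function decorate_checked_inside( array $items, $extra, $prepend ) {
	$result = array();
	foreach ( $items as $item ) {
		if ( $prepend ) {
			$result[] = $extra . $item;
		} else {
			$result[] = $item . $extra;
		}
	}
	return $result;
}

// Check hoisted out of the loop: $prepend is tested once per call.
function decorate_checked_once( array $items, $extra, $prepend ) {
	$result = array();
	if ( $prepend ) {
		foreach ( $items as $item ) {
			$result[] = $extra . $item;
		}
	} else {
		foreach ( $items as $item ) {
			$result[] = $item . $extra;
		}
	}
	return $result;
}

$items = array( '.a', '.b' );
var_dump( decorate_checked_inside( $items, ':hover', false ) === decorate_checked_once( $items, ':hover', false ) );
```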

The original measurement with cachegrind using Xdebug’s profiler showed 1.18% before our patch.

After our patch, the measurement showed 0.22%, an improvement of almost one percentage point in time!

Saving processing cycles in WordPress actions and filters

Another good target for optimization is the WordPress hooks system, as shown in the cachegrind output below. This naturally led us to investigate the class-wp-hook.php file.

Figure: Reading a cachegrind output from a basic WordPress installation

In WordPress Trac ticket #58290, we use $x instanceof Y instead of is_object( $x ) && $x instanceof Y to save a few unnecessary calls to is_object, since instanceof already evaluates to false for non-objects. While this is a small improvement, the code runs in WP_Hook::build_preinitialized_hooks, which gets called in load.php, so the savings scale with the number of hooks added.
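The equivalence of the two checks can be verified with a short sketch. Example_Hook is a hypothetical stand-in class (core checks against WP_Hook), and the two helper functions exist only for this illustration.

```php
<?php
// Hypothetical stand-in class for this sketch.
class Example_Hook {}

// The longer form: an explicit is_object() call before instanceof.
function is_hook_verbose( $value ) {
	return is_object( $value ) && $value instanceof Example_Hook;
}

// The shorter form: instanceof alone, which is false for non-objects.
function is_hook_short( $value ) {
	return $value instanceof Example_Hook;
}

// Objects of the right class, objects of another class, and non-objects.
$samples = array( new Example_Hook(), new stdClass(), array(), 'string', 42, null );

foreach ( $samples as $sample ) {
	var_dump( is_hook_verbose( $sample ) === is_hook_short( $sample ) );
}
```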

You can find more information in the Trac ticket, especially about the impact of another improvement that we proposed there. That discussion underlines the importance of cross-collaboration work.

Improving performance with pre-computed values

As we were digging deeper into the hook class (WordPress Trac ticket #58458), we noticed that the function array_keys (within apply_filters) gets called about 1327 times on a basic WordPress installation.

To improve its performance, instead of calling array_keys every time, we pre-compute its value (and update it whenever the callbacks array changes) so that we don’t have to compute it on every call to apply_filters. With this improvement, we get the number of calls to array_keys down to 790.

This is an improvement of at least 537 calls to array_keys per request, scaling up with the number of requests and plugins installed.
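The idea of pre-computing the keys can be sketched as follows. This is a heavily simplified, hypothetical class and not WP_Hook itself; only the property name callbacks_keys is taken from the discussion of the patch, and everything else is assumed for illustration.

```php
<?php
// Hypothetical, simplified stand-in for a hook class. Not WP_Hook.
class Example_Hook_Registry {
	// Callbacks grouped by priority: array( priority => array( callable, ... ) ).
	private $callbacks = array();

	// Pre-computed, sorted list of priorities, kept in sync with $callbacks.
	private $callbacks_keys = array();

	public function add( callable $callback, $priority = 10 ) {
		$is_new_priority = ! isset( $this->callbacks[ $priority ] );

		$this->callbacks[ $priority ][] = $callback;

		// Recompute the keys only when the set of priorities changes.
		if ( $is_new_priority ) {
			ksort( $this->callbacks, SORT_NUMERIC );
			$this->callbacks_keys = array_keys( $this->callbacks );
		}
	}

	public function apply( $value ) {
		// Reads the cached keys instead of calling array_keys() on every run.
		foreach ( $this->callbacks_keys as $priority ) {
			foreach ( $this->callbacks[ $priority ] as $callback ) {
				$value = $callback( $value );
			}
		}
		return $value;
	}
}

$registry = new Example_Hook_Registry();
$registry->add( 'strtoupper', 20 );
$registry->add( 'trim', 5 );
echo $registry->apply( '  hello ' ), "\n";
```

The cost moves from the read path, which runs on every filter application, to the write path, which runs only when callbacks are registered.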

For the correctness part, WordPress core already has some tests for hooks, and we rely on those, following the testing instructions.

For the abstraction part, even though WP_Hook is not expected to be extended, it’s possible that someone will do it 🙂 The trade-off is that anyone who extends it will have to make sure to maintain callbacks_keys themselves.

This didn’t get in WordPress 6.3 but is scheduled for WordPress 6.4.

PHP is an interpreted language whose interpreter is written in the C programming language. Naturally, code written in C will (most of the time) run faster than code written in PHP. The trade-off is that C is a more complex language than PHP: we must pay attention to pointers, memory management, etc.

wpboost is one experiment to improve WordPress performance by moving down the programming-language abstraction ladder. We experimented with taking some frequently called WordPress PHP functions and shifting them one abstraction level below, from PHP to C. Practically, this is the lowest level of the ladder, but in theory, there is no lowest level; there is also assembly, binary, etc.

One of the functions we tested in this experiment is wp_slash. We ran a quick benchmark using microtime() to compare a function written in PHP with its counterpart written in C. Note that even though the percentage is huge, the absolute times are tiny. However, the gain scales with the number of times the function is called, the input, and the number of users it serves.

$ ./run-benchmarks.sh
...
Executing benchmarks for benchmarks/wp_slash.php
--------------------
PHP implementation of wp_slash takes 0.0000169277
C implementation of wp_slash takes 0.0000009537
Improvement of 94.366197%
...
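A benchmark of this kind can be sketched with microtime( true ). The sketch below is not the wpboost benchmark script; it compares a userland loop with the built-in array_sum (which is implemented in C) as a stand-in, and the helpers time_callback and improvement_percent are hypothetical names.

```php
<?php
// Userland implementation, standing in for "the function written in PHP".
function userland_array_sum( array $numbers ) {
	$sum = 0;
	foreach ( $numbers as $number ) {
		$sum += $number;
	}
	return $sum;
}

// Hypothetical helper: seconds taken by $iterations calls of $callback.
function time_callback( callable $callback, array $args, $iterations ) {
	$start = microtime( true );
	for ( $i = 0; $i < $iterations; $i++ ) {
		call_user_func_array( $callback, $args );
	}
	return microtime( true ) - $start;
}

// Hypothetical helper: percentage of time saved going from $before to $after.
function improvement_percent( $before, $after ) {
	return ( $before - $after ) / $before * 100;
}

$numbers  = range( 1, 1000 );
$userland = time_callback( 'userland_array_sum', array( $numbers ), 1000 );
$builtin  = time_callback( 'array_sum', array( $numbers ), 1000 );

printf( "Userland implementation takes %.10f\n", $userland );
printf( "Built-in implementation takes %.10f\n", $builtin );
if ( $userland > 0 ) {
	printf( "Improvement of %f%%\n", improvement_percent( $userland, $builtin ) );
}
```

Timings from a single run are noisy, so repeating the calls many times and comparing the totals gives a steadier picture than timing one call.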

Also note that dealing with compatibility here isn’t a big deal, as we can wrap the PHP definition of wp_slash in an if ( ! function_exists( 'wp_slash' ) ) check, so that the PHP version is only defined when the wpboost PHP extension is not enabled.
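A guard of this kind could look like the sketch below. The function body is a simplified assumption written for this illustration, not a verbatim copy of the core implementation.

```php
<?php
// Define the PHP fallback only when nothing else (for example, a compiled
// extension) has already provided wp_slash().
if ( ! function_exists( 'wp_slash' ) ) {
	// Simplified sketch of a PHP fallback; assumed for illustration.
	function wp_slash( $value ) {
		if ( is_array( $value ) ) {
			// Slash every element, recursing into nested arrays.
			return array_map( 'wp_slash', $value );
		}
		if ( is_string( $value ) ) {
			return addslashes( $value );
		}
		// Non-string scalars and other values pass through unchanged.
		return $value;
	}
}

var_dump( wp_slash( "O'Reilly" ) );
```

Callers use wp_slash the same way in both cases; only the place where the function is defined changes.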

This approach looked like one of the most promising. Besides wp_slash, we also converted several other functions, such as _wp_filter_build_unique_id, _wp_array_get, absint, and zeroise. With just these functions converted, Xdebug profiling of a wp-admin visit showed the time improved by almost half.

The trade-off for this improvement is that we now introduce a maintenance burden, as we have two different codebases. For example, if wp_slash gets changed in WordPress core PHP (highly unlikely but not impossible), we must update wpboost too.

In any case, working at the PHP C level is beneficial because it will teach you about the PHP internals, and knowing how PHP works internally will make you a better PHP developer.

If you want to learn more about PHP internals, there’s the PHP Internals Book, which contains good information but is still incomplete. In our experience, navigating the php-src codebase is the easiest/best way to learn.

(Here’s a small quiz on the way. 🙂 Do you know of PHP’s zval tagged union structure?)

One of the improvements we tackled was WordPress Trac ticket #58291. It’s essentially about optimizing the function _wp_filter_build_unique_id, which, given a callback, returns a unique identifier for it. This function is used by WP_Hook, one of the most frequently used WordPress classes, so this improvement makes add_filter and remove_filter faster.

_wp_filter_build_unique_id uses spl_object_hash to compute the hash. However, as of PHP 7.2, spl_object_id is also available. The idea was to switch from spl_object_hash to spl_object_id, as the latter is faster because it doesn’t make an additional sprintf-style formatting call; you can compare both functions below to see that. Saving a single formatting call is a tiny improvement, but at scale it is still beneficial.

PHP_FUNCTION(spl_object_hash) {
    // ...
    return strpprintf(32, "%016zx0000000000000000", (intptr_t)obj->handle);
}

PHP_FUNCTION(spl_object_id) {
    // ...
    RETURN_LONG((zend_long)obj->handle);
}

Now, since spl_object_id was introduced in PHP 7.2, this is already a problem: while we recommend PHP >=7.4 for WordPress, we support down to PHP 5.6.20. This is easy to solve, as we can introduce a polyfill in PHP for spl_object_id.

What about any other impacts of this change? This is tricky because this function is copy-pasted into VaultPress and WP-CLI. This is a problem because if we switch core to use spl_object_id (which has a different return value from spl_object_hash), it might mess up the callbacks array when combined with VaultPress/WP-CLI.
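The difference in return values is easy to see in a short sketch. The helpers key_from_hash and key_from_id are hypothetical and only mimic the idea of building a callback key from an object and a method name; they are not the core function.

```php
<?php
$callback_owner = new stdClass();

// spl_object_hash() returns a 32-character string.
$hash = spl_object_hash( $callback_owner );

// spl_object_id() returns an integer handle (PHP 7.2 and later).
$id = spl_object_id( $callback_owner );

// Hypothetical key builders, one per approach.
function key_from_hash( $object, $method ) {
	return spl_object_hash( $object ) . $method;
}

function key_from_id( $object, $method ) {
	return spl_object_id( $object ) . $method;
}

var_dump( $hash, $id );

// The two keys for the same object and method do not match, so code that
// builds keys one way cannot look up entries stored the other way.
var_dump( key_from_hash( $callback_owner, 'run' ) === key_from_id( $callback_owner, 'run' ) );
```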

One would not have expected a “private” function to be copy-pasted like this, but the following quote summarizes this pretty well: 🙂

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.

Hyrum’s Law, Software Engineering at Google

This task is still a work in progress – especially in determining other impacts and finding ways around them.

Besides using GitHub to search for potential impact, another way is to look through the whole WordPress plugin repository by manually cloning that huge repo or using something like WP Directory.

This is a great example of why we should be constantly aware of the potential impact of our code, regardless of whether it is a performance improvement. We need to find a way to balance the risk and the reward.

Work on performance improvements is rarely a single-person job; it involves a lot of cross-collaboration work. Getting feedback, input, and information from others gives deeper insight into the proposed improvements.

The larger our user base, the more attention we need to pay to performance. Even optimizations in the microseconds will have an impact at scale.

We need to continue experimenting, be aware of how PHP and WordPress work, and how the code we write and the data structures we choose will affect performance in the long run.

Have you made any recent performance improvements? How many users did they impact? How did they impact our systems? How did you measure them?

Props to Matthew Reishus, Romina Suarez, Nikolay Bachiyski, Daniel Bachhuber, and Donna Cavalier for their help and feedback!