WordPress Plugin i18n, Webpack, and Composer

By: Brad Jorsch

A lot of work has been going on in the Jetpack plugin lately. We have UIs built in React, with the JavaScript bundles being created by Webpack. We have Composer packages used for code sharing, increasingly so as we look into creating standalone plugins like Jetpack Backup and Jetpack Search. And we want everything to be translated for people who speak languages other than English.

A few months back we started getting reports that some translations in Jetpack had gone missing. As we looked into it, we eventually found no fewer than six different ways that translation was broken!

  1. JavaScript bundles weren’t being scanned due to bad file naming.
  2. Webpack’s optimizations were breaking the i18n function calls so WordPress.org’s translation infrastructure couldn’t find them.
  3. Lazy-loaded Webpack bundles weren’t lazy-loading translation data.
  4. Shared React component translations didn’t work in plugins other than Jetpack itself.
  5. Bundled Composer packages weren’t being scanned.
  6. Composer package translations didn’t work in plugins other than Jetpack itself.

It took us a few months, but we’ve now fixed them all. This post will describe how we did it.

Background: How plugins get translated

The recommended way to have your plugin translated is to let WordPress.org extract the translatable strings from your plugin, then build language packs for you based on the work of volunteer translators.

In your code, both PHP and JavaScript, you pass translatable strings to functions such as __(), _x(), and so on. When that code is uploaded to WordPress.org SVN, it gets scanned for these calls. The strings for each call are collected and passed into the GlotPress installation at translate.wordpress.org, where volunteers translate them into various languages. The translations are later collected into language packs, which can be downloaded and installed into WordPress so people can experience your plugin in their own language.

(Aside: The extraction is part of the WP-CLI tool: wp i18n make-pot. WordPress.org runs it with the --slug and --ignore-domain options. Generation of JavaScript translation files is done in a manner similar to wp i18n make-json, but skipping any JS files in a src/ directory.)

The extracted strings are all associated with a “text domain” matching your plugin’s slug. If the domain parameter passed to __() and so on doesn’t match, your translations won’t be found at runtime.

All this makes some assumptions about your code, some of which turned out not to be true for the way we were doing things in Jetpack.

Problem 1: JavaScript bundle naming

When we dropped support for Internet Explorer 11 in our JavaScript, Babel stopped transpiling modern syntax such as template strings into an ES5 form that IE11 could understand. This then broke when we deployed to WordPress.com, as that environment automatically applies its own minifier to JavaScript and CSS while serving them, and that minifier doesn’t understand template strings either. WordPress.com doesn’t apply its minifier to files named like “bundle.min.js”, so we renamed our files to match.

But that ran into one of the WordPress.org translation infrastructure’s assumptions: they assume any “bundle.min.js” can be ignored because it will have a corresponding “bundle.js” next to it. 😬

We didn’t want to include several hundred K of non-minified JS in Jetpack though. Jetpack already has an undeserved reputation for being “bloated”, and the extra files wouldn’t help even if the non-minified JS is never used.

Solution: The URL passed to wp_register_script() can include a query part, and WordPress.com’s minifier can also be bypassed by including minify=false in the query part.

Then we took it a step further. The registration of a Webpack bundle usually involves a fair bit of boilerplate, since you also need to read the information produced by @wordpress/dependency-extraction-webpack-plugin and often register some CSS too. It looks something like this:

$relative_to = __FILE__; // Or whichever file the build/ directory sits next to.
// Dependencies and version generated by @wordpress/dependency-extraction-webpack-plugin.
$assets = require dirname( $relative_to ) . '/build/bundle.asset.php';
wp_register_script(
    'handle',
    // minify=false tells the WordPress.com minifier to leave the file alone.
    add_query_arg( 'minify', 'false', plugins_url( 'build/bundle.js', $relative_to ) ),
    $assets['dependencies'],
    $assets['version']
);
wp_set_script_translations( 'handle', 'textdomain' );
wp_register_style(
    'handle',
    plugins_url( is_rtl() ? 'build/bundle.rtl.css' : 'build/bundle.css', $relative_to ),
    array( /* Some other dependencies? */ ),
    $assets['version']
);

So we added a method in our automattic/jetpack-assets Composer package to handle all that in an easier way. And we can have it perform some simple checks, like requiring that a textdomain be given if the dependencies include wp-i18n.

Assets::register_script( 'handle', 'build/bundle.js', __FILE__, array( 'textdomain' => 'domain' ) );

Problem 2: Webpack optimization breaking i18n function calls

The extraction of translatable strings from JavaScript depends on seeing the call to a function or method named __(), _x(), _n(), or _nx(), with the various parameters being passed as literal strings. These calls may be preceded by a “translator comment” which is also extracted.

In its default configuration, Webpack in production mode is likely to rename these functions to single-character names and to throw away those translator comments. And even if it’s working now, a changed configuration or a new code pattern might break it in the future (as happened to us when we updated to Webpack 5). 😬
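
To make the failure mode concrete, here is a toy extractor. It is only a rough sketch: the real extraction parses the code properly, and extractStrings and its regular expression are hypothetical stand-ins that handle nothing but single-quoted first arguments.

```javascript
// Toy illustration only; the real extractor uses a proper parser.
// This hypothetical regex finds calls to __(), _x(), _n(), or _nx()
// (as functions or methods) whose first argument is a single-quoted literal.
const I18N_CALL = /(?<![\w$])(?:__|_x|_n|_nx)\(\s*'((?:[^'\\]|\\.)*)'/g;

function extractStrings( source ) {
	return Array.from( source.matchAll( I18N_CALL ), ( match ) => match[ 1 ] );
}

const original = "const label = __( 'Save settings', 'jetpack' );";
// Roughly what a minifier might emit after renaming __ to a one-letter name.
const mangled = "const a=e('Save settings','jetpack');";

console.log( extractStrings( original ) ); // [ 'Save settings' ]
console.log( extractStrings( mangled ) ); // []
```

The string is still in the mangled bundle, but nothing marks it as translatable any more, so an extractor that looks for the function names comes back empty.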

Solution, part 1: The first step was to figure out the necessary configuration to preserve the i18n function calls and the translator comments.

  • Set Webpack’s .optimization.concatenateModules to false, as the concatenation sometimes winds up renaming the methods.
  • Instead of relying on Webpack’s default configuration for Terser, supply (via .optimization.minimizer) an instance of terser-webpack-plugin configured to preserve the calls and comments.
    • .terserOptions.mangle.reserved set to reserve the four methods.
    • .terserOptions.format.comments set to a callback that identifies translator comments.
    • .extractComments set to a callback that identifies the license comments Terser preserves by default, which will now be extracted to a separate file instead to reduce the size of the bundle.
  • We also included Calypso’s @automattic/babel-plugin-preserve-i18n plugin to further help preserve the i18n method names.
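
The options above might be assembled roughly as follows. This is a sketch under assumptions rather than Jetpack’s actual configuration: both regular expressions are guesses at what counts as a translator comment and a license comment, and the names isTranslatorComment, isLicenseComment, and terserPluginOptions are made up for illustration. The terser-webpack-plugin import is left as a comment so the fragment stands alone.

```javascript
// Names that must survive mangling so the string extractor can still find them.
const I18N_FUNCTIONS = [ '__', '_x', '_n', '_nx' ];

// Assumption: a translator comment is any comment whose text starts with
// "translators:" (case-insensitive), optionally after JSDoc-style asterisks.
function isTranslatorComment( node, comment ) {
	return /^[\s*]*translators:/i.test( comment.value );
}

// Assumption: this pattern approximates the license-style comments that
// Terser keeps by default, such as "/*! ... */", @license, and @preserve.
function isLicenseComment( node, comment ) {
	return /^\**!|@preserve|@license|@cc_on/i.test( comment.value );
}

// Options object intended for `new TerserPlugin( terserPluginOptions )`.
const terserPluginOptions = {
	terserOptions: {
		mangle: { reserved: I18N_FUNCTIONS },
		format: { comments: isTranslatorComment },
	},
	extractComments: isLicenseComment,
};

// Fragment of the Webpack configuration. In a real webpack.config.js the
// minimizer array would hold `new TerserPlugin( terserPluginOptions )`.
const optimization = {
	concatenateModules: false,
	// minimizer: [ new TerserPlugin( terserPluginOptions ) ],
};
```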

Solution, part 2: To address the “even if it’s working now, it might break later” problem, and to help identify coding patterns that can break the i18n method calls even with the above configuration, we created @automattic/i18n-check-webpack-plugin. This plugin extracts the strings from the original sources and the output bundle to compare them and see if anything seems to have gone missing, so if something breaks it’ll make the build fail instead of having to wait for someone to notice the broken i18n and report it.
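
The core of that comparison can be sketched in a few lines. This shows only the idea, not the plugin’s actual logic, which this post does not spell out; findMissingStrings and assertNoMissingStrings are hypothetical names, and the two input lists stand in for whatever the extraction step produces.

```javascript
// Hypothetical helper: which strings from the sources never made it into the bundle?
function findMissingStrings( sourceStrings, bundleStrings ) {
	const inBundle = new Set( bundleStrings );
	return sourceStrings.filter( ( text ) => ! inBundle.has( text ) );
}

// Fail loudly, the way a build step would.
function assertNoMissingStrings( sourceStrings, bundleStrings ) {
	const missing = findMissingStrings( sourceStrings, bundleStrings );
	if ( missing.length > 0 ) {
		throw new Error( 'i18n strings lost during bundling: ' + missing.join( ', ' ) );
	}
}

const fromSources = [ 'Save settings', 'Cancel' ];
const fromBundle = [ 'Cancel' ];
console.log( findMissingStrings( fromSources, fromBundle ) ); // [ 'Save settings' ]
```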

The documentation for the check plugin includes some known problematic code patterns and fixes for them.

Problem 3: Lazy-loaded Webpack bundles

For “entry” bundles, Webpack expects you to include any additional files (e.g. CSS extracted by mini-css-extract-plugin) in your HTML yourself. In a WordPress plugin this is fairly straightforward to do from PHP (and we made it even easier for ourselves using automattic/jetpack-assets as described above), and that includes loading of the appropriate translation data into @wordpress/i18n.

But if you use code like import( /* webpackChunkName: "async" */ './something' ), Webpack will create a “lazy-loaded” bundle that isn’t loaded until that import() call is executed. In Jetpack we have one of these in the Instant Search module. For such lazy-loaded bundles the Webpack runtime knows how to load the extracted CSS, but it knows nothing about WordPress translation data. 😬

When we looked around, we saw that Calypso had a fairly complicated solution in their code: a generic hook added to the Webpack runtime, plus specific code to load data when that hook fired. Woo had tried to adapt that but gave up in favor of tricking WordPress into loading the lazy bundle’s translations non-lazily. Neither solution appealed.

Solution: We created @automattic/i18n-loader-webpack-plugin to teach Webpack how to load the WordPress translation data. It’s designed to work in concert with @wordpress/dependency-extraction-webpack-plugin and automattic/jetpack-assets: when i18n-loader encounters a bundle that can lazy-load other bundles that use @wordpress/i18n, it will register a dependency on a “@wordpress/jp-i18n-state” module via the former that’s provided by the latter. The state data lets the Webpack runtime inside the bundle know how to locate the translation data, which it will then download and register with @wordpress/i18n during the lazy-loading process.
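
The ordering involved can be sketched as below. This is not the plugin’s generated runtime code: loadChunkWithTranslations is a hypothetical helper, and its three callbacks are injected so the sketch needs neither Webpack nor @wordpress/i18n. The only real API assumed is that setLocaleData( data, domain ) from @wordpress/i18n would be passed in, and that the downloaded file has the locale_data shape of WordPress script translation files.

```javascript
// Hypothetical sketch: register translations first, then load the lazy chunk.
async function loadChunkWithTranslations( { fetchTranslations, setLocaleData, importChunk, domain } ) {
	try {
		const json = await fetchTranslations();
		// Script translation files keep their strings under the domain or "messages".
		const localeData = json.locale_data[ domain ] || json.locale_data.messages;
		setLocaleData( localeData, domain );
	} catch ( error ) {
		// Missing translations should not block the feature; fall back to English.
	}
	return importChunk();
}

// Possible usage, with translationUrl as a placeholder:
// loadChunkWithTranslations( {
// 	fetchTranslations: () => fetch( translationUrl ).then( ( response ) => response.json() ),
// 	setLocaleData, // from @wordpress/i18n
// 	importChunk: () => import( /* webpackChunkName: "async" */ './something' ),
// 	domain: 'jetpack',
// } );
```

Loading the translations before the chunk means any string translated while the chunk initializes already has its data available.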

Problem 4: Text domains in shared React components

As part of the “Jetpack RNA” project, we’ve begun creating React components that can be shared by multiple plugins, like our own Jetpack Backup and Jetpack Search plugins.

But remember how the __() call needs to specify a domain, which is supposed to be a constant string (not a variable) and must match the plugin’s slug? 😬

Solution: We created @automattic/babel-plugin-replace-textdomain, a simple Babel plugin to rewrite the domains as the components are being bundled.
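
A Babel plugin along these lines could look roughly like the sketch below. It is not the source of @automattic/babel-plugin-replace-textdomain: the function name and the targetDomain option are invented for illustration, and the only facts relied on are the argument positions of the text domain in the @wordpress/i18n functions and the standard Babel plugin shape.

```javascript
// Position of the text domain argument for each @wordpress/i18n function.
const DOMAIN_ARG_INDEX = new Map( [
	[ '__', 1 ],
	[ '_x', 2 ],
	[ '_n', 3 ],
	[ '_nx', 4 ],
] );

// Hypothetical plugin; `targetDomain` is an assumed option name.
function replaceTextdomainSketch( { types: t }, { targetDomain } ) {
	return {
		name: 'replace-textdomain-sketch',
		visitor: {
			CallExpression( path ) {
				const { callee, arguments: args } = path.node;
				// Accept both __( ... ) and i18n.__( ... ).
				const name = callee.type === 'MemberExpression' ? callee.property.name : callee.name;
				const index = DOMAIN_ARG_INDEX.get( name );
				// Only rewrite a domain that is present as a literal string.
				if ( index === undefined || ! args[ index ] || args[ index ].type !== 'StringLiteral' ) {
					return;
				}
				args[ index ] = t.stringLiteral( targetDomain );
			},
		},
	};
}
```

With a plugin like this in the Babel configuration, a shared component can be written with its own placeholder domain and each consuming plugin’s build substitutes its slug.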

Problem 5: Bundled Composer packages being skipped

WordPress core doesn’t really use Composer; they have a composer.json, but just to pull in PHPUnit and a few other development tools. Where WordPress core needs libraries at runtime, they copy them in statically. Plugins either do the same or include Composer’s vendor/ directory in the code checked into WordPress.org SVN.

The WordPress.org translation infrastructure assumes that anything in vendor/ either has no translations or has its own translation mechanism entirely, instead of intending to use WordPress’s. (Although since __() and such would be defined by WordPress rather than the plugin, I’m not sure how that’s intended to work.) In our case, we do really want these packages’ strings included in the plugin’s language pack. 😬

Solution: Composer allows for custom installer plugins, which can install packages into different locations based on the “type” field in the package’s composer.json. We created automattic/jetpack-composer-plugin that installs “jetpack-library” packages into jetpack_vendor/, and set the types of the relevant packages to “jetpack-library”.

Problem 6: Text domains in Composer packages

As with the shared React components, the bundled Composer packages need to be using the plugin’s text domain because that’s where the translations are going to be. And this time we’re not compiling them into a bundle, so a compile-time replacer wouldn’t work. 😬

Solution: The solution here comes in several parts.

  1. We have automattic/jetpack-composer-plugin write an “i18n-map.php” file into jetpack_vendor/, collecting the WordPress plugin’s slug (set in its composer.json) and each package’s textdomain and version (from their composer.jsons).
  2. The WordPress plugin passes that file to automattic/jetpack-assets, which determines the mapping from each package’s domain to an appropriate plugin’s.
  3. Assets hooks into __() and such to try the plugin’s domain if no translation was found for the package’s. It also hooks into the script translation file loader to point to the script translation files included with the plugin’s language pack instead of the nonexistent packs for the packages’ text domains. And finally it includes the mapping in the state data for @automattic/i18n-loader-webpack-plugin so that can load the correct file for any lazy-loaded bundles.
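
The fallback in step 3 boils down to a two-stage lookup. The real implementation is PHP inside automattic/jetpack-assets; the sketch below is written in JavaScript purely for illustration, with translateWithFallback, the Map-based catalogs, and the “jetpack-connection” domain all being hypothetical.

```javascript
// catalogs: Map of text domain -> Map of source string -> translation.
// domainMap: Map of package text domain -> plugin text domain.
function translateWithFallback( catalogs, domainMap, text, domain ) {
	const lookup = ( candidate ) => catalogs.get( candidate )?.get( text );
	// Try the package's own domain, then the plugin's, then give up.
	return lookup( domain ) ?? lookup( domainMap.get( domain ) ) ?? text;
}

const catalogs = new Map( [ [ 'jetpack', new Map( [ [ 'Connect', 'Verbinden' ] ] ) ] ] );
const domainMap = new Map( [ [ 'jetpack-connection', 'jetpack' ] ] );

console.log( translateWithFallback( catalogs, domainMap, 'Connect', 'jetpack-connection' ) ); // Verbinden
console.log( translateWithFallback( catalogs, domainMap, 'Disconnect', 'jetpack-connection' ) ); // Disconnect
```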

We also made sure that our monorepo’s CI checks would catch common cases where developers might wind up with wrong text domains, using existing linting rules from @wordpress/eslint-plugin and wp-coding-standards/wpcs and custom checks to verify those rules’ configurations are in sync with each other and with composer.json.

If WordPress Core were to take on this problem, I think they could do it a bit better:

  1. Switch the translation infrastructure from being based on plugins and themes (which all use the plugin or theme slug as the text domain) to being based on the text domains directly. For example, instead of https://api.wordpress.org/translations/plugins/1.0/ and https://api.wordpress.org/translations/themes/1.0/ just have one endpoint that takes the domain.
  2. Let code declare to WordPress which domains it needs beyond the defaults of “plugin slug” and “theme slug”. This is so WordPress can download those extra domains; I don’t think WordPress cares beyond that.
  3. Let us register projects that aren’t plugins or themes (e.g. our packages) on translate.wordpress.org, so the packages can be translated and WordPress can fetch those translations like it does plugins and themes.

That way the plugin only needs to declare the packages’ text domains, and the translators would only have to translate each package’s strings once instead of doing so for every plugin using the package.

Summary and conclusion

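We created several pieces to make everything work:

  • automattic/jetpack-assets, which registers Webpack bundles along with their translations and maps package text domains to the plugin’s.
  • automattic/jetpack-composer-plugin, which installs “jetpack-library” packages into jetpack_vendor/ and writes the i18n-map.php file.
  • @automattic/babel-plugin-replace-textdomain, which rewrites text domains as shared components are bundled.
  • @automattic/i18n-check-webpack-plugin, which fails the build when strings go missing from a bundle.
  • @automattic/i18n-loader-webpack-plugin, which teaches Webpack to load translation data for lazy-loaded bundles.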

We also took advantage of @wordpress/dependency-extraction-webpack-plugin and Calypso’s @automattic/babel-plugin-preserve-i18n, as well as linter rules from @wordpress/eslint-plugin and wp-coding-standards/wpcs (and a fork of phpcs that we’ve been trying to upstream to let us have per-directory configs) to help developers in our monorepo keep text domains straight.

Overall this was quite a bit of work, but Jetpack’s i18n is now better than ever before. And we hope that this post describing the problems we found and our solutions might help other plugin developers improve their i18n as well.