Use code to explore and change JavaScript files

A while back I wrote, as an exercise in minimalism, the nano-i18n library for localizing strings. You write internationalized strings with a template tag:

import { t } from 'nano-i18n';
const name = 'Dan';
console.log(t`Hello, ${name}!`);

It also provides a hook for missing translations, so you can log them to the browser console, or collect them in one place so that they're easier to find when you want to translate them. Pretty straightforward stuff.

import { t, config } from 'nano-i18n';
config({
  log: (msg, key) => console.warn(`Missing: "${key}"`)
});

const name = 'Dan';
console.log(t`Hello, ${name}!`);

// => Missing: "Hello, {}!"
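
To collect the keys in one place rather than log them one by one, the hook can feed a Set. This is a minimal sketch that assumes the `log(msg, key)` signature shown above; `createMissingKeyCollector` is a hypothetical helper, not part of nano-i18n, and the hook is simulated by hand so the snippet runs on its own.

```javascript
// Hypothetical helper (not part of nano-i18n): gathers missing keys
// so they can be listed in one go instead of logged one by one.
function createMissingKeyCollector() {
  const keys = new Set();
  return {
    // Same (msg, key) signature as the `log` hook used above.
    log: (msg, key) => {
      keys.add(key);
    },
    // Sorted snapshot of everything collected so far.
    list: () => [...keys].sort()
  };
}

const collector = createMissingKeyCollector();

// Assumed wiring, following the config() call above:
// config({ log: collector.log });

// Simulate the hook firing while someone clicks through the interface:
collector.log('Missing translation', 'Hello, {}!');
collector.log('Missing translation', 'Goodbye!');
collector.log('Missing translation', 'Hello, {}!');

console.log(collector.list());
// => [ 'Goodbye!', 'Hello, {}!' ]
```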

This way of collecting strings has its disadvantages. Someone needs to go through all the possible states of the interface to trigger the aforementioned hook. Worse, when you redo bits of the interface, you need the presence of mind to remember to clean up the strings that are no longer in use.

Instead, let's see how we can automate our way out of this predicament, and use a script to extract from a JavaScript codebase all the template literals tagged with t.


For answering basic questions about a codebase, or for simple code alterations, I would normally first attempt a well-crafted regular expression. I'll paste it into Sublime Text's Search & Replace interface, or ripgrep (look at the --only-matching and --replace options), or some combination of standard command-line tools I'll never remember unless I have it written down somewhere.

This is not one of those times. The t tag can interpolate basically any JavaScript expression within the translation, so it's impossible to deploy a regex to match all the patterns that might occur in the typical codebase. We need a tool that understands the syntax we're working with.
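
To make the problem concrete, here's a sketch of the kind of regex I might have tried, run against a translated string that has another translated string inside its interpolation. The pattern and the sample source are made up for illustration.

```javascript
// A naive pattern: the tag `t`, a backtick, then anything up to the next backtick.
function naiveExtract(source) {
  return [...source.matchAll(/\bt`([^`]*)`/g)].map(match => match[1]);
}

// A translated string whose interpolation contains two more translated strings.
const source =
  'const msg = t`Total: ${count} ${count === 1 ? t`item` : t`items`}`;';

console.log(naiveExtract(source));
// => [ 'Total: ${count} ${count === 1 ? t', 'items' ]
```

The outer string is cut off at the first nested backtick, and the inner `item` string is missed altogether. You can keep patching the pattern, but nesting is exactly what regular expressions are bad at.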

A short survey of the tools available

There are quite a few options for parsing JavaScript: there's esprima, acorn, and @babel/parser. These parsers produce an Abstract Syntax Tree (AST) from a JavaScript string, adhering either to the ESTree format or to something fairly similar.

Further along, recast helps you alter the AST, and then remake it into a string. A combination of astring and astravel serves a similar purpose. But how do you know how to alter the AST? AST Explorer lets you type in sample JavaScript code and shows you the resulting tree as an interactive, navigable JSON.

Finally, jscodeshift is a command-line tool written in JavaScript that uses recast and helps with bits of admin, such as:

It's used for a variety of purposes, such as keeping your React components up to date with the occasional changes in React's API.

You could start with the tools from any level on the abstraction ladder, and build up from there. In this particular case, I'd rather not get bogged down in unrelated complexities & gotchas and focus on the unique aspects of the task at hand. With a few tweaks here and there, we can adapt jscodeshift to not modify JavaScript files, but instead gather statistics on them, so that's going to be the tool of choice today.

Analyzing code with jscodeshift

Install jscodeshift from npm and add a script to your package.json:

{
  "scripts": {
    "gather": "node node_modules/.bin/jscodeshift --help"
  }
}

Running the script with npm run gather just displays its help information for now, but I wanted to get the non-standard invocation out of the way. You don't usually need to run Node.js CLI tools with node, but this form is necessary at the time of writing to work around a quirk in jscodeshift.

To actually do something, the tool takes a JavaScript file containing the transform we want to perform on our JavaScript files, under the --transform command-line argument. It also needs a folder in which to look for files, so a proper invocation looks like this (with the node part omitted):

jscodeshift --transform=gather-translatables.js js/

A word of caution: jscodeshift has the ability to overwrite your JavaScript files. Make sure you use a version control system such as Git on the codebase you're working with, and that you don't have any pending changes to commit, before messing around with transforms.

The transform is listed below. The API looks a bit like jQuery, with chained methods on a collection of AST nodes.

Technically this is the point where you'd modify the AST to write back to the file, the task at which jscodeshift excels. We only want to gather some statistics, so we don't return anything from the transform function — this instructs jscodeshift to keep the source files intact.

Instead we use the report function that writes its argument to the console output.

gather-translatables.js

/*
The jscodeshift transform
-------------------------
*/

module.exports = (fileInfo, api) => {
  const { jscodeshift, report } = api;
  jscodeshift(fileInfo.source)
    .find(jscodeshift.TaggedTemplateExpression)
    .filter(isTranslationTag)
    .forEach(path => {
      const key = toTranslationKey(path);
      report('\n' + key);
    });
};

/*
Matches template literals tagged with "t":

t`Hello, world!`
*/

function isTranslationTag(path) {
  const { type, name } = path.node.tag;
  return type === 'Identifier' && name === 't';
}

/*
Converts a tagged template literal
to a `nano-i18n` translation key:

t`Hello, ${stranger}!`

Becomes:

'Hello, ${0}!'
*/

function toTranslationKey(path) {
  const { quasis } = path.node.quasi;
  return quasis
    .map((q, idx) => {
      // Escape the `$` so the key gets a literal "${idx}" placeholder
      return q.value.raw + (q.tail ? '' : `\${${idx}}`);
    })
    .join('');
}
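
If the chained find / filter calls feel opaque, it may help to see the idea without the library. The following is a rough sketch of what "find every node of a type, then filter" amounts to on a hand-written, heavily trimmed tree. `findNodes` is a hypothetical helper, not how jscodeshift is implemented, and it deals in plain nodes rather than the path objects the real API hands you.

```javascript
// Recursively collects every node of the given type from an ESTree-like tree.
function findNodes(node, type, found = []) {
  if (Array.isArray(node)) {
    node.forEach(child => findNodes(child, type, found));
  } else if (node && typeof node === 'object') {
    if (node.type === type) found.push(node);
    Object.values(node).forEach(child => findNodes(child, type, found));
  }
  return found;
}

// Builds a trimmed tagged-template node; just enough structure for the demo.
const tagged = (tagName, raw) => ({
  type: 'TaggedTemplateExpression',
  tag: { type: 'Identifier', name: tagName },
  quasi: {
    type: 'TemplateLiteral',
    expressions: [],
    quasis: [{ type: 'TemplateElement', value: { raw }, tail: true }]
  }
});

// Roughly the tree for:  css`a { color: red }`; t`Hello, world!`;
const program = {
  type: 'Program',
  body: [
    { type: 'ExpressionStatement', expression: tagged('css', 'a { color: red }') },
    { type: 'ExpressionStatement', expression: tagged('t', 'Hello, world!') }
  ]
};

const allTagged = findNodes(program, 'TaggedTemplateExpression');
const translations = allTagged.filter(
  node => node.tag.type === 'Identifier' && node.tag.name === 't'
);

console.log(allTagged.length, translations.length);
// => 2 1
```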

To see what I need to match and how to process the AST in the transform function, I've looked at the structure of a typical tagged template literal in AST Explorer, with this sample JavaScript code:

t`Hello, ${stranger}!`;

AST Explorer output

This is a simplified JSON output with just the essential properties for each node.

{
  "type": "TaggedTemplateExpression",
  "tag": {
    "type": "Identifier",
    "name": "t"
  },
  "quasi": {
    "type": "TemplateLiteral",
    "expressions": [
      {
        "type": "Identifier",
        "name": "stranger"
      }
    ],
    "quasis": [
      {
        "type": "TemplateElement",
        "value": {
          "raw": "Hello, ",
          "cooked": "Hello, "
        },
        "tail": false
      },
      {
        "type": "TemplateElement",
        "value": {
          "raw": "!",
          "cooked": "!"
        },
        "tail": true
      }
    ]
  }
}
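
With the tree in front of us, building the key is mostly a matter of stitching the quasis back together. Here's a standalone sketch that works on a plain object shaped like the quasi node above. `keyFromQuasi` is a name made up for this example, and the placeholder format is assumed to be the `'Hello, ${0}!'` form quoted in the transform's comment.

```javascript
// The `quasi` part of the tree above, trimmed to what the key needs.
const quasi = {
  quasis: [
    { value: { raw: 'Hello, ' }, tail: false },
    { value: { raw: '!' }, tail: true }
  ]
};

// Every quasi except the tail is followed by an interpolation,
// which is replaced here with a numbered placeholder.
function keyFromQuasi({ quasis }) {
  return quasis
    .map((q, idx) => q.value.raw + (q.tail ? '' : '${' + idx + '}'))
    .join('');
}

console.log(keyFromQuasi(quasi));
// => Hello, ${0}!
```

The expressions array is never consulted: the key only records where the interpolations sit, not what they contain.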

The transform does what we want it to do — it extracts all the translation keys used in any JavaScript file from the js/ folder. The AST processing part is done.

Let's turn our attention to how to shape this raw data we've just produced into something usable. Invoking the gather script, we can see that the output of report() is peppered with miscellaneous output from the tool, some of it colored. To clean up the output, we can use jscodeshift's --no-color and --silent flags, which suppress the color and the informational output, respectively.

Updated gather script in package.json

{
  "scripts": {
    "gather": "node node_modules/.bin/jscodeshift --no-color --silent --transform=gather-translatables.js js/"
  }
}

Running the script again produces a cleaner output as a series of items with the following structure:

 REP js/path/to/hello.js
Hello, ${0}!

This form is not ideal, but the data is good enough to pipe into a separate script to achieve its final form. Here's a small Node.js script that does two things: it removes duplicate keys, and it sorts the remaining ones alphabetically.

sort-gathered.js

const { stdin } = process;
stdin.setEncoding('utf8');

/*
Accumulate the input from `stdin`
into the `content` string.
*/

let content = '';
stdin.on('readable', () => {
  let chunk;
  while ((chunk = stdin.read()) !== null) {
    content += chunk;
  }
});

/*
When finished, process `content`.
*/

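stdin.on('end', function analyze() {
  /*
    Split on the REP lines, then drop
    stray whitespace and empty items.
  */
  const items = content
    .split(/\n? REP .*\n/)
    .map(item => item.trim())
    .filter(Boolean);

  /*
    Unique keys, sorted alphabetically
  */
  const unique = [...new Set(items)].sort();
  console.log(unique);
});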

And here's the final gather script, piped into sort-gathered.js:

{
  "scripts": {
    "gather": "node node_modules/.bin/jscodeshift --no-color --silent --transform=gather-translatables.js js/ | node sort-gathered.js"
  }
}
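
A sorted list of keys is a fine place to stop, but it also makes the clean-up problem from the beginning of the article tractable. As a possible next step, here's a sketch that merges the gathered keys into an existing translations object and flags the entries no longer found in the code. `syncTranslations` and the sample data are hypothetical; nothing here is part of nano-i18n or jscodeshift.

```javascript
// Hypothetical next step: reconcile gathered keys with existing translations.
function syncTranslations(gatheredKeys, existing = {}) {
  const wanted = new Set(gatheredKeys);
  const has = key => Object.prototype.hasOwnProperty.call(existing, key);

  const translations = {};
  for (const key of [...wanted].sort()) {
    // Keep the current translation if there is one, else leave a blank to fill in.
    translations[key] = has(key) ? existing[key] : '';
  }

  // Keys that have a translation but no longer appear in the code.
  const stale = Object.keys(existing).filter(key => !wanted.has(key));

  return { translations, stale };
}

const result = syncTranslations(
  ['Hello, ${0}!', 'Goodbye!', 'Hello, ${0}!'],
  { 'Hello, ${0}!': 'Salut, ${0}!', 'Old label': 'Vechi' }
);

console.log(result.translations);
// => { 'Goodbye!': '', 'Hello, ${0}!': 'Salut, ${0}!' }
console.log(result.stale);
// => [ 'Old label' ]
```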

P.S. Nathan points out on Twitter that instead of piping into the second sort-gathered.js script, we could use jscodeshift's undocumented JavaScript API.

Changing JavaScript files

Finally, let's use jscodeshift for the purpose for which it has been designed: to actually change JavaScript files.

I've recently switched culori to use native ES modules in Node.js for its inaugural 1.x release and realized — by carefully reading the documentation, haha just kidding, I promptly got a couple of hundred error messages — that all the imports need to use the full path, including the .js extension.

// From this:
import { interpolate } from '../interpolate';

// …to this:
import { interpolate } from '../interpolate.js';

Normally at this point I would despair at the prospect of changing hundreds of declarations by hand, but it was effortless with a transform:

imports-add-ext.js

/*
jscodeshift transform: adds the '.js' extension
to all import declarations with relative specifiers:

From './file' to './file.js', and
from '../file' to '../file.js'.
*/

module.exports = function (fileInfo, api) {
  let j = api.jscodeshift;
  return j(fileInfo.source)
    .find(j.ImportDeclaration)
    .forEach(path => {
      let { source } = path.node;
      if (source.value.match(/^\.{1,2}\//)) {
        source.value += '.js';
      }
    })
    .toSource();
};

node node_modules/.bin/jscodeshift --transform=imports-add-ext.js src/ test/

The beauty of writing your own transforms is they only have to be as comprehensive as the task requires. In fact, after changing the import declarations I realized I needed to address the export declarations as well. Eh.
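
For the record, here's roughly how a second pass could cover the exports too. Treat it as a sketch under a few assumptions: `addJsExtension` is a hypothetical helper that also skips specifiers which already end in an extension, and the transform relies on re-exports (`export { a } from './a'` and `export * from './a'`) showing up as ExportNamedDeclaration and ExportAllDeclaration nodes with a `source` property, as they do in the ESTree format. Only the helper is exercised below; the transform itself needs jscodeshift to run.

```javascript
// Adds '.js' to relative specifiers that don't already end in an extension.
// Limitation: './foo.test' is read as already having an extension.
function addJsExtension(specifier) {
  const isRelative = /^\.{1,2}\//.test(specifier);
  const hasExtension = /\.[a-z0-9]+$/i.test(specifier);
  return isRelative && !hasExtension ? specifier + '.js' : specifier;
}

// Sketch of the transform; export it with `module.exports = transform`.
function transform(fileInfo, api) {
  const j = api.jscodeshift;
  const root = j(fileInfo.source);
  const declarationTypes = [
    j.ImportDeclaration,
    j.ExportNamedDeclaration,
    j.ExportAllDeclaration
  ];
  declarationTypes.forEach(type => {
    root.find(type).forEach(path => {
      const { source } = path.node;
      // `export const x = 1` has no source; leave it alone.
      if (source) {
        source.value = addJsExtension(source.value);
      }
    });
  });
  return root.toSource();
}

console.log(addJsExtension('../interpolate'));
// => ../interpolate.js
```

The extension check also makes the transform safe to run twice, which is handy when you discover, as I did, that the first pass wasn't the whole story.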

Further reading

A few links from the bookmark archive, filtered for "codemods", that you may find useful: