
Writing a Node.js command-line tool

Some notes on writing a Node.js command-line (CLI) tool that follows the Unix philosophy, that is: does one thing well and communicates with other tools via streams of text. The notes are an accumulation of things I’ve learned while writing a few such tools throughout the years.

The samples in this article use some recent features in Node.js, some as new as v22. It is also assumed that the tool is written in ESM.

Setting up the script

The first step of turning a regular JavaScript file into a Node CLI tool is to make it executable:

# before: invoked as a script
node cli.js

# after: invoked as an executable
./cli.js

To do that, include #!/usr/bin/env node as the first line in the script. This is called a shebang and sets Node.js as the interpreter for the rest of the file.

File permissions need to include execution, which we add by running chmod u+x cli.js.
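Putting the two steps together, here is a minimal sketch that creates an executable script from scratch (the file contents are just an illustration):

```shell
# Create a minimal cli.js with the shebang as its very first line
cat > cli.js <<'EOF'
#!/usr/bin/env node
console.log('hello from mytool');
EOF

# Add the execute permission for the current user
chmod u+x cli.js

# The script can now be invoked directly:
#   ./cli.js
```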

Finally, to expose the command-line tool to npm, we need to map it to a command name in the bin object in package.json. For a single command, it should generally coincide with the name of the package.

{
  "name": "mytool",
  "version": "1.0.0",
  "bin": {
    "mytool": "cli.js"
  }
}

When the mytool package is published to npm, users will be able to run it on the fly with npx mytool, or install it globally with npm install -g mytool and then invoke mytool directly.

Reading command-line arguments

The arguments passed to the tool are available in the process.argv array. The first element is the path to the Node.js executable and the second is the path to the JavaScript file being executed, so the set of actual arguments is process.argv.slice(2).
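As a quick sketch, the separation can be wrapped in a small helper (the function name is illustrative):

```javascript
// The first two entries of process.argv are the Node.js binary
// and the script path; everything after them is ours to interpret.
function actualArgs(argv) {
  return argv.slice(2);
}

console.log(actualArgs(process.argv));
```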

The way to interpret these arguments is up to the author, but CLI tools adhere to some conventions, which we examine below.

Doing it like everybody else

The main source is the POSIX standard (Portable Operating System Interface), whose goal is to keep operating systems compatible by making sure a set of built-in utilities work the same everywhere. The standard defines three types of arguments:

  • Argument. In the shell command language, a parameter passed to a utility as the equivalent of a single string in the argv array created by one of the exec functions. An argument is one of the options, option-arguments, or operands following the command name.
  • Option. An argument to a command that is generally used to specify changes in the utility’s default behavior.
  • Option-argument. A parameter that follows certain options. In some cases an option-argument is included within the same argument string as the option — in most cases it is the next argument.
  • Operand. An argument to a command that is generally used as an object supplying information to a utility necessary to complete its processing. Operands generally follow the options in a command line.

The Open Group Base Specifications Issue 7, 2018 edition. 3. Definitions

The 12.2 Utility Syntax Guidelines section further explains how a command works.
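To make the terminology concrete, here is a standard sort invocation annotated with the argument types defined above:

```shell
printf 'a\nb\n' > input.txt

# -r         an option (reverses the default sort order)
# -o         an option that takes an option-argument
# out.txt    the option-argument to -o
# input.txt  an operand
sort -r -o out.txt input.txt
```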

GNU extends these guidelines with its own Standards for Command Line Interfaces, which specify, among other things, long options introduced by a double dash (such as --verbose), and the expectation that all programs support the --help and --version options.

Parsing command-line arguments

As modern command-line tools have what I’d call a GNU-flavored POSIX-informed interface, getting from process.argv.slice(2) to a clean set of operands and options is a bit involved. Thankfully there are a number of libraries that help with parsing arguments.

At a minimum, the arg-parsing library needs to know which options have a string value (i.e. accept an option-argument), and which are simple booleans. Consider that in a command like myfind -i index.md, without further information, the index.md argument is ambiguous: depending on the tool's semantics, it could be either an operand or an option-argument to the -i option.

That’s why for opsh, the arg-parsing library I wrote as an exercise in simplicity, the only thing you need to specify is which of the options are boolean:

import opsh from 'opsh';
const args = opsh('-i index.md'.split(' '), ['i']);
/* =>
{
  operands: ['index.md'],
  options: {
    i: true
  }
}
*/

In the meantime, Node.js has added a native solution with util.parseArgs() (v18.3+), which is slightly more verbose but packs many more features.

import { parseArgs } from 'node:util';
const args = parseArgs({
  args: '-i index.md'.split(' '),
  // parseArgs() is strict by default and rejects operands
  // unless we explicitly allow them:
  allowPositionals: true,
  options: {
    i: {
      type: 'boolean'
    }
  }
});
/* =>
{
  positionals: ['index.md'],
  values: {
    i: true
  }
}
*/

There are many ingrained names for CLI options, but at a minimum any tool should support these: --help (often shortened to -h) to print usage instructions, and --version (often -v) to print the tool's version.

Reading environment variables

Another set of options that can potentially influence the behavior of your tool comes from the user's environment. This set of options is available in Node.js as process.env.

An option commonly used in the Node.js ecosystem is NODE_ENV, with which you can enable optimized behavior across a variety of popular libraries with one declaration:

NODE_ENV=production node my-server.js
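In your own code, the value can be read from process.env like any other property; a small sketch (accepting the env object as a parameter is just a testability convenience):

```javascript
// Derive a production flag from the environment.
function isProduction(env = process.env) {
  return env.NODE_ENV === 'production';
}

if (isProduction()) {
  // e.g. skip expensive debug checks, cache templates, etc.
}
```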

Reading input

Reading a file from the disk

The node:fs module handles all things file system, and the Promise-based node:fs/promises flavor works nicely with async/await.

When input files are specified as operands, the expectation is to resolve relative paths against the current working directory (CWD), available as process.cwd(). String paths passed to functions in the fs module are considered relative to the CWD, so this works out of the box:

import { readFile } from 'node:fs/promises';

const content = await readFile(operand, 'utf8');

At other times, the tool may need to read a file packaged along with it. In this case, the relative path must be resolved against the requesting module’s path, and not against process.cwd(). In any ES module, the import.meta.url property offers an absolute file: URL to the module, and fs functions are equipped to work with these directly:

import { readFile } from 'node:fs/promises';

/*
Before: `templates/default.html` is relative to `process.cwd()`.
_Might_ work when running the tool locally but will fail
when the end user installs the tool globally.
*/

const template = await readFile('templates/default.html', 'utf8');

/*
After: the file is relative to the current path,
works locally as well as for the end user.
*/

const template = await readFile(
  new URL('../templates/default.html', import.meta.url),
  'utf8'
);

Importing a user-specified module

It’s usual for a CLI tool to accept a path to a JavaScript configuration file relative to the CWD. Dynamic import() expressions let you load modules at runtime, but unlike fs functions, import() expects paths relative to the current module.

The node:path module, which works with file system paths at an abstract level, offers the resolve() method that takes one or more paths or path segments and produces an absolute path. When the segments don’t form an absolute path on their own, resolve() prepends process.cwd(), so the import() below works equally well with an absolute or a relative configPath.

import { resolve } from 'node:path';

let config = (await import(resolve(configPath))).default;
if (typeof config === 'function') {
  config = await config();
}

The code to load a JavaScript configuration file relative to the current working directory. There’s also a provision to handle what we might find in that configuration file: either an object or an optionally-async function that returns one.

Fetching local JSON

There are two ways to fetch a local JSON file, using either the classic fs.readFile() + JSON.parse(), or the newer JSON imports (v18.20+). For example, you might read the package.json file to display the version when the tool is invoked with the --version flag:

import { parseArgs } from 'node:util';
import pkg from './package.json' with { type: 'json' };

const { values } = parseArgs({
  options: {
    version: {
      type: 'boolean',
      short: 'v'
    }
  }
});

if (values.version) {
  console.log(pkg.version);
}

The dynamic import() expression has the equivalent:

const ATTR_JSON = { with: { type: 'json' } };
const pkg = (await import('./package.json', ATTR_JSON)).default;

Expanding globs

The user’s shell can probably already expand glob patterns into a set of operands:

# glob is expanded by shell…
mytool pages/*.html

# …into something like below
mytool pages/some-file.html pages/some-other-file.html …

However, support of various glob features varies among shells. At the time of writing, macOS still packages bash 3.2.57, a version released circa 2006 that does not support the ** globstar operator. These discrepancies between shells can sometimes cause bugs that go unnoticed.

Rather than relying on shell expansion, some CLI tools may benefit from expanding globs themselves in a predictable way. The user can then provide a quoted glob operand that passes through to process.argv unchanged.

# quoted glob doesn’t get expanded by shell
mytool 'pages/**/*.html'

Node.js 22 added the glob() function to the node:fs module, plus a handy Array.fromAsync() method to ingest the async iterator returned by the function.

import { glob } from 'node:fs/promises';

const contentFiles = await Array.fromAsync(
  glob('pages/**/*.html')
);

Earlier Node.js versions can use the popular fast-glob package or one of its alternatives.

Fetching a URL

Node.js 18 adds to its global scope a browser-compatible fetch() function which makes it easy to retrieve the content of a URL. For earlier Node.js versions, node-fetch is a helpful replacement.

It’s fine if your tool covers the basic scenario where you HTTP GET a URL. But a POST here, some credentials there, and things can get complicated fast. Unless fetching web pages is your tool’s main focus, don’t go overboard adding fetching options to the command-line interface. Instead, let the user defer to a dedicated tool such as curl or wget, which devote ample API space to this sort of configuration.
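A minimal sketch of that basic GET scenario (the fetchText helper is illustrative):

```javascript
// Fetch a URL and return its body as text. Note that fetch()
// only rejects on network errors, so HTTP error statuses
// must be checked explicitly.
async function fetchText(url) {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`GET ${url} failed: ${res.status} ${res.statusText}`);
  }
  return res.text();
}
```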

When working with web page content, JavaScript libraries to parse HTML may come in handy.

Two of my projects, percollate and hred, operate on the content of web pages. Even when the content is fetched with a separate tool, the page’s original URL remains important for resolving relative links. For both tools you can pass the URL as an option with -u <url> or --url=<url>:

curl https://danburzo.ro/ | hred "a@.href" -u https://danburzo.ro/

Writing output

Writing a file to the disk

Similar to reading a file from the disk, the writeFile() function resolves relative paths against the current working directory.

A concern specific to writing files is that writeFile() will throw an error when attempting to write inside a directory structure that doesn’t exist. This is a quirk that shouldn’t be the responsibility of the user. Before writing files to a location specified by the user, we ensure the required directories exist with mkdir().

import { mkdir, writeFile } from 'node:fs/promises';
import { dirname } from 'node:path';

await mkdir(dirname(outputPath), { recursive: true });
await writeFile(outputPath, 'some content');

When writing to a destination derived from a user-provided (--output=some-file-name.txt) or otherwise dynamic string, it’s important to produce filenames and directory structures that conform to the constraints imposed by the operating system. Packages such as slugify and filenamify are useful for this.
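A hand-rolled sketch of the idea (dedicated packages handle many more edge cases, such as reserved Windows device names):

```javascript
// Replace characters that are invalid in file names on common
// operating systems, and keep the result reasonably short.
function safeFilename(name, replacement = '-') {
  return name
    .replace(/[<>:"/\\|?*\u0000-\u001f]/g, replacement)
    .slice(0, 255);
}
```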

Working with input and output streams

Programs have access to a series of standard streams: the standard input (stdin) is a stream for incoming data, the standard output (stdout) is a stream for outgoing data, and the standard error stream (stderr) for diagnostics data. In Node.js, these three streams are available on the process object.

When you pipe programs together, one program’s stdout is connected to the subsequent program’s stdin, so these streams become important for communication between tools:

mytool | yourtool | theirtool

In the command pipeline above, the stdout stream from mytool is piped to the stdin stream of yourtool, whose output is in turn passed to theirtool.

Reading from stdin

A CLI tool can accept input from another program via the stdin stream. Here’s one way to consume it:

async function slurp(stream) {
  // Switch to text mode, default is binary
  stream.setEncoding('utf8');
  return (await Array.fromAsync(stream)).join('');
}

// Usage:
const content = await slurp(process.stdin);

A function to slurp a Readable stream, using Array.fromAsync() for brevity. For Node.js versions prior to v22, replace it with a for await...of loop.

Reading from stdin can be combined with reading the files passed as operands. The POSIX utility guidelines suggest interpreting the hyphen operand as stdin:

Guideline 13: For utilities that use operands to represent files to be opened for either reading or writing, the - operand should be used to mean only standard input (or standard output when it is clear from context that an output file is being specified) or a file named -.

Another common pattern is to read from stdin when no operands are provided.

Writing to stdout and stderr

The global console object in Node.js outputs to these two streams as follows: console.log() and console.info() write to stdout, while console.warn() and console.error() write to stderr.

These methods introduce a newline character after each message. To control the newlines yourself, you can output to these streams directly with process.stdout.write() and process.stderr.write() respectively.

The names of the output streams, as well as Node’s mapping of console methods to them, can somewhat obscure their purpose. To decide whether to output some information to stdout or stderr, consider the following:

A good rule of thumb is that stdout is for content and stderr is for meta-information (progress updates, any errors encountered, and so on).

It’s common to control the amount of output with --verbose, --quiet, and --debug CLI options.
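One way to structure this is a small reporter that routes each message by purpose; in this sketch, verbose would come from a parsed --verbose option, and the stream parameters exist only for testability:

```javascript
// Route output by purpose: content goes to stdout so it can be
// piped onward; meta goes to stderr, gated by a verbosity flag.
function makeReporter(verbose, out = process.stdout, err = process.stderr) {
  return {
    output: text => out.write(text + '\n'),
    info: text => {
      if (verbose) {
        err.write(text + '\n');
      }
    }
  };
}
```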

Further reading

To learn more about how your CLI tool should work, I recommend Command Line Interface Guidelines, a comprehensive open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day.

12 Factor CLI apps by Jeff Dickey is also a widely-cited set of guidelines.

On the implementation side, Dr. Axel Rauschmayer has an entire book about Shell scripting with Node.js, free to read online.