Writing a Node.js command-line tool
Some notes on writing a Node.js command-line (CLI) tool that follows the Unix philosophy; that is, it does one thing well and communicates with other tools via streams of text. These notes are an accumulation of things I’ve learned while writing a few such tools over the years.
The samples in this article use some recent features in Node.js, some as new as v22. It is also assumed that the tool is written in ESM.
Setting up the script
The first step of turning a regular JavaScript file into a Node CLI tool is to make it executable:
# before: invoked as a script
node cli.js
# after: invoked as an executable
./cli.js
To do that, include #!/usr/bin/env node as the first line in the script. This is called a shebang and sets Node.js as the interpreter for the rest of the file. File permissions need to include execution, which we add by running chmod u+x cli.js.
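A minimal cli.js illustrating this, with the shebang as the very first line (the greeting is just a placeholder):

#!/usr/bin/env node
console.log('Hello from mytool!');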
Finally, to expose the command-line tool to npm, we need to map it to a command name in the bin object in package.json. For a single command, the name should generally coincide with the name of the package.
{
	"name": "mytool",
	"version": "1.0.0",
	"bin": {
		"mytool": "cli.js"
	}
}
When publishing the mytool package to npm, users will be able to:
- install the package globally with npm install -g mytool; the CLI tool will be added to the user’s PATH under the key defined in the bin object, making the mytool command available globally;
- add the package as a dependency to their project with npm install mytool, and use npx mytool to run the tool;
- run npx mytool directly, which will fetch the package without installing it and run the tool.
Reading command-line arguments
The arguments passed to the tool are available in the process.argv array. The first element is the path to the Node.js executable and the second is the path to the JavaScript file being executed, so the set of actual arguments is process.argv.slice(2).
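For example, when the tool is invoked as ./cli.js -i index.md, the array looks something like this (the exact paths depend on the system):

// process.argv =>
// [
// 	'/usr/local/bin/node',
// 	'/home/user/mytool/cli.js',
// 	'-i',
// 	'index.md'
// ]
const args = process.argv.slice(2); // => ['-i', 'index.md']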
The way to interpret these arguments is up to the author, but CLI tools adhere to some conventions, which we examine below.
Doing it like everybody else
The main source is the POSIX standard (Portable Operating System Interface), whose goal is to keep operating systems compatible by making sure a set of built-in utilities work the same everywhere. The standard defines three types of arguments: operands, options, and option-arguments. Its 12.2 Utility Syntax Guidelines section further explains how a command works:
- a utility is invoked to operate on some input (the operands);
- its behavior can be tweaked via options, specified with a hyphen followed by a letter or digit, e.g. -i or -0;
- some of these options need an additional option-argument, while others act as simple toggles;
- the -- (double-hyphen) argument acts as a delimiter: everything beyond it is interpreted as an operand, even if it starts with a hyphen and would normally be considered an option (see the example after this list).
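To make these rules concrete, here’s the anatomy of an invocation of a hypothetical utility (myfind and its options are made up for illustration):

# `-i` is a toggle option, `-f` takes the option-argument `pages`,
# and everything after `--` is an operand, even `-dashed-file.md`
myfind -i -f pages -- index.md -dashed-file.md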
GNU extends these guidelines with its own Standards for Command Line Interfaces, which specify:
- GNU-style long options prefixed with -- (double hyphen), whose option-argument is specified as either --option value or --option=value (as seen with the gawk utility); this style is meant to make the interfaces of command-line tools clearer;
- a relaxed ordering: where POSIX expects all operands to appear after all options, GNU allows options and operands to be intermingled freely, except after the -- argument, beyond which everything is treated as an operand.
Parsing command-line arguments
As modern command-line tools have what I’d call a GNU-flavored, POSIX-informed interface, getting from process.argv.slice(2) to a clean set of operands and options is a bit involved. Thankfully, there are a number of libraries that help with parsing arguments.
At a minimum, the arg-parsing library needs to know which options have a string value (i.e. accept an option-argument), and which are simple booleans. Consider the basic fact that in a command like myfind -i index.md, without any further information, the index.md argument is ambiguous: depending on the tool’s semantics, it could be either an operand or an option-argument to the -i option.
That’s why for opsh, the arg-parsing library I wrote as an exercise in simplicity, the only thing you need to specify is which of the options are boolean:
import opsh from 'opsh';

const args = opsh('-i index.md'.split(' '), ['i']);
/* =>
{
	operands: ['index.md'],
	options: {
		i: true
	}
}
*/
In the meantime, Node.js has added a native solution with util.parseArgs() (v18.3+), which is slightly more verbose but packs many more features.
import { parseArgs } from 'node:util';

const args = parseArgs({
	args: '-i index.md'.split(' '),
	// Positional arguments (operands) must be explicitly allowed,
	// otherwise `parseArgs()` throws in (the default) strict mode.
	allowPositionals: true,
	options: {
		i: {
			type: 'boolean'
		}
	}
});
/* =>
{
	positionals: ['index.md'],
	values: {
		i: true
	}
}
*/
There are many ingrained names for CLI options, but at a minimum any tool should support the two below (a sketch of wiring them up follows the list):
- -h and --help display the usage information;
- -v and --version display the current tool version.
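With util.parseArgs(), handling these could look something like this sketch (the usage text is illustrative):

import { parseArgs } from 'node:util';

const { values } = parseArgs({
	// For this sketch, tolerate any other arguments.
	strict: false,
	options: {
		help: { type: 'boolean', short: 'h' },
		version: { type: 'boolean', short: 'v' }
	}
});

if (values.help) {
	console.log('Usage: mytool [options] [file...]');
	process.exit(0);
}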
Reading environment variables
Another set of options that can potentially influence the behavior of your app comes from the user’s environment. This set of options is available in Node.js as process.env.
An option commonly used in the Node.js ecosystem is NODE_ENV, with which you can enable optimized behavior across a variety of popular libraries with one declaration:
NODE_ENV=production node my-server.js
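Inside the tool, environment variables are plain strings on the process.env object. A small sketch, where MYTOOL_CACHE_DIR is a hypothetical variable specific to our tool:

// `process.env` values are strings (or `undefined` when unset).
const isProduction = process.env.NODE_ENV === 'production';
const cacheDir = process.env.MYTOOL_CACHE_DIR ?? '.cache';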
Reading input
Reading a file from the disk
The node:fs module handles all things file system, and the Promise-based node:fs/promises flavor works nicely with async/await.
When input files are specified as operands, the expectation is to resolve relative paths against the current working directory (CWD), available as process.cwd(). String paths passed to functions in the fs module are considered relative to the CWD, so this works out of the box:
import { readFile } from 'node:fs/promises';
const content = await readFile(operand, 'utf8');
At other times, the tool may need to read a file packaged along with it. In this case, the relative path must be resolved against the requesting module’s path, and not against process.cwd(). In any ES module, the import.meta.url property offers an absolute file: URL to the module, and fs functions are equipped to work with these directly:
import { readFile } from 'node:fs/promises';

/*
	Before: `templates/default.html` is relative to `process.cwd()`.
	_Might_ work when running the tool locally but will fail
	when the end user installs the tool globally.
*/
const template = await readFile('templates/default.html', 'utf8');

/*
	After: the file is resolved relative to the current module,
	works locally as well as for the end user.
*/
const template = await readFile(
	new URL('../templates/default.html', import.meta.url),
	'utf8'
);
Importing a user-specified module
It’s usual for a CLI tool to accept a path to a JavaScript configuration file relative to the CWD. Dynamic import() expressions let you load modules at runtime, but unlike fs functions, import() expects paths relative to the current module.
The node:path module, which works with file system paths at an abstract level, offers the resolve() method that takes one or more paths or path segments and produces an absolute path. It falls back to process.cwd() for good measure, so the import() below works equally well with an absolute or a relative configPath.
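A sketch of that import(), assuming configPath holds the user-provided path and the configuration file’s options are its default export; pathToFileURL() converts the absolute path into a file: URL that import() accepts on all platforms, including Windows:

import { resolve } from 'node:path';
import { pathToFileURL } from 'node:url';

// `resolve()` makes `configPath` absolute (against the CWD if needed);
// `pathToFileURL()` turns it into a `file:` URL for `import()`.
const config = (await import(pathToFileURL(resolve(configPath)).href)).default;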
Fetching local JSON
There are two ways to fetch a local JSON file: either the classic fs.readFile() + JSON.parse() combination, or the newer JSON imports (v18.20+). For example, you might read the package.json file to display the version when the tool is invoked with the --version flag:
import { parseArgs } from 'node:util';
import pkg from './package.json' with { type: 'json' };

const { values } = parseArgs({
	options: {
		version: {
			type: 'boolean',
			short: 'v'
		}
	}
});

if (values.version) {
	console.log(pkg.version);
}
The dynamic import() expression has an equivalent:
const ATTR_JSON = { with: { type: 'json' } };
const pkg = (await import('./package.json', ATTR_JSON)).default;
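For completeness, the classic approach, which works in any Node.js version, might look like this:

import { readFile } from 'node:fs/promises';

// Resolve the file against the current module, then parse it.
const pkg = JSON.parse(
	await readFile(new URL('./package.json', import.meta.url), 'utf8')
);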
Expanding globs
The user’s shell can probably already expand glob patterns into a set of operands:
# glob is expanded by shell…
mytool pages/*.html
# …into something like below
mytool pages/some-file.html pages/some-other-file.html …
However, support for various glob features varies among shells. At the time of writing, macOS still packages bash 3.2.57, a version released circa 2006 that does not support the ** globstar operator. These discrepancies between shells can sometimes cause bugs that go unnoticed.
Rather than relying on shell expansion, some CLI tools may benefit from expanding globs themselves in a predictable way. The user can then provide a quoted glob operand that passes through to process.argv unchanged.
# quoted glob doesn’t get expanded by shell
mytool 'pages/**/*.html'
Node.js 22 added the glob() function to the node:fs and node:fs/promises modules, plus a handy Array.fromAsync() method to ingest the async iterator returned by the promise-based flavor.
import { glob } from 'node:fs/promises';

const contentFiles = await Array.fromAsync(
	glob('pages/**/*.html')
);
Earlier Node.js versions can use the popular fast-glob package or one of its alternatives.
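With fast-glob, the equivalent looks something like this:

import fg from 'fast-glob';

// `fast-glob` resolves directly to an array of matching paths.
const contentFiles = await fg('pages/**/*.html');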
Fetching a URL
Node.js 18 adds to its global scope a browser-compatible fetch() function, which makes it easy to retrieve the content of a URL. For earlier Node.js versions, node-fetch is a helpful replacement.
It’s fine if your tool covers the basic scenario of an HTTP GET on a URL. But a POST here, some credentials there, and things can get complicated fast. Unless fetching web pages is your tool’s main focus, don’t go overboard adding fetching options to the command-line interface. Instead, let the user defer to a dedicated tool such as curl or wget, which devote ample API space to this sort of configuration.
When working with web page content, JavaScript libraries to parse HTML may come in handy.
Two of my projects, percollate and hred, operate on the content of web pages. Even when the content is fetched with a separate tool, the page’s original URL remains important for resolving relative links. For both tools you can pass the URL as an option with -u <url> or --url=<url>:

curl https://danburzo.ro/ | hred "a@.href" -u https://danburzo.ro/
Writing output
Writing a file to the disk
Similar to reading a file from the disk, the writeFile() function resolves relative paths against the current working directory.
A concern specific to writing files is that writeFile() will throw an error when attempting to write inside a directory structure that doesn’t exist. This is a quirk that shouldn’t be the responsibility of the user. Before writing files at a location specified by the user, we ensure the required directories exist with mkdir().
import { mkdir, writeFile } from 'node:fs/promises';
import { dirname } from 'node:path';

// Create any missing directories along the output path.
await mkdir(dirname(outputPath), { recursive: true });
await writeFile(outputPath, 'some content');
When writing to a destination derived from a user-provided (--output=some-file-name.txt) or otherwise dynamic string, it’s important to obtain file names and directory structures that conform to the constraints imposed by the operating system. Packages such as slugify and filenamify are useful for this.
Working with input and output streams
Programs have access to a series of standard streams: the standard input (stdin) is a stream for incoming data, the standard output (stdout) is a stream for outgoing data, and the standard error (stderr) is a stream for diagnostics data. In Node.js, these three streams are available on the process object as process.stdin, process.stdout, and process.stderr.
When you pipe programs together, one program’s stdout is connected to the subsequent program’s stdin, so these streams become important for communication between tools:
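# each `|` connects one program's stdout to the next one's stdin
cat index.md | mytool | sort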
Reading from stdin
A CLI tool can accept input from another program via the stdin stream. Here’s one way to consume it:
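One short way to do it, using the text() helper from the node:stream/consumers module (v16.7+):

import { text } from 'node:stream/consumers';

// Gather the entire stdin stream into a single string.
const input = await text(process.stdin);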
Reading from stdin can be combined with reading the files passed as operands. The POSIX utility guidelines suggest interpreting the hyphen operand as stdin:

Guideline 13: For utilities that use operands to represent files to be opened for either reading or writing, the - operand should be used to mean only standard input (or standard output when it is clear from context that an output file is being specified) or a file named -.

Another common pattern is to read from stdin when no operands are provided. Both patterns appear in the sketch below.
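Here, the readInputs() helper (a name of our own invention) accepts the list of operands:

import { readFile } from 'node:fs/promises';
import { text } from 'node:stream/consumers';

// Read each operand as a file, treating `-` as stdin;
// with no operands at all, read from stdin instead.
async function readInputs(operands) {
	if (!operands.length) {
		return [await text(process.stdin)];
	}
	return Promise.all(
		operands.map(op =>
			op === '-' ? text(process.stdin) : readFile(op, 'utf8')
		)
	);
}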
Writing to stdout and stderr
The global console object in Node.js outputs to these two streams as follows:
- console.log() and console.info() go to stdout;
- console.warn() and console.error() go to stderr.
These methods introduce a newline character after each message. To control the newlines yourself, you can output to these streams directly with process.stdout.write() and process.stderr.write(), respectively.
The names of the output streams, as well as Node’s mapping of console methods to them, can somewhat obscure their purpose. To decide whether to output some information to stdout or stderr, consider the following:
- stdout and stderr are both printed to the terminal when the output is not piped into another program;
- in a command pipeline, only stdout will be piped to the next program’s stdin, while stderr will be printed to the terminal.
A good rule of thumb is that stdout is for content and stderr is for meta-information (notes on the steps taken, any errors encountered, and so on), as sketched below.
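A sketch of the distinction, where files and result are illustrative names for the tool’s inputs and output:

// Progress notes go to stderr, so they don't pollute the pipeline…
process.stderr.write(`Processing ${files.length} files…\n`);
// …while the actual content goes to stdout, ready to be piped onward.
process.stdout.write(result);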
It’s common to control the amount of output with --verbose, --quiet, and --debug CLI options.
Further reading
To learn more about how your CLI tool should work, I recommend Command Line Interface Guidelines, “a comprehensive open-source guide to help you write better command-line programs, taking traditional UNIX principles and updating them for the modern day”.
12 Factor CLI apps by Jeff Dickey is also a widely-cited set of guidelines.
On the implementation side, Dr. Axel Rauschmayer has an entire book about Shell scripting with Node.js, free to read online.