
Segment texts from the command line

Ltr is a simple command-line tool that segments plain text into characters, words, or sentences. It uses the JavaScript Intl.Segmenter API, which follows language-specific segmentation rules.
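For readers unfamiliar with the API, here is a minimal TypeScript sketch of Intl.Segmenter at its three granularities. It is not Ltr's source code: the `segment` helper is hypothetical, and the sketch assumes a runtime that ships Intl.Segmenter (such as Node.js 16 or later) plus the `es2022` lib in tsconfig for the type definitions.

```typescript
// Hypothetical helper illustrating Intl.Segmenter; not taken from Ltr itself.
type Granularity = "grapheme" | "word" | "sentence";

function segment(text: string, granularity: Granularity, locale = "en"): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  const parts: string[] = [];
  for (const s of segmenter.segment(text)) {
    // At word granularity, drop spaces and punctuation (segments that are not word-like).
    if (granularity === "word" && !s.isWordLike) continue;
    parts.push(s.segment);
  }
  return parts;
}

// "e" followed by a combining acute accent is two UTF-16 code units
// but a single grapheme cluster (one user-perceived character).
console.log(segment("e\u0301", "grapheme").length); // 1

const text = "Hello there. How are you?";
console.log(JSON.stringify(segment(text, "word")));
// ["Hello","there","How","are","you"]
console.log(JSON.stringify(segment(text, "sentence")));
// ["Hello there. ","How are you?"]
```

Sentence segments keep their trailing whitespace, and word segmentation yields every span between boundaries, which is why the sketch filters on `isWordLike`.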

It detects word and sentence boundaries more accurately than artisanal solutions such as splitting on whitespace or punctuation. I use it to examine character and word frequencies while digitizing texts for llll, which helps me spot OCR errors, typos, and other inconsistencies.
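As an illustration of the frequency check, the sketch below counts word-like segments and lists the rarest ones first, since one-off spellings are a likely place for OCR slips to surface. It is a hypothetical TypeScript example built directly on Intl.Segmenter, not Ltr's own output or workflow; `wordFrequencies` and `rarest` are made-up names, and it assumes the same runtime and `es2022` lib as above.

```typescript
// Hypothetical helper: count lowercased word-like segments in a text.
function wordFrequencies(text: string, locale = "en"): Map<string, number> {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const counts = new Map<string, number>();
  for (const s of segmenter.segment(text)) {
    if (!s.isWordLike) continue; // skip spaces and punctuation
    const word = s.segment.toLocaleLowerCase(locale);
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper: sort by ascending count, then alphabetically, and keep the first `limit`.
function rarest(counts: Map<string, number>, limit = 10): [string, number][] {
  return Array.from(counts.entries())
    .sort((a, b) => a[1] - b[1] || a[0].localeCompare(b[0]))
    .slice(0, limit);
}

// "Tlie" is a typical OCR misreading of "The"; it shows up with a count of 1.
const sample = "The cat sat. The cat ran. Tlie cat slept.";
console.log(rarest(wordFrequencies(sample)));
```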

Ltr pairs well with Trimd and Hred for working with Markdown and HTML input.

Ltr runs in Node.js and can be installed globally with npm:

npm install -g ltr

The full documentation is available on the GitHub repository page.

Colophon: The Ltr wordmark is typeset in IBM Plex Mono, a typeface by Bold Monday.

Related projects

Trimd

Convert between HTML and Markdown from the command line, plus a matching online tool.

Hred

Extract data from HTML/XML as JSON from the command line, using QSX.