Segment texts from the command line
Ltr is a simple command-line tool that segments plain text into characters, words, or sentences. It uses the Intl.Segmenter JavaScript API, which applies locale-specific segmentation rules.
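To give a feel for the underlying API, here is a minimal sketch of Intl.Segmenter at its three granularities. It is an illustration only, not Ltr's actual source: the `segment` helper, the sample strings, and the default `'en'` locale are all assumptions made for the example.

```javascript
// Sketch only: shows the standard Intl.Segmenter API, not Ltr's implementation.
// `segment` is a hypothetical helper name chosen for this example.
function segment(text, granularity, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity });
  // segmenter.segment() returns an iterable of objects; each has a
  // `segment` string and an `index` into the input.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

const sample = 'Hello world. How are you?';

// Sentence boundaries follow the locale's rules, not a naive split on '.'.
const sentences = segment(sample, 'sentence');

// Word granularity also yields whitespace and punctuation segments.
const words = segment(sample, 'word');

// Grapheme granularity keeps a base letter and its combining accent together.
const graphemes = segment('cafe\u0301', 'grapheme');

console.log(sentences.length, words.length, graphemes.length);
```

With word granularity, each segment object also carries an `isWordLike` flag, which is a convenient way to drop the whitespace and punctuation segments.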
It’s more accurate than artisanal solutions for detecting word and sentence boundaries. I use it to look at character/word frequencies in the digitization process for texts published to llll, which helps me spot OCR errors, typos, and other inconsistencies.
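The frequency check described above can be approximated directly with Intl.Segmenter. The following is a rough sketch under stated assumptions, not the author's actual workflow: `wordFrequencies` is a hypothetical helper, and lowercasing plus sorting rarest-first are choices made for this example.

```javascript
// Hypothetical helper, not part of Ltr: counts word-like segments in a text.
function wordFrequencies(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  const counts = new Map();
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    if (!isWordLike) continue; // skip whitespace and punctuation segments
    const key = segment.toLocaleLowerCase(locale); // assumption: fold case
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  // Ascending by count, so rare forms (likely typos or OCR errors) come first.
  return [...counts.entries()].sort((a, b) => a[1] - b[1]);
}

const frequencies = wordFrequencies('The cat saw the dog; teh cat ran.');
for (const [word, count] of frequencies) {
  console.log(`${count}\t${word}`);
}
```

Words that occur only once, such as the misspelled "teh" in the sample, end up at the top of the list, which is where transcription errors tend to hide.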
Ltr pairs well with Trimd and Hred for working with Markdown and HTML input.
Ltr runs in Node.js and can be installed globally with npm:
npm install -g ltr
The full documentation is available on the GitHub repository page.
Colophon: The Ltr wordmark is typeset in IBM Plex Mono, a typeface by Bold Monday.
Related projects
Convert between HTML and Markdown from the command line, plus a matching online tool.