Command-Line Tools
Alternatives to, and complements for, standard CLI tools
find
grep
ripgrep
git grep
ack
curl
Data formats
gron
Makes JSON greppable.
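To illustrate what "greppable" means here, a minimal Python sketch of the idea: flatten a JSON document into one assignment line per value, so a plain line filter can find both the value and its full path. This only approximates gron's output; the `flatten` helper and the sample document are made up for illustration, and keys that are not valid identifiers are not quoted as the real tool would.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Assumption: keys are simple identifiers (no quoting needed).
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written back out as JSON literals.
        yield f"{path} = {json.dumps(value)};"


doc = json.loads('{"user": {"name": "ada", "tags": ["a", "b"]}}')
lines = list(flatten(doc))

# A plain substring filter now plays the role of grep.
matches = [line for line in lines if "tags" in line]
print("\n".join(matches))
```

Each matching line carries its whole path, which is what makes the flattened form convenient to search.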
Benchmarking
General purpose
puppeteer
Run a headless version of Chrome from Node.js
Working with specific formats
See also: structured text tools
jq
For processing JSON files. See also: Reshaping JSON with jq.
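As a rough picture of what "reshaping" involves, here is the same kind of transformation written in plain Python: pick a few fields out of each record and lift a nested one to the top level. The sample data and field names are invented; in jq this would be roughly a filter along the lines of `map({name: .name, city: .address.city})`.

```python
import json

# Hypothetical input: a list of records with a nested "address" object.
raw = """
[
  {"name": "ada", "age": 36, "address": {"city": "London"}},
  {"name": "linus", "age": 28, "address": {"city": "Helsinki"}}
]
"""

people = json.loads(raw)

# Keep "name", drop "age", and lift the nested city to the top level.
reshaped = [
    {"name": person["name"], "city": person["address"]["city"]}
    for person in people
]

print(json.dumps(reshaped, indent=2))
```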
pup
For processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.
Should work well with wget for web page data extraction.
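The filtering idea can be sketched with Python's standard library alone. This is not how pup is implemented and it supports only a single `tag.class` style selector (here the equivalent of `li.item`); the `SelectText` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class SelectText(HTMLParser):
    """Collect the text of every element matching one tag and one class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so the right end tag closes the match.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


html = (
    '<ul><li class="item">One</li><li>skip</li>'
    '<li class="item big">Two <b>bold</b></li></ul>'
)
parser = SelectText("li", "item")
parser.feed(html)
titles = [text.strip() for text in parser.results]
print(titles)
```

In a pipeline the HTML would come from stdin (for example, piped from wget) rather than from a string literal.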
fonttools
Converts TTF/OTF fonts to and from XML. This allows you to edit fonts (e.g. metadata) in plain text and then rebuild them.
osmosis
Filter & merge OpenStreetMap data files (XML, PBF).
electron-pdf
Generate a PDF from a URL, HTML or Markdown file.
textkit
For manipulating and analyzing text.
monolith
For saving complete web pages as a single HTML file.
csvkit
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
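For a sense of the kind of conversion involved, a small standard-library sketch of turning CSV into JSON records, similar in spirit to csvkit's csvjson. The sample data is invented, and unlike csvkit this sketch does no type inference, so every value stays a string.

```python
import csv
import io
import json

# Hypothetical CSV input; in practice this would be a file or stdin.
data = "city,population\nOslo,700000\nBergen,285000\n"

# DictReader uses the header row as keys for every following row.
rows = list(csv.DictReader(io.StringIO(data)))

as_json = json.dumps(rows, indent=2)
print(as_json)
```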
Utilities
For de-warping scans
unproject_text: perspective recovery of text using transformed ellipses. Write-up.
page_dewarp: page dewarping and thresholding using a "cubic sheet" model. Write-up.
For upscaling images
RAISR: Rapid and Accurate Image Super Resolution, a technique from Google that uses machine learning to upscale images. There are a few implementations of the algorithm on GitHub: movehand/raisr, MKFMIKU/RAISR.