Alternatives to, and complements for, standard CLI tools
gron: makes JSON greppable
Run a headless version of Chrome from Node.js
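To illustrate what "greppable JSON" means for the gron entry above, here is a minimal Python sketch of the idea: each value is printed on its own line together with its full path, so a line-oriented tool such as grep can match on it. This is not gron's implementation; the function name `flatten` and the sample document are made up for illustration, and the sketch does not quote keys that are not valid identifiers.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node, in the spirit of gron's output."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # Simplification: keys are assumed to be plain identifiers.
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars are written back as JSON literals (strings stay quoted).
        yield f"{path} = {json.dumps(value)};"


# Hypothetical sample document, not taken from any real API.
doc = json.loads('{"name": "Tom", "langs": ["go", "js"]}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Because every line carries the complete path, filtering the output for `langs` returns only the lines under that key, with enough context to see where each value lives.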
Working with specific formats
See also: structured text tools
For processing JSON files. Reshaping JSON with
For processing HTML. It reads from stdin, prints to stdout, and allows the user to filter parts of the page using CSS selectors.
Should work great with wget for web page data extraction.
Does TTF/OTF conversion to and from XML. This allows you to edit fonts (e.g. metadata) in plain-text and then rebuild them.
Filter & merge OpenStreetMap data files (XML, PBF).
Generate a PDF from a URL, HTML or Markdown file.
For manipulating and analyzing text.
For saving complete web pages as a single HTML file.
A suite of utilities for converting to and working with CSV, the king of tabular file formats.
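The HTML-processing entry in this list describes filtering parts of a page with CSS selectors. As a rough sketch of that idea using only Python's standard library, the code below collects the text of every element matching a single tag name, which corresponds to the simplest kind of selector (for example `p`). The class `TagTextCollector` and the helper `select_text` are hypothetical names for illustration; real selector engines support far more than tag names.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every element with a given tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # nesting depth inside matching elements
        self.matches = []   # one text string per outermost match

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.matches.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text is kept only while we are inside a matching element.
        if self.depth > 0:
            self.matches[-1] += data


def select_text(html, tag):
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return [match.strip() for match in parser.matches]


# Hypothetical sample page.
page = "<html><body><h1>Title</h1><p>One <b>bold</b></p><p>Two</p></body></html>"
result = select_text(page, "p")
print(result)
```

To use it as a stdin-to-stdout filter, the sample string could be replaced with `sys.stdin.read()`, so that a page fetched by another tool is piped in.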
For de-warping scans
unproject_text: perspective recovery of text using transformed ellipses. Write-up.
page_dewarp: page dewarping and thresholding using a "cubic sheet" model. Write-up.
For upscaling images
RAISR: Google's Rapid and Accurate Image Super Resolution, a technique that uses machine learning to upscale images. There are a few implementations of the algorithm on GitHub: movehand/raisr, MKFMIKU/RAISR