
Ripgrep recipes

Ripgrep is one of my favorite command-line tools. It's primarily a better grep, but its search & replace features make it useful for much more. On macOS you can install it with Homebrew:

brew install ripgrep

Ripgrep is available at the command line as rg. There's a Guide available, which covers some of the command-line options. A complete list of command-line options is printed by rg --help.

Recipes

Move localized Markdown sources from one file structure to another

With the Hugo static site generator, your localized posts might be organized with language suffixes:

src/
  posts/
    post-1/
      index.de.md
      index.en.md
      index.es.md
      index.it.md

On the other hand, Eleventy recommends organizing languages in separate folders:

src/
  de/posts/post-1/index.md
  en/posts/post-1/index.md
  es/posts/post-1/index.md
  it/posts/post-1/index.md

To migrate from the Hugo file structure to the Eleventy file structure, we can use rg to produce the necessary arguments for two standard Unix command-line tools that will get the job done:

mkdir -p, which creates the appropriate folder structure; and
mv, which moves/renames the source files.

For the first step we need a list of folder structures to create:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace 'src/$2/$1'

	src/de/posts/post-1
	src/en/posts/post-1
	src/es/posts/post-1
	src/it/posts/post-1

We can pipe the output of this command to mkdir -p, which recursively creates all these folders, using xargs:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace 'src/$2/$1' | xargs mkdir -p
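One caveat about the find invocation used here: the src/**/index.*.md pattern is expanded by the shell before find ever runs, and ** only recurses in shells that support it (zsh does by default; bash needs shopt -s globstar). As a rough sketch of a more portable alternative, find can do the recursion and name matching itself. The scratch directory below is an assumption made for illustration, mirroring the Hugo layout shown earlier:

```shell
# Build a throwaway tree mirroring the Hugo layout (post-1 is the example post).
tmp=$(mktemp -d)
mkdir -p "$tmp/src/posts/post-1"
for lang in de en es it; do
  : > "$tmp/src/posts/post-1/index.$lang.md"
done

# Let find recurse and match names, instead of relying on the shell's ** glob.
found=$(cd "$tmp" && find src -type f -name 'index.*.md' | sort)
printf '%s\n' "$found"

# Clean up the scratch tree.
rm -rf "$tmp"
```

The resulting list can be piped to the same rg command in place of the output of find src/**/index.*.md.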

For the second step we'll be using mv, which needs two arguments: the source and the destination path. We can adapt the first command to produce the original match (via $0), along with the replacement, on separate lines:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace $'$0\nsrc/$2/$1/index.md'

	src/posts/post-1/index.de.md
	src/de/posts/post-1/index.md

	src/posts/post-1/index.en.md
	src/en/posts/post-1/index.md

	src/posts/post-1/index.es.md
	src/es/posts/post-1/index.md

	src/posts/post-1/index.it.md
	src/it/posts/post-1/index.md

Let's unpack the --replace argument:

$'…' (ANSI-C quoting, supported by shells such as bash and zsh) is needed so the shell turns \n into an actual newline character;
$0 is the original string, printed on the first line;
for the second line, src/$2/$1/index.md shuffles around the groups captured in the matched pattern into the desired configuration.

We print the source and destination paths on separate lines because that lets us pipe the output to mv with xargs -n2, which takes the input two arguments at a time (here, one path per line) and passes each pair to mv:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace $'$0\nsrc/$2/$1/index.md' | xargs -n2 mv
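Since mv is destructive, it can be worth previewing what xargs -n2 will run before committing. A minimal sketch: putting echo in front of mv prints each invocation instead of executing it. The input here is hard-coded from the first two pairs of the output shown above, so the snippet runs on its own:

```shell
# Two source/destination pairs, one path per line, as printed by the rg command.
pairs='src/posts/post-1/index.de.md
src/de/posts/post-1/index.md
src/posts/post-1/index.en.md
src/en/posts/post-1/index.md'

# echo prints each mv invocation rather than running it.
preview=$(printf '%s\n' "$pairs" | xargs -n2 echo mv)
printf '%s\n' "$preview"
```

Each printed line should pair one source path with its destination. Note that xargs splits its input on any whitespace, not just newlines, so paths containing spaces would need extra care.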

Et voilà!

Using `hyperglot` to find how many glyphs a font is missing for supporting each language

This recipe is very specific, but I'm adding it because, beyond hyperglot's output, it's mainly about wrangling text with a combination of CLI tools.

Hyperglot is a database and a set of tools for detecting language support in fonts. Its --verbose output lists the characters a font is missing in order to support a given language, with lines in the form of:

hyperglot font.otf --verbose 2>&1

Missing from base language ron: ș (537) Ș (536) ț (539) ă (259) Ț (538) Ă (258)

Can we match the language code and count the missing characters from lines such as the one above?

hyperglot font.otf --verbose 2>&1 |\
	rg 'Missing from base language (.+):|\((\d+)\)' -or '$1$2' |\
	rg --passthrough '\d+' -r '????' |\
	uniq -c |\
	paste - - |\
	rg '1 ([a-z]+)[^\d]+(\d+)' -or '$2 $1' |\
	sort -n

The command does the following:

Extracts the important bits — the language code on the one hand, and the set of Unicode character numbers on the other — to separate lines, by way of two capturing groups. In the command, -or is short for --only-matching --replace.
Replaces all lines matching a number with the same sequence of characters, so we can count the lines. The sequence itself doesn't really matter; here I'm using ????. Lines that don't match remain unchanged via the --passthrough flag.
Counts the occurrences of consecutive matching lines. This produces all the data we need, but it's in an awkward order.
To coax the data into a more useful shape, pastes each pair of consecutive lines side by side on a single line, then extracts the important bits — the language code, and the character count — and prints them in reverse order, via $2 $1.
Sorts by the first number on each line with sort -n.
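To make the intermediate shapes in the counting and pasting steps more concrete, here is a small sketch that runs them on simulated input. The input is an assumption: it is what the first two steps would plausibly produce for the Romanian line shown earlier (the code ron followed by six placeholder lines). To keep the sketch runnable with standard tools only, awk stands in for the final rg extraction; it is not part of the original pipeline.

```shell
# Simulated output of the first two steps for the "ron" line:
# the language code, then one placeholder line per missing character (6 of them).
step2='ron
????
????
????
????
????
????'

# uniq -c collapses runs of identical lines into "count line";
# paste - - then joins each pair of consecutive lines with a tab.
joined=$(printf '%s\n' "$step2" | uniq -c | paste - -)
printf '%s\n' "$joined"

# awk is a stand-in for the final rg step: field 3 is the count, field 2 the code.
result=$(printf '%s\n' "$joined" | awk '{ print $3, $2 }')
printf '%s\n' "$result"
```

The joined line carries a leading 1 (the count of the language-code line itself), which is why the extraction pattern in the real pipeline starts with a literal 1.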

And there we have it. Here's an example output:

1 aht
1 cab
1 cak
1 chj
(...)
4 dga
4 ebu
4 fat
4 fuv
(...)
242 tir
300 vai
756 kor
(...)
11172 kor
