
Ripgrep recipes

Ripgrep is one of my favorite command-line tools. Primarily a better grep, it also has search & replace features that make it useful for much more. On macOS you can install it with Homebrew:

brew install ripgrep

Ripgrep is available at the command line as rg. There's a Guide that covers some of the command-line options, and rg --help prints the complete list.
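As a quick taste of the search & replace features used in the recipes below, here's a minimal sketch. When rg reads from standard input, it prints only the matching lines, with the matched text rewritten by --replace, where $1 refers to the first capture group. The two file names are made-up sample input.

```shell
# Sample input: two made-up file names, one per line. Only the first one matches.
# Single quotes stop the shell from expanding $1, so rg receives it as-is.
printf '%s\n' 'index.de.md' 'README.md' |
  rg 'index\.([a-z]+)\.md' --replace '$1/index.md'
# prints: de/index.md
```

Lines that don't match, like README.md, are simply left out of the output.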

Recipes

Move localized Markdown sources from one file structure to another

With the Hugo static site generator, your localized posts might be organized with language suffixes:

src/
  posts/
    post-1/
      index.de.md
      index.en.md
      index.es.md
      index.it.md

On the other hand, Eleventy recommends organizing languages in separate folders:

src/
de/posts/post-1/index.md
en/posts/post-1/index.md
es/posts/post-1/index.md
it/posts/post-1/index.md

To migrate from the Hugo file structure to the Eleventy file structure, we can use rg to produce the necessary arguments for two standard Unix command-line tools that will get the job done:

  1. mkdir -p, which creates the appropriate folder structure; and
  2. mv, which moves/renames the source files.

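For the first step we need a list of folder structures to create. The ** pattern is expanded by the shell, so this works as-is in Zsh (the default shell on macOS), while Bash requires shopt -s globstar (Bash 4 or later):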

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace 'src/$2/$1'

src/de/posts/post-1
src/en/posts/post-1
src/es/posts/post-1
src/it/posts/post-1

Using xargs, we can pass each of these lines as an argument to mkdir -p, which creates all these folders, including any missing parent folders:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace 'src/$2/$1' | xargs mkdir -p
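To check what xargs does with these lines before creating anything, a common trick is to put echo in front of the command, so that xargs prints the command instead of running it. A minimal sketch, with the four folder paths from the output above typed in by hand rather than produced by rg:

```shell
# The folder paths from the first step, one per line, as xargs would receive them.
# echo prints the command xargs builds instead of running it.
printf '%s\n' src/de/posts/post-1 src/en/posts/post-1 src/es/posts/post-1 src/it/posts/post-1 |
  xargs echo mkdir -p
# prints: mkdir -p src/de/posts/post-1 src/en/posts/post-1 src/es/posts/post-1 src/it/posts/post-1
```

Removing echo runs mkdir -p for real. By default, xargs packs as many lines as it can into a single invocation, so all four folders are created by one mkdir -p call.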

For the second step we'll be using mv, which needs two arguments: the source and the destination path. We can adapt the first command to produce the original match (via $0), along with the replacement, on separate lines:

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace $'$0\nsrc/$2/$1/index.md'

src/posts/post-1/index.de.md
src/de/posts/post-1/index.md

src/posts/post-1/index.en.md
src/en/posts/post-1/index.md

src/posts/post-1/index.es.md
src/es/posts/post-1/index.md

src/posts/post-1/index.it.md
src/it/posts/post-1/index.md

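Let's unpack the --replace argument:

  1. $'...' is ANSI-C quoting, supported by Bash and Zsh: it turns \n into an actual newline, while leaving $0, $1, and $2 untouched for rg to interpret;
  2. $0 is the entire match, that is, the original path;
  3. src/$2/$1/index.md is the destination path, with the language code ($2) moved in front of the rest of the folder structure ($1).

We print the source and destination paths on separate lines so that we can pipe the output to mv with xargs -n2, which takes the input two arguments at a time and uses them as the source and destination of each mv call: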

find src/**/index.*.md | rg "src/(.+)/index.(en|es|it|de).md" --replace $'$0\nsrc/$2/$1/index.md' | xargs -n2 mv

Et voilà!
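To try the whole migration without touching a real project, here's a sketch that recreates the Hugo layout from above in a temporary folder and runs both steps against it. It assumes Bash and rg are on the PATH. Two deliberate changes from the commands above: it lists the files with find src -name 'index.*.md' instead of the ** glob, so it doesn't depend on globstar (which the Bash 3.2 bundled with macOS lacks), and it escapes the dots in the pattern so they only match a literal period. A dry run with echo mv shows the pairs xargs -n2 builds before anything is moved.

```shell
#!/usr/bin/env bash
# Sketch: run both migration steps in a throwaway folder.
# Assumes Bash and rg on the PATH; uses find -name instead of the ** glob.
set -eu

cd "$(mktemp -d)"

# Recreate the Hugo layout: one post in four languages.
mkdir -p src/posts/post-1
for lang in de en es it; do
  echo "Post 1 ($lang)" > "src/posts/post-1/index.$lang.md"
done

# Dots escaped so they only match a literal ".".
pattern='src/(.+)/index\.(en|es|it|de)\.md'

# Step 1: create the Eleventy-style folders.
find src -name 'index.*.md' | rg "$pattern" --replace 'src/$2/$1' | xargs mkdir -p

# Dry run: echo prints each mv command that xargs -n2 would run.
find src -name 'index.*.md' | rg "$pattern" --replace $'$0\nsrc/$2/$1/index.md' | xargs -n2 echo mv

# Step 2: move the files for real.
find src -name 'index.*.md' | rg "$pattern" --replace $'$0\nsrc/$2/$1/index.md' | xargs -n2 mv

# List the result:
# src/de/posts/post-1/index.md
# src/en/posts/post-1/index.md
# src/es/posts/post-1/index.md
# src/it/posts/post-1/index.md
find src -type f | sort
```

The temporary folder is left in place, so you can inspect it afterwards.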

Using Hyperglot to find how many glyphs a font is missing to support each language

This recipe is very specific, but I'm including it because, beyond Hyperglot's output, it's mainly about wrangling text with a combination of CLI tools.

Hyperglot is a database and a set of tools for detecting language support in fonts. Its --verbose output lists the characters a font is missing to support a given language, in lines of the form:

hyperglot font.otf --verbose 2>&1
Missing from base language ron: ș (537) Ș (536) ț (539) ă (259) Ț (538) Ă (258)

Can we match the language code and count the missing characters from lines such as the one above?

hyperglot font.otf --verbose 2>&1 |\
rg 'Missing from base language (.+):|\((\d+)\)' -or '$1$2' |\
rg --passthrough '\d+' -r '????' |\
uniq -c |\
paste - - |\
rg '1 ([a-z]+)[^\d]+(\d+)' -or '$2 $1' |\
sort -n

The command does the following:

  1. Extracts the important bits (the language code on the one hand, and the decimal Unicode code points of the missing characters on the other) onto separate lines, by way of two capturing groups. In the command, -or is short for --only-matching --replace.
  2. Replaces every line consisting of a number with the same sequence of characters, so that these lines can be counted. The sequence itself doesn't really matter; here I'm using ????. Lines that don't match, i.e. the language codes, remain unchanged thanks to the --passthrough flag.
  3. Counts consecutive identical lines with uniq -c. This produces all the data we need, but in an awkward shape: a line with the language code (counted once), followed by a line with the number of missing characters.
  4. To coax the data into a more useful shape, paste - - joins each pair of consecutive lines into a single line; rg then extracts the important bits (the language code and the character count) and prints them in reverse order, via $2 $1.
  5. Sorts the lines numerically by their leading count with sort -n.
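To see the intermediate shape from step 3 alongside the final result, here's a sketch that feeds the pipeline sample lines instead of real hyperglot output. The ron line is the example from above; the fra line is made up for illustration and is not real Hyperglot data.

```shell
# Sample input in the format of hyperglot's --verbose output.
# The ron line comes from the example above; the fra line is made up.
sample_input() {
  printf '%s\n' \
    'Missing from base language ron: ș (537) Ș (536) ț (539) ă (259) Ț (538) Ă (258)' \
    'Missing from base language fra: œ (339) Œ (338)'
}

# Steps 1-3: each language code (counted once) is followed by
# the number of missing characters, counted as "????" lines.
sample_input |
  rg 'Missing from base language (.+):|\((\d+)\)' -or '$1$2' |
  rg --passthrough '\d+' -r '????' |
  uniq -c

# All five steps: "count language" pairs, sorted by count.
sample_input |
  rg 'Missing from base language (.+):|\((\d+)\)' -or '$1$2' |
  rg --passthrough '\d+' -r '????' |
  uniq -c |
  paste - - |
  rg '1 ([a-z]+)[^\d]+(\d+)' -or '$2 $1' |
  sort -n
# prints:
# 2 fra
# 6 ron
```

The padding uniq -c puts before each count differs between implementations. The final rg doesn't depend on it, since it only looks for the 1 before each language code and the first number after it.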

And there we have it. Here's an example output:

1 aht
1 cab
1 cak
1 chj
(...)
4 dga
4 ebu
4 fat
4 fuv
(...)
242 tir
300 vai
756 kor
(...)
11172 kor

See also