hred recipes

hred extracts JSON data from HTML and XML documents using QSX, a query language inspired by CSS selectors. hred reads its input from standard input, so documents must be fetched separately, with curl or a similar tool.

Working with XML

Atom and RSS feeds

Below are abridged versions of typical Atom and RSS feeds, with a focus on the URLs of the individual posts.

<!-- Atom -->
<feed>
	<entry>
		<title>My post</title>
		<link href="https://example.com/my-post"/>
	</entry>
</feed>

<!-- RSS -->
<rss>
	<channel>
		<item>
			<title>My post</title>
			<link>https://example.com/my-post</link>
		</item>
	</channel>
</rss>

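To extract these URLs with hred, one per line, keeping only links that have rel="alternate" or no rel attribute at all (Atom treats a missing rel as equivalent to rel="alternate"):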

# Atom feed
curl https://example.com/posts.xml | hred -xcr 'entry > link:is([rel=alternate],:not([rel]))@href';

# RSS feed
curl https://example.com/posts.xml | hred -xcr 'item > link:is([rel=alternate],:not([rel]))@.textContent';

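The -xcr set of flags combines -x (parse the input as XML rather than HTML), -c (concise output) and -r (raw output: strings are printed as-is, one per line, instead of as JSON).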

The approach can be adapted to extracting URLs from sitemap.xml files, or from feed subscription lists in OPML format.

See also

The JSON produced by hred can be further processed with jq.