QSX: a CSS selector language for DOM extraction

CSS selectors are an expressive way to match a set of elements in the DOM with the querySelector*() methods, but the API is geared towards fetching a flat list of DOM elements.

QSX (Query Selector eXtended) is a lightweight extension to the selector syntax that’s useful for extracting things from the DOM into structured JSON data. It introduces nested structures, the ability to grab HTML attributes and DOM properties, and basic reshaping of the resulting JSON — all in a compact format that’s ideal for command-line usage.
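
To make the contrast concrete, here is a minimal sketch in plain Python of the difference between a flat list of matched elements and a nested, JSON-ready structure. It is not QSX syntax and not how the reference implementation works. The sample markup, the function names, and the use of the standard library's xml.etree.ElementTree in place of a real DOM are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample markup, kept well-formed so the stdlib XML parser accepts it.
MARKUP = """
<ul>
  <li><a href="/one" title="First">One</a></li>
  <li><a href="/two" title="Second">Two</a></li>
</ul>
""".strip()


def flat_links(root):
    """Roughly what a flat selector query yields: a list of matched elements."""
    return root.findall(".//a")


def structured_links(root):
    """Hand-written nested extraction: one record per <li>, each holding
    an attribute (href) and the text content of the link inside it."""
    records = []
    for item in root.findall(".//li"):
        link = item.find("a")
        if link is None:
            continue  # skip list items without a link
        records.append({
            "href": link.get("href"),
            "text": "".join(link.itertext()),
        })
    return records


root = ET.fromstring(MARKUP)
print(len(flat_links(root)))
print(json.dumps(structured_links(root), indent=2))
```

The second function is the kind of traversal and reshaping code that QSX aims to fold into the selector itself, so that the structured result comes out of a single compact query.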

Explainer

An informal QSX language specification is available.

Implementation

An initial, slightly out-of-date implementation is available on GitHub at danburzo/qsx.

This implementation is used in the hred command-line tool, which works pretty well for day-to-day scraping from HTML and XML. Its features depend on the DOM environment provided by jsdom, whose querySelector*() implementation lags behind browsers in CSS selector support.

A full realization of QSX 1.0, and further iteration on the spec, will be possible when I manage to wrap up selery, my CSS selector parser and DOM query engine.

Feedback

Feedback on the specification and/or reference implementation is appreciated. You can contact me or open an issue on GitHub.


Colophon: The QSX logo is built with glyphs from LTR Principia, the brutally seriffed typeface by Erik van Blokland.

Related projects

Hred

Extract data from HTML/XML as JSON from the command line, using QSX.

Selery

A CSS selector parser and DOM query engine.