
QSX language specification

Introduction

This is an informal specification for version 0.2 of the QSX language, last updated Apr 10, 2024.

QSX is a query language based on CSS selector semantics, with a few changes and additions to help extract data from DOM objects.

Syntax introduced in QSX

Comma , splits selectors

In CSS selector syntax, the selector A, B matches an element that’s matched by either A or B. The selector is equivalent to the more verbose :is(A, B).

In QSX, comma-delimited selectors extract data in separate data structures. In essence, this allows us to extract more than one thing with a single query. The selector A, B fetches data as if we made queries for A and B separately.

To obtain the CSS semantics of the A, B selector, an explicit :is(A, B) selector must be used.

Example: select headings

querySelectorAll('h2, h3') extracts H2 and H3 headings in a single array:

[<h2>, <h3>, <h3>, <h2>,]

qsx(document, 'h2, h3') extracts H2 and H3 headings in separate arrays:

[
['<h2>…</h2>', '<h2>…</h2>',],
['<h3>…</h3>', '<h3>…</h3>',]
]

To get H2 and H3 headings in a single array, use qsx(document, ':is(h2, h3)'):

['<h2>…</h2>', '<h3>…</h3>', '<h3>…</h3>', '<h2>…</h2>',]
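The difference between a top-level comma and a comma inside :is() can be illustrated with a small splitter. This is only a sketch: splitTopLevel is a hypothetical helper, not part of QSX, and it assumes the query contains no quoted strings with commas or brackets in them.

```javascript
// Hypothetical helper (not the QSX implementation): split a query on
// top-level commas only, leaving commas nested in (), [] or {} alone.
function splitTopLevel(query) {
  const parts = [];
  let depth = 0;
  let current = '';
  for (const ch of query) {
    if (ch === '(' || ch === '[' || ch === '{') depth++;
    else if (ch === ')' || ch === ']' || ch === '}') depth--;
    if (ch === ',' && depth === 0) {
      // A comma outside any brackets starts a new, separate query.
      parts.push(current.trim());
      current = '';
    } else {
      current += ch;
    }
  }
  parts.push(current.trim());
  return parts;
}

console.log(splitTopLevel('h2, h3'));      // two separate queries
console.log(splitTopLevel(':is(h2, h3)')); // a single query
```

Under this reading, 'h2, h3' yields two parts that are queried independently, while ':is(h2, h3)' stays a single selector and so produces a single array.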

Curly brackets {…} introduce sub-scopes

The curly bracket characters {} are not used in CSS selectors directly, but for nested CSS they delimit sub-scopes.

The curly brackets similarly create sub-scopes in QSX. This enables extracting a DOM tree structure into a JSON with nested properties.

The selector A { B, C } is equivalent to:

Array.from(document.querySelectorAll('A')).map(scope => {
  return [
    Array.from(scope.querySelectorAll('B')).map(el => el.outerHTML),
    Array.from(scope.querySelectorAll('C')).map(el => el.outerHTML)
  ];
});

Example: extract first and last column from each row in a table

To extract the first and last column from each row from a simple HTML table:

qsx(document, 'tr { > td:first-child, > td:last-child }');

[
// first row
[
['<td>…</td>'],
['<td>…</td>']
],

// second row
[
['<td>…</td>'],
['<td>…</td>']
],
// …
]
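The per-row shape above follows from the A { B, C } expansion. A minimal sketch of that expansion, generalized to any number of inner selectors, might look like the following. The names extractScoped and fakeNode are hypothetical; the stand-in nodes only exist so the sketch can run outside a browser, and the inner selectors are written without a leading combinator to keep the stubs simple.

```javascript
// Hypothetical generalization of the A { B, C } expansion.
// `root` only needs a querySelectorAll() method.
function extractScoped(root, outer, inners) {
  return Array.from(root.querySelectorAll(outer)).map(scope =>
    // One array per inner selector, each holding serialized matches.
    inners.map(inner =>
      Array.from(scope.querySelectorAll(inner)).map(el => el.outerHTML)
    )
  );
}

// Stand-in for a DOM node: returns preset matches for known selectors.
function fakeNode(outerHTML, matches = {}) {
  return { outerHTML, querySelectorAll: sel => matches[sel] || [] };
}

const row = fakeNode('<tr>…</tr>', {
  'td:first-child': [fakeNode('<td>a</td>')],
  'td:last-child': [fakeNode('<td>z</td>')],
});
const table = fakeNode('<table>…</table>', { tr: [row] });

const rows = extractScoped(table, 'tr', ['td:first-child', 'td:last-child']);
console.log(JSON.stringify(rows)); // [[["<td>a</td>"],["<td>z</td>"]]]
```

In a browser, document (or any element) could be passed as root in place of the stand-in table.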

The implicit relationship between the outer selector and the inner selector is that of descendant. The A { B } selector grabs the same elements as A B, but returns them in a different JSON shape.

Other combinators (>, +, ~) can be placed at the start of the inner selector to express a different relationship between the outer and inner selector.
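When emulating a sub-scope with the DOM API by hand, note that querySelectorAll() rejects a selector that starts with a combinator, so an inner selector such as > td has to be anchored to the scope element with the :scope pseudo-class. The helper below is a sketch of that rewrite; toScopedSelector is a hypothetical name, and nothing here claims to describe how QSX handles this internally.

```javascript
// Hypothetical helper: make an inner selector usable with
// scope.querySelectorAll() by anchoring a leading combinator to :scope.
function toScopedSelector(inner) {
  const trimmed = inner.trim();
  // A leading >, + or ~ is relative to the scope element.
  return /^[>+~]/.test(trimmed) ? `:scope ${trimmed}` : trimmed;
}

console.log(toScopedSelector('> td:first-child')); // :scope > td:first-child
console.log(toScopedSelector('td:last-child'));    // td:last-child
```

In a browser, the result could then be passed to scope.querySelectorAll() for each element matched by the outer selector.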

Caret ^: match-first behavior

By default, all sub-parts of a QSX query behave like querySelectorAll(): each returns an array of all matching elements (match-all behavior). With nested queries, this can become unwieldy when the intent is to select only the first matching element.

The ^ operator lets you opt into querySelector()-like behavior and select the first element matching the selector (match-first behavior).

At-sign @ addresses HTML attributes, DOM properties

The @ character is not used in CSS selectors.

In QSX, it is used to extract an HTML attribute or a DOM property.

A qsx() query doesn’t return DOM elements. The leaf nodes in the returned structure must be serializable to strings. In the absence of an HTML attribute or DOM property specifier, the outerHTML DOM property is implied. The selector A, B is therefore interpreted as A@.outerHTML, B@.outerHTML.

Raw DOM? A future version of the spec may allow returning DOM elements, perhaps with the element@ syntax without any attribute or property specifier.

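Using explicit attribute/property selectors transforms the current scope from an Array into a key-value object. As the link example further down shows, an HTML attribute extracted with @attr is stored under the key attr, and a DOM property extracted with @.prop is stored under the key .prop, leading dot included.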

The * attribute selector produces one key-value pair for each HTML attribute on the element.

Inside nested contexts, attribute and property selectors can be bare, in which case they apply to the element from the outer scope. The element { @attr } construct implies element { &@attr }.

When mixing explicit attribute/property selectors and bare selectors, the result of applying the bare selectors becomes available under the .scoped key.

Example: extract the URL and title from links

qsx(document, 'a { @href, @.textContent }')

[
{ href: '…', '.textContent': '…' },
{ href: '…', '.textContent': '…' },
// …
]
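The attribute/property distinction in the result above can be sketched in plain JavaScript. This is an illustration under assumptions, not the QSX implementation: extractFields is a hypothetical helper, and the link object is a stand-in for an anchor element so the code runs outside a browser.

```javascript
// Hypothetical helper: read each @attr via getAttribute() and each
// @.prop as a DOM property, keyed the same way as in the result above.
function extractFields(el, specs) {
  const result = {};
  for (const spec of specs) {
    const name = spec.replace(/^@/, '');
    result[name] = name.startsWith('.')
      ? el[name.slice(1)]       // DOM property, e.g. .textContent
      : el.getAttribute(name);  // HTML attribute, e.g. href
  }
  return result;
}

// Stand-in for an <a> element.
const link = {
  textContent: 'Home',
  getAttribute: name => (name === 'href' ? '/index.html' : null),
};

console.log(extractFields(link, ['@href', '@.textContent']));
// { href: '/index.html', '.textContent': 'Home' }
```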

Matching by prop? CSS has an attribute selector [attr=value], which may be extended in a future version of the spec to allow matching props with [.prop=value] (issue #6).

Arrow => assigns keys to selectors

Values in the returned JSON object can be assigned keys using selector => key.

You can assign keys to individual HTML attributes and DOM properties.

Example: extract the URL and title from links (with names)

In this example, we assign names to the attribute and property extracted from anchor elements.

qsx(document, 'a { @href => url, @.textContent => title }')

[
{ url: '…', title: '…' },
{ url: '…', title: '…' },
// …
]
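To make the selector => key form concrete, here is a sketch of how a single leaf part could be split into its selector and its key. parseKeyed is a hypothetical helper and deliberately handles only one flat part such as @href => url; it does not attempt nested { … } blocks.

```javascript
// Hypothetical helper: split one flat part of a query into the selector
// and the optional key that follows '=>'.
function parseKeyed(part) {
  const [selector, key] = part.split('=>').map(s => s.trim());
  return key === undefined ? { selector } : { selector, key };
}

console.log(parseKeyed('@href => url'));           // { selector: '@href', key: 'url' }
console.log(parseKeyed('@.textContent => title')); // { selector: '@.textContent', key: 'title' }
console.log(parseKeyed('@href'));                  // { selector: '@href' }
```

A part without an arrow keeps its default key, which for attributes and properties is the name written after @.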

You can also assign keys to individual scoped selectors.

Example: first and last cell in rows (with names)

qsx(document, 'tr { td:first-child => first, td:last-child => last }');

Finally, you can assign a key to the entire array of scoped selectors, which by default is stored under the .scoped key.

Example: assign first and last cells to "cells" key

qsx(document, 'tr { @title, td:first-child, td:last-child } => cells');

The special . key name merges the object into the current context.

The spread ... operator, a shortcut for merging

The spread operator ... can be used to merge the subsequent object into the current context. It acts as syntactic sugar for the => . construct.

Example: spread operator

The two queries below are equivalent.

qsx(document, 'tr ...{ td:first-child, td:last-child }');
qsx(document, 'tr { td:first-child, td:last-child } => .');
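The equivalence can be expressed as a textual rewrite from the spread form to the => . form. The function below is a sketch only: desugarSpread is a hypothetical name, it works on the query string rather than on a parsed tree, and it assumes curly brackets do not appear inside quoted strings.

```javascript
// Hypothetical helper: rewrite every '...{ … }' into '{ … } => .'.
function desugarSpread(query) {
  const start = query.indexOf('...{');
  if (start === -1) return query; // nothing to rewrite
  const open = start + 3;         // index of the opening '{'
  let depth = 0;
  for (let i = open; i < query.length; i++) {
    if (query[i] === '{') depth++;
    else if (query[i] === '}') {
      depth--;
      if (depth === 0) {
        // Rewrite spreads nested inside this block, then the rest.
        const block = desugarSpread(query.slice(open, i + 1));
        const rest = desugarSpread(query.slice(i + 1));
        return query.slice(0, start) + block + ' => .' + rest;
      }
    }
  }
  throw new SyntaxError('Unbalanced curly brackets in query');
}

console.log(desugarSpread('tr ...{ td:first-child, td:last-child }'));
// tr { td:first-child, td:last-child } => .
```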

Open questions

Changelog

Since version 0.1