QSX language specification
Introduction
This is an informal specification for version 0.2 of the QSX language, last updated Apr 10, 2024.
QSX is a query language based on CSS selector semantics, with a few changes and additions to help extract data from DOM objects.
Syntax introduced in QSX
Comma , splits selectors
In CSS selector syntax, the selector A, B matches an element that’s matched by either A or B. The selector is equivalent to the more verbose :is(A, B).
In QSX, comma-delimited selectors extract data into separate data structures. In essence, this lets us extract more than one thing with a single query. The selector A, B fetches data as if we had made queries for A and B separately.
To obtain the CSS semantics of the A, B selector, an explicit :is(A, B) selector must be used.
Example: select headings
querySelectorAll('h2, h3') extracts H2 and H3 headings in a single array:
[<h2>, <h3>, <h3>, <h2>, …]
qsx(document, 'h2, h3') extracts H2 and H3 headings in separate arrays:
[
  ['<h2>…</h2>', '<h2>…</h2>', …],
  ['<h3>…</h3>', '<h3>…</h3>', …]
]
To get H2 and H3 headings in a single array, use qsx(document, ':is(h2, h3)'):
['<h2>…</h2>', '<h3>…</h3>', '<h3>…</h3>', '<h2>…</h2>', …]
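The difference between the two comma semantics can be modeled without a DOM. The following is a minimal sketch, not the QSX implementation: it assumes elements are plain objects with hypothetical tag and html fields, and that each comma-separated selector is a bare tag name.

```javascript
// Hypothetical stand-in for a document: elements listed in document order.
const elements = [
  { tag: 'h2', html: '<h2>A</h2>' },
  { tag: 'h3', html: '<h3>A.1</h3>' },
  { tag: 'h3', html: '<h3>A.2</h3>' },
  { tag: 'h2', html: '<h2>B</h2>' }
];

// CSS semantics: a single list, in document order, of elements matching any tag.
function cssComma(els, tags) {
  return els.filter(el => tags.includes(el.tag)).map(el => el.html);
}

// QSX semantics: one list per comma-separated selector.
function qsxComma(els, tags) {
  return tags.map(tag => els.filter(el => el.tag === tag).map(el => el.html));
}

const merged = cssComma(elements, ['h2', 'h3']);
const separate = qsxComma(elements, ['h2', 'h3']);
```

Here merged keeps the four headings interleaved in document order, while separate holds one array with the H2 headings and another with the H3 headings.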
Curly brackets {…} introduce sub-scopes
The curly bracket characters {} are not used in CSS selectors directly, but in nested CSS they delimit sub-scopes.
The curly brackets similarly create sub-scopes in QSX. This enables extracting a DOM tree structure into JSON with nested properties.
The selector A { B, C } is equivalent to:
Array.from(document.querySelectorAll('A')).map(scope => {
  return [
    Array.from(scope.querySelectorAll('B')).map(el => el.outerHTML),
    Array.from(scope.querySelectorAll('C')).map(el => el.outerHTML)
  ];
});
Example: extract first and last column from each row in a table
To extract the first and last column from each row of a simple HTML table:
qsx(document, 'tr { > td:first-child, > td:last-child }');
[
  // first row
  [
    ['<td>…</td>'],
    ['<td>…</td>']
  ],
  // second row
  [
    ['<td>…</td>'],
    ['<td>…</td>']
  ],
  // …
]
The implicit relationship between the outer selector and the inner selector is one of descent. The A { B } selector grabs the elements matched by A B, albeit in a different JSON shape.
Other combinators (>, +, ~, etc.) can be used to express different kinds of relationships between the outer and inner selectors.
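To illustrate why the combinator matters, here is a minimal sketch that contrasts the default descendant relationship with the child relationship. It uses a hypothetical plain-object tree (the node, descendants and directChildren helpers are illustrative and not part of QSX) in place of a real DOM.

```javascript
// Hypothetical tree model: each node has a tag and a list of children.
const node = (tag, ...children) => ({ tag, children });

// A row whose second cell contains a nested table with a single cell.
const row = node('tr',
  node('td'),
  node('td', node('table', node('tr', node('td'))))
);

// Descendant relationship, the default, as in: tr { td }
function descendants(scope, tag) {
  const found = [];
  for (const child of scope.children) {
    if (child.tag === tag) found.push(child);
    found.push(...descendants(child, tag));
  }
  return found;
}

// Child relationship, as in: tr { > td }
function directChildren(scope, tag) {
  return scope.children.filter(child => child.tag === tag);
}

const descendantCount = descendants(row, 'td').length;
const childCount = directChildren(row, 'td').length;
```

With this tree, the descendant lookup also picks up the cell of the nested table, so it finds three cells where the child lookup finds two.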
Caret ^: match-first behavior
By default, all sub-parts of a QSX query function like querySelectorAll(), meaning they return an array of all matching elements (match-all behavior). With nested queries, this can become unwieldy when the intent is to select only the first matching element.
The ^ operator lets you opt into querySelector()-like behavior and select the first element matching the selector (match-first behavior).
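The difference in result shape between the two behaviors can be sketched on plain data. This is an illustration under assumptions rather than QSX output: the items array is hypothetical, and using null for a scope with no match mirrors querySelector(), which this specification does not confirm for QSX.

```javascript
// Hypothetical pre-extracted data: each list item holds the outerHTML of its links.
const items = [
  { links: ['<a href="/a">A</a>'] },
  { links: ['<a href="/b">B</a>', '<a href="/b2">B2</a>'] },
  { links: [] }
];

// Match-all: every scope yields an array, even when one link is expected.
const allLinks = items.map(item => item.links);

// Match-first: every scope yields a single value.
// Assumption: null when nothing matches, as querySelector() does.
const firstLinks = items.map(item => (item.links.length > 0 ? item.links[0] : null));
```

With match-all, a consumer has to unwrap one array per scope; with match-first, each scope contributes one value directly.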
At-sign @ addresses HTML attributes, DOM properties
The @ character is not used in CSS selectors.
In QSX, it is used to extract an HTML attribute or a DOM property.
- element@attr extracts the value of the attr HTML attribute from element;
- element@* extracts all HTML attributes from element;
- element@.prop extracts the prop DOM property from element.
A qsx() query doesn’t return DOM elements. The leaf nodes in the returned structure must be serializable to strings. In the absence of an HTML attribute or DOM property specifier, the outerHTML DOM property is implied. The selector A, B is therefore interpreted as A@.outerHTML, B@.outerHTML.
Raw DOM? A future version of the spec may allow returning DOM elements, perhaps with the element@ syntax without any attribute or property specifier.
Using explicit attribute/property selectors transforms the current scope from an Array into a key-value object, whose keys are:
- attr for an attribute selector element@attr,
- .prop for a property selector element@.prop.
The * attribute selector produces one key-value pair for each HTML attribute on the element.
Inside nested contexts, attribute and property selectors can be bare, in which case they apply to the element from the outer scope. The element { @attr } construct implies element { &@attr }.
When mixing explicit attribute/property selectors and bare selectors, the result of applying the bare selectors becomes available under the .scoped key.
Example: extract the URL and title from links
qsx(document, 'a { @href, @.textContent }')
[
  { href: '…', '.textContent': '…' },
  { href: '…', '.textContent': '…' },
  // …
]
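The key naming rules described above (attr for attributes, .prop for properties, one pair per attribute for *) can be sketched as follows. This is an illustration only: the element model with separate attributes and properties maps, and the extract helper, are assumptions made for the example and not part of QSX.

```javascript
// Hypothetical element model: HTML attributes and DOM properties kept apart.
const link = {
  attributes: { href: '/about', title: 'About us' },
  properties: { textContent: 'About' }
};

// Build the key-value object for a list of specifiers such as
// 'href' (attribute), '*' (all attributes) or '.textContent' (property).
function extract(el, specifiers) {
  const out = {};
  for (const spec of specifiers) {
    if (spec === '*') {
      // One key-value pair per HTML attribute.
      Object.assign(out, el.attributes);
    } else if (spec.startsWith('.')) {
      // Property keys keep their leading dot.
      out[spec] = el.properties[spec.slice(1)];
    } else {
      out[spec] = el.attributes[spec];
    }
  }
  return out;
}

const picked = extract(link, ['href', '.textContent']);
const everything = extract(link, ['*']);
```

Here picked has the keys href and .textContent, matching the shape of the result above, and everything has one key for each attribute of the hypothetical link.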
Matching by prop? CSS has an attribute selector [attr=value], which may be extended in a future version of the spec to allow matching props with [.prop=value] (issue #6).
Arrow => assigns keys to selectors
Values in the returned JSON object can be assigned keys using selector => key.
You can assign keys to individual HTML attributes and DOM properties.
Example: extract the URL and title from links (with names)
In this example, we assign names to the attribute and property extracted from anchor elements.
qsx(document, 'a { @href => url, @.textContent => title }')
[
  { url: '…', title: '…' },
  { url: '…', title: '…' },
  // …
]
You can assign keys to individual scoped selectors.
Example: first and last cell in rows (with names)
qsx(document, 'tr { td:first-child => first, td:last-child => last }');
You can also assign a key to the entire array of scoped selectors, which is stored under the .scoped key by default.
Example: assign first and last cells to "cells" key
qsx(document, 'tr { @title, td:first-child, td:last-child } => cells');
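The effect of naming the scoped array can be sketched on plain data. The shape below is an assumption inferred from the description of the .scoped key, not verified QSX output, and the combine helper and sample values are hypothetical.

```javascript
// Hypothetical per-row pieces, as plain data:
// values from explicit attribute/property selectors...
const explicit = { title: 'Totals' };
// ...and the arrays produced by the bare selectors.
const scoped = [['<td>first</td>'], ['<td>last</td>']];

// Combine both; the bare-selector results are stored under `key`,
// which falls back to '.scoped' when no key is assigned.
function combine(explicitValues, scopedValues, key = '.scoped') {
  return { ...explicitValues, [key]: scopedValues };
}

const byDefault = combine(explicit, scoped);
const named = combine(explicit, scoped, 'cells');
```

In this sketch, byDefault has the keys title and .scoped, while named has the keys title and cells.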
The special . key name merges the object into the current context.
The spread ... operator, a shortcut for merging
The spread operator ... can be used to merge the subsequent object into the current context. It acts as syntactic sugar for the => . construct.
Example: spread operator
The two queries below are equivalent.
qsx(document, 'tr ...{ td:first-child, td:last-child }');
qsx(document, 'tr { td:first-child, td:last-child } => .');
Open questions
- What’s a good mechanism to select text nodes? (Issue #8)
- Often the textContent prop contains a bunch of trailing whitespace; should there be a way to trim that off (e.g. @.textContent.trim)?
Changelog
Since version 0.1
- Cleaned up the initial specification to remove outdated statements