Today I learned

Yak shaving

I've recently made some updates to Marcel, my personal take on static site generators. I was fond of the elegance of the Twig templating language and was looking for a JavaScript equivalent. There's Nunjucks, which is pretty great but rather limited. After only modest success at understanding its parser and extending it to make Nunjucks more like Twig, I found myself thinking: ‘How hard can it be to implement a template language from scratch?’

(Greek choir goes mad at this point.)

‘Since I'm basically reinventing the SSG wheel with Marcel, I might as well go a level deeper into yak shaving, right? It's not like there's a deadline to this thing.’

I vacillated between several names: feuille (very Proustian, but hard to write and pronounce), enaml (which, it turns out, is the name of something else), lagoml ("just enough", but sort of not good enough), fka-twig (coughs intently), and finally the perfect name that is Sontag.

I'm working at the limits of my knowledge here, so it's a learning experience.

Parsing JavaScript

With Sontag, I was curious how much of the template parsing could be offloaded to a JavaScript parser. It turns out that with some light tokenizing of tags, comments, and expressions, followed by evaluating the expressions as JavaScript, you can go pretty far. Tools I found essential in the process: acorn to parse the JavaScript into an ECMAScript Abstract Syntax Tree, acorn-walk and estraverse to walk the tree and swap nodes around, and finally astring to turn the tree back into a string.
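To make the "light tokenizing" step concrete, here is a minimal sketch, not Sontag's actual code. It assumes Twig's default delimiters ({{ }} for expressions, {% %} for tags, {# #} for comments); the function name tokenize and the { type, value } token shape are hypothetical, and escaping and whitespace control are ignored.

```javascript
// Hypothetical tokenizer sketch: splits a Twig-like template into
// text, expression ({{ ... }}), tag ({% ... %}) and comment ({# ... #}) tokens.
function tokenize(template) {
  // [^]*? lazily matches any characters, newlines included,
  // up to the nearest closing delimiter.
  const re = /\{\{([^]*?)\}\}|\{%([^]*?)%\}|\{#([^]*?)#\}/g;
  const tokens = [];
  let last = 0;
  let m;
  while ((m = re.exec(template)) !== null) {
    // Whatever sits between the previous delimiter and this one is plain text.
    if (m.index > last) {
      tokens.push({ type: "text", value: template.slice(last, m.index) });
    }
    // Exactly one capture group is defined, depending on which delimiter matched.
    if (m[1] !== undefined) {
      tokens.push({ type: "expression", value: m[1].trim() });
    } else if (m[2] !== undefined) {
      tokens.push({ type: "tag", value: m[2].trim() });
    } else {
      tokens.push({ type: "comment", value: m[3].trim() });
    }
    last = re.lastIndex;
  }
  // Trailing text after the last delimiter.
  if (last < template.length) {
    tokens.push({ type: "text", value: template.slice(last) });
  }
  return tokens;
}

tokenize("Hi {{ user.name }}{% if admin %}!{% endif %}");
```

Each expression token's value is then a plain JavaScript expression that can be handed to a parser such as acorn, while tag tokens still need a small grammar of their own.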

How do you evaluate JavaScript? eval() feels icky, even for a language that ostensibly treats security as a non-goal. The slightly less bad, and more flexible, approach is to create a new function with a given body, then invoke it to get the result:

let fn = new Function(`return ${some_expression}`);
fn(); // => result
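A minimal sketch of how a template context might be handed to such a function. The helper name evaluate is hypothetical, not Sontag's API: the context's keys become parameter names and its values the arguments. A function built with new Function() only sees globals and its own parameters, not the local scope it was created in, so template variables have to be passed in explicitly like this.

```javascript
// Hypothetical helper: evaluates a single JavaScript expression
// against a plain-object context.
function evaluate(expression, context = {}) {
  const names = Object.keys(context);
  const values = Object.values(context); // same order as Object.keys
  // Parentheses keep the whole expression attached to `return`
  // (no automatic semicolon insertion after a leading newline) and
  // turn input containing statements, e.g. "1; foo()", into a SyntaxError.
  const fn = new Function(...names, `return (${expression});`);
  return fn(...values);
}

evaluate("user.name.toUpperCase()", { user: { name: "Ada" } }); // => "ADA"
```

Context keys must be valid JavaScript identifiers: a key like first-name makes the Function constructor throw a SyntaxError. This is also no sandbox, since the expression can still reach any global.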

Where's AsyncFunction, though, for expressions that need await? Unlike Function, the AsyncFunction constructor is not a global object. Instead, you'll need to pick it up from the constructor of an actual async function:

let AsyncFunction = Object.getPrototypeOf(async () => {}).constructor;

let fn = new AsyncFunction(`return await ${some_expression}`);
await fn(); // => result
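The same idea carries over to AsyncFunction. This sketch assumes a hypothetical evaluateAsync helper (not Sontag's API) that passes context values in as parameters, so a template expression can await a promise stored in the context:

```javascript
// AsyncFunction isn't a global, so take it from an async arrow function's prototype.
const AsyncFunction = Object.getPrototypeOf(async () => {}).constructor;

// Hypothetical helper: context keys become parameter names,
// context values become arguments, and the body may use `await`.
function evaluateAsync(expression, context = {}) {
  const fn = new AsyncFunction(...Object.keys(context), `return (${expression});`);
  // Calling an async function always returns a Promise.
  return fn(...Object.values(context));
}

// Usage, from inside an async function or an ES module:
// const total = await evaluateAsync("(await price) * qty", { price: Promise.resolve(5), qty: 3 });
// total === 15
```

Because the result is always a Promise, every caller up the rendering chain has to await it, which is the main cost of supporting await inside templates.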

Getting comfortable with regular expressions

Regular expressions are brilliant for matching expressions whose vocabulary is rather contained. Robust regular expressions can become a bit hard to read and debug, though. I found RegExr to be quite helpful in authoring and understanding them. Some surprises:

. (any character) does not match \n (newline) characters. Apparently this is a whole thing, and I was momentarily excited to learn there's a /s flag to address it, but it's not supported across the board yet. So naturally I reached for [.\n], a.k.a. any character or the newline character (or so I thought). But,

The . in [.\n] is just a dot. When inside a character set, . loses its special status. TIL.

Supplanting /s. The xregexp library implements the /s flag and also lets you name and annotate your capture groups, which seems useful, but I didn't end up using it (mental note for another day). In the interim I'm using [^]+ (all characters except no character) instead of .+.

The limits of regular expressions. It may be that once things get complicated, regular expressions can't be your only parsing mechanism, and you need to bring out the big guns. In a happy coincidence, Benedikt Deicke is writing on the AppSignal blog these days about implementing a templating language in Ruby, with posts on lexers and parsers so far; they seem easy to follow.
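The newline surprises above are easy to check. A small sketch, assuming an ES2018-capable runtime (such as a current Node.js) so that the /s line parses; each trailing comment gives the value that test produces. [\s\S] is included as a commonly used equivalent of [^] that other regex flavours also accept.

```javascript
const text = "first line\nsecond line";

const results = {
  // `.` matches any character except line terminators, so it stops at \n.
  dot: /first.+second/.test(text), // false
  // In a character class `.` is a literal dot: [.\n] matches only "." or "\n",
  // so it can't even get past the space after "first".
  dotInClass: /first[.\n]+second/.test(text), // false
  // [^] negates an empty class, i.e. matches any character (JavaScript-only syntax).
  anyChar: /first[^]+second/.test(text), // true
  // [\s\S] (whitespace or non-whitespace) matches any character as well.
  whitespaceOrNot: /first[\s\S]+second/.test(text), // true
  // The dotAll flag, /s, lets `.` match line terminators too.
  dotAll: /first.+second/s.test(text), // true
};

console.log(results);
```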

Soundtrack: Terror Danjah — Red Flag