Extracting things from JavaScript strings
You want to pull some information out of a JavaScript string and you have a pattern (a regular expression, that is) to match against. Surprised that the String/RegExp combo API lacks a direct method, you comb the web for a solution. On StackOverflow you're invited to put a RegExp.exec() in a while loop. I'll show you a better, if unorthodox, way of doing it with String.replace.
(Ab)using String.replace
Let's start with the real-life-sounding, contrived scenario of wanting to extract all numbers from a JavaScript string. A simple pattern for matching all occurrences of numbers (specifically, decimal integers and floating-point numbers, with an optional sign) is:
/[+-]?\d+(\.\d+)?/g
(don't forget the g flag, which stands for global)
The RegExp.exec way of doing it:
var str = "Some of the best numbers are 42, and, in particular 42.999.";
var number_regex = /[+-]?\d+(\.\d+)?/g;
var matches = [];
var match;
while ((match = number_regex.exec(str)) !== null) {
    matches.push(match[0]);
}
console.log(matches);
// => ["42", "42.999"]
If we use String.replace instead, we can write:
var str = "Some of the best numbers are 42, and, in particular 42.999.";
var number_regex = /[+-]?\d+(\.\d+)?/g;
var matches = [];
str.replace(number_regex, function(match) {
    matches.push(match);
});
console.log(matches);
// => ["42", "42.999"]
In effect, we're hijacking String.replace to act as an iterator over the matches, rather than actually replacing anything in the string.
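To make the "iterator" idea concrete, here is a minimal sketch of what happens to the string that String.replace returns, and of what changes without the g flag. The variable names (discarded, firstOnly) are made up for this illustration:

```javascript
var str = "Some of the best numbers are 42, and, in particular 42.999.";

var matches = [];
var discarded = str.replace(/[+-]?\d+(\.\d+)?/g, function(match) {
    matches.push(match);
    // No return value, so each match is replaced with the text "undefined".
});
// matches is ["42", "42.999"]
// discarded is "Some of the best numbers are undefined, and, in particular undefined."
// str itself is untouched, because replace returns a new string.

// Without the g flag the callback only runs for the first match.
var firstOnly = [];
str.replace(/[+-]?\d+(\.\d+)?/, function(match) {
    firstOnly.push(match);
});
// firstOnly is ["42"]
```

Since the returned string is thrown away, the missing return value in the callback does no harm; it only matters if you also want to keep the result of the replacement.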
This has a couple of advantages:
- you do away with the anxiety-inducing while loop. Because what can go wrong? I'll tell you what can go wrong: use a regular expression literal in the while statement, instead of a regex saved in a variable, and you'll get an infinite loop (courtesy of the lastIndex property, which starts at 0 again on the fresh regex object that the literal creates on every iteration):
while ((match = /[+-]?\d+(\.\d+)?/g.exec(str)) !== null) { FOREVERCODE }
- you get to have named parameters for your matches instead of indexes in an array.
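The lastIndex behaviour behind the first point can be observed without hanging the browser. The sketch below uses a loop capped at three iterations instead of the real while loop; the variable names are chosen for this illustration only:

```javascript
var str = "Some of the best numbers are 42, and, in particular 42.999.";

// A regex saved in a variable remembers where the last match ended.
var number_regex = /[+-]?\d+(\.\d+)?/g;
var first = number_regex.exec(str);          // matches "42"
var indexAfterFirst = number_regex.lastIndex; // position right after "42"
var second = number_regex.exec(str);         // matches "42.999"
var third = number_regex.exec(str);          // null: no more matches
var indexAfterEnd = number_regex.lastIndex;   // reset to 0 after the failed match

// A literal inside the loop creates a new regex each time around,
// so lastIndex is always 0 and exec keeps finding the first match.
var repeated = [];
for (var i = 0; i < 3; i++) {
    repeated.push(/[+-]?\d+(\.\d+)?/g.exec(str)[0]);
}
// repeated is ["42", "42", "42"]
```

In the unbounded while version, exec would never return null, which is exactly the infinite loop described above.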
Extracting parts of a pattern
Each group we create in a regular expression, using the (...) construct, will result in an additional parameter to our "replacer" function. Let's take another example.
Assume that in a blog article you can add WordPress-style shortcodes for embedding videos:
Top 10 videos this month:
1. [youtube:FyCsJAj69sc]
2. [vimeo:128373915]
...
A straightforward regular expression to match these shortcodes is:
/\[\w+:\w+\]/g
// match word characters (letters, digits, underscore), followed by a colon,
// followed by another run of word characters,
// all wrapped in square brackets
but we want to match the parts individually, so we put them in groups:
/\[(\w+):(\w+)\]/g
Our pattern-extraction function now receives two extra parameters, one for each group:
var str = "Top 10 videos this month: \
1. [youtube:FyCsJAj69sc] \
2. [vimeo:128373915]";
var shortcode_regex = /\[(\w+):(\w+)\]/g;
var matches = [];
str.replace(shortcode_regex, function(match, code, id) {
    matches.push({
        code: code,
        id: id
    });
});
console.log(matches);
For which we get:
[
{"code": "youtube", "id": "FyCsJAj69sc"},
{"code": "vimeo", "id": "128373915"}
]
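The group parameters are not the end of the list: after them, the replacer function also receives the offset of the match and the whole input string. A small sketch of the full parameter list, using a shortened input string and parameter names picked for this illustration (offset, whole):

```javascript
var str = "1. [youtube:FyCsJAj69sc] 2. [vimeo:128373915]";
var shortcode_regex = /\[(\w+):(\w+)\]/g;

var matches = [];
str.replace(shortcode_regex, function(match, code, id, offset, whole) {
    matches.push({
        match: match,           // the full shortcode, brackets included
        code: code,             // first group
        id: id,                 // second group
        offset: offset,         // index in the string where the match starts
        sameInput: whole === str // the last parameter is the original string
    });
});
console.log(matches);
```

Knowing the offset is handy when you need to report where in the article a shortcode was found, something the groups alone don't tell you.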
Tip: Have a regex group that you don't want to show up as a parameter of the replacer function? Use the non-capturing group syntax (?: ... ).
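As a rough illustration of the tip, here is the number pattern from earlier written both ways. The parameter names (fraction, offset) and the result arrays are made up for this sketch:

```javascript
var str = "Some of the best numbers are 42, and, in particular 42.999.";

// Capturing group: the second parameter is the group's text,
// or undefined when the optional group did not take part in the match.
var withGroup = [];
str.replace(/[+-]?\d+(\.\d+)?/g, function(match, fraction) {
    withGroup.push(fraction);
});
// withGroup is [undefined, ".999"]

// Non-capturing group: no group parameter is passed,
// so the second parameter is already the numeric offset of the match.
var withoutGroup = [];
str.replace(/[+-]?\d+(?:\.\d+)?/g, function(match, offset) {
    withoutGroup.push(typeof offset);
});
// withoutGroup is ["number", "number"]
```

Both patterns match the same text; the only difference is which parameters reach the replacer function.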
Now you know how to make pattern extraction more readable and less error-prone.