Commit 751946d

Add a solution that enables the direct use of atomic groups
The described solution, which emulates atomic groups using lookahead and backreferences, is useful but can be tricky to use and error-prone (e.g. when quantifying the result, or in longer patterns that rely on multiple atomic groups). So this adds a link to an easy-to-use solution that enables the direct use of atomic groups via `(?>…)` in native JS regexes.
1 parent 2092da7 commit 751946d
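
For readers unfamiliar with the emulation the commit message refers to, here is a minimal sketch based on the `(?=(\w+))\1` pattern that appears in the changed article. The variable names are illustrative only and are not part of the commit. Note that the backreference number depends on how many capturing groups precede the lookahead, which is one reason the trick gets error-prone in longer patterns.

```js
// Plain greedy quantifier: \w+ first takes "JavaScript", then backtracks
// to "Java" so that the literal "Script" can still match.
const greedy = /\w+Script/;

// Emulated atomic group: the lookahead captures the whole word into group 1,
// and \1 must then consume exactly that text. The engine cannot go back
// into the lookahead to try a shorter word.
const atomicLike = /(?=(\w+))\1Script/;

const greedyResult = "JavaScript".match(greedy);
const atomicResult = "JavaScript".match(atomicLike);

console.log(greedyResult && greedyResult[0]); // "JavaScript"
console.log(atomicResult); // null
```

If another capturing group were added before the lookahead, `\1` would have to become `\2`, and so on for every emulated group in the pattern.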

1 file changed: +4 -4 lines changed

9-regular-expressions/15-regexp-catastrophic-backtracking/article.md

@@ -1,6 +1,6 @@
 # Catastrophic backtracking

-Some regular expressions are looking simple, but can execute a veeeeeery long time, and even "hang" the JavaScript engine.
+Some regular expressions look simple, but can execute a veeeeeery long time, and even "hang" the JavaScript engine.

 Sooner or later most developers occasionally face such behavior. The typical symptom -- a regular expression works fine sometimes, but for certain strings it "hangs", consuming 100% of CPU.

@@ -244,7 +244,7 @@ Modern regular expression engines support possessive quantifiers for that. Regul

 Possessive quantifiers are in fact simpler than "regular" ones. They just match as many as they can, without any backtracking. The search process without backtracking is simpler.

-There are also so-called "atomic capturing groups" - a way to disable backtracking inside parentheses.
+There are also so-called "atomic groups" - a way to disable backtracking inside parentheses.

 ...But the bad news is that, unfortunately, in JavaScript they are not supported.

@@ -266,7 +266,7 @@ Let's decipher it:

 That is: we look ahead - and if there's a word `pattern:\w+`, then match it as `pattern:\1`.

-Why? That's because the lookahead finds a word `pattern:\w+` as a whole and we capture it into the pattern with `pattern:\1`. So we essentially implemented a possessive plus `pattern:+` quantifier. It captures only the whole word `pattern:\w+`, not a part of it.
+Why? That's because the lookahead finds a word `pattern:\w+` as a whole and we capture it into the pattern with `pattern:\1`. So we essentially implemented an atomic group. It captures only the whole word `pattern:\w+`, not a part of it.

 For instance, in the word `subject:JavaScript` it may not only match `match:Java`, but leave out `match:Script` to match the rest of the pattern.

@@ -283,7 +283,7 @@ alert( "JavaScript".match(/(?=(\w+))\1Script/)); // null
 We can put a more complex regular expression into `pattern:(?=(\w+))\1` instead of `pattern:\w`, when we need to forbid backtracking for `pattern:+` after it.

 ```smart
-There's more about the relation between possessive quantifiers and lookahead in articles [Regex: Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead](https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead) and [Mimicking Atomic Groups](https://blog.stevenlevithan.com/archives/mimic-atomic-groups).
+The [`regex`](https://github.com/slevithan/regex) package adds support for atomic groups to native JavaScript regexps. There's also more about the relation between atomic groups and lookahead in articles [Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead](https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead) and [Mimicking Atomic Groups](https://blog.stevenlevithan.com/archives/mimic-atomic-groups).
 ```

 Let's rewrite the first example using lookahead to prevent backtracking:
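
The diff context ends before the rewritten example, so the following is only a sketch of what such a rewrite could look like. It assumes the "first example" is a pattern of the form `^(\w+\s?)*$` (words optionally followed by a space), which is not shown in this diff. The test strings and variable names are likewise illustrative.

```js
// Assumed original pattern (not shown in the diff): /^(\w+\s?)*$/
// On a long input that ends in a non-word character it can backtrack
// exponentially, because each word can be split up in many ways.

// Rewrite with the lookahead trick: group 2 sits inside the lookahead and
// captures a whole word, and \2 consumes exactly that word, so the engine
// never retries shorter pieces of it.
const rewritten = /^((?=(\w+))\2\s?)*$/;

const goodInput = "A good string";
const badInput = "An input string that takes a long time or even makes this regex hang!";

const goodResult = rewritten.test(goodInput);
const badResult = rewritten.test(badInput);

console.log(goodResult); // true
console.log(badResult); // false
```

The backreference here is `\2` rather than `\1`, because the outer parentheses already form group 1.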
