Add a solution that enables the direct use of atomic groups
The described solution, which emulates atomic groups using lookahead and backreferences, is useful but can be tricky to use and error-prone (e.g. when quantifying the result, or in longer patterns that rely on multiple atomic groups). So this adds a link to an easy-to-use solution that enables the direct use of atomic groups via `(?>…)` in native JS regexes.
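For context, here is a minimal sketch of the lookahead-plus-backreference emulation mentioned above, and of why it gets fiddly with several groups. The variable names and the `"item 42"` input are illustrative assumptions, not taken from the article; the patterns follow the `pattern:(?=(\w+))\1` form the article describes.

```javascript
// Plain greedy quantifier: \w+ first takes "JavaScript", then backtracks
// to "Java" so that the literal "Script" can still match.
const greedy = /\w+Script/;

// Emulated atomic group: the lookahead captures the whole word into group 1,
// and \1 consumes exactly that text. The engine cannot shorten the capture,
// so nothing is left over for "Script".
const emulatedAtomic = /(?=(\w+))\1Script/;

const greedyResult = "JavaScript".match(greedy);         // matches "JavaScript"
const atomicResult = "JavaScript".match(emulatedAtomic); // null

// With two emulated groups, each one needs its own capture number (\1, \2),
// which is easy to get wrong when a pattern is edited later.
const twoGroups = /^(?=(\w+))\1\s(?=(\d+))\2$/;
const twoGroupsResult = twoGroups.test("item 42");       // true

console.log(greedyResult && greedyResult[0], atomicResult, twoGroupsResult);
```

Every emulated group adds a capturing group and a numbered backreference, so the numbering has to be kept in sync by hand. That bookkeeping is the part a direct `(?>…)` syntax would avoid.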
Changed file: `9-regular-expressions/15-regexp-catastrophic-backtracking/article.md` (4 additions, 4 deletions)
@@ -1,6 +1,6 @@
 # Catastrophic backtracking
 
-Some regular expressions are looking simple, but can execute a veeeeeery long time, and even "hang" the JavaScript engine.
+Some regular expressions look simple, but can execute a veeeeeery long time, and even "hang" the JavaScript engine.
 
 Sooner or later most developers occasionally face such behavior. The typical symptom -- a regular expression works fine sometimes, but for certain strings it "hangs", consuming 100% of CPU.
 
@@ -244,7 +244,7 @@ Modern regular expression engines support possessive quantifiers for that. Regul
 
 Possessive quantifiers are in fact simpler than "regular" ones. They just match as many as they can, without any backtracking. The search process without backtracking is simpler.
 
-There are also so-called "atomic capturing groups" - a way to disable backtracking inside parentheses.
+There are also so-called "atomic groups" - a way to disable backtracking inside parentheses.
 
 ...But the bad news is that, unfortunately, in JavaScript they are not supported.
 
@@ -266,7 +266,7 @@ Let's decipher it:
 
 That is: we look ahead - and if there's a word `pattern:\w+`, then match it as `pattern:\1`.
 
-Why? That's because the lookahead finds a word `pattern:\w+` as a whole and we capture it into the pattern with `pattern:\1`. So we essentially implemented a possessive plus `pattern:+` quantifier. It captures only the whole word `pattern:\w+`, not a part of it.
+Why? That's because the lookahead finds a word `pattern:\w+` as a whole and we capture it into the pattern with `pattern:\1`. So we essentially implemented an atomic group. It captures only the whole word `pattern:\w+`, not a part of it.
 
 For instance, in the word `subject:JavaScript` it may not only match `match:Java`, but leave out `match:Script` to match the rest of the pattern.
 We can put a more complex regular expression into `pattern:(?=(\w+))\1` instead of `pattern:\w`, when we need to forbid backtracking for `pattern:+` after it.
 
 ```smart
-There's more about the relation between possessive quantifiers and lookahead in articles [Regex: Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead](https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead) and [Mimicking Atomic Groups](https://blog.stevenlevithan.com/archives/mimic-atomic-groups).
+The [`regex`](https://github.com/slevithan/regex) package adds support for atomic groups to native JavaScript regexps. There's also more about the relation between atomic groups and lookahead in articles [Emulate Atomic Grouping (and Possessive Quantifiers) with LookAhead](https://instanceof.me/post/52245507631/regex-emulate-atomic-grouping-with-lookahead) and [Mimicking Atomic Groups](https://blog.stevenlevithan.com/archives/mimic-atomic-groups).
 ```
 
 Let's rewrite the first example using lookahead to prevent backtracking: