Most regular expression engines use backtracking to try all possible execution paths of the regular expression when evaluating an input. In some cases this causes performance issues, known as catastrophic backtracking. In the worst case, the complexity of the regular expression is exponential in the size of the input. This means that a small, carefully crafted input (around 20 characters) can trigger catastrophic backtracking and cause a denial of service of the application. Super-linear regex complexity can have the same impact, in this case with a large, carefully crafted input (thousands of characters).
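To see why a nested quantifier such as (a+)+ (used in the sensitive example below) is exponential, it helps to count the ways it can consume a run of n 'a' characters. Every way of splitting the run into consecutive non-empty pieces, one piece per iteration of the outer +, is a distinct path, and when the rest of the regex fails a backtracking engine may try each of them. The sketch below counts those splits with dynamic programming; countWays is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: counts how many ways the nested quantifier (a+)+
// can consume exactly n consecutive 'a' characters. Each way corresponds
// to splitting the run into non-empty pieces, one per outer iteration.
function countWays(n) {
  // ways[i] = number of ways to consume the first i characters
  const ways = new Array(n + 1).fill(0);
  ways[0] = 1; // consuming nothing: the empty split
  for (let i = 1; i <= n; i++) {
    // the last outer iteration consumed the final `len` characters
    for (let len = 1; len <= i; len++) {
      ways[i] += ways[i - len];
    }
  }
  return ways[n];
}

for (const n of [5, 10, 20]) {
  console.log(`n = ${n}: ${countWays(n)} ways`);
}
// n = 5: 16 ways, n = 10: 512 ways, n = 20: 524288 ways (that is, 2^(n-1))
```

Each additional character doubles the count, which is why an input of only about 20 characters can already be enough to stall the engine.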
This rule determines the runtime complexity of a regular expression and informs you if it is not linear.
There is a risk if you answered yes to any of those questions.
To avoid catastrophic backtracking situations, make sure that none of the following conditions apply to your regular expression.
In all of the following cases, catastrophic backtracking can only happen if the problematic part of the regex is followed by a pattern that can fail, causing the backtracking to actually happen.
- If you have a repetition r* or r*?, such that the regex r could produce different possible matches (of possibly different lengths) on the same input, the worst-case matching time can be exponential. This can be the case if r contains optional parts, alternations, or additional repetitions (but not if the repetition is written in such a way that there is only one way to match it).
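- If you have multiple repetitions that can match the same contents and are consecutive, or are separated only by an optional separator or by a separator that both repetitions can match, the worst-case matching time can be polynomial. For example, a*b* is not a problem because a* and b* match different things, and a*_a* is not a problem because the repetitions are separated by a '_' and can't match that '_'. However, a*a* and .*_.* have quadratic runtime.
- If the regex is not anchored to the beginning of the string, quadratic runtime is especially hard to avoid, because whenever a match attempt fails the engine tries again from the next index. For example, str.split(/\s*,/) will run in quadratic time on strings that consist entirely of spaces (or at least contain large sequences of spaces not followed by a comma).

In order to rewrite your regular expression without these patterns, consider the following strategies: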
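- If applicable, define a maximum number of expected repetitions with bounded quantifiers, for instance {1,5} instead of +.
- Refactor nested quantifiers to limit the number of ways the inner group can be matched by the outer quantifier. For instance, (ba+)+ doesn't cause performance issues, because the inner group can be matched only if there is exactly one b character per repetition of the group.
- Use negated character classes instead of . to exclude separators where applicable. For example, the quadratic regex .*_.* can be made linear by rewriting it as [^_]*_.*, so that the first repetition cannot match the separator.

Sometimes it's not possible to rewrite the regex to be linear while still matching what you want it to match, especially when the regex is not anchored to the beginning of the string, in which case quadratic runtimes are quite hard to avoid. In those cases, consider the following approaches: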
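- Use multiple passes, for example by replacing str.split(/\s*,\s*/) with str.split(",") and then trimming the spaces from the resulting strings as a second step.
- Make the regex infallible by making the parts that could fail optional, which prevents backtracking, and then use capturing groups to check whether those parts actually matched. For example, x*y could be replaced with x*(y)?, and the call str.match(regex) could be replaced with matched = str.match(regex) followed by the check matched[1] !== undefined.

The regex evaluation will never end: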
/(a+)+$/.test(
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaa!"
); // Sensitive
Possessive quantifiers do not keep backtracking positions, so they can be used, where possible, to avoid performance issues. Unfortunately, they are not supported in JavaScript, but they can still be mimicked using lookahead assertions and backreferences:
/((?=(a+))\2)+$/.test(
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" +
  "aaaaaaaaaaaaaaa!"
); // Compliant
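The lookahead-and-backreference form only changes how the engine searches, so it should accept the same strings as /(a+)+$/. The sketch below, assuming plain Node.js with no dependencies, compares both patterns on a handful of short inputs (where the nested pattern is still fast) and then runs the emulated pattern on a long input of the same shape as the one above. The sample list is illustrative, not a proof of equivalence.

```javascript
// Original pattern: nested quantifiers, exponential backtracking on failure.
const nested = /(a+)+$/;
// Emulated possessive form: the lookahead captures the longest run of 'a'
// into group 2, and \2 consumes exactly that run. Because a lookahead is
// not re-entered on backtracking, the engine cannot try shorter splits.
const emulated = /((?=(a+))\2)+$/;

// Short inputs only: the nested pattern is still fast at these sizes.
const samples = ["", "a", "aaa", "aaa!", "baaa", "aaab", "a!a"];
for (const s of samples) {
  console.log(JSON.stringify(s), nested.test(s), emulated.test(s));
}

// A long run of 'a' followed by '!' returns promptly with the emulated form.
const longInput = "a".repeat(1000) + "!";
console.log(emulated.test(longInput)); // false
```

Since the pattern is not anchored at the start, the emulated version still retries from each index, so its cost on such inputs grows roughly quadratically rather than exponentially.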
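Two of the rewrites listed earlier can be sketched in JavaScript as follows, assuming plain string inputs: the negated character class turns .*_.* into [^_]*_.*, and the multi-pass approach replaces str.split(/\s*,\s*/) with a plain split followed by trimming. The helper splitAndTrim is a hypothetical name chosen for this illustration.

```javascript
// Overlapping repetitions: both .* can consume the same characters.
const overlapping = /.*_.*/;
// Rewrite from the strategies list: the first repetition can no longer
// consume the '_', so the two repetitions no longer overlap.
const separated = /[^_]*_.*/;

// Hypothetical helper: multi-pass split. First split on the literal comma,
// then trim each piece, instead of using str.split(/\s*,\s*/).
function splitAndTrim(str) {
  return str.split(",").map((part) => part.trim());
}

const inputs = ["a_b", "no separator", "x__y"];
for (const s of inputs) {
  console.log(JSON.stringify(s), overlapping.test(s), separated.test(s));
}

console.log(splitAndTrim("a , b,  c")); // [ 'a', 'b', 'c' ]
console.log("a , b,  c".split(/\s*,\s*/)); // [ 'a', 'b', 'c' ]
```

Note that the two-pass version is not strictly identical to the regex split: it also trims whitespace at the very start and end of the whole string, so " a,b " becomes [ 'a', 'b' ] instead of [ ' a', 'b ' ].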