Can regular expressions also cause a DoS? Yes, when the regular expressions are not well written, they may be used by malicious input and consume a lot of resources, resulting in a DoS. This attack is called the ReDOS.
Different from the resource exhaustion attack mentioned earlier, the ReDOS is a code of vulnerability. We know that regular expressions are based on the nondeterministic finite automaton (NFA), which is a state machine; each state and input symbol may have many different states. Regular parsing engines will traverse all possible paths until the end. Each state has a number of next statess, and therefore the decision algorithm will try each next state until it finds a match, for example, this regular expression:
^(a+)+$
When the input has only four “a’s,”
aaaaX
It has only 16 possible paths, so the engine will be able to finish traversing quickly. But when entering the following string:
aaaaaaaaaaaaaaaaX
it will become 65536 possible paths; each additional “a” will double the number of paths.
This greatly increases the consumption of the regular engine to parse the data. When the user maliciously constructs the input, these defective regex will consume a lot of system resources (such as CPU and memory) and cause a decline in server performance; moreover, as the performance results of the system are very slow, some process or service will not respond. This is the same consequence as a DoS.
As for the regular expression, we can carry out a test. The test code* is as follows:
#
# retime.py - Python test program for regular expression DoS attacks
#
# This test program measures the execution time of the Python regular expression
# matcher to determine if it has problems with regular expression denial-of-service (ReDoS)
# attacks. A ReDoS attack becomes possible in applications which use poorly written regular
# expressions to validate user inputs. An improperly written regular expression has an
# exponential run time when given a non-matching string. Character strings as short as
# 30 characters can cause problems.
#
# The following WikiPedia article provides more information about the ReDoS problem:
#
# http://en.wikipedia.org/wiki/Regular_expression_Denial_of_ Service_-_ReDoS
#
# This program has been tested with both CPython and IronPython. Versions of the
# test program for C#, Java, JavaScript, Perl, and PHP are also available at:
#
# http://www.computerbytesman.com/redos
#
# Author: Richard M. Smith
#
# Please send comments, questions, additions, etc. to info@ computerbytesman.com
#
#
# Test parameters
#
# regex: String containing the regular expression to be tested # maketeststring: A function which generates a test string from a length parameter
# maxiter: Maximum number of test iterations to be performed (typical value is 50)
# maxtime: Maximum execution time in seconds for one iteration before the test program
# is terminated (typical value is 2 seconds)
#
regex = r"^(a+)+$"
maketeststring = lambda n: "a" * n + "!"
maxiter = 50
maxtime = 2
#
# Python modules used by this program # import re import time import sys
#
# Main function
#
def main(): print print "Python Regular Expression DoS demo"
print "from http://www.computerbytesman.com/redos"
print print "Platform: %s %s" % (sys.platform, sys.version)
print "Regular expression %s" % (regex)
print "Typical test string: %s" % (maketeststring(10))
print "Max. iterations: %d" % (maxiter)
print "Max. match time: %d sec%s" % (maxtime, "s"
if maxtime != 1
else "") print cregex = re.compile(regex) for i in xrange(1, maxiter): time = runtest(cregex, i)
if time > maxtime: break
return
# Run one test
# def runtest(regex, n):
teststr = maketeststring(n)
starttime = time.clock()
match = regex.match(teststr)
elapsetime = int((time.clock() - starttime) * 1000) count = 0
if match != None: count = match.end() - match.start() print "For n=%d, match time=%d msec%s, match count=%s" % (n, elapsetime, "s"
if elapsetime == 1
else "", count) return float(elapsetime) / 1000
if __name__ == "__main__": main()
The testing result is as follows
Python Regular Expression DoS demo from http://www.computerbytesman.com/redos
Platform: win32 2.6 (r26:66714, Nov 11 2008, 10:21:19) [MSC v.1500 32 bit (Intel)]
Regular expression ^(a+)+$
Typical test string: aaaaaaaaaa!
Max. iterations: 50
Max. match time: 2 secs
For n=1, match time=0 msec, match count=0
For n=2, match time=0 msec, match count=0
For n=3, match time=0 msec, match count=0
For n=4, match time=0 msec, match count=0
For n=5, match time=0 msec, match count=0
For n=6, match time=0 msec, match count=0
For n=7, match time=0 msec, match count=0
For n=8, match time=0 msec, match count=0
For n=9, match time=0 msec, match count=0
For n=10, match time=0 msec, match count=0
For n=11, match time=0 msec, match count=0
For n=12, match time=0 msec, match count=0
For n=13, match time=1 msecs, match count=0
For n=14, match time=3 msec, match count=0
For n=15, match time=4 msec, match count=0
For n=16, match time=9 msec, match count=0
For n=17, match time=18 msec, match count=0
For n=18, match time=38 msec, match count=0
For n=19, match time=76 msec, match count=0
For n=20, match time=147 msec, match count=0
For n=21, match time=294 msec, match count=0
For n=22, match time=591 msec, match count=0
For n=23, match time=1187 msec, match count=0
For n=24, match time=2394 msec, match count=0
#--------------+-------------------------------------------------
patterns list of malicious RegEx
#--------------+-------------------------------------------------
a++ (a+)+
charclass+ ([a-zA-Z]+)*
a_or_aa (a|aa)+
a_or_a (a|a?)+
a_11 (.*a){11}
a_65 (.*a){65}
Friedl ([^\\"']+)*
#--------------- same as above again enclosed in ^and $ ---------
start_a++ ^(a+)+$
start_charclass^([a-zA-Z]+)*$
start_a_or_aa ^(a|aa)+$
start_a_or_a ^(a|a?)+$
start_a_11 ^(.*a){11}$
start_a_65 ^(.*a){65}$
start_Friedl ^([^\\"']+)*$
#--------------
OWASP ^[a-zA-Z]+((['\,\.\-][a-zA-Z ])?[a-zA-Z]*)*$
DataVault ^\[(,.*)*\]$
EntLib ^([^"]+)(?:\\([^"]+))*$
Java_Classname ^(([a-z])+.)+[A-Z]([a-z])+$
Cox_10 a?a?a?a?a?a?a?a?a?a?aaaaaaaaaa
Cox_25 a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a aaaaaaaaaaaaaaaaaaaaaaaa
#--------------+-------------------------------------------------
At the same time, you can use the following test case to verify whether the ReDOS problem exists in the regular expression:
#--------------+-------------------------------------------------
payloads list of payloads
#--------------+-------------------------------------------------
a_12X aaaaaaaaaaaaX
a_18X aaaaaaaaaaaaaaaaaaX
a_33X aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX
a_49X aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX
Cox_10 aaaaaaaaaa
Cox_20 aaaaaaaaaaaaaaaaaaaa
Cox_25 aaaaaaaaaaaaaaaaaaaaaaaaa
Cox_34 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Java_Classname aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
EmailValidation a@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!
EmailValidatioX a@aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaX
invalid_Unicode (.+)+\u0001
DataVault_DoS [,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
EntLib_DoS \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"
EntLib_DoSX \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\"X
#--------------+-------------------------------------------------
Though the parsing algorithm of the regular expressions performs better,* popular languages still use the naive algorithm in order to provide an enhanced parsing engine so that there are similar problems in the built-in regular parsing engines of many platforms and development languages.
In the Internet, the regular expression may exist in any place, but as long as there is a defect in this, it is likely to come across a ReDOS.
After checking the application security, we must not ignore the possible impact of the ReDOS.