RegEx DOS attack - Regular Expressions: Now you have 3 problems

by Abe Miessler 26. October 2010 03:19

I recently came across an article in MSDN magazine called Regular Expression Denial of Service Attacks and Defenses.  This article detailed a relatively new breed of Denial of Service (DOS) attack that exploits a feature of Nondeterministic Finite Automaton regex engines.  This is the type of regex engine used by .NET, Perl, Pyton, Ruby and PHP to name a few.  One thing that is so disturbing about this type of attack is how easy it is for malicious web users to find a vulnerability in your site if you are doing any kind of client side validation.  I for one have lived by the motto, "Use client side validation for user experience and server side validation for security."  Unfortunately with this type of attack sending your regex to the client will make it much easier for malicious users to find vulnerabilities in your site.

The problem regexs are ones that have something called exponential complexity.  This basically means that the number of paths that must be evaluated increases exponentially with every character added.  You run into this situation when you create a regex that has a grouping expression with repetition that is repeated itself.  Typically this basic example is given: ^(\d+)+$. 

For the purposes of our example we will use a slightly more complex regex that I found on RegexLib.  This regex is used to validate email addresses (nested repetition has been bolded): ^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

I suspect that this is a widespread problem and I plan on reviewing my own code to see if I have put any vulnerabilities out there.  To demonstrate this vulnerability I have put together a small application to run some tests against.  First I created an aspx page with a TextBox, a RegularExpressionValidator, a Label and a Button:



For the purpose of this demonstration I have set "CausesValidation" to false in my Button.  This control is only setup like this so that we can time processing time on the server side.  Below is the code for btn_SubmitClick:




In this code we are just measuring how long it takes for the validation to take place.  Lets give our page a run:


I know... not that exciting.  But now lets take a look at the rendered HTML



Notice the area highlighted in red.  This serves to make the life of any evil doers much easier.  Since the regex is on the clientside they can examine each one and look for weaknesses.  Think they will face the same delay on the client side? Not if they use something like BurpSuite to yank the javascript.  Think you can limit the number of characters they enter into the text box?  Again not if they use something like BurpSuite...  Now that we have established that you are screwed lets take a look at what some sample inputs do. 

First lets try 15 characters:

.012 seconds? Piece of cake...  Lets try 20:

.39 seconds?  Still not that scary... Lets try 25:

12.542 seconds! Now it's getting a little scary.... Lets push it to 28:

93.381 seconds...  And lets take a look at my CPU during that time:


90 seconds of hell.  I'll leave it to you to go beyond 28 characters, but I waited for quite a while at 30 characters before I gave up. 

As you can imagine a few malicious users or a small bot net could bring a website to it's knees if they have vulnerable regexs out there.  Unfortunately I don't know of a quick fix for this problem.  The best solution I've seen so far involves finding all vulnerable Regular Expressions and rewriting them.  The article mentioned at the beginning of this post goes over a good "Regex Fuzzer" (a way to find problem regular expressions), and it's definitely worth looking at.

Good luck!


web application security | denial of service | DOS | Regular Expressions

Powered by BlogEngine.NET