Snyking in - regular expression denial of service vulnerability exploit in the ms package

By Simon Maple

Welcome to another edition of our Snyking In exploit series! Last time we looked at a directory traversal vulnerability exploit in the st library. In this episode, we’ll be looking at the regular expression denial of service vulnerability, demonstrating how it can be exploited and the potential risk it poses to your data and systems.

We will also show you how to both find and fix this type of vulnerability in your application. Without further ado, here’s the exploit video followed by more information about the regular expression denial of service vulnerability.

To see if any of your applications contain regular expression denial of service vulnerabilities in third-party dependencies, you can scan your application for free, using Snyk.


Regular expression denial of service

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its legitimate users. There are many types of DoS attacks, ranging from clogging the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service, or DDoS, attack) to sending crafted requests that cause a system to crash or take a disproportionate amount of time to process.

Regular expression Denial of Service (ReDoS) is one type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren’t very intuitive, and a poorly constructed one can make it easy for attackers to take your site down.

The recent State of Open Source Security report, released by Snyk, shows that regular expression denial of service vulnerability disclosures have increased by 143% in the last year alone.

Catastrophic backtracking

Let’s take a look at the following regular expression:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A: The string must start with the letter A.
  • (B|C+)+: The string must then follow the letter A with either the letter B or some number of occurrences of the letter C (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D: Finally, we ensure this section of the string ends with a D.

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD.
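To make the breakdown concrete, here is a minimal sketch that runs the pattern against the four inputs listed above. The two non-matching inputs (AD and ABCX) are not from the original text; they are hypothetical examples added for contrast.

```javascript
// The pattern discussed above.
const regex = /A(B|C+)+D/;

// Inputs the text lists as matching.
const matching = ['ABBD', 'ABCCCCD', 'ABCBCCCD', 'ACCCCCD'];
// Hypothetical inputs (not from the text) that should not match:
// 'AD' has no B or C between A and D, and 'ABCX' never reaches a D.
const nonMatching = ['AD', 'ABCX'];

const results = {};
for (const input of [...matching, ...nonMatching]) {
  results[input] = regex.test(input);
  console.log(`${input}: ${results[input]}`);
}
```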

In most cases, it doesn’t take very long for a regex engine to find a match. Timing a test against a valid string and then against an invalid one gives:

0.04s user 0.01s system 95% cpu 0.052 total

1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing the expression against a 30-character string takes around 52ms. But when given an invalid string of the same length, the test takes nearly two seconds to complete, roughly 35 times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.
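The comparison can be reproduced with a short Node.js script. This is a sketch, not the exact command behind the timings above: the helper name timeTest and the run length of 20 are assumptions, chosen so that the failing case finishes quickly. Increasing runLength towards 28 should bring the invalid case closer to the multi-second range reported above, although exact numbers depend on the machine and engine version.

```javascript
// Time a single regex test and return the outcome with elapsed milliseconds.
// timeTest is a hypothetical helper name used only for this illustration.
function timeTest(regex, input) {
  const start = process.hrtime.bigint();
  const matched = regex.test(input);
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { matched, elapsedMs };
}

const pattern = /A(B|C+)+D/;
const runLength = 20; // assumption: kept short so the demo finishes quickly

// Same length, differing only in the final character.
const valid = 'A' + 'C'.repeat(runLength) + 'D';
const invalid = 'A' + 'C'.repeat(runLength) + 'X';

const validRun = timeTest(pattern, valid);
const invalidRun = timeTest(pattern, invalid);

console.log(`valid:   matched=${validRun.matched} in ${validRun.elapsedMs.toFixed(3)}ms`);
console.log(`invalid: matched=${invalidRun.matched} in ${invalidRun.elapsedMs.toFixed(3)}ms`);
```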

Most regex engines work very similarly (with minor differences). The engine takes the first possible way to accept the current character and proceeds to the next one. If it then fails to match the next character, it backtracks and checks whether there was another way to consume the previous one. If it goes far down the rabbit hole only to find out that the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.
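One way to see why the number of paths explodes is to count how many ways the (C+)+ part of the earlier pattern can divide a run of Cs into groups. The sketch below does only that counting; it does not simulate a real regex engine, and actual engines may prune or optimize some paths, so treat the numbers as intuition for the growth rate rather than exact step counts. The function name countSplits is made up for this illustration.

```javascript
// Count the ways (C+)+ can split a run of n Cs into one or more non-empty groups.
// Each distinct split is a path a backtracking engine may have to try before
// concluding that the overall match fails.
function countSplits(n, memo = new Map()) {
  if (n === 0) return 1; // the whole run has been consumed: one complete split
  if (memo.has(n)) return memo.get(n);
  let total = 0;
  // The first group takes k Cs; the remaining n - k Cs are split recursively.
  for (let k = 1; k <= n; k++) {
    total += countSplits(n - k, memo);
  }
  memo.set(n, total);
  return total;
}

for (const n of [4, 8, 16, 28]) {
  console.log(`${n} Cs -> ${countSplits(n)} possible splits`);
}
// 4 Cs -> 8, 8 Cs -> 128, 16 Cs -> 32768, 28 Cs -> 134217728
```

The count doubles with every additional C (2^(n-1) splits for n Cs), which is why adding a few characters to a non-matching input can turn milliseconds into seconds.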

The ms exploit

The following command adds a todo item to our Snyk Goof todo application. The "in 20 minutes" part of the content text is matched by the regex engine as a representation of time. The business logic of the application could use this to create reminders or alerts, for example.

$ echo 'content=Call mom in 20 minutes' | http --form https://localhost:3001/create -v

We can try to brute-force the length of the todo entry as follows. The command passes the digit 5 repeated 60,000 times as the number of minutes, but the request returns very quickly because the pattern still matches.

$ echo 'content=Buy milk in '`printf %.0s5 {1..60000}`' minutes' | http --form https://localhost:3001/create -v

The way to cause a denial of service is to pass a string that triggers catastrophic backtracking. This requires a large input string like the previous example, but we also need to make sure the regex engine never matches the pattern, so that it backtracks through all possibilities before failing. That backtracking causes the delay, or denial of service, we’re after. To achieve this, we can change the word minutes in our content text to minutea, for example, which the regex engine will not match. The following command causes a denial of service of around 10-15 seconds. If we pass 600,000 5s instead, we’ll be waiting for the rest of the day for the request to complete on the server.

$ echo 'content=Buy milk in '`printf %.0s5 {1..60000}`' minutea' | http --form https://localhost:3001/create -v
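For readers less familiar with the shell's printf trick, the sketch below builds the same two content strings in JavaScript. It also shows one common general mitigation for this class of problem: rejecting overly long input before it ever reaches the regex. The names buildContent, isSafeToParse and MAX_CONTENT_LENGTH, and the limit of 200 characters, are assumptions for illustration; they are not taken from ms or from the Goof application, and this is not presented as the fix either project uses.

```javascript
// Build the todo text used in the commands above: "Buy milk in 555...5 <unit>".
function buildContent(digitCount, unit) {
  return 'Buy milk in ' + '5'.repeat(digitCount) + ' ' + unit;
}

// Matches the time pattern, so the request returns quickly.
const benign = buildContent(60000, 'minutes');
// Ends in "minutea", so the match fails only after extensive backtracking.
const malicious = buildContent(60000, 'minutea');

// Hypothetical guard: MAX_CONTENT_LENGTH is an assumed limit for this sketch.
const MAX_CONTENT_LENGTH = 200;
function isSafeToParse(content) {
  return typeof content === 'string' && content.length <= MAX_CONTENT_LENGTH;
}

console.log(`benign length: ${benign.length}, safe: ${isSafeToParse(benign)}`);
console.log(`malicious length: ${malicious.length}, safe: ${isSafeToParse(malicious)}`);
console.log(`short entry safe: ${isSafeToParse('Call mom in 20 minutes')}`);
```

A length check like this bounds the worst-case work regardless of how the pattern behaves, at the cost of rejecting unusually long but legitimate entries.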

To test your application for vulnerabilities in third-party libraries, such as ms, try Snyk for free and get instant results.