sitespeed.io@17.8.3

Vulnerabilities

10 via 12 paths

Dependencies

501

Source

npm

Find, fix and prevent vulnerabilities in your code.

Severity
  • 6
  • 3
  • 1
Status
  • 10
  • 0
  • 0

high severity

Arbitrary File Write

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Arbitrary File Write. node-tar aims to guarantee that any file whose location would be modified by a symbolic link is not extracted. This is, in part, achieved by ensuring that extracted directories are not symlinks. Additionally, in order to prevent unnecessary stat calls to determine whether a given path is a directory, paths are cached when directories are created.

This logic was insufficient when extracting tar files that contained both a directory and a symlink with the same name as the directory, where the symlink and directory names in the archive entry used backslashes as a path separator on posix systems. The cache checking logic used both \ and / characters as path separators. However, \ is a valid filename character on posix systems.

By first creating a directory, and then replacing that directory with a symlink, it is possible to bypass node-tar symlink checks on directories, essentially allowing an untrusted tar file to symlink into an arbitrary location. This can lead to extracting arbitrary files into that location, thus allowing arbitrary file creation and overwrite.

Additionally, a similar confusion could arise on case-insensitive filesystems. If a tar archive contained a directory at FOO, followed by a symbolic link named foo, then on case-insensitive file systems, the creation of the symbolic link would remove the directory from the filesystem, but not from the internal directory cache, as it would not be treated as a cache hit. A subsequent file entry within the FOO directory would then be placed in the target of the symbolic link, thinking that the directory had already been created.

Remediation

Upgrade tar to version 6.1.7, 5.0.8, 4.4.16 or higher.

References

high severity

Arbitrary File Write

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Arbitrary File Write. node-tar aims to guarantee that any file whose location would be modified by a symbolic link is not extracted. This is, in part, achieved by ensuring that extracted directories are not symlinks. Additionally, in order to prevent unnecessary stat calls to determine whether a given path is a directory, paths are cached when directories are created.

This logic is insufficient when extracting tar files that contain two directories and a symlink with names containing unicode values that normalized to the same value. Additionally, on Windows systems, long path portions would resolve to the same file system entities as their 8.3 "short path" counterparts. A specially crafted tar archive can include directories with two forms of the path that resolve to the same file system entity, followed by a symbolic link with a name in the first form, lastly followed by a file using the second form. This leads to bypassing node-tar symlink checks on directories, essentially allowing an untrusted tar file to symlink into an arbitrary location and extracting arbitrary files into that location.

Remediation

Upgrade tar to version 6.1.9, 5.0.10, 4.4.18 or higher.

References

high severity

Arbitrary File Write

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Arbitrary File Write. node-tar aims to guarantee that any file whose location would be outside of the extraction target directory is not extracted. This is, in part, accomplished by sanitizing absolute paths of entries within the archive, skipping archive entries that contain .. path portions, and resolving the sanitized paths against the extraction target directory.

This logic is insufficient on Windows systems when extracting tar files that contain a path that is not an absolute path, but specify a drive letter different from the extraction target, such as C:some\path. If the drive letter does not match the extraction target, for example D:\extraction\dir, then the result of path.resolve(extractionDirectory, entryPath) resolves against the current working directory on the C: drive, rather than the extraction target directory.

Additionally, a .. portion of the path can occur immediately after the drive letter, such as C:../foo, and is not properly sanitized by the logic that checks for .. within the normalized and split portions of the path.

Note: This only affects users of node-tar on Windows systems.

Remediation

Upgrade tar to version 6.1.9, 5.0.10, 4.4.18 or higher.

References

high severity

Directory Traversal

  • Vulnerable module: node-downloader-helper
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/chromedriver@91.0.4472-19 node-downloader-helper@1.0.14
  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/edgedriver@91.0.864-33 node-downloader-helper@1.0.13
  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 node-downloader-helper@1.0.13

Overview

node-downloader-helper is an A simple http file downloader for node.js

Affected versions of this package are vulnerable to Directory Traversal. Since there is no sanitization of file path, malicious server is able to traversal path of victim machine. It leads remote code execution by putting malicious executable to startup folder in Windows (In Linux, it's possible to create authorized_key).

Details

A Directory Traversal attack (also known as path traversal) aims to access files and directories that are stored outside the intended folder. By manipulating files with "dot-dot-slash (../)" sequences and its variations, or by using absolute file paths, it may be possible to access arbitrary files and directories stored on file system, including application source code, configuration, and other critical system files.

Directory Traversal vulnerabilities can be generally divided into two types:

  • Information Disclosure: Allows the attacker to gain information about the folder structure or read the contents of sensitive files on the system.

st is a module for serving static files on web pages, and contains a vulnerability of this type. In our example, we will serve files from the public route.

If an attacker requests the following URL from our server, it will in turn leak the sensitive private key of the root user.

curl http://localhost:8080/public/%2e%2e/%2e%2e/%2e%2e/%2e%2e/%2e%2e/root/.ssh/id_rsa

Note %2e is the URL encoded version of . (dot).

  • Writing arbitrary files: Allows the attacker to create or replace existing files. This type of vulnerability is also known as Zip-Slip.

One way to achieve this is by using a malicious zip archive that holds path traversal filenames. When each filename in the zip archive gets concatenated to the target extraction folder, without validation, the final path ends up outside of the target folder. If an executable or a configuration file is overwritten with a file containing malicious code, the problem can turn into an arbitrary code execution issue quite easily.

The following is an example of a zip archive with one benign file and one malicious file. Extracting the malicious file will result in traversing out of the target folder, ending up in /root/.ssh/ overwriting the authorized_keys file:

2018-04-15 22:04:29 .....           19           19  good.txt
2018-04-15 22:04:42 .....           20           20  ../../../../../../root/.ssh/authorized_keys

Remediation

Upgrade node-downloader-helper to version 1.0.15 or higher.

References

high severity

Arbitrary File Overwrite

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Arbitrary File Overwrite. This is due to insufficient symlink protection. node-tar aims to guarantee that any file whose location would be modified by a symbolic link is not extracted. This is, in part, achieved by ensuring that extracted directories are not symlinks. Additionally, in order to prevent unnecessary stat calls to determine whether a given path is a directory, paths are cached when directories are created.

This logic is insufficient when extracting tar files that contain both a directory and a symlink with the same name as the directory. This order of operations results in the directory being created and added to the node-tar directory cache. When a directory is present in the directory cache, subsequent calls to mkdir for that directory are skipped. However, this is also where node-tar checks for symlinks occur. By first creating a directory, and then replacing that directory with a symlink, it is possible to bypass node-tar symlink checks on directories, essentially allowing an untrusted tar file to symlink into an arbitrary location and subsequently extracting arbitrary files into that location.

Remediation

Upgrade tar to version 3.2.3, 4.4.15, 5.0.7, 6.1.2 or higher.

References

high severity

Arbitrary File Overwrite

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Arbitrary File Overwrite. This is due to insufficient absolute path sanitization.

node-tar aims to prevent extraction of absolute file paths by turning absolute paths into relative paths when the preservePaths flag is not set to true. This is achieved by stripping the absolute path root from any absolute file paths contained in a tar file. For example, the path /home/user/.bashrc would turn into home/user/.bashrc.

This logic is insufficient when file paths contain repeated path roots such as ////home/user/.bashrc. node-tar only strips a single path root from such paths. When given an absolute file path with repeating path roots, the resulting path (e.g. ///home/user/.bashrc) still resolves to an absolute path.

Remediation

Upgrade tar to version 3.2.2, 4.4.14, 5.0.6, 6.1.1 or higher.

References

medium severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: markdown
  • Introduced through: markdown@0.5.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 markdown@0.5.0

Overview

markdown is a yet another markdown parser, this time for JavaScript.

Note: This package is no longer actively maintained and should be considered deprecated.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). It is possible under certain circumstances to abuse the URL regex parse functionality available within the Gruber dialect feature to conduct denial of service attacks.

Note: Exploitation of this vulnerability requires usage of the Gruber dialect (dialects/gruber.js) within markdown, which is not available by default.

PoC by Snyk

console.time('benchmark');
//regex taken from https://github.com/evilstreak/markdown-js/blob/master/src/dialects/gruber.js#L12

var urlRegexp = /(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?/i.source;

//expoit/payload
const str = '![Blat Blat](https://192.168916891689168916891689168916891689168916891689168916891689168916891689168916891689168916891689168916891689268192.1 "Blat Blat")';

//Duplicate of code from https://github.com/evilstreak/markdown-js/blob/master/src/dialects/gruber.js#L566

var m = str.match(new RegExp("^!\\[(.*?)][ \\t]*\\((" + urlRegexp + ")\\)([ \\t])*([\"'].*[\"'])?")) ||
        str.match( /^!\[(.*?)\][ \t]*\([ \t]*([^")]*?)(?:[ \t]+(["'])(.*?)\3)?[ \t]*\)/ );
console.timeEnd('benchmark');

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

There is no fixed version for markdown.

References

medium severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: markdown
  • Introduced through: markdown@0.5.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 markdown@0.5.0

Overview

markdown is a yet another markdown parser, this time for JavaScript.

Note: This package is no longer actively maintained and should be considered deprecated.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). The markdown.toHTML() function has significantly degraded performance when parsing long strings containing underscores. This may lead to ReDoS if the parser accepts user input.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

There is no fixed version for markdown.

References

medium severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: markdown-it
  • Introduced through: jstransformer-markdown-it@2.1.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 jstransformer-markdown-it@2.1.0 markdown-it@8.4.2

Overview

markdown-it is a modern pluggable markdown parser.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). Parsing __*_… takes quadratic time, this could be a denial of service vulnerability in an application that parses user input.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

Upgrade markdown-it to version 10.0.0 or higher.

References

low severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: tar
  • Introduced through: browsertime@12.10.0

Detailed paths

  • Introduced through: sitespeed.io@17.8.3 browsertime@12.10.0 @sitespeed.io/geckodriver@0.29.1 tar@6.0.2

Overview

tar is a full-featured Tar for Node.js.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). When stripping the trailing slash from files arguments, the f.replace(/\/+$/, '') performance of this function can exponentially degrade when f contains many / characters resulting in ReDoS.

This vulnerability is not likely to be exploitable as it requires that the untrusted input is being passed into the tar.extract() or tar.list() array of entries to parse/extract, which would be unusual.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

Upgrade tar to version 6.1.4, 5.0.8, 4.4.16 or higher.

References