textract@0.13.0

Vulnerabilities

9 via 9 paths

Dependencies

33

Source

npm

Find, fix and prevent vulnerabilities in your code.

Severity
  • 3
  • 3
  • 3
Status
  • 9
  • 0
  • 0

high severity

Denial of Service (DoS)

  • Vulnerable module: xlsx
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12
    Remediation: Upgrade to j@1.0.0.

Overview

xlsx is a Parser and writer for various spreadsheet formats.

Affected versions of this package are vulnerable to Denial of Service (DoS). An attacker who can send a malicious excel file parsed by this library can crash the Node.JS process.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption- An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For Example, npm ws package

Remediation

Upgrade xlsx to version 0.17.0 or higher.

References

high severity

Denial of Service (DoS)

  • Vulnerable module: xlsx
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12
    Remediation: Upgrade to j@1.0.0.

Overview

xlsx is a Parser and writer for various spreadsheet formats.

Affected versions of this package are vulnerable to Denial of Service (DoS). An attacker who can send a malicious excel file parsed by this library can cause maximum CPU usage.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption- An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For Example, npm ws package

Remediation

Upgrade xlsx to version 0.17.0 or higher.

References

high severity

Denial of Service (DoS)

  • Vulnerable module: xlsx
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12
    Remediation: Upgrade to j@1.0.0.

Overview

xlsx is a Parser and writer for various spreadsheet formats.

Affected versions of this package are vulnerable to Denial of Service (DoS). An attacker who can send a malicious excel file parsed by this library can crash the Node.JS process.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption- An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For Example, npm ws package

Remediation

Upgrade xlsx to version 0.17.0 or higher.

References

medium severity

Improper Input Validation

  • Vulnerable module: xmldom
  • Introduced through: xmldom@0.1.16

Detailed paths

  • Introduced through: textract@0.13.0 xmldom@0.1.16

Overview

xmldom is an A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Affected versions of this package are vulnerable to Improper Input Validation. It does not correctly escape special characters when serializing elements removed from their ancestor. This may lead to unexpected syntactic changes during XML processing in some downstream applications.

Remediation

A fix was pushed into the master branch but not yet published.

References

medium severity

XML External Entity (XXE) Injection

  • Vulnerable module: xmldom
  • Introduced through: xmldom@0.1.16

Detailed paths

  • Introduced through: textract@0.13.0 xmldom@0.1.16
    Remediation: Upgrade to xmldom@0.5.0.

Overview

xmldom is an A pure JavaScript W3C standard-based (XML DOM Level 2 Core) DOMParser and XMLSerializer module.

Affected versions of this package are vulnerable to XML External Entity (XXE) Injection. Does not correctly preserve system identifiers, FPIs or namespaces when repeatedly parsing and serializing maliciously crafted documents.

Details

XXE Injection is a type of attack against an application that parses XML input. XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. By default, many XML processors allow specification of an external entity, a URI that is dereferenced and evaluated during XML processing. When an XML document is being parsed, the parser can make a request and include the content at the specified URI inside of the XML document.

Attacks can include disclosing local files, which may contain sensitive data such as passwords or private user data, using file: schemes or relative paths in the system identifier.

For example, below is a sample XML document, containing an XML element- username.

<?xml version="1.0" encoding="ISO-8859-1"?>
   <username>John</username>
</xml>

An external XML entity - xxe, is defined using a system identifier and present within a DOCTYPE header. These entities can access local or remote content. For example the below code contains an external XML entity that would fetch the content of /etc/passwd and display it to the user rendered by username.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [
   <!ENTITY xxe SYSTEM "file:///etc/passwd" >]>
   <username>&xxe;</username>
</xml>

Other XXE Injection attacks can access local resources that may not stop returning data, possibly impacting application availability and leading to Denial of Service.

Remediation

Upgrade xmldom to version 0.5.0 or higher.

References

medium severity

Denial of Service (DoS)

  • Vulnerable module: jszip
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12 jszip@2.4.0
    Remediation: Upgrade to j@0.4.4.

Overview

jszip is a Create, read and edit .zip files with JavaScript http://stuartk.com/jszip

Affected versions of this package are vulnerable to Denial of Service (DoS). Crafting a new zip file with filenames set to Object prototype values (e.g __proto__, toString, etc) results in a returned object with a modified prototype instance.

PoC

const jszip = require('jszip');

async function loadZip() {
// this is a raw buffer of demo.zip containing 2 empty files:
// - "file.txt"
// - "toString"
const demoZip = Buffer.from('UEsDBBQACAAIANS8kVIAAAAAAAAAAAAAAAAIACAAdG9TdHJpbmdVVA0AB3Bje2BmY3tgcGN7YHV4CwABBPUBAAAEFAAAAAMAUEsHCAAAAAACAAAAAAAAAFBLAwQUAAgACADDvJFSAAAAAAAAAAAAAAAACAAgAGZpbGUudHh0VVQNAAdPY3tg4FJ7YE9je2B1eAsAAQT1AQAABBQAAAADAFBLBwgAAAAAAgAAAAAAAABQSwECFAMUAAgACADUvJFSAAAAAAIAAAAAAAAACAAgAAAAAAAAAAAApIEAAAAAdG9TdHJpbmdVVA0AB3Bje2BmY3tgcGN7YHV4CwABBPUBAAAEFAAAAFBLAQIUAxQACAAIAMO8kVIAAAAAAgAAAAAAAAAIACAAAAAAAAAAAACkgVgAAABmaWxlLnR4dFVUDQAHT2N7YOBSe2BPY3tgdXgLAAEE9QEAAAQUAAAAUEsFBgAAAAACAAIArAAAALAAAAAAAA==', 'base64');

const zip = await jszip.loadAsync(demoZip);
zip.files.toString(); // this will throw
return zip;
}
loadZip();

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption- An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For Example, npm ws package

Remediation

Upgrade jszip to version 3.7.0 or higher.

References

low severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: mime
  • Introduced through: mime@1.2.9

Detailed paths

  • Introduced through: textract@0.13.0 mime@1.2.9
    Remediation: Upgrade to textract@2.3.0.

Overview

mime is a comprehensive, compact MIME type module.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). It uses regex the following regex /.*[\.\/\\]/ in its lookup, which can cause a slowdown of 2 seconds for 50k characters.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

Upgrade mime to version 1.4.1, 2.0.3 or higher.

References

low severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: xlsx
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12
    Remediation: Upgrade to j@1.0.0.

Overview

xlsx is a Parser and writer for various spreadsheet formats.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS). This can cause an impact of about 2 seconds matching time for data 50k characters long. This vulnerability is due to an incomplete fix for SNYK-JS-XLSX-10909.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

Upgrade xlsx to version 0.16.0 or higher.

References

low severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: xlsx
  • Introduced through: j@0.3.11

Detailed paths

  • Introduced through: textract@0.13.0 j@0.3.11 xlsx@0.7.12
    Remediation: Upgrade to j@1.0.0.

Overview

xlsx is a parser and writer for various spreadsheet formats.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) attacks. This can cause an impact of about 10 seconds matching time for data 40k characters long.

Disclosure Timeline

  • Feb 19th, 2018 - Initial Disclosure to package owner
  • Feb 19th, 2018 - Initial Response from package owner
  • Feb 21th, 2018 - Fix issued
  • Feb 22th, 2018 - Vulnerability published

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD

It most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30 characters long string takes around ~52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String Number of C's Number of steps
ACCCX 3 38
ACCCCX 4 71
ACCCCCX 5 136
ACCCCCCCCCCCCCCX 14 65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause them to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this and can cause the service to excessively consume CPU, resulting in a Denial of Service.

Remediation

Upgrade xlsx to verson 0.12.2 or higher

References