Behind the disclosure: the Zip Slip vulnerability

Written by:

Danny Grander

August 15, 2018

0 mins read

In June 2018, the Snyk research team found many exploitable instances of the Zip Slip vulnerability in various ecosystems that affected thousands of applications. This kind of wide reaching vulnerability requires a well thought out private disclosure process so that vulnerable libraries and projects are warned about their exposure before public disclosures are made. But how does one find, fix and disclose such a vulnerability to so many without it becoming public knowledge? This post goes into the details of what we did throughout the process from discovery to disclosure, creating fix PRs and beyond.

It’s important to mention that Zip Slip isn’t a specific vulnerability in a particular package or library, but rather a type of arbitrary file write vulnerability that is extremely prevalent in many projects. Another thing to mention is that Zip Slip is not a new vulnerability per-se – the type of vulnerability has existed for many years, decades even, but given the prevalence and the number of vulnerable examples our security team found, we thought it right to give it a name, similar to the Zip Bomb vulnerability.

The beginning

Where better to start than the beginning! Earlier in 2018, as part of our continuous research we identified a vulnerable pattern in the way FTP clients handle folder downloads and found an exploitable implementation in Apache Hive. This shows how a previously known vulnerability affecting FTP clients back in 1990 (and earlier) still exists in many open source FTP client libraries in various languages. We then continued to find other similar patterns in other widely used open source libraries.

We decided to start with archive extraction. When extracting files from an archive, a possible attack vector exists which allows for directory traversal. The premise of a directory traversal vulnerability is that an attacker can gain access to parts of the file system outside of the target folder in which they should reside. The attacker can then overwrite executable files and either invoke them remotely or wait for the system or user to call them, thus achieving remote command execution on the victim’s machine. The vulnerability can also cause damage by overwriting configuration files or other sensitive resources, and can be exploited on both client (user) machines and servers.

Finding Exploitable Libraries

The team started looking through several code samples from popular open source libraries in the Java, JavaScript and Go ecosystems. After not much digging, and to our surprise, we found more than half of them were vulnerable. This means that over half of the initial libraries we examined did not sanitise or validate filenames existing in archives which, when exploited, can lead to an arbitrary file overwrite. The full list can be seen here.

It became apparent from early on that some languages are affected more than others by this vulnerability, while in others, it practically does not exist. Let’s look at a couple of different ecosystems to see how their different approaches to offering the archive management functionality affected the prevalence of the Zip Slip vulnerability.

Zip Slip in Java

Java’s runtime, for example, provides the Zip Classes as part of the core language, allowing developers to write code that reads from archives and extract files from Zip archives. Popular Java libraries, like Apache Commons-Compress, support other formats which makes the issue wider, but there isn’t a simple API that performs this in a single call, which leads to many different implementations around the ecosystem, the majority of which were vulnerable.

Zip Slip in Python

Python, on the other hand, provides the zipfile class courtesy of the core runtime (zipfile.extractall()), and is not vulnerable. This meant we consistently found secure implementation after secure implementation during our analysis of Python repositories.

It is worth noting that Python’s tarfile happens to still be affected. This is because the core runtime implementation is vulnerable. If a vulnerability exists in a language’s core runtime rather than a library, it can be adopted automatically when an application upgrades to the newer version of the language, without any application or dependency changes. This is often a more common occurrence than developers upgrading to a newer dependency version, unless of course you’re using a tool like Snyk that helps with this! (Sorry, I couldn’t help this shameless plug!)

Who else is Vulnerable?

It was soon apparent how severely affected some of the core languages and libraries were at this point, so we decided to focus our attention on the open source projects that were not libraries. We scanned the most popular open source Java projects that were hosted on GitHub and contained archive manipulation code.

We quickly noticed that the majority of implementations contained the vulnerability. This was most common for Java, for which we saw the same vulnerable implementations being reused. This suggested that these vulnerable snippets were being taken from other projects, documentation or communities.

StackOverflow: The Vulnerability Marketplace

Upon analysing the biggest developer resource available to a copy-paste-developer, it was soon clear that our hypothesis was true. We found many questions asking how to best extract files from archives programmatically. We also found that almost all the answers on StackOverflow are vulnerable. Most worryingly, all the vulnerable answers had more upvotes than you could shake a security-shaped-stick at. Here are some of our favourites, including a couple of comments which helped to brighten up our day:

“Utility to unzip an entire archive to a directory in java” We do have now 2011 and there isn’t even a (common) 3rd party library to extract a ZIP in Java with a single call? WTF
What is a good Java library to zip/unzip files? Fine, so make the library support both files and streams. It’s a waste of everybody’s time - mine, yours, the asker, and all the other Googlers who will stumble upon this that we each have to implement our own Zip Utilities. Just as there is DRY, there is DROP - Don’t Repeat Other People.
How to unzip files programmatically in Android?

Once we started the private disclosure, our security team continued to research and find further examples of this vulnerability. Here are some of the key stats to summarise what we found:

12 vulnerable libraries. Opened 5 fix PRs that were merged.
5 libraries without a high level API.
Thousands of apps with the vulnerable implementation (not necessarily exploitable)

Is it Wrong to be Excited?

The role of a security researcher is to find security vulnerabilities, to plug security holes and to help the software world be a safe place in which billions of individuals can have trust and faith in critical business transactions succeeding every second of every day. That doesn’t mean to say that when you find a critical security vulnerability it isn’t satisfying. The periodic eureka moments of any job drives the precision focus during the less exciting moments. Knowing that an interesting find or exposure could be right around the corner is motivation to spend days, weeks and months searching. Finding a vulnerability is similar to the elation a developer feels when they finally find the root cause of a complex bug. Or perhaps redesign an application to increase performance by 2x, or scalability by 3x.

When we discovered the first instance of the the Zip Slip vulnerability in a big project, it was very exciting. It was our eureka moment, but when we discovered that every other application had a vulnerable implementation, we were extremely surprised. It made us realise that this vulnerability wasn’t just affecting a few apps, but loads of projects across ecosystems.

Digging through all the data and understanding why particular languages are affected more, and how vulnerable code snippets spread through the open source projects was really exciting and enlightening. It was also a completely different research task (compared to vuln hunting). The real hard work started with the disclosure and helping projects fix their vulnerable code – finding one weak link is relatively easy, but making all the links stronger is hard.

The Disclosure

We followed the responsible disclosure process, choosing the 5th of June to be the public disclosure date. This gave project maintainers a 60 day window in which they could fix the issues. Since all the vulnerable projects we found were open source, the activity in the repository, such as commits and pull requests are also open and visible. Therefore, it was important to weigh up both providing a reasonable timeframe for fixing the issues, with risking a wide public exposure to the issue, that would put unfixed projects at risk.

Our first step was to reach out to the maintainers of the most widely adopted libraries that carried the vulnerability, to inform them of the issue. There were 10 in total that we sent communication to in this first wave of emails. Some were not even aware that the library they once wrote for themselves became one of the most popular extraction libraries. Out of the 10 libraries, 5 maintainers asked for help with fixing the issue, so we went ahead and opened fix pull requests (plexus archiver, mholt/archiver, adm-zip, unzipper and zip4j).

After communicating with the library maintainers, we moved on to the affected projects that implemented their own extraction logic, rather than use a central library. This code was likely implemented from scratch, or copy-pasted from documentation, StackOverflow or other OSS projects. All of these outcomes would likely result in vulnerable code being shared around. Once we created a list, we soon realised that there were too many to reach out to, without risking a public leak, either via project fixes or simply irresponsible individuals that would share the information without understanding the consequences. We initially focused on the major projects, starting with Apache, who had 15 projects with vulnerable implementations.

After a call with Mark J Cox, founding member for the Apache Software Foundation and the Apache security lead, we created a collaborative spreadsheet with all the possibly affected projects for them to triage internally and decide which were exploitable. Whether or not they were exploitable, it’s still good practice to fix the vulnerable implementation and almost all did, due to the risk of it being used in the future or being copy-pasted to other projects). Apache Hadoop, Storm, Hive,Maven and Ant were found to be affected, with public CVEs (CVE-2018-8008, CVE-2018-8009) being created at the time of public disclosure, as well as a Maven Zip Slip announcement on the Apache site. Mark was extremely cooperative to work with and helped us uncover, fix and communicate the issue down to each of the affected Apache projects.

We also reached out to Pivotal (zip integration, which was fixed within two days of disclosure!), Oracle, Google, OWASP dependency checker (fixed within a day of disclosure!), and many others.

Lastly, we wanted to privately disclose to all the other affected projects. Since there were too many projects for just one team to test and triage manually, we chose to automate the process by informing the maintainers about the possible risk and to check for themselves whether their project is actually exploitable. Of course, this larger scale bulk disclosure does bring additional risk of public awareness with it. One of the non-vulnerable projects turned to Twitter, which required additional conversations in private to get removed.

Beyond Public Disclosure

When the issue became public, we published an informative page, including vulnerable examples, suggested fixes, etc. We created a collaborative GitHub repository allowing the community to be informed about which libraries and projects are affected, and additionally contribute to it by adding new libraries and projects that we didn’t cover. During the last month there were several libraries and projects added, including closure, sharpziplib, quazip, etc.

If you would like to know more about the Zip Slip disclosure or would like further advice on a disclosure, or would like Snyk to help with a private disclosure, reach out to security@snyk.io and we’d me more than happy to help make open source safer – one commit at a time.

Live Hack: Exploiting AI-Generated Code

Gain insights into best practices for utilizing generative AI coding tools securely in our upcoming live hacking session.