Why do organizations trust Snyk to win the open source security battle?

By Benji Catabi-Kalman

Defining and explaining the role of a proprietary security team dedicated to researching and analyzing vulnerabilities in open source ecosystems, with the goal of keeping open source secure, is not an easy task. It’s hard to give a concise answer to the relatively simple question, “What does the security team at Snyk do?” There is no short way to explain exactly what we get up to and how we specialize in our field.

The problem, I feel, stems from the fact that “Researcher” and “Analyst” in general, and “Security Analyst” or “Security Researcher” in particular, must be some of the most abused and overused job titles since “Consultant”. They can mean literally anything, and every company seems to have an entirely different definition of what its Security Research Team does (when it even bothers to have one).

I will probably never fully solve the problem of explaining what I do to people I meet outside of work. But this blog post is at least a stab at it, spelling out once and for all what the wonderful Security Research Team here at Snyk does.

Here’s what we’ll discuss in this article:

  1. Our mission
  2. Our methods
  3. Summary

Our mission

Snyk’s mission is to make the open source world more secure and to empower developers to take an active role in securing their codebases. One of the main ways we do this is by scanning our users’ managed open source libraries and container images for vulnerabilities, giving them the most accurate and actionable information possible about each vulnerability found, and offering them a way to remediate it.

Within the company, the Security Research Team’s role is to gather and cultivate the Snyk Intel Vulnerability Database, which powers our scans and gives users the information they need to fix vulnerabilities before they become security threats.

Building and maintaining a top-tier security database is hard work: we have to focus not only on the accuracy of the data but also on the breadth and depth of the database. The easiest way to explain how we do this is to walk you through how we source, verify, and triage vulnerabilities for the database.

Our methods

We employ several methods to ensure ongoing coverage of open source security issues:

1. Structured community ecosystem databases

The most obvious sources of vulnerabilities in open source ecosystems are community-powered databases such as RubySec, FriendsOfPHP, RustSec, and many others. These databases are curated by open source developers who work inside each ecosystem and try to provide as much visibility as possible into the security issues prevalent in it.

The Snyk Security Research Team actively tracks these community-powered databases to keep up to date with community disclosures of vulnerabilities. However, despite the sterling work the community does in informing ecosystem users about these vulnerabilities, there is often still necessary work to be done.

Every vulnerability disclosed is further verified and analyzed by the Security Research Team in order to: 

  1. verify this is actually a vulnerability: this includes investigating the disclosed vulnerability and, if necessary, creating proofs of concept (PoCs) to verify that it is in fact exploitable.
  2. define a complete and accurate description of the vulnerability, including its severity and CVSS score: by digging into the vulnerability, we can more fully understand its likely impacts and attack vectors, as well as how the vulnerable package is likely used inside the ecosystem. Combining these insights, we write a description and severity assessment that help developers understand the vulnerability and how it could affect their codebase.
  3. verify that important metadata, such as fixed versions and affected packages, is complete and accurate: we dig into the package’s codebase to identify the exact code changes that introduced the vulnerability and those that fixed it. We then cross-reference this information with the precise package names and versions to make sure we mark all of the vulnerable packages and versions, and only those.
  4. identify the vulnerable functions or classes inside a package: during triage, we try to pinpoint the exact vulnerable function (or functions) inside the package. This metadata lets developers determine whether their specific use of the package is vulnerable.
  5. monitor and triage in-the-wild exploits of the vulnerability going forward: even after a vulnerability is published, we need to keep tracking whether further information about it is disclosed, especially if a mature exploit is published. We monitor a variety of sources to identify exploits as they appear and triage them to assess their maturity, so that developers know how to prioritize fixing the affected vulnerabilities.
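To make items 3 and 4 a little more concrete, here is a minimal Python sketch of how an advisory’s affected version ranges and vulnerable functions might be checked against a project. It is an illustration under stated assumptions, not Snyk’s actual schema or matching logic: the `Advisory` record and its fields, the package `example-lib`, and the function `example_lib.parse_header` are all hypothetical, and versions are assumed to be plain dot-separated integers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def parse_version(version: str) -> tuple[int, ...]:
    """Parse '1.4.0' into (1, 4). Assumes plain dot-separated integers.

    Trailing zeros are dropped so that '1.4' and '1.4.0' compare as equal.
    Real ecosystems (Maven qualifiers, PEP 440, semver pre-releases) need
    ecosystem-specific parsing instead.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass
class Advisory:
    """Hypothetical advisory record; field names are illustrative only."""

    package: str
    # Half-open ranges [introduced, fixed): the first version containing the
    # bad commit, and the first version containing the fix (None = unfixed).
    ranges: list[tuple[str, str | None]]
    vulnerable_functions: list[str] = field(default_factory=list)

    def affects(self, package: str, version: str) -> bool:
        """True if this exact package name and version fall in an affected range."""
        if package != self.package:
            return False
        v = parse_version(version)
        return any(
            parse_version(introduced) <= v
            and (fixed is None or v < parse_version(fixed))
            for introduced, fixed in self.ranges
        )

    def reachable(self, called_functions: set[str]) -> bool:
        """True if the project calls any function flagged as vulnerable."""
        return any(fn in called_functions for fn in self.vulnerable_functions)


advisory = Advisory(
    package="example-lib",  # hypothetical package
    ranges=[("1.0.0", "1.4.2"), ("2.0.0", "2.1.1")],
    vulnerable_functions=["example_lib.parse_header"],  # hypothetical function
)
print(advisory.affects("example-lib", "1.4.1"))    # True
print(advisory.affects("example-lib", "1.4.2"))    # False (first fixed version)
print(advisory.affects("example-lib", "1.10.0"))   # False (between ranges)
print(advisory.reachable({"example_lib.render"}))  # False
```

Recording the first fixed version, rather than enumerating every vulnerable release, keeps ranges open-ended for issues that have no fix yet, and the `reachable` check is the kind of question that function-level metadata lets a developer answer.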

2. Unstructured databases and advisories

It’s helpful that structured databases exist inside ecosystems; however, a wealth of information on new vulnerabilities also lives in unstructured databases and public advisories, which makes it critical to track and triage these sources as well.

The most obvious examples are the CVE and NVD databases, which log many vulnerabilities in an unstructured, non-machine-readable format. Beyond CVE/NVD, there are countless individual product advisories and mailing lists, such as the Apache mailing lists, Node.js update blogs, the Jenkins Security Advisories, and many more, all of which require our attention.

All of the work described in the previous section applies here as well, but the team also has to curate the data into a machine- (and human-) readable format. For example, NVD or an Apache mailing list might report that a vulnerability affects, say, “Apache Tomcat”, but what developers actually need to know is whether the specific package they use, out of the 958 different “Apache Tomcat”-related packages currently available in Maven, is vulnerable.

Through in-depth research into the vulnerable codebase, the team identifies the characteristics of the vulnerable code. We then use our internal tooling to scan the packages likely to be associated with the vulnerability and check whether the vulnerable code is present in them. Only after verifying that the vulnerable code exists and is relevant do we assign the vulnerability to a specific package. This greatly reduces the number of false positives in our database which, in turn, lets developers using Snyk concentrate on fixing vulnerabilities rather than dealing with white noise.
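The package-matching step can be sketched roughly as follows: narrow a vague product name down to candidate packages, then keep only the candidates whose source actually contains the vulnerable code. This is a simplified illustration; Snyk’s internal tooling is not public, so the regex “signature”, the hypothetical `find_affected_packages` helper, and the in-memory package sources below are all stand-ins.

```python
from __future__ import annotations

import re


def find_affected_packages(
    candidates: dict[str, str], name_hint: str, signature: str
) -> list[str]:
    """Return packages whose name matches the advisory's product name AND whose
    source contains the vulnerable-code signature (a regex here, for simplicity)."""
    pattern = re.compile(signature)
    affected = []
    for name, source in sorted(candidates.items()):
        if name_hint.lower() not in name.lower():
            continue  # not plausibly the product named in the advisory
        if pattern.search(source):
            affected.append(name)  # vulnerable code is actually present
    return affected


# Hypothetical package coordinates and source snippets, for illustration only.
candidates = {
    "example:tomcat-http": "class Parser { void parseHeader(buf) { copy(buf); } }",
    "example:tomcat-jdbc": "class Pool { void borrow() {} }",
    "example:other-http": "class Parser { void parseHeader(buf) {} }",
}
print(find_affected_packages(candidates, "tomcat", r"parseHeader\(\s*buf\s*\)"))
# ['example:tomcat-http']
```

Requiring both a name match and a code match is what filters out related packages that merely share a product name: `example:tomcat-jdbc` is related by name but lacks the vulnerable code, so it is not flagged.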

3. Unearthing unpublished vulnerabilities

Up to this point, I’ve mainly described how the team triages known vulnerabilities and makes them more actionable for developers, which is critical in and of itself. However, the team also strengthens security across open source ecosystems by identifying vulnerabilities that have not yet been disclosed.

Together with our Data Team, we have developed several machine learning algorithms to help uncover what we like to call “half-day” vulnerabilities: vulnerabilities that may already be discussed in various public forums but have not yet been officially disclosed.

These vulnerabilities are often the most dangerous. The period between when a vulnerability is first discussed and when it is officially acknowledged, so that the general public can take steps to remediate it, is a window in which malicious actors can take advantage of their early knowledge.

We help close this gap by sourcing from places where code fixes, bug reports, and potential vulnerabilities are often discussed in their preliminary stages, such as pull requests and issues in source control, JIRA tickets, and sites like Reddit or Stack Overflow. This lets us discover newly found vulnerabilities and alert users about them very early in their life cycle.

Our pre-existing, hand-cultivated database of vulnerable code snippets, bug reports, and vulnerability descriptions gives us a wide range of data on which to base our machine learning models. As a result, our systems can process thousands of events across all packages in our supported ecosystems and alert the team to potential vulnerabilities. All told, we believe we have built the widest-reaching alert feed currently available.
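As a rough illustration of how labeled historical data can drive such alerts, here is a tiny multinomial naive Bayes classifier with add-one smoothing that scores issue or pull request titles. The post does not say which models Snyk actually uses, so treat this as a generic stand-in: the hypothetical `NaiveBayesAlerter` class, the six training titles, and the zero log-odds threshold are all invented for the example.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesAlerter:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Label 1 = looks security-relevant, label 0 = ordinary change.
    """

    def __init__(self, examples: list[tuple[str, int]]) -> None:
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter()
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.totals = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}

    def _log_score(self, tokens: list[str], label: int) -> float:
        # log P(label) + sum of log P(token | label); smoothing keeps unseen tokens from zeroing the score
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        denom = self.totals[label] + len(self.vocab)
        for tok in tokens:
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def log_odds(self, text: str) -> float:
        """Positive values favour 'security-relevant'."""
        tokens = tokenize(text)
        return self._log_score(tokens, 1) - self._log_score(tokens, 0)

    def should_alert(self, text: str, threshold: float = 0.0) -> bool:
        return self.log_odds(text) > threshold


# Hypothetical hand-labeled titles standing in for a curated history of reports.
training = [
    ("fix prototype pollution in merge", 1),
    ("sanitize path to prevent directory traversal", 1),
    ("command injection via unescaped argument", 1),
    ("update readme typo", 0),
    ("add dark mode to docs site", 0),
    ("bump dependency versions", 0),
]
alerter = NaiveBayesAlerter(training)
print(alerter.should_alert("possible command injection in exec helper"))  # True
print(alerter.should_alert("update docs site"))  # False
```

A real system would train on far more data and richer features (code diffs, not just titles); the point here is only that a curated history of labeled reports is what lets an automated scorer decide which new events deserve an analyst’s attention.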

These alerts feed into the team’s early warning system, where they are triaged. An analyst reviews the information in each alert and, if necessary, verifies the vulnerability with a PoC; we then reach out to the package maintainers to get their insight into the potential vulnerability.

After discussing it with the maintainers and verifying it, we publish the vulnerability in our database so that the entire ecosystem is aware of the security issue and can remediate it. In the last year alone, we have helped disclose approximately 200 vulnerabilities this way, including the Command Injection in Vizion and the Timing Attack affecting the popular Escada and Elliptic packages.

4. Community and academic disclosures

Responsible disclosure is a model commonly used in cybersecurity in which 0-day vulnerabilities are first disclosed privately, giving code and application maintainers enough time to issue a fix or patch before the vulnerability is made public. Disclosing publicly right away would sacrifice the security of end users. As always, balance is key: the aim is to minimize both the time the vulnerability is kept private and the time the application remains vulnerable without a fix.
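The balance described above can be expressed as simple date arithmetic: go public a short grace period after a fix ships, or at a hard deadline if no timely fix arrives, whichever comes first. This is an illustrative sketch only; the post does not give Snyk’s actual timeline, so the 90-day cap, the 7-day grace period, and the hypothetical `public_disclosure_date` helper are assumptions.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical policy values, not Snyk's published timeline.
MAX_EMBARGO = timedelta(days=90)  # longest a report stays private without a fix
FIX_GRACE = timedelta(days=7)     # time for users to upgrade before details go public


def public_disclosure_date(reported: date, fix_released: date | None) -> date:
    """Earliest of (fix release + grace period) and (report date + embargo cap)."""
    deadline = reported + MAX_EMBARGO
    if fix_released is None:
        return deadline  # no fix yet: don't keep users in the dark indefinitely
    return min(fix_released + FIX_GRACE, deadline)


print(public_disclosure_date(date(2020, 1, 1), date(2020, 1, 20)))  # 2020-01-27
print(public_disclosure_date(date(2020, 1, 1), None))               # 2020-03-31
```

The cap limits how long a report can stay private, while the grace period gives users a head start on upgrading before the details are published.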

Snyk launched its vulnerability disclosure program in 2019 to bridge the gap between the researchers who find vulnerabilities and the maintainers who fix them, giving researchers an easy way to report vulnerabilities while, of course, fully crediting them for their discovery.

Our security team carefully triages every vulnerability report. This requires specific knowledge and understanding of the language at hand, the package, and its context. Once the vulnerability details are verified, the team works hand-in-hand with the maintainers to get the vulnerability fixed in a timely manner.

Finally, as a CVE Numbering Authority (CNA), we help assign the issue a CVE ID and publish a detailed advisory. In 2019, we helped disclose over 130 vulnerabilities, including a remote code execution (RCE) vulnerability in mongo-express and an arbitrary file write in yarn.

We work with independent researchers, security personnel, and the academic community to disclose vulnerabilities. Individual researchers use our disclosure program to report one or sometimes several vulnerabilities affecting packages. Academic researchers who discover new vulnerability types, on the other hand, reach out to Snyk to use our disclosure program for mass disclosures of multiple vulnerabilities across the ecosystem. We kicked off 2020 with a big partnership with the Johns Hopkins University Security Lab team, helping them disclose over 60 vulnerabilities (and counting). Notable examples include CVE-2019-10795 in undefsafe and CVE-2019-10777 in aws-lambda.

5. Proprietary research and vulnerability trend analysis

The final piece of the security puzzle is our internal proprietary research. At Snyk, we see discovering and responsibly disclosing new vulnerabilities as key to keeping the open source ecosystem safe, and we use our database of existing vulnerabilities to do so. By looking at vulnerability trends within a specific language or across multiple languages, we identify the key characteristics of vulnerabilities that are likely to be prevalent in the ecosystem.

In other words, if we see a certain vulnerability popping up in several different packages across the ecosystem, all with similar attack vectors, we know that additional instances not yet found probably exist. Moreover, malicious actors are likely aware of this possibility too and looking to use it to their advantage.
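A very simplified version of this trend analysis is to count, per vulnerability type, how many distinct packages had that type published recently, and flag types that cross a threshold. The record layout, the package names, the hypothetical `trending_types` helper, and the `min_packages` threshold below are all assumptions; this sketches the idea, not Snyk’s analysis pipeline.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date


def trending_types(
    records: list[tuple[str, str, date]], since: date, min_packages: int = 3
) -> list[str]:
    """Return vulnerability types seen in at least `min_packages` distinct packages
    since `since`, most widespread first. Each record is (type, package, published)."""
    packages_by_type: dict[str, set[str]] = defaultdict(set)
    for vuln_type, package, published in records:
        if published >= since:
            packages_by_type[vuln_type].add(package)  # a set counts each package once
    flagged = [t for t, pkgs in packages_by_type.items() if len(pkgs) >= min_packages]
    return sorted(flagged, key=lambda t: (-len(packages_by_type[t]), t))


# Hypothetical records for illustration.
records = [
    ("Prototype Pollution", "pkg-a", date(2020, 2, 1)),
    ("Prototype Pollution", "pkg-b", date(2020, 2, 9)),
    ("Prototype Pollution", "pkg-b", date(2020, 3, 2)),  # same package again
    ("Prototype Pollution", "pkg-c", date(2020, 3, 15)),
    ("ReDoS", "pkg-a", date(2020, 1, 20)),
    ("ReDoS", "pkg-d", date(2020, 2, 11)),
    ("Zip Slip", "pkg-e", date(2018, 6, 5)),
    ("Zip Slip", "pkg-f", date(2018, 6, 5)),
    ("Zip Slip", "pkg-g", date(2018, 6, 5)),
]
print(trending_types(records, since=date(2020, 1, 1)))  # ['Prototype Pollution']
```

Counting distinct packages rather than raw advisories keeps one noisy package from looking like an ecosystem-wide trend.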

We therefore try to use our research capabilities judiciously and efficiently, either to uncover large-scale new vulnerability types, such as Zip Slip, or to discover new instances of trending vulnerability types in popular packages. To do this research, we combine big data spelunking, which identifies vulnerable snippet types, code patterns, and other characteristics shared by potentially vulnerable packages, with research into exploit vectors that our internal tooling can then run against each potentially vulnerable package to determine whether it is indeed vulnerable (see proprietary vulnerabilities).
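Zip Slip, for example, is a path traversal during archive extraction: an entry named something like `../../evil.sh` gets written outside the intended directory. The standard defense is to resolve each entry’s target path and reject it if it escapes the destination. The sketch below shows that check in Python; the helper names `is_safe_entry` and `checked_extract` are hypothetical. Note that Python’s own `ZipFile.extractall` already strips `..` components and leading slashes from member names, so here the explicit check mainly illustrates the pattern (and adds defense in depth); it matters most in libraries that do not sanitize names for you.

```python
import os
import zipfile


def is_safe_entry(dest_dir: str, entry_name: str) -> bool:
    """True if extracting `entry_name` under `dest_dir` stays inside `dest_dir`."""
    dest_root = os.path.realpath(dest_dir)
    # realpath collapses ".." segments; os.path.join lets an absolute entry name replace dest_root.
    target = os.path.realpath(os.path.join(dest_root, entry_name))
    return os.path.commonpath([dest_root, target]) == dest_root


def checked_extract(archive_path: str, dest_dir: str) -> None:
    """Extract an archive only if no entry would escape `dest_dir` (Zip Slip)."""
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.infolist():
            if not is_safe_entry(dest_dir, member.filename):
                raise ValueError(f"path traversal entry rejected: {member.filename!r}")
        archive.extractall(dest_dir)


print(is_safe_entry("/tmp/out", "docs/readme.txt"))  # True
print(is_safe_entry("/tmp/out", "../../evil.sh"))    # False
```

Validating every entry before extracting anything avoids leaving a half-extracted archive behind when a malicious entry is found.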

Once we uncover a vulnerability, we follow our responsible disclosure timeline and methodology: we disclose it to the affected maintainers and help work on a fix before publishing it in our database and assigning it a CVE.

Summary

The result of all this hard work is that the security product we provide achieves four critical goals:

  1. Timeliness: the information our users get on vulnerabilities and exploits arrives in time for them to act on it, often warning them about vulnerabilities well before they become widely known and malicious actors can exploit them.
  2. Completeness: users can rest easy knowing they are getting the complete picture of their vulnerabilities, without worrying that additional, potentially exploitable vulnerabilities are lurking in their codebase.
  3. Accuracy: the data is accurate enough to remove the dreaded white noise and false positives from a security scan.
  4. Actionability: the data lets developers quickly triage the vulnerabilities that affect them and prioritize their work accordingly.

Together, these four goals allow a Snyk user to know that, despite the never-ending jungle of vulnerabilities inside open source ecosystems, they have a dedicated team of security experts on their side, waiting and ready to spring into action to give them all the information they need to keep themselves (and the rest of open source) safe.