You can’t compare SAST tools using only lists, test suites, and benchmarks
Asaf Biton
Shani Gal
June 16, 2021
There are a lot of challenges you might face when trying to identify the best SAST tool for your team. How do you measure something that is meant to find unknowns? How do you know if the tool is appropriate for your needs? How do you compare different tools? It’s no wonder that we often get asked, “Does Snyk Code have coverage for the OWASP Top 10?” followed by “How do you suggest we evaluate and compare different SAST tools?”
We all want simple answers, so the “measuring stick” we see most often in SAST comparisons is a paper or list of the most common vulnerabilities one should protect against (OWASP Top 10, SANS-25, etc). It might seem straightforward to compare the outcomes of two SAST tools using these lists, but the answer is not that simple. In this blog we will explore the limitations of using these measuring sticks.
Before we dive into each standard, let’s establish some terminology.
OWASP Top 10 is a list of the top ten risks a developer should be aware of when building a web application. It is published by The OWASP® Foundation and its last revision is from 2017.
SANS-25 is a list of the 25 most dangerous software error types. It is published by the SANS Institute and its last revision is from 2011.
CWE Top 25 is another list, very similar to SANS-25, albeit more frequently updated. It is published by the CWE Team and its last revision is from 2020.
OWASP Benchmark is an open source test suite specifically designed to test SAST tools. It only tests Java and is actively maintained, although its last major version was released in 2016.
Intentionally vulnerable apps are repositories or projects that aim to educate and provide examples of vulnerabilities. They might also follow one of the various standards. These projects were not created with SAST tools in mind. Some examples are OWASP/NodeGoat, appsecco/dvna, WebGoat, and juice-shop.
Let’s get started!
Shortcomings of vulnerability lists for measuring SAST tools
There are a few reasons why different vulnerability lists are not suitable for measuring SAST tools or prioritising SAST issues:
Limited in scope: Some lists are limited to a specific scope. For example, the OWASP Top 10 only relates to web application security.
Too generic: On the other hand, SANS-25 and CWE Top 25 do not discriminate between environments or languages, so they may include a lot of vulnerabilities (or CWEs) that are not relevant for every language. For example, CWE-416: Use After Free is only relevant for low-level languages like C, C++, Rust, and so on.
Potentially outdated: Often, these lists are not updated on a regular basis. OWASP Top 10 was last updated in 2017, and SANS-25 was last updated in 2011. This means that the lists do not necessarily represent the current state of application security. One notable example is the recent rise of supply-chain and typosquatting attacks: neither list mentions these issues.
Irrelevant for SAST: Some issues are not necessarily interesting in the context of SAST (although this is not to say that they are not important in general). For example, OWASP Top 10 mentions Insufficient Logging & Monitoring, which in itself is not a vulnerability at all.
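To illustrate the “too generic” point, here is a minimal Python sketch of what happens when you score a tool against a language-agnostic list. The four entries are real CWE names, but the language tags attached to them, and the `relevant_entries` and `coverage` helpers, are assumptions made up for this example; they are not an official CWE mapping or any vendor's scoring method.

```python
# Illustrative data only: the language tags are assumptions for this sketch,
# not an official CWE-to-language mapping.
LIST_ENTRIES = {
    "CWE-79 Cross-site Scripting": {"javascript", "java", "python"},
    "CWE-89 SQL Injection": {"javascript", "java", "python", "c"},
    "CWE-416 Use After Free": {"c", "cpp"},
    "CWE-787 Out-of-bounds Write": {"c", "cpp"},
}


def relevant_entries(language):
    """Return the list entries tagged as applicable to the given language."""
    return sorted(
        name for name, langs in LIST_ENTRIES.items() if language in langs
    )


def coverage(tool_rules, language):
    """Return (covered, relevant) counts for one tool and one language."""
    relevant = relevant_entries(language)
    covered = [entry for entry in relevant if entry in tool_rules]
    return len(covered), len(relevant)


if __name__ == "__main__":
    # Hypothetical tool that only ships XSS and SQL injection rules.
    rules = {"CWE-79 Cross-site Scripting", "CWE-89 SQL Injection"}
    for lang in ("javascript", "c"):
        covered, relevant = coverage(rules, lang)
        print(f"{lang}: {covered} of {relevant} relevant entries covered")
```

With this made-up data, the same rule set covers 2 of 2 relevant entries for a JavaScript team but only 1 of 3 for a C team, while a naive count against the whole list would report 2 of 4 for both. The single list-wide number hides exactly the difference a team cares about.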
Shortcomings of test suites and intentionally-vulnerable apps for measuring SAST tools
Test suites like OWASP Benchmark and vulnerable repositories also come with their own limitations:
Limited in languages: No single test suite or intentionally vulnerable app covers multiple languages. OWASP Benchmark, for example, only contains Java issues.
Overfitting: Having a “market standard” set of test suites or intentionally vulnerable apps means that companies are able to base their SAST capabilities around those specific issues. This will then result in those products performing exceptionally well in those benchmarks. Unfortunately, this does not necessarily translate into real-world accuracy and depth.
Semantically holistic: The examples included in benchmarks and vulnerable apps are often not representative of real-world applications, where data flow is often more complex.
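To make the last point concrete, here is a minimal Python sketch contrasting a benchmark-style test case with a flow that is closer to real-world code. Everything in it is hypothetical and invented for illustration: the in-memory SQLite table, the function names, and the `UserFilter` class are not taken from OWASP Benchmark or from any of the apps mentioned above.

```python
import sqlite3
from dataclasses import dataclass


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database (illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


# Benchmark-style case: the untrusted value and the SQL sink sit in the
# same function, so even simple pattern matching can flag the injection.
def find_user_simple(conn, name):
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


# Closer to real-world code: the untrusted value travels through an object
# field, a method call, and a dict before it reaches the sink.
@dataclass
class UserFilter:
    raw: str

    def clause(self) -> str:
        return "name = '" + self.raw + "'"


def build_options(user_input):
    return {"filter": UserFilter(user_input)}


def find_user_indirect(conn, options):
    where = options["filter"].clause()
    return conn.execute("SELECT name FROM users WHERE " + where).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "x' OR '1'='1"  # classic injection string
    print(find_user_simple(conn, payload))
    print(find_user_indirect(conn, build_options(payload)))
```

Both functions contain the same SQL injection, and the payload returns every row in either case. Detecting the second one, however, requires tracking data across a field, a method call, and a container. A tool tuned to the first shape can score well on a test suite and still miss the second.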
So... how do you compare SAST tools?
We explored the various measuring sticks out there, and we looked at why they are not ideal for assessing SAST tools. It’s important to note that this does not mean that such lists, test suites, and benchmarks are useless. Most of them were created with the purpose of educating developers and raising awareness of common security issues. While not great for measuring a SAST tool, we still believe they can play a big role in improving security expertise in your organization.
Now you might be wondering, if the above standards are proven subpar, what is the way to measure SAST tools? Check out our follow-up blog to learn 3 parameters to measure SAST testing.