Snyk at RSAC 2021 — ML in SAST: Distraction or Disruption

Machine learning is a loaded term. While it offers amazing potential for advancing technologies, it is often used as a marketing buzzword for glorified pattern recognition. That makes it increasingly difficult to know whether applying machine learning to an existing technology is going to break new ground or just sell more licenses. That’s the problem that Frank Fischer, Product Marketing for Snyk Code, explores in his RSAC 2021 talk ML in SAST: Distraction or Disruption.

Static Application Security Testing (SAST) is a technology that finds security vulnerabilities in non-running source code. The goal of SAST is to find vulnerabilities as quickly and as early in the software development lifecycle (SDLC) as possible. Because SAST takes source code as its input, it is a natural place to apply symbolic AI: SAST tools run rules against code bases to see where problems exist.

Lately, we’ve seen a lot of hype around GitHub Copilot. This solution is built on a machine learning model, yet it often seems to understand the underlying code. As the examples shared on social media show, sometimes it provides amazing suggestions, and sometimes it acts like a parrot that repeats sentences it has heard before. It shows what is possible today and where the current limitations are, and it reinforces the whole point of the RSAC talk!

While SAST has been around for a long time, it hasn’t had much in the way of disruption, only incremental improvement. We’ve seen steady improvements in accuracy, but improvements have come at the cost of computational intensity. This means scans that are both sound (no false positives) and complete (no missed vulnerabilities) can take hours or days to finalize. On top of that, SAST tools are often very good at catching problems, but very bad at providing actionable solutions. So the question is, can machine learning create the disruption that SAST needs to be fast, accurate, and actionable?
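To make the two accuracy terms concrete, here is a minimal sketch that compares what a scanner reported against what is really in the code. The function names and the sample findings are made up for illustration. In the terms used above, a sound scan has an empty false-positive set (precision of 1.0) and a complete scan has an empty missed set (recall of 1.0).

```python
from typing import Set, Tuple

Location = Tuple[str, int]  # (file name, line number), a hypothetical finding key


def false_positives(reported: Set[Location], actual: Set[Location]) -> Set[Location]:
    """Findings the scanner reported that are not real vulnerabilities."""
    return reported - actual


def missed(reported: Set[Location], actual: Set[Location]) -> Set[Location]:
    """Real vulnerabilities the scanner did not report."""
    return actual - reported


def precision_recall(reported: Set[Location], actual: Set[Location]) -> Tuple[float, float]:
    """Precision is the share of reports that are real; recall is the share of real issues found."""
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall


# Made-up example data, not results from any real scan.
reported = {("app.py", 10), ("app.py", 42), ("db.py", 7)}
actual = {("app.py", 10), ("db.py", 7), ("db.py", 99)}
```

With this sample data, one report is a false positive and one real vulnerability is missed, so the scan is neither sound nor complete.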

Before answering that, though, it’s important to look at how SAST tools work. To find vulnerabilities, SAST tools need a knowledge base full of rules and methods for detecting insecure patterns. This knowledge base needs to be maintained by experts who research vulnerabilities and add new rules as symbolic AI. Knowledge base management is not something that can be automated yet, but it is something that can be improved with machine learning.
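As a rough illustration of what a rule-driven scan looks like, the sketch below uses Python’s standard `ast` module to walk a parsed source file and match function calls against a tiny, hand-written knowledge base. The rule IDs, messages, and the `scan` helper are assumptions invented for this example. They are not how Snyk Code or any particular SAST product works, and real tools also track data flow rather than just call names.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    rule_id: str
    line: int
    message: str


# Hypothetical knowledge base: rule id -> (call name to flag, message).
RULES = {
    "PY001": ("eval", "eval() on untrusted input can execute arbitrary code"),
    "PY002": ("exec", "exec() on untrusted input can execute arbitrary code"),
    "PY003": ("os.system", "os.system() with untrusted input allows command injection"),
}


def call_name(node: ast.Call) -> str:
    """Return a dotted name such as 'os.system' for a call, or '' if it is not a simple name."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def scan(source: str) -> List[Finding]:
    """Parse the source without running it and report every call that matches a rule."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            for rule_id, (target, message) in RULES.items():
                if name == target:
                    findings.append(Finding(rule_id, node.lineno, message))
    return sorted(findings, key=lambda f: f.line)
```

Every new vulnerability class means another hand-written entry in `RULES`, which is the maintenance burden the experts carry today.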

By applying machine learning to huge numbers of source code repositories (all available through open source), the experts who maintain SAST knowledge bases can greatly increase the number of vulnerabilities they can detect. By training on billions of lines of code, soundness and completeness can take massive leaps. This sort of leap is a disruption.

But catching more vulnerabilities seems like an improvement, not a disruption. The true disruption would be in the application of machine learning to automate code fixes. Machine learning can’t stop at detection. It also needs to be applied to fixing the vulnerabilities it finds. When properly applied to SAST, machine learning has the ability to not just detect wrong patterns, but also know the correct patterns to apply for resolution. This is the disruption.
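The idea of learning the correct pattern alongside the wrong one can be sketched in a few lines. The toy example below assumes we have already mined pairs of (vulnerable call, replacement call) from fix commits in open source repositories, and it simply picks the replacement seen most often. The data, the function names, and the string-based rewrite are all made up for illustration. A real system would learn from far richer program representations than call names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def learn_fixes(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each vulnerable call to the replacement observed most often in fix commits."""
    counts = defaultdict(Counter)
    for before, after in pairs:
        counts[before][after] += 1
    return {before: c.most_common(1)[0][0] for before, c in counts.items()}


def suggest_fix(line: str, fixes: Dict[str, str]) -> Optional[str]:
    """Return the line with a known vulnerable call replaced, or None if nothing matches."""
    for before, after in fixes.items():
        if before + "(" in line:
            return line.replace(before + "(", after + "(")
    return None


# Made-up observations standing in for mined commit history.
OBSERVED = [
    ("yaml.load", "yaml.safe_load"),
    ("yaml.load", "yaml.safe_load"),
    ("yaml.load", "yaml.full_load"),
    ("hashlib.md5", "hashlib.sha256"),
]
```

The point is the shape of the output: a finding that arrives with a concrete suggestion attached, rather than a warning the developer has to research.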

The final disruption that machine learning can offer is the ability to improve accuracy and actionability at scale. Every day, we have more computational power (e.g., GPUs), better algorithms, and more pre-trained models (e.g., GPT-3 or GitHub Copilot) to help machine learning perform at the scale needed to bring scan time down to minutes.

Based on all that, Frank comes to the conclusion that machine learning will in fact be a disruption in the field of SAST, providing improved speed, accuracy, and actionability. But even with the ability to disrupt, it can still be a distraction when implemented incompletely. When researching your next SAST tool, there are a few things you’ll need to consider:

  • Is some level of machine learning being applied in your SAST already? If not, where is it in the roadmap? Machine learning needs to be used to both find and fix vulnerabilities, and if it’s not on the roadmap for both, the SAST will fall short.
  • Does your SAST offer actionable solutions already? And does it offer them directly in the tools developers use? If not, applying machine learning will only catch more vulnerabilities without offering solutions, which means more work for developers and lower rates of adoption.
  • Does your SAST integrate directly into an automated pipeline? If not, there will still be a bottleneck, no matter how robust the machine learning implementation is. As an added note on this, in Snyk’s recent State of Cloud Native Application Security report, we found that 72% of fully automated teams fixed critical vulnerabilities in under a week. This number can be even higher with machine learning.
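For the pipeline question, the integration point is usually a step that turns scan results into a pass or fail signal. Here is a minimal sketch of such a gate. The report format (a list of dictionaries with a `severity` key), the severity names, and the `exit_code` helper are assumptions for this example and do not reflect the output format of Snyk or any other tool.

```python
from typing import Dict, List

# Hypothetical severity scale, ordered from least to most serious.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def exit_code(findings: List[Dict[str, str]], fail_on: str = "high") -> int:
    """Return 1 if any finding is at or above the threshold severity, otherwise 0."""
    threshold = SEVERITY_ORDER.index(fail_on)
    blocking = [
        f for f in findings
        if SEVERITY_ORDER.index(f["severity"]) >= threshold
    ]
    return 1 if blocking else 0
```

A CI job could pass this value to `sys.exit()` so that a build with blocking findings stops before it ships.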

Watch the entire RSAC 2021 talk to learn more about machine learning in SAST. If you’re looking for a SAST tool that already offers speed, accuracy, and actionability, sign up and start using Snyk Code free.
