How Snyk uses AI in developer security
Frank Fischer
April 4, 2023
As a leader in applying AI to developer security, Snyk takes a unique approach. Today, we want to provide a glimpse at how Snyk currently uses AI and data science, as well as a sneak peek at what’s to come.
Before diving in, we want to highlight two aspects of Snyk’s use of AI to set the stage:
Snyk combines the two major schools of artificial intelligence, machine learning (ML) and symbolic AI, with human intelligence to form a hybrid AI.
So far, Snyk has used AI predominantly behind the scenes, focused on providing and perfecting service quality and accuracy in our security research. While Snyk continues to innovate with AI in our research, we will be adding new capabilities that are more noticeable in the user interface.
Snyk’s addition to AI: Hybrid AI
Let us talk a bit more about the first highlighted point above, hybrid AI. We use the term hybrid AI because we are combining multiple AI models with human intelligence.
There are two prominent schools of AI: ML and symbolic AI.
ML, or machine learning, is good in ambiguous situations but needs an abundance of training data. It’s called a “black box” because it cannot argue the correctness of its findings. Generative ML techniques like GPT are in this category.
Symbolic AI, on the other hand, is good for tasks with a clear structure, does not need training data and can argue about its findings (transparent instead of black box). But symbolic AI is weak in ambiguous situations.
If you think about producing security results, you need to be able to handle both situations. You need tools that scan code and can understand the intent, even though the structure may be varied from sample to sample. This is an area for ML, but a major drawback of generative ML systems like GPT or Codex is what is called hallucination. The output of these systems, as impressive as it is, is not guaranteed to be correct and can be convincingly misleading.
Enter symbolic AI, which is good at producing results together with the reasoning for why each result is correct. To protect against hallucination, Snyk combines generative ML with symbolic AI: Snyk’s generative ML models generate source code for a fix, and symbolic AI then checks that the generated code is free of issues. As a user, you will automatically receive code that fixes the issue at hand and has been scanned so that it does not introduce new problems, right from your IDE.
Last, but certainly not least, is human intelligence. Snyk strongly believes in the natural intelligence provided by Snyk security experts to control and guide the AI. But there’s more to it. Snyk’s security research team has uncovered and disclosed thousands of security vulnerabilities, and adds metadata and insights to our security findings that make it easier for the humans on the other end, developers and security teams, to use and understand our findings. AI helps discover the smoke, but the humans in our research group are able to tell if there is truly a fire and ensure that our results are both accurate and actionable.
Since combining these areas is extremely beneficial for usability and robustness, Snyk has leaned into hybrid AI. It’s the best tool to support our philosophy of providing innovative security solutions using the latest in AI and our unique security expertise.
Snyk products using AI today
Within Snyk, there are dedicated teams working on various AI and data science technologies. Our core AI team originated from DeepCode and consists of world-renowned researchers and practitioners in the fields of AI, static analysis, and security. They have published several academic papers at recognized conferences and in journals and are part of the wider industry discussion regarding the use of AI in application security.
Here is a snapshot of how Snyk uses AI and data science in our products today:
Snyk uses AI to monitor a multitude of social and community channels to filter interesting issue candidates and bring them to the attention of the security research team. Our advanced natural language processing is able to uncover possible new and exploited vulnerabilities in open source packages and frameworks by using sentiment analysis, keyword search, and triangulation.
Snyk Code uses AI to perform fast and deep semantic code analysis, setting the industry standard for speed. Our AI engine is not only extremely fast and accurate, but the AI models we use also provide the reasoning behind their findings, data flows to follow, and examples of how other developers in a similar situation addressed the issue. This idea of providing reasoning for why an answer is correct is important when it comes to security and stems from a different model than generative ML tools.
We build and maintain our SAST knowledge base using machine learning algorithms to learn from 100K+ open source repositories per language. Combining AI with our security expertise has allowed us to create a unique and dynamic hybrid AI.
Snyk Cloud combines various real-time data sources to model the security posture of an application from the original code to the cloud and back. Using the advanced analysis and reporting mechanisms provided by the Snyk platform, information is converted to decision-making knowledge or integrated into broader enterprise systems from partners like ServiceNow and Sysdig.
Future use of AI at Snyk
As we continue to learn and innovate, Snyk will build on the following examples and produce more AI-centered product capabilities.
If you caught our April SnykLaunch event, you saw a demonstration of an upcoming feature built on Snyk’s AI. As a user, you will automatically receive code that fixes a security issue, and Snyk checks that the generated code truly remediates the issue and does not introduce new problems, right from your IDE.
Users will have the opportunity to interact directly with Snyk’s AI to write custom code queries. AI adds two important benefits. First, semantic auto-completion of the query simplifies searches and produces faster results over the training set used by the Snyk Code engine. Second, Snyk is able to generate rules using evolutionary or genetic algorithms, optimizing rules to reduce noise and improve completeness.
Combining the semantic code querying capabilities of Snyk with natural language processing provides the opportunity for a developer to formulate intentions in natural language and find example implementations in open source projects to learn and adapt. As a user, you can freely formulate your intentions and will be served with reliable code examples addressing your needs.
Moving forward, the amount and complexity of data flowing into Snyk will increase, as the software supply chain continues to get more complex and Snyk gathers more data on code, packages, and cloud (plus data from partners). Hybrid AI will be able to spot problems that may not directly result in a security issue today, but are possible future vulnerabilities.
Snyk Learn, an interactive learning environment, can be adapted to the individual skill level of the developer by using AI to customize the learning modules.
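The idea of optimizing rules with evolutionary or genetic algorithms, mentioned above, can be illustrated with a toy example. The following is only a minimal sketch of a generic genetic algorithm, not Snyk's actual method: the labeled samples, the candidate function names, the fitness function, and every identifier in the code are assumptions made up for illustration. A "rule" here is just a set of function names to flag, and fitness rewards correct verdicts (completeness) while penalizing wrong ones (noise).

```python
import random
from typing import List, Tuple

# Hypothetical labeled samples: (function names called, really vulnerable?)
SAMPLES = [
    ({"eval", "print"}, True),
    ({"exec"}, True),
    ({"print", "len"}, False),
    ({"open", "print"}, False),
    ({"eval", "open"}, True),
    ({"len"}, False),
]
# Hypothetical names a rule may choose to flag.
CANDIDATES = ["eval", "exec", "print", "len", "open"]

Rule = Tuple[bool, ...]  # one on/off flag per entry in CANDIDATES


def fitness(rule: Rule) -> int:
    """Score a rule: +1 per correct verdict, -1 per wrong verdict."""
    flagged = {name for name, on in zip(CANDIDATES, rule) if on}
    score = 0
    for calls, vulnerable in SAMPLES:
        fires = bool(calls & flagged)
        score += 1 if fires == vulnerable else -1
    return score


def evolve(generations: int = 30, size: int = 12, seed: int = 0) -> Rule:
    """Evolve a rule with selection, one-point crossover, and mutation."""
    rng = random.Random(seed)
    # Start from the empty rule plus randomly generated rules.
    population: List[Rule] = [tuple(False for _ in CANDIDATES)]
    population += [
        tuple(rng.random() < 0.5 for _ in CANDIDATES)
        for _ in range(size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: size // 2]  # keep the best half (elitism)
        children: List[Rule] = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(CANDIDATES))  # one-point crossover
            child = list(a[:cut] + b[cut:])
            i = rng.randrange(len(child))  # flip one flag (mutation)
            child[i] = not child[i]
            children.append(tuple(child))
        population = parents + children
    return max(population, key=fitness)


best = evolve()
print([name for name, on in zip(CANDIDATES, best) if on], fitness(best))
```

Because the best half of each generation is carried over unchanged, the best score never decreases from one generation to the next. A real system would need a far richer rule representation and a much larger labeled data set.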
Diving deeper into AI: Generative ML, symbolic AI, and hallucinations
For those of you who are really into AI models and how Snyk uses them to produce results, this section is for you!
Much of the recent buzz around AI is based on the impressive results of OpenAI’s GPT-3, and now GPT-4 models, and descendants like OpenAI’s Codex. Generative ML models learn rules by analyzing huge amounts of data and forming their own rules about how certain inputs lead to outputs. While the results are stunning, generative ML suffers from an issue called hallucination. Hallucinations occur when a machine provides a convincing but completely made-up answer that differs from what would be expected or “normal”. In other words, while these systems are good at deciphering ambiguous information, their output is not guaranteed to be correct and can be very convincingly misleading, as they cannot explain why they believe an answer is correct.
Symbolic AI, on the other hand, is good for tasks with a clear structure, does not need training data, and can explain why its findings are accurate. Symbolic AI embeds rules and logic which are then used to analyze and decide on an answer, and it can tell you how it arrives at an answer since the rules are part of the system. But its weakness is in ambiguous situations.
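To make the "rules are part of the system" point concrete, here is a small illustrative sketch of a rule-based check that reports why each finding was raised. It is not Snyk's engine: the two rules, their identifiers, and the `Finding` type are hypothetical, and the check only looks at Python syntax trees via the standard `ast` module.

```python
import ast
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    rule: str    # hypothetical rule identifier
    line: int    # line number in the analyzed source
    reason: str  # explanation of why the rule fired


def check_source(source: str) -> List[Finding]:
    """Apply two hand-written rules to Python source and explain each hit."""
    findings: List[Finding] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Rule 1: a direct call to the built-in eval().
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            findings.append(Finding(
                "no-eval", node.lineno,
                "eval() executes its argument as code",
            ))
        # Rule 2: any call that passes the literal keyword shell=True.
        for kw in node.keywords:
            if (kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                findings.append(Finding(
                    "no-shell-true", node.lineno,
                    "shell=True hands the command string to a shell",
                ))
    return findings


sample = "x = eval(data)\nsubprocess.run(cmd, shell=True)\n"
for finding in check_source(sample):
    print(finding.line, finding.rule, "-", finding.reason)
```

Every finding carries the rule that produced it, so the reasoning is transparent. The same structure also shows the weakness: code that reaches `eval` indirectly, for example through an alias, would slip past these purely structural rules.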
To protect against hallucination, Snyk will combine generative ML with symbolic AI. Snyk’s own generative models, which are optimized to generate source code, propose a fix, and symbolic AI then checks that the generated code is free of issues. As an end user, you will automatically receive code that fixes the issue at hand, and Snyk checks that the fix does not introduce new problems.
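The generate-then-check pattern described here can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Snyk's implementation: the "generator" is just a hard-coded list of candidate fixes standing in for a generative model, and the checker is a toy rule (reject code that fails to parse or that calls `eval()`). The function names `find_issues` and `first_verified_fix` are hypothetical.

```python
import ast
from typing import Callable, Iterable, List, Optional


def find_issues(source: str) -> List[str]:
    """Toy rule-based check: flag syntax errors and calls to eval()."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]
    return [
        f"line {node.lineno}: call to eval()"
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ]


def first_verified_fix(
    candidates: Iterable[str],
    checker: Callable[[str], List[str]] = find_issues,
) -> Optional[str]:
    """Return the first candidate the checker accepts, or None."""
    for candidate in candidates:
        if not checker(candidate):
            return candidate
    return None


# Hypothetical model output: hard-coded strings standing in for a generator.
proposed = [
    "value = eval(user_input)",  # still contains the issue
    "value = int(user_input",    # does not even parse
    "value = int(user_input)",   # passes both checks
]
print(first_verified_fix(proposed))  # prints: value = int(user_input)
```

The design choice is that the generator is never trusted on its own: a suggestion only reaches the user after the rule-based check accepts it. Note that this sketch only verifies that no known issue is present; confirming that a fix truly remediates the original problem would need a check tied to that specific issue.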