
How SAS secures their AI-generated code

By Gerald Crescione

April 15, 2024


Coding assistants powered by generative AI are transforming the technical landscape. They have revolutionized the way developers work, raising efficiency and velocity to new heights. But that increased velocity brings new challenges for security teams. These coding assistant models are trained on both high- and low-quality code from across the web, and the code they generate, while functional, can carry outside quality and security issues into our projects. AI also introduces new attack vectors that could cause significant damage within organizations. In response, security teams must rise to the challenge to ensure the safe and responsible use of this powerful tool.

These new realities leave teams wondering where to use AI for software development, which precautions they should take to minimize inconsistent results from these tools, and how to continue innovating with AI while staying safe from emerging risks. 

Snyk recently hosted a live webinar on all these topics, moderated by Clinton Herget and featuring Brett Smith and Chris Knackstedt. The speakers covered what these AI code risks look like for organizations in 2024 and the measures their organizations are taking to safeguard their applications against these new threats.

The risks of AI-generated code

Smith and Knackstedt agree that generative AI has unquestionably revolutionized the game for software development, serving as a pair programmer to developers worldwide. Its ability to provide a starting point when writing code from scratch significantly streamlines the development process. They expect that in the near future, harnessing the power of generative AI will enable developers to expedite their workflow and enhance overall efficiency.

Knackstedt also looks forward to seeing the positive impacts of AI tools because they offer fine-tuned solutions for bespoke problems. He said, “I have been extremely impressed with the progressions that generative AI has made in just one short year — especially in the novel use cases that people are thinking about for using some of these tools.”

While AI can increase velocity when appropriately used, it can cause detrimental security issues if misused. AI tools are making it considerably more challenging for security teams to keep up with other technological advances. Many of them are still trying to digest the last round of digital transformation and get up to speed with DevOps, cloud, CI/CD, and many other technologies and processes. These ongoing digital transformations are being accelerated by the generative AI phenomenon, meaning that it’s more crucial than ever for these security teams to adapt.

In addition to speeding up the digital transformation already in motion, AI-generated code can pose security threats to organizations. There are two main reasons for this.

AI-generated code is not innately secure or high-quality 

Generative AI tools are trained on both high- and low-quality data from across the internet: well-written code by senior developers, error-prone code by novices, and even deliberately non-functional examples that people have posted on forums such as Stack Overflow. Because generative AI ingests such a variety of code, the code it produces is roughly the quality you would expect from a junior developer.

The speakers also discussed the importance of realizing that today’s generative AI tools cannot detect vulnerabilities by default. Herget referenced Snyk research that demonstrates this. Our team set up a Java class with multiple methods and gave one of them a name suggesting SQL injection. When we asked several AI coding assistants whether the code contained a SQL injection vulnerability, they all answered, “Yes.” After we simply renamed the method, without altering anything about the code’s functionality, the same assistants answered, “No.” This experiment showed that AI still does not understand the symbolic significance of specific language within code.
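To make this concrete, here is a minimal sketch of the kind of code involved in the experiment; the class, method, and query names are illustrative, not the exact code from the research:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserRepository {
    private final Connection connection;

    public UserRepository(Connection connection) {
        this.connection = connection;
    }

    // The method name hints at the flaw, and the assistants flagged it.
    // Renaming it to something neutral (e.g. findUser) flipped their answer,
    // even though the string concatenation below is injectable either way:
    // passing "' OR '1'='1" as username returns every row.
    public ResultSet sqlInjection(String username) throws SQLException {
        Statement stmt = connection.createStatement();
        return stmt.executeQuery(
                "SELECT * FROM users WHERE name = '" + username + "'");
    }

    // The fix a reviewer (human or machine) should insist on regardless of
    // naming: a parameterized query, so input is never spliced into the SQL.
    public ResultSet findUserSafely(String username) throws SQLException {
        PreparedStatement stmt = connection.prepareStatement(
                "SELECT * FROM users WHERE name = ?");
        stmt.setString(1, username);
        return stmt.executeQuery();
    }
}
```

The underlying point: a tool that reasons about data flow, rather than identifiers, flags the concatenated query no matter what the method is called.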

External threats can take advantage of AI models 

In addition to accelerating existing issues, AI also brings the risk of new attack vectors. Threat actors can exploit generative AI with code injections, data poisoning, prompt injections, and hijacking. These exploitations can either make the model perform malicious actions (such as writing insecure code on purpose) or allow the malicious actors to steal sensitive training data.

Hallucinations, in which AI tools generate references, code libraries, or functions that do not actually exist, are another security issue. Bad actors can take advantage of AI hallucinations with techniques such as library squatting: if an AI tool hallucinates a nonexistent code library, an attacker can create a real version of it containing malicious code. AI-generated code that references the library then pulls that malicious code into the application.
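One practical guardrail is to treat AI-suggested dependencies as untrusted until verified, for example by checking them against a curated internal allowlist before they reach a build. A minimal sketch of that idea; the allowlist entries are real, widely used artifacts, while the hallucinated coordinate is invented for illustration:

```java
import java.util.List;
import java.util.Set;

public class DependencyAllowlistCheck {
    // Dependencies the organization has actually vetted.
    private static final Set<String> APPROVED = Set.of(
            "com.fasterxml.jackson.core:jackson-databind",
            "org.apache.commons:commons-lang3");

    public static void main(String[] args) {
        // Imagine these coordinates came from an AI-generated build file.
        // The second one is a hypothetical hallucination that an attacker
        // could register on a public repository (library squatting).
        List<String> suggested = List.of(
                "com.fasterxml.jackson.core:jackson-databind",
                "com.example:jsonfastparse");

        for (String dep : suggested) {
            if (APPROVED.contains(dep)) {
                System.out.println("ok: " + dep);
            } else {
                System.out.println("BLOCKED (not on allowlist): " + dep);
            }
        }
    }
}
```

The same check belongs in CI alongside software composition analysis, so a hallucinated name fails the build before anything is resolved from a public registry.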

How to use AI-generated code securely

Although it poses security risks, generative AI is now essential for many of today’s developers and isn’t going away anytime soon. It provides invaluable support, helping development teams overcome writer’s block and serving as a junior coding assistant, which ultimately enables developers to code faster than ever.

But this unprecedented speed directly impacts today’s security teams: they have more code to secure in less time. As a result, security teams must adopt tactics that let them better secure AI-generated code without slowing down technical innovation. Here are a few tips for meeting today’s fast-paced, AI-supported development cycles:

Prioritize developer education

First, developer teams must understand what each AI tool can and can’t do. Smith said, “This isn't a silver bullet that will solve all your problems. It's not your senior developer program. This is a junior associate program that can put out 170 lines... you need your developers to understand that they're not writing any better or more secure code than they were writing before. They just have a really smart autocomplete now.”

In addition, organizations must educate developers on how to protect intellectual property and privacy when inputting code-related prompts into an AI tool. It’s crucial for development teams to realize that generative AI uses prompts as training data and could put sensitive assets in jeopardy by leveraging this data to respond to others’ prompts.   

Behavior analysis plays a vital role in measuring the success of these education initiatives. Knackstedt said, “We're going to have to adopt processes and principles that use more behavioral analysis — not just of the code itself, but also the people that are developing the code, and the people that use the solutions the code is built from… [then], we can establish a more holistic picture about the use of AI as it was intended by design, and then the actual use in practice.”

Double down on tried-and-true security measures

Building security testing into the development lifecycle has been a DevSecOps staple for a while. But because AI has significantly increased the velocity at which new code enters the pipeline, it’s more critical than ever for organizations to build security testing directly into their code pipelines.

Smith brought up how his team employs static code scanning to meet the unprecedented velocity of code issues that AI-assisted development brings. He said, “With the rise of generative AI, an increasing amount of vulnerable code tries to make it into our pipeline at a speed we hadn't seen before. Every time you push code, it gets run through a Snyk scan, and you can't merge it until you pass. Having Snyk Code (SAST) as part of our automated checkpoints is how we fight the evil aspects of generative AI. If you can't necessarily tell the difference between what's machine and what's human-generated, then test everything. Test more frequently and test with a toolset that is designed to keep up with the speed of modern DevSecOps, so you're not slowing anyone down.”
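Conceptually, the gate is simple: run the scan on every push and refuse the merge on a non-zero exit. Here is a minimal sketch of that logic, assuming the Snyk CLI is installed and authenticated (snyk code test is its SAST command); in practice this gate lives in CI configuration and branch protection rules rather than in application code:

```java
import java.io.IOException;

public class ScanGate {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // Run the static scan against the current project directory,
        // streaming its output to the console.
        Process scan = new ProcessBuilder("snyk", "code", "test")
                .inheritIO()
                .start();

        // The Snyk CLI exits non-zero when it finds issues, so the exit
        // code is the pass/fail signal the pipeline blocks on.
        int exitCode = scan.waitFor();
        if (exitCode != 0) {
            System.err.println("Scan found issues; blocking merge.");
            System.exit(exitCode);
        }
        System.out.println("Scan passed; merge may proceed.");
    }
}
```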

Implement multi-faceted testing

Teams should also consider ways to cross-check their AI-generated code with the right combination of automated tools and manual intervention. According to Knackstedt, “The use of generative coding should be taken with some guardrails and consideration. I'd say, fundamentally, ensuring that there is a human in the loop and making sure that you're not blindly accepting things that these generative AI solutions are producing, and really going through and thoroughly reviewing your code.”

Smith discussed how SAS conducts these thorough reviews using automated code checking with Snyk SAST and two human-in-the-loop checkpoints. He said, “We've enforced rules in GitHub that require two people to review the code before it merges — one of them has to be the code owner — before we can merge it… From my standpoint, we have to have the human eyeballs on it before it gets in, and it has to pass the automated checks.”

Focus on consistency

From Smith’s perspective, securing AI processes requires consistent security across all planes of development, from infrastructure-as-code to source code to documentation. He said, “We try to treat everything the same way…If we approach them all with the same level of importance, then we put the same level of due diligence into their security methods.”

Knackstedt also highlighted the importance of consistency, explaining that it’s crucial “to ensure that there is some sort of standardization around what you use to build your code and ensure that those tools are safeguarded within your organization. It’s about understanding what data is being used to fine-tune the models or augment the models with additional context within your organization.”

Moving forward with AI-generated code

As development teams move forward with generative AI, a few security trends are likely to emerge. For one, expect more AI components within security tooling itself, responding to issues throughout the SDLC. Knackstedt explained this as “fighting fire with fire and being able to find opportunities to build AI and generative AI into some of these protective measures. This way, you can effectively understand at the same rate of pace and use the same type of logic.”

In addition, it’s not just your development team using generative AI to write code. There’s a good chance that your third-party suppliers are also dabbling in this technology. Smith explained, “As we pick up online vendors, we start to use more software as a service in our SDLC... How do I know they're using AI responsibly?” Software supply chain security best practices are more vital than ever because of the potential AI vulnerabilities that come with using SaaS vendors.

For another, the success of securing AI-generated code will ultimately come down to a shared responsibility model, much like the one cloud security establishes between user and provider. Knackstedt said, “With the wide use of artificial intelligence across an entire organization, it really demands more integration across some of these risk management practices to ensure that there is an overarching, responsible way to deploy, manage, monitor, and maintain visibility over artificial intelligence.”

This conversation with Clinton Herget, Brett Smith, and Chris Knackstedt just scratches the surface of how generative AI impacts code development today and how to use this emerging technology securely. To dive deeper into securing AI-generated code, check out our 2023 report on AI code, security, and trust in modern development.

Posted in: AI, DevSecOps
