AI advancements, use cases, and tools hit the headlines every week. But it’s often challenging to keep up with AI, mainly because the field involves so much jargon and complex science.
We’ve created this glossary of AI terms as a crash course for cybersecurity professionals, developers, and anyone curious about this new technology. We cover the basics of AI, some of its most common use cases, the science of AI and its subsets (e.g., machine learning and natural language processing), and how AI relates to cybersecurity.
AI tools have a ton of variety, with varying degrees of sophistication, versatility, and usefulness. However, most AI tools leverage the same foundational AI methodologies (also referred to as applications of AI), such as natural language processing, machine learning, and neural networks. Often, a single AI tool leverages several of these methodologies at once.
Artificial intelligence (AI) - The simulation of human intelligence, enabling machines to learn from experiences, adapt to new information, and perform tasks that typically require human intelligence, in real-world scenarios.
Natural language processing (NLP) - A subset of AI that combines rule-based human language modeling with statistical, machine learning, and deep learning models, to enable computers to ‘comprehend’ and create human language in a way that mimics how humans do these things.
Machine learning (ML) - A facet of AI that uses statistical techniques and algorithms to analyze patterns in data and make predictions based on these, thereby learning and adapting without clear, specific instructions. In other words, this technology empowers systems to optimize their performance on a specific task through exposure to data, without explicit programming.
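As a rough sketch of what “learning from data without explicit instructions” means, the toy program below fits the slope of y = 2x purely from example pairs via gradient descent. The data, learning rate, and iteration count are invented for illustration:

```python
# Toy illustration: "learning" the slope of y = 2x from example data
# by gradient descent, with no explicit rule programmed in.

data = [(1, 2), (2, 4), (3, 6), (4, 8)]  # (input, output) pairs

w = 0.0    # model parameter, initially a guess
lr = 0.01  # learning rate

for _ in range(1000):
    # Gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # nudge w to reduce the error

print(round(w, 2))  # converges toward 2.0, the true slope
```

No rule like "multiply by 2" was ever written down; the parameter simply moved in whatever direction reduced the prediction error on the examples.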
Neural network - A computational model that emulates the functionality of the human brain. Neural networks can learn patterns and relationships in data.
AI Model - A mathematical model of algorithms and parameters, based on specific AI methodologies like machine learning, that analyzes data, recognizes patterns, and then uses this information to perform a specific task. Examples of AI models include GPT-4, LLaMA, and PaLM.
Parameters - The internal variables of an AI model, such as the weights in a neural network, that determine how it makes predictions or generates outputs. Training adjusts these values automatically to optimize the model's performance. (Settings that users configure themselves, such as the learning rate, are called hyperparameters.)
AI Agent - A form of AI that makes decisions autonomously. Agents perceive their environment through models and algorithms and then take automated actions to maximize their chance of achieving predefined goals.
Transformer - A neural network architecture well-suited for natural language processing because it gains an ‘understanding’ of context and meaning by tracking the relationships within sequential data. Transformers process all the words in a sequence in parallel rather than one at a time, which shortens training time for AI models.
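The "connections within sequential data" are computed by an attention step: every position scores its relationship to every other position, then mixes in their information according to those scores. A minimal sketch (the three toy vectors are invented, and real transformers use learned projections for queries, keys, and values):

```python
import math

# Minimal sketch of the attention step at the heart of a transformer:
# each position in a sequence scores its connection to every other
# position, then mixes their values according to those scores.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this position's query to every key, scaled
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted mix of all value vectors: context from the whole sequence
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

# Three toy token vectors, attended over simultaneously (not one at a time)
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(vecs, vecs, vecs)
```

Because every position is processed against every other in parallel, the whole sequence is handled in one pass rather than token by token.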
Temperature - A parameter used in generative AI models to control randomness. Higher temperatures increase the model's creativity and variance in outputs.
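Mechanically, temperature divides the model's raw scores (logits) before they are turned into probabilities, so low temperatures sharpen the distribution and high temperatures flatten it. A small sketch with made-up scores for three candidate tokens:

```python
import math

# Sketch: how temperature reshapes next-token probabilities.
# Dividing logits by the temperature before softmax makes the
# distribution sharper (low T) or flatter (high T).

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens

cold = softmax_with_temperature(logits, 0.5)  # confident; favors the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter; more varied sampling
```

With these numbers, the top token gets roughly 86% of the probability at a temperature of 0.5 but only about 50% at 2.0, which is why higher temperatures produce more varied output.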
Prompts - Text inputs or queries for guiding generative AI models towards a desired outcome. Prompts prime the model and give it context that is related to the desired response.
Tokens - The individual words, subwords, or character chunks that natural language AI models (also known as Large Language Models; see below) use to process and generate language. Generative models produce sequences of tokens as output.
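To make subword units concrete, here is a rough sketch of tokenization. Real models learn their vocabulary from data (for example, via byte-pair encoding); the tiny vocabulary below is made up purely to show how one word splits into known pieces:

```python
# Rough sketch of subword tokenization. Real models learn their
# vocabulary (e.g., via byte-pair encoding); this tiny vocabulary is
# invented to illustrate how a word splits into known subword units.

VOCAB = ["un", "break", "able", "token", "ize", "rs"]

def tokenize(word):
    tokens, rest = [], word
    while rest:
        # Greedily take the longest vocabulary entry prefixing the rest
        match = max((v for v in VOCAB if rest.startswith(v)),
                    key=len, default=None)
        if match is None:
            tokens.append(rest[0])  # fall back to single characters
            rest = rest[1:]
        else:
            tokens.append(match)
            rest = rest[len(match):]
    return tokens

print(tokenize("unbreakable"))  # ['un', 'break', 'able']
print(tokenize("tokenizers"))   # ['token', 'ize', 'rs']
```

Subword vocabularies let a model handle words it has never seen whole, since unfamiliar words still decompose into familiar fragments.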
Technologies like ChatGPT and Google’s Bard have popularized LLMs, AI tools that understand language and text. LLMs can provide many benefits within a business context. For example, they can enable customer-facing apps to provide intelligent assistance.
LLM (Large Language Model) - A subset of machine learning AI that is a type of artificial neural network pre-trained on massive, text-based datasets, allowing it to perform different NLP tasks, such as recognizing, translating, predicting, or generating human-like text or other content, and to ‘understand’ language to some extent.
Generative Pre-trained Transformer (GPT) - A type of LLM, a GPT is an advanced neural network-based model that specifically uses the transformer architecture to generate coherent, human-like content. It draws on extensive language training and billions of parameters to generate these responses.
Chatbot - AI-powered software that simulates conversation with users. It leverages natural language processing to 'understand' text or speech inputs and generative AI to respond to the input prompts with relevant output.
AI tools use several methods to ingest and process data, and each method suits specific use cases. For example, a team using advanced terminology could implement expert system AI to navigate its knowledge base (GARVAN-ES1 is a classic example), while a team trying to solve a problem creatively could leverage evolutionary AI. The latter method has shown great success at DeepMind, which combined evolutionary AI with other types of AI to develop AlphaStar, a system that can play and win at the highest levels of the game StarCraft II.
Machine Learning AI - A subset of AI that enables a system to automatically ‘learn’ patterns from datasets and then use this data to refine its performance, like making predictions or performing specific tasks, without explicit programming. ML subtypes include supervised, unsupervised, and reinforcement learning.
Neural Network AI - A subset of machine learning that uses artificial neural networks made of interconnected nodes or artificial neurons to simulate the way that the human brain processes information (i.e., imitate the way that neurons in human brains signal each other), recognizing patterns, features, and relationships in data.
Symbolic AI (or Symbolic Reasoning) - A type of AI that processes symbols with rules and logic. Knowledge about entities, relationships, and facts about the world is represented using symbols, often in the form of logical statements or rules. Human experts manually encode this knowledge into the system, defining the symbolic representations and their relationships. Unlike machine learning approaches that focus on pattern recognition, knowledge here is represented in a declarative form, stating what is known rather than how it was learned. The system applies formal logic and inference rules to manipulate these symbols, derive new knowledge from existing symbols, and reach conclusions or make decisions. Symbolic AI is used to develop expert systems (see below) and works well for tasks involving human-readable expressions and formal reasoning.
Subsymbolic AI - An approach that uses machine learning and neural networks, rather than explicit symbolic rules, to create mathematical representations of knowledge. It focuses on learning patterns from data to make predictions, simulating human intelligence either by combining inference of general laws from particular instances with brain-inspired models of how we spontaneously order and process information, or by applying statistical methodologies. (Note: “subsymbolic AI” is not a widely accepted term; “neural AI” is more commonly used.)
Evolutionary AI (or Genetic Algorithms) - An AI approach that simulates the process of natural selection through optimization algorithms, to evolve solutions to complex problems that often cannot be solved with more traditional methods. Evolutionary AI works well for optimization and design tasks.
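The natural-selection loop (evaluate fitness, keep the fittest, mutate offspring) can be sketched in a few lines. The fitness function and population settings below are invented for illustration; the population evolves toward the peak of a simple curve:

```python
import random

# Toy genetic algorithm: evolve a population of numbers toward the
# maximizer of f(x) = -(x - 3)^2 (peak at x = 3) by repeated
# selection and mutation.

random.seed(0)

def fitness(x):
    return -(x - 3) ** 2

population = [random.uniform(-10, 10) for _ in range(20)]

for generation in range(100):
    # Selection: keep the fittest half
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]
    # Variation: offspring are mutated copies of the survivors
    offspring = [x + random.gauss(0, 0.5) for x in survivors]
    population = survivors + offspring

best = max(population, key=fitness)
```

No one tells the algorithm where the peak is; candidates that happen to score better simply out-reproduce the rest, which is the same pressure that drives approaches like AlphaStar's population-based training.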
Expert System AI - A type of AI that provides expertise in a specific domain by referencing a knowledge base to make decisions or provide recommendations in a way that mimics how a human expert within a specific field of knowledge would make decisions or provide advice.
Generative AI - AI that creates new content — like images, audio, and text — that resembles or fits within a given dataset, rather than simply analyzing existing data. Generative models learn patterns from training data and then use this knowledge to generate novel outputs.
Hybrid AI - A combination of multiple AI approaches, overcoming the limitations of individual techniques. For example, Snyk Code uses hybrid AI by generating code fix suggestions with generative AI and then checking the security level of these suggestions with symbolic AI.
Machine learning is one of the most common AI approaches. Users train machine learning algorithms with a dataset, enabling the ML tool to draw inferences. Businesses can leverage machine learning for numerous use cases, such as recommending the best next steps to users, facilitating the detection of potential threats, and calculating dynamic pricing for prospective customers.
Training data - A set of examples used to train machine learning algorithms, helping them to learn patterns and make predictions or decisions.
Deep learning - A subset of machine learning that uses neural networks with multiple layers (“deep” architectures) to mimic the way the human brain works through a combination of data inputs, weights, and biases. Training these deep architectures teaches them hierarchical representations of data: each layer of interconnected nodes uses the inputs, weights, and biases to identify, categorize, and define items within the data, progressively optimizing and refining the network’s predictions for increasing accuracy. Unlike classical machine learning, deep learning algorithms can take in and process unstructured data, and the depth of the architecture helps the network automatically learn features at different levels of abstraction.
Labeled training data - Data used in supervised machine learning, and containing input examples paired with corresponding desired outputs, enabling the relevant algorithms to learn the relationship between inputs and outputs. Put simply, such training data requires humans to attach informative labels to raw input data (before the machine learning model is trained on the data), to give context that helps the model to adjust its parameters as it learns the patterns present in the data. This enables the machine learning model to improve the accuracy of its predictions and its performance on specific tasks, over time.
Supervised learning - A subset of machine learning in which the algorithm learns from labeled training data, automatically adjusting its parameters to minimize the difference between its predictions and the human-created output labels in the training data, then applying its learning more broadly to make inferences on similar but previously unseen inputs.
Unsupervised learning - A type of machine learning in which the algorithm learns to identify patterns and relationships in data without explicit guidance in the form of labeled examples (see “Labeled training data” above).
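A classic illustration of learning without labels is k-means clustering: points get grouped by proximity alone, with no one ever saying which group a point belongs to. The points and initial centers below are invented for illustration:

```python
# Sketch of unsupervised learning: k-means groups unlabeled 1-D points
# into clusters without ever being told which group each point belongs to.

points = [1.0, 1.2, 0.8, 9.0, 9.3, 8.7]  # two obvious groups, but no labels
centers = [0.0, 5.0]                      # initial guesses for 2 centers

for _ in range(10):
    # Assignment step: each point joins its nearest center
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    # Update step: each center moves to the mean of its cluster
    centers = [sum(c) / len(c) if c else centers[i]
               for i, c in enumerate(clusters)]

print([round(c, 1) for c in centers])  # roughly [1.0, 9.0]
```

The algorithm discovers the two groups purely from the structure of the data, which is exactly what makes unsupervised methods useful for tasks like anomaly detection, where labels are scarce.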
Federated learning - A distributed machine learning approach that trains the model with decentralized data across multiple sources, without ever exchanging or sending this raw data to the centralized server or coordinator. Only the model updates are sent from the distributed servers or devices containing the local data to the centralized server, to help improve the global model. This approach upholds data privacy best practices, because the raw data never leaves the individual devices where it is held. Well-known examples of federated learning solutions include voice recognition, facial recognition and word prediction in tools like Siri, Google Assistant or Alexa.
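The core mechanic, often called federated averaging, can be sketched as follows. The client names, datasets, and training settings are invented for illustration; the point is that only the trained weight travels to the server, never the raw data:

```python
# Sketch of federated averaging: each client trains locally and shares
# only its model parameters (here, a single weight), never its raw data.

# Hypothetical local datasets that never leave their devices
client_data = {
    "phone_a": [(1, 2.1), (2, 3.9)],
    "phone_b": [(1, 1.8), (3, 6.3)],
}

def local_update(w, data, lr=0.05, steps=50):
    # Each client fits y ~ w * x on its own data via gradient descent
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

global_w = 0.0
for round_ in range(5):
    # Clients start from the shared global model and train locally
    updates = [local_update(global_w, d) for d in client_data.values()]
    # The server averages the returned weights; raw data is never sent
    global_w = sum(updates) / len(updates)
```

Each round, the global model improves from everyone's data even though the server only ever sees the averaged parameters, which is what preserves privacy.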
AI relies on complex data processing techniques and human oversight to process input and create output that’s accurate and helpful to users. By understanding how AI works behind the scenes, businesses can better understand its capabilities and limits.
Data mining - The process of extracting patterns, insights, trends and other useful information from large datasets with statistics, machine learning, and database system techniques. Data mining is also known as knowledge discovery in data (KDD).
Fine-tuning - The process of adjusting and optimizing a pre-trained model (one initially trained on a large, diverse dataset so that it recognizes general features and patterns) to adapt its broad knowledge to a narrower, specialized use case, in far less time than training a model from scratch would take. To fine-tune a model, users input additional curated data for their desired use case, define the specific task, adjust the training parameters, evaluate the model's accuracy, and then iterate if need be.
Bias and fairness - The potential for AI systems to reflect and perpetuate pre-existing biases in training data, leading to unfair outcomes or decisions.
Explainability - The ability to explain how and why an AI model makes certain predictions or decisions. The explainability of an AI model is essential for building trust and transparency.
AI can have many fascinating uses, from securing enterprise-level networks to writing application code. As application development and security teams consider using AI for everyday work, they can look at the following use cases:
AI-assisted development - The use of generative AI to help developers write, review, and document code.
AI-assisted applications - AI capabilities, such as chatbots, used within applications to increase efficiency and accessibility of information.
AI-assisted tooling - AI built into tools and processes to make them more effective, for example, increasing the accuracy and speed of vulnerability detection and fixes in cybersecurity tools.
AI can both help and harm the cybersecurity efforts within an organization. Cybersecurity teams can leverage AI to implement intelligent security tooling or rapidly identify suspicious system activity. However, bad actors can also use or target AI, meaning cybersecurity teams must defend their systems against these new, evolving threats.
AI cybersecurity - The usage of artificial intelligence in cybersecurity tools or strategies. AI cybersecurity also describes the efforts to secure AI models from vulnerabilities such as prompt injection.
AI attacks - Cyber attacks carried out using AI, or attacks on AI models themselves, such as prompt injection or data poisoning.
AI vulnerabilities - Vulnerabilities within AI models, such as susceptibility to prompt injections. The OWASP LLM top 10 is a helpful resource for learning about the top threats to LLMs; you can check out Snyk's analysis of the LLM top 10 here.
Hidden classifiers - The features, patterns, or variables in a machine learning model that contribute to its decision-making but are not, or not easily, identifiable or interpretable by humans. Identifying or interpreting these classifiers is challenging because of the commonly cited “black box” nature of LLMs, which arises for a variety of reasons, including the complexity of the models and automated feature learning. At present, users can't tell which classifiers a model uses to produce an output or make a decision, and since the same input will not always generate the same result, these models are far more difficult to secure.
Adversarial attacks - Deliberately manipulating inputs to trick AI systems into making wrong predictions or categorizations.
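To make this concrete, here is a toy version of the idea behind gradient-based adversarial attacks: nudge each input feature slightly in the direction that most raises the model's score, flipping its decision. The classifier weights and input values are invented for illustration:

```python
# Toy adversarial example: a tiny input perturbation in the direction
# that most increases the score flips a simple linear classifier.

weights = [2.0, -1.0]  # hypothetical trained spam-classifier weights

def classify(x):
    score = sum(w * xi for w, xi in zip(weights, x))
    return "spam" if score > 0 else "not spam"

x = [0.1, 0.3]   # score = 0.2 - 0.3 = -0.1 -> "not spam"
epsilon = 0.15   # size of the (small) perturbation

# Fast-gradient-style perturbation: move each feature slightly in the
# sign of its weight, pushing the score upward
x_adv = [xi + epsilon * (1 if w > 0 else -1) for xi, w in zip(x, weights)]

print(classify(x))      # not spam
print(classify(x_adv))  # spam
```

The perturbed input differs from the original by at most 0.15 per feature, yet the classification flips, which is why adversarial robustness matters for security-critical models.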
Hallucinations - Coherent but nonsensical or inaccurate output that an LLM generates from patterns it perceives in its training data that do not actually exist or are undetectable by humans. It can be thought of as the AI "imagining" information, creating content with no factual basis or grounding in the learning it received. Hallucinations are a security concern because they make the AI model more unpredictable and unreliable.
AI ethics - Ethical principles for developing and using artificial intelligence systems to prevent potential harm. These principles account for algorithmic bias, privacy, accountability, transparency, etc.
As with any technology, AI can be helpful or harmful — depending on how the technology is used. To get the most out of AI, learn how each tool ingests data, processes it, and creates outputs. By understanding what each type of AI can and cannot do, organizations can make the best and most secure decisions for their tech stacks and move forward with confidence.