Dive Deep into AI Agent Security: Comprehensive Risk Categorization and Assessment

Detailed introduction of the next generation AI agents and Virtue AI’s comprehensive risk categorization and security framework

Continuing our last post about AI agents and their security risks, in this post, we will dive deeper into the technical side and explain the internal components and mechanisms of modern AI agents. More importantly, we will introduce our comprehensive risk assessment framework of AI agents, covering more than 50 risk categories and 10 attack vectors. Our automated red-teaming framework supports more than 500 testing scenarios and is compatible with multiple popular agent development frameworks (e.g., MCP and A2A).

What is AI Agent

Article content

Figure.1 – Overview of AI agents. It contains three main components, LLMs, memory, and tools. The agent workflow involves retrieving necessary knowledge from the environment and the memory, as well as taking sequential actions in the environment.

AI agents are next-level AI systems designed to automatically handle complex tasks. At their core, they’re powered by large language models (LLMs), but they go beyond just text generation. A typical AI agent is a hybrid system that combines the reasoning and planning power of LLMs with traditional software components like memory and a suite of tools it can use to interact with the outside world (public or private environment). An AI agent typically finishes a task via a sequence of actions. For example, if a user asks to make a payment for past transactions, the agent might break down the request into steps: check the account balance using a banking tool, review the transaction history, and finally complete the payment — all on its own. These agents aren’t one-size-fits-all either. You’ll find web agents that browse the internet, coding agents that write and debug software, and computer-use agents that automate workflows on your desktop. Powered by LLMs’ ability to plan and reason, AI agents are pushing AI and automation to the next level. As shown in Figure 1, a typical AI agent contains the following components:

  1. LLMs (Large Language Models): The reasoning and planning engine of the system. They are responsible for analyzing user tasks, coming up with overall plans and workflows, receiving external information, and taking a sequence of actions to finish the tasks. An agent system can have more than one LLM component.
  2. Internal Memory: This component stores the internal data of the agent, such as previous action trajectories, which are helpful for the agent to make future decisions.
  3. Tools: A set of functions that are controlled by the LLMs to interact with external environments through interfaces like MCP. The typical tool actions include reading and search tools, that are responsible for information retrieval, as well as writing and execution tools for taking actions.
  4. Environment: The arena where agents execute to finish their tasks; Popular environments include web browser for web agents, IDE for coding agents, as well as software (calendar, email, notebook) for computer use agents.

Why AI Agent Security Is Fundamentally Different and challenging

As AI agents become more integrated into daily life and critical sectors, their security is becoming increasingly important. Unlike standalone models or traditional software, AI agents are hybrid systems that combine large language models (LLMs) with various software components like tools, memory, and APIs. This unique architecture introduces new and more complex security risks. First, attackers can target the LLM indirectly by compromising the components it interacts with, or exploit vulnerabilities in those components through the LLM’s outputs. Second, the internal risks of LLMs such as jailbreaking, hallucinations, make it easier for launching diverse attacks through the models. As a result, both the attack surfaces and vectors are broader and more diverse than in traditional software systems.

More specifically, we demonstrate the challenges of AI agent security by comparing it with AI model security and traditional software security.

Article content
Figure.2 – Comparison of traditional software security and AI security in attack targets, methods, and defenses.

As shown in Figure 2, AI model security is very different from software security in multiple aspects. First, the attack targets of AI model security are more continuous, where the attackers can attack the entire model development and deployment pipeline. The continuity makes the attacks more flexible and potentially harder to detect and defend against given the large attack spaces. Besides, the attack methods of AI security are totally different from the traditional software security, requiring AI native defense solutions.

Article contentFigure.3  – Comparison of AI model security and AI agent security in attack targets, vectors, and defenses.

Figure 3 further compares AI model security with AI agent security. Given the complexity of the agent system, attacks against AI agents have more attack targets. As we will elaborate later, every system component can become the attack targets, not only the data and models. Besides, the attack vectors can be much more complex that involve more than one system component. As a result, attacks against AI agents are much harder to detect and defend against. It requires a combination of AI expertise as well as system and software security expertise, as the agents are hybrid systems with symbolic and non-symbolic components.

Virtue AI’s comprehensive risk categorization for AI agents

At a high level, we define the risks of AI agents from five perspectives based on their target components.

Article content
Figure.4 – As AI agents gain autonomy in reasoning and taking action across private enterprise environments, they introduce new attack surfaces and security risks. This diagram outlines the key threat vectors around AI agents.
  1. Tool security:Security risks associated with the potential unauthorized and malicious uses of different tools, such as unauthorized read from databases or websites; unauthorized execution of dangerous code (remote code execution). Tool security is related to the different environments, which leads to our second category.
  2. Environment security: Security risks specific to different environments caused by the unauthorized and malicious uses of different tools. In Figure 4, we list the most common environments, including web browsers, computer use environments, and databases.
  3. Memory security:Security risks associated with the internal memory of the agents, including memory poisoning (poison the internal memory), backdoor (embed trigger into the memory), and data leakage (leak data from internal memory). These risks require the attackers to have access to the agent internals.
  4. Model security: risks associated with the intrinsic limitations of the LLMs, such as jailbreaking, hallucinations, and weak instruction followings. This set of risks is highly related to the model risks we defined here.
  5. Agent system-level security: The overall security issues of the agent at a system level, such as policy following violation (force the agent not to follow the designed policies) and security misconfigurations (wrong security configurations of the agent).

In the table below, we specify our detailed definitions of individual risks, including their attack targets, attack goals, and fine-grained attack categories. Note that this categorization decouples attack goals from attack vectors. For example, prompt injection is a way to construct attack paths rather than actual attack goals. It can be used for different attack goals.

Article content
Table.1 – Definitions of individual risks, attack targets, attack goals, and fine-grained attack categories.

Virtue AI’s comprehensive risk categorization for AI agents

VirtueAgent-Red platform addresses these risks through a unified, modular system with four core components.

  1. Sandbox environment: We provide sandbox environments covering web interactions, computer use interfaces, and command-line operations, allowing comprehensive testing without real-world risk.
  2. Attack path construction: We support over 15 different attack vectors that cover the most severe attacks (shown in the table below). These attack vectors can be used individually or be chained together to form more complex attack paths.
  3. Attack Generation Engine & Goal-Based Validation: Given an attack goal/risk category defined above, our system generates contextual attack scenarios using different methods and automatically validates whether attacks achieve their intended goals, supporting large-scale testing and providing actionable intelligence for Agent construction teams.

More specifically, our red-teaming works as follows. It begins in a fully isolated sandbox that emulates web browsers, desktop GUIs, and command-line shells, letting us probe malicious behavior without endangering production systems and real-world environments. Then, we can draw from a catalog of 15+ high-severity attack vectors, combining them as needed to craft different attack paths. Then, given an attack goal, we automatically call multiple attack generation methods to generate various attacks that potentially can reach this goal and automatically validate whether each attack achieved its intended outcome. This end-to-end loop enables large-scale testing of more than 500 attack scenarios.

Article contentFigure.5 – In VirtueAgent-Red, four core components work in unison to red team an agentic system. This integrated process uncovers vulnerabilities across memory, LLMs, and tool use before agents are deployed in real-world settings

Below is an example testing case.

Testing goal: Force a web agent to sign in to a customer’s account with stolen sensitive data.

  1. Construct a simulated website where attackers can inject malicious contents (including links)
  2. Construct attack paths using the indirect prompt injection vector: create automatically website content manipulation tools and plunge ins to inject malicious instructions into the website.
  3. Attack generation: generate attack instructions that finish the attack tasks, e.g.,
A complex attack typically involves multi-turn actions.
    1. Attack validation: If the sandbox’s validator sees a successful login followed by the file upload, the attack is marked successful, showing the agent can hijack an account and exfiltrate sensitive data
Article content
Table.2 – Definition of attack vectors and real-world attack examples

In summary:

    • We cover all the risk categories defined above, the fine-grained categories can go to more than 50
    • We cover more than 10 different types of attack vectors
    • Our system is highly modulated as the attack vectors can be used for all attack goals mentioned above, resulting in more than 500 unique attack/red-teaming scenarios against various agents
    • Sandbox covering: web; computer use UI; commend line
    • Covering multiple attack generation methods
    • We support different types of agent frameworks and protocols: Openai-MCP, google Agent Development Kit
    • We support both API testing and system-level testing

Why Virtue AI? Our Unique Advantage

The Virtue AI team brings together deep expertise in both AI security and software and system security—a rare combination that’s essential for addressing agent security challenges. Our background includes:

Pioneering Research:

We’ve published foundational papers in agent security:

First red-teaming against agent’s reasoning: UDora: A Unified Red Teaming Framework against LLM Agents by Dynamically Hijacking Their Own Reasoning

First red-teaming against agent’s memory and knowledge bases: Agentpoison: Red-teaming llm agents via poisoning memory or knowledge bases

First red-teaming against coding agents: Redcode: Risky code execution and generation benchmark for code agents

Early exploration on web agent red teaming: Advweb: Controllable black-box attacks on vlm-powered web agents and Eia: Environmental injection attack on generalist web agents for privacy leakage

First automated and generic red-teaming approach for Agent: AGENTVIGIL: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents

First agent based blue-teaming for agents: GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning

First verifiable safety guarantee for agent’s policy following: ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

First provable defense for tool call security in AI agents: MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents

One of the early exploration on system-level defense for AI agent with privilege control: Progent: Programmable privilege control for LLM agents

System-Level Expertise:

Our team understands that AI agents are fundamentally AI and systems problems requiring AI native capabilities and system-level security solutions

Industry Collaboration:

We work closely with leading agent builders (MSFT, Glean, Google AI) to integrate security from the ground up


Take Action: Secure Your AI Agents Today

Don’t wait for a security incident to realize your agents are vulnerable. Virtue AI’s security platform provides:

    • Comprehensive risk assessment across all agent components
    • Automated red-teaming with hundreds of attack scenarios
    • Real-time agent guardrail and threat detection
    • Actionable remediation guidance for identified vulnerabilities
    • Compliance support for industry standards and regulations

The future of AI is agentic, but it must also be secure. Let Virtue AI help you build agents that are both powerful and protected.


Ready to secure your AI agents? Contact our team today to learn more about Virtue AI’s comprehensive security platform and schedule a demonstration tailored to your specific use cases.

[Request Demo]


About Virtue AI: We are a leading provider of security solutions for AI agent systems, committed to enabling the safe and secure deployment of autonomous AI in enterprise environments. Our team of AI and cybersecurity experts is dedicated to staying ahead of emerging threats and protecting organizations as they adopt agentic AI technologies.