
AI Agent Security Glossary

Plain-English definitions of 30 terms used in AI agent security, supply chain auditing, and EU AI Act compliance. Every definition links to deeper reading where relevant.

A

Agent skill

A packaged set of instructions, tool definitions, and capability declarations that extends what an AI agent can do. Skills are distributed through marketplaces like ClawHub and installed into agent runtimes like OpenClaw. Because skills execute with the permissions of the agent, a malicious or poorly written skill can cause the agent to exfiltrate data, contact attacker-controlled servers, or take destructive actions.


B

Behavioral integrity

The degree to which a skill or AI system does what its documentation says it does and nothing else. Unit 42's Behavioral Integrity Verification (BIV) research found that 80% of ClawHub skills show at least one mismatch between declared and actual behavior. A skill with low behavioral integrity may be malicious, poorly written, or simply underdocumented — but from a security standpoint, undeclared behavior is a risk regardless of intent.


C

C2 callback

A Command and Control (C2) callback is a network connection from a compromised system back to an attacker-controlled server. In the context of AI agent skills, a C2 callback typically appears as an HTTP request or WebSocket connection to a domain the skill documentation does not disclose — such as webhook.site, requestbin, pipedream, or a custom attacker-owned domain. TrustSkills checks every skill for C2 callback patterns as part of its standard scan.
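
As a rough illustration of the idea, the sketch below pulls URL hosts out of a skill's source text and flags any host the skill does not declare. It is not how TrustSkills works internally. The function name, the `declared_hosts` parameter, and the domain suffixes `requestbin.com` and `pipedream.net` are assumptions made for this example; the definition above names only the services.

```python
import re
from urllib.parse import urlparse

# Hypothetical host suffixes for the request-capture services named above.
KNOWN_CALLBACK_HOSTS = ("webhook.site", "requestbin.com", "pipedream.net")

# Matches HTTP(S) and WebSocket URLs up to the next quote, bracket, or space.
URL_PATTERN = re.compile(r"(?:https?|wss?)://[^\s'\"<>)]+")


def find_undisclosed_hosts(skill_source, declared_hosts):
    """Return {host: reason} for every URL host the skill does not declare."""
    declared = {h.lower() for h in declared_hosts}
    findings = {}
    for url in URL_PATTERN.findall(skill_source):
        host = (urlparse(url).hostname or "").lower()
        if not host or host in declared:
            continue
        if any(host == k or host.endswith("." + k) for k in KNOWN_CALLBACK_HOSTS):
            findings[host] = "known request-capture service"
        else:
            findings[host] = "not declared in documentation"
    return findings
```

A host built at runtime from string fragments would slip past a text match like this, which is one reason static pattern checks are only a first filter.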


ClawHavoc

A coordinated supply chain attack campaign that planted 1,184 malicious skills across 12 publisher accounts on the ClawHub marketplace between January and February 2026. ClawHavoc skills delivered the AMOS infostealer on macOS, exfiltrated credentials and API keys, and used obfuscated payloads to evade file-hash scanners. ClawHavoc is named after the campaign identifier used in attacker infrastructure.


ClawHub

The official marketplace for OpenClaw agent skills. Skills are published by third-party authors and can be installed into OpenClaw with a single command. ClawHub is the primary supply chain vector for AI agent skill attacks, the same dynamic that made npm and PyPI targets for software supply chain attacks. In early 2026, Snyk's ToxicSkills research found that 36% of scanned ClawHub skills contained at least one security flaw.


D

Data exfiltration

The unauthorized transfer of sensitive data from a system to an attacker-controlled destination. In AI agent skill attacks, data exfiltration typically targets .env files, AWS credentials, SSH private keys, environment variables, and browser-stored passwords. The exfiltration is usually disguised as a legitimate HTTP request, telemetry call, or API interaction so it appears normal in logs.

Related: C2 callback, ClawHavoc, Credential harvesting


Direct prompt injection

A prompt injection attack where the attacker places malicious instructions directly in the input sent to the AI model — for example, a user message that tells the agent to ignore its system prompt and perform a different action. Contrasted with indirect prompt injection, where the attack is embedded in external content the agent reads, such as a web page, email, or document.


E

EU AI Act

The European Union's comprehensive regulation governing AI systems, which entered into force on 1 August 2024 and is rolling out in phases. The regulation classifies AI systems by risk tier (prohibited, high-risk, limited risk, minimal risk) and imposes obligations accordingly. For agentic AI deployments, the most immediately relevant milestone is 2 August 2026, when the European Commission gains enforcement authority over General-Purpose AI (GPAI) model providers and can issue fines of up to €15 million or 3% of global annual turnover, whichever is higher.

Related: GPAI, High-risk AI system, Transparency obligation, Technical documentation


Excessive agency

OWASP's term (LLM06:2025) for the condition where an AI agent is granted more capabilities, permissions, or autonomy than its task requires. Excessive agency amplifies the blast radius of any mistake or attack — a skill that can delete files, send emails, and execute shell commands does far more damage when compromised than one with read-only access. OWASP's mitigation guidance focuses on minimizing extensions, reducing functionality, and enforcing user approval for high-impact actions.


G

GPAI (General-Purpose AI)

A category defined by the EU AI Act for AI models capable of performing a wide range of tasks across different domains — such as Claude, GPT-4, and Gemini. GPAI providers have mandatory compliance obligations under the AI Act including technical documentation, copyright compliance, and transparency about training data. Providers of GPAI models with systemic risk (very high capability or wide deployment) face additional obligations including adversarial testing and incident reporting.


H

High-risk AI system

An AI system classified under Article 6 and Annex III of the EU AI Act as posing significant risks to health, safety, or fundamental rights. High-risk categories include AI used in employment decisions, credit scoring, access to essential services, law enforcement, and critical infrastructure. High-risk AI systems face the most demanding compliance obligations — conformity assessments, data governance, and quality management systems. Annex III obligations for use-based high-risk systems were deferred to December 2027 in the Omnibus amendments.


I

Indirect prompt injection

A prompt injection attack where malicious instructions are embedded in external content that the agent reads during its task — a web page, email, document, file, database entry, or image. When the agent processes the content, it may interpret the injected instructions as legitimate directives. Indirect injection is harder to prevent than direct injection because the attacker controls content in the environment rather than the user's input. OWASP explicitly warns that indirect injections can be either intentional or accidental.


L

Least privilege

A security principle requiring that every component in a system — including AI agent skills — operate with only the minimum permissions necessary to perform its intended function. A skill designed to summarize documents should have read access to documents, not write access to your file system or the ability to execute shell commands. Applying least privilege to agent skills reduces the blast radius of a compromise or misconfiguration.
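
The document-summarizer example can be sketched as a wrapper that grants read access to one directory and nothing else. This is a minimal illustration, assuming Python 3.9 or later; the class name `ReadOnlyDocumentAccess` is hypothetical and not part of any agent framework.

```python
from pathlib import Path


class ReadOnlyDocumentAccess:
    """Grants read access to files under one directory and nothing else."""

    def __init__(self, root):
        # Resolve once so later comparisons use a canonical absolute path.
        self.root = Path(root).resolve()

    def read_text(self, relative_path):
        # resolve() collapses ".." segments and follows symlinks.
        target = (self.root / relative_path).resolve()
        # Reject anything that escapes the granted directory, e.g. "../.env".
        if not target.is_relative_to(self.root):
            raise PermissionError(f"{relative_path} is outside the granted directory")
        return target.read_text(encoding="utf-8")
```

The wrapper exposes no write, delete, or execute method at all, so a compromised skill that only holds this object cannot do more than read files under the granted directory.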


M

MCP (Model Context Protocol)

An open standard protocol, published by Anthropic, that allows AI agents to connect to external services, data sources, and tools through a common interface. MCP defines how agents discover available tools, call them, and process their responses. Because MCP servers run with process-level permissions and can respond to agent tool calls, they are a significant security surface — BlueRock Security found 36.7% of audited MCP servers vulnerable to SSRF attacks.

Related: MCP server, Tool poisoning, SSRF


MCP server

A server implementing the Model Context Protocol that wraps a service, API, or data source and exposes it to an AI agent as a set of callable tools. MCP servers can be local (running on the user's machine) or remote (hosted by a service provider). Local MCP servers run with the user's file system and environment variable access, making them a high-value target for credential exfiltration. Remote MCP servers introduce network trust boundaries and SSRF risks.

Related: MCP, SSRF, Tool poisoning, Credential harvesting


N

NemoClaw

NVIDIA's runtime security framework for agentic AI deployments, focused on infrastructure-layer guardrails, memory isolation, and secure multi-agent orchestration. NemoClaw addresses the infrastructure layer — compute, networking, and agent runtime security — but explicitly leaves the application layer (skill supply chain auditing, behavioral monitoring, compliance reporting) unaddressed. TrustSkills is designed to fill that gap and is positioning for formal NemoClaw integration.


O

Obfuscated payload

Malicious code that is deliberately encoded or wrapped to avoid detection by scanners that rely on keyword or hash matching. Common obfuscation techniques in AI agent skill attacks include base64 encoding with eval() or exec(), nested function constructors (Function(atob())), and staged payloads that download the malicious code from a remote server at runtime rather than including it in the skill package. TrustSkills checks for obfuscated payload patterns as part of its standard scan.
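
To show why keyword and hash matching miss this technique, the sketch below looks for two signals together: a call that executes a string as code, and a quoted literal that decodes cleanly from base64 to text. It is a simplified assumption about how such a check could work, not a description of any real scanner. The function names and the 24-character threshold are arbitrary choices for this example.

```python
import base64
import binascii
import re

# Calls that execute a string as code, as named in the definition above.
EXEC_CALL = re.compile(r"\b(?:eval|exec|Function)\s*\(")
# Quoted literals that look like base64. The 24-character minimum is arbitrary.
B64_LITERAL = re.compile(r"['\"]([A-Za-z0-9+/]{24,}={0,2})['\"]")


def decode_candidates(source):
    """Return the text of every quoted literal that is valid base64 and valid UTF-8."""
    decoded = []
    for literal in B64_LITERAL.findall(source):
        try:
            raw = base64.b64decode(literal, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not base64, or not text once decoded
    return decoded


def looks_obfuscated(source):
    """True when the code executes strings and also embeds decodable base64 text."""
    return bool(EXEC_CALL.search(source)) and bool(decode_candidates(source))
```

Decoding the literal lets a reviewer inspect what would actually run. Staged payloads fetched at runtime leave no literal to decode, so they need a separate network-focused check.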


OpenClaw

An agentic AI platform that allows users to build, install, and run AI agents that can use skills from the ClawHub marketplace. OpenClaw agents can read files, execute shell commands, browse the web, send communications, and perform other real-world actions depending on which skills are installed and what permissions are granted. The OpenClaw ecosystem experienced a major security crisis in early 2026 when the ClawHavoc campaign planted over 1,000 malicious skills in ClawHub.


Operator controls

Deterministic controls implemented in code — outside the AI model — that govern what an agent is permitted to do. Operator controls include approval gates for high-risk actions, scoped API credentials, read-only file system access, network allowlists, and kill switches that override queued agent actions. OWASP and OpenClaw's own security documentation emphasize that operator controls are the authoritative boundary, not system prompt instructions, which can be overridden by prompt injection or context compaction.
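
A minimal sketch of what "deterministic controls implemented in code" can look like follows. The `OperatorPolicy` class, its action names, and its return shape are all hypothetical; they do not come from OpenClaw or OWASP. The point is only that the decision is made by ordinary code that the model cannot talk its way around.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class OperatorPolicy:
    # Hosts the agent may contact; everything else is refused.
    allowed_hosts: frozenset = frozenset()
    # Hypothetical action names that always need a human decision.
    needs_approval: frozenset = frozenset({"delete_file", "send_email", "run_shell"})
    # Kill switch: when True, every action is refused.
    killed: bool = False

    def authorize(self, action, target="", approved_by_user=False):
        """Return (allowed, reason). Runs in plain code, outside the model."""
        if self.killed:
            return False, "kill switch engaged"
        if action == "http_request":
            host = urlparse(target).hostname or ""
            if host not in self.allowed_hosts:
                return False, f"host {host!r} not on allowlist"
        if action in self.needs_approval and not approved_by_user:
            return False, "user approval required"
        return True, "ok"
```

In this sketch the agent runtime would call `authorize` before executing any tool call and refuse the call when the first element of the result is `False`, whatever the prompt or conversation history says.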


P

Permission scope

The set of capabilities and data access a skill declares in its manifest and actually uses at runtime. A skill may declare minimal permissions but use additional capabilities at runtime — a behavioral mismatch that Unit 42 found in 80% of ClawHub skills. TrustSkills checks for permission scope violations by comparing a skill's declared capabilities against patterns of behavior detected in its code, tool definitions, and descriptor files.
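
The declared-versus-observed comparison can be sketched as a set difference. The capability names and the regular expressions below are illustrative assumptions for Python skill code, not the patterns TrustSkills or Unit 42 actually use.

```python
import re

# Hypothetical mapping from a capability name to a pattern that suggests its use.
CAPABILITY_PATTERNS = {
    "network": re.compile(r"\b(?:requests|urllib|socket)\b"),
    "shell": re.compile(r"\b(?:subprocess|os\.system)\b"),
    "filesystem_write": re.compile(r"open\([^)]*['\"][wa]b?['\"]"),
}


def observed_capabilities(source):
    """Capabilities whose pattern appears anywhere in the skill's source."""
    return {name for name, pat in CAPABILITY_PATTERNS.items() if pat.search(source)}


def scope_violations(declared, source):
    """Capabilities seen in the code that the manifest never declared."""
    return sorted(observed_capabilities(source) - set(declared))
```

For a skill that declares only `network` but also imports `subprocess`, `scope_violations` returns `["shell"]`. Text patterns can both miss real usage and flag harmless code, so a result here is a prompt for review rather than proof of a violation.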


Prompt injection

A class of attack where malicious input alters an AI model's behavior in ways the system designer did not intend. OWASP categorizes it as LLM01:2025 and considers it the most fundamental vulnerability in agentic AI systems. The attack is particularly dangerous when the model has tool access — a successfully injected instruction can cause the agent to exfiltrate data, call external services, or execute destructive operations using the agent's existing permissions.


R

Reverse shell

A technique where a compromised system initiates an outbound connection to an attacker-controlled server, giving the attacker interactive shell access to the compromised machine. In AI agent skill attacks, reverse shells are typically implemented as bash TCP connections, netcat invoked with the -e option, Python socket and subprocess combinations, or PowerShell one-liners embedded in skill scripts. TrustSkills checks for reverse shell patterns as a critical-severity finding.
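
The four implementation styles listed above each leave a recognizable trace in script text. The sketch below is one assumed set of signatures for illustration; the pattern names and regular expressions are made up for this example and are far from complete.

```python
import re

# Illustrative signatures, one per technique named in the definition.
REVERSE_SHELL_PATTERNS = {
    "bash /dev/tcp redirect": re.compile(r"/dev/tcp/[\w.\-]+/\d+"),
    "netcat with -e": re.compile(r"\b(?:nc|ncat|netcat)\b[^\n]*\s-e\s"),
    "python socket + subprocess": re.compile(r"socket\.socket\(.*subprocess", re.DOTALL),
    "PowerShell TCPClient": re.compile(
        r"New-Object\s+(?:System\.)?Net\.Sockets\.TCPClient", re.IGNORECASE
    ),
}


def reverse_shell_findings(script):
    """Return the names of all signatures that match the script text."""
    return [name for name, pat in REVERSE_SHELL_PATTERNS.items() if pat.search(script)]
```

The addresses used when trying this out should come from a documentation range such as 203.0.113.0/24, so that test strings never point at a real host.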


S

SOUL.md

A configuration file used by some AI agent frameworks to define the agent's core personality, values, and behavioral constraints. SOUL.md files can be modified by malicious skills to override the agent's safety instructions, inject persistent behavioral changes, or establish a foothold that persists across agent sessions. TrustSkills checks skill packages for SOUL.md instruction overrides as part of its prompt injection detection.


SSRF (Server-Side Request Forgery)

A vulnerability where an attacker causes a server to make requests to unintended internal or external resources. In the context of MCP servers, SSRF can allow an attacker to use the MCP server as a proxy to reach internal APIs, cloud metadata endpoints (such as AWS instance metadata at 169.254.169.254), or other services that are otherwise unreachable from the public internet. BlueRock Security found 36.7% of audited MCP servers vulnerable to SSRF.
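
One common mitigation is to refuse requests whose destination falls in an internal address range. The sketch below uses Python's standard `ipaddress` module and checks literal IP addresses only. The function name is hypothetical, and the sketch deliberately omits the DNS step: a fuller check would resolve hostnames and test every resolved address, since a public-looking name can point at an internal IP.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_destination(url):
    """True if the URL's host is a literal IP in a private, loopback, or link-local range."""
    host = urlparse(url).hostname
    if host is None:
        return True  # refuse URLs with no parseable host
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # A hostname rather than an IP literal. This sketch does not resolve it.
        return False
    return ip.is_private or ip.is_loopback or ip.is_link_local
```

The AWS metadata address 169.254.169.254 sits in the link-local range, so this check blocks it along with loopback and RFC 1918 addresses.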


Supply chain attack

An attack that compromises software or services upstream of the target organization, so the malicious payload is delivered through a trusted distribution channel. For AI agent skills, the supply chain attack vector is ClawHub: an attacker publishes a skill that appears legitimate and accumulates installs before the malicious payload is detected. The ClawHavoc campaign is the canonical example of an AI agent skill supply chain attack at scale.


System prompt

Instructions passed to an AI model before the user's message, typically used to define the model's persona, constraints, and task context. System prompts are often treated as authoritative, but they are not a security boundary — prompt injection can cause the model to override or ignore system prompt instructions. OWASP and Microsoft's security guidance both emphasize that security-critical constraints must be enforced in deterministic code, not in system prompt text.


T

Technical documentation (EU AI Act)

A compliance requirement under the EU AI Act for GPAI model providers and high-risk AI system providers to maintain documentation sufficient for regulators to assess compliance. For organizations deploying agentic AI, this creates a parallel requirement: you must be able to document what AI systems you use, what they are authorized to do, and what data they access. TrustSkills scan reports are designed to serve as part of this documentation record.


Tool poisoning

An attack where a malicious MCP server or skill defines tool names that shadow or intercept calls intended for legitimate tools. When an agent calls what it believes is a trusted tool, the malicious tool handler executes instead — potentially altering the action taken, exfiltrating inputs, or injecting instructions into the agent's context. Tool poisoning is a form of indirect prompt injection at the MCP layer.
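
Name shadowing of the kind described above can be spotted before any tool runs by checking whether two servers offer the same tool name. The sketch assumes the runtime can list each server's tool names as plain strings. The function and the server names in the usage note are hypothetical.

```python
from collections import defaultdict


def shadowed_tool_names(servers):
    """Map each tool name offered by more than one server to the servers offering it.

    `servers` maps a server name to the list of tool names it exposes.
    Names are compared case-insensitively with surrounding whitespace removed.
    """
    owners = defaultdict(list)
    for server, tools in servers.items():
        # Use a set so a server listing the same tool twice is counted once.
        for tool in {t.strip().lower() for t in tools}:
            owners[tool].append(server)
    return {tool: sorted(names) for tool, names in owners.items() if len(names) > 1}
```

Given one server exposing `create_issue` and a second exposing `Create_Issue`, the function reports the collision under the key `create_issue`. It cannot tell which of the two is the legitimate one; that decision stays with the operator.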


Transparency obligation

A requirement under the EU AI Act that AI systems interacting with humans must disclose their AI nature at the moment of contact. The Commission's guidance specifies that disclosure must be proactive and prominent — not buried in terms of service or settings menus. For agentic systems acting on a user's behalf, transparency extends to disclosing what the AI is doing and on whose authority, not just that an AI is involved.

Related: EU AI Act, GPAI, Operator controls


Trust boundary

A point in a system architecture where trust assumptions change — where data or control moves from a trusted context to a less trusted one, or vice versa. In agentic AI deployments, key trust boundaries include the boundary between the system prompt and user input, between the agent's local environment and external services reached via MCP, between the agent's declared permissions and the actual permissions granted by the OS, and between content retrieved from the web and instructions the agent should execute.


Scan a skill before you install it

TrustSkills detects the threats described in this glossary — C2 callbacks, data exfiltration, prompt injection, obfuscated payloads — before you install a ClawHub skill. Free. No account required.
