TrustSkills
Blogs
<- Back to blogs
Basic knowledge7 min read

What is prompt injection?

Prompt injection is not just a clever string. It is any input that changes a model's behavior in a way the system designer did not intend, especially when the model can reach tools, data, and accounts.

Why this matters

  • Prompt injection can come from the user directly or indirectly through emails, web pages, files, images, and other external content.
  • A strong system prompt helps, but it is not a security boundary.
  • The real danger appears when injected instructions reach tools with read, write, or execute permissions.
  • Defenders reduce risk by treating external content as unsafe, minimizing privileges, and requiring approval for high-risk actions.

The short definition

OWASP defines prompt injection as a vulnerability where prompts alter an LLM's behavior or output in unintended ways. That sounds abstract until you connect it to agents. In an agentic system, a malicious or malformed input can do more than change a sentence. It can influence which tool is called, what data is retrieved, and whether an action is taken.

That is why prompt injection should be treated as a control-plane problem, not just a content-quality problem. If the model can touch sensitive systems, then prompt injection becomes an access and authorization issue.

Direct versus indirect prompt injection

Direct prompt injection happens when the attacker puts the instruction into the message they send to the model. Indirect prompt injection happens when the attacker plants instructions in external content that the model later reads, such as a web page, email, document, issue comment, or image.

OWASP explicitly warns that indirect injections can be either intentional or accidental. In practice, that means the dangerous input might be a malicious page, but it might also be a normal business artifact that contains surprising text the agent interprets as instructions.

  • Direct example: a user tells the assistant to ignore previous rules and reveal hidden data.
  • Indirect example: a web page or email contains hidden instructions that cause the assistant to change behavior when it summarizes the content.
  • Multimodal example: an instruction is hidden in an image or other non-obvious format that the model can still parse.

Why system prompts are not enough

Microsoft's prompt-injection guidance for Semantic Kernel shows the core engineering lesson clearly: content inserted into prompts should be treated as unsafe by default and encoded unless the developer has a good reason to trust it. That is a far stronger pattern than hoping the model always remembers which text is authoritative.

OpenClaw's own security docs say something similar in operational language. Prompt guardrails reduce abuse risk, but hard enforcement comes from policy, sandboxing, authentication, allowlists, and downstream authorization. In other words, models can assist with security, but they cannot replace it.

How defenders reduce the blast radius

There is no magic prompt that permanently solves injection. The practical goal is to make injected instructions less likely to succeed and far less damaging when they do.

  • Treat all untrusted external content as unsafe by default and separate it from system instructions.
  • Validate outputs and tool arguments in deterministic code, not just in the model.
  • Grant the minimum privileges the workflow needs. Read-only beats read-write. A specific action beats open-ended shell or browser control.
  • Require a human to approve high-impact actions such as deletion, posting, payment, or credential changes.
  • Run adversarial tests against the exact agent workflow you plan to deploy, including emails, documents, links, and file uploads.

Trusted sources

OWASP

LLM01:2025 Prompt Injection

Open source

Primary source for the definition, direct versus indirect injection, and mitigation patterns.

Microsoft Learn

Protecting against Prompt Injection Attacks in Chat Prompts

Open source

Used for the engineering pattern of treating inserted content as unsafe by default and encoding it unless explicitly trusted.

OpenClaw Docs

Security

Open source

Used for the operational framing that access control, sandboxing, and policy must sit outside the model.

Continue reading

View all blogs