Design Patterns for Securing LLM Agents Against Prompt Injection Attacks
Large Language Model (LLM) agents are transforming how we interact with software systems, but they introduce critical security vulnerabilities. Among the most dangerous threats are prompt injection attacks, where malicious instructions embedded in content manipulate an agent’s behavior to perform unauthorized actions.
While securing general-purpose agents remains challenging, researchers have identified six design patterns that can make application-specific agents resistant to prompt injection attacks.
The Core Security Challenge
Prompt injection attacks occur when adversaries introduce malicious instructions into content processed by an LLM. These attacks can lead to:
- Data exfiltration and privilege escalation
- Remote code execution through tool access
- Manipulation of agent reasoning and outputs
- Denial of service through resource exhaustion
Traditional security frameworks struggle with these attacks because LLMs process natural language input, making it difficult to distinguish between legitimate instructions and malicious injections.
Six Security Design Patterns
1. Action-Selector Pattern
The agent acts as a translator between natural language requests and predefined actions. It selects from a fixed list of allowed operations without processing feedback from those actions.
Example: A customer service chatbot with predetermined responses like “retrieve order status” or “redirect to password settings.”
Security benefit: The LLM never directly processes untrusted data, making prompt injection impossible.
2. Plan-Then-Execute Pattern
The agent creates a fixed execution plan before processing any untrusted data. While tool outputs may contain malicious content, they cannot alter the predetermined sequence of actions.
Example: An email assistant that commits to specific tool calls (read calendar, compose email) before executing them.
Security benefit: Provides control flow integrity—malicious data cannot change what actions the agent takes.
3. LLM Map-Reduce Pattern
The system dispatches isolated sub-agents to process individual pieces of untrusted data (map operation), then aggregates results through either non-LLM operations or LLMs with strict output constraints (reduce operation).
Example: A file analysis system that processes each document separately, returning only boolean results about whether files contain invoices.
Security benefit: Malicious content in one document cannot affect processing of other documents.
4. Dual LLM Pattern
This pattern separates privileged and quarantined LLM instances:
- Privileged LLM: Receives instructions and can use tools, but never processes untrusted data
- Quarantined LLM: Processes untrusted data but cannot use tools or take actions
Results from quarantined LLMs are stored symbolically and dereferenced only by traditional software components.
Security benefit: Untrusted data never reaches the decision-making LLM.
5. Code-Then-Execute Pattern
The agent writes a formal program to solve tasks, which may call tools and spawn unprivileged LLMs to process untrusted content. This generalizes the plan-then-execute pattern by replacing planning with explicit programming.
Security benefit: Provides the strongest isolation between untrusted data and agent control flow.
6. Context-Minimization Pattern
The system removes user prompts from the LLM’s context after processing initial requests, preventing prompt injections from affecting subsequent operations.
Example: A customer service system that translates user requests into database queries, then removes the original prompt before returning results.
Security benefit: Eliminates user prompt injections from downstream processing.
Real-World Applications
The researchers demonstrate these patterns through ten case studies:
- OS Assistant: Uses dual LLM pattern to safely process file contents
- SQL Agent: Applies code-then-execute to prevent database manipulation
- Email Assistant: Combines plan-then-execute with user confirmation
- Customer Service: Implements action-selector with context minimization
- Medical Diagnosis: Uses context minimization to prevent manipulation of doctor responses
Implementation Trade-offs
Each pattern involves security-utility trade-offs:
High Security, Lower Utility: Action-selector and context-minimization patterns provide strong security but limit flexibility.
Balanced Approach: Plan-then-execute and map-reduce patterns maintain reasonable utility while providing meaningful security guarantees.
Complex but Powerful: Dual LLM and code-then-execute patterns offer the strongest security but require more sophisticated implementation.
Key Recommendations
- Prioritize application-specific agents over general-purpose ones when security matters
- Combine multiple patterns for robust defense—no single pattern addresses all threats
- Define clear trust boundaries between privileged and unprivileged components
- Implement traditional security practices like sandboxing and least privilege alongside these patterns
The Path Forward
While general-purpose agents that can solve arbitrary tasks remain vulnerable to prompt injection, these design patterns show that secure, useful AI agents are achievable today. The key is accepting intentional constraints on agent capabilities in exchange for security guarantees.
As LLM agents become more prevalent in critical applications, adopting these principled design patterns will be essential for safe deployment. The research provides a practical foundation for building AI systems that remain secure even when processing untrusted content.