Most security approaches concentrate on input-layer defenses: prompt injection detection, input filtering, and request validation. This is only half of a complete security architecture.

The OWASP LLM Top 10 2025 identifies threats across two attack surfaces:

Input-layer threats:

  • LLM01: Prompt Injection - malicious instructions embedded in user inputs
  • LLM04: Data and Model Poisoning - compromised training or fine-tuning data
  • LLM07: System Prompt Leakage - exposure of internal instructions and configuration

Output-layer threats:

  • LLM02: Sensitive Information Disclosure - unauthorized data exposure in responses
  • LLM05: Improper Output Handling - SQL injection, XSS, or code execution from LLM-generated content
  • LLM06: Excessive Agency - unauthorized tool calls or actions beyond intended scope

In practice, security resources flow overwhelmingly to input validation, on the assumption that blocking malicious inputs prevents all attacks. That assumption breaks down once agents make autonomous decisions.

Even when input validation succeeds, agents still generate outputs - tool calls, database queries, API requests. Without output validation, there’s no mechanism to verify these actions are authorized.
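As a concrete illustration, the sketch below (Python) gates every agent-generated tool call against a per-agent allowlist before dispatching it. The tool names, registry, and policy structure are illustrative assumptions, not any specific framework's API.

    # Hypothetical tools; names and signatures are illustrative.
    def search_docs(query: str) -> str:
        return f"results for {query!r}"

    def issue_refund(order_id: str) -> str:
        return f"refund issued for order {order_id}"

    TOOL_REGISTRY = {"search_docs": search_docs, "issue_refund": issue_refund}
    ALLOWED_TOOLS = {"search_docs"}  # this agent may search, but not move money

    def execute_tool_call(tool_name: str, arguments: dict):
        # The check runs on the agent's output: even a tool call produced in
        # response to a perfectly benign input is validated before it runs.
        if tool_name not in ALLOWED_TOOLS:
            raise PermissionError(f"unauthorized tool call: {tool_name}")
        return TOOL_REGISTRY[tool_name](**arguments)

    execute_tool_call("search_docs", {"query": "return policy"})  # runs
    try:
        execute_tool_call("issue_refund", {"order_id": "1042"})
    except PermissionError as exc:
        print(exc)  # unauthorized tool call: issue_refund

An agent manipulated into emitting issue_refund is stopped at this layer even though its input passed every filter.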

The gap in input-only security

Consider three scenarios. An agent receives a request that passes input validation but constructs a SQL query joining unauthorized tables. An agent processes a legitimate request but attempts to invoke restricted functions. An agent generates a response containing PII that should never be disclosed.

Input-layer defenses provide no protection because the vulnerability manifests in what the agent executes, not what it receives.
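The first scenario can be caught by a query validator that sits between the agent and the database. This is a minimal sketch: it extracts table references with a naive regex, so a production validator should use a real SQL parser, and the authorized table names are assumptions chosen for illustration.

    import re

    AUTHORIZED_TABLES = {"orders", "products"}  # this agent's role, illustrative

    # Naive table extraction; a real implementation should parse the SQL
    # properly rather than pattern-match it.
    TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", re.IGNORECASE)

    def validate_query(sql: str) -> None:
        referenced = {t.lower() for t in TABLE_REF.findall(sql)}
        unauthorized = referenced - AUTHORIZED_TABLES
        if unauthorized:
            raise PermissionError(f"query references unauthorized tables: {unauthorized}")

    validate_query("SELECT id, total FROM orders")  # passes
    try:
        validate_query("SELECT * FROM orders o JOIN salaries s ON s.emp_id = o.emp_id")
    except PermissionError as exc:
        print(exc)  # query references unauthorized tables: {'salaries'}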

Why output validation is essential

Input monitoring detects attack attempts. Output monitoring detects unauthorized operations before execution.

For AI agents with access to tools, databases, or APIs, the output layer is where business logic attacks succeed. These attacks manipulate the agent’s reasoning to generate operations that violate authorization policies.

Output validation provides:

  • Authorization enforcement for tool calls
  • Query validation before database execution
  • Content filtering before data disclosure (sketched after this list)
  • Action approval workflows for high-risk operations
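For the content-filtering control, a minimal sketch follows: it redacts detected PII from an agent response before disclosure. The two patterns are deliberately simplistic assumptions; a real deployment would rely on a dedicated PII detection service rather than a pair of regexes.

    import re

    # Illustrative PII patterns; far from exhaustive.
    PII_PATTERNS = {
        "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    }

    def filter_response(text: str) -> str:
        # Redaction happens on the output path, after generation
        # but before the response reaches the user.
        for label, pattern in PII_PATTERNS.items():
            text = pattern.sub(f"[REDACTED {label.upper()}]", text)
        return text

    print(filter_response("Contact jane@example.com, SSN 123-45-6789"))
    # Contact [REDACTED EMAIL], SSN [REDACTED SSN]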

Implementation requirements

  1. Contextual authorization - validating whether the agent is permitted to perform this specific action
  2. Real-time monitoring - inspection before execution, not post-incident analysis
  3. Business logic awareness - distinguishing legitimate operations from unauthorized ones (see the sketch after this list)
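A minimal sketch tying the three requirements together, built around a hypothetical refund workflow (every name and rule below is an assumption for illustration): the check is contextual because it binds the action to the requesting user, real-time because it gates execution itself, and business-logic-aware because it encodes a concrete policy about refunds.

    from dataclasses import dataclass

    @dataclass
    class ActionContext:
        user_id: str    # who the agent is acting on behalf of
        action: str     # what the agent wants to do
        target: str     # e.g. "order:alice:1042"; format is illustrative
        amount: float = 0.0

    REFUND_LIMIT = 500.0  # illustrative business rule

    def is_authorized(ctx: ActionContext) -> bool:
        # Contextual authorization with business logic awareness: the same
        # action is legitimate or not depending on who asks, and for what.
        if ctx.action == "issue_refund":
            owns_order = ctx.target.startswith(f"order:{ctx.user_id}:")
            return owns_order and ctx.amount <= REFUND_LIMIT
        return ctx.action == "read_status"

    def execute(ctx: ActionContext, handler):
        # Real-time monitoring: the check gates execution itself,
        # rather than flagging the action in a log afterwards.
        if not is_authorized(ctx):
            raise PermissionError(f"blocked: {ctx.action} on {ctx.target}")
        return handler(ctx)

    execute(ActionContext("alice", "issue_refund", "order:alice:1042", 120.0),
            lambda ctx: print("refund processed"))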

Input validation prevents attacks from entering. Output validation prevents unauthorized execution.

Organizations deploying AI agents without output monitoring assume input controls are sufficient. This assumption fails predictably.

The question is whether you implement output validation before or after the first incident.