Prompt Injection in Production: Why Your AI Deployment Has an Attack Surface Your Policy Doesn't Cover

Prompt injection is not a theoretical concern. It is a class of attack that exploits the fundamental mechanism of large language models: the fact that they cannot reliably distinguish between developer instructions and malicious content embedded within seemingly legitimate inputs.

For CISOs thinking about AI governance, prompt injection sits in an uncomfortable space. It is not a traditional software vulnerability you can patch. It is not a misconfiguration you can fix in a cloud console. It is a behaviour that emerges from how the model works, and mitigating it requires governance controls alongside technical ones.

What Is Prompt Injection and Why Does It Affect Production AI Systems?

Prompt injection occurs when an attacker crafts an input that overrides or manipulates the original instructions given to an AI system. In a production deployment, this might mean embedding instructions in a document your AI assistant is asked to summarise, or crafting a user message that causes an AI agent to take actions its operator never intended.

Direct prompt injection targets the user-facing interface. Indirect prompt injection targets content the AI processes on your behalf: emails it reads, documents it summarises, web pages it retrieves. Indirect injection is harder to detect because the malicious instruction never appears in the direct conversation between user and system.

What Can an Attacker Actually Do With a Prompt Injection Attack?

The answer depends entirely on what your AI system has access to. An AI assistant that reads emails and can send replies is a different attack surface from a read-only summarisation tool.

LimitedView's analysis of AI deployments across enterprise environments shows that most organisations deploying AI agents with tool access have not mapped the full capability chain from injection to action. The question is not whether a prompt injection is possible. For any model in production today, some form of injection is feasible under the right conditions. The operational question is: if an injection succeeds, what can the model be made to do?

If the answer includes reading files, sending communications, calling internal APIs, or interacting with line-of-business systems, then prompt injection is a breach-class risk. It should be treated accordingly in your governance framework.

How Should AI Governance Frameworks Address Prompt Injection?

Governance frameworks need to address three things: capability restriction, monitoring, and incident response.

Capability restriction means applying the principle of least privilege to AI agents. An AI system that summarises contracts does not need write access to your document management system. An AI assistant that answers HR policy questions does not need access to payroll data. The architecture question of what access an AI agent requires is a governance question. It belongs in your AI control framework before deployment, not after an incident prompts the review.

Monitoring means treating AI system behaviour as a source of security telemetry. If an AI agent begins making unusual API calls, accessing data outside its normal operating pattern, or producing outputs inconsistent with its assigned task, that is an anomaly worth investigating. Most organisations currently have no detection capability for this. They know if a user does something unusual. They do not know if an AI agent does.

Incident response means having a defined process for when an AI system is suspected of having been manipulated. This includes who owns the decision to suspend the system, what evidence is preserved for investigation, and how affected users or data subjects are notified if the manipulation resulted in data exposure or unauthorised action.

Does Output Filtering Prevent Prompt Injection?

Output filtering helps with some scenarios. It does not solve the problem. Filtering checks what the model says. Prompt injection is often about what the model does. An AI agent that exfiltrates data to a third-party service via an injected instruction may produce no harmful visible output whatsoever.

The governance implication is practical. Output filtering is a useful control for preventing an AI system from producing harmful content. It is an incomplete control for preventing AI agents from taking harmful actions. Actions require architectural constraints.

What Governance Controls Actually Reduce Prompt Injection Risk?

Four controls matter most.

Capability scoping comes first. Define and enforce what each AI deployment can access and do. Document this explicitly. Review it before any capability expansion, not just before the initial deployment.

Input validation at the boundary between external content and AI processing comes second. Documents, emails, and web content that your AI system will process should be treated as untrusted input, the same way you treat user input in any other application. The discipline exists. Apply it here.

Human approval gates for high-consequence actions come third. Any AI agent that can send communications, modify records, or interact with external systems should require human approval for actions above a defined threshold. Where that threshold sits is a governance decision, not a technical one, and it should be made explicitly.

AI-specific incident response capability comes fourth. When a prompt injection incident occurs, you need people who understand what the AI was doing, what tools it had access to, and how to reconstruct the chain of actions that followed the injection. That requires a different skill set from traditional incident response. Develop it before you need it.

Prompt injection governance is not about making AI systems perfect. It is about knowing what your AI deployments can be made to do if they are manipulated, and ensuring that the answer is as constrained as your actual business requirements allow.

Prompt Injection in Production: Why Your AI Deployment Has an Attack Surface Your Policy Doesn't Cover

What Is Prompt Injection and Why Does It Affect Production AI Systems?

What Can an Attacker Actually Do With a Prompt Injection Attack?

How Should AI Governance Frameworks Address Prompt Injection?

Does Output Filtering Prevent Prompt Injection?

What Governance Controls Actually Reduce Prompt Injection Risk?

More Insights

Role-Based Security Training: What Behaviour Change Data Shows When You Stop Treating Everyone the Same

Third-Party Data Breach: What Your Organisation Owes When a Supplier Leaks Your Customer Data

Account Takeover Incidents: What Credential Stuffing Looks Like When It Hits Your Organisation

Ready to Move from 12% to 73%?