BlogAI

Securing AI applications: a checklist for Azure OpenAI deployments

Cloudsa Systems··
#azure-openai#ai-security#llm
AI and machine learning concept

Teams ship Azure OpenAI into production fast because the API is easy. The security usually lags the deployment. In a regulated environment that gap is a finding waiting to happen, and the threat model is broader than people assume.

Start with what you’re defending against. Data exfiltration, where sensitive context gets pulled out through crafted prompts. Model abuse, where the deployment is driven to do something it shouldn’t. Cost explosion, where a runaway agent or an abuser burns through tokens and budget. Output problems, where the model returns toxic or non-compliant content. And prompt injection, where untrusted input hijacks the model’s instructions. Each of these maps to a control.

The reason this needs its own checklist is that a language model deployment has a wider blast radius than a typical API. It often sits close to sensitive data because that’s what makes it useful, it can be steered by anyone who can influence its input, and its cost scales with usage in a way that turns abuse into a bill. The controls below close those gaps in roughly the order we apply them.

Here are the eight we apply to every Azure OpenAI deployment that handles anything sensitive.

1. Private endpoint on the resource

Put a private endpoint on the Azure OpenAI resource and disable public network access. Traffic stays on the Microsoft backbone and reaches the model over your VNet rather than a public endpoint anyone can hit. This removes the deployment from the public internet entirely and is the foundation everything else sits on.

2. Customer-managed keys for encryption

Use customer-managed keys (CMK) in Key Vault to encrypt the data the service handles at rest, rather than relying on platform-managed defaults. For regulated workloads this gives you control over the key lifecycle and a revocation path, and it’s frequently a hard requirement in the compliance frameworks these deployments live under.

3. Regional pinning

Know where the prompt and completion data physically goes, and pin it. Deploy the resource into the region your data residency obligations allow, and use the deployment types that keep processing in that region rather than routing elsewhere for capacity. For a business with sovereignty requirements, “the prompt left the region” is a compliance incident, so make the location deliberate.

4. Audit log retention to a SIEM

Send the resource’s diagnostic logs to a SIEM with at least a year of retention. You want a durable record of who called the model, when, with what, and what came back, well beyond the platform’s default retention. When an investigation asks what data went through the model six months ago, the answer has to exist.

5. Content filter configuration

Azure OpenAI ships content filters. Don’t leave them at the default; tune them to your policy. Set the severity thresholds for the categories that matter to your context, and configure the behaviour for borderline content deliberately. The built-in filtering is capable, but the defaults are generic and your acceptable-use line is specific to you.

6. Prompt injection defences at the application layer

The model can’t fully defend itself against injection, so the application has to help. Validate and constrain untrusted input before it reaches the prompt, separate system instructions from user content clearly, and treat any text that originates from outside your trust boundary, including retrieved documents, as potentially hostile. Injection is the attack that turns a helpful assistant into a confused deputy, and the defence lives in your code, not the model.

7. Output filtering

Filter what comes back before it reaches a user or a downstream system. Scan completions for PII, scrub sensitive data, and check for content that violates your policy. The model can surface data it was given in context or generate something non-compliant, and output filtering is the layer that catches it before it leaves your boundary.

8. Rate limiting and abuse monitoring

Put rate limits on usage, per user and per token budget, and monitor for abuse patterns. This caps the damage from a runaway agent looping on itself, a compromised credential, or a deliberate abuser, all of which show up first as a cost spike. Alert on anomalous consumption the way you would on any other billing anomaly, because with token-billed AI the runaway case gets expensive fast.

Set the limits in two places. The application layer enforces per-user and per-session budgets so one account can’t dominate the deployment. The Azure quota and a budget alert on the resource act as a backstop so a bug or an abuse run can’t quietly run up an unbounded bill before anyone notices. The two layers cover different failure modes: the first stops a single bad actor, the second caps the total exposure regardless of source.

AI security is cloud security, one layer deeper

None of these controls are exotic. Private networking, managed keys, regional control, durable logging, and rate limiting are the same disciplines you apply to any sensitive Azure workload. The AI-specific pieces, content filtering, injection defence, and output scrubbing, sit on top of that foundation rather than replacing it. AI security isn’t a separate practice. It’s your existing cloud security with a few additional layers for the parts a language model introduces.

If you’re putting Azure OpenAI into a regulated environment and want these controls built in from the start, our AI and ML team does this work alongside the cloud security that underpins it. Book a consultation and we’ll review your deployment against this checklist and tell you which layers you’re missing.