Security for LLM and agent systems

Home
AI
Agent, RAG, MCP & ML
Security for LLM and agent systems

Date: 09.02.2026

Introduction

Generative features inherit the same engineering bar as any internet-facing service—with extra attack surface from untrusted text in context, tools and MCP servers callable by the model, and retrieval that can pull in poisoned or private material. The checklist below matches the Security lens used on Production Agent-RAG Architectures: data ingest and index, API and edge, LLM and prompt safety, then privacy and operations.

Data ingestion: cleaning and sanitization

Everything you chunk, embed, or index is attacker-controlled surface once it can enter a prompt. Ingestion should assume messy sources, deliberate poisoning, and compliance constraints.

🔥 Trust and scope — define allowed sources; control who may upload or sync; require approval for untrusted corpora when risk warrants it.
🔥 Pipeline shape — extract → normalize → de-boilerplate → preserve structure → chunk → embed → index; keep failure reasons inspectable.
🔥 Uploads — allowlisted MIME types; hard max size; reject executables; malware scan; extract text in a sandbox with no execution of file payloads.
🔥 Metadata and access — ACL, tenant, and role on every chunk; enforce at query time; dedupe by content hash (update or skip re-uploads).
🔥 Failure handling — dead-letter queue with reason (parse errors, oversize, virus); never silently drop audit-critical rejects.
🔥 Storage — object storage with no execution from the bucket; short-TTL signed URLs when you must hand out direct reads.

AI API security

Model and orchestration endpoints are high-cost, high-blast-radius APIs: abuse for scraping, credential theft, and chained tool calls.

🔥 Transport and edge — TLS only; HSTS for browser surfaces; CDN or reverse proxy with DDoS and WAF (coarse rules, not a substitute for app validation).
🔥 AuthN and AuthZ — short-lived access JWTs with refresh rotation and revocation; mTLS or service identity for internal hops; tenant-scoped authorization on every request.
🔥 Headers and browser policy — set secure headers (XSS, CSRF, SSRF protections as applicable); CORS intentionally; CSRF strategy when you use cookie sessions.
🔥 Validation and abuse — schema-based bodies; global and per-tenant rate limits and quotas; timeouts on LLM and vector DB calls; max payload sizes.
🔥 API design — versioned paths; idempotency keys for mutating and ingest jobs.
🔥 Errors and logging — centralized handling; generic errors to clients; no secrets or raw stack traces in responses; redact tokens from logs; lock down OpenAPI in production if it leaks internals.
🔥 Orchestration and tools — allow-lists for MCP and HTTP tools; scoped credentials; confirmations on destructive, financial, or cross-tenant actions.

Input handling for AI workloads

Chat, uploads, and retrieved spans share one context unless you separate policy from data—where indirect prompt injection and tool abuse actually land.

🔥 Moderation — classify user input and optionally model output; block or clamp when above risk thresholds.
🔥 Tools — strict allowlist; no shell, arbitrary fetch, or code execution unless routed through a gated pipeline; stricter limits on agent and tool routes than on plain chat.
🔥 Prompting — clearly delimit user or document content; keep developer instructions apart from retrieved facts; treat every retrieved string as untrusted (indirect injection).
🔥 Policy and governance — system rules for scope and refusal; human review for high-risk corpora or admin-only uploads when needed.
🔥 RAG hygiene — cap and score context; tombstone or rebuild poisoned segments; never let a hit override system policy by wording alone.
🔥 Multimodal — sandbox parsers; treat extracted text as hostile; do not let filenames or EXIF steer behavior.
🔥 Downstream use — encode outputs when rendered as HTML; validate tool arguments before side effects.

Privacy & operations

🔥 Logging — PII classification; retention limits; redaction where required (GDPR-style).
🔥 Monitoring — latency, retrieval health, cost, failures (as in Monitoring / Analytics).

Conclusion

Ingest, API edge, prompt surface, and privacy ops are one system: weak ingestion poisons retrieval; weak edge control leaks keys; weak prompt boundaries let tools misfire; weak logging hides abuse. Revisit the checklist whenever you change models, corpora, or tool allowlists—and keep it aligned with how you ship Agent-RAG in production.