Trust boundaries between agents
Zero-trust principles apply strongly to multi-agent systems. Each agent should be treated as a distinct trust domain with its own perimeter, permissions, and monitoring. Compromise or malfunction in one agent must not automatically grant access to other agents, tools, or data sources.
Trust boundary types in multi-agent systems
| Boundary type | What it constrains | Implementation example |
|---|---|---|
| Network isolation | Which agents can communicate directly | Service mesh policies, network segmentation, or separate VPCs/subnets |
| Permission scope | What each agent is allowed to do | Least-privilege IAM roles, scoped API tokens, or RBAC policies |
| Data access | Which data sources an agent can query | Column-level security, row-level filters, or data product contracts |
| Action allowlist | Which tools or APIs an agent may invoke | Tool gateway that only permits pre-registered tool calls |
| Rate limits | How frequently an agent can act | Per-agent quotas, throttling, or circuit breakers |
Trust boundaries in a multi-agent workflow
Loading diagram...
Each agent is a distinct trust domain
In a multi-agent system, the orchestrator, specialist agents, and tool gateways each represent separate trust domains. Compromise at one layer should not automatically grant trust at another. Apply zero-trust principles: verify explicitly, use least privilege, and assume breach.
The Agentic Enterprise — Security and Governance Layer
Salesforce Architect guide covering the Security and Governance cross-layer in the 11-layer IT reference architecture, including LLM I/O guardrails, Zero Trust with AI verification, Agent Security Framework, Privacy-Preserving AI, and Policy-as-Code engine.
Read the Security and Governance sectionIdentity-aware RAG and security trimming
In an enterprise, an agent's access to information must be strictly governed by the permissions of the invoking user. Security Trimming ensures that RAG processes do not retrieve or "leak" information that a user is not authorized to see.
Identity-Aware RAG Execution
Loading diagram...
Least Privilege per Agent
AWS and Azure both emphasize that agents should not use "God-mode" service accounts. Instead, each specialized agent (e.g., Billing Agent) should be assigned a narrow IAM role or scoped token that only permits access to the specific data and tools required for its domain. This limits the blast radius if an agent is tricked into a malicious action.
Identity-aware RAG and security trimming
In an enterprise, an agent's access to information must be strictly governed by the permissions of the invoking user. Security Trimming ensures that RAG processes do not retrieve or "leak" information that a user is not authorized to see.
Identity-Aware RAG Execution
Loading diagram...
Least Privilege per Agent
AWS and Azure both emphasize that agents should not use "God-mode" service accounts. Instead, each specialized agent should be assigned a narrow IAM role or scoped token that only permits access to the specific data and tools required for its domain. This "Defense-in-Depth" approach ensures that even if one agent is compromised, the overall enterprise attack surface remains minimized.
Content and behavior guardrails
Guardrails are the primary mechanism for constraining agent behavior to stay within policy, safety, and operational boundaries. Unlike suggestions or prompts, guardrails are enforced policies that can block, modify, or route agent actions.
Guardrail types and their roles
| Guardrail type | What it catches | Implementation approach |
|---|---|---|
| Input filtering | Malicious, out-of-scope, or malformed user requests | Pre-processing validation, topic classification, or prompt injection detection |
| Output filtering | Harmful, inaccurate, or policy-violating responses before they reach the user | Post-processing classification, redaction, or refusal routing |
| Topic restriction | Attempts to steer the agent outside its defined scope | Intent recognition, scope allowlists, or off-topic detection |
| Action gating | Unauthorized tool calls or dangerous operations | Tool allowlists, parameter validation, or human approval gates |
| Tone/style enforcement | Unprofessional, inconsistent, or brand-inconsistent language | Style classifiers, template constraints, or rewrite guards |
Defense in depth matters. Apply guardrails at the model level (system prompts and tool definitions), at the tool level (API-side validation), and at the action level (workflow-side checks). A failure at one layer should be caught by another.
Guardrails are testable policies, not suggestions
Design guardrails as measurable, testable rules. Log guardrail triggers, measure false positive/negative rates, and treat guardrail violations as signals for policy refinement, not mere noise.
The Trust Layer Pattern
The Trust Layer pattern isolates security, data masking, and compliance checks from the core reasoning loop of the agent. This ensures that privacy and safety policies are consistently enforced regardless of the specific agent or underlying language model.
A well-architected Trust Layer typically includes:
- Secure Data Retrieval (Grounding): Enforcing data access policies so an agent only retrieves records the invoking user is permitted to see.
- Data Masking: Automatically detecting and masking Personally Identifiable Information (PII) or Payment Card Industry (PCI) data before the prompt is sent to the LLM.
- Prompt Defense: Scanning the input for injection attacks or jailbreak attempts.
- Toxicity & Safety Detection: Evaluating the model's output for bias, toxicity, or policy violations before returning it to the user.
- Zero Data Retention: Contractual and technical guarantees that the LLM provider will not retain enterprise data or use it for model training.
- Demasking: Restoring masked data to its original form securely before the final response is delivered.
Trust Layer Execution Flow
Loading diagram...
Standardized Interoperability (MCP)
As agentic systems scale, governing how agents access tools and data becomes a bottleneck. The Model Context Protocol (MCP) provides a standardized, secure architecture for connecting AI models to data sources and tools.
MCP establishes a client-server architecture where the agent (Client) connects to specialized MCP Servers. This creates an explicit trust boundary:
- Decoupled Permissions: The MCP Server enforces access control to the underlying data source, not the agent itself.
- Standardized Tool Definitions: Agents discover available tools dynamically through the protocol rather than hardcoded integrations.
- Isolated Execution: Tool execution happens within the MCP Server's secure environment, protecting enterprise systems from arbitrary agent code execution.
Model Context Protocol Boundary
Loading diagram...
Audit trails and responsible AI governance
Production agents need full decision traceability. When something goes wrong, investigators must be able to reconstruct what the agent considered, what tools it used, what policies applied, and what humans approved.
What to log for agent decision traceability
| Log category | What to capture | Why it matters |
|---|---|---|
| Agent decision | Final choice or action with confidence score | Shows what the agent decided and how certain it was |
| Reasoning chain | Steps, evidence retrieved, and intermediate conclusions | Enables reconstruction of the decision process |
| Tool inputs/outputs | Parameters sent to tools and responses received | Critical for understanding external system interactions |
| Human approvals | Who reviewed, when, and what they decided | Human review is an operating control, not an afterthought |
| State changes | Workflow objects modified or created | Required for rollback and reprocessing after incidents |
| Error events | Failures, timeouts, and guardrail violations | Pattern detection in errors signals systemic issues |
Responsible AI governance frameworks provide structure for these controls. The NIST AI Risk Management Framework (AI RMF) emphasizes govern, map, measure, and manage. ISO 42001 specifies requirements for AI management systems. The EU AI Act categorizes systems by risk level and imposes corresponding obligations.
Governance is an ongoing process, not a one-time checklist
Deploy with guardrails, but monitor continuously. Set up alerts for guardrail violations, review traces periodically, and reassess risk as models, usage patterns, and regulations evolve.
Cloud-native security & governance mapping
| Pattern / Control | AWS (Bedrock) | Azure (AI Foundry) | GCP (Vertex AI) |
|---|---|---|---|
| Input/Output Guardrails | Bedrock Guardrails | Azure AI Content Safety | Vertex Safety Filters |
| Identity & Access | IAM Roles & Policies | Microsoft Entra ID | IAM Conditions & Scopes |
| Audit & Traceability | CloudWatch & CloudTrail | Azure Monitor Logs | Cloud Logging & Monitoring |
Knowledge Check
Test your understanding with this quiz. You need to answer all questions correctly to mark this section as complete.