SK CREATION
System Architecture

Agentic Workflow System Design

A production-ready reference architecture for multi-agent applications. Features LangGraph stateful orchestration, conditional HITL routing, isolated tool execution, and unified safety constraints.

Web App

Mobile App

API Gateway & Auth

Authentication & tenant context

Input Guardrail

Safety, PII, scope & permission checks

LangGraph Supervisor

Reads graph state & routes intent

Shared Graph State

Session state, context & tool results

Knowledge Agent

Vector RAG & docs

RAG Pipeline
Query Rewrite
Retriever
Rank Chunks
Generate Answer
Citations

Task Execution

Action execution & writes

Tool Execution
Plan Action
Permission Check
MCP / API Call
Result Normalization

Account Manager

Billing & seat limits

State & Entitlements
Billing & Seats
Entitlements Check
Account State

Retention Agent

Churn risk mitigation

Escalation Route
Detect Churn Risk
Prepare Escalation

Output Validation & Guardrails

Policy, hallucination, citation & PII checks

Route by Source Agent & Risk Level
Direct Path: if Knowledge / Account Agent → passes Safety Level

User Response

Instant resolution via UI

Human Handoff Route: if Task Agent (MCP) → High Risk, or Retention Agent → Escalation

ServiceNow / Salesforce

Human queue & CRM logging

User Response

Guided handoff / CTA
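The routing rule in the diagram above can be sketched in a few lines of plain Python. This is only an illustration, not LangGraph code: the `AgentResult` fields, the agent names, and the rule that a failed safety check falls back to a human are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    source_agent: str          # hypothetical labels: "knowledge", "account", "task", "retention"
    risk_level: str = "low"    # "low" or "high"
    passed_safety: bool = True # outcome of output validation
    escalation: bool = False   # set by the retention agent when it prepares an escalation


def route_output(result: AgentResult) -> str:
    """Return "direct" for an instant UI answer, otherwise "human_handoff"."""
    if result.source_agent == "task" and result.risk_level == "high":
        return "human_handoff"
    if result.source_agent == "retention" and result.escalation:
        return "human_handoff"
    if result.passed_safety:
        return "direct"
    # Assumption: anything that fails output validation goes to a human queue.
    return "human_handoff"
```

In a LangGraph deployment a function like this would typically serve as the condition behind a conditional edge, but the wiring is left out here on purpose.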

Continuous System Tracking
Observability
Traces, Latency & Cost
Evaluation
Hallucination & Quality Checks
Auditability
Action & Tool Execution Logs

Why This Architecture Matters in Production

Each layer reduces a real failure mode in enterprise AI agent deployments.

Trigger Layer

Handles user requests from web, Slack, API, webhook, or scheduled jobs. Examples: Next.js, Slack App, API Gateway, Event Grid, Kafka.

Security Layer

Protects access, identity, tenant data, and risky actions. Examples: OAuth, RBAC/ABAC, Azure AD, Okta, policy engine, secrets manager.

Orchestration Layer

Controls routing, state transitions, retries, agent handoffs, and human approval. Examples: LangGraph, Semantic Kernel, AutoGen, custom state machine.

Retrieval Layer

Brings trusted business context into the agent before answering. Examples: vector DB, Azure AI Search, OpenSearch, Pinecone, Databricks, Confluence, Glean.

Tools Layer

Executes approved actions through controlled tool interfaces. Examples: MCP servers, REST APIs, internal services, ServiceNow, Salesforce, Jira, Datadog.

State & Memory Layer

Stores session state, graph checkpoints, tool results, user context, and approval status. Examples: LangGraph checkpointing, Redis, Postgres, Cosmos DB.

Observability Layer

Tracks logs, traces, latency, cost, failures, and tool-call history for debugging and audits. Examples: LangSmith, Datadog, Grafana, CloudWatch, Azure Monitor.

Evaluation Layer

Measures answer quality, tool accuracy, hallucination risk, and workflow drift. Examples: LangSmith evals, Databricks MLflow, RAGAS, custom test sets, offline grading.

System Design Document

Enterprise Multi-Agent AI Assistant

Comprehensive requirements, architecture logic, and operational metrics gathered for production deployment.

1. Clarify the Problem

We are designing an enterprise AI assistant for a web and mobile experience. The assistant helps users ask questions, retrieve trusted business knowledge, understand account or subscription details, perform approved actions, and escalate complex issues to human support when needed.

The system should support both self-service answers and action-oriented workflows, while keeping security, accuracy, permission checks, and auditability in place.

Example User Requests

Why was my billing seat limit exceeded?
What does my subscription include?
Can you explain this policy?
Create a support case for this issue.
Update my account preference.
Cancel this subscription, but confirm before taking action.
Show me the status of my recent support case.

The Goal: Design a secure, scalable, reliable, observable, and production-ready multi-agent AI assistant using web/mobile entry points, API gateway, authentication, LangGraph-style orchestration, RAG, MCP/tool execution, guardrails, and human-in-the-loop escalation.

2. Define Functional Requirements

Functional requirements describe what the system must do.

Accept user requests from web app, mobile app, or API.
Authenticate user and identify role, tenant, account, subscription, and session.
Validate request using input guardrails.
Detect unsafe prompts, prompt injection, unsupported scope, sensitive data, and unauthorized intent.
Classify the user's intent.
Route request to correct specialist agent.
Support knowledge-based Q&A using trusted documents and RAG.
Generate grounded answers with citations when using enterprise knowledge.
Support account, billing, subscription, entitlement, and seat-limit questions.
Execute approved actions through MCP tools, internal APIs, or enterprise services.
Check permissions before every read or write action.
Require user confirmation or human approval for risky write actions.
Escalate unresolved, risky, or sensitive cases to human support or CRM.
Return a final validated response to the user.
Store session state, tool results, approval status, traces, and audit logs.
Support feedback collection for quality improvement.
Support web and mobile responsive behavior.
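As a rough illustration of the input guardrail requirements in the list above, the sketch below flags prompt injection and unsupported scope, and masks e-mail addresses. The regex patterns, the intent names, and the `GuardrailResult` type are assumptions made for the example; a production system would more likely call a maintained classifier or policy service.

```python
import re
from dataclasses import dataclass, field

# Illustrative patterns only, not a complete injection detector.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.I),
    re.compile(r"reveal .*system prompt", re.I),
]
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SUPPORTED_INTENTS = {"knowledge", "account", "task", "retention"}  # hypothetical labels


@dataclass
class GuardrailResult:
    allowed: bool
    reasons: list = field(default_factory=list)
    sanitized_text: str = ""


def check_input(text: str, intent: str) -> GuardrailResult:
    reasons = []
    if any(p.search(text) for p in INJECTION_PATTERNS):
        reasons.append("prompt_injection")
    if intent not in SUPPORTED_INTENTS:
        reasons.append("unsupported_scope")
    # Mask e-mail addresses instead of rejecting the request outright.
    sanitized = EMAIL_PATTERN.sub("[EMAIL]", text)
    return GuardrailResult(allowed=not reasons, reasons=reasons, sanitized_text=sanitized)
```

Returning the reasons alongside the decision keeps the guardrail outcome available for traces and audit logs.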

3. Define Non-Functional Requirements

Non-functional requirements describe how well the system must work across performance, security, and scale.

3.1 Performance and Latency

For a web/mobile AI assistant, we separate normal UI performance from AI workflow performance.

Web & Mobile App Performance

| Area | Ideal Target | Acceptable Target | Notes |
| --- | --- | --- | --- |
| Initial page/app shell load | ≤ 2.5 seconds LCP | ≤ 4 seconds | Aligns with Core Web Vitals loading guidance. |
| Server response / TTFB | ≤ 800 ms | ≤ 1.8 seconds | Good backend responsiveness target. |
| UI tap/click feedback | ≤ 100 ms | ≤ 200 ms | User should feel the UI reacted immediately. |
| Interaction responsiveness | ≤ 200 ms INP | ≤ 500 ms | Good Core Web Vitals responsiveness target. |
| Layout stability | CLS ≤ 0.1 | CLS ≤ 0.25 | Avoid content jumping during load. |
| Cached/revisited page reload | ≤ 1 second | ≤ 2 seconds | Use caching, CDN, and client-side hydration. |
| API read request | 300–800 ms | ≤ 1.5 seconds | For normal account/profile/status reads. |
| API write request | ≤ 1.5 seconds | ≤ 3 seconds | For preference updates or simple case creation. |

AI Assistant Response Performance

| Request Type | Ideal Target | Acceptable Target | UX Behavior |
| --- | --- | --- | --- |
| Simple greeting/help prompt | ≤ 1 second | ≤ 2 seconds | Return immediately. |
| Basic account/status answer | 2–4 seconds | ≤ 6 seconds | Show loading state if needed. |
| RAG-based answer with citations | 4–7 seconds | ≤ 10 seconds | Stream partial response or show progress. |
| Tool/MCP action workflow | 5–9 seconds | ≤ 15 seconds | Show step status: checking permission, calling tool, validating result. |
| High-risk action requiring approval | Depends on approval | Not fully automated | Ask user/human for confirmation. |
| Human escalation | Case created in ≤ 10 seconds | ≤ 20 seconds | Show ticket/case reference if available. |

Note: For UI performance, I would target the Core Web Vitals (LCP under 2.5 seconds, INP under 200 milliseconds, CLS under 0.1) plus TTFB under 800 milliseconds. For AI workflows, I would separate simple answers from tool-heavy workflows. A simple answer should return within 1–2 seconds, RAG answers should ideally complete in 4–7 seconds, and tool-based workflows can take 5–9 seconds with progress indicators and streaming.
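One way to make the AI response targets above checkable is to encode them as budgets and classify each measured latency against them. This is a minimal sketch: the request-type keys and the `classify_latency` helper are hypothetical names, while the thresholds are the ideal and acceptable values stated above.

```python
# (ideal, acceptable) upper bounds in seconds, taken from the AI response targets above.
LATENCY_TARGETS = {
    "greeting": (1.0, 2.0),
    "account_status": (4.0, 6.0),
    "rag_answer": (7.0, 10.0),
    "tool_workflow": (9.0, 15.0),
    "human_escalation": (10.0, 20.0),
}


def classify_latency(request_type: str, seconds: float) -> str:
    """Return "ideal", "acceptable", or "breach" for a measured latency."""
    ideal, acceptable = LATENCY_TARGETS[request_type]
    if seconds <= ideal:
        return "ideal"
    if seconds <= acceptable:
        return "acceptable"
    return "breach"
```

The resulting label can be attached to each trace so that dashboards report budget breaches per request type rather than a single global average.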

3.2 Availability & Reliability

| Area | Target |
| --- | --- |
| Web/mobile frontend | 99.9% or higher |
| API/backend availability | 99.9% or higher |
| Critical account/action services | 99.9% or higher |
| AI model fallback path | Required |
| Tool/API retry support | Required |
| Graceful degradation | Required |

  • If the AI model fails, retry or use a fallback model.
  • If RAG retrieval fails, return a safe fallback response.
  • If an API call fails, retry with backoff. If it still fails, escalate.
  • If confidence is low, ask a clarifying question.
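The retry and fallback rules above could look roughly like this in code. It is a sketch under assumptions: `call_with_backoff`, `generate_with_fallback`, and `EscalationRequired` are hypothetical names, and the attempt count and delays are placeholder values rather than recommended settings.

```python
import time


class EscalationRequired(Exception):
    """Raised when retries are exhausted and a human should take over."""


def call_with_backoff(fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn, retrying with exponential backoff; escalate if every attempt fails."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise EscalationRequired(f"failed after {attempts} attempts") from last_error


def generate_with_fallback(primary, fallback, **retry_options):
    """Try the primary model first; if it keeps failing, try the fallback model."""
    try:
        return call_with_backoff(primary, **retry_options)
    except EscalationRequired:
        return call_with_backoff(fallback, **retry_options)
```

The `sleep` parameter is injectable so the backoff schedule can be tested without real delays. A production version would usually retry only on errors known to be transient.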

3.3 Scalability

| Area | Requirement |
| --- | --- |
| Users | Support many concurrent web/mobile users. |
| Tenants/accounts | Isolate data and scale per tenant. |
| RAG queries | Scale vector/search independently. |
| Tool calls | Use queueing, rate limits, and retries. |
| Traffic spikes | Use autoscaling and CDN caching. |

“I would scale the frontend through CDN/edge caching, scale the API layer horizontally, scale RAG independently, and isolate long-running tool workflows through queues/async workers.”

3.4 Security Requirements

  • Use authentication for every user.
  • Use RBAC/ABAC for authorization.
  • Enforce tenant isolation.
  • Run permission checks before retrieval and tool execution.
  • Never expose secrets to the model.
  • Use a tool allowlist & detect prompt injection.
  • Sanitize inputs/outputs and encrypt data.
Risky Actions (Require Approval): cancellation, refund, payment change, downgrade, entitlement change, ownership change, CRM escalation with sensitive data.
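The allowlist, tenant isolation, permission, and approval rules above can be combined into a single authorization gate in front of every tool call. The sketch below is illustrative only: the tool names, the `Caller` type, and the three-way result are assumptions, not part of any MCP or LangGraph API.

```python
from dataclasses import dataclass

# Hypothetical tool names; the risky subset mirrors the risky actions listed above.
TOOL_ALLOWLIST = {
    "get_account_state", "create_support_case", "update_preference",
    "cancel_subscription", "issue_refund",
}
RISKY_TOOLS = {"cancel_subscription", "issue_refund"}


@dataclass(frozen=True)
class Caller:
    user_id: str
    tenant_id: str
    permissions: frozenset  # tool names this user may invoke


def authorize_tool_call(caller: Caller, tool: str, resource_tenant: str,
                        approved: bool = False) -> str:
    """Return "allow", "needs_approval", or "deny"."""
    if tool not in TOOL_ALLOWLIST:
        return "deny"            # tool allowlist
    if caller.tenant_id != resource_tenant:
        return "deny"            # tenant isolation
    if tool not in caller.permissions:
        return "deny"            # RBAC-style permission check
    if tool in RISKY_TOOLS and not approved:
        return "needs_approval"  # user confirmation or human approval
    return "allow"
```

Keeping the decision outside the model means a prompt-injected request for a risky tool still stops at `needs_approval` or `deny`.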

3.5 Accuracy & Grounding

  • Use RAG for policy, docs, and product answers.
  • Retrieve only documents the user can access.
  • Use reranking or top-chunk selection.
  • Provide citations and validate answers.
  • Detect hallucination risk and ask clarifying questions.

| Metric | Target |
| --- | --- |
| Citation coverage | ≥ 95% |
| Unsupported answer rate | < 2–5% |
| Tool-call success rate | ≥ 95% for stable tools |
| Human escalation accuracy | ≥ 90% |
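To show how the first two targets above might be measured offline, here is a small sketch. The `EvalRecord` fields are hypothetical and assume each answer has already been graded by a human or automated grader. The 5% ceiling uses the upper end of the 2–5% range above.

```python
from dataclasses import dataclass


@dataclass
class EvalRecord:
    used_knowledge: bool  # answer relied on retrieved enterprise documents
    has_citation: bool    # answer included at least one citation
    supported: bool       # grader judged the answer grounded in its sources


def citation_coverage(records) -> float:
    """Share of knowledge-based answers that carry a citation."""
    knowledge = [r for r in records if r.used_knowledge]
    if not knowledge:
        return 1.0  # assumption: nothing to cite counts as full coverage
    return sum(r.has_citation for r in knowledge) / len(knowledge)


def unsupported_rate(records) -> float:
    """Share of all answers the grader marked as unsupported."""
    if not records:
        return 0.0
    return sum(not r.supported for r in records) / len(records)


def meets_targets(records, min_coverage=0.95, max_unsupported=0.05) -> bool:
    return (citation_coverage(records) >= min_coverage
            and unsupported_rate(records) < max_unsupported)
```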

3.6 Observability

Track: Request ID, Intent, Agent route, Model/Retrieval/Tool latency, Cost, Errors, Feedback.

Tools: LangSmith, Datadog, CloudWatch, MLflow
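A per-request trace record carrying the fields listed above might look like the sketch below. The `TraceRecord` class and its method names are assumptions made for illustration. In practice the record would be exported to one of the tools above, not built by hand.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TraceRecord:
    intent: str
    agent_route: str
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    latency_ms: dict = field(default_factory=dict)  # e.g. keys "model", "retrieval", "tool"
    cost_usd: float = 0.0
    errors: list = field(default_factory=list)
    feedback: Optional[str] = None

    def timed(self, stage, fn, *args, **kwargs):
        """Run fn, recording its latency under `stage` and any error it raises."""
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            self.errors.append(f"{stage}: {exc}")
            raise
        finally:
            self.latency_ms[stage] = (time.perf_counter() - start) * 1000

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

For example, `trace.timed("retrieval", retriever, query)` would wrap a hypothetical `retriever` call so retrieval latency is captured even when the call fails.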

3.7 Auditability & Compliance

Requirements: Log every tool call, write action, approval decision, and user confirmation. Track citations. Support retention policies.
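One possible shape for such an audit trail is an append-only log in which each entry carries a hash of the previous one, so later edits are detectable. The hash chaining is an assumption added for illustration and is not something the requirements above mandate. The `AuditLog` class and its field names are hypothetical, and timestamps are left out to keep the example short.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    def __init__(self):
        self.entries = []

    def append(self, event_type: str, actor: str, details: dict) -> dict:
        """Record an event such as a tool call, write action, or approval decision."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        body = {"event_type": event_type, "actor": actor,
                "details": details, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return False if any entry was altered or the chain was reordered."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or _digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```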

3.8 Maintainability

Modular architecture: Agents separated by responsibility, versioned prompts, registered tools, refreshable indexes.

3.9 Cost Efficiency

Use caching, route simple intents to smaller models, stream responses, track cost per workflow, and set budget alerts.
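Two of these ideas, routing simple intents to a smaller model and tracking cost per workflow against a budget, can be sketched as follows. The model tiers, per-1K-token prices, and intent names are made-up placeholders, not real pricing.

```python
from collections import defaultdict

# Hypothetical model tiers and prices in USD per 1,000 tokens.
MODEL_PRICES = {"small": 0.0005, "large": 0.01}
SIMPLE_INTENTS = {"greeting", "help", "status_lookup"}


def pick_model(intent: str) -> str:
    """Send simple intents to the cheaper model, everything else to the larger one."""
    return "small" if intent in SIMPLE_INTENTS else "large"


class CostTracker:
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.cost_by_workflow = defaultdict(float)

    def record(self, workflow: str, model: str, tokens: int) -> bool:
        """Add the cost of one call; return True when total spend exceeds the budget."""
        self.cost_by_workflow[workflow] += tokens / 1000 * MODEL_PRICES[model]
        return sum(self.cost_by_workflow.values()) > self.budget_usd
```

The boolean returned by `record` stands in for a budget alert. A real system would emit a metric or notification instead.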

4. Users & Entry Points

Users
  • Customers & Employees
  • Support agents
  • Account managers
  • Admin / Ops teams
Entry Points
  • Web application
  • Mobile application
  • API Gateway
  • Webhook / Event Job

5. High-Level Architecture

See the interactive visual diagram at the top of this page.

The main idea is that the system first validates the user and request, then uses a stateful orchestrator to route work to the right specialist agent. Knowledge requests go through RAG. Action requests go through MCP tools or internal APIs. Risky workflows require permission checks and human approval. The final response is validated before being returned, and all steps are logged.

6. End-to-End Request Flow

Example Request Trigger

“Why was my billing seat limit exceeded, and can you create a support case?”

1. User sends a message through the web or mobile app.
2. Request reaches the API Gateway.
3. Auth layer validates user identity, role, tenant, and account context.
4. Input Guardrail checks prompt injection, PII risk, unsafe content, and permission boundaries.
5. LangGraph StateGraph Supervisor receives the request.
6. Supervisor reads shared graph state, previous conversation context, and session metadata.
7. Supervisor classifies the intent as both an account question and a support action.
8. Account Manager Agent checks billing, subscription, seat usage, entitlement, and account state.
9. Knowledge Agent retrieves billing rules or product policy using RAG if needed.
10. Task Execution Agent prepares a plan to create a support case.
11. System performs permission checks before case creation.
12. If allowed and low risk, the MCP Client calls the approved CRM or case-management tool.
13. If risky, the system asks for confirmation or routes to human approval.
14. Tool result is normalized and returned to the supervisor.
15. Output Validation layer checks final answer for citations, hallucination risk, policy compliance, PII, and action confirmation.
16. User receives a final answer with the reason for the seat-limit issue and the support case status.
17. Logs, traces, tool calls, state updates, and evaluation data are stored.
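The supervisor portion of this flow (classify intent, dispatch to specialist agents over shared state, then validate) can be sketched in plain Python. This does not use the LangGraph API: `GraphState`, the keyword-based classifier, and the stubbed agent results are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class GraphState:
    message: str
    intents: list = field(default_factory=list)
    results: dict = field(default_factory=dict)
    trace: list = field(default_factory=list)


def classify(state: GraphState) -> GraphState:
    # Keyword matching stands in for a model-based intent classifier.
    text = state.message.lower()
    if "seat limit" in text or "billing" in text:
        state.intents.append("account")
    if "support case" in text:
        state.intents.append("task")
    return state


def account_agent(state: GraphState) -> GraphState:
    state.results["account"] = "seat usage exceeds plan limit"  # stubbed lookup
    return state


def task_agent(state: GraphState) -> GraphState:
    state.results["task"] = "support case created"  # stubbed tool call
    return state


AGENTS = {"account": account_agent, "task": task_agent}


def run_supervisor(message: str) -> GraphState:
    state = classify(GraphState(message))
    state.trace.append("classify")
    for intent in state.intents:       # one request can carry several intents
        state = AGENTS[intent](state)
        state.trace.append(intent)
    state.trace.append("validate_output")  # placeholder for output guardrails
    return state
```

Running it on the example request, `run_supervisor("Why was my billing seat limit exceeded, and can you create a support case?")`, dispatches to the account agent and then the task agent, matching steps 7 to 10 above.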

Requirement Summary

For this system, I would first gather functional requirements around web/mobile user input, authentication, intent routing, RAG-based answers, tool execution, permission checks, human approval, escalation, and final response validation.

For non-functional requirements, I would define clear targets: initial page load under 2.5s, TTFB under 800ms, interaction response under 200ms, simple AI answers in 1–2s, RAG answers in 4–7s, and tool workflows in 5–9s with progress indicators. I would also define availability, scalability, security, accuracy, observability, auditability, maintainability, and cost-efficiency requirements.