From d73a53735c9d8234f74bc7e7f9fc29fbd8bbd66d Mon Sep 17 00:00:00 2001 From: "claude[bot]" <41898282+claude[bot]@users.noreply.github.com> Date: Thu, 19 Feb 2026 07:35:00 +0000 Subject: [PATCH] docs: add comprehensive MCP Support PRD Addresses #892 with a full Product Requirements Document covering: - Streamable HTTP MCP client for browser-based connections - MCP-to-AIX tool bridge architecture (schema conversion) - Tool execution loop integrated with ConversationHandler - Session management paired with Big-AGI conversation state - Internal loopback provider for Big-AGI capabilities (search, browse, etc.) - Search strategy (native vs MCP loopback fallback) - Settings UI design for server management - Human-in-the-loop tool approval flow - 4-phase implementation plan (MVP through advanced features) - Security considerations and CORS handling Based on community input from @jayrepo, @ligix, @dogmatic69, @darinkishore and research of MCP spec 2025-11-25. Co-authored-by: Enrico Ros --- kb/PRD-MCP-Support.md | 932 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 932 insertions(+) create mode 100644 kb/PRD-MCP-Support.md diff --git a/kb/PRD-MCP-Support.md b/kb/PRD-MCP-Support.md new file mode 100644 index 000000000..5bd82fc4e --- /dev/null +++ b/kb/PRD-MCP-Support.md @@ -0,0 +1,932 @@ +# PRD: MCP (Model Context Protocol) Support for Big-AGI + +**Status**: Draft +**Author**: Research synthesis from issue #892 +**Date**: 2026-02-19 +**Spec Version**: MCP 2025-11-25 +**Stakeholders**: @enricoros, community contributors (@jayrepo, @ligix, @dogmatic69, @darinkishore) + +--- + +## 1. Overview + +### 1.1 Problem Statement + +Big-AGI users need to connect AI conversations to external tools, data sources, and services in a standardized way. 
Currently, Big-AGI supports native tool/function calling through individual LLM providers (Anthropic, OpenAI, Gemini, etc.), but there is no unified protocol for discovering, configuring, and invoking external tools independent of the LLM vendor. + +MCP (Model Context Protocol) has become the industry standard for connecting AI applications to external capabilities, with adoption by Anthropic, OpenAI, Google DeepMind, and thousands of community-built servers. Big-AGI must support MCP to remain competitive and to unlock a rich ecosystem of external integrations. + +### 1.2 Goals + +1. **Enable web clients to connect to MCP servers via Streamable HTTP** (local and remote) +2. **Bridge MCP tools to any LLM provider** through Big-AGI's existing AIX framework +3. **Support MCP sessions** correctly paired with Big-AGI conversation state +4. **Expose internal Big-AGI capabilities** via an MCP-compatible loopback interface +5. **Provide a clean UX** for server management without requiring users to understand transport details +6. **Maintain Big-AGI's local-first architecture** without requiring server-side infrastructure for MCP + +### 1.3 Non-Goals (for MVP) + +- stdio transport support (deferred to future desktop application) +- Hosting MCP servers on Big-AGI's infrastructure +- MCP server implementation (Big-AGI as an MCP server) +- Full OAuth 2.1 authorization server +- Tasks (experimental spec feature, deferred) +- Sampling (server-initiated LLM requests through Big-AGI) + +--- + +## 2. Community Requirements Summary + +Based on input from issue #892 participants: + +| Contributor | Key Requirements | +|-------------|-----------------| +| **@jayrepo** | Streamable HTTP for web. Bridge MCP to tool calling. Works with different models. History management for tool calls/results is the hard part. | +| **@ligix** | Custom HTTP MCP servers for personal use. OpenAI, Gemini, Anthropic models. Export/transform data to external formats. 
| +| **@dogmatic69** | HTTP for interoperability. Not specific tools—general MCP support. Tool-use capable models required. | +| **@darinkishore** | HTTP only, skip stdio. MCP UI is P2. HTTP MCP support is the most impactful thing. Sequential thinking server as reference. | +| **@enricoros** | Direct browser-to-MCP connection. Sessions paired with chats. Internal loopback for Big-AGI capabilities. Search as potential MCP tool. | + +### Consensus + +- **Transport**: Streamable HTTP only for web (unanimous) +- **Approach**: MCP client in browser, bridge to existing tool calling (unanimous) +- **Models**: Must work with any tool-calling LLM, not just one vendor (unanimous) +- **Priority**: HTTP MCP connectivity first, UI polish second + +--- + +## 3. MCP Specification Alignment + +### 3.1 Spec Version Target + +Target: **MCP 2025-06-18** (stable) with forward-compatible design for **2025-11-25** features. + +The 2025-06-18 spec is the current stable release. The 2025-11-25 spec adds experimental features (Tasks, Extensions) that we design for but don't implement in MVP. + +### 3.2 Protocol Primitives + +MCP defines three server-side primitives. 
Big-AGI must handle all three: + +| Primitive | Description | Big-AGI Integration | MVP Priority | +|-----------|-------------|---------------------|-------------| +| **Tools** | Functions the AI model can invoke | Bridge to AIX tool definitions → pass to any LLM | **P0** | +| **Resources** | Contextual data (files, DB schemas) | Inject into conversation as context/attachments | **P1** | +| **Prompts** | Templated message sequences | Surface as user-selectable actions (slash commands) | **P2** | + +### 3.3 Client Capabilities to Implement + +| Capability | Description | MVP | Future | +|-----------|-------------|-----|--------| +| **Tool Discovery** | `tools/list` to enumerate available tools | Yes | - | +| **Tool Invocation** | `tools/call` to execute tools | Yes | - | +| **Tool Annotations** | `readOnlyHint`, `destructiveHint`, etc. | Yes (display) | Enforcement | +| **Resource Discovery** | `resources/list` to enumerate resources | Yes | - | +| **Resource Reading** | `resources/read` to fetch resource content | Yes | - | +| **Resource Templates** | URI templates (RFC 6570) | No | Yes | +| **Resource Subscriptions** | `resources/subscribe` for change notifications | No | Yes | +| **Prompt Discovery** | `prompts/list` to enumerate prompts | Yes | - | +| **Prompt Retrieval** | `prompts/get` to fetch prompt templates | Yes | - | +| **listChanged Notifications** | React to server-side changes in tools/resources/prompts | Yes | - | +| **Elicitation** | Server-initiated user input requests | No | Yes | +| **Sampling** | Server-initiated LLM requests | No | Yes | +| **Roots** | Filesystem boundaries | No | Desktop app | +| **Tasks** | Long-running operation tracking | No | Yes | +| **Completions** | Argument autocompletion | No | Yes | +| **Logging** | Structured log ingestion | No | Yes (debug) | + +### 3.4 Transport: Streamable HTTP + +Big-AGI implements MCP client over Streamable HTTP: + +``` +Browser (Big-AGI) ──HTTP POST/GET──> MCP Server (local or remote) + 
<──JSON/SSE──── +``` + +**Client Requirements:** +- Send JSON-RPC 2.0 messages as HTTP POST to single MCP endpoint +- Include `Accept: application/json, text/event-stream` header +- Include `MCP-Protocol-Version: 2025-06-18` header on all requests post-initialization +- Handle responses as `application/json` (single) or `text/event-stream` (streaming SSE) +- Track `Mcp-Session-Id` header from initialization response +- Include session ID on all subsequent requests +- Support GET for server-initiated SSE stream (notifications) +- Send DELETE to terminate sessions on cleanup + +**CORS Consideration:** MCP servers accessed from the browser must return appropriate CORS headers (`Access-Control-Allow-Origin`, `Access-Control-Allow-Headers`). If a server does not support CORS, Big-AGI may optionally proxy the connection through its server-side (Cloud Runtime), but the default path is direct browser-to-server connection. + +--- + +## 4. Architecture + +### 4.1 High-Level Architecture + +``` +┌─────────────────────────────────────────────────────────┐ +│ Big-AGI Browser Client │ +│ │ +│ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐ │ +│ │ MCP Manager │ │ AIX Client │ │ Chat/Beam │ │ +│ │ (new module) │ │ (existing) │ │ (existing) │ │ +│ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘ │ +│ │ │ │ │ +│ ┌──────▼───────────────────▼───────────────────▼──────┐ │ +│ │ MCP-to-AIX Tool Bridge │ │ +│ │ (converts MCP tool schemas ↔ AIX tool definitions) │ │ +│ └──────┬───────────────────────────────────────┬──────┘ │ +│ │ │ │ +│ ┌──────▼────────┐ ┌──────▼──────┐ │ +│ │ MCP Client │ │ Loopback │ │ +│ │ (HTTP) │ │ Provider │ │ +│ └──────┬────────┘ └──────┬──────┘ │ +│ │ │ │ +└─────────┼────────────────────────────────────────┼────────┘ + │ Streamable HTTP │ Direct fn calls + ▼ ▼ + ┌───────────┐ ┌──────────────┐ + │ MCP Server │ │ Big-AGI │ + │ (external) │ │ Internal │ + │ │ │ Services │ + └───────────┘ └──────────────┘ +``` + +### 4.2 Module Structure + +New module: 
`/src/modules/mcp/` + +``` +src/modules/mcp/ +├── client/ +│ ├── mcp.client.ts # MCP protocol client (Streamable HTTP) +│ ├── mcp.client.transport.ts # HTTP transport layer (POST/GET/SSE/DELETE) +│ ├── mcp.client.session.ts # Session management (Mcp-Session-Id) +│ └── mcp.client.auth.ts # OAuth 2.1 + PKCE flows (future) +│ +├── bridge/ +│ ├── mcp-to-aix.tools.ts # Convert MCP tool schemas → AixWire_Tooling +│ ├── mcp-to-aix.resources.ts # Convert MCP resources → DMessage context +│ ├── mcp-to-aix.prompts.ts # Convert MCP prompts → UI actions +│ └── aix-to-mcp.invocation.ts # Convert AIX tool invocation → MCP tools/call +│ +├── loopback/ +│ ├── loopback.provider.ts # Internal MCP-like tool provider +│ ├── loopback.search.ts # Google Search as loopback tool +│ ├── loopback.browse.ts # Browse as loopback tool +│ └── loopback.registry.ts # Registry of internal tools +│ +├── state/ +│ ├── store-mcp-servers.ts # Persisted: configured MCP server list +│ ├── store-mcp-sessions.ts # Ephemeral: active sessions per conversation +│ └── mcp.types.ts # MCP-specific type definitions +│ +├── ui/ +│ ├── MCPSettingsPanel.tsx # Settings UI for server management +│ ├── MCPServerCard.tsx # Individual server configuration card +│ ├── MCPToolApproval.tsx # Human-in-the-loop tool approval dialog +│ └── MCPStatusIndicator.tsx # Connection status badge +│ +└── index.ts # Public API +``` + +### 4.3 Integration with Existing Architecture + +#### AIX Integration + +MCP tools are bridged to AIX's existing tool infrastructure: + +```typescript +// MCP Tool Definition (from tools/list) +{ + name: "github_create_issue", + description: "Create a new GitHub issue", + inputSchema: { + type: "object", + properties: { + repo: { type: "string", description: "Repository name" }, + title: { type: "string", description: "Issue title" }, + body: { type: "string", description: "Issue body" } + }, + required: ["repo", "title"] + } +} + +// → Converted to AixWire_Tooling.Tool_schema +{ + type: 'function_call', 
+ function_call: { + name: 'github_create_issue', // MCP tool names use DNS-like format (e.g., server.tool) + description: 'Create a new GitHub issue', + input_schema: { + properties: { + repo: { type: 'string', description: 'Repository name' }, + title: { type: 'string', description: 'Issue title' }, + body: { type: 'string', description: 'Issue body' } + }, + required: ['repo', 'title'] + } + } +} +``` + +This conversion is straightforward because AIX already uses OpenAPI-compatible JSON Schema for tool definitions, which is what MCP uses. + +#### Tool Execution Loop + +When an LLM invokes an MCP tool: + +``` +1. LLM generates tool_invocation (via AIX streaming) +2. ContentReassembler creates DMessageToolInvocationPart +3. MCP Bridge identifies tool as MCP-sourced (by name prefix or registry lookup) +4. MCP Bridge sends tools/call to appropriate MCP server +5. MCP server executes and returns result +6. MCP Bridge creates DMessageToolResponsePart +7. Result appended to conversation as tool message +8. Conversation re-sent to LLM with tool response for continuation +``` + +This loop integrates with the existing `ConversationHandler` orchestration pattern. The key addition is step 3-6: intercepting tool invocations that target MCP servers. 
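A minimal TypeScript sketch of the interception in steps 3-4: the registry lookup that decides whether an invocation is forwarded to an MCP server or handled by a native/loopback handler. The names (`ToolRoute`, `mcpToolRegistry`, `registerMcpTool`, `routeToolInvocation`) are illustrative, not existing Big-AGI identifiers, and tool names are assumed to be server-namespaced (`servername.toolname`) as proposed in the Risks section.

```typescript
// Hypothetical sketch: routing a tool_invocation by registry lookup (step 3).
// Not Big-AGI code; illustrates the dispatch decision only.

type ToolRoute =
  | { kind: 'mcp'; serverId: string; toolName: string } // forward via tools/call
  | { kind: 'native' };                                 // existing tool handler

// Populated at discovery time, as tools/list results arrive per server.
const mcpToolRegistry = new Map<string, { serverId: string; toolName: string }>();

function registerMcpTool(serverId: string, toolName: string): void {
  mcpToolRegistry.set(`${serverId}.${toolName}`, { serverId, toolName });
}

// Steps 3-4: identify MCP-sourced tools by fully-qualified name; unknown
// names fall through to the native/loopback handlers.
function routeToolInvocation(invocationName: string): ToolRoute {
  const entry = mcpToolRegistry.get(invocationName);
  return entry ? { kind: 'mcp', ...entry } : { kind: 'native' };
}

registerMcpTool('github', 'create_issue');
console.log(routeToolInvocation('github.create_issue').kind); // 'mcp'
console.log(routeToolInvocation('some_builtin_tool').kind);   // 'native'
```

In the dispatch loop, a `{ kind: 'mcp' }` route would map to a `tools/call` request on the owning server's session (steps 4-6), while `{ kind: 'native' }` falls through to the existing handler.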
+ +#### Conversation Store Integration + +MCP server connections and sessions are tracked per conversation: + +```typescript +// Per-conversation MCP state (ephemeral, in ConversationHandler overlay) +interface MCPConversationState { + /** Active MCP sessions for this conversation */ + activeSessions: Map; + + /** Tools enabled for this conversation (subset of all available) */ + enabledTools: Set; // tool fully-qualified names +} +``` + +### 4.4 Data Flow: Tool Discovery and Invocation + +``` +┌─ Discovery (on server connect or listChanged) ─────────────────────┐ +│ │ +│ MCP Server ─── tools/list ──> MCP Client ──> Tool Registry │ +│ <── tool[] ─── ──> (merged with AIX │ +│ tools for LLM) │ +└──────────────────────────────────────────────────────────────────────┘ + +┌─ Invocation (during chat generation) ──────────────────────────────┐ +│ │ +│ LLM ─── tool_use(name, args) ──> AIX ContentReassembler │ +│ │ │ │ +│ │ ┌─────────▼─────────┐ │ +│ │ │ Is MCP tool? │ │ +│ │ │ (registry lookup) │ │ +│ │ └──┬──────────┬──────┘ │ +│ │ Yes │ │ No (native tool) │ +│ │ ┌─────────▼──┐ ┌───▼──────────┐ │ +│ │ │ MCP Client │ │ Existing │ │ +│ │ │ tools/call │ │ tool handler │ │ +│ │ └─────────┬──┘ └───┬──────────┘ │ +│ │ │ │ │ +│ │ ┌─────────▼──────────▼──────┐ │ +│ │ │ DMessageToolResponsePart │ │ +│ │ └─────────┬──────────────────┘ │ +│ │ │ │ +│ │ ┌─────────▼──────────────────┐ │ +│ │ │ Re-send to LLM with result │ │ +│ LLM <─────────────│ (continuation) │ │ +│ └────────────────────────────┘ │ +└──────────────────────────────────────────────────────────────────────┘ +``` + +--- + +## 5. 
Session Management + +### 5.1 MCP Protocol Sessions + +Each MCP server connection has its own protocol session: + +- **Session ID**: Assigned by server in `Mcp-Session-Id` response header during `initialize` +- **Lifecycle**: Created on connect, destroyed on explicit close (DELETE) or server disconnect +- **Resumability**: On SSE reconnect, client sends `Last-Event-ID` for missed message replay + +### 5.2 Big-AGI Session Pairing + +MCP sessions are paired with Big-AGI conversation state: + +| Pairing Strategy | Description | When Used | +|-----------------|-------------|-----------| +| **Per-Conversation** | Each conversation gets its own MCP sessions | Default for interactive tools | +| **Global** | Single shared session across conversations | For resource-only servers (reference data) | +| **On-Demand** | Session created when tool first invoked | For infrequently used servers | + +The pairing is managed by the `MCPConversationState` stored in the per-conversation overlay (`PerChatOverlayStore`). + +### 5.3 Session Lifecycle + +``` +Conversation Created/Opened + │ + ├─ If MCP servers configured for this conversation: + │ ├─ initialize() each server (negotiate capabilities) + │ ├─ tools/list to discover tools + │ ├─ resources/list to discover resources (if supported) + │ └─ Store session IDs in MCPConversationState + │ + ├─ During conversation: + │ ├─ Tool invocations route through MCP client + │ ├─ Handle listChanged notifications (re-fetch tools/resources) + │ └─ Maintain SSE connection for server notifications + │ + └─ Conversation Closed/Switched: + ├─ Send DELETE to each active session (graceful cleanup) + └─ Clear MCPConversationState +``` + +### 5.4 Reconnection Strategy + +If an SSE stream disconnects: +1. Attempt reconnection with `Last-Event-ID` header +2. If 404 (session expired): re-initialize with full handshake +3. If server unreachable: mark server as disconnected, queue tool calls +4. 
Surface connection status in UI via `MCPStatusIndicator` + +--- + +## 6. Internal MCP Loopback + +### 6.1 Concept + +Big-AGI's internal capabilities can be exposed as MCP-compatible tools through a "loopback" provider. This allows: + +1. **Unified tool interface**: Internal and external tools use the same registration, invocation, and rendering patterns +2. **Scalable architecture**: New internal capabilities automatically become available as tools +3. **Future extensibility**: Internal tools could be exposed externally when Big-AGI ships as an MCP server +4. **Consistent UX**: Users manage all tools (internal and external) in one place + +### 6.2 Loopback Tool Registry + +The loopback provider registers internal capabilities as tool definitions: + +| Internal Capability | Loopback Tool Name | Description | Source Module | +|--------------------|--------------------|-------------|--------------| +| Google Search | `bigagi.search_google` | Web search via Google CSE | `/src/modules/google/` | +| Web Browse | `bigagi.browse_url` | Fetch and extract web page content | `/src/modules/browse/` | +| YouTube Transcript | `bigagi.youtube_transcript` | Extract video transcript | `/src/modules/youtube/` | +| Image Generation | `bigagi.generate_image` | Text-to-image generation | `/src/modules/t2i/` | +| Image Caption | `bigagi.caption_image` | Describe image content | `/src/modules/aifn/image-caption/` | + +### 6.3 Loopback vs Native Search + +Big-AGI currently supports native search for several providers: +- `vndGeminiGoogleSearch` (Gemini grounding) +- `vndOaiWebSearchContext` (OpenAI web search) +- `vndAntWebSearch` (Anthropic web search) +- `vndXaiWebSearch` / `vndXaiXSearch` (xAI search) +- `vndMoonshotWebSearch` (Moonshot) +- `vndPerplexitySearchMode` (Perplexity) +- `vndOrtWebSearch` (OpenRouter) + +**Strategy**: Native search is preferred when available (lower latency, provider-optimized). The loopback `bigagi.search_google` tool serves as: + +1. 
**Fallback**: For models/providers without native search
+2. **Override**: User can explicitly enable Google Search tool even when native search is available, for consistent results across models
+3. **Programmable**: External MCP tools or the model itself can invoke search as part of multi-step workflows
+
+The decision on native vs loopback search is made at the tool assembly stage:
+- If model has native search AND user hasn't enabled loopback search → use native
+- If model lacks native search OR user explicitly enables loopback search → include as tool
+
+### 6.4 Loopback Implementation Pattern
+
+Loopback tools bypass HTTP transport and call internal functions directly:
+
+```typescript
+// Loopback provider implements the same interface as MCP client
+class LoopbackProvider {
+  async listTools(): Promise<MCPTool[]> {
+    return loopbackToolRegistry.getAll();
+  }
+
+  async callTool(name: string, args: Record<string, unknown>): Promise<MCPToolResult> {
+    switch (name) {
+      case 'bigagi.search_google': {
+        const results = await callApiSearchGoogle(args.query as string, args.items as number);
+        return { content: [{ type: 'text', text: JSON.stringify(results) }] };
+      }
+      case 'bigagi.browse_url': {
+        const page = await callBrowseFetchPageOrThrow(args.url as string, ['markdown']);
+        return { content: [{ type: 'text', text: page.content.markdown }] };
+      }
+      // ... other tools
+      default:
+        throw new Error(`Unknown loopback tool: ${name}`);
+    }
+  }
+}
+```
+
+---
+
+## 7. 
User Experience + +### 7.1 Settings UI + +New "MCP Servers" section in Settings > Tools tab: + +``` +┌─ Settings ──────────────────────────────────────────┐ +│ [Chat] [Voice] [Draw] [Tools] [Extras] │ +│ │ +│ ┌─ MCP Servers ────────────────────────────────────┐ │ +│ │ │ │ +│ │ [+ Add Server] │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────┐ │ │ +│ │ │ 🟢 Sequential Thinking [Toggle] │ │ │ +│ │ │ smithery.ai/server-sequential-thinking │ │ │ +│ │ │ Tools: 1 Resources: 0 Status: Connected │ │ │ +│ │ │ [Configure] [Remove] │ │ │ +│ │ └─────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ │ ┌─────────────────────────────────────────────┐ │ │ +│ │ │ ⚪ My Custom Server [Toggle] │ │ │ +│ │ │ http://localhost:8080/mcp │ │ │ +│ │ │ Tools: 3 Resources: 2 Status: Disconnected │ │ │ +│ │ │ [Configure] [Remove] │ │ │ +│ │ └─────────────────────────────────────────────┘ │ │ +│ │ │ │ +│ └────────────────────────────────────────────────────┘ │ +│ │ +│ ┌─ Built-in Tools ─────────────────────────────────┐ │ +│ │ [x] Google Search (fallback for models w/o │ │ +│ │ native search) │ │ +│ │ [ ] Web Browse │ │ +│ │ [ ] YouTube Transcript │ │ +│ └────────────────────────────────────────────────────┘ │ +└──────────────────────────────────────────────────────┘ +``` + +### 7.2 Add Server Flow + +``` +[+ Add Server] + │ + ├─ Enter MCP Server URL: [https://example.com/mcp ] + │ + ├─ [Test Connection] + │ ├─ Success: Show server name, version, capabilities + │ └─ Failure: Show error (CORS, network, auth required) + │ + ├─ If auth required: + │ └─ [Authorize] → OAuth 2.1 popup flow + │ + └─ [Add Server] + └─ Server added to persistent store, tools discovered +``` + +### 7.3 Tool Approval (Human-in-the-Loop) + +When an LLM invokes an MCP tool, the user sees an approval prompt: + +``` +┌─ Tool Invocation ──────────────────────────────┐ +│ │ +│ github_create_issue wants to: │ +│ Create a new issue on repo "enricoros/big-AGI" │ +│ │ +│ Arguments: │ +│ 
┌──────────────────────────────────────────┐ │ +│ │ repo: "enricoros/big-AGI" │ │ +│ │ title: "Fix login bug" │ │ +│ │ body: "The login form..." │ │ +│ └──────────────────────────────────────────┘ │ +│ │ +│ [ ] Always allow this tool (this conversation) │ +│ │ +│ [Approve] [Edit & Approve] [Deny] │ +│ │ +│ ℹ️ destructive: false, readOnly: false │ +└──────────────────────────────────────────────────┘ +``` + +Behavior based on tool annotations: +- `readOnlyHint: true` → Auto-approve (configurable) +- `destructiveHint: true` → Always require approval +- No annotations → Require approval (safe default) + +### 7.4 Chat UI Integration + +MCP tool invocations and responses are rendered using the existing `BlockPartToolInvocation` component, which already supports function call display. MCP tools appear identically to native tool calls with additional metadata: + +- Server name badge on tool invocation blocks +- Connection status indicator in chat toolbar +- Tool approval inline in conversation flow + +### 7.5 Composer Integration + +The composer shows available MCP tools alongside native capabilities: + +- MCP resources can be attached to messages (like attachments) +- MCP prompts appear as slash-command suggestions +- Active MCP servers shown as badges in composer toolbar + +--- + +## 8. 
Storage Design
+
+### 8.1 Persisted State: MCP Server Configuration
+
+New Zustand store with localStorage persistence:
+
+```typescript
+// store-mcp-servers.ts
+interface MCPServerConfig {
+  id: string;                    // UUID
+  label: string;                 // User-friendly name (from server or user)
+  url: string;                   // MCP endpoint URL
+  enabled: boolean;              // Global toggle
+
+  // Authentication
+  auth?: {
+    type: 'oauth2' | 'bearer' | 'api-key';
+    // OAuth: stored tokens
+    accessToken?: string;
+    refreshToken?: string;
+    tokenExpiry?: number;
+    // API Key
+    apiKey?: string;
+    headerName?: string;         // e.g., 'Authorization', 'X-API-Key'
+  };
+
+  // Cached server info (from last initialize)
+  serverInfo?: {
+    name: string;
+    version: string;
+    protocolVersion: string;
+    capabilities: MCPServerCapabilities;
+  };
+
+  // User preferences
+  autoApproveReadOnly: boolean;  // Auto-approve readOnlyHint tools
+  enabledToolNames?: string[];   // Subset of tools to expose (undefined = all)
+
+  // Metadata
+  addedAt: number;
+  lastConnectedAt?: number;
+}
+
+interface MCPServersStore {
+  servers: MCPServerConfig[];
+
+  // Actions
+  addServer(config: Omit<MCPServerConfig, 'id' | 'addedAt'>): string;
+  updateServer(id: string, updates: Partial<MCPServerConfig>): void;
+  removeServer(id: string): void;
+  toggleServer(id: string): void;
+}
+```
+
+### 8.2 Ephemeral State: Active Sessions
+
+Per-conversation vanilla Zustand store (not persisted):
+
+```typescript
+// Managed within PerChatOverlayStore or as separate slice
+interface MCPSessionState {
+  /** Active connections keyed by server config ID */
+  connections: Map<string, MCPConnection>;
+
+  /** Tool approval state for this conversation, keyed by tool name */
+  toolApprovals: Map<string, 'always-allow' | 'denied'>;
+
+  /** Pending tool invocations awaiting user approval */
+  pendingApprovals: MCPPendingApproval[];
+}
+```
+
+### 8.3 Message Fragment Storage
+
+MCP tool invocations and responses use existing fragment types:
+
+```typescript
+// No new fragment types needed - MCP tools map to existing:
+DMessageToolInvocationPart  // { pt: 'tool_invocation', id, invocation: { type: 
'function_call', name, args } } +DMessageToolResponsePart // { pt: 'tool_response', id, response: { type: 'function_call', name, result }, environment: 'client' } +``` + +The `environment` field on tool responses distinguishes execution context: +- `'upstream'`: LLM provider executed (e.g., Gemini code execution) +- `'server'`: Big-AGI server executed +- `'client'`: Browser executed (MCP tools fall here since the browser is the MCP client) + +--- + +## 9. Tool Execution Orchestration + +### 9.1 Integration Point: ConversationHandler + +The tool execution loop integrates with `ConversationHandler`'s existing chat execution flow. When AIX streaming completes with `tokenStopReason: 'ok-tool_invocations'`: + +``` +1. AIX streaming completes with tool invocations +2. ConversationHandler detects tool_invocation fragments +3. For each invocation: + a. Check if tool is MCP-sourced (registry lookup by name) + b. If MCP: route to MCP client for execution + c. If native/loopback: route to internal handler + d. If approval required: show approval UI, await user decision +4. Collect all tool responses +5. Append tool response message to conversation +6. Re-invoke AIX with updated conversation (continuation) +7. 
Repeat until LLM stops invoking tools +``` + +### 9.2 Parallel Tool Calls + +When the LLM generates multiple tool invocations in a single response (parallel tool calling, supported by Anthropic, OpenAI, Gemini, Groq): + +- All MCP tool calls to the same server are sent sequentially (MCP spec allows but doesn't require parallel) +- Tool calls to different servers are sent in parallel +- Loopback tools execute in parallel +- All results collected before sending continuation to LLM + +### 9.3 Error Handling + +| Error Type | Handling | +|-----------|---------| +| MCP server unreachable | Mark tool response as error, include in conversation so LLM can adapt | +| Tool execution error (`isError: true`) | Display error in tool response block, include in conversation | +| User denies tool call | Return error response "User denied tool invocation", LLM receives this | +| Session expired | Re-initialize session transparently, retry tool call | +| CORS error | Surface in UI as server configuration issue | +| Timeout | Configurable timeout (default 30s), return timeout error to LLM | + +--- + +## 10. Security Considerations + +### 10.1 Authentication + +For MVP, support these auth methods: + +1. **No auth**: For local servers (localhost) +2. **Bearer token**: User provides API key/token in settings +3. 
**OAuth 2.1 + PKCE**: For remote servers requiring authorization (future phase) + +### 10.2 Tool Invocation Safety + +- **Human-in-the-loop by default**: All tool invocations require user approval unless: + - Tool has `readOnlyHint: true` AND user enabled auto-approve for read-only tools + - User selected "always allow" for that specific tool in that conversation +- **Destructive tool warning**: Tools with `destructiveHint: true` always show a warning +- **No credential passthrough**: Big-AGI never forwards LLM API keys to MCP servers + +### 10.3 Data Privacy + +- MCP server configurations (including auth tokens) stored in localStorage (same security model as LLM API keys) +- Tool invocation arguments and responses visible to the user in conversation +- No tool data transmitted to Big-AGI servers (client-side only) + +### 10.4 CORS and Origin Security + +- Browser enforces CORS on all MCP server requests +- MCP servers must return appropriate CORS headers +- Local servers should bind to `127.0.0.1` only +- Big-AGI does NOT set custom Origin headers (browser controls this) + +--- + +## 11. Phasing + +### Phase 1: MVP - HTTP MCP Client (P0) + +**Scope**: Connect to MCP servers, discover tools, bridge to LLM tool calling. + +**Deliverables**: +1. MCP Streamable HTTP client implementation +2. MCP-to-AIX tool bridge (schema conversion) +3. Tool execution loop in ConversationHandler +4. Settings UI for server management (add/remove/toggle) +5. Tool approval dialog +6. Connection status indicator +7. MCP session management (per-conversation) +8. 
Bearer token authentication + +**Success Criteria**: +- User can add an MCP server URL, discover its tools, and use them in chat +- Works with any tool-calling LLM (not vendor-specific) +- Tool invocations render correctly in conversation +- Multi-turn tool use loops work (LLM calls tool → gets result → continues) + +### Phase 2: Resources, Prompts & Loopback (P1) + +**Scope**: MCP resources as context, prompts as actions, internal tools via loopback. + +**Deliverables**: +1. Resource discovery and reading +2. Resource content injection into conversations (as attachments/context) +3. Prompt discovery and retrieval +4. Prompt integration with composer (slash commands) +5. Loopback provider with Google Search, Browse, YouTube tools +6. Built-in tools settings panel +7. Search override logic (loopback vs native) + +### Phase 3: OAuth, Elicitation & Polish (P2) + +**Scope**: Full OAuth flows, server-initiated UI, production polish. + +**Deliverables**: +1. OAuth 2.1 + PKCE authorization flow +2. Elicitation support (server-requested user input forms) +3. Resource subscriptions (change notifications) +4. Resource templates (parameterized URIs) +5. Tool completions (argument autocompletion) +6. Logging ingestion (debug panel) +7. Server-initiated notifications via GET SSE stream + +### Phase 4: Advanced Features (P3) + +**Scope**: Tasks, sampling, desktop app stdio. + +**Deliverables**: +1. Tasks support (long-running operations with polling) +2. Sampling support (server-initiated LLM requests) +3. stdio transport (desktop app only, requires Tauri/Electron) +4. Roots capability (filesystem boundaries for desktop) +5. MCP server marketplace/directory integration + +--- + +## 12. 
Technical Constraints and Risks + +### 12.1 Constraints + +| Constraint | Impact | Mitigation | +|-----------|--------|------------| +| **Browser-only** (no subprocess spawning) | No stdio transport | Streamable HTTP only; stdio deferred to desktop app | +| **Edge Runtime** (Big-AGI server is stateless) | Cannot proxy MCP connections server-side persistently | Direct browser-to-MCP connection; optional stateless proxy for CORS | +| **CORS** | MCP servers must support CORS for browser access | Document requirement; offer server-side proxy as fallback | +| **No SSE POST support in EventSource API** | Browser EventSource only supports GET | Use `fetch()` + `ReadableStream` for POST SSE responses | +| **localStorage token storage** | Same security as LLM API keys (acceptable for Big-AGI's threat model) | Document; suggest browser-native credential storage in future | + +### 12.2 Risks + +| Risk | Likelihood | Impact | Mitigation | +|------|-----------|--------|------------| +| MCP servers lacking CORS headers | High | Blocks direct connection | Server-side CORS proxy option; document server requirements | +| Tool execution loops (infinite) | Medium | Resource exhaustion | Max loop depth (configurable, default 10); user abort | +| MCP server version incompatibility | Low | Handshake failure | Version negotiation per spec; clear error messages | +| Tool name collisions (multiple servers) | Medium | Ambiguous invocation | Namespace tools by server: `servername.toolname` | +| Large tool lists overwhelming LLM context | Medium | Token waste, poor accuracy | Tool filtering per conversation; future: Anthropic Tool Search Tool | + +--- + +## 13. 
Dependencies + +### 13.1 External Dependencies + +| Dependency | Purpose | Status | +|-----------|---------|--------| +| `@modelcontextprotocol/sdk` | Official MCP TypeScript SDK | Available; evaluate browser compatibility | +| JSON-RPC 2.0 | Protocol encoding | Implement directly (simple) or use `jsonrpc-lite` | +| `EventSource` / `ReadableStream` | SSE handling | Browser native | + +**Note**: The official `@modelcontextprotocol/sdk` is designed for Node.js. For browser use, we may need to: +- Use the SDK's type definitions only +- Implement transport layer ourselves (fetch-based HTTP + SSE) +- Or find/create a browser-compatible fork + +### 13.2 Internal Dependencies + +| Component | Dependency Type | Changes Needed | +|-----------|----------------|----------------| +| AIX wire types | Extension | Add MCP tool source metadata to tool definitions | +| ContentReassembler | No change | Already handles tool invocation/response particles | +| ConversationHandler | Extension | Add MCP tool execution loop after AIX streaming | +| PerChatOverlayStore | Extension | Add MCPSessionState slice | +| Settings Modal | Extension | Add MCP Servers section to Tools tab | +| Composer | Extension | Show MCP tool/resource availability | + +--- + +## 14. Success Metrics + +| Metric | Phase 1 Target | Phase 2 Target | +|--------|---------------|---------------| +| MCP servers configurable | Yes | Yes | +| Tools discoverable and usable | Yes | Yes | +| Works with Anthropic, OpenAI, Gemini | Yes | Yes | +| Tool approval flow | Yes | Yes | +| Resources as context | No | Yes | +| Loopback tools | No | Yes | +| OAuth support | No | Phase 3 | +| Connection reliability (reconnect) | Basic | Full | +| User-reported setup friction | < 2 min to add server | < 30s to add server | + +--- + +## 15. Open Questions + +1. **Tool name namespacing**: Should MCP tools be prefixed with server identifier to avoid collisions? (e.g., `myserver.create_issue` vs `create_issue`). 
The MCP spec recommends DNS-like naming but doesn't enforce it. + +2. **Proxy vs direct**: Should Big-AGI offer a server-side CORS proxy for MCP servers that don't support browser CORS? This adds server-side complexity but improves compatibility. + +3. **Tool search/filtering**: With many MCP servers, how should tools be filtered for each conversation? Options: per-conversation toggle, global enable/disable, automatic relevance filtering. + +4. **Conversation-scoped vs global sessions**: Should MCP sessions be per-conversation or shared? Per-conversation is cleaner but creates more connections; shared is more efficient but complicates state. + +5. **Loopback exposure**: Should internal loopback tools be visible to users in the MCP tools list, or hidden as an implementation detail? Making them visible lets users toggle them; hiding them reduces complexity. + +6. **Browser SDK**: The official `@modelcontextprotocol/sdk` targets Node.js. Should we maintain a browser-compatible fork, implement from scratch, or contribute browser support upstream? + +--- + +## Appendices + +### A. 
MCP Protocol Quick Reference + +``` +Client → Server: + initialize Handshake with capabilities + tools/list Discover available tools + tools/call Invoke a tool + resources/list Discover available resources + resources/read Read a resource + resources/subscribe Subscribe to resource changes + prompts/list Discover available prompts + prompts/get Retrieve a prompt template + completion/complete Request argument completion + logging/setLevel Set minimum log level + ping Keepalive + +Server → Client: + sampling/createMessage Request LLM completion (requires sampling capability) + elicitation/create Request user input (requires elicitation capability) + roots/list Request filesystem roots (requires roots capability) + +Notifications (either direction): + notifications/initialized + notifications/cancelled + notifications/progress + notifications/message (server → client, logging) + notifications/tools/list_changed (server → client) + notifications/resources/list_changed + notifications/resources/updated + notifications/prompts/list_changed + notifications/roots/list_changed (client → server) +``` + +### B. 
Integration with Existing Native Tool Features + +| Feature | Current Implementation | MCP Equivalent | Coexistence Strategy | +|---------|----------------------|----------------|---------------------| +| Gemini Google Search | `vndGeminiGoogleSearch` in model params | `bigagi.search_google` loopback | Native preferred; loopback as fallback | +| OpenAI Web Search | `vndOaiWebSearchContext` in model params | External search MCP server | Native preferred; MCP as alternative | +| Anthropic Web Search | `vndAntWebSearch` in model params | External search MCP server | Native preferred; MCP as alternative | +| Gemini Code Execution | `vndGeminiCodeExecution` via AIX | Code execution MCP server | Native preferred | +| OpenAI Code Interpreter | `vndOaiCodeInterpreter` via AIX | Code execution MCP server | Native preferred | +| Anthropic Tool Search | `vndAntToolSearch` in model params | Could discover MCP tools | Complementary—Tool Search can discover MCP tools | +| Google Custom Search | `search.router.ts` tRPC | `bigagi.search_google` loopback | Loopback wraps existing implementation | +| Browse | `browse.router.ts` tRPC | `bigagi.browse_url` loopback | Loopback wraps existing implementation | + +### C. Reference MCP Servers for Testing + +| Server | URL | Transport | Auth | Purpose | +|--------|-----|-----------|------|---------| +| Sequential Thinking | smithery.ai | HTTP | None | Reasoning enhancement | +| Fetch | Community | HTTP | None | URL content fetching | +| GitHub | Community | HTTP | OAuth/Token | Repository management | +| Brave Search | Community | HTTP | API Key | Web search | +| Memory | Community | HTTP | None | Knowledge graph persistence | + +### D. 
Key File References + +| File | Purpose | +|------|---------| +| `src/modules/aix/server/api/aix.wiretypes.ts` | AIX tool schemas (lines 312-410) | +| `src/modules/aix/client/ContentReassembler.ts` | Tool particle handling (lines 393-430) | +| `src/modules/aix/client/aix.client.fromSimpleFunction.ts` | Tool definition helpers | +| `src/common/stores/chat/chat.fragments.ts` | DMessageToolInvocationPart/ResponsePart (lines 196-231) | +| `src/apps/chat/components/message/fragments-content/BlockPartToolInvocation.tsx` | Tool UI rendering | +| `src/common/chat-overlay/ConversationHandler.ts` | Chat orchestration | +| `src/common/chat-overlay/store-perchat_vanilla.ts` | Per-conversation state | +| `src/modules/google/search.router.ts` | Google Search (loopback candidate) | +| `src/modules/browse/browse.router.ts` | Browse (loopback candidate) | +| `src/modules/youtube/youtube.router.ts` | YouTube (loopback candidate) | +| `src/apps/settings-modal/SettingsModal.tsx` | Settings modal structure | +| `kb/systems/client-side-fetch.md` | CSF pattern (relevant for direct browser connections) |
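
### E. Sketch: Browser-Side Streamable HTTP Transport

The "No SSE POST support in EventSource" constraint (Section 12.1) and the browser-SDK question (Open Question 6) both come down to the same mechanism: POST a JSON-RPC message with `fetch()`, then read back either a single JSON body or an SSE stream via `ReadableStream`. A minimal sketch follows; `makeJsonRpcRequest`, `parseSseBuffer`, and `postMcpMessage` are illustrative names, not existing Big-AGI or SDK APIs, and the header/Accept handling is a simplified reading of the Streamable HTTP transport, not a complete implementation.

```typescript
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: unknown;
}

/** Build a JSON-RPC 2.0 request envelope, e.g. for `tools/list` or `tools/call`. */
function makeJsonRpcRequest(id: number, method: string, params?: unknown): JsonRpcRequest {
  const req: JsonRpcRequest = { jsonrpc: '2.0', id, method };
  if (params !== undefined) req.params = params;
  return req;
}

/** Extract complete SSE events ('\n\n'-terminated) from a text buffer;
 *  returns the parsed events and the unconsumed partial remainder. */
function parseSseBuffer(buffer: string): { events: { event: string; data: string }[]; rest: string } {
  const events: { event: string; data: string }[] = [];
  let rest = buffer;
  let sep: number;
  while ((sep = rest.indexOf('\n\n')) !== -1) {
    const raw = rest.slice(0, sep);
    rest = rest.slice(sep + 2);
    let event = 'message';
    const dataLines: string[] = [];
    for (const line of raw.split('\n')) {
      if (line.startsWith('event:')) event = line.slice(6).trim();
      else if (line.startsWith('data:')) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length) events.push({ event, data: dataLines.join('\n') });
  }
  return { events, rest };
}

/** POST one JSON-RPC message to a Streamable HTTP MCP endpoint and collect the
 *  response messages, whether the server replies with plain JSON or an SSE stream. */
async function postMcpMessage(endpoint: string, req: JsonRpcRequest, sessionId?: string): Promise<unknown[]> {
  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
    'Accept': 'application/json, text/event-stream',
  };
  if (sessionId) headers['Mcp-Session-Id'] = sessionId;

  const res = await fetch(endpoint, { method: 'POST', headers, body: JSON.stringify(req) });

  // Servers may answer with a single JSON body...
  if (res.headers.get('content-type')?.includes('application/json'))
    return [await res.json()];

  // ...or with an SSE stream, which EventSource cannot read from a POST -
  // hence reading the ReadableStream manually.
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  const messages: unknown[] = [];
  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const { events, rest } = parseSseBuffer(buffer);
    buffer = rest;
    for (const e of events) messages.push(JSON.parse(e.data));
  }
  return messages;
}
```

The pure parsing/envelope helpers are the part worth unit-testing in isolation; the transport wrapper is where reconnection, `Mcp-Session-Id` persistence, and abort handling (Section 12.2's loop-depth/abort mitigations) would attach.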