
Master Prompts vs. Multi-Agent + MCP Pipelines

The AI tooling ecosystem in 2026 has a gravitational pull toward orchestrators, MCP servers, and multi-agent pipelines. For specific problems, that architecture is exactly right. For most production AI features, it's expensive infrastructure for a problem that doesn't need it.

If you're already running a pipeline and the token bills feel like just the cost of doing AI, it's worth seeing what the same output looks like with a single master prompt. Most teams have never made that comparison.

One distinction worth making: a CyWire master prompt isn't a system prompt you write in a text file. It's a structured JSON format built in the platform — six defined sections, typed variables with validation, schema-enforced output, and version tracking — designed to connect your inputs to an LLM and return reliable structured data from a single call.

What MCP and multi-agent pipelines actually do

MCP (Model Context Protocol) is an open standard — originally from Anthropic — for connecting AI models to external tools. It runs over JSON-RPC 2.0: an AI model calls a tool, the MCP server executes it and returns the result, the model reads that result and decides what to call next. Tools can be databases, APIs, file systems, search engines, or custom services.
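To make that round trip concrete, here is a minimal sketch of the JSON-RPC 2.0 messages behind one tool call. The `tools/call` method name and the `content` array in the result follow the MCP specification. The result type is simplified to text blocks only, and the tool name `lookup_customer`, its arguments, and the helper function names are hypothetical.

```typescript
// JSON-RPC 2.0 request asking an MCP server to run a tool.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: { name: string; arguments: Record<string, unknown> }
}

// Matching response. Simplified: real results may carry non-text content blocks too.
interface ToolCallResponse {
  jsonrpc: '2.0'
  id: number
  result: { content: { type: 'text'; text: string }[]; isError?: boolean }
}

let nextRequestId = 1

// Build the request sent when the model decides to call a tool.
function makeToolCall(toolName: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  }
}

// Extract the text that gets appended to the model's context for its next turn.
function readToolResult(res: ToolCallResponse): string {
  if (res.result.isError) throw new Error('tool reported an error')
  return res.result.content.map((block) => block.text).join('\n')
}

// Hypothetical tool and payload, for illustration only.
const exampleRequest = makeToolCall('lookup_customer', { account_name: 'Acme Corp' })
const exampleResponse: ToolCallResponse = {
  jsonrpc: '2.0',
  id: exampleRequest.id,
  result: { content: [{ type: 'text', text: '{"risk_score": 72}' }] },
}

console.log(JSON.stringify(exampleRequest))
console.log(readToolResult(exampleResponse))
```

Each result is fed back into the model's context before it picks the next call, which is why token use grows with every round trip.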

A multi-agent pipeline layers orchestration on top: one AI model (the orchestrator) decides which sub-agents to invoke, those agents call their own tools via MCP, pass results back up, and the orchestrator assembles a final answer. It's powerful for genuinely open-ended tasks — but every layer costs tokens and adds a failure point.

Token cost is where this compounds. By default, the client application typically passes the full schema of every registered tool (name, description, parameter types) into the model's context on every inference call. Ten tools means ten schemas in context whether the model uses them or not. Experienced teams mitigate this with tool filtering: scoping which schemas get exposed per request based on task context. It works, but now you're maintaining that routing logic on top of the pipeline you already built. Without it, the overhead is real and measurable, especially once you're running multiple agents, each with its own tool set.
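Tool filtering can be as simple as tagging each tool with the task contexts it belongs to and exposing only the matches. The sketch below assumes a hypothetical `contexts` field and hypothetical tool names; it is one possible shape for that routing logic, not a feature of MCP itself.

```typescript
// A registered tool. `contexts` is a hypothetical field added for filtering.
interface ToolDef {
  name: string
  description: string
  inputSchema: Record<string, unknown>
  contexts: string[]
}

// Return only the tools relevant to the current task context.
function toolsForTask(tools: ToolDef[], taskContext: string): ToolDef[] {
  return tools.filter((tool) => tool.contexts.includes(taskContext))
}

// Hypothetical registry, for illustration only.
const toolRegistry: ToolDef[] = [
  {
    name: 'search_invoices',
    description: 'Find invoices for an account',
    inputSchema: { type: 'object', properties: { account: { type: 'string' } } },
    contexts: ['billing'],
  },
  {
    name: 'fetch_credit_report',
    description: 'Fetch a credit report for an account',
    inputSchema: { type: 'object', properties: { account: { type: 'string' } } },
    contexts: ['billing', 'risk'],
  },
  {
    name: 'search_docs',
    description: 'Search internal documentation',
    inputSchema: { type: 'object', properties: { query: { type: 'string' } } },
    contexts: ['support'],
  },
]

// For a risk task, only one of the three schemas would be sent to the model.
const exposedTools = toolsForTask(toolRegistry, 'risk')
console.log(exposedTools.map((tool) => tool.name))
```

The cost is the one the paragraph above names: someone has to keep the context tags accurate as tools are added and tasks change.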

Pipeline component | Typical token overhead per request
Orchestrator system prompt | 800–2,000 tokens
Each sub-agent system prompt | 500–1,500 tokens × number of agents
MCP tool schemas (default: all load per request) | ~200 tokens per tool (10 tools = 2,000 tokens by default); reducible with tool filtering
Routing messages between agents | 300–800 tokens per hop, accumulating across calls
Total pipeline overhead | 6,000–20,000+ tokens before any actual task work

Estimates based on typical unoptimized production pipelines with 3 agents and 8–12 MCP tools. Tool filtering can reduce schema overhead; orchestrator and routing costs remain.
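The per-component figures can be combined into a rough estimator. This is a simplified sketch: it counts each tool schema once and treats routing cost as linear in the number of hops, and the hop count of 6 in the example is an assumption rather than a figure from the table.

```typescript
interface TokenRange {
  min: number
  max: number
}

// Per-component figures taken from the table above (tokens per request).
const ORCHESTRATOR_PROMPT: TokenRange = { min: 800, max: 2000 }
const SUB_AGENT_PROMPT: TokenRange = { min: 500, max: 1500 }
const TOKENS_PER_TOOL_SCHEMA = 200
const ROUTING_PER_HOP: TokenRange = { min: 300, max: 800 }

// Estimate infrastructure tokens spent before any task content is processed.
function estimateOverhead(agents: number, tools: number, hops: number): TokenRange {
  const schemaTokens = tools * TOKENS_PER_TOOL_SCHEMA
  return {
    min: ORCHESTRATOR_PROMPT.min + agents * SUB_AGENT_PROMPT.min + schemaTokens + hops * ROUTING_PER_HOP.min,
    max: ORCHESTRATOR_PROMPT.max + agents * SUB_AGENT_PROMPT.max + schemaTokens + hops * ROUTING_PER_HOP.max,
  }
}

// 3 agents, 10 tools, and an assumed 6 routing hops.
const overheadEstimate = estimateOverhead(3, 10, 6)
console.log(overheadEstimate) // min 6100, max 13300
```

Under these assumptions the estimate lands at 6,100 to 13,300 tokens. When schemas are re-sent on every call, or earlier messages accumulate in later hops, the total moves toward the upper end of the table's range.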

When multi-agent + MCP is genuinely the right choice

Pipelines earn their place. There are tasks where the overhead is worth every token:

  • Input too large for one context window: Even at 128K–200K token limits, large document corpora, codebases, or multi-source aggregations may not fit. Chunking across calls or agents is the right solution.

  • The AI must decide what step comes next: Conditional branching can be pre-defined — if X, call tool A, else call tool B. Genuine agency is needed only when the branch conditions themselves can't be anticipated at design time: the model encounters something unexpected and reasons its way to the next step.

  • Parallel processing at scale: Running 50 reports simultaneously, not sequentially. Concurrent agent calls are the right abstraction here.

  • Iterative refinement by design: A write → critique → revise loop where each pass reads the previous output. That's inherently multi-step.

If your task genuinely hits one of these, build the pipeline. The question is whether you've checked first.
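The second bullet's distinction can be sketched in code. When the branch conditions are known at design time, ordinary control flow picks the next step and no orchestrating model is involved. The step names and thresholds below are hypothetical.

```typescript
interface AccountSnapshot {
  industry: string
  riskScore: number
}

type NextStep = 'request_compliance_review' | 'run_full_audit' | 'run_standard_report'

// Branch conditions are fixed at design time, so plain code decides the next step.
// Thresholds are illustrative placeholders.
function chooseNextStep(account: AccountSnapshot): NextStep {
  if (account.industry === 'finance' && account.riskScore >= 80) {
    return 'request_compliance_review'
  }
  if (account.riskScore >= 50) {
    return 'run_full_audit'
  }
  return 'run_standard_report'
}

console.log(chooseNextStep({ industry: 'finance', riskScore: 85 }))
console.log(chooseNextStep({ industry: 'retail', riskScore: 60 }))
console.log(chooseNextStep({ industry: 'retail', riskScore: 10 }))
```

An agent earns its cost only when a function like this cannot be written in advance because the conditions themselves are unknown.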

What most production AI features actually look like

Customer risk reports. Compliance summaries. Course outlines. Supplier evaluations. Email personalizers. Product descriptions. The majority of AI features shipped in production follow the same pattern: structured input goes in, structured output comes out, and the shape of that output can be described as a JSON schema before the AI ever runs.

Those tasks don't require an orchestrator to decide what to do. They don't need MCP tools to fetch data — the app already has the data. The developer already knows what variables are needed and what the output should look like. That knowledge belongs in the prompt, not in an agent loop.

Quick test

If you can write a JSON schema for the output before the AI runs, that's a job for a master prompt — one call, no orchestration, schema-enforced result.

What's inside a CyWire master prompt

A CyWire master prompt downloads as a single JSON file. When your server loads it, everything needed to run the feature is already there:

customer-risk-report.json (structure)
{
  "full_prompt_text": "You are a financial risk analyst...\n\nClient: {account_name}\nIndustry: {industry}\nRisk score: {risk_score}\n---\n[task instructions, output format, quality guardrails, constraints]",

  "variables": {
    "account_name": { "type": "string", "required": true },
    "industry":     { "type": "enum",   "required": true, "options": ["finance","healthcare","retail",...] },
    "risk_score":   { "type": "number", "required": true, "validation": { "min": 0, "max": 100 } },
    "notes":        { "type": "string", "required": false }
  },

  "output_schema": {
    "type": "object",
    "properties": {
      "risk_level":          { "type": "string", "enum": ["low","medium","high","critical"] },
      "summary":             { "type": "string" },
      "risk_factors":        { "type": "array",  "items": { "type": "string" } },
      "recommended_actions": { "type": "array",  "items": { "type": "string" } }
    },
    "required": ["risk_level","summary","risk_factors","recommended_actions"]
  },

  "schema_strict": true,
  "metadata": { "version_number": 4 }
}

The full_prompt_text is a fully assembled system prompt — six structured sections (identity, industry context, task instructions, output format, quality and safety rules, constraints) joined together, with {variable} slots for runtime data. buildPrompt() runs entirely on your server: it validates required variables against the type definitions, fills the slots with your actual values, and catches any unresolved placeholders before anything reaches the AI.
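The three steps described above (validate, fill, catch leftovers) can be sketched as follows. This is a simplified illustration written against the JSON structure shown earlier, not CyWire's actual buildPrompt(): the function name `buildPromptSketch` is hypothetical, it is synchronous, and it omits the output schema from the return value.

```typescript
type VarDef =
  | { type: 'string'; required: boolean }
  | { type: 'number'; required: boolean; validation?: { min?: number; max?: number } }
  | { type: 'enum'; required: boolean; options: string[] }

interface MasterPromptFile {
  full_prompt_text: string
  variables: Record<string, VarDef>
  metadata: { version_number: number }
}

function buildPromptSketch(prompt: MasterPromptFile, input: Record<string, unknown>) {
  const values = new Map<string, string>()

  // 1. Validate each declared variable against its type definition.
  for (const [varName, def] of Object.entries(prompt.variables)) {
    const value = input[varName]
    if (value === undefined || value === null || value === '') {
      if (def.required) throw new Error(`Missing required variable: ${varName}`)
      values.set(varName, '') // optional and absent: the slot is left empty
      continue
    }
    if (def.type === 'number') {
      if (typeof value !== 'number' || Number.isNaN(value)) throw new Error(`${varName} must be a number`)
      const { min, max } = def.validation ?? {}
      if (min !== undefined && value < min) throw new Error(`${varName} must be >= ${min}`)
      if (max !== undefined && value > max) throw new Error(`${varName} must be <= ${max}`)
    } else if (typeof value !== 'string') {
      throw new Error(`${varName} must be a string`)
    } else if (def.type === 'enum' && !def.options.includes(value)) {
      throw new Error(`${varName} must be one of: ${def.options.join(', ')}`)
    }
    values.set(varName, String(value))
  }

  // 2. Fill the {slots}; 3. record any slot that has no declared variable.
  const unresolved: string[] = []
  const promptText = prompt.full_prompt_text.replace(/\{(\w+)\}/g, (slot: string, varName: string) => {
    const filled = values.get(varName)
    if (filled === undefined) {
      unresolved.push(slot)
      return slot
    }
    return filled
  })
  if (unresolved.length > 0) throw new Error(`Unresolved placeholders: ${unresolved.join(', ')}`)

  return { promptText, promptVersion: prompt.metadata.version_number }
}

// Abbreviated version of the customer-risk-report structure.
const riskPromptFile: MasterPromptFile = {
  full_prompt_text: 'Client: {account_name}\nIndustry: {industry}\nRisk score: {risk_score}',
  variables: {
    account_name: { type: 'string', required: true },
    industry: { type: 'enum', required: true, options: ['finance', 'healthcare', 'retail'] },
    risk_score: { type: 'number', required: true, validation: { min: 0, max: 100 } },
  },
  metadata: { version_number: 4 },
}

const builtPrompt = buildPromptSketch(riskPromptFile, {
  account_name: 'Acme Corp',
  industry: 'finance',
  risk_score: 72,
})
console.log(builtPrompt.promptText)
```

Unresolved slots are detected during substitution rather than by rescanning the final text, so braces inside a user-supplied value are not mistaken for missing variables.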

What gets sent to the AI provider is the compiled prompt text (with variables substituted in), the output schema via output_config, and nothing else. The variable type definitions, validation rules, and version metadata stay local. The version number gets saved with the result in your own database — so every generated record stays traceable to the exact prompt that produced it.

Token cost: same task, two approaches

A complete master prompt JSON can run 50–70K characters: variable type definitions, validation rules, schema, examples, metadata. Most of that never reaches the LLM. What your AI provider actually receives is the compiled full_prompt_text, with buildPrompt() filling the {variable} slots from your runtime data, plus the output schema. The full_prompt_text alone is typically 14–20K characters, roughly 4,000–6,000 tokens of actual task content sent to the LLM. A multi-agent pipeline for the same task spends that much on infrastructure before any task content even loads:

Token spend | Multi-agent + MCP | Master prompt
Routing infrastructure | 6,000–20,000+ tokens (orchestrator + agents + MCP schemas) | None
System prompt sent to AI | 500–2,000 tokens per agent, repeated across passes | ~4,000–6,000 tokens (compiled prompt with your data, sent once)
Output schema | Per-agent, often inconsistent | 200–500 tokens via output_config, schema-strict on every call
Approximate total | 8,000–25,000+ tokens | 4,200–6,500 tokens

The entire master prompt call — full feature spec, your runtime data, structured output — costs roughly what a pipeline's routing overhead alone costs before any actual work starts. And unlike pipeline context that accumulates across agent hops, this is a clean slate every time.
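The character-to-token conversion used in this section can be approximated with a fixed ratio. The 3.5 characters per token below is an assumption inferred from the figures quoted here (14–20K characters for roughly 4,000–6,000 tokens). Real counts depend on the provider's tokenizer and on the text itself.

```typescript
// Assumed average, inferred from this article's figures. Not a tokenizer.
const CHARS_PER_TOKEN = 3.5

// Rough token estimate for a compiled prompt of the given character length.
function estimateTokens(charCount: number): number {
  return Math.round(charCount / CHARS_PER_TOKEN)
}

console.log(estimateTokens(14000)) // 4000
console.log(estimateTokens(20000)) // 5714
```

For budgeting against a specific model, a count from the provider's own tokenizer or usage metadata is more reliable than any fixed ratio.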

Latency, consistency, debugging — how else they differ

Dimension | Multi-agent + MCP | Master prompt
Latency | Multiple sequential round trips, each with network + inference time | Single inference call
Output consistency | Shape depends on which agent path ran; schema enforcement varies by agent | Schema enforced by the provider on every call; same shape every time
Prompt versioning | Agent system prompts scattered, typically unversioned | Every result saves the prompt filename + version number; fully traceable
Testing | Hard to test across agent state and MCP tool responses spanning multiple calls | Built-in test runner in CyWire: validate output shape and quality before you ship
Debugging | Which agent failed? Which MCP tool call returned unexpected data? Which hop was malformed? | One call, one input, one output; replay any failure from the saved input
Best for | Open-ended tasks, large-input parallelization, dynamic MCP tool selection, iterative refinement | Defined input → structured output (the majority of what teams actually ship)

Where a single prompt isn't enough

A master prompt is single-pass by design. For most tasks that's exactly what you want. For a few, it's a real constraint:

  • Input exceeds context limits: Modern models support 128K–200K tokens, which covers most real-world inputs. But large codebases, long document collections, or multi-source aggregations can push past that. Chunking across multiple calls is the right move.
  • The task is genuinely open-ended: If the AI needs to evaluate intermediate results and decide what step comes next, that's agent territory. The line: if the developer can pre-define the steps, that's a prompt. If the AI has to figure them out, it's an agent.
  • True parallelization required: A master prompt runs one task at a time. Concurrent requests against the same file work fine — but if you need many simultaneous analyses with shared state, an agent framework handles that better.
  • Iterative refinement is the point: If the output of one AI call must feed into the next (a critique loop, a revision cycle), that's inherently multi-step and takes at least two calls.

Even when you need multiple AI calls, you don't always need a pipeline framework. Running two master prompts in sequence (generate, then critique) is often easier to test, debug, and maintain than building an equivalent multi-agent setup.
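A generate-then-critique sequence needs little more than two awaited calls. The sketch below injects the model call as a function so it stays provider-agnostic and testable. The `{draft}` slot, the type names, and the stand-in model are all hypothetical.

```typescript
// Injected model call: takes a compiled prompt, resolves to the model's text output.
type CallModel = (compiledPrompt: string) => Promise<string>

interface CompiledPrompt {
  text: string
  version: number
}

interface StepResult {
  promptVersion: number
  output: string
}

// Run the generate prompt, then feed its output into the critique prompt's {draft} slot.
async function generateThenCritique(
  callModel: CallModel,
  generate: CompiledPrompt,
  critique: CompiledPrompt,
): Promise<{ draft: StepResult; review: StepResult }> {
  const draftOutput = await callModel(generate.text)
  // split/join inserts the draft literally, even if it contains "$" sequences.
  const critiqueText = critique.text.split('{draft}').join(draftOutput)
  const reviewOutput = await callModel(critiqueText)
  return {
    draft: { promptVersion: generate.version, output: draftOutput },
    review: { promptVersion: critique.version, output: reviewOutput },
  }
}

// Stand-in model for illustration; a real implementation would call a provider API.
const seenPrompts: string[] = []
const fakeModel: CallModel = async (compiledPrompt) => {
  seenPrompts.push(compiledPrompt)
  return compiledPrompt.startsWith('Critique') ? 'Looks fine' : 'First draft'
}

generateThenCritique(
  fakeModel,
  { text: 'Write a summary.', version: 3 },
  { text: 'Critique this draft: {draft}', version: 1 },
).then((steps) => console.log(steps))
```

Each step keeps its own prompt version, so a bad review can be traced to either the generating prompt or the critiquing one.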

When you do build a pipeline: master prompts slot in

If your task genuinely requires an MCP server or agent pipeline, CyWire master prompts compose into it cleanly. The master prompt becomes the instruction layer for what a specific MCP tool does — tested prompt text, typed variables, and output schema drop directly into the tool handler. The pipeline handles connectivity and routing; the master prompt defines what the LLM is told to do and what structured data comes back.

The promptVersion returned from buildPrompt() should travel with every result. When a prompt is updated, you can trace exactly which version produced which output — even across a multi-step agent run.

mcp-tool-handler.ts
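import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js'
import { z } from 'zod'
import { buildPrompt } from '../ai/prompt'
import { PROMPTS } from '../ai/prompt-registry'

// Assumed setup (not shown in the original snippet): the server instance the tool
// is registered on. The name and version are placeholders.
const server = new McpServer({ name: 'risk-tools', version: '1.0.0' })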

// MCP tool: master prompt is the AI instruction layer
server.tool(
  'analyze-customer-risk',
  { account_name: z.string(), risk_score: z.number(), industry: z.string() },
  async (input) => {
    // buildPrompt validates variables, fills {slots}, catches anything missed
    const { promptText, promptVersion, outputSchema } =
      await buildPrompt(PROMPTS.riskReport, input)

    const res = await fetch('https://api.anthropic.com/v1/messages', {
      method: 'POST',
      headers: {
        'x-api-key': process.env.ANTHROPIC_API_KEY!,
        'anthropic-version': '2023-06-01',
        'content-type': 'application/json',
      },
      body: JSON.stringify({
        model: 'claude-sonnet-4-6',
        max_tokens: 16000,
        system: promptText,
        messages: [{ role: 'user', content: 'Return only the structured JSON result.' }],
        output_config: { format: { type: 'json_schema', schema: outputSchema } },
      }),
    })

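    // Fail loudly on HTTP errors instead of parsing an error body as a result
    if (!res.ok) {
      throw new Error(`Anthropic API error ${res.status}: ${await res.text()}`)
    }

    const data = await res.json()
    // The schema-constrained JSON arrives as text in the first content block
    const result = JSON.parse(data.content[0].text)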

    // promptVersion travels with the result — agent runs stay traceable
    return {
      content: [{ type: 'text', text: JSON.stringify({ promptVersion, result }) }],
    }
  }
)

The master prompt supplies the intelligence. The MCP tool supplies the connectivity. Changing the prompt doesn't require changing the tool definition — they're separate concerns. The example above uses Anthropic — the same pattern applies for OpenAI. See the Integration Guide for both provider implementations.

What to take away

By default, every MCP tool schema loads into context on every request, used or not. With three agents and ten tools, you've already spent 6,000–20,000 tokens before your task starts. A CyWire master prompt sends the compiled system prompt with your data filled in, the output schema, and nothing else. That's 4,200–6,500 tokens total, often less than what the pipeline spends just setting up.

The decision rule is simple: if you can describe the output as a JSON schema before the AI runs, write a master prompt. One call, schema-enforced result, version number saved with every record. When the AI genuinely has to figure out what step comes next — dynamic tool selection, open-ended reasoning, input too large for a single context window — build the pipeline.

And when you do need a pipeline, master prompts slot directly into MCP tool handlers. The compiled prompt becomes the intelligence layer for what that tool does. The pipeline handles routing; the master prompt defines what your LLM receives and what structured data comes back. They're not competitors — one is infrastructure, the other is the feature.

See how it wires together

The integration guide has starter code for Claude and OpenAI — prompt loading, variable substitution, schema validation, and version tracing in a few hundred lines.
