Multi-agent AI solution generator for a developer platform at scale
A developer platform with millions of monthly visitors needed to compress weeks of API evaluation into minutes. We built a nine-agent AI pipeline that takes a natural language description of a use case, retrieves live documentation through MCP-connected knowledge tools, and delivers a complete working prototype (production-ready code, interactive preview, and implementation guide) in under four minutes.
Metrics
- <4 min: from prompt to working prototype, live preview, and implementation guide
- 2 wks → minutes: collapsed evaluation cycle for new developers
- 73%: reduction in token cost per request
- 70%: faster evaluation cycles after model tiering
The Problem
Developer platforms with broad API portfolios face a conversion problem: millions of visitors evaluate the platform each month, but only a fraction convert to active users.
The gap isn't interest; it's friction.
Developers spend weeks reading documentation, comparing APIs, and assembling proof-of-concept code before they can even determine whether the platform meets their needs.
Static documentation, sample code, and starter guides serve the patient developer. The modern developer expects instant, personalized answers. The journey from consideration to evaluation to integration used to take weeks.
The goal: Compress it to minutes.
Sector
API Platform
Scopes
- Multi-agent architecture
- LLM performance engineering
- Google Cloud architecture
- MCP tool integration
Technologies
- ADK (Agent Development Kit)
- Model Context Protocol
- Terraform
- TypeScript
- Python
GCP Services
- Vertex AI Agent Engine
- Cloud Run
- Cloud Build
- Artifact Registry
- BigQuery
- Secret Manager
- Cloud Logging
Models
- Gemini 2.5 Pro
- Gemini 2.5 Flash
What We Built
A nine-agent AI system that handles the full developer journey from prompt to working prototype.
A developer describes their intent in natural language and the system orchestrates a pipeline of specialized agents: an analysis agent validates feasibility by querying live documentation through MCP tools, a styling pipeline generates and validates visual configuration assets, a code generation pipeline writes, evaluates, and refines production-ready code using extended reasoning capabilities, and a programmatic agent compiles an implementation guide with API documentation links, setup instructions, and pricing guidance. The entire pipeline streams results back to the user in real time, completing in under four minutes.
The output (working code, live preview, and implementation guide) is designed for developers to copy, test, and integrate immediately, reducing the evaluation phase from a multi-week research project to a single conversation.
Multi-Agent Orchestration
The core innovation is a nine-agent pipeline where each agent has a single responsibility, clear input/output contracts, and a dedicated model assignment optimized for its task.
Dig deeper: how each agent works
Orchestrator (gemini-2.5-flash): manages session state, conversation routing, and dynamic instruction composition. It delegates to specialized sub-agents based on the user's request and injects context gathered from upstream analysis into downstream agents at runtime.
Design Decisions in Agent Orchestration
Why nine agents instead of a monolithic prompt
A single prompt attempting all tasks (analysis, styling, code generation, evaluation, documentation) would exceed context windows and produce unreliable outputs. Decomposition allows each agent to have focused instructions, specialized model selection, and independent iteration loops.
Why programmatic agents
The Schema Validator and Documentation Builder are implemented as programmatic agents (no LLM calls). Schema validation is deterministic, so no model is needed. The documentation builder detects APIs in the final code and maps them to documentation URLs from a canonical registry. This eliminates two potential points of hallucination and reduces token usage to zero for those pipeline stages.
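As a rough illustration of what a programmatic documentation builder can look like, the sketch below scans generated code for known API identifiers and looks each one up in a registry. The registry contents, the `docs.example.com` URLs, and the function names are all hypothetical placeholders, not the production implementation.

```python
import re

# Hypothetical canonical registry: API identifier -> documentation URL.
# The identifiers and the example.com domain are placeholders.
DOC_REGISTRY = {
    "geocode": "https://docs.example.com/apis/geocode",
    "places": "https://docs.example.com/apis/places",
    "routes": "https://docs.example.com/apis/routes",
}


def detect_apis(code: str, registry: dict[str, str]) -> list[str]:
    """Return registry keys that appear as whole identifiers in the code."""
    found = set()
    for name in registry:
        # Word boundaries keep "geocode" from matching inside "geocoder".
        if re.search(rf"\b{re.escape(name)}\b", code):
            found.add(name)
    return sorted(found)


def build_doc_links(code: str, registry: dict[str, str] = DOC_REGISTRY) -> dict[str, str]:
    """Map each detected API to its documentation URL. No model call involved."""
    return {name: registry[name] for name in detect_apis(code, registry)}
```

Because the lookup is a plain dictionary access, a link can only ever point at a URL that exists in the registry, which is the property that removes hallucination from this stage.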
Why separate evaluator and refiner
Combining evaluation and correction in one agent leads to confirmation bias: the agent "grades its own homework." Separating them creates genuine adversarial pressure: the evaluator applies strict rules and security scanning, the refiner responds to specific feedback and performs surgical code changes targeting only the flagged issues.
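The separation can be sketched as two functions with a narrow contract between them: the evaluator returns a list of findings, and the refiner only acts on those findings. In the sketch below both are deterministic stand-ins (the real agents are model-backed), and the two rules, the `Finding` type, and the round limit are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative rules only; not the production rule set.
EVAL_CALL = re.compile(r"\beval\(")
INLINE_KEY = re.compile(r'apiKey\s*=\s*"[^"]*"')


@dataclass
class Finding:
    rule: str
    message: str


def evaluate(code: str) -> list[Finding]:
    """Stand-in evaluator: reports issues but never edits the code."""
    findings = []
    if EVAL_CALL.search(code):
        findings.append(Finding("no-eval", "eval() is not allowed"))
    if INLINE_KEY.search(code):
        findings.append(Finding("no-inline-key", "API key is hard-coded"))
    return findings


def refine(code: str, findings: list[Finding]) -> str:
    """Stand-in refiner: changes only what a finding explicitly flags."""
    for finding in findings:
        if finding.rule == "no-eval":
            code = EVAL_CALL.sub("JSON.parse(", code)
        elif finding.rule == "no-inline-key":
            code = INLINE_KEY.sub("apiKey = process.env.API_KEY", code)
    return code


def evaluate_and_refine(code: str, max_rounds: int = 3) -> tuple[str, list[Finding]]:
    """Alternate evaluation and refinement until clean or out of rounds."""
    for _ in range(max_rounds):
        findings = evaluate(code)
        if not findings:
            return code, []
        code = refine(code, findings)
    # Out of rounds: return the code with whatever findings remain.
    return code, evaluate(code)
```

The refiner never decides on its own what is wrong; it receives the evaluator's verdict as input, which is what keeps the two roles from collapsing into self-grading.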
Why enforced output schemas
Every agent that produces code or configuration declares a structured output schema. The model is constrained to generate valid output matching the schema, eliminating freeform prose, reducing token waste from verbose responses, and guaranteeing that downstream agents receive parseable inputs without additional transformation or extraction steps. This also makes the pipeline testable: each agent's output can be validated independently against its schema contract.
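To make the testability point concrete, here is a minimal sketch of a contract check that a downstream stage or a unit test could run on an agent's raw output. The `CodeOutput` fields are hypothetical, and this shows only the validation side of the contract; constraining the model's generation itself is handled by the model API and is not shown.

```python
import json
from dataclasses import dataclass


class SchemaError(ValueError):
    """Raised when agent output does not match its declared schema."""


@dataclass(frozen=True)
class CodeOutput:
    # Hypothetical schema for a code-producing agent.
    language: str
    code: str
    apis_used: list[str]


def parse_code_output(raw: str) -> CodeOutput:
    """Parse raw agent output and reject anything outside the contract."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise SchemaError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise SchemaError("expected a JSON object")

    expected = {"language", "code", "apis_used"}
    if set(data) != expected:
        raise SchemaError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    if not isinstance(data["language"], str) or not isinstance(data["code"], str):
        raise SchemaError("'language' and 'code' must be strings")

    apis = data["apis_used"]
    if not isinstance(apis, list) or not all(isinstance(a, str) for a in apis):
        raise SchemaError("'apis_used' must be a list of strings")

    return CodeOutput(data["language"], data["code"], apis)
```

A freeform reply such as "Sure! Here is your code:" fails at the first step, so prose never reaches the next agent.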
How MCP Connects Agents to Knowledge
The Model Context Protocol gives agents a standardized interface to live API documentation. Three agents in the pipeline (Analysis, Code Generator, and Code Evaluator) use it to query an indexed knowledge base at runtime, with results capped per query to control context window growth. The MCP server runs as an isolated service in its own project, so documentation retrieval never competes with agent LLM calls for API quota. The knowledge base can be updated or expanded without touching agent code, which keeps the system responsive to API changes without redeployment.
But MCP alone is not enough. Tool calls are agent-initiated, which means the model decides whether to invoke retrieval. A confident but wrong model skips the call and hallucinates plausible API patterns. To prevent this, canonical examples are injected directly into agent instructions at runtime rather than offered as an optional tool. The analysis phase identifies which APIs the developer needs, and only matching examples flow into downstream instructions. MCP handles open-ended retrieval where the agent has discretion. Direct injection handles patterns that must be present unconditionally. The combination eliminates stale documentation and a major class of API hallucinations.
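The injection step can be pictured as a small composition function: given the APIs the analysis phase identified, append only the matching canonical examples to the agent's base instruction. The example store, its contents, and the section heading format below are hypothetical; this is a sketch of the idea, not the production prompt builder.

```python
# Hypothetical canonical examples keyed by API identifier.
CANONICAL_EXAMPLES = {
    "geocode": "client.geocode({ address })",
    "places": "client.places.search({ query })",
    "routes": "client.routes.compute({ origin, destination })",
}


def compose_instruction(
    base: str,
    needed_apis: list[str],
    examples: dict[str, str] = CANONICAL_EXAMPLES,
) -> str:
    """Build an agent instruction containing only the relevant examples."""
    sections = [base.strip()]
    for api in needed_apis:
        # APIs without a canonical example are skipped rather than guessed at.
        if api in examples:
            sections.append(f"Canonical example ({api}):\n{examples[api]}")
    return "\n\n".join(sections)
```

For a request that only needs routing, `compose_instruction("Generate the code.", ["routes"])` yields an instruction with one example instead of three, and the example is present whether or not the model would have chosen to call a retrieval tool.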
Iterative Performance Engineering
Building a nine-agent pipeline that performs reliably at scale required multiple development iterations. Each version addressed observed production behavior and introduced targeted refinements that maintained output quality while improving system stability.
Engineering Evolution: From Launch to Production Scale
The initial deployment ran the code generator, evaluator, and refiner on gemini-2.5-pro, the most capable model available. All canonical examples were embedded statically into every prompt, regardless of whether they were relevant to the user's request. The system worked, but at ~30,000 tokens per request and 15-20 Pro-tier calls per pipeline execution, cost and latency were unsustainable at scale.
Self-Healing Pipeline
A key differentiator of the system is its ability to detect, recover from, and prevent failures at multiple levels without surfacing errors to the end user. The self-healing behavior built into the pipeline (security auto-fix loops, graceful fallback to previous valid outputs) ensures that issues mid-pipeline don't become user-visible errors.
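One way to picture the graceful-fallback behavior is a wrapper that retries a stage a bounded number of times and, if no attempt produces a valid result, returns the last known-good output instead of raising. The function name, the attempt limit, and the validity check below are assumptions for illustration; the production retry layers are not described here and may differ.

```python
from typing import Callable


def run_with_fallback(
    stage: Callable[[], str],
    is_valid: Callable[[str], bool],
    last_valid: str,
    max_attempts: int = 3,
) -> tuple[str, bool]:
    """Run a stage with retries; fall back to the previous valid output.

    Returns (output, is_fresh). is_fresh is False when the fallback was used.
    """
    for _ in range(max_attempts):
        try:
            output = stage()
        except Exception:
            # Broad catch is deliberate in this sketch: any stage failure
            # triggers another attempt instead of reaching the user.
            continue
        if is_valid(output):
            return output, True
    return last_valid, False
```

The boolean flag lets the caller log that a fallback occurred, so a failure stays invisible to the user without being invisible to telemetry.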
What Monogram Delivered
PRODUCTION · SERVING THOUSANDS OF DAILY USERS
What used to take weeks of manual evaluation now happens in a single conversation. The outcomes below reflect steady-state production behavior.
- Sub-4-minute prototype generation including complete working code, live preview, and implementation guide, replacing a 2+ week manual evaluation process
- ~800-1,000 daily users served with consistent quality, achieved through five phases of iterative performance engineering
- 73% reduction in per-request token consumption through dynamic instruction injection, and ~70% faster evaluation cycles through model tiering, with no measured quality degradation
- Self-healing execution with security auto-fix and three-layer retry so users never encounter raw errors or unsafe code
- Nine-agent pipeline with clear separation of concerns, each agent independently testable, tunable, and backed by enforced output schemas
- Full infrastructure-as-code with Terraform-managed environments (dev/staging/prod), automated CI/CD gates, BigQuery telemetry, and structured observability
The models are capable. The orchestration frameworks are mature. The remaining challenge is the careful engineering work of making them perform reliably, efficiently, and safely. That's where applied AI studios deliver their value.