Multi-agent AI solution generator for a developer platform at scale

A developer platform with millions of monthly visitors needed to compress weeks of API evaluation into minutes. We built a nine-agent AI pipeline that takes a natural language description of a use case, retrieves live documentation through MCP-connected knowledge tools, and delivers a complete working prototype (production-ready code, interactive preview, and implementation guide) in under four minutes.

Metrics

<4 min: from prompt to working prototype, live preview, and implementation guide
2 wks → minutes: collapsed evaluation cycle for new developers
73%: reduction in token cost per request
70%: faster evaluation cycles after model tiering

The Problem

Developer platforms with broad API portfolios face a conversion problem: millions of visitors evaluate the platform each month, but only a fraction convert to active users.

The gap isn't interest; it's friction.

Developers spend weeks reading documentation, comparing APIs, and assembling proof-of-concept code before they can even determine whether the platform meets their needs.

Static documentation, sample code, and starter guides serve the patient developer. The modern developer expects instant, personalized answers. The journey from consideration to evaluation to integration used to take weeks.

The goal: Compress it to minutes.

Sector

API Platform

Scopes

Multi-agent architecture
LLM performance engineering
Google Cloud architecture
MCP tool integration

Technologies

ADK (Agent Development Kit)
Model Context Protocol
Terraform
TypeScript
Python

GCP Services

Vertex AI Agent Engine
Cloud Run
Cloud Build
Artifact Registry
BigQuery
Secret Manager
Cloud Logging

Models

Gemini 2.5 Pro
Gemini 2.5 Flash

What We Built

A nine-agent AI system that handles the full developer journey from prompt to working prototype.

A developer describes their intent in natural language, and the system orchestrates a pipeline of specialized agents:

An analysis agent validates feasibility by querying live documentation through MCP tools.
A styling pipeline generates and validates visual configuration assets.
A code generation pipeline writes, evaluates, and refines production-ready code using extended reasoning capabilities.
A programmatic agent compiles an implementation guide with API documentation links, setup instructions, and pricing guidance.

The entire pipeline streams results back to the user in real time, completing in under four minutes.

The output (working code, live preview, and implementation guide) is designed for developers to copy, test, and integrate immediately, reducing the evaluation phase from a multi-week research project to a single conversation.

Multi-Agent Orchestration

The core innovation is a nine-agent pipeline where each agent has a single responsibility, clear input/output contracts, and a dedicated model assignment optimized for its task.

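Root Agent (gemini-2.5-flash)

The orchestrator. It manages session state, conversation routing, and dynamic instruction composition. It delegates to specialized sub-agents based on the user's request and injects context gathered from upstream analysis into downstream agents at runtime.

The remaining stages are the Analysis Agent, the styling stage (Style Generator, Schema Validator, Style Refiner), the Code Generator, the evaluation stage (Code Evaluator, Code Refiner), and the Documentation Builder.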
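The orchestration pattern can be sketched in a few lines. This is a minimal illustration in plain Python, not the ADK implementation: the `Agent` dataclass, the `run_pipeline` helper, the state keys, and the two stand-in agents are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical session state shared across agents; key names are illustrative.
State = Dict[str, object]


@dataclass
class Agent:
    name: str
    reads: List[str]                # state keys required (input contract)
    writes: str                     # state key produced (output contract)
    run: Callable[[State], object]  # stand-in for a model or programmatic call


def run_pipeline(agents: List[Agent], state: State) -> State:
    """Run agents in order, checking each input contract before the call."""
    for agent in agents:
        missing = [key for key in agent.reads if key not in state]
        if missing:
            raise KeyError(f"{agent.name} is missing inputs: {missing}")
        state[agent.writes] = agent.run(state)
    return state


# Stand-in agents: real ones would call a model; these only transform state.
analysis = Agent(
    "analysis", ["prompt"], "apis",
    lambda s: ["maps"] if "map" in str(s["prompt"]).lower() else [],
)
codegen = Agent(
    "codegen", ["prompt", "apis"], "code",
    lambda s: f"// uses: {', '.join(s['apis'])}",
)

result = run_pipeline([analysis, codegen], {"prompt": "Show a map of stores"})
print(result["code"])
```

The point of the sketch is that the downstream agent never sees the raw conversation alone: it receives whatever the upstream analysis wrote into shared state, and the pipeline fails fast if that context is missing.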

Design Decisions in Agent Orchestration

Why nine agents instead of a monolithic prompt
A single prompt attempting all tasks (analysis, styling, code generation, evaluation, documentation) would exceed context windows and produce unreliable outputs. Decomposition allows each agent to have focused instructions, specialized model selection, and independent iteration loops.

Why programmatic agents
The Schema Validator and Documentation Builder are implemented as programmatic agents (no LLM calls). Schema validation is deterministic, so no model is needed. The documentation builder detects APIs in the final code and maps them to documentation URLs from a canonical registry. This eliminates two potential points of hallucination and reduces token usage to zero for those pipeline stages.
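A programmatic documentation builder of this kind can be approximated with a registry lookup. The sketch below is illustrative only: the registry contents, the example.com URLs, and the function names `detect_apis` and `build_doc_links` are assumptions, and the real detection logic may be more sophisticated than a regular expression.

```python
import re
from typing import Dict, List

# Hypothetical canonical registry: API names and URLs are placeholders.
DOC_REGISTRY: Dict[str, str] = {
    "geocode": "https://example.com/docs/geocoding",
    "route": "https://example.com/docs/routing",
}


def detect_apis(code: str) -> List[str]:
    """Return registry keys that appear as function calls, in registry order."""
    found = []
    for name in DOC_REGISTRY:
        # \b keeps "route(" from matching inside "reroute(".
        if re.search(rf"\b{re.escape(name)}\s*\(", code):
            found.append(name)
    return found


def build_doc_links(code: str) -> Dict[str, str]:
    """Map each detected API to its documentation URL. No model call involved."""
    return {name: DOC_REGISTRY[name] for name in detect_apis(code)}


links = build_doc_links("const r = client.route(a, b);")
print(links)
```

Because every link comes from the registry, the builder cannot emit a URL that does not exist there, which is the property the text relies on.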

Why separate evaluator and refiner
Combining evaluation and correction in one agent leads to confirmation bias: the agent "grades its own homework." Separating them creates genuine adversarial pressure: the evaluator applies strict rules and security scanning, the refiner responds to specific feedback and performs surgical code changes targeting only the flagged issues.
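The separation can be illustrated with a small loop in which the evaluator only reports issues and the refiner only acts on what was reported. Everything here is a stand-in: the two rules, the `MAX_ROUNDS` cap, and the function names are assumptions for the sketch, and the production agents are model-backed rather than rule-based.

```python
import re
from typing import List, Tuple

MAX_ROUNDS = 3  # assumed iteration cap, not a figure from the project

HARDCODED_KEY = re.compile(r'API_KEY\s*=\s*"[^"]*"')


def evaluate(code: str) -> List[str]:
    """Stand-in evaluator: reports issues and never edits the code."""
    issues = []
    if "eval(" in code:
        issues.append("unsafe-eval")
    if HARDCODED_KEY.search(code):
        issues.append("hardcoded-key")
    return issues


def refine(code: str, issues: List[str]) -> str:
    """Stand-in refiner: changes only what the evaluator flagged."""
    if "unsafe-eval" in issues:
        code = code.replace("eval(", "JSON.parse(")
    if "hardcoded-key" in issues:
        code = HARDCODED_KEY.sub("API_KEY = process.env.API_KEY", code)
    return code


def evaluate_and_refine(code: str) -> Tuple[str, bool]:
    """Alternate evaluation and refinement until clean or out of rounds."""
    for _ in range(MAX_ROUNDS):
        issues = evaluate(code)
        if not issues:
            return code, True
        code = refine(code, issues)
    return code, not evaluate(code)


draft = 'const API_KEY = "abc123"; const data = eval(payload);'
final, passed = evaluate_and_refine(draft)
print(final, passed)
```

Keeping `evaluate` free of side effects is what makes the pressure adversarial: the refiner cannot mark its own work as passing.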

Why enforced output schemas
Every agent that produces code or configuration declares a structured output schema. The model is constrained to generate valid output matching the schema, eliminating freeform prose, reducing token waste from verbose responses, and guaranteeing that downstream agents receive parseable inputs without additional transformation or extraction steps. This also makes the pipeline testable: each agent's output can be validated independently against its schema contract.
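A schema contract of this kind can be checked independently of the model that produced the output. The sketch below uses only the standard library; the `CodeOutput` fields and the `parse_output` helper are hypothetical, and the production system may rely on its framework's own structured-output support instead.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CodeOutput:
    """Hypothetical output contract for a code-producing agent."""
    language: str
    code: str
    apis_used: list


def parse_output(raw: str) -> CodeOutput:
    """Parse a raw agent response and reject anything off-contract."""
    data = json.loads(raw)  # raises ValueError (JSONDecodeError) on prose
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    expected = {f.name: f.type for f in fields(CodeOutput)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        if not isinstance(data[name], typ):
            raise ValueError(f"{name} must be of type {typ.__name__}")
    return CodeOutput(**data)


ok = parse_output('{"language": "typescript", "code": "let x = 1;", "apis_used": []}')
print(ok.language)
```

A test suite can feed recorded agent responses through `parse_output` without running the rest of the pipeline, which is the independent testability the text describes.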

How MCP Connects Agents to Knowledge

The Model Context Protocol gives agents a standardized interface to live API documentation. Three agents in the pipeline (Analysis, Code Generator, and Code Evaluator) use it to query an indexed knowledge base at runtime, with results capped per query to control context window growth. The MCP server runs as an isolated service in its own project, so documentation retrieval never competes with agent LLM calls for API quota. The knowledge base can be updated or expanded without touching agent code, which keeps the system responsive to API changes without redeployment.
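The per-query cap is the part of this design that is easiest to show in code. The following sketch does not use an MCP SDK; it is a plain function standing in for the retrieval tool, and the scoring method, `MAX_RESULTS`, `MAX_CHARS`, and the document shape are all assumptions.

```python
from typing import Dict, List

MAX_RESULTS = 3   # assumed cap on documents returned per query
MAX_CHARS = 200   # assumed cap on characters returned per document


def search_docs(
    index: List[Dict[str, str]],
    query: str,
    max_results: int = MAX_RESULTS,
    max_chars: int = MAX_CHARS,
) -> List[Dict[str, str]]:
    """Rank documents by term frequency and return a bounded result set."""
    terms = query.lower().split()
    scored = []
    for doc in index:
        text = doc["text"].lower()
        score = sum(text.count(term) for term in terms)
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"title": doc["title"], "snippet": doc["text"][:max_chars]}
        for _, doc in scored[:max_results]
    ]


index = [
    {"title": "A", "text": "geocode geocode address"},
    {"title": "B", "text": "route planning"},
    {"title": "C", "text": "geocode once"},
]
print(search_docs(index, "geocode", max_results=1))
```

With both caps in place, the worst-case context added by one tool call is bounded by `max_results * max_chars` characters, regardless of how large the knowledge base grows.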

But MCP alone is not enough. Tool calls are agent-initiated, which means the model decides whether to invoke retrieval. A confident but wrong model skips the call and hallucinates plausible API patterns. To prevent this, canonical examples are injected directly into agent instructions at runtime rather than offered as an optional tool. The analysis phase identifies which APIs the developer needs, and only matching examples flow into downstream instructions. MCP handles open-ended retrieval where the agent has discretion. Direct injection handles patterns that must be present unconditionally. The combination eliminates stale documentation and a major class of API hallucinations.
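Dynamic instruction injection can be sketched as a function that composes the downstream prompt from the analysis result. The base instruction text, the example snippets, and the `compose_instruction` name are hypothetical; only the selection logic reflects the approach described above.

```python
from typing import Dict, Iterable

BASE_INSTRUCTION = "Write integration code. Follow the canonical examples below exactly."

# Hypothetical canonical examples keyed by API name.
CANONICAL_EXAMPLES: Dict[str, str] = {
    "geocoding": "client.geocode({ address })",
    "routing": "client.route({ origin, destination })",
    "places": "client.places.search({ query })",
}


def compose_instruction(required_apis: Iterable[str]) -> str:
    """Build the agent instruction with only the examples the request needs."""
    sections = [BASE_INSTRUCTION]
    for api in required_apis:
        example = CANONICAL_EXAMPLES.get(api)
        if example is not None:
            sections.append(f"Canonical example for {api}:\n{example}")
    return "\n\n".join(sections)


# The analysis phase decided this request only needs geocoding.
instruction = compose_instruction(["geocoding"])
print(instruction)
```

The example is always present in the instruction, so the model does not have to decide to fetch it, and unrelated examples never consume tokens.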

Iterative Performance Engineering

Building a nine-agent pipeline that performs reliably at scale required multiple development iterations. Each version addressed observed production behavior and introduced targeted refinements that maintained output quality while improving system stability.

Engineering Evolution: From Launch to Production Scale

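Phase 1: Monolithic Pro (Launch)

The initial deployment ran the code generator, evaluator, and refiner on gemini-2.5-pro, the most capable model available. All canonical examples were embedded statically into every prompt, regardless of whether they were relevant to the user's request. The system worked, but at ~30,000 tokens per request and 15-20 Pro-tier calls per pipeline execution, cost and latency were unsustainable at scale.

Four further phases addressed this: dynamic instruction injection (~73% token reduction), model tiering (~70% evaluation latency reduction), quota resilience and self-healing, and caching and budget controls.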
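As a rough back-of-envelope check, the launch figure can be combined with the 73% token reduction quoted in this case study. The result is an estimate, not a number reported by the project, and it assumes the reduction was measured against the ~30,000 token launch baseline.

```python
# Figures quoted in this case study.
LAUNCH_TOKENS_PER_REQUEST = 30_000  # "~30,000 tokens per request" at launch
TOKEN_REDUCTION = 0.73              # "73% reduction in token cost per request"

# Assumption: the 73% reduction is relative to the launch baseline.
estimated_tokens = round(LAUNCH_TOKENS_PER_REQUEST * (1 - TOKEN_REDUCTION))
print(estimated_tokens)
```

Under that assumption, a request would consume roughly 8,100 tokens after the reduction.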

Self-Healing Pipeline

A key differentiator of the system is its ability to detect, recover from, and prevent failures at multiple levels without surfacing errors to the end user. The self-healing behavior built into the pipeline (security auto-fix loops, graceful fallback to previous valid outputs) ensures that issues mid-pipeline don't become user-visible errors.
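The fallback behavior can be sketched as a wrapper around a single pipeline step. This is an illustration under stated assumptions: the `run_with_fallback` name, the attempt count, and the backoff schedule are hypothetical, and the sketch covers only retry and fallback, not the security auto-fix loop.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_fallback(
    step: Callable[[], T],
    validate: Callable[[T], bool],
    last_valid: T,
    max_attempts: int = 3,    # assumed value
    base_delay: float = 0.0,  # seconds; exponential backoff when above zero
) -> T:
    """Retry a step, then fall back to the previous valid output."""
    for attempt in range(max_attempts):
        try:
            output = step()
            if validate(output):
                return output
        except Exception:
            # Broad catch is deliberate in this sketch: any failure is
            # absorbed here instead of reaching the user.
            pass
        time.sleep(base_delay * (2 ** attempt))
    return last_valid


calls = {"n": 0}


def flaky_step() -> str:
    """Fails on the first call, succeeds afterwards."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("transient failure")
    return "refined code"


print(run_with_fallback(flaky_step, lambda out: bool(out), "previous code"))
```

A production version would log each absorbed failure so that recovery stays visible to operators even though it is invisible to users.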

What Monogram Delivered

PRODUCTION · SERVING THOUSANDS OF DAILY USERS

What used to take weeks of manual evaluation now happens in a single conversation. The outcomes below reflect steady-state production behavior.

Sub-4-minute prototype generation including complete working code, live preview, and implementation guide, replacing a 2+ week manual evaluation process
~800-1,000 daily users served with consistent quality, achieved through five phases of iterative performance engineering
73% reduction in per-request token consumption through dynamic instruction injection, and ~70% faster evaluation cycles through model tiering, with no measured quality degradation
Self-healing execution with security auto-fix and three-layer retry so users never encounter raw errors or unsafe code
Nine-agent pipeline with clear separation of concerns, each agent independently testable, tunable, and backed by enforced output schemas
Full infrastructure-as-code with Terraform-managed environments (dev/staging/prod), automated CI/CD gates, BigQuery telemetry, and structured observability

The models are capable. The orchestration frameworks are mature. The remaining challenge is the careful engineering work of making them perform reliably, efficiently, and safely. That's where applied AI studios deliver their value.
