Spec-Driven Development: Turning AI Coding from Guesswork into Ground Truth

AI has changed how we build software. But for all the speed it gives us, one problem keeps showing up: the model can move fast, but it doesn't always know why a feature exists, what decisions were made, or how the implementation should evolve over time.

That's where spec-driven development comes in.

Instead of starting with code, spec-driven development starts with a written specification. The spec becomes the source of truth for the feature: what it should do, how it should behave, what edge cases matter, and how implementation decisions connect back to requirements. In our work, we've been using this approach to make AI-assisted development more reliable, more traceable, and easier to maintain.

The Problem with AI-Generated Code

Most AI coding workflows are still too ephemeral.

You open Cursor, Claude Code, or another agentic coding tool. You describe what you want. The model plans, writes code, maybe updates a few files, and then the plan disappears. What remains is the implementation — but not the full reasoning behind it.

That creates a few problems:

The model guesses when requirements are incomplete.
Important product decisions get buried in code.
Future changes lack context.
Bugs become harder to reason about because there's no durable record of intent.

In a traditional workflow, we might accept this, because writing detailed specs, tests, and implementation plans takes time. With AI, that tradeoff changes: an agent can draft those artifacts quickly, so work that used to feel too time-consuming can become part of the normal development loop.

What Is Spec-Driven Development?

Spec-driven development is simple in principle:

Every non-trivial feature gets a spec before code is written.

The spec lives in the codebase, usually as markdown. It is a living document that evolves alongside the feature, not a one-off planning artifact written once and abandoned. In our projects, specs are stored in a dedicated specs/ folder and become part of the same workflow as code, tests, and implementation tasks.

The point is to give both humans and AI agents a shared source of truth, not to produce documentation for its own sake.

A good spec answers:

What should this feature do?
What user interactions matter?
What requirements must be satisfied?
What technical decisions are required?
What tests prove the feature works?
Where should implementation details reference the original requirement?
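Because the spec is plain markdown in the repo, whether it answers these questions can be checked mechanically. The sketch below is a hypothetical spec linter that assumes each question maps to a `## ` heading; the section names in `REQUIRED_SECTIONS` are illustrative, not a fixed standard.

```python
# Hypothetical section names; a real project would take these from its spec template.
REQUIRED_SECTIONS = (
    "Overview",
    "User Interactions",
    "Requirements",
    "Technical Decisions",
    "Tests",
)


def missing_sections(spec_markdown: str) -> list[str]:
    """Return required sections that have no '## <name>' heading in the spec."""
    headings = {
        line.lstrip("#").strip().lower()
        for line in spec_markdown.splitlines()
        if line.startswith("## ")
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```

A CI step or agent hook could run `missing_sections` on each spec and hold off implementation while the returned list is non-empty.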

The Four-Phase Loop

The workflow follows a four-phase loop:

Requirements
Design
Tasks
Implementation

Each phase requires human approval before moving to the next. That human-in-the-loop step matters. AI can generate the first draft, but humans still need to validate whether the requirements are correct, whether the technical design makes sense, and whether the implementation plan matches the product intent.
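The approval gates can be modeled as a small state machine. This is a minimal sketch, assuming one named reviewer per phase; `Phase` and `SpecLoop` are hypothetical names, not part of any tool.

```python
from enum import Enum


class Phase(Enum):
    REQUIREMENTS = 1
    DESIGN = 2
    TASKS = 3
    IMPLEMENTATION = 4


class SpecLoop:
    """Tracks one feature through the four phases with human approval gates."""

    def __init__(self) -> None:
        self.phase = Phase.REQUIREMENTS
        self.approvals: dict[Phase, str] = {}  # phase -> reviewer who signed off

    def approve(self, reviewer: str) -> None:
        # Record a human sign-off on the current phase.
        if not reviewer:
            raise ValueError("approval requires a named reviewer")
        self.approvals[self.phase] = reviewer

    def advance(self) -> Phase:
        # Refuse to move on until the current phase has been approved.
        if self.phase not in self.approvals:
            raise PermissionError(f"{self.phase.name} is not approved yet")
        if self.phase is Phase.IMPLEMENTATION:
            raise RuntimeError("implementation is the last phase")
        self.phase = Phase(self.phase.value + 1)
        return self.phase
```

Calling `advance()` on a fresh `SpecLoop` raises `PermissionError`; after `approve("alice")` (a hypothetical reviewer), it returns `Phase.DESIGN`, and the design phase then needs its own approval.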

1. Requirements

The requirements phase turns a brief feature description into testable acceptance criteria.

For example, when implementing free-tier limits on a recent project, the requirements defined what should happen when a free-plan user reaches a storage limit, attempts an upload, or upgrades to a paid plan. Each requirement received a numbered ID, such as 1.1 or 1.3, that tests and implementation code could later reference.

This is one of the most important parts of the workflow. If the requirements are vague, the implementation will be vague. If the requirements are clear, the AI has a much stronger foundation to work from.
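Numbered IDs also make a requirements file easy to read programmatically. The following sketch assumes a hypothetical line format of `<ID> <criterion>`; the three criteria are invented wording for the free-tier scenarios described above, not the project's actual spec.

```python
import re

# Hypothetical requirements.md excerpt; the wording is illustrative only.
REQUIREMENTS_MD = """
## 1. Free-tier storage limits
1.1 When a free-plan user reaches the storage limit, further uploads are rejected.
1.2 When an upload is rejected, the user sees an upgrade prompt.
1.3 When a user upgrades to a paid plan, the free-tier limit no longer applies.
"""

# Assumed format: an ID like "1.1" at the start of the line, then the criterion.
REQ_LINE = re.compile(r"^(\d+\.\d+)\s+(.+)$")


def parse_requirements(markdown: str) -> dict[str, str]:
    """Map each requirement ID (e.g. '1.1') to its acceptance criterion."""
    reqs: dict[str, str] = {}
    for line in markdown.splitlines():
        match = REQ_LINE.match(line.strip())
        if match:
            req_id, text = match.groups()
            if req_id in reqs:
                raise ValueError(f"duplicate requirement ID {req_id}")
            reqs[req_id] = text
    return reqs
```

Rejecting duplicate IDs matters because every later reference, in tasks, tests, or code, depends on each ID pointing at exactly one requirement.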

2. Design

Once the requirements are approved, the next phase translates them into technical decisions.

This can include:

API shape
database design
data model changes
product behavior
architectural constraints

The design file exists to make implementation decisions explicit before code is generated. That way, the agent follows decisions that have already been reviewed instead of inventing architecture midstream.

3. Tasks

The tasks phase breaks the feature into individual implementation steps.

In our workflow, each task can map to a commit, creating a clean connection between the spec, the work performed, and the resulting code history. This part is still evolving. As features grow, tasks can drift or become harder to keep in sync, so we're still evaluating the best structure.
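One way to catch drift between the task list and the commit history is to compare the two directly. This sketch assumes a hypothetical convention where each commit message mentions its task as `task 3` or `task 2.1`; `TASK_REF` would need to match whatever format a team actually uses.

```python
import re

# Assumed commit convention: "task 3", "Task 2.1", or "task-4" anywhere in the message.
TASK_REF = re.compile(r"\btask[-\s]?(\d+(?:\.\d+)?)\b", re.IGNORECASE)


def task_commit_drift(task_ids: list[str], commit_messages: list[str]) -> dict[str, list[str]]:
    """Report tasks with no commit and commits that cite tasks not in the list."""
    known = set(task_ids)
    referenced: set[str] = set()
    unknown: list[str] = []
    for message in commit_messages:
        for ref in TASK_REF.findall(message):
            if ref in known:
                referenced.add(ref)
            else:
                unknown.append(ref)
    return {
        "tasks_without_commits": [t for t in task_ids if t not in referenced],
        "unknown_task_refs": unknown,
    }
```

A non-empty `unknown_task_refs` usually means the task list changed after work started, which is exactly the kind of drift described above.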

4. Implementation

Only after requirements, design, and tasks are approved does implementation begin.

The implementation phase also incorporates test-driven development. The agent reads the spec, writes tests first, then implements the feature until the tests pass. Once the feature is complete, automated reviewers such as Gemini or Cursor Bugbot can review the code before local testing.

This flips the usual AI coding flow. Instead of "ask, generate, fix," the process becomes "specify, approve, test, implement, verify."
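As a small illustration of test-first work against a spec, the tests below are named after the requirement they prove. The 500 MB limit, the `plan` strings, and `check_upload` are hypothetical; in the workflow described here, those details would come from the approved design.

```python
# Hypothetical limit; in practice this value would come from the approved design.
FREE_TIER_LIMIT_BYTES = 500 * 1024 * 1024


class QuotaExceeded(Exception):
    """Raised when an upload would push a free-plan user past the limit."""


# --- Tests: written first, one per requirement ID ---

def test_req_1_1_free_upload_over_limit_is_rejected() -> None:
    try:
        check_upload("free", used_bytes=FREE_TIER_LIMIT_BYTES, upload_bytes=1)
    except QuotaExceeded:
        return
    raise AssertionError("Req 1.1: expected QuotaExceeded")


def test_req_1_1_free_upload_within_limit_is_allowed() -> None:
    check_upload("free", used_bytes=0, upload_bytes=1024)  # must not raise


def test_req_1_3_paid_plan_is_not_limited() -> None:
    check_upload("paid", used_bytes=FREE_TIER_LIMIT_BYTES, upload_bytes=1)


# --- Implementation: written until the tests above pass ---

def check_upload(plan: str, used_bytes: int, upload_bytes: int) -> None:
    # Req 1.1: free-plan users cannot upload past the storage limit.
    if plan == "free" and used_bytes + upload_bytes > FREE_TIER_LIMIT_BYTES:
        raise QuotaExceeded("free-tier storage limit reached")
    # Req 1.3: paid plans skip the free-tier check entirely.
```

With pytest installed, running `pytest` on this file would collect the three `test_` functions; they can also be called directly.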

Why Requirement IDs Matter

One of the most useful details in this workflow is the use of requirement IDs.

A requirement ID like 1.1 is useful well beyond the requirements file. It can be referenced in:

task lists
tests
implementation comments
code paths
future changes

That gives the codebase traceability. If someone sees a piece of logic and wants to understand why it exists, they can follow the requirement ID back to the spec. The code can point to the relevant requirement rather than carrying long comments that try to explain every decision inline.

This is especially valuable in complex product logic. The more edge cases a feature has, the more useful it becomes to separate intent from implementation.
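Because the IDs are plain text, traceability can be checked automatically. This sketch assumes code and tests cite requirements as `Req 1.1`; it reports spec requirements that nothing references and references to IDs that do not exist in the spec. The file names in the usage example are hypothetical.

```python
import re

# Assumed citation style in code and tests, e.g. "# Req 1.1: ..."
REQ_REF = re.compile(r"\bReq\s+(\d+\.\d+)\b")


def trace(
    spec_ids: set[str], sources: dict[str, str]
) -> tuple[set[str], dict[str, set[str]]]:
    """Return (spec IDs never referenced, path -> IDs referenced but not in the spec)."""
    seen: set[str] = set()
    dangling: dict[str, set[str]] = {}
    for path, text in sources.items():
        for req_id in REQ_REF.findall(text):
            if req_id in spec_ids:
                seen.add(req_id)
            else:
                dangling.setdefault(path, set()).add(req_id)
    return spec_ids - seen, dangling
```

For example, with spec IDs `{"1.1", "1.2"}`, a `quota.py` citing `Req 1.1`, and a `test_quota.py` citing `Req 1.1` and `Req 2.4`, `trace` reports `1.2` as uncovered and `2.4` as a dangling reference in `test_quota.py`.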

Specs Should Ship with Code

The most important mindset shift is that specs ship with the code.

They are part of the deliverable, not a separate artifact left behind. That means a feature includes more than code changes:

requirements
design decisions
tasks
tests
implementation

Together, these form a durable record of the feature. Future developers — and future AI agents — can see what changed and why.

Keeping Specs Clean

A spec should describe the feature. It should not become a dumping ground for every coding preference or architectural convention.

For example, if an AI agent notices a TypeScript pattern or database convention, that probably does not belong in the feature spec. General engineering rules should live in separate repo-level instruction files, such as agent rules, skills, or an agents.md file. Feature-specific behavior belongs in the spec.

This separation of concerns keeps the system maintainable:

Specs define feature behavior.
Rules define general engineering standards.
Skills define repeatable agent workflows.
Templates keep spec files consistent.

Without this separation, specs can become noisy and harder for both humans and models to use.

The Role of Repo Knowledge

A major advantage of this approach is that the AI agent can use the repository itself as context.

A well-structured repo might include a .claude/ folder for agents, rules, hooks, and skills, along with a specs/ folder and templates for requirements, design, and tasks. Asked to create a new spec, an agent can find the specs folder, follow the existing workflow, and start asking clarifying questions.

That is the difference between a generic prompt and a systemized development environment.

The real prompt becomes the repo itself — its rules, files, conventions, templates, and previous decisions — rather than just the sentence the developer types into the chat.
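The templates are what let an agent produce a consistent spec without being told the format each time. The sketch below shows only the mechanical part: creating `specs/<feature>/` with one file per phase. The `specs/_templates/` location and the `{{feature}}` placeholder are assumptions for illustration, not a convention of Claude Code or any other tool.

```python
from pathlib import Path

# Hypothetical layout: one template per phase under specs/_templates/.
PHASE_FILES = ("requirements.md", "design.md", "tasks.md")


def scaffold_spec(repo: Path, feature: str) -> Path:
    """Create specs/<feature>/ with one file per phase, copied from templates."""
    templates = repo / "specs" / "_templates"
    target = repo / "specs" / feature
    if target.exists():
        raise FileExistsError(f"spec already exists: {target}")
    target.mkdir(parents=True)
    for name in PHASE_FILES:
        template = templates / name
        # Fall back to a bare heading such as "# Design" if a template is missing.
        body = template.read_text() if template.exists() else f"# {name[:-3].title()}\n"
        (target / name).write_text(body.replace("{{feature}}", feature))
    return target
```

Refusing to overwrite an existing spec keeps the agent from silently discarding a document that may already carry approved decisions.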

A Concrete Example: Notifications

Take a feature like user notifications — letting people know about activity relevant to them, surfacing those alerts in the product, and perhaps delivering them over email.

Given an intentionally brief request like that, a well-set-up agent does not jump directly into implementation. It begins by asking clarifying questions, which is exactly what we want.

For a feature like notifications, there are important decisions to make:

Should notifications be delivered in real time or batched?
What events should trigger a notification?
Should notifications persist, or expire after a while?
Which channels matter — in-app, email, push?
How should repository rules guide those choices?

Working through those questions surfaces an important point: not every decision belongs in a feature spec. If we want to explore a broader architectural shift, such as introducing a message queue or a new data store to support delivery, that may require new repo-level rules or skills rather than a single feature spec.
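One way to keep an agent from skipping the clarifying step is to record the questions alongside the spec and treat any unanswered one as a blocker for the design phase. `ClarifyingQuestion` and `FeatureBrief` are hypothetical names, and the sample answer below is invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClarifyingQuestion:
    question: str
    answer: Optional[str] = None  # filled in by a human during review


@dataclass
class FeatureBrief:
    name: str
    questions: list[ClarifyingQuestion] = field(default_factory=list)

    def open_questions(self) -> list[str]:
        # Questions with no answer (None or an empty string) are still open.
        return [q.question for q in self.questions if not q.answer]

    def ready_for_design(self) -> bool:
        # Design should not start while any question is unanswered.
        return not self.open_questions()


# Hypothetical brief for the notifications feature; the answer is invented.
brief = FeatureBrief(
    "notifications",
    [
        ClarifyingQuestion("Real-time or batched delivery?"),
        ClarifyingQuestion("Which channels: in-app, email, push?", "in-app and email"),
    ],
)
```

Here `brief.ready_for_design()` stays `False` until someone answers the delivery question.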

Why This Matters for AI-Native Teams

AI coding tools are getting better quickly. But better models alone do not solve the core problem of software development: shared understanding.

Teams still need:

clear requirements
durable context
reviewable decisions
testable outcomes
maintainable systems

Spec-driven development gives AI agents a better operating environment. Instead of relying on one-off prompts, we can build a structured loop where the agent works from approved context and produces code that can be traced back to intent.

That is the real unlock.

Final Thought

Spec-driven development is not new. The ideas behind requirements, acceptance criteria, and test-driven workflows have existed for a long time. What is new is that AI makes these practices easier to adopt at speed.

The teams that get the most out of AI will be the ones that give it better context, tighter constraints, and stronger feedback loops, not the ones that simply ask models to write more code.

Spec first. Code second.