In the previous article I explained what Spec-Driven Development is and why it makes sense for teams working with agents. Now we get to the part that matters most: how to turn that into day-to-day execution.

Once the conversation leaves the conceptual layer and hits a real project, the same problems usually show up first: vague stories, ambiguous criteria, incomplete tests, shallow review, and commits moving forward without enough guarantees. That is exactly where SDD needs to become concrete.

In this post I will organize that practical layer into five parts: how to write mandatory requirements without ambiguity, what a story needs before implementation starts, what must be true before it can be considered done, which gates keep quality in place, and which templates make the process repeatable.

Before the Checklists: Writing Clear Mandatory Requirements

Before getting into DoR, DoD, and quality gates, it helps to fix one important foundation first: how mandatory requirements are written. If that layer starts vague, everything else degrades with it, including planning, implementation, review, and testing.

One simple way to handle that is EARS (Easy Approach to Requirements Syntax). It isn’t an AI framework, and it doesn’t replace a PRD, story, or ADR. It’s just a lean format for writing mandatory requirements so they stay clear, testable, and hard to misread.

That helps a lot when work passes through multiple agents or multiple people. If a requirement is vague, each person or agent fills the gaps in their own way. When it is written in this format, it becomes much easier to review, test, and automate validation around it.

The idea is simple: use the keyword SHALL to indicate mandatory requirements. This removes ambiguity and makes automated validation easier:

The system SHALL [mandatory action] WHEN [condition]

Practical examples:

  • “The system SHALL return 403 when an unauthorized user accesses a protected resource”
  • “The system SHALL validate user input before processing any request”
  • “The story SHALL have all ACs in Given/When/Then format before implementation starts”

When you read a requirement with SHALL, it’s clear: it is mandatory, it is testable, and there is very little room for interpretation. In the checklists below, I will use that pattern to make explicit what is non-negotiable.
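To show why this shape is easy to validate automatically, here is a minimal sketch of a checker for it. The regular expression and the `parse_requirement` helper are assumptions made for illustration; they are not part of EARS or of any tool. The sketch only recognizes an upper-case SHALL (or SHALL NOT) with an optional trailing "when" clause.

```python
import re
from typing import Optional

# Hypothetical pattern for the "<subject> SHALL <action> [when <condition>]" shape.
# Assumption: SHALL must be upper case; "when" is matched case-insensitively.
REQUIREMENT = re.compile(
    r"^(?P<subject>.+?)\s+SHALL(?P<negated>\s+NOT)?\s+(?P<action>.+?)"
    r"(?:\s+(?i:when)\s+(?P<condition>.+))?$"
)


def parse_requirement(text: str) -> Optional[dict]:
    """Split a requirement into its parts, or return None if it has no SHALL."""
    match = REQUIREMENT.match(text.strip().rstrip("."))
    if match is None:
        return None
    return {
        "subject": match["subject"],
        "negated": match["negated"] is not None,
        "action": match["action"],
        "condition": match["condition"],  # None when there is no "when" clause
    }


parsed = parse_requirement(
    "The system SHALL return 403 when an unauthorized user accesses a protected resource"
)
vague = parse_requirement("The system should probably reject bad users")
```

Here `parsed` carries the subject, action, and condition as separate fields, while `vague` is `None` because the sentence never commits to SHALL. A check like this can run in CI against a requirements file, but it only verifies the shape of a requirement, not whether the requirement is a good one.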

Definition of Ready (DoR)

Before implementing any story, it needs to be ready. Sounds obvious, but this is where most problems with AI begin: the story arrives vague, without clear criteria, and the agent does whatever it thinks is best.

A story SHALL be considered ready when:

  • Verifiable ACs — each acceptance criterion SHALL be in Given/When/Then format with measurable outcomes
  • Sequenced tasks — tasks SHALL have identified dependencies and execution order
  • Target files identified — locations for changes SHALL be listed
  • Consistent specs — the story SHALL be aligned with PRD, architecture, and UX
  • Test types defined — each AC SHALL have its test type: unit, integration, E2E
  • Error scenarios identified — cases like not-found, forbidden, and invalid-input SHALL be mapped
  • Research completed — official docs SHALL have been consulted, not just agent memory
  • Security impact assessed — auth, authorization, and input validation SHALL be evaluated

The Architect agent validates the DoR before implementation begins. If it fails, fix the story and revalidate.
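Part of that validation can be mechanical. The sketch below models a story as plain data and returns the list of DoR items that fail. The `Story` and `AcceptanceCriterion` classes and their field names are hypothetical, and the check covers only the items that can be read off the data (it does not judge alignment with the PRD or task sequencing).

```python
from dataclasses import dataclass, field

VALID_TEST_TYPES = {"unit", "integration", "E2E"}


@dataclass
class AcceptanceCriterion:
    given: str
    when: str
    then: str
    test_type: str = ""  # expected to be one of VALID_TEST_TYPES


@dataclass
class Story:
    # Hypothetical fields mirroring some of the DoR items above.
    acs: list = field(default_factory=list)
    target_files: list = field(default_factory=list)
    error_scenarios: list = field(default_factory=list)
    research_done: bool = False
    security_assessed: bool = False


def dor_failures(story: Story) -> list:
    """Return a human-readable list of failed DoR items (empty means ready)."""
    failures = []
    if not story.acs:
        failures.append("no acceptance criteria")
    for number, ac in enumerate(story.acs, start=1):
        if not (ac.given and ac.when and ac.then):
            failures.append(f"AC-{number}: incomplete Given/When/Then")
        if ac.test_type not in VALID_TEST_TYPES:
            failures.append(f"AC-{number}: test type not defined")
    if not story.target_files:
        failures.append("target files not identified")
    if not story.error_scenarios:
        failures.append("error scenarios not identified")
    if not story.research_done:
        failures.append("research not completed")
    if not story.security_assessed:
        failures.append("security impact not assessed")
    return failures
```

An empty list means the story can move to implementation; anything else goes back to the author together with the list of failures.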

Definition of Done (DoD)

If DoR prevents a weak start, DoD prevents teams from stopping too early. In agent-assisted work that matters even more, because “it worked once” is not the same as “it is actually done”.

A story SHALL be considered done when it passes through all these layers:

Code Quality

  • Adversarial code review SHALL have been completed, with all findings fixed
  • SOLID and Clean Code principles SHALL have been verified
  • Zero dead code, zero pending TODO/FIXME

Tests

  • All tests SHALL pass — full project suite, not just what changed
  • TDD per AC — each acceptance criterion SHALL have gone through the RED -> GREEN -> REFACTOR cycle
  • AC -> test mapping — each AC SHALL have at least 1 documented test
  • Error scenarios tested — not-found, forbidden, invalid-input
  • Data isolation — User A SHALL NOT be able to access User B’s data (when applicable)

Build & Types

  • Full build SHALL pass without errors
  • Type-check SHALL pass without errors (if the language supports it)
  • Lint SHALL pass without violations

UI (when applicable)

  • Browser validation: rendering, zero console errors, dark + light mode
  • Responsive: tested on mobile, tablet, and desktop
  • Accessibility: ARIA labels, keyboard navigation, contrast

As the Scrum.org folks say: “AI is probabilistic, run the same prompt 100 times and you might get 95 correct answers and 5 hallucinations. If your DoD relies solely on binary Pass/Fail checks, you aren’t testing your AI agents, you’re gambling with them.”

Notice how the EARS format makes each DoR and DoD item unambiguous: if it has SHALL, it’s mandatory and can be validated. Without SHALL, it’s a recommendation.
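The "AC -> test mapping" item from the DoD is one of the easiest to validate. As a rough sketch, assuming the mapping is kept as a dictionary from AC identifier to test names (the identifiers and test names below are made up for illustration):

```python
def acs_without_tests(ac_ids, ac_to_tests):
    """Return the ACs that have no documented test, in the order given."""
    return [ac_id for ac_id in ac_ids if not ac_to_tests.get(ac_id)]


# Hypothetical mapping for a story with three acceptance criteria.
mapping = {
    "AC-1": ["test_unauthorized_user_gets_403"],
    "AC-2": [],
}
missing = acs_without_tests(["AC-1", "AC-2", "AC-3"], mapping)
```

In this example `missing` contains AC-2 (mapped, but with no tests) and AC-3 (not mapped at all), so the story does not meet that DoD item yet.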

Quality Gates

With DoR and DoD defined, one piece is still missing: something that makes those rules real instead of aspirational.

This is probably the most important practice in all of SDD: having gates that must pass before each commit. The idea is to create verification layers that catch problems before they reach the repository.

The suggested order is:

1. Code Review -> 2. Tests -> 3. Build -> 4. Type-check -> 5. Commit

| # | Gate | What it catches |
|---|------|-----------------|
| 1 | Code Review | Logic errors, architecture violations, security issues |
| 2 | Tests | Regressions, broken contracts |
| 3 | Build | Compilation errors, broken imports |
| 4 | Type-check | Type safety violations |
| 5 | Dependency audit | Vulnerabilities in dependencies |

Most of these checks can be automated with git hooks (pre-commit, pre-push) and CI pipelines. The important thing is that they're treated as non-negotiable: if a gate fails, the commit doesn't happen.
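A minimal sketch of the automatable part, assuming each gate is a command that exits with a non-zero status on failure. The commands in `EXAMPLE_GATES` are placeholders (pytest and mypy are assumptions about the project's tooling, and `compileall` stands in for a real build step); code review is left out because it is not a shell command.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical gate commands, in the suggested order. Replace with your own.
EXAMPLE_GATES = [
    ("tests", [sys.executable, "-m", "pytest", "-q"]),
    ("build", [sys.executable, "-m", "compileall", "-q", "."]),
    ("type-check", [sys.executable, "-m", "mypy", "."]),
]


def first_failing_gate(gates) -> Optional[str]:
    """Run gates in order and return the name of the first one that fails.

    Stops at the first failure so later gates never run on a broken state.
    Returns None when every gate passes.
    """
    for name, command in gates:
        result = subprocess.run(command)
        if result.returncode != 0:
            return name
    return None
```

A pre-commit hook could call `first_failing_gate(EXAMPLE_GATES)` and exit with a non-zero status when the result is not `None`, which is what makes git abort the commit.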

Testing Strategy

Quality gates without a testing strategy quickly turn into bureaucracy. To make them useful, you need to know which kind of test is supposed to cover which kind of risk.

Which type of test to use?

  1. Is it pure business logic? If yes, write a unit test.
  2. If not, is it an API endpoint? If yes, write an integration test.
  3. If not, is it a UI flow? If yes, write an E2E test.
  4. If not, is it a visual component? If yes, write a component test.
  5. If none of the above applies, review the case.
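The same decision tree, written as a function so the order of the questions is explicit. The `Change` class and its flags are hypothetical; in practice the answers come from whoever writes the story.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical flags, one per question in the decision tree.
    pure_business_logic: bool = False
    api_endpoint: bool = False
    ui_flow: bool = False
    visual_component: bool = False


def choose_test_type(change: Change) -> str:
    """Walk the decision tree top to bottom; the first 'yes' wins."""
    if change.pure_business_logic:
        return "unit"
    if change.api_endpoint:
        return "integration"
    if change.ui_flow:
        return "E2E"
    if change.visual_component:
        return "component"
    return "review the case"
```

Because the checks run in order, a change flagged as both business logic and an API endpoint resolves to a unit test first; covering the endpoint as well is a separate AC with its own test type.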

TDD per Acceptance Criterion

The flow for each AC is:

1. Write a failing test (RED)          — Test captures the AC behavior
2. Implement minimum code (GREEN)      — Make the test pass, nothing more
3. Refactor (REFACTOR)                 — Clean up without changing behavior
4. Repeat for the next AC
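As an illustration of one pass through that cycle, take the AC "the system SHALL return 403 when an unauthorized user accesses a protected resource". The `access_status` function and its role-based check are invented for this example; the point is only the order of work: the tests are written first and fail (RED), then the smallest implementation makes them pass (GREEN).

```python
import unittest


def access_status(user_roles: set, required_role: str) -> int:
    """GREEN step: the minimum code that satisfies the tests below."""
    return 200 if required_role in user_roles else 403


class ProtectedResourceAC(unittest.TestCase):
    """RED step: these tests exist (and fail) before access_status is written."""

    def test_unauthorized_user_gets_403(self):
        self.assertEqual(access_status({"viewer"}, "admin"), 403)

    def test_authorized_user_gets_200(self):
        self.assertEqual(access_status({"admin"}, "admin"), 200)
```

The REFACTOR step would then clean up names or structure while both tests stay green.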

What to mock and what not to mock

This table helps avoid one of the most common pitfalls in AI-assisted projects:

| Layer | Mock? | Why |
|-------|-------|-----|
| Business logic / domain | NEVER | This is what you're testing |
| Database (in integration tests) | NEVER | Use real DB with fixtures |
| External HTTP APIs | YES | Unreliable, slow, can cost money |
| File system / storage | YES | Side effects, cleanup |
| Time / dates | YES | Deterministic tests |
| Framework internals | NEVER | Trust the framework |

Anti-pattern: Tests that only use mocks are dangerous. They test your mocks, not your code.
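A small sketch of that boundary, using only the standard library. The external rate service is replaced with a `Mock`, the current time is passed in as a fixed value, and the domain logic runs for real. The function names and the `get_rate` method are hypothetical.

```python
from datetime import datetime, timezone
from unittest.mock import Mock


def is_expired(expires_at: datetime, now: datetime) -> bool:
    """Domain logic: takes 'now' as an argument so tests can fix the time."""
    return now >= expires_at


def price_in(currency: str, amount_usd: float, client) -> float:
    """Domain logic: conversion and rounding are real, only the client is faked.

    `client.get_rate` stands in for a call to an external HTTP API.
    """
    return round(amount_usd * client.get_rate(currency), 2)


# External HTTP API: mocked, because it is slow, unreliable, or costs money.
rates_client = Mock()
rates_client.get_rate.return_value = 0.5
converted = price_in("EUR", 10.0, rates_client)

# Time: fixed, so the test is deterministic.
fixed_now = datetime(2024, 1, 1, tzinfo=timezone.utc)
expired = is_expired(datetime(2023, 12, 31, tzinfo=timezone.utc), fixed_now)
```

If `price_in` itself were replaced with a mock, the assertion on `converted` would only confirm the mock's return value, which is the anti-pattern described above.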

Bug Fix Protocol

Bug fixing is another place where teams benefit from having a stable protocol. If every fix enters the system differently, the codebase accumulates patches without memory.

Every bug fix follows this sequence:

  1. Write a regression test that reproduces the bug
  2. Fix the bug
  3. Verify the test passes
  4. Commit test + fix together
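For example, suppose a pagination helper dropped the last partial page (a made-up bug for illustration). The regression test is written first and fails against the old code, then the fix lands in the same commit:

```python
import unittest


def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show total_items.

    Hypothetical bug: the old version used `total_items // page_size`,
    which ignored a partial last page. The fix uses ceiling division.
    """
    return -(-total_items // page_size)


class PageCountRegression(unittest.TestCase):
    def test_partial_last_page_is_counted(self):
        # Reproduces the bug: 21 items at 10 per page need 3 pages, not 2.
        self.assertEqual(page_count(21, 10), 3)

    def test_exact_multiple_is_unchanged(self):
        # Guards the behavior that already worked before the fix.
        self.assertEqual(page_count(20, 10), 2)
```

Keeping the test next to the fix is what gives the codebase memory: if the bug ever comes back, the suite fails with a test name that says what broke.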

Code Review: Multi-Perspective

After tests and gates, you still need the layer that catches what scripts are bad at seeing alone: weak trade-offs, unnecessary coupling, architectural drift, and flawed reasoning.

Code review in SDD is not a quick glance. It’s an adversarial analysis from multiple perspectives:

| Perspective | What it catches | Severity |
|-------------|-----------------|----------|
| Architecture | Module boundaries, coupling, missing abstractions | HIGH |
| Security | Injection, auth bypass, sensitive data leaks | CRITICAL |
| Logic | Edge cases, race conditions, off-by-one, null handling | HIGH |
| Performance | N+1 queries, missing indexes, unnecessary operations | MEDIUM |
| Style | Naming, function size, complexity, dead code | LOW |
| Tests | Missing coverage, wrong mock boundary, flaky tests | MEDIUM |

The review has three possible verdicts:

  • Ready to Merge: Zero CRITICAL/HIGH findings
  • Needs Attention: Only MEDIUM findings
  • Needs Work: HIGH or CRITICAL findings; fix everything and re-review
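The verdict rules can be sketched as a small function over the severities of the findings. One assumption is made explicit here because the rules above do not spell it out: a review with only LOW findings, or none at all, is treated as Ready to Merge, and any MEDIUM finding (without HIGH or CRITICAL) yields Needs Attention.

```python
def review_verdict(severities) -> str:
    """Map the severities of all findings to one of the three verdicts.

    Assumption: LOW-only or empty reviews count as "Ready to Merge".
    """
    found = set(severities)
    if found & {"CRITICAL", "HIGH"}:
        return "Needs Work"
    if "MEDIUM" in found:
        return "Needs Attention"
    return "Ready to Merge"
```

For instance, `review_verdict(["LOW", "MEDIUM"])` gives "Needs Attention", while adding a single HIGH finding to the same list gives "Needs Work".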

Enforcement Layers

One useful aspect of SDD is that it does not rely on a single magic checkpoint. The verifications stack in layers, and each one catches a different class of problem:

1. Editor -> 2. Pre-commit -> 3. Pre-push -> 4. CI -> 5. Code Review -> 6. Runtime

You don’t need to implement all of them at once. Start with layer 2 (pre-commit hooks) and work your way up.

Essential Templates

Once this process starts working, another issue appears: every story, ADR, or checklist gets written in a different shape. Good templates do not make the work rigid; they reduce friction and keep the process consistent.

Story Template

# Story [Epic]-[Story]: [Title]

## Story Statement
AS A [role], I WANT [action] SO THAT [value]

## Status: ready-for-dev | in-progress | review | done

## Acceptance Criteria

### AC-1: [Name]
- GIVEN [precondition]
- WHEN [action]
- THEN [expected result]
- **Test type:** unit | integration | E2E

### AC-2: [Name]
...

## Tasks
- [ ] Task 1: [Description] (AC-1)
- [ ] Task 2: [Description] (AC-2)
- [ ] Task 3: Write tests (AC-1, AC-2)
- [ ] Task 4: Code review
- [ ] Task 5: Update artifacts

ADR Template

## ADR-[ID]: [Title]

**Status:** Proposed | Accepted | Deprecated
**Date:** YYYY-MM-DD

### Context
[What is the problem? Why does a decision need to be made?]

### Options Considered
1. **Option A:** [Description] — Pros: [x, y]. Cons: [a, b].
2. **Option B:** [Description] — Pros: [x, y]. Cons: [a, b].

### Decision
[Which option was chosen and why]

### Consequences
- [What changes as a result]
- [What new constraints are introduced]

Checklist: Setting Up SDD on a New Project

Phase 0: Foundation

  • Repository initialized with Git and .gitignore
  • Project constitution (AGENTS.md, CLAUDE.md, or equivalent) created with overview, commands, and golden rules
  • Scoped rules structure created for testing, coding, and quality gates
  • Linter and formatter configured
  • Type checker configured (if the language supports it)
  • Test framework configured
  • Git hooks installed: pre-commit with lint + format + secret scan
  • CI pipeline created: tests + build + security scan

Phase 1-3: Analysis, Planning, and Solutioning

  • Product Brief created with vision, personas, metrics
  • PRD created with FRs in Given/When/Then format
  • Architecture documented with ADRs for key decisions
  • Stories created with ACs and tasks
  • Implementation Readiness validation

Ongoing

  • Decision-to-rule pipeline active: new patterns become rules
  • Bug-to-rule pipeline active: bug classes become prevention rules
  • Living specs maintained: artifacts updated with implementation learnings

Final thoughts

Implementing SDD doesn’t have to be all or nothing. If I could recommend where to start:

  1. Create the project constitution (AGENTS.md, CLAUDE.md, or equivalent) with project overview, commands, and 5-10 golden rules
  2. Set up basic quality gates: tests + build passing before each commit
  3. Adopt Given/When/Then for your story acceptance criteria
  4. Start documenting ADRs for important technical decisions

Over time, add the other layers: multi-perspective code review, progressive disclosure of rules, artifact sync, EARS format in requirements.

The most important thing is to understand that AI is a powerful tool, but without a clear contract in the form of specs, it produces plausible but wrong code. SDD is that contract.

Official BMAD Method repository.

Addy Osmani: How to Write a Good Spec for AI Agents.

Scrum.org: Definition of Done for AI Agents.