SDD in Practice: Quality Gates, Tests, and Templates

In the previous article I explained what Spec-Driven Development is and why it makes sense for teams working with agents. Now we get to the part that matters most: how to turn that into day-to-day execution.

Once the conversation leaves the conceptual layer and hits a real project, the same problems usually show up first: vague stories, ambiguous criteria, incomplete tests, shallow review, and commits moving forward without enough guarantees. That is exactly where SDD needs to become concrete.

In this post I will organize that practical layer into five parts: how to write mandatory requirements without ambiguity, what a story needs before implementation starts, what must be true before it can be considered done, which gates keep quality in place, and which templates make the process repeatable.

Before the Checklists: Writing Clear Mandatory Requirements

Before getting into DoR, DoD, and quality gates, it helps to fix one important foundation first: how mandatory requirements are written. If that layer starts vague, everything else degrades with it, including planning, implementation, review, and testing.

One simple way to handle that is EARS (Easy Approach to Requirements Syntax). It isn’t an AI framework, and it doesn’t replace a PRD, story, or ADR. It’s just a lean format for writing mandatory requirements so they stay clear, testable, and hard to misread.

That helps a lot when work passes through multiple agents or multiple people. If a requirement is vague, each person or agent fills the gaps in their own way. When it is written in this format, it becomes much easier to review, test, and automate validation around it.

The idea is simple: use the keyword SHALL to mark a requirement as mandatory, and state the condition that triggers it. This reduces ambiguity and makes automated validation easier:

The system SHALL [mandatory action] WHEN [condition]

Practical examples:

“The system SHALL return 403 when an unauthorized user accesses a protected resource”
“The system SHALL validate user input before processing any request”
“The story SHALL have all ACs in Given/When/Then format before implementation starts”

When you read a requirement with SHALL, it’s clear: it is mandatory, it is testable, and there is very little room for interpretation. In the checklists below, I will use that pattern to make explicit what is non-negotiable.
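A small linter is one way to make the pattern checkable by a script. The sketch below is only an illustration of this article's SHALL convention, not an official EARS validator: the name `check_requirement` and the rule "exactly one uppercase SHALL, with a subject before it and an action after it" are assumptions made for this example.

```python
import re

# Hypothetical rule (an assumption for this example): a subject, the keyword
# SHALL (optionally followed by NOT), then a non-empty action.
REQUIREMENT_PATTERN = re.compile(
    r"^(?P<subject>.+?)\s+SHALL(?:\s+NOT)?\s+(?P<action>\S.*)$"
)


def check_requirement(text: str) -> list[str]:
    """Return the problems found in one requirement sentence (empty if none)."""
    problems = []
    stripped = text.strip()
    if stripped.count("SHALL") != 1:
        problems.append("expected exactly one uppercase SHALL")
    elif not REQUIREMENT_PATTERN.match(stripped):
        problems.append("expected '<subject> SHALL <action>'")
    return problems
```

A check like this only confirms the shape of the sentence. Whether the requirement is actually testable still needs a human or agent review.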

Definition of Ready (DoR)

Before any story is implemented, it needs to be ready. It sounds obvious, but this is where most problems with AI begin: the story arrives vague, without clear criteria, and the agent does whatever it thinks is best.

A story SHALL be considered ready when:

Verifiable ACs — each acceptance criterion SHALL be in Given/When/Then format with measurable outcomes
Sequenced tasks — tasks SHALL have identified dependencies and execution order
Target files identified — locations for changes SHALL be listed
Consistent specs — the story SHALL be aligned with PRD, architecture, and UX
Test types defined — each AC SHALL have its test type: unit, integration, E2E
Error scenarios identified — cases like not-found, forbidden, and invalid-input SHALL be mapped
Research completed — official docs SHALL have been consulted, not just agent memory
Security impact assessed — auth, authorization, and input validation SHALL be evaluated

The Architect agent validates the DoR before implementation begins. If it fails, fix the story and revalidate.
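Part of this checklist can be checked mechanically before a human or an agent looks at the story. The sketch below covers only the items that are easy to verify from structured data (complete Given/When/Then, test type, target files, error scenarios). The `Story` and `AcceptanceCriterion` classes and the `dor_failures` function are hypothetical names invented for this example. Items such as spec consistency or completed research still need judgment.

```python
from dataclasses import dataclass, field


@dataclass
class AcceptanceCriterion:
    given: str
    when: str
    then: str
    test_type: str = ""  # expected: "unit", "integration", or "E2E"


@dataclass
class Story:
    acceptance_criteria: list[AcceptanceCriterion] = field(default_factory=list)
    target_files: list[str] = field(default_factory=list)
    error_scenarios: list[str] = field(default_factory=list)


def dor_failures(story: Story) -> list[str]:
    """Return the DoR items this story fails; an empty list means ready."""
    failures = []
    if not story.acceptance_criteria:
        failures.append("no acceptance criteria")
    for index, ac in enumerate(story.acceptance_criteria, start=1):
        if not (ac.given.strip() and ac.when.strip() and ac.then.strip()):
            failures.append(f"AC-{index}: incomplete Given/When/Then")
        if ac.test_type not in {"unit", "integration", "E2E"}:
            failures.append(f"AC-{index}: test type not defined")
    if not story.target_files:
        failures.append("target files not identified")
    if not story.error_scenarios:
        failures.append("error scenarios not identified")
    return failures
```

Returning a list of failures rather than a boolean gives whoever fixes the story a concrete list to work through before revalidating.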

Definition of Done (DoD)

If DoR prevents a weak start, DoD prevents teams from stopping too early. In agent-assisted work that matters even more, because “it worked once” is not the same as “it is actually done”.

At the other end of the cycle, a story SHALL be considered done only when it passes through all of these layers:

Code Quality

Adversarial code review SHALL have been completed, with all findings fixed
SOLID and Clean Code principles SHALL have been verified
Zero dead code, zero pending TODO/FIXME

Tests

All tests SHALL pass — full project suite, not just what changed
TDD per AC — each acceptance criterion SHALL have gone through the RED -> GREEN -> REFACTOR cycle
AC -> test mapping — each AC SHALL have at least 1 documented test
Error scenarios tested — not-found, forbidden, invalid-input
Data isolation — User A SHALL NOT be able to access User B’s data (when applicable)
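The AC -> test mapping item is easy to enforce with a script. This is a minimal sketch, assuming the mapping is kept as a dictionary from AC identifier to test names. That storage format and the function name `unmapped_criteria` are assumptions for this example, not something SDD prescribes.

```python
def unmapped_criteria(ac_ids: list[str], test_map: dict[str, list[str]]) -> list[str]:
    """Return the AC identifiers that have no documented test."""
    return [ac_id for ac_id in ac_ids if not test_map.get(ac_id)]
```

For example, `unmapped_criteria(["AC-1", "AC-2"], {"AC-1": ["test_login_ok"]})` returns `["AC-2"]`, which tells you exactly which criterion still blocks the DoD.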

Build & Types

Full build SHALL pass without errors
Type-check SHALL pass without errors (if the language supports it)
Lint SHALL pass without violations

UI (when applicable)

Browser validation: rendering, zero console errors, dark + light mode
Responsive: tested on mobile, tablet, and desktop
Accessibility: ARIA labels, keyboard navigation, contrast

As the Scrum.org folks say: “AI is probabilistic, run the same prompt 100 times and you might get 95 correct answers and 5 hallucinations. If your DoD relies solely on binary Pass/Fail checks, you aren’t testing your AI agents, you’re gambling with them.”

Notice how the EARS format makes each DoR and DoD item unambiguous: if it has SHALL, it’s mandatory and can be validated. Without SHALL, it’s a recommendation.

Quality Gates

With DoR and DoD defined, one piece is still missing: something that makes those rules real instead of aspirational.

This is probably the most important practice in all of SDD: having gates that must pass before each commit. The idea is to create verification layers that catch problems before they reach the repository.

The suggested order is:

flowchart TB
    REVIEW["1. Code Review"] --> TESTS["2. Tests"]
    TESTS --> BUILD["3. Build"]
    BUILD --> TYPES["4. Type-check"]
    TYPES --> COMMIT["5. Commit"]
    accTitle: Quality gate sequence
    accDescr: A sequence of delivery gates where code review leads to tests, tests lead to build verification, build leads to type-checking, and approved changes can then be committed.

#	Gate	What it catches
1	Code Review	Logic errors, architecture violations, security issues
2	Tests	Regressions, broken contracts
3	Build	Compilation errors, broken imports
4	Type-check	Type safety violations
5	Dependency audit	Vulnerabilities in dependencies

Most of these checks can be automated with git hooks (pre-commit, pre-push) and CI pipelines. The important thing is that they are treated as non-negotiable: if a gate fails, the commit doesn’t happen.
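A gate runner that a git hook or CI job could call might look like the sketch below. It covers only the gates that can be expressed as commands, since code review is not a shell command. The function name `run_gates` and the entries in `EXAMPLE_GATES` are placeholders I am assuming for illustration, so swap in whatever your project actually uses.

```python
import subprocess
import sys
from typing import Optional


def run_gates(gates: list[tuple[str, list[str]]]) -> Optional[str]:
    """Run each gate in order; return the name of the first failing gate, or None."""
    for name, command in gates:
        result = subprocess.run(command, check=False)
        if result.returncode != 0:
            return name  # stop at the first failure: later gates do not run
    return None


# Placeholder commands (assumptions): replace with your project's real tools.
EXAMPLE_GATES = [
    ("tests", [sys.executable, "-m", "pytest"]),
    ("build", [sys.executable, "-m", "build"]),
    ("type-check", [sys.executable, "-m", "mypy", "."]),
]
```

A hook script would call `run_gates(EXAMPLE_GATES)` and exit with a non-zero status when the result is not `None`, which is what makes git abort the commit.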

Testing Strategy

Quality gates without a testing strategy quickly turn into bureaucracy. To make them useful, you need to know which kind of test is supposed to cover which kind of risk.

Which type of test to use?

flowchart TB
    A{"Is it pure business logic?"}
    A -->|Yes| UNIT["Unit test"]
    A -->|No| B{"Is it an API endpoint?"}
    B -->|Yes| INT["Integration test"]
    B -->|No| C{"Is it a UI flow?"}
    C -->|Yes| E2E["E2E"]
    C -->|No| D{"Is it a visual component?"}
    D -->|Yes| COMPONENT["Component test"]
    D -->|No| REVIEW["Review the case"]
    accTitle: Test type decision tree
    accDescr: A decision tree that routes business logic to unit tests, API endpoints to integration tests, UI flows to end-to-end tests, and visual components to component tests.
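The same decision tree can be written as a function, which is handy if you want an agent or a script to tag each AC with a test type. This is a direct transcription of the questions above. The function name and keyword arguments are made up for this sketch.

```python
def choose_test_type(
    *,
    pure_logic: bool = False,
    api_endpoint: bool = False,
    ui_flow: bool = False,
    visual_component: bool = False,
) -> str:
    """Walk the decision tree in order and return the suggested test type."""
    if pure_logic:
        return "unit"
    if api_endpoint:
        return "integration"
    if ui_flow:
        return "E2E"
    if visual_component:
        return "component"
    return "review the case"
```

The order of the checks matters: something that is both pure logic and exposed through an endpoint is routed to a unit test first, exactly as in the diagram.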

TDD per Acceptance Criterion

The flow for each AC is:

1. Write a failing test (RED)          — Test captures the AC behavior
2. Implement minimum code (GREEN)      — Make the test pass, nothing more
3. Refactor (REFACTOR)                 — Clean up without changing behavior
4. Repeat for the next AC
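To make the cycle concrete, here is a tiny sketch based on the 403 requirement from the EARS examples. The `authorize` function and its role-based signature are hypothetical and exist only for this illustration.

```python
# Step 1 (RED): these tests are written first. They fail until `authorize`
# exists and behaves as the acceptance criterion describes.
def test_unauthorized_user_gets_403():
    assert authorize(user_roles={"viewer"}, required_role="admin") == 403


def test_authorized_user_gets_200():
    assert authorize(user_roles={"admin"}, required_role="admin") == 200


# Step 2 (GREEN): the smallest implementation that makes both tests pass.
def authorize(user_roles: set[str], required_role: str) -> int:
    """Return an HTTP-like status code for an access attempt."""
    return 200 if required_role in user_roles else 403


# Step 3 (REFACTOR): nothing to clean up yet; the tests stay as a safety net.
```

With a single test, `return 403` alone would be enough to go green. The second test is what forces the real rule into the code.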

What to mock and what not to mock

This table helps avoid one of the most common pitfalls in AI-assisted projects:

Layer	Mock?	Why
Business logic / domain	NEVER	This is what you’re testing
Database (in integration tests)	NEVER	Use real DB with fixtures
External HTTP APIs	YES	Unreliable, slow, can cost money
File system / storage	YES	Side effects, cleanup
Time / dates	YES	Deterministic tests
Framework internals	NEVER	Trust the framework

Anti-pattern: Tests that only use mocks are dangerous. They test your mocks, not your code.
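As a rough illustration of that boundary, the sketch below keeps the domain rule real and replaces only time and the external HTTP call. The function names, the `/notifications` path, and the 14-day trial are all invented for this example. The only library used is `unittest.mock` from the Python standard library.

```python
from datetime import datetime, timedelta, timezone
from unittest.mock import Mock


def trial_expired(started_at: datetime, now: datetime, trial_days: int = 14) -> bool:
    """Domain rule under test: this is never mocked."""
    return now - started_at >= timedelta(days=trial_days)


def notify_if_expired(started_at: datetime, clock, http_client) -> bool:
    """Call the external API only when the trial has expired."""
    if trial_expired(started_at, clock()):
        http_client.post("/notifications", json={"event": "trial_expired"})
        return True
    return False


def test_notifies_after_trial_ends():
    started = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fixed_clock = lambda: started + timedelta(days=15)  # deterministic time
    http_client = Mock()  # stands in for the external HTTP API
    assert notify_if_expired(started, fixed_clock, http_client) is True
    http_client.post.assert_called_once_with(
        "/notifications", json={"event": "trial_expired"}
    )
```

The test still exercises `trial_expired` for real. If that rule breaks, the test fails, which is the difference from a test that only checks its own mocks.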

Bug Fix Protocol

Bug fixing is another place where teams benefit from having a stable protocol. If every fix enters the system differently, the codebase accumulates patches without memory.

Every bug fix follows this sequence:

1. Write a regression test that reproduces the bug
2. Fix the bug
3. Verify the test passes
4. Commit test + fix together
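Here is what that sequence can look like for a made-up off-by-one bug. The `page_count` function and the bug itself are hypothetical, chosen only because the failure is easy to reproduce in one assertion.

```python
def page_count(total_items: int, page_size: int) -> int:
    """Number of pages needed to show all items."""
    # Fixed version. The hypothetical bug was `total_items // page_size`,
    # which dropped the last partial page.
    return (total_items + page_size - 1) // page_size


def test_partial_last_page_is_counted():
    """Regression test: the buggy version returned 2 for 21 items."""
    assert page_count(total_items=21, page_size=10) == 3


def test_exact_multiple_is_unchanged():
    """Guard against over-correcting: 20 items still fit in 2 pages."""
    assert page_count(total_items=20, page_size=10) == 2
```

Committing the test together with the fix is what gives the codebase memory: if the same class of bug comes back, the suite catches it.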

Code Review: Multi-Perspective

After tests and gates, you still need the layer that catches what scripts are bad at seeing alone: weak trade-offs, unnecessary coupling, architectural drift, and flawed reasoning.

Code review in SDD is not a quick glance. It’s an adversarial analysis from multiple perspectives:

Perspective	What it catches	Severity
Architecture	Module boundaries, coupling, missing abstractions	HIGH
Security	Injection, auth bypass, sensitive data leaks	CRITICAL
Logic	Edge cases, race conditions, off-by-one, null handling	HIGH
Performance	N+1 queries, missing indexes, unnecessary operations	MEDIUM
Style	Naming, function size, complexity, dead code	LOW
Tests	Missing coverage, wrong mock boundary, flaky tests	MEDIUM

The review has three possible verdicts:

Ready to Merge: Zero CRITICAL/HIGH findings
Needs Attention: Only MEDIUM findings
Needs Work: Any HIGH or CRITICAL finding; fix everything and re-review
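If review findings are recorded with their severity, the verdict can be derived instead of argued about. The sketch below is one possible reading of the rules above. In particular, it assumes that a review with only LOW findings (or none) counts as Ready to Merge, which the list does not state explicitly. The function name is hypothetical.

```python
def review_verdict(severities: list[str]) -> str:
    """Map the severities of all findings to one of the three verdicts."""
    found = set(severities)
    if found & {"CRITICAL", "HIGH"}:
        return "Needs Work"
    if "MEDIUM" in found:
        return "Needs Attention"
    # Assumption: LOW-only or empty reviews are ready to merge.
    return "Ready to Merge"
```

Checking the most severe class first means a single CRITICAL finding decides the verdict, no matter how many minor findings sit next to it.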

Enforcement Layers

One useful aspect of SDD is that it does not rely on a single magic checkpoint. The verifications stack in layers, and each one catches a different class of problem:

flowchart TB
    L1["1. Editor"]
    L2["2. Pre-commit"]
    L3["3. Pre-push"]
    L4["4. CI"]
    L5["5. Code Review"]
    L6["6. Runtime"]
    L1 --> L2 --> L3 --> L4 --> L5 --> L6
    accTitle: Enforcement layers for SDD
    accDescr: A linear chain of enforcement layers moving from the editor to pre-commit, pre-push, CI, code review, and runtime validation.

You don’t need to implement all of them at once. Start with layer 2 (pre-commit hooks) and work your way up.

Essential Templates

Once this process starts working, another issue appears: every story, ADR, or checklist gets written in a different shape. Good templates do not make the work rigid; they reduce friction and keep the process consistent.

Story Template

# Story [Epic]-[Story]: [Title]

## Story Statement
AS A [role], I WANT [action] SO THAT [value]

## Status: ready-for-dev | in-progress | review | done

## Acceptance Criteria

### AC-1: [Name]
- GIVEN [precondition]
- WHEN [action]
- THEN [expected result]
- **Test type:** unit | integration | E2E

### AC-2: [Name]
...

## Tasks
- [ ] Task 1: [Description] (AC-1)
- [ ] Task 2: [Description] (AC-2)
- [ ] Task 3: Write tests (AC-1, AC-2)
- [ ] Task 4: Code review
- [ ] Task 5: Update artifacts

ADR Template

## ADR-[ID]: [Title]

**Status:** Proposed | Accepted | Deprecated
**Date:** YYYY-MM-DD

### Context
[What is the problem? Why does a decision need to be made?]

### Options Considered
1. **Option A:** [Description] — Pros: [x, y]. Cons: [a, b].
2. **Option B:** [Description] — Pros: [x, y]. Cons: [a, b].

### Decision
[Which option was chosen and why]

### Consequences
- [What changes as a result]
- [What new constraints are introduced]

Checklist: Setting Up SDD on a New Project

Phase 0: Foundation

Repository initialized with Git and .gitignore
Project constitution (AGENTS.md, CLAUDE.md, or equivalent) created with overview, commands, and golden rules
Scoped rules structure created for testing, coding, and quality gates
Linter and formatter configured
Type checker configured (if the language supports it)
Test framework configured
Git hooks installed: pre-commit with lint + format + secret scan
CI pipeline created: tests + build + security scan

Phase 1-3: Analysis, Planning, and Solutioning

Product Brief created with vision, personas, metrics
PRD created with FRs in Given/When/Then format
Architecture documented with ADRs for key decisions
Stories created with ACs and tasks
Implementation Readiness validation

Ongoing

Decision-to-rule pipeline active: new patterns become rules
Bug-to-rule pipeline active: bug classes become prevention rules
Living specs maintained: artifacts updated with implementation learnings

Final thoughts

Implementing SDD doesn’t have to be all or nothing. If I could recommend where to start:

Create the project constitution (AGENTS.md, CLAUDE.md, or equivalent) with project overview, commands, and 5-10 golden rules
Set up basic quality gates: tests + build passing before each commit
Adopt Given/When/Then for your story acceptance criteria
Start documenting ADRs for important technical decisions

Over time, add the other layers: multi-perspective code review, progressive disclosure of rules, artifact sync, EARS format in requirements.

The most important thing is to understand that AI is a powerful tool, but without a clear contract in the form of specs it tends to produce plausible but wrong code. SDD is that contract.

Official BMAD Method repository.

Addy Osmani: How to Write a Good Spec for AI Agents.

Scrum.org: Definition of Done for AI Agents.
