Spec-Driven Stack 2026: Big 5 + GS
Map of the spec-driven dev methodology crystallizing across the OSS community in 2026. Six tools, one operating model: replace ad-hoc “vibe coding” with a verifiable pipeline from spec to production.
Why it matters: vibe coding fails at ~80% complete because context overflows, the model contradicts earlier decisions, and there is no audit trail. Spec-driven dev fixes the root cause — every change traces back to a human-readable spec.
The Stack
| # | Tool | Role | Stars |
|---|---|---|---|
| 1 | BMAD | Agent team + 7 human gates (G0–G6) | 46k |
| 2 | GSD (Get Shit Done) | Wave execution, fresh 200K context per wave | 58k |
| 3 | Tracer BART mode | Parallel ticket orchestrator with closed-loop verify | — |
| 4 | GitHub Spec Kit | Vendor-agnostic spec layer (spec.md → plan.md → tasks.md) | 91k |
| 5 | Claude Code | Execution engine: plan mode, auto mode, OTel, MCP | — |
| 6 | GStack (Garry Tan) | 9-role virtual dev team incl. security officer + QA | — |
How They Compose
SpecKit ──► spec.md / plan.md / tasks.md (the source of truth)
│
├── BMAD agents (Mary, John, Winston) refine spec, gate G0–G2
│
├── GSD waves: Discuss → Plan → Execute → Verify (fresh window each)
│
├── BART mode: parallelise independent tickets, dependency-aware
│
├── Claude Code: plan mode for first run, auto mode for iterations
│ ├── MCP: DB / repo / API as native tools
│ └── OpenTelemetry: every decision becomes a trace span
│
└── GStack: security officer reviews every PR, QA runs every suite
Each layer adds a different kind of verification. BMAD = human gates. GSD = context discipline. BART = closed-loop self-check. SpecKit = artifact chain. Claude Code = audit trail. GStack = role-separated review.
Tool-by-Tool
1. BMAD — Breakthrough Method for Agile AI-Driven Development
- Insight: vibe coding has no pre-work. BMAD forces planning before code.
- Roles: Mary (analyst → PRD), John (PM → stories), Winston (architect → system design).
- Governance: seven HITL gates G0–G6. G0 problem framing → G6 production deploy. No skipping.
- Result: 50–70% reduction in major refactors vs. vibe-coded baselines.
- See BMad Autonomous Development for the /bad orchestrator we already use.
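The "no skipping" rule for the gates can be sketched as a small ledger that only accepts the next open gate. This is an illustration, not BMAD's own code: `GateLedger` and its methods are hypothetical names, and only the meanings of G0 (problem framing) and G6 (production deploy) come from the notes above.

```python
from dataclasses import dataclass, field
from typing import Optional

# Seven human-in-the-loop gates. Only G0 (problem framing) and
# G6 (production deploy) are described in the notes; the rest are unnamed here.
GATES = ["G0", "G1", "G2", "G3", "G4", "G5", "G6"]


@dataclass
class GateLedger:
    """Hypothetical ledger: records sign-offs and refuses to skip a gate."""
    approvals: list = field(default_factory=list)  # (gate, reviewer) pairs

    def next_gate(self) -> Optional[str]:
        """Return the next gate awaiting sign-off, or None when all have passed."""
        if len(self.approvals) < len(GATES):
            return GATES[len(self.approvals)]
        return None

    def approve(self, gate: str, reviewer: str) -> None:
        """Record a human sign-off; only the next open gate is accepted."""
        expected = self.next_gate()
        if expected is None:
            raise ValueError("all gates have already passed")
        if gate != expected:
            raise ValueError(f"cannot approve {gate}: {expected} is still open")
        self.approvals.append((gate, reviewer))


ledger = GateLedger()
ledger.approve("G0", "analyst")   # problem framing signed off
print(ledger.next_gate())         # G1
```

Attempting `ledger.approve("G6", "lead")` at this point raises `ValueError`, which is the behaviour the "no skipping" rule asks for.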
2. GSD — Get Shit Done
- Problem solved: context rot. Long sessions degrade output quality.
- Fix: never let context get long. Fresh 200K window per wave. Output of one wave seeds the next.
- Loop: Discuss → Plan → Execute → Verify, then commit and start a new window.
- Aligns with our existing rule: /compact at 50%, /clear at 80% (see global CLAUDE.md).
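A minimal sketch of the two ideas in this section: the 50% / 80% context rule and the four-step wave whose output seeds the next window. The function names and the `step` callback are assumptions made for illustration; neither GSD nor Claude Code exposes this API.

```python
from typing import Callable

WAVE_STEPS = ("Discuss", "Plan", "Execute", "Verify")


def context_action(tokens_used: int, window: int = 200_000) -> str:
    """Map context fill to our rule: /compact at 50%, /clear at 80%."""
    if window <= 0:
        raise ValueError("window must be positive")
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    if tokens_used * 100 >= 80 * window:
        return "clear"
    if tokens_used * 100 >= 50 * window:
        return "compact"
    return "continue"


def run_wave(seed: str, step: Callable[[str, str], str]) -> str:
    """Run one wave in a fresh context; only `seed` carries over.

    `step(name, previous_output)` is a hypothetical hook standing in for
    whatever actually performs each phase.
    """
    output = seed  # fresh window: nothing but the seed from the last wave
    for name in WAVE_STEPS:
        output = step(name, output)
    return output  # the verified output seeds the next wave


print(context_action(90_000))    # continue
print(context_action(100_000))   # compact
print(context_action(160_000))   # clear
```

Chaining waves is then a matter of feeding each return value of `run_wave` into the next call as its `seed`.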
3. Tracer BART mode — Smart YOLO
- Scope: runs an entire epic, not a single task.
- Mechanic: breaks the spec into parallel tickets, dispatches agents, manages dependencies.
- Closed-loop verify: failed verification triggers a re-plan, not a retry of the same prompt.
- Case study: analytics dashboard spec → working product in 93 minutes.
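The mechanic and the closed-loop verify described above can be sketched with the standard library. This is not Tracer's implementation: `parallel_batches`, `run_with_replan`, and the ticket names are hypothetical, and the `execute` / `verify` / `replan` callbacks stand in for agent calls.

```python
from graphlib import TopologicalSorter
from typing import Callable


def parallel_batches(deps: dict) -> list:
    """Group tickets into batches whose dependencies are all complete.

    `deps` maps each ticket to the set of tickets it depends on.
    Tickets within one batch are independent and could run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


def run_with_replan(plan: str,
                    execute: Callable[[str], str],
                    verify: Callable[[str], bool],
                    replan: Callable[[str, str], str],
                    max_attempts: int = 3) -> str:
    """Closed loop: a failed verification changes the plan before retrying."""
    for _ in range(max_attempts):
        result = execute(plan)
        if verify(result):
            return result
        plan = replan(plan, result)  # new plan, not the same prompt again
    raise RuntimeError(f"verification still failing after {max_attempts} attempts")


# Hypothetical tickets for a dashboard epic.
deps = {
    "schema": set(),
    "auth": set(),
    "api": {"schema"},
    "charts": {"api"},
    "dashboard": {"charts", "auth"},
}
print(parallel_batches(deps))
# [['auth', 'schema'], ['api'], ['charts'], ['dashboard']]
```

Bounding the loop with `max_attempts` keeps a persistently failing ticket from running indefinitely, which matters for any unattended run.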
4. GitHub Spec Kit — specify
- Why it leads in stars (91k): vendor-agnostic. Works with Claude, GPT, Gemini, any agent.
- Install: uv tool install specify-cli --from git+https://github.com/github/spec-kit.git
- Six commands: specify, plan, tasks, implement, constitution, tasks-to-issues (MCP → GitHub).
- Artifact chain: spec.md → plan.md → tasks.md. Every file is plain markdown — any agent can pick it up cold.
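Because the chain is just three markdown files, an agent or a CI step can check how far a feature has progressed before starting work. The sketch below assumes all three files sit in one feature directory; the directory layout and the function name `first_missing_artifact` are illustrative assumptions, not part of the Spec Kit CLI.

```python
from pathlib import Path
from typing import Optional

# Artifact chain from the notes: each file is derived from the one before it.
CHAIN = ("spec.md", "plan.md", "tasks.md")


def first_missing_artifact(feature_dir: Path) -> Optional[str]:
    """Return the first artifact in the chain that does not exist yet.

    Returns None when spec.md, plan.md, and tasks.md are all present,
    meaning the feature is ready for implementation.
    """
    for name in CHAIN:
        if not (feature_dir / name).is_file():
            return name
    return None
```

For example, `first_missing_artifact(Path("specs/my-feature"))` returning `"plan.md"` would indicate the spec exists but planning has not been run; the path here is a made-up example.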
5. Claude Code — execution engine
- Plan mode: writes the full plan, you review, then executes. Equivalent to BMAD G-gates at the per-task level.
- Auto mode: autonomous within permission boundaries.
- OpenTelemetry tracing: every prompt → code-change link visible in Langfuse / any OTel backend.
- MCP: DB, repos, internal APIs as native tools. Agent operates on the real system, not in isolation.
- See Cost & Observability for our OTel Docker stack.
6. GStack — Garry Tan’s 9 roles
- Premise: stop thinking single assistant. Think virtual dev team with org structure.
- Roles by default: CEO, Engineering Manager, QA, Security Officer, Docs Engineer, Frontend, Backend, DevOps, Data.
- Security Officer alone is worth the setup cost — every PR gets a security review for free.
- /browse: agents drive Chrome over the Chrome DevTools Protocol, visually inspect the UI, and detect layout breaks. Visual QA at agent speed.
- Headline claim: 10k LOC / 100 PRs per week. Disputed: the community has not independently replicated it. Treat as ceiling, not baseline.
What to Adopt at Harris (priority order)
- Spec Kit — install globally (specify CLI is now on this machine). Highest ROI, lowest friction. Try on one Wraith ticket.
- GSD discipline — already partially in place via /compact and /clear rules. Formalise the wave loop: Discuss → Plan → Execute → Verify.
- GStack security-officer pattern — adapt for Wraith / Centurion. Run a security-reviewer subagent on every PR before merge.
- OTel traces on Claude Code — gives auditable production trail. Useful for regulated workloads.
- BMAD G-gates — only on net-new builds, not bugfixes. Gate overhead is real on small work.
What to Skip / Be Skeptical Of
- GStack 10k-LOC/wk number — unreplicated. Don’t quote it externally.
- BMAD G0–G6 on small fixes — gate overhead exceeds bug-fix budget. Reserve for new builds.
- BART mode unattended overnight — closed-loop verify is good, but our Marathon guard rule still applies (no 60+ min unattended runs).
Resources
- Spec Kit: https://github.com/github/spec-kit
- BMAD: see existing BMad guide
- GStack: Garry Tan’s repos (multi-role agent harness)
- Adoption rollout: see Adoption Playbook