# Human-in-the-Loop and TDD + AI: When Quality Matters

*Part 3 of the "Beyond Vibe Coding" series*

In [Part 1](/henry-devto/what-is-vibe-coding-in-2026), we covered vibe coding and spec-driven development. In [Part 2](/henry-devto/ai-pair-programming-vs-agentic-coding), we explored the autonomy spectrum from pair programming to Ralph Loop.

Now let's talk about guardrails. When stakes are high, vibes aren't enough. You need structure that catches mistakes before they ship.

---

## Human-in-the-Loop: Strategic Checkpoints

**Credentials:**

- Atlassian Research: HULA framework (Human-in-the-loop LLM-based Agents)
- Formalized in an ICSE 2025 paper ([arXiv 2411.12924](https://arxiv.org/abs/2411.12924))
- [Google Cloud AI documentation](https://cloud.google.com/discover/what-is-human-in-the-loop-machine-learning) on HITL patterns
- Implemented in: Claude Code Planning Mode, Cursor Composer approval flows

**What it is:**

AI operates autonomously BETWEEN checkpoints. The human approves key decisions and reviews output at strategic moments. Not constant supervision, but strategic oversight.

The agent proposes an approach, the human confirms the direction. Then the agent executes freely until the next checkpoint.

**Permissions ≠ HITL:**

Don't confuse permissions with Human-in-the-Loop. Permissions are too low-level: "Can I write this file?" tells me nothing about what task the agent is actually solving.

Real HITL is Planning Mode. The agent shows the plan: "here's what I'll do, these files will change, here's the expected outcome." That's decision-level control.

The problem with current agents is that they don't understand WHEN to stop and ask. They rarely hit the right moment: either too much autonomy (the agent goes off track) or too many interruptions (the flow breaks).

[TODO: HITL comic — robots discussing getting rid of the human, final panel shows human among circle of robots passing boxes]

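
To make the checkpoint idea concrete, here's a minimal sketch of decision-level approval. Everything in it (`Plan`, `ask_human`, `run_with_checkpoint`) is a hypothetical illustration, not any real agent framework's API; the point is that the human rules on the whole plan, not on individual file writes.

```python
from dataclasses import dataclass, field

@dataclass
class Plan:
    goal: str
    files_to_change: list[str] = field(default_factory=list)
    expected_outcome: str = ""

def ask_human(plan: Plan) -> bool:
    """Decision-level checkpoint: show the whole plan, not a file-level permission."""
    print(f"Goal: {plan.goal}")
    print(f"Files: {', '.join(plan.files_to_change)}")
    print(f"Expected outcome: {plan.expected_outcome}")
    return input("Approve this plan? [y/N] ").strip().lower() == "y"

def run_with_checkpoint(plan: Plan, approve=ask_human) -> str:
    if not approve(plan):
        return "rejected: the agent revises the plan"
    # Between checkpoints the agent works autonomously.
    return f"executing: {plan.goal}"
```

Compare this with a permission prompt: "Approve this plan?" is a question about the task, while "Can I write this file?" is a question about a mechanism.
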

A future improvement area: agents that know when they're uncertain and should consult the human. It's like "I don't know" responses: current models aren't great at this in practice.

**When to use:**

Production code with moderate complexity. When the outcome matters but speed also matters. Team environments where others will review anyway. Learning new approaches where you want to see the agent's reasoning.

Medium stakes: not prototype territory (vibe coding works there), not critical infrastructure (TDD territory).

---

## TDD + AI: Quality First

**Credentials:**

- Official name: "AI-aided test-first development" — [Thoughtworks Technology Radar](https://www.thoughtworks.com/en-us/radar/techniques/ai-aided-test-first-development) (April 2023, status: TRIAL)
- [DORA Report 2025](https://dora.dev/research/2025/dora-report/) (Google Cloud): "AI is an amplifier, not a fix" — organizations with strong testing practices get more benefit from AI
- [Google Cloud analysis](https://cloud.google.com/discover/how-test-driven-development-amplifies-ai-success) (January 2026): "How TDD Amplifies AI Success"
- Kent Beck (creator of TDD) on the [Pragmatic Engineer Podcast](https://newsletter.pragmaticengineer.com/p/tdd-ai-agents-and-coding-with-kent) (June 2025): "TDD is a superpower when working with AI. I communicate things the Genie missed in terms of tests"
- [8th Light](https://8thlight.com/insights/tdd-effective-ai-collaboration): "TDD: The Missing Protocol for Effective AI Collaboration" (July 2025)
- [Builder.io guide](https://www.builder.io/blog/test-driven-development-ai): "AI turns TDD's weaknesses into strengths" (August 2025)
- Tools: [Qodo](https://www.qodo.ai/blog/ai-code-assistants-test-driven-development/) (AI test generation), Claude Code, Cursor

**How it works:**

Write the tests BEFORE the implementation (classic TDD). The AI generates code to pass the tests, and the tests become an executable specification.

It's the Red → Green → Refactor cycle, but with the AI handling the implementation. Tests catch AI mistakes automatically and provide verification without a human reviewing every line.

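
A minimal sketch of the cycle, assuming a hypothetical `slugify` function as the feature under development. The test comes first and acts as the executable specification; the implementation below it stands in for what the agent iterates on until the suite is green.

```python
import re

# Red: the executable specification, written before any implementation exists.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaces   everywhere  ") == "spaces-everywhere"

# Green: the (agent-produced) implementation that satisfies the spec.
def slugify(text: str) -> str:
    """Lowercase the text, drop punctuation, join words with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

test_slugify()  # the agent reruns this until it passes
```

Refactor is then safe for the same reason: the spec keeps passing while the internals change.
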
**Tests as specification:**

Tests are absolutely critical for key functionality. I always instruct agents to run the tests.

But here's the thing: writing comprehensive tests upfront plus a detailed spec is already 80% of the work. If you've written that much structure, is the AI really saving time?

It's most valuable when you have an existing spec that naturally converts to tests, like API documentation. Then tests-first makes perfect sense.

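
For instance, a documentation sentence like "GET /users/{id} returns 200 with the user, or 404 for unknown ids" converts almost word-for-word into a test. The handler here is a hypothetical in-memory stand-in, not a real web framework:

```python
# Hypothetical in-memory handler standing in for a real route.
USERS = {1: {"id": 1, "name": "Ada"}}

def get_user(user_id: int) -> tuple[int, dict]:
    if user_id in USERS:
        return 200, USERS[user_id]
    return 404, {"error": "not found"}

# The documentation sentence, converted directly into a test.
def test_get_user_matches_docs():
    assert get_user(1) == (200, {"id": 1, "name": "Ada"})
    assert get_user(999)[0] == 404

test_get_user_matches_docs()
```
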


**The guardrails approach:**

Tests become safety boundaries for the agent. The agent can iterate freely within the test constraints, with no need to review every implementation detail. Just verify that the tests pass and coverage is maintained.

It's especially valuable for agentic coding: let the AI experiment, and the tests catch the mistakes.

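
A toy sketch of that loop: the "agent" is simulated by a list of candidate implementations (all names here are illustrative), and the test suite is the only gate deciding which candidate gets accepted.

```python
def suite(impl) -> bool:
    """The safety boundary: an executable spec for an add() function."""
    try:
        return impl(2, 3) == 5 and impl(-1, 1) == 0
    except Exception:
        return False

# Stand-ins for successive attempts an agent might make.
candidates = [
    lambda a, b: a * b,   # wrong: fails the suite
    lambda a, b: a - b,   # wrong: fails the suite
    lambda a, b: a + b,   # passes: accepted
]

def iterate_until_green(attempts):
    for impl in attempts:
        if suite(impl):      # only the tests gate acceptance,
            return impl      # no line-by-line human review
    raise RuntimeError("no attempt satisfied the tests")
```

The human never looks at the failed attempts; the suite absorbs them.
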
**Critical warning:**

AI-written tests need human review. I've seen agents write "passing" tests using mocked requests: the test passes, but the code is broken. The test verified syntax, not behavior.

Correct tests are a solid foundation. Bad tests give false confidence that destroys future work.

Review the test logic before trusting it. Make sure tests verify actual behavior, not just that the code runs.

---

## Conclusion

What I typically use:

- Dev tools and experiments: vibe coding works fine.
- Production features: spec-driven with Planning Mode.
- Critical systems: TDD plus extensive review.
- Research and exploration: Claude Desktop as a true pair programmer.

The pattern? Higher stakes → more structure. Lower stakes → more vibes.

Your approach might be different. If you do things differently — different tools, different approaches, different combinations — share your wins in the comments. What works for you as an engineer?

---

## The Full Series

- **Part 1**: [What Is Vibe Coding in 2026?](/henry-devto/what-is-vibe-coding-in-2026) — vibe coding + spec-driven development
- **Part 2**: [AI Pair Programming vs Agentic Coding](/henry-devto/ai-pair-programming-vs-agentic-coding) — the autonomy spectrum
- **Part 3**: Human-in-the-Loop and TDD + AI — guardrails and quality (you are here)

---

*What's your approach? Pure vibes, full TDD, or something in between? I'm curious what actually works in your projects.*