# Human-in-the-Loop and TDD + AI: When Quality Matters

*Part 3 of the "Beyond Vibe Coding" series*

In [Part 1](/henry-devto/what-is-vibe-coding-in-2026), we covered vibe coding and spec-driven development. In [Part 2](/henry-devto/ai-pair-programming-vs-agentic-coding), we explored the autonomy spectrum from pair programming to the Ralph Loop. Now let's talk about guardrails.

When stakes are high, vibes aren't enough. You need structure that catches mistakes before they ship.

---

## Human-in-the-Loop: Strategic Checkpoints

**Credentials:**

- Atlassian Research: HULA framework (Human-in-the-Loop LLM-based Agents)
- Formalized in an ICSE 2025 paper ([arXiv 2411.12924](https://arxiv.org/abs/2411.12924))
- [Google Cloud AI documentation](https://cloud.google.com/discover/what-is-human-in-the-loop-machine-learning) on HITL patterns
- Implemented in: Claude Code Planning Mode, Cursor Composer approval flows

**What it is:** The AI operates autonomously BETWEEN checkpoints. The human approves key decisions and reviews output at strategic moments. Not constant supervision — strategic oversight. The agent proposes an approach, the human confirms the direction. Then the agent executes freely until the next checkpoint.

**Permissions ≠ HITL:** Don't confuse permissions with Human-in-the-Loop. Permissions are too low-level. "Can I write this file?" tells me nothing about what task the agent is actually solving. Real HITL is Planning Mode. The agent shows the plan: "here's what I'll do, these files will change, here's the expected outcome." That's decision-level control.

The problem with current agents: they don't understand WHEN to stop and ask. They rarely hit the right moment. Either too much autonomy (the agent goes off track) or too many interruptions (the flow breaks).

[TODO: HITL comic — robots discussing getting rid of the human, final panel shows human among circle of robots passing boxes]

A future improvement area: agents that know when they're uncertain and should consult the human.
Like "I don't know" responses — current models aren't great at this in practice.

**When to use:** Production code with moderate complexity. When the outcome matters but speed also matters. Team environments where others will review anyway. Learning new approaches where you want to see the agent's reasoning. Medium stakes: not prototype territory (vibe coding works there), not critical infrastructure (TDD territory).

---

## TDD + AI: Quality First

**Credentials:**

- Official name: "AI-aided test-first development" — [Thoughtworks Technology Radar](https://www.thoughtworks.com/en-us/radar/techniques/ai-aided-test-first-development) (April 2023, status: TRIAL)
- [DORA Report 2025](https://dora.dev/research/2025/dora-report/) (Google Cloud): "AI is an amplifier, not a fix" — organizations with strong testing practices get more benefit from AI
- [Google Cloud analysis](https://cloud.google.com/discover/how-test-driven-development-amplifies-ai-success) (January 2026): "How TDD Amplifies AI Success"
- Kent Beck (creator of TDD): [Pragmatic Engineer Podcast](https://newsletter.pragmaticengineer.com/p/tdd-ai-agents-and-coding-with-kent) (June 2025) — "TDD is a superpower when working with AI. I communicate things the Genie missed in terms of tests"
- [8th Light](https://8thlight.com/insights/tdd-effective-ai-collaboration): "TDD: The Missing Protocol for Effective AI Collaboration" (July 2025)
- [Builder.io guide](https://www.builder.io/blog/test-driven-development-ai): "AI turns TDD's weaknesses into strengths" (August 2025)
- Tools: [Qodo](https://www.qodo.ai/blog/ai-code-assistants-test-driven-development/) (AI test generation), Claude Code, Cursor

**How it works:** Write tests BEFORE the implementation (classic TDD). The AI generates code to pass the tests, and the tests become an executable specification. It's the Red → Green → Refactor cycle, but the AI handles the implementation. Tests catch AI mistakes automatically and provide verification without human review of every line.
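A minimal sketch of that cycle (the `slugify` function and these tests are hypothetical, not from the series): the human writes the tests first as the executable specification (Red), then the agent writes just enough implementation to turn them green.

```python
# Step 1 (human, Red): tests written BEFORE any implementation exists.
# They define the contract a hypothetical slugify() must satisfy.

def test_spaces_become_hyphens():
    assert slugify("Beyond Vibe Coding") == "beyond-vibe-coding"

def test_punctuation_is_stripped():
    assert slugify("TDD + AI: Quality First!") == "tdd-ai-quality-first"

def test_idempotent():
    assert slugify(slugify("Already A Slug")) == "already-a-slug"

# Step 2 (agent, Green): implementation generated to make the tests pass.
# The human reviews the tests above, not every line below.

def slugify(title: str) -> str:
    # Lowercase, replace every non-alphanumeric character with a space,
    # then join the remaining words with hyphens.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in title.lower())
    return "-".join(cleaned.split())
```

Run under pytest, a regression in the agent's rewrite of `slugify` fails the suite immediately — that is the "verification without human review of every line" the section describes.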
**Tests as specification:** Tests are absolutely critical for key functionality. I always instruct agents to run them. But here's the thing: writing comprehensive tests upfront plus a detailed spec — that's already 80% of the work. If you've written that much structure, is AI really saving time? The approach is most valuable when you have an existing spec that naturally converts to tests — like API documentation. Then yes, tests-first makes perfect sense.

![Comic: Developer writes tests to verify AI agent code using another AI agent](https://cdn.banatie.app/blog/henry-devto/img/94559d7c-06ab-4e5f-860a-87419906f3b5)

**The guardrails approach:** Tests become safety boundaries for the agent. The agent can iterate freely within the test constraints. No need to review every implementation detail — just verify that the tests pass and coverage is maintained. This is especially valuable for agentic coding: let the AI experiment, and the tests catch the mistakes.

**Critical warning:** AI-written tests need human review. I've seen agents write "passing" tests using mocked requests — the test passes, the code is broken. The test verified syntax, not behavior. Correct tests = a solid foundation. Bad tests = false confidence that destroys future work. Review test logic before trusting it, and make sure tests verify actual behavior, not just that the code runs.

---

## Conclusion

What I typically use:

- Dev tools and experiments: vibe coding works fine.
- Production features: spec-driven with Planning Mode.
- Critical systems: TDD plus extensive review.
- Research and exploration: Claude Desktop as a true pair programmer.

The pattern? Higher stakes → more structure. Lower stakes → more vibes.

Your approach might be different. If you do things differently — different tools, different approaches, different combinations — share your wins in the comments. What works for you as an engineer?
---

## The Full Series

- **Part 1**: [What Is Vibe Coding in 2026?](/henry-devto/what-is-vibe-coding-in-2026) — vibe coding + spec-driven development
- **Part 2**: [AI Pair Programming vs Agentic Coding](/henry-devto/ai-pair-programming-vs-agentic-coding) — the autonomy spectrum
- **Part 3**: Human-in-the-Loop and TDD + AI — guardrails and quality (you are here)

---

*What's your approach? Pure vibes, full TDD, or something in between? I'm curious what actually works in your projects.*