
Outline: Beyond Vibe Coding

Article: Beyond Vibe Coding: Professional AI Development Methodologies
Author: henry-technical
Type: Explainer / Survey
Target: 2,800 words
Reading time: ~11 minutes


Article Structure Overview

Hook: Vibe coding = Collins Dictionary Word of the Year 2025, but it's insufficient for production work

Core message: Professional AI coding isn't just vibe coding — there's a spectrum of methodologies. Seniors use AI MORE than juniors, and methodology is what separates pros from beginners.

Tone: "Here's what exists and here's what I actually do" — landscape survey through practitioner's lens, not prescriptive guide

Journey: Entry point (vibe coding) → survey of professional approaches → personal experience → invitation to share


Introduction (400 words)

Goal: Hook with vibe coding phenomenon, establish why the term is problematic, promise survey of professional alternatives

Opening Hook (100 words)

  • Start with the Collins Dictionary Word of the Year 2025 announcement
  • Vibe coding caught mainstream attention — finally a term for "AI + prompting until it works"
  • Henry's take: "I remember when vibe coding meant something different. Now it's everywhere."
  • Relatable problem: works for prototypes, fails for production

The Problem with "Vibe Coding" (150 words)

  • Term has negative connotations: unprofessional, unreliable, "toy for juniors"
  • But 76% of developers use or plan to use AI tools (Stack Overflow 2024)
  • Real issue: term conflates ALL AI-assisted development into one bucket
  • Creates stigma: "Is using AI unprofessional?"
  • Deeper question developers face: "Can I use AI and still be a real engineer?"

The Reality (150 words)

  • Key stat: Seniors (10+ years) use AI MORE than juniors
  • About a third of senior devs generate over half their code with AI
  • Only 13% of junior devs do the same — 2.5x difference
  • Professional AI usage ≠ junior with ChatGPT
  • Methodology separates pros from beginners
  • Promise: survey of 6 professional approaches + what I actually use

Code/Visual: None in intro

Transition: "Let's look at what comes after vibe coding."


Section 1: Vibe Coding (Baseline) (400 words)

Goal: Define vibe coding as entry point, establish it as valid for certain contexts, but insufficient for production

Credentials Block (80 words)

  • Name: Vibe Coding
  • Source: Popularized by Andrej Karpathy (Feb 2025), Collins Dictionary
  • Created by: Community-coined term, formalized by Karpathy
  • When: 2024-2025, peaked December 2025
  • Used by: Indie developers, prototypers, early AI adopters
  • Official definition (Collins Dictionary): "A method of computer programming that relies heavily on artificial intelligence"

What It Is (100 words)

  • Iterative prompting until code works
  • No upfront planning, minimal specification
  • Trust AI to handle details
  • Fix issues as they appear
  • Focus on outcome, not process

When It Works (120 words)

  • Dev tools not going to production
  • Prototypes and experiments
  • Side projects with low stakes
  • Solo work with no handoff requirements
  • Henry's experience: "I've used this plenty. Works great for internal tools and weekend projects."

The Catch (100 words)

  • Breaks down at scale
  • Hard to maintain or handoff
  • No documentation or structure
  • Quality inconsistent
  • Security concerns: Research shows 45-62% of AI-generated code contains security vulnerabilities [1][2][3]
  • Enterprise response: 27% of companies banned AI tools (Cisco 2024)

Sources:

  • [1] Georgetown CSET: "Cybersecurity Risks of AI-Generated Code" (Nov 2024)
  • [2] Veracode: "AI-Generated Code: A Double-Edged Sword" (Sept 2025)
  • [3] Industry reports (Oct 2025)

Henry's take from interview: "Vibe coding isn't wrong, it's context-dependent. I use it for dev tools. But for production? You need something more structured."

Code example: None — vibe coding is about LACK of structure

Transition: "So what do professionals use instead?"


Section 2: Spec-Driven Development (450 words)

Goal: Present spec-driven as direct contrast to vibe coding — upfront planning, clear requirements, controlled execution

Credentials Block (100 words)

  • Name: Spec-Driven Development (SDD)
  • Source: GitHub Spec Kit (github.com/github/spec-kit), GitHub Engineering Blog
  • Created by: GitHub Engineering Team, formalized by Martin Fowler
  • When: 2024-2025, emerged as one of 2025's key AI-assisted engineering practices (Thoughtworks)
  • Used by: GitHub Copilot Workspace, Claude Code users, enterprise teams
  • Key tools launched: AWS Kiro, GitHub Spec Kit, Tessl Framework

What It Is (120 words)

  • Write detailed specification BEFORE code
  • Spec includes: requirements, architecture, API contracts, error handling, edge cases
  • AI executes against spec
  • Spec becomes living documentation (CLAUDE.md, .spec files)
  • Human focuses on WHAT, AI handles HOW

How It Works (100 words)

  • Write spec in natural language or structured format
  • Include examples, constraints, acceptance criteria
  • Agent reads spec, generates code
  • Iterate on spec if needed, not just on code
  • Spec stays updated as project evolves

When to Use (80 words)

  • Medium to high stakes projects
  • Code that needs handoff or maintenance
  • When requirements are clear
  • Enterprise/production code
  • Multi-developer projects

Henry's perspective from interview (integrated naturally): Time writing spec often exceeds time coding. I've spent half a day on specification, then watched Claude Code finish implementation in 20 minutes. Feels unfair, but the results are solid.

The spec becomes reference for future work — months later, new session starts with "read the spec, find the code."

Challenge: Specs drift from implementation. Architecture changes, paths rename, approaches shift. Keeping spec current = cognitive load. Solution: commit spec changes alongside code.

Pro tip: Use Claude Desktop for spec development, not just execution. Research, brainstorm, find architecture, THEN write spec. Much better than solo spec writing.

Code Example (50 words + code block)

Example CLAUDE.md snippet:

# Image Generation API Integration

## Requirements
- Generate images via Banatie API
- Cache results in database (URL + prompt hash)
- Serve via CDN redirect pattern
- Handle rate limits with exponential backoff

## API Contract
POST /api/images/generate
Body: { prompt: string, projectId: string }
Returns: { imageUrl: string, cached: boolean }

## Error Handling
- 429 Rate Limit → retry with backoff
- 500 Server Error → fallback to placeholder
- Invalid prompt → return validation error
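
The "429 → retry with backoff" rule in the spec above could be sketched roughly like this. A minimal sketch, assuming a generic request callback; `fetchWithBackoff` and its defaults are illustrative names, not part of the spec:

```typescript
// Illustrative sketch of "429 Rate Limit → retry with exponential backoff".
// fetchWithBackoff, maxRetries, and baseDelayMs are hypothetical names.
async function fetchWithBackoff(
  doRequest: () => Promise<{ status: number }>,
  maxRetries = 3,
  baseDelayMs = 500
): Promise<{ status: number }> {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    // Return anything that isn't a rate limit, or give up after maxRetries.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    // Exponential backoff: baseDelayMs, 2x, 4x, ...
    const delay = baseDelayMs * 2 ** attempt;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```

The 500-error fallback and prompt validation from the spec would wrap this at the caller, which keeps the retry logic testable on its own.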

Transition: "Spec-driven gives you control. But what if you want even MORE automation?"


Section 3: Agentic Coding + Ralph Loop (500 words)

Goal: Present agentic coding as high-autonomy approach, introduce Ralph Loop as controversial extreme

Credentials Block (100 words)

  • Name: Agentic Coding (+ Ralph Loop variant)
  • Source: arXiv 2508.11126 (Aug 2025), arXiv 2512.14012 (Dec 2025)
  • Created by: Research community (agentic coding), Geoffrey Huntley (Ralph Loop, May 2025)
  • When: 2024-2025, Ralph Loop went viral Jan 2026
  • Used by: Claude Code, experimental workflows, research projects
  • Tools: Claude Code, Cursor Composer, GitHub Copilot Workspace (agent modes)

What It Is (120 words)

  • Agent operates with high degree of autonomy
  • Human sets high-level goals, agent figures out implementation
  • Agent can plan, execute, debug, iterate without constant approval
  • Differs from vibe coding: systematic, can course-correct
  • Ralph Loop extreme: 14-hour autonomous sessions (Geoffrey Huntley)

Agentic vs Vibe Coding (80 words)

  • Vibe: reactive prompting, no plan
  • Agentic: agent creates plan, executes systematically
  • Both involve iteration, but agentic = structured iteration
  • Agent can debug itself, vibe coding requires human debugging

Ralph Loop (120 words)

  • Named after Ralph Wiggum (Simpsons character)
  • Concept: give agent task, walk away, return to finished work
  • VentureBeat: "How Ralph Wiggum went from Simpsons to AI" (Jan 2026)
  • Anthropic released official ralph-wiggum plugin by Boris Cherny
  • Controversial: works for some, mystifying for others
  • Search volume: ~10/month baseline, spiking to 140 in December 2025 (trending)

Henry's honest take from interview: I want to believe in Ralph Loop. The idea of 14-hour autonomous sessions sounds amazing. But here's my question: what tasks justify that much autonomous work?

Writing a detailed spec takes me longer than executing it. If Claude Code finishes in 20 minutes, why would I need 14 hours of autonomy?

I'm skeptical about use cases in my projects. Maybe it works for certain domains — large refactors, extensive testing, documentation generation?

If you've found great Ralph Loop applications, share in comments. Genuinely curious.

Permissions Reality Check (100 words)

  • Agentic coding hits permissions wall
  • Claude Code asking approval for every file write, API call, terminal command
  • Breaks flow, defeats autonomy promise
  • Henry's workaround: "I ask Claude to add all MCP tools to .claude/settings.json proactively"
  • Sometimes runs --dangerously-skip-permissions but monitors activity
  • "Nothing git reset can't fix"
  • This is evolving UX challenge tools are still figuring out

Code example: .claude/settings.json permissions snippet (illustrative; the schema evolves, so check current Claude Code docs for the exact format)

{
  "permissions": {
    "allow": [
      "Edit",
      "Bash(npm:*)",
      "Bash(git:*)",
      "Bash(pytest:*)"
    ]
  }
}

Transition: "High autonomy is one approach. But what about working WITH the AI, not just delegating TO it?"


Section 4: AI Pair Programming (400 words)

Goal: Present pair programming paradigm — collaboration, not just delegation

Credentials Block (100 words)

  • Name: AI Pair Programming
  • Source: GitHub official docs, Microsoft Learn
  • Created by: GitHub (Copilot team), popularized by Copilot marketing
  • When: 2021-present, evolved from "AI autocomplete" to "pair programmer"
  • Used by: GitHub Copilot, Cursor, Windsurf
  • Official tagline: GitHub Copilot = "Your AI pair programmer"

What It Is (100 words)

  • AI as collaborative partner, not just tool
  • Continuous suggestions during coding
  • Context-aware completions
  • Real-time feedback and alternatives
  • More than autocomplete: understands project context
  • 720 vol/month for "ai pair programming" (KD 50)

The Reality: Autocomplete ≠ Pair Programming (150 words)

Henry's honest experience from interview: I've tried AI autocomplete multiple times. Each time, I ended up disabling it completely.

Why? When I'm writing code, I've already mentally worked out what I want. The AI suggesting my next line just interrupts my thought process. Standard IDE completions always worked fine for me.

I know many developers love it. Just doesn't fit my workflow.

Real pair programming: Claude Desktop with good system instructions + Filesystem MCP to read actual project files. That's when I feel like I'm working WITH someone who understands my problem and helps solve it.

Autocomplete is reactive. Real pair programming is proactive — discussion, exploration, questioning assumptions.

When It Works (50 words)

  • Boilerplate reduction
  • Learning new APIs (seeing examples in context)
  • Pattern matching across codebase
  • Repetitive tasks (tests, type definitions)
  • When developer is receptive to interruptions

Stats:

  • 56% faster task completion (GitHub study)
  • 126% more projects per week for Copilot users
  • But: experienced devs sometimes 19% SLOWER (METR study)
  • Effectiveness varies wildly by task type

Transition: "Whether you delegate or collaborate, one question remains: how much oversight?"


Section 5: Human-in-the-Loop (HITL) (400 words)

Goal: Present HITL as balance between autonomy and control — strategic checkpoints

Credentials Block (100 words)

  • Name: Human-in-the-Loop (HITL)
  • Source: Atlassian Research (HULA framework), Google Cloud AI docs
  • Created by: Atlassian Engineering, formalized in ICSE 2025 paper
  • When: 2024-2025 (academic formalization)
  • Used by: Enterprise AI systems, Claude Code Planning Mode
  • Key paper: arXiv 2411.12924, "Human-in-the-Loop Software Development Agents" (the HULA framework)

What It Is (100 words)

  • AI operates autonomously BETWEEN checkpoints
  • Human approves key decisions, reviews output
  • Not constant supervision, strategic oversight
  • Agent proposes approach, human confirms direction
  • Balance: automation + control

Permissions ≠ HITL (120 words)

Henry's take from interview: Permissions aren't HITL. They're too low-level — "can I write this file?" tells me nothing about what the agent is actually solving.

Real HITL is Planning Mode. Agent shows plan: "here's what I'll do, these files will change, expected outcome." That's decision-level control.

The problem: current agents don't understand WHEN to stop and ask. Rarely hits the right moment. Either too much autonomy (goes off track) or too many interruptions (breaks flow).

Future improvement: agents that know when they're uncertain and should consult human. Like "I don't know" responses — current models aren't good at this.

Planning Mode as HITL (80 words)

  • Claude Code: Planning Mode = default for non-trivial tasks
  • See full plan before execution
  • Approve, modify, or reject
  • Agent executes autonomously after approval
  • Check results at end

When to Use (100 words)

  • Production code with moderate complexity
  • When outcome matters but speed also matters
  • Team environments (others will review)
  • Learning new approaches (see agent's reasoning)
  • Medium stakes: not prototype (vibe), not critical infrastructure (TDD)

Code example: None — HITL is process, not code pattern

Transition: "What about the highest stakes code, where bugs are expensive?"


Section 6: TDD + AI (450 words)

Goal: Present TDD as quality-first approach — tests as specification and safety net

Credentials Block (100 words)

  • Name: Test-Driven Development with AI (TDD + AI)
  • Source: Qodo.ai blog, Builder.io guide, GitHub Blog
  • Created by: Adapted from traditional TDD (Kent Beck), modernized for AI era
  • When: 2024-2025 (AI-specific implementations)
  • Used by: Quality-focused teams, enterprise production code
  • Key article: "TDD with GitHub Copilot" (GitHub Blog, May 2025)

What It Is (120 words)

  • Write tests BEFORE implementation (classic TDD)
  • AI generates code to pass tests
  • Tests = executable specification
  • Red → Green → Refactor cycle with AI
  • Tests catch AI mistakes automatically
  • Tests provide verification without human review of every line

Tests as Specification (100 words)

Henry's perspective from interview: Tests are absolutely important for key functionality. I always instruct agents to run tests.

But here's the thing: writing comprehensive tests upfront + detailed spec = that's already 80% of the work. If you've written that much structure, is the AI really saving time?

Most valuable when you have existing spec that converts to tests — like API documentation. Then yes, tests-first makes perfect sense.

The Guardrails Approach (120 words)

  • Tests = safety boundaries for agent
  • Agent can iterate freely within test constraints
  • No need to review every implementation detail
  • Just verify: tests pass, coverage maintained
  • Especially valuable for agentic coding

Critical warning from interview: AI-written tests need human review. I've seen agents write "passing" tests using mocked requests — test passes, code is broken.

Correct tests = solid foundation. Bad tests = false confidence that destroys future work.

Tests verify behavior, not just syntax. Make sure test logic is sound before trusting it.

When to Use (110 words)

  • High-stakes production code
  • APIs and integrations (clear contracts)
  • Security-critical functions
  • Code with compliance requirements
  • Refactoring (tests ensure behavior preserved)
  • When you need confidence in AI output

Code Example: Simple TDD example:

// 1. Write test first
describe('generateImage', () => {
  it('caches results for duplicate prompts', async () => {
    const result1 = await generateImage({ prompt: 'cat' });
    const result2 = await generateImage({ prompt: 'cat' });
    
    expect(result2.cached).toBe(true);
    expect(result1.imageUrl).toBe(result2.imageUrl);
  });
});

// 2. Agent implements to pass test
// 3. Refactor with confidence
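
One shape the agent's implementation might take is a cache keyed by prompt. A hedged in-memory sketch only: the spec calls for a database cache keyed by prompt hash and a real API call, and `cdn.example.com` is a placeholder:

```typescript
// Minimal in-memory sketch of an implementation that would pass the test above.
// The Map cache and URL are illustrative; the real spec uses a database cache.
type ImageResult = { imageUrl: string; cached: boolean };

const cache = new Map<string, string>();

async function generateImage({ prompt }: { prompt: string }): Promise<ImageResult> {
  const hit = cache.get(prompt);
  // Duplicate prompt: serve the cached URL and flag it as cached.
  if (hit !== undefined) return { imageUrl: hit, cached: true };
  // Stand-in for the real generation API call.
  const imageUrl = `https://cdn.example.com/${encodeURIComponent(prompt)}.png`;
  cache.set(prompt, imageUrl);
  return { imageUrl, cached: false };
}
```

Note this is exactly the kind of block the "critical warning" above applies to: the test passes, but a human still has to confirm the cache key and expiry semantics match the spec.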

Transition: "Six approaches. What ties them together?"


Conclusion (450 words)

Goal: Wrap up landscape survey, reinforce progression from vibe to professional approaches, validate AI usage, invite community sharing

The Landscape Exists (120 words)

So that's what exists beyond vibe coding.

Six methodologies, each with serious foundation — GitHub Spec Kit, academic papers, enterprise adoption. Not random hacks or Twitter trends. Real approaches with real backing.

Vibe coding caught mainstream attention because it resonated. Everyone who's used ChatGPT to debug something recognizes that feeling of "just prompt until it works." But it's the entry point, not the destination.

The landscape is richer than "vibe vs not vibe." Spec-driven for structure. Agentic for autonomy. Pair programming for collaboration. HITL for control. TDD for quality. Different tools for different contexts.

And it's still evolving. Ralph Loop emerged last year. Planning Mode is new. These methodologies will keep developing as AI tools mature.

The Legitimacy Question (120 words)

Back to the underlying question: "Is using AI unprofessional?"

No. The data says otherwise:

  • 76% of developers are using or planning to use AI tools
  • About a third of senior developers (10+ years experience) generate over half their code with AI
  • Only 13% of junior developers do the same — that's a 2.5x difference

Professionals use AI MORE than beginners, not less. Google reports that more than 25% of its new code is AI-generated. Major companies have adopted AI coding tools across their engineering organizations. That's not unprofessional. That's the new normal.

But HOW you use it matters. Vibe coding for production systems isn't professional. Spec-driven with tests and review? Absolutely professional.

What Makes It Professional (100 words)

The difference isn't the tool. It's the approach:

  • Clear requirements (spec, tests, or planning phase)
  • Appropriate oversight (human review, HITL, verification)
  • Quality controls (tests, linting, security scans)
  • Maintainability (documentation, handoff-ready structure)
  • Context awareness (knowing when vibe coding isn't enough)

Seniors get more value from the same AI tools because they apply methodology, not better prompts. That's the skill that matters.

Professional AI coding means choosing the right approach for the stakes. Weekend prototype? Vibe away. Production payment system? Tests first, spec-driven, reviewed.

What I Actually Use (110 words)

Here's what works for me:

  • Dev tools and experiments: vibe coding works fine
  • Production features: spec-driven with Planning Mode
  • Critical systems: TDD + extensive review
  • Research and exploration: Claude Desktop as true pair programmer

Your context might be different. Your choices might be different. That's fine.

The point isn't to follow my exact workflow. The point is knowing that choices exist beyond vibe coding, and understanding what each methodology offers.

If you're doing something different — different tools, different approaches, different combinations — share your wins in the comments. What approaches are working for you as an engineer?

Closing: This is what exists. This is what I use. Go see what works for you.


Code Examples Summary

| Section | Code Type | Purpose |
|---|---|---|
| Spec-Driven | CLAUDE.md example | Show spec format |
| Agentic | .claude/settings.json | Permissions config |
| TDD | TypeScript test + impl | Test-first workflow |

Total code blocks: 3

Code-to-prose ratio: ~15% (appropriate for explainer/survey)


Visual Assets Needed

| Asset | Type | Description | Section |
|---|---|---|---|
| Hero image | Abstract | Spectrum visualization — vibe to professional methodologies | Top |
| Stats callout | Infographic | Key stats visualization | Introduction |

SEO Notes

Primary keyword placement:

  • "ai coding methodologies" in H1, intro (2x), conclusion
  • Natural integration, never forced

Secondary keywords:

  • "spec driven development" in H2, section content
  • "ai pair programming" in H2, section content
  • "human in the loop ai" in H2, section content
  • "ralph loop" in H2, agentic section

Internal linking opportunities:

  • Link to Banatie docs (if relevant to image generation in examples)
  • Link to author's other AI development content

Halo keywords (tool mentions):

  • Claude Code, Cursor, GitHub Copilot throughout
  • Natural mentions, not forced for SEO

Outline created: 2026-01-23

Status: Validation complete, ready for @writer

Revisions: Removed false claims (359x growth, 90% Fortune 100), added source citations for security vulnerabilities, updated senior developer stat to "about a third"