What Is Vibe Coding in 2026? One Year From Karpathy's Tweet
What is vibe coding in 2026? Exactly one year ago — February 2, 2025 — Andrej Karpathy posted the tweet that started it all. The term became Collins Dictionary's Word of the Year. But here's the thing: what Karpathy meant and what "vibe coding" (or "vibecoding" as some write it) means now are two different things.
{% embed https://x.com/karpathy/status/1886192184808149383 %}
When Karpathy first used the term, he meant something specific. You tell the agent what to do and evaluate the result. The key "vibe" is that you don't dig into HOW the code is written. Something doesn't work? Just write the next prompt, and so on. Now "vibe coding" often means any AI-assisted development at all. Though honestly — in many cases, that's exactly how it works.
We're already seeing derivatives — vibe design, vibe ops, vibe anything. But professional developers need more than vibes. That's where approaches like spec-driven development come in — and that's what this series is about.
But be honest: when you accept the agent's changes without looking, you catch yourself thinking — is this actually done right, can I trust what the LLM generated without reviewing it? When a colleague says they vibe-coded some functionality — do you picture thoughtful architecture or more like "somehow works"? Is vibe coding cheating and irresponsibility, or a perfectly professional approach?
What I know for certain — AI development is here, whatever we call it. According to the Stack Overflow 2024 Developer Survey, 76% of developers use or plan to use AI tools. About a third of senior developers — those with 10+ years of experience — generate more than half their code with AI.
Let's figure out what exactly we can do with AI. Different approaches exist, giving more control at different stages of work. Choosing the right one and applying it consciously — that's the professional approach. In this article, I'll cover existing AI development methodologies that I've used in practice, with my honest commentary.
Vibe Coding: The Entry Point
What it is:
- Popularized by Andrej Karpathy (February 2025)
- Iterative prompting until code works
- No upfront planning, minimal specification
- Trust AI to handle details, fix issues as they appear
Vibe coding is a great approach. Really. I use it often myself. It works perfectly for non-critical features, dev tools, prototypes, experiments. (Though honestly, I still use the keyboard for that.)
When do I use it?
- When the result is easy to evaluate visually
- When scope is obviously localized to one or a few files
Do I look at the diff?
- Honestly, almost always. But I don't check every line — I quickly assess which files changed, what was added or removed. This lets me catch moments when the AI "went off track" fast.
Does it produce bad code? Maybe, but there are simple ways to improve quality:
- describe code style in CLAUDE.md (or AGENTS.md)
- describe the architecture of the relevant part
- provide examples of existing similar features as templates
- ask the agent to run typecheck, linter, and prettier when done
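To make this concrete, here's a minimal sketch of what such a CLAUDE.md might contain. The section names, paths, and commands are my own assumptions — adapt them to your stack:

```markdown
# CLAUDE.md

## Code style
- TypeScript strict mode; no `any`
- Named exports only; one component per file

## Architecture (relevant part)
- UI lives in `src/components/`, data fetching in `src/api/`
- New features follow the pattern in `src/components/UserCard/`

## After every task
- Run `npm run typecheck && npm run lint && npx prettier --write .`
```

Even a file this short noticeably reduces "creative" deviations from your conventions.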
On the other hand, there are pitfalls. 27% of companies have banned AI tools at least temporarily over privacy and security concerns. Apple restricted ChatGPT and Copilot. Amazon banned ChatGPT after discovering responses resembling internal data. Samsung had an employee leak confidential information through ChatGPT. Be careful with security. Don't use vibe coding on critical infrastructure. Especially where you can't easily roll back changes.
[IMAGE: simple DO and DON'T infographic for vibe coding]
You might ask — is it even legitimate to use vibe coding at work? Absolutely! First, you save significant energy on simple things. Your brain resources are limited — delegate simple tasks and routine to AI. It'll do it faster, and you can spend your focus on more important stuff. Second, techniques exist beyond vibe coding that significantly improve development quality and reliability.
So what are these methods?
Spec-Driven Development: Structure First
Credentials:
- Formalized by GitHub Engineering Team (GitHub Spec Kit, September 2025)
- Featured in Thoughtworks Technology Radar Volume 33 (November 2025)
- Professional tools: AWS Kiro (public preview July 2025), Tessl Framework (closed beta September 2025)
- Community solutions: BMAD Method (21 specialized agents), OpenSpec (lightweight CLI)
- Used by: Claude Code users, enterprise teams, GitHub Copilot Workspace
How it works:
Write a detailed specification BEFORE code. The spec includes requirements, architecture, API contracts, error handling, and edge cases. AI executes against the spec. The spec becomes living documentation — often saved as CLAUDE.md or .spec files in the project root.
Human focuses on WHAT. AI handles HOW.
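As a sketch, a spec for a single feature might look like this — the headings are my own convention, not a standard, and the endpoint and file names are invented for illustration:

```markdown
# Spec: Password reset flow

## Requirements
- User requests a reset by email; token expires in 30 minutes

## Architecture
- New endpoint: POST /auth/reset-request (handler in `src/auth/reset.ts`)

## API contract
- 200 on success, 429 after 5 requests/hour per email

## Error handling / edge cases
- Unknown email: return 200 anyway (no account enumeration)
```

Notice that every section answers a WHAT question; the agent fills in the HOW.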
This is actually my main approach for large projects. Especially when adding a new section or functionality that didn't exist before. The time spent writing a spec is often significant. But it gives good control — modern models follow instructions pretty well. You can vary the degree of freedom for the agent: you can specify file and folder names yourself, or just give an outline of the solution.
After spending half a day on specification, you watch Claude Code finish implementation in 10 minutes. Feels unfair, but the results are solid.
The spec becomes a reference for future work. Months later, a new session starts with "read the spec, find the code" — and the agent has full context immediately.
Long-term challenges:
To continue development later, you need to keep documentation current. Specs often start drifting from real code even during initial implementation. Details change, paths get renamed during refactoring. Keeping the spec up to date adds cognitive load. My solution: commit spec changes alongside code changes. Treat documentation as part of the codebase. Instruct the AI agent to always update the document after completing any task.
Pro tip:
Use Claude Desktop for spec development: give it Filesystem MCP for code access, enable web search for current documentation. Brainstorm the solution together with AI, define architecture — and only then ask it to write the spec.
Agentic Coding: High Autonomy
Credentials:
- Academic research: arXiv 2508.11126 "AI Agentic Programming: A Survey" (UC San Diego, Carnegie Mellon, August 2025), arXiv 2512.14012 "Professional Software Developers Don't Vibe, They Control" (University of Michigan, December 2025)
- Ralph Loop created by Geoffrey Huntley (public launch May 2025, viral wave January 2026)
- Tools: Claude Code, Cursor 2.0 Composer (October 2025, up to 8 parallel agents), GitHub Copilot Agent Mode (preview February 2025)
- Official ralph-wiggum plugin from Anthropic (Boris Cherny)
What it is:
Agent operates with high autonomy. Human sets high-level goals, agent figures out implementation. Agent can plan, execute, debug, iterate without constant approval.
Different from vibe coding: agentic coding is systematic. Agent creates a plan, executes it methodically, can course-correct. Vibe coding is reactive prompting without structure.
My take? Skeptical so far.
I'd like to believe in this approach. The idea of extended autonomous sessions sounds amazing. But here's my question: what tasks justify that much autonomous work?
Writing a detailed spec takes me longer than executing it. If Claude Code finishes in 10 minutes after I've spent hours on specification, why would I need 14 hours of autonomy?
I'm skeptical about applications in my projects. Maybe it works for certain domains — large refactors, extensive testing, documentation generation across huge codebases? But even then, I can't imagine Claude Code not handling it in an hour.
The Ralph Loop extreme:
Named after Ralph Wiggum from The Simpsons. The concept: give the agent a task, walk away, return to finished work. Geoffrey Huntley reported 14-hour autonomous sessions.
If you've found great applications for Ralph Loop, I'm genuinely curious. Share your wins in the comments.
The permissions reality:
Agentic coding hits a wall in practice: permissions. Claude Code asks for approval for every file write, API call, and terminal command. Completely breaks flow. Kills the autonomy promise.
My workarounds: I ask Claude to add all MCP tools to .claude/settings.json proactively — that reduces interruptions. Sometimes I run with --dangerously-skip-permissions, but keep an eye on what's happening.
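For reference, a pre-approved allowlist in `.claude/settings.json` looks roughly like this — the exact tool and command names depend on your project and MCP servers, so treat these entries as placeholders:

```json
{
  "permissions": {
    "allow": [
      "Edit",
      "Bash(npm run typecheck)",
      "Bash(npm run lint)",
      "mcp__filesystem__read_file"
    ]
  }
}
```

Anything not on the list still prompts for approval, which keeps `--dangerously-skip-permissions` as a last resort rather than a default.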
Try to set up your environment so the agent can't do anything that git reset couldn't fix. This is clearly a problem waiting for a solution. We need better ways to control coding agent actions.
AI Pair Programming: Working Together
Credentials:
- GitHub official positioning: "Your AI pair programmer" (Copilot marketing since 2021)
- Microsoft Learn documentation
- Tools: GitHub Copilot, Cursor, Windsurf
- 720 monthly searches for "ai pair programming"
The promise:
AI as collaborative partner, not just autocomplete. Continuous suggestions while coding. Context-aware completions. Real-time feedback and alternatives. More than tab-completion — understanding project context.
My honest experience:
I've tried AI autocomplete multiple times. Each time, I ended up disabling it completely.
Why? When I'm writing code, I've already mentally worked out what I want. AI suggesting my next line just interrupts my thought process. Standard IDE completions always worked fine for me.
I know many developers love it. Just doesn't fit my workflow.
Where I find real pair programming:
Claude Desktop with good system instructions plus Filesystem MCP to read actual project files. That's when I feel like I'm working WITH someone who understands my problem and actually helps solve it.
Autocomplete is reactive. Real pair programming is proactive — discussion, exploration, questioning assumptions.
The productivity numbers:
GitHub claims 56% faster task completion with AI assistants. Their study shows Copilot users complete 126% more projects per week. Sounds great.
But here's counter-evidence: METR study found experienced open-source developers took 19% LONGER to complete tasks when using AI tools. Completely contradicts the marketing.
The truth probably depends on context. AI effectiveness varies wildly by task type, developer skill with AI tools, and workflow fit. Not universally faster, not universally slower.
Human-in-the-Loop: Strategic Checkpoints
[IMAGE: comic. Robots talking to each other: "It's time to get rid of this flesh bag." Another robot confirms: "Definitely, without him we'd work 1024 times faster." Final panel — large circle of robots passing boxes to each other, and among them one human]
Credentials:
- Atlassian Research: HULA framework (Human-Understanding Large Language Model Agents)
- Formalized in ICSE 2025 paper (arXiv 2411.12924)
- Google Cloud AI documentation
- Implemented in: Claude Code Planning Mode
What it is:
AI operates autonomously BETWEEN checkpoints. Human approves key decisions, reviews output at strategic moments. Not constant supervision — strategic oversight.
Agent proposes approach, human confirms direction. Then agent executes freely until next checkpoint.
Permissions ≠ HITL:
Don't confuse permissions with Human-in-the-Loop. Permissions are too low-level. "Can I write this file?" tells me nothing about what task the agent is actually solving.
Real HITL is Planning Mode. Agent shows the plan: "here's what I'll do, these files will change, here's the expected outcome." That's decision-level control.
The problem with current agents: they don't understand WHEN to stop and ask. Rarely hit the right moment. Either too much autonomy (goes off track) or too many interruptions (breaks flow).
Future improvement area: agents that know when they're uncertain and should consult the human. Like "I don't know" responses — current models aren't great at this in practice.
When to use:
Production code with moderate complexity. When outcome matters but speed also matters. Team environments where others will review anyway. Learning new approaches where you want to see the agent's reasoning.
Medium stakes: not prototype territory (vibe coding works there), not critical infrastructure (TDD territory).
TDD + AI: Quality First
Credentials:
- Adapted from traditional TDD (Kent Beck)
- Modernized for AI era: Qodo.ai blog, Builder.io guide, GitHub Blog (May 2025)
- Quality-focused teams, enterprise production code
How it works:
Write tests BEFORE implementation (classic TDD). AI generates code to pass tests. Tests become executable specification.
Red → Green → Refactor cycle, but AI handles implementation. Tests catch AI mistakes automatically. Tests provide verification without human review of every line.
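A toy illustration of that cycle in Python — `slugify` is a hypothetical helper, but the shape is the point: the test comes first and acts as the executable spec the agent implements against.

```python
import re

# Step 1 (red): the test is written first and defines the contract.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Step 2 (green): the agent writes the minimal implementation to pass it.
def slugify(title: str) -> str:
    # Lowercase, keep alphanumeric runs, join them with single hyphens.
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)

test_slugify()  # passes -> green; refactor freely, the test guards behavior
```

In a real project the test file goes to the agent along with the task: "make these pass, don't modify the tests."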
Tests as specification:
Tests are absolutely critical for key functionality. I always instruct agents to run tests.
But here's the thing: writing comprehensive tests upfront plus a detailed spec — that's already 80% of the work. If you've written that much structure, is AI really saving time?
Most valuable when you have existing spec that naturally converts to tests — like API documentation. Then yes, tests-first makes perfect sense.
The guardrails approach:
Tests become safety boundaries for the agent. Agent can iterate freely within test constraints. No need to review every implementation detail. Just verify: tests pass, coverage maintained.
Especially valuable for agentic coding. Let the AI experiment, tests catch the mistakes.
Critical warning:
AI-written tests need human review. I've seen agents write "passing" tests using mocked requests — test passes, code is broken. The test verified syntax, not behavior.
Correct tests = solid foundation. Bad tests = false confidence that destroys future work.
Review test logic before trusting it. Make sure tests verify actual behavior, not just that code runs.
Conclusion
What I typically use:
- Dev tools and experiments: vibe coding works fine.
- Production features: spec-driven with Planning Mode.
- Critical systems: TDD plus extensive review.
- Research and exploration: Claude Desktop as true pair programmer.
Your approach might be different. If you do things differently — different tools, different approaches, different combinations — share your wins in the comments. What works for you as an engineer?