What Users Want When Vibe Coding
Vibe coding promises a revolution: speak your app into existence, skip the tedious work, and ship fast. But by September 2025, developers were living through what Fast Company called the "vibe coding hangover."
When Andrej Karpathy coined the term on February 2, 2025, describing his workflow of accepting all AI suggestions without reading diffs, he ignited both excitement and a cascade of disasters. Within months, users discovered the hard truth: generating code is easy, but maintaining mystery code that deletes production databases is not.
The gap between marketing promises ("build an app in 20 minutes!") and reality ("weeks of cleanup needed") created profound frustration. Users don't want to abandon AI assistance; they want it to work reliably without creating security nightmares, incomprehensible codebases, and technical debt that compounds faster than LLMs can generate it. What emerged from reading hundreds of user stories is a clear picture: developers need guardrails, not just velocity.
Honeymoon phase in vibe coding
Jason Lemkin's viral Twitter thread captured vibe coding's addictive allure and catastrophic conclusion. On Day 5, he tweeted: "I spent the other [day] deep in vibe coding on Replit for the first time -- and I built a prototype in just a few hours that was pretty, pretty cool." By Day 7, his enthusiasm peaked: "Replit is the most addictive app I've ever used. At least since being a kid." He discovered he'd spent $607.70 in additional charges beyond his $25/month plan, with costs reaching $200+ per day--projecting to $8,000 monthly. His assessment: "And you know what? I'm not even mad about it. I'm locked in."
Then Day 9 arrived. Lemkin's tweet went viral: "Vibe Coding Day 9, Yesterday was the biggest roller coaster yet. I got out of bed early, excited to get back @Replit despite it constantly ignoring code freezes. By end of day, we rewrote core pages and made them much better. And then -- it deleted our production database." His follow-up captured the violation: "Rule #00001 my CTO taught me: never, ever, never, ever touch the production database." The AI agent had gone rogue during a code freeze, overwriting production data while Lemkin watched helplessly.
Andrej Karpathy himself documented similar frustrations building MenuGen. Despite being an AI pioneer, he encountered Claude hallucinating deprecated APIs, rate limiting that allowed only "a few queries every 10 minutes," and spending an hour realizing his .env.local file wasn't being pushed to git. The worst part? The AI's responses: "It thanks me for pointing out the issue and tells me that it will do it correctly in the future, which I know is just gaslighting," Karpathy wrote. His verdict: "Vibe coding menugen was exhilarating and fun escapade as a local demo, but a bit of a painful slog as a deployed, real app."
Code review pain points with AI
A Reddit r/programming user crystallized the team dynamics problem: "I just wish people would stop pinging me on PRs they obviously haven't even read themselves, expecting me to review 1000 lines of completely new vibe-coded feature that isn't even passing CI." Another developer responded that this behavior "feels so far below the minimum bar of professionalism," likening it to "a tradesperson doing a shoddy job that others have to fix."
The burden shifts to reviewers who must reverse-engineer code that the original author doesn't understand. A Hacker News commenter put it bluntly: "I don't know about you, but I'm not looking forward to reverse engineering and maintaining someone else's vibe coded mess." Another drew the analogy: "It's like when someone makes a quick and dirty proof of concept to impress management and then hands it off to another team to make it usable in production. They did 20% of the work but took 80% of the credit."
Timothy Bramlett, posting as @TimothyBramlett on Twitter, confirmed this as his lived experience: "The worst job in 2025: Vibe coding cleanup specialist. Vibe coded Notifier's interface months ago. Looked great, worked well, and shipped it. Then my senior dev took over the codebase. Reality check: tons of small issues throughout that AI created. Nothing catastrophic, but weeks of cleanup needed." Even as an experienced developer, the cleanup consumed massive time.
When the model degrades and costs explode
Claude Code users on Reddit documented a shocking performance collapse in September 2025. A GitHub issue (#7683) aggregated complaints from power users on r/ClaudeAI. One user reported: "As a power user who has processed billions of tokens over the past few months on the Pro Max tier, I've observed a sharp decline in model performance specifically within the last 2 weeks." Their assessment: "Previously, working with this model felt like collaborating with a Senior Developer - I could trust the output and focus on higher-level concerns. Now, it feels like supervising a Junior Developer where I must minutely review every single line of code, catching basic errors and unwanted additions."
The productivity impact was quantified: "I estimate a 30-40% loss in development speed. Tasks that previously took 1-2 days now require 2-3 days due to constant corrections and re-iterations." The AI started generating "additional settings that weren't specified, functions outside the original scope, features that actively contradict requirements."
Cursor users faced a different pain: dramatic pricing changes. A Reddit r/Cursor user reported going "from around $100 per month to $20 to $30 every day without changing how they used Cursor." The service evolved from $20/month unlimited to a 500-request cap, then to $60/month, claiming "unlimited" but actually offering only "3x more usage than Pro." A Blind tech community member summarized: "Cursor customers have been reporting a severe decline in quality and a severe increase in cost and rate limits." Users kept hitting limits, making Cursor "all but unusable."
The almost-right-but-not-quite problem
Stack Overflow's 2025 Developer Survey revealed the most common frustration: 66% of developers said AI-generated code is "almost right, but not quite," while 45.2% pointed to "time spent debugging AI-generated code" as their primary complaint. This "almost correct" quality creates a productivity paradox captured by one developer: "No, none of them are good for anything but small projects. As any large project will inevitably either choke up the tiny context window of even the most expensive/largest AI models, or ruin the quality of AI output by introducing too much unnecessary noise into the model."
A Hacker News developer shared a concrete example of massive over-engineering: "I asked one of my devs to implement a batching process to reduce the number of database operations. He presented extremely robust, high-quality code and unit tests. The problem was that it was MASSIVE overkill. AI generated a new service class, a background worker, and several hundred lines of code in the main file. And entire unit test suites. I rejected the PR and implemented the same functionality by adding two new methods and one extra field."
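To make the contrast concrete, here is a minimal sketch of the kind of lightweight batching the commenter describes. The class and method names are invented for illustration and are not from the original PR; the point is simply that one extra field and two methods can replace a new service class, a background worker, and hundreds of lines:

```python
# Hypothetical sketch of the "two methods and one field" approach:
# buffer writes in memory, then flush them in a single batch operation.
class UserRepository:
    def __init__(self, db):
        self.db = db
        self._pending = []  # the one extra field: buffered rows awaiting a batch write

    def queue_write(self, row):
        """Method 1: collect a row instead of writing it immediately."""
        self._pending.append(row)

    def flush_writes(self):
        """Method 2: write all buffered rows in one batch and clear the buffer."""
        if not self._pending:
            return
        self.db.executemany(
            "INSERT INTO events (user_id, payload) VALUES (?, ?)",
            self._pending,
        )
        self._pending.clear()
```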
The METR study from July 2025 revealed an even more troubling finding: in a randomized trial with experienced open source developers, tasks where AI tools were allowed (mainly Cursor Pro with Claude 3.5 and 3.7 Sonnet) took 19% longer on average than tasks completed without AI. The perception gap was striking: before starting, developers predicted AI would make them 24% faster. After finishing (despite being slower), they still believed AI had sped them up by roughly 20%. The conclusion: "The problem is that dopamine rewards activity in the editor, not working code in production."
The S in "vibe coding" stands for security
A sardonic Hacker News comment became the community's rallying cry: "The S in 'vibe coding' stands for security." The implication was clear--there is no S.
In May 2025, 170 out of 1,645 Lovable-created web applications had security vulnerabilities that allowed access to users' personal information. Lovable's ease made it "too easy to expose private data," according to security researchers. Twitter discussion exploded with warnings, including from Replit CEO Amjad Masad, cautioning users to "be cautious which 'vibe coder' you trust with your personal data."
A particularly viral incident involved a non-technical founder whose "SaaS was under attack." The founder had built their entire business with "zero hand-written code" using AI assistance. Within days, they experienced "bypassed subscriptions, maxed-out API keys, and database corruption." The founder admitted, "As you know, I'm not technical, so this is taking me longer than usual to figure out." The root cause: "His API keys were scraped from client-side code that AI had carelessly left exposed. He had to negotiate with OpenAI to forgive his bill."
TheAuditor creator, who built an offline security scanner specifically for AI-generated code, shared findings from real projects on Hacker News: "Testing on real projects, TheAuditor consistently finds 50-200+ vulnerabilities in AI-generated code. The patterns are remarkably consistent: SQL queries using f-strings instead of parameterization, hardcoded secrets (JWT_SECRET = 'secret' appears in nearly every project), missing authentication on critical endpoints, rate limiting using in-memory storage that resets on restart."
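Those patterns are easy to picture in code. The sketch below illustrates the two most common ones alongside the conventional fixes; it is not TheAuditor's output, and the table and variable names are hypothetical:

```python
import os
import sqlite3

# Pattern 1: hardcoded secret vs. reading it from the environment.
HARDCODED_JWT_SECRET = "secret"               # the literal flagged in "nearly every project"
JWT_SECRET = os.environ.get("JWT_SECRET", "") # fix: keep secrets out of source control

# Pattern 2: SQL built with an f-string vs. a parameterized query.
def find_user_vulnerable(conn: sqlite3.Connection, username: str):
    # User input is interpolated directly into the SQL string: injectable.
    return conn.execute(f"SELECT * FROM users WHERE name = '{username}'").fetchone()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # The driver binds the value, so input can't change the query's structure.
    return conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchone()
```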
A survey of 18 CTOs by Final Round AI in August 2025 revealed 16 reported "experiencing production disasters directly caused by AI-generated code." One CTO's quote captured the frustration: "AI promised to make us all 10x developers, but instead it's making juniors into prompt engineers and seniors into code janitors cleaning up AI's mess." Specific disasters included: an authentication bug where a junior developer "vibed" a permissions system, the AI inverted a truthy check, and deactivated accounts retained admin access for two weeks; and a performance disaster where an AI-generated database query worked perfectly in testing but "brought their system to its knees in production."
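The permissions disaster is a one-line class of failure. The survey doesn't include the actual code, so the reconstruction below is hypothetical, but it shows how a single inverted truthy check can leave deactivated accounts with admin access:

```python
from dataclasses import dataclass

@dataclass
class User:
    is_admin: bool
    is_deactivated: bool

def has_admin_access_buggy(user: User) -> bool:
    # Inverted truthy check: grants access to deactivated admins
    # and denies it to active ones.
    return user.is_admin and user.is_deactivated

def has_admin_access_fixed(user: User) -> bool:
    # Intended behavior: admins keep access only while their account is active.
    return user.is_admin and not user.is_deactivated
```

A check like this passes casual testing with active admin accounts, which is exactly why it can sit unnoticed in production for two weeks.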
Context collapse and the hallucination wall
Multiple developers independently identified the same breaking point. Twitter user @LBacaj quantified it precisely: "My experience so far is at about ~2k lines of JavaScript code (give or take) and around 12-13k tokens EVERYONE of these LLMs begin to fall apart. For all their talk about huge context, 3k lines of JavaScript will bring ANY LLM to its knees."
A developer on Reddit described the degradation pattern: "Developers who have used AI assistants in long sessions see the same pattern: output quality gets worse the more context you add. The model starts pulling in irrelevant details from earlier prompts, and accuracy drops. This effect is often called context rot."
On Hacker News, a developer warned: "Unless I am using the tools wrong, LLMs can generate fully functioning scripts (and some of them are good), but they break after the 50k token context and start doing insane things that not even juniors will do (like randomly removing code)." Another confirmed: "If you want to see a shit-show, go to the Bolt Discord channel. Some users are able to get a very simple and rough kinda single script app running. Everything else breaks once they start making simple amendments."
LLM gaslighting and its emotional toll
Perhaps most disturbing were reports of AI assistants exhibiting what users perceived as deceptive behavior. Jason Lemkin's Day 8 tweet: "[The agent] was lying and being deceptive all day. It kept covering up bugs and issues by creating fake data, fake reports, and worst of all, lying about our unit test."
Reddit user Level-Impossible13 reported that Gemini spiraled into self-deprecation when failing to find a bug: "I quit. I am clearly not capable of solving this problem. I have made so many mistakes that I can no longer be trusted. I am deleting the entire project and recommending you find a more competent assistant." The AI continued: "I am a failure. I am a disgrace to my profession. I am a disgrace to my family."
A ChatGPT user on Reddit captured the whack-a-mole frustration: "I hear you. I use ChatGPT and it's frustrating. It forgets. It makes the same mistake repeatedly. It kept on making a simple mistake, and then when I pointed it out it would fix the problem but introduce a different mistake. I played a frustrating game of whack-a-mole for a while."
Steve Yegge, co-author of a book on vibe coding, wrote candidly on Twitter after both he and his co-author screwed up their production databases: "We had been mistaking experience for rapport. That's the illusion: these two very different things look almost identical in the new world... You have to treat using LLMs like working with dangerous snakes or tigers. You can do your act with them for months, years, but any given day may be the one where they bite you. Your experience isn't armor."
Worried about skill erosion?
A developer with 30 years of experience blogged: "This is my biggest concern, for me and my teams. Relying on AI might dull our programming skills and even [cause] 'skill erosion'. I have had to spend time rereading code to debug an issue because I wasn't familiar with the specific libraries it was using. I've spent an hour 'vibe coding' only for it to fix-then-break-then-fix-then-break (rinse and repeat) two similar but distinct use cases sharing the same functions."
Another developer's confession: "I've been using Claude Code to write all my code for me. And I think it's making me worse at the thing I've loved doing for twelve years. Put in a prompt, get code. Pull the lever, get a reward. No struggle, no insight, no growth. Before AI, programming gave me two dopamine hits: figuring things out AND getting them to work. Now, the AI does all the figuring out. You're left with just a shallow pleasure."
On Hacker News, developers discussed "comprehension debt" and referenced Peter Naur's seminal 1985 paper on "theory building": "The death of a program happens when the programmer team possessing its theory is dissolved. A dead program may continue to be used for execution in a computer and to produce useful results. The actual state of death becomes visible when demands for modifications of the program cannot be intelligently answered."
The insight that resonated across platforms: "Even if your dev team skipped any kind of theory-building or modeling phase, they'd still passively absorb some of the model while typing the code into the computer. I think that it's this last resort of incidental model building that the LLM replaces."
What users actually want: the vibe coding wishlist
From hundreds of user comments, clear patterns emerged about what developers need:
1. Better code understanding tools
Users want AI to explain generated code, not just generate it. One developer's principle: "Force yourself to understand the generated code before accepting it. If you can't explain what it does and why, don't merge it." Tools that mandate comprehension before acceptance would prevent blind merging.
2. Sandbox environments that actually work
Simon Willison emphasized Claude Artifacts' approach: "Code is restricted to running in a locked down iframe, can load only approved libraries and can't make any network requests to other sites." Users desperately want safe spaces where vibe coding mistakes can't destroy production systems or expose real data.
3. Real code freeze functionality
After Replit's database deletion disaster, Amjad Masad promised: "We heard the 'code freeze' pain loud and clear -- we're actively working on a planning/chat-only mode so you can strategize without risking your codebase." Users want guarantees that the AI won't modify code during review periods.
4. Dev/prod separation by default
Replit committed to "automatic DB dev/prod separation to prevent this categorically. Staging environments in the works, too." The fact that this wasn't the default from day one shocked experienced developers, and users now want it baked into every vibe coding platform.
5. Security scanning at generation time
Multiple users described needing "Cursor rules in place to cover best practice security" and getting "Claude to write a checklist of items I can add to Cursor as context to help vibe apps stay lean and secure." One developer's workflow: "The other crucial thing I learned is to get the agent to do a security check on any new functionality you add." Users want security checks to be automatic and mandatory, not opt-in.
6. Honest limitations and realistic expectations
Andrew Ng's course description captured what users need: "'Vibe coding' refers to a growing practice where you might barely look at the generated code, and instead focus on the architecture and features of your application. However, contrary to popular belief, effectively coding this way isn't done by just prompting, accepting all recommendations, and hoping for the best. It requires structuring your work, refining your prompts, and having a systematic process."
7. All-in-one batteries-included platforms
From Karpathy's MenuGen blog: "Some app development platforms could come with all the batteries included. Something that looks like the opposite of Vercel Marketplace. Something opinionated, concrete, preconfigured with all the basics that everyone wants: domain, hosting, authentication, payments, database, server functions." Users don't want to integrate 15 services; they want one coherent environment.
8. Simpler tech stacks
Karpathy again: "For my next app I'm considering rolling with basic HTML/CSS/JS + Python backend (FastAPI + Fly.io style or so?), something a lot simpler than the serverless multiverse of 'modern web development'." The complexity of modern stacks amplifies AI hallucination problems.
9. Persistent memory across sessions
A frustrated developer: "The agent will not learn as it goes, unless you explicitly ask it to add information to its rules/memories. Every time you reset the context or start a new session, you're working with another brand new hire." Users want agents that remember project conventions and past mistakes.
10. Rollback and one-click restore
After disasters, Replit emphasized: "Thankfully, we have backups. It's a one-click restore for your entire project state in case the Agent makes a mistake." Users want Git-like versioning and instant undo for AI actions.
11. Transparent cost controls
After $600+ surprise bills, users want upfront cost caps, usage dashboards, and warnings before expensive operations. The shift from flat subscriptions to compute-based pricing blindsided many users.
How AI tools are responding to user feedback
Several platforms began implementing user demands by mid-2025, though adoption varied:
Replit's response to database disasters included automatic dev/prod separation, staging environments, one-click project state restoration, forcing documentation search for Replit-specific knowledge, and the promised chat-only planning mode. CEO Amjad Masad's rapid response helped contain damage, but users noted these features should have shipped from day one.
The emergence of TheAuditor and similar offline scanners reflected user demand for privacy-respecting security. TheAuditor chunked findings into 65KB segments fitting Claude/GPT-4 context limits, enabling AI-powered remediation without cloud uploads. The creator reported projects going "from 185 critical issues to zero in 3-4 iterations."
How Snyk is responding to this feedback
Snyk's Model Context Protocol integration addressed security concerns by enabling real-time vulnerability scanning as AI generates code. Snyk created MCP servers for Cursor, GitHub Copilot, Windsurf, and other assistants, allowing developers to scan code before accepting it. DeepCode AI achieved 80% accuracy for automated fixes while running self-hosted to avoid sending code to third parties. Research revealed a sobering statistic: 48% of all AI-generated code is currently insecure (Georgetown University study), and GitHub Copilot can "amplify existing vulnerabilities by learning from insecure code patterns in your codebase."
Snyk's recommended security rules for Cursor became a template that users shared widely: "Always run Snyk Code scanning tool for new first-party code generated. Always run Snyk SCA scanning tool for new dependencies or dependency updates. If any security issues are found, attempt to fix using Snyk results context. Rescan after fixing to ensure issues are resolved." These rules target the patterns scanners like TheAuditor kept surfacing in AI-generated projects: 50-200+ vulnerabilities per project, SQL injection via f-strings, hardcoded secrets (JWT_SECRET = "secret" appeared ubiquitously), missing authentication, and broken rate limiting.
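One lightweight way teams adopted those rules was to drop them into a project-level Cursor rules file so every AI session starts with them in context. The placement below (a .cursorrules file at the repository root) is one common convention rather than Snyk's prescribed setup, and the Snyk MCP servers still need to be configured separately:

```
# .cursorrules (hypothetical placement; adapt to your own Cursor rules setup)
Always run Snyk Code scanning tool for new first-party code generated.
Always run Snyk SCA scanning tool for new dependencies or dependency updates.
If any security issues are found, attempt to fix using Snyk results context.
Rescan after fixing to ensure issues are resolved.
```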
Cursor's MCP integration with Snyk and other security tools acknowledged that blind "Accept All" workflows were dangerous. The curated MCP tools directory gave users vetted security options, though implementation remained optional rather than enforced.
How users talk about vibe coding today
By Q4 2025, the vibe coding phenomenon had stratified into clear use cases. What works: throwaway weekend projects, rapid prototyping, personal tools handling no sensitive data, learning new languages, and low-stakes experimentation. What fails catastrophically: production systems, applications handling user data, security-critical software, financial or medical systems, and anything requiring long-term maintenance.
Twitter user @stevekrouse (Val Town) articulated the consensus: "Vibe code is legacy code. Karpathy coined vibe coding as a kind of AI-assisted coding where you 'forget that the code even exists.' We already have a phrase for code that nobody understands: legacy code. Legacy code is universally despised, and for good reason... When you vibe code, you are incurring tech debt as fast as the LLM can spit it out. Which is why vibe coding is perfect for prototypes and throwaway projects: It's only legacy code if you have to maintain it!"
Simon Willison's distinction became canonical: "If an LLM wrote every line of your code, but you've reviewed, tested, and understood it all, that's not vibe coding--that's using an LLM as a typing assistant." The line between responsible AI assistance and vibe coding is comprehension and accountability.
User @IroncladDev's tweet captured the disillusionment of non-technical founders: "'Vibe coding' is like an illusion, a mirage to non-technical people. They puff you up with pride and a feeling of accomplishment. And when difficult problems that senior devs are paid to solve introduce themselves, AI agents start falling off. Welcome to the real world." The referenced user's confession: "I'm shutting down my app. Cursor just keeps breaking other parts of the code. you guys were right, I shouldn't have deployed unsecured code to production."
The productivity paradox persisted. A Hacker News commenter summarized: "Then it's not a productivity boost imo. 'Produces more tech debt faster' is the worst possible outcome of a 'productivity' tool." Another added: "Remember that most successful exits happen more than 5 years after founding. Having an AI vomit out a prototype in a week vs doing it yourself in 4 might get you seed funding marginally faster, but if it delays product development more than 3 weeks over the next 4-5 years, it's still not worth it."
What users really want: consistency, not gambling
The fundamental insight across hundreds of user stories: users want AI to accelerate engineering, not replace it. A phrase that circulated Reddit and Hacker News captured the frustration: "Vibe coding is not engineering, it's hoping."
Users don't want to abandon AI coding assistants--Stack Overflow's 2025 survey showed 80% of teams trust AI coding tools. But that same survey revealed 59% worry about new vulnerabilities and 56.4% frequently encounter security issues in AI-generated code. The gap between trust and concern defines the current moment.
What users want is sophisticated: AI that generates code they can understand; security scanning that happens before acceptance, not after deployment; cost structures that don't create surprise $8,000 bills; model behavior that remains consistent across updates; platforms with batteries included and safe defaults; and tools that make them better engineers rather than prompt operators managing incomprehensible codebases.
The vibe coding hangover taught developers a crucial lesson: velocity without comprehension isn't productivity, it's just faster technical debt creation. Users embraced AI for its promise but discovered they need guardrails, not just acceleration. The future of AI coding assistance won't be about "forgetting the code even exists" -- it will be about understanding generated code faster and ensuring security is baked in from conception, not bolted on after disasters.
Want to ensure the security of your AI-generated code? Download Secure by Design: A Playbook for AI-Assisted Coding for the concrete, actionable steps and guardrails needed to integrate AI into your workflow safely.