Vibe Coding Is Producing the Worst Codebases I've Ever Reviewed

I've reviewed 40+ AI-generated PRs in the last three months. Here's the pattern nobody on LinkedIn wants to talk about.

I need to vent.

I have spent the last three months reviewing pull requests that were, depending on who you ask, either "shipped at 10x speed using Cursor" or "vibe coded in a weekend by our PM, just clean it up real quick." And I have to say something that's apparently controversial in 2026:

A lot of this code is genuinely awful. Not "junior developer with potential" awful. Not "needs a few rounds of feedback" awful. Actively dangerous, would-fail-a-bootcamp-final-project awful. And it's getting merged. Everywhere.

I'm not anti-AI. I literally wrote a whole post about my AI stack. I use Claude Code daily. I have a 200-line CLAUDE.md in every project I touch. The point isn't that AI tools are bad. The point is that "vibe coding," the specific practice Karpathy named, where you "fully give in to the vibes" and "forget that the code even exists," is producing a generation of codebases that nobody can maintain, including the people who shipped them.

Let me show you what I keep seeing.


The Stats Aren't Subtle

Before the rant, the receipts. Because every time I bring this up, someone says "you're just one engineer, what's your sample size."

Here's the broader sample size:

A December 2025 CodeRabbit analysis of 470 open-source GitHub PRs found that AI co-authored code contained roughly 1.7x more major issues than human-written code, with 2.74x more security vulnerabilities and 75% more misconfigurations.

METR ran a randomized controlled trial with experienced open-source devs on real codebases and found that they were measurably slower when AI tools were allowed, taking about 19% longer to complete tasks. The kicker: even after the experiment, they still believed AI had made them faster. Broader survey data shows 95% of developers report feeling productive while measurably producing lower-quality code.

41% of developers admit to pushing AI-generated code to production without full review. Forrester predicts 75% of enterprises will face moderate-to-high technical debt directly attributable to AI-driven rapid development by end of 2026. Stack Overflow ran an analysis literally titled "AI can 10x developers... in creating tech debt."

Karpathy himself, the guy who coined the term, declared vibe coding "passé" in February 2026 and proposed "Agentic Engineering" as its successor, with the not-very-subtle thesis that you actually need oversight and review.

So no, this isn't just my vibes about your vibes. The data is there. Now let's talk about what it actually looks like in code.


Pattern 1: The Component That Does Everything And Knows Nothing

The single most common thing I see in vibe-coded React PRs is a 600-line component that someone "iterated on" for 45 minutes by saying things like "now add a search bar" and "make it work with the cart" and "add loading state please."

The result is always the same shape:

```jsx
import { useState, useEffect } from 'react';

export default function ProductsPage() {
  const [products, setProducts] = useState([]);
  const [filtered, setFiltered] = useState([]);
  const [search, setSearch] = useState('');
  const [loading, setLoading] = useState(true);
  const [error, setError] = useState(null);
  const [cart, setCart] = useState([]);
  const [selectedCategory, setSelectedCategory] = useState('all');
  const [sortBy, setSortBy] = useState('name');
  const [page, setPage] = useState(1);
  const [showModal, setShowModal] = useState(false);
  const [selectedProduct, setSelectedProduct] = useState(null);
  const [user, setUser] = useState(null);
  const [favorites, setFavorites] = useState([]);
  const [recentlyViewed, setRecentlyViewed] = useState([]);
  // ... 8 more useStates

  useEffect(() => {
    fetch('/api/products').then(r => r.json()).then(data => {
      setProducts(data);
      setFiltered(data);
      setLoading(false);
    });
  }, []);

  useEffect(() => {
    setFiltered(products.filter(p =>
      p.name.toLowerCase().includes(search.toLowerCase())
    ));
  }, [search, products]);

  useEffect(() => {
    if (user) {
      fetch(`/api/favorites/${user.id}`).then(r => r.json()).then(setFavorites);
    }
  }, [user]);

  // ... 9 more useEffects

  // 400 more lines
}
```

Every state. In one component. Twelve useEffects firing in cascade. No memoization (which is fine in 2026 now that the React Compiler handles it, except memoization isn't the problem here; the data flow is). The cart lives at the same level as the search input. The user fetch happens inside a page that doesn't even need to know about auth. It's all "working" in the sense that it renders without errors. It's also impossible to refactor without breaking five other things, because nobody knows what depends on what.

When I ask the person who shipped it, "why is favorites state in this component?", the answer is always some variation of "the AI added it when I asked for the favorite button." That's not architecture. That's accumulation.
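The `filtered` state is the easiest piece to untangle, because it never needed to be state. A minimal sketch, assuming products shaped like `{ name, category, price }` (the `category` and `price` fields and the `selectVisibleProducts` name are hypothetical): derive the visible list from the raw data and the current inputs during render, instead of copying it into a second `useState` and syncing it with an effect.

```javascript
// Hypothetical pure selector: computes what the page shows from the raw
// products plus the current UI inputs. No state, no effects, trivially testable.
function selectVisibleProducts(products, { search = '', category = 'all', sortBy = 'name' } = {}) {
  const query = search.trim().toLowerCase();

  const matches = products.filter((product) => {
    const matchesSearch = product.name.toLowerCase().includes(query);
    const matchesCategory = category === 'all' || product.category === category;
    return matchesSearch && matchesCategory;
  });

  // filter() already returned a new array, so sorting it here
  // never mutates the caller's `products`.
  return matches.sort((a, b) =>
    sortBy === 'price' ? a.price - b.price : a.name.localeCompare(b.name)
  );
}

// Inside ProductsPage, this one line replaces both the `filtered` state
// and the effect that kept it in sync:
//   const visible = selectVisibleProducts(products, { search, category: selectedCategory, sortBy });
```

If profiling ever shows this is expensive, wrapping the call in `useMemo` (or letting the React Compiler do it) is a local change; the data flow stays the same.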


Pattern 2: Hallucinated APIs That Pass Code Review Because Nobody Checks

This one keeps me up at night. I have personally caught the following in PRs in the last quarter:

  • A call to useFormState from react-dom (it's been useActionState, imported from react, since React 19; a Cursor session pulled in a stale doc)

  • A tailwind.config.ts plugin that doesn't exist on npm at all (the AI made up a package name and the dev installed something similar that did exist, which was a different package by a different author)

  • A Zustand selector pattern from v3 inlined into a v5 store, which silently broke subscription behavior in a way that only showed up under load

  • A Next.js revalidate config used inside a Server Action where it does literally nothing

None of these would survive five minutes of actual reading. They all survived code review because nobody read them. The PR description said "added X feature using Cursor agent" and the reviewer hit Approve because the tests passed. The tests passed because the AI wrote those too.

This is the loop, and it's everywhere: AI writes the code, AI writes the tests, human reviews the diff for "vibes," human merges. Three of those four steps don't involve anyone understanding what's happening, and the one that's supposed to, the review, is theater.
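One cheap countermeasure for the invented-package case is to make every new or re-versioned dependency impossible to skim past. A minimal sketch, assuming your CI can read `package.json` from both the base branch and the PR branch (fetching them is left out, and `dependencyChanges` is a hypothetical name):

```javascript
// Hypothetical review helper: lists every dependency a PR adds or re-versions
// so a human has to look each one up before approving.
const DEPENDENCY_SECTIONS = ['dependencies', 'devDependencies', 'peerDependencies', 'optionalDependencies'];

function dependencyChanges(basePkg, headPkg) {
  const changes = [];
  for (const section of DEPENDENCY_SECTIONS) {
    const before = basePkg[section] ?? {};
    const after = headPkg[section] ?? {};
    for (const [name, range] of Object.entries(after)) {
      // hasOwnProperty avoids false matches on names like "constructor".
      if (!Object.prototype.hasOwnProperty.call(before, name)) {
        changes.push({ section, name, change: 'added', range });
      } else if (before[name] !== range) {
        changes.push({ section, name, change: 'changed', from: before[name], range });
      }
    }
  }
  return changes;
}
```

Posting that list as a PR comment, or failing CI until a reviewer acknowledges it, turns "the AI picked a package" into a decision a person made. For each entry, `npm view <name>` prints the registry metadata (maintainers, versions, and so on), which is usually enough to notice a look-alike from a different author.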


Pattern 3: Security Theater Built On A Hardcoded Foundation

I have genuinely lost count of how many AI-generated files I've found with credentials in them. API keys. Database passwords. JWT secrets. Stripe keys, plural. One memorable PR included a .env.example file with the actual production values copy-pasted in, which the AI helpfully labeled "for reference."

The data backs this up: AI-generated code fails to defend against XSS 86% of the time per multiple studies, and a vibe-coded app earlier this year leaked 1.5 million API keys and 35,000 user emails because the owner shipped without writing a single line manually. There's a documented case of an indie dev who built a SaaS entirely with Cursor, watched users start "bypassing the subscription, creating random shit on db," couldn't debug it because he didn't write it, and shut the product down.
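Dedicated scanners (gitleaks, GitHub secret scanning with push protection) are the real fix here, and they ship far larger rulesets than anything hand-written. The core idea is small, though. A minimal sketch of a pre-merge check over the lines a diff adds; the patterns are an illustrative subset (AWS access key IDs begin with `AKIA`, Stripe live secret keys with `sk_live_`), and `findSecretsInDiff` is a hypothetical name:

```javascript
// Illustrative subset of secret patterns; real scanners ship hundreds.
const SECRET_PATTERNS = [
  { name: 'AWS access key ID', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  // sk_live_ (secret) and rk_live_ (restricted) keys
  { name: 'Stripe live secret or restricted key', regex: /\b[sr]k_live_[0-9a-zA-Z]{10,}\b/ },
  { name: 'Private key block', regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { name: 'Hardcoded secret assignment', regex: /\b(?:password|secret|api_?key|token)\s*[:=]\s*['"][^'"]{8,}['"]/i },
];

// Scans only lines the diff adds ("+" prefix), skipping the "+++" file header.
function findSecretsInDiff(diffText) {
  const findings = [];
  diffText.split('\n').forEach((line, index) => {
    if (!line.startsWith('+') || line.startsWith('+++')) return;
    for (const { name, regex } of SECRET_PATTERNS) {
      // `line` is the 1-based line number within the diff text, not the file.
      if (regex.test(line)) findings.push({ line: index + 1, name });
    }
  });
  return findings;
}
```

Run against `git diff origin/main...HEAD` in CI, a check like this would likely have flagged that "for reference" `.env.example` before it merged.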

But it's not just the catastrophic failures. It's the thousand small things:

  • Auth checks on the client only ("the AI generated a useAuth hook so we're good")

  • SQL queries built with string concatenation because the AI defaulted to the simplest example

  • File uploads with no MIME validation (the prompt was "let users upload an image," not "let users upload an image safely")

  • CORS set to * in production because that's what made local dev work

  • dangerouslySetInnerHTML rendering user content with no sanitization, because the prompt was "render the bio as HTML"

The AI did exactly what it was asked. The human didn't ask for safety. So safety isn't there.
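Two of the bullets above, sketched minimally: checking an upload's actual bytes instead of trusting its declared MIME type, and replacing `*` with an origin allowlist. The function names, the three-format allowlist, and the example origins are hypothetical. (For the bio case, the simplest fix is often no code at all: render it as a plain `{bio}` child, which React escapes, or pass it through a maintained sanitizer such as DOMPurify before it goes anywhere near `dangerouslySetInnerHTML`.)

```javascript
// Leading "magic bytes" for a few image formats. The client-supplied MIME
// type and file extension are both controlled by the uploader, so check the bytes.
const IMAGE_SIGNATURES = [
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] }, // "GIF8"
];

// Returns the detected MIME type, or null if the bytes match no allowed format.
function sniffImageType(data) {
  const match = IMAGE_SIGNATURES.find(({ bytes }) =>
    bytes.every((byte, i) => data[i] === byte)
  );
  return match ? match.mime : null;
}

// Echo the request's Origin back only when it is explicitly allowed,
// instead of answering every origin with `Access-Control-Allow-Origin: *`.
function corsHeaders(requestOrigin, allowedOrigins) {
  // The response differs per Origin, so caches must key on it.
  const headers = { Vary: 'Origin' };
  if (requestOrigin && allowedOrigins.includes(requestOrigin)) {
    headers['Access-Control-Allow-Origin'] = requestOrigin;
  }
  return headers;
}
```

A byte check is a first filter, not a guarantee; size limits, storing uploads outside the web root, and re-encoding images server-side are the usual next layers.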


Pattern 4: The Test Suite That Tests Nothing

This is the one that's most depressing because it actively hides the other problems.

```jsx
import { render, screen, fireEvent } from '@testing-library/react';
import '@testing-library/jest-dom';
import CheckoutForm from './CheckoutForm'; // path assumed for illustration

describe('CheckoutForm', () => {
  it('renders without crashing', () => {
    render(<CheckoutForm />);
  });

  it('has a submit button', () => {
    render(<CheckoutForm />);
    expect(screen.getByRole('button')).toBeInTheDocument();
  });

  it('calls onSubmit when submitted', () => {
    const onSubmit = jest.fn();
    render(<CheckoutForm onSubmit={onSubmit} />);
    fireEvent.click(screen.getByRole('button'));
    expect(onSubmit).toHaveBeenCalled();
  });
});
```

That's the entire test suite for a checkout form that processes payments. It tests that the component renders. It tests that a button exists. It tests that a click handler fires. It does not test:

  • Whether the form actually validates the credit card number

  • Whether it handles a declined payment correctly

  • Whether duplicate submissions are prevented

  • Whether the cart total matches what gets charged

  • Whether tax is calculated correctly for the user's region

  • Whether anything happens to the order after payment succeeds

Coverage report: 87%. Looks fine in CI. Catches absolutely zero real bugs. And because the AI generated it, the dev has zero intuition for what should be tested. The test suite isn't an artifact of thinking about the system. It's an artifact of asking "write tests for this component," which the AI obliged with the most generic possible scaffolding.

The Stack Overflow phrase that haunts me is "AI can 10x developers in creating tech debt." This is exactly that. The illusion of safety, with none of the actual safety.
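For contrast, a minimal sketch of checks that encode business rules rather than markup. Everything in it is hypothetical (integer cents, tax in basis points, a half-up rounding policy, the helper names), and a real checkout would still need declines, regional tax, and fulfillment covered, but each assertion states something the business actually cares about.

```javascript
// Hypothetical checkout helpers. Money stays in integer cents so totals
// never pick up floating-point drift; tax is given in basis points (825 = 8.25%).
function cartTotalCents(items, taxRateBasisPoints) {
  const subtotal = items.reduce((sum, item) => sum + item.unitPriceCents * item.quantity, 0);
  // Assumed policy: round tax to the nearest cent, halves rounding up.
  const tax = Math.round((subtotal * taxRateBasisPoints) / 10000);
  return subtotal + tax;
}

// Wraps a charge function so a double-click can't start a second charge
// while the first is still in flight. A failed charge (e.g. a decline)
// clears the guard so the user can retry.
function preventDuplicateSubmits(charge) {
  let inFlight = null;
  return function submit(amountCents) {
    if (inFlight) return inFlight;
    inFlight = Promise.resolve()
      .then(() => charge(amountCents))
      .finally(() => {
        inFlight = null;
      });
    return inFlight;
  };
}

// The kind of assertions the generated suite never made.
function assertEqual(actual, expected, label) {
  if (actual !== expected) throw new Error(`${label}: expected ${expected}, got ${actual}`);
}

const cart = [
  { unitPriceCents: 1999, quantity: 2 },
  { unitPriceCents: 500, quantity: 1 },
];
// 4498 subtotal + 371 tax (8.25%, rounded) = 4869 charged.
assertEqual(cartTotalCents(cart, 825), 4869, 'charged amount matches cart total');

let charges = 0;
const submit = preventDuplicateSubmits(async (amount) => {
  charges += 1;
  return { status: 'approved', amount };
});
Promise.all([submit(4869), submit(4869)]).then(() => {
  assertEqual(charges, 1, 'double-click charges exactly once');
});
```

Each assertion reads as a sentence about the business ("the customer is charged what the cart says," "a double-click charges once"), which is exactly what "write tests for this component" never produces.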


Pattern 5: Inconsistency As A Way Of Life

This is the subtle one, and it's the one that kills codebases over six months.

You ask the AI to fetch data on Monday and get async/await wrapped in try/catch. You ask for similar fetching on Wednesday and get a .then().catch() chain. Friday you get a custom hook that wraps SWR. Next Monday you get a Server Component with use(). Same person, same project, barely a week apart. Four different patterns for the same problem, all "working," all merged, all now part of the codebase forever.

Multiply that across a team of five people each running their own AI sessions and you get a codebase where:

  • Form submission happens five different ways

  • State is sometimes in Zustand, sometimes in Context, sometimes in URL params, sometimes hoisted into a parent component for no reason

  • File names are sometimes kebab-case, sometimes PascalCase, sometimes camelCase, occasionally snake_case

  • Some components use named exports, some use default, some export both

  • Error handling is sometimes thrown, sometimes returned as a tuple, sometimes silently swallowed

And here's the thing. The AI doesn't have an opinion. It will happily generate whatever pattern you nudge it toward in that particular session. So unless your team has hard, enforced conventions in a CLAUDE.md or .cursorrules, you don't have a codebase. You have an anthology.
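Conventions stick best when they exist as code the AI can be pointed at, not just as prose. A minimal sketch of one team-wide fetching helper, assuming the agreed rule is "fail by throwing; never return tuples, never swallow errors." `fetchJson` and `HttpError` are hypothetical names, and `fetchImpl` defaults to the global `fetch` (built into browsers and Node 18+) only so tests can substitute a fake:

```javascript
// Hypothetical team convention: one way to fetch JSON, one way to fail.
class HttpError extends Error {
  constructor(status, url, body) {
    super(`Request to ${url} failed with status ${status}`);
    this.name = 'HttpError';
    this.status = status;
    this.body = body;
  }
}

async function fetchJson(url, { fetchImpl = globalThis.fetch, ...init } = {}) {
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    // Keep the server's error text for debugging, but always throw.
    const body = await response.text().catch(() => '');
    throw new HttpError(response.status, url, body);
  }
  return response.json();
}
```

With a helper like that in place, the CLAUDE.md or .cursorrules entry can name it directly, and reviewers have one pattern to hold new code against instead of four.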


"But It Works"

This is the response I get every time I bring this up. "The feature shipped. Users are using it. Stop being a gatekeeper."

Cool, here's the thing nobody wants to say out loud. Working is the lowest possible bar for software.

Software has to keep working when:

  • A dependency releases a breaking change

  • A new requirement comes in that needs you to understand the existing code

  • Traffic doubles

  • A user does something the original prompt didn't anticipate

  • Three months pass and the original "developer" has forgotten what they shipped

  • The original "developer" leaves the company

A vibe-coded codebase fails every one of those tests. The original Cursor prompts aren't documented anywhere. The architectural decisions weren't decisions; they were accidents. The "developer" who shipped it can't read their own code. I am being literal: I have asked people to walk me through their PR and watched them open the file and squint.

Forrester's 75% number isn't just a prediction about the future. That technical debt exists right now, in your repo, accumulating interest. The bill comes due the first time something actually breaks at 2 AM and the on-call engineer pulls up a 600-line component with no tests, no comments, and no git blame older than three weeks because the entire feature was vibe coded in one session.


The Annoying Part Is That AI Is Genuinely Good

I want to be very clear about this because the discourse is so binary right now. AI-assisted development, done well, is transformatively productive. I've written about exactly how I do it: Plan Mode, Skills files, CLAUDE.md, code review on every diff, and treating the AI as a fast junior who needs supervision.

That's not vibe coding. That's just using better tools. The distinction Simon Willison drew is the right one: if you've reviewed, tested, and understood every line, you're using an LLM as a typing assistant. That's fine. That's good, even.

Vibe coding is the other thing. It's specifically the practice of not understanding what got generated. It's the founder who can't explain their own auth flow. It's the PM who shipped a feature without ever opening the file. It's the senior dev who approved the PR because "the tests passed and Cursor is usually right."

The reason this matters is that the industry is currently conflating these two things. Companies are hiring "vibe coders" and writing job descriptions that brag about not needing engineers. YC startups are showing off codebases that are 95% AI-generated like it's a feature. And the post-mortems are starting to roll in, and they all sound the same: "we shipped fast, then we couldn't fix anything, then we had to rewrite it from scratch."


What Actually Helps

If you want to use AI tools and not produce a dumpster fire, here's what works. None of this is novel. All of it requires you to care.

Write a CLAUDE.md or .cursorrules file before you write a feature. Tech stack, conventions, mistakes to avoid. Update it every time you correct the AI. This single file is the difference between consistent output and an anthology.

Use Plan Mode. Always. Architecture decisions deserve a reviewable artifact. Letting the AI start typing immediately is how you end up with a 600-line page component that holds the entire app state.

Review every line. Not the diff summary. Not the test output. The actual code. If you can't explain what a function does, you don't ship it. This is non-negotiable, and it's the single line that separates "AI-assisted developer" from "vibe coder."

Write your own tests, or at least your own test plan. Tell the AI what to test, don't ask it to figure out what to test. The AI does not know what's important about your business logic. You do.

Refactor as you go. When the AI piles state into one component, extract it. When it duplicates a pattern, abstract it. When it picks a pattern that conflicts with the rest of the codebase, make it match. This is the actual work. The AI typed faster; you still have to think.

Treat the AI like a contractor, not a colleague. It will confidently do exactly what you said. It will not push back on bad ideas. It will not ask "are you sure about this?" before adding dangerouslySetInnerHTML. That judgment is your job.


The Part Where I Get Yelled At In The Comments

I know this whole post reads as "old man yells at AI." It isn't. I'm 100% on board with the productivity gains. My own workflow is heavily AI-assisted and I'm not going back.

What I'm not on board with is the abdication of judgment that "vibe coding" specifically describes. The whole point of the term is that you stop caring about the code. And it turns out, when nobody cares about the code, the code is bad. This is not a complicated insight. We've known it since long before LLMs. Every "no-code" platform, every "low-code" platform, every offshore-it-and-forget-it strategy has hit the exact same wall, just slower.

The wall is maintenance. The wall is correctness. The wall is the thing that breaks at 2 AM and nobody can fix because nobody understood it in the first place.

If you're using AI tools well, none of this rant applies to you. Carry on, you're doing it right.

If you're vibe coding, please at least read your own PRs before merging them. Your future self, your future teammates, and the people who get paged when your auth flow breaks will thank you.


If this hit a nerve, drop a horror story in the comments. I'm collecting them. The worst one I've heard so far involves a startup that lost their entire production database because an AI agent ignored a code freeze directive. You probably can do better.