AI Vyuh Code QA

Why 53% of AI-Generated Code Ships with Vulnerabilities

AI-generated code carries 2.74x more XSS vulnerabilities, and 86% fails XSS defense. What the research says about AI code security scan findings.

AI Vyuh Engineering

The vibe coding era has a security problem, and we now have enough data to quantify it.

Across eight major studies from 2024-2026 — spanning Georgetown University, Veracode, Stanford, Georgia Tech, Apiiro, CodeRabbit, Escape.tech, and Backslash Security — a consistent picture emerges: AI-generated code ships faster, but it ships with more vulnerabilities than human-written code. Not marginally more. Measurably, repeatedly, and dangerously more.

This isn’t a hit piece on AI coding tools. They’re transforming software development — 46% of GitHub code is now AI-generated, and the vibe coding market hit $4.7 billion in 2026. But shipping fast without scanning is like driving fast without brakes. Here’s what the research actually says.


The Headline Numbers

Before diving into individual studies, here’s the consolidated picture:

| Metric | Finding | Source |
| --- | --- | --- |
| AI code with critical security flaws | 45% | Veracode, 2025 |
| AI code failing XSS defense | 86% | Georgetown CSET, 2024 |
| AI code vulnerable to log injection | 88% | Georgetown CSET, 2024 |
| AI code failing SQL injection defense | 47% | Georgetown CSET, 2024 |
| XSS vulnerability rate vs. human code | 2.74x higher | Veracode, 2025 |
| Overall issues vs. human code | 1.7x higher | CodeRabbit, 2025 |
| Security issues at Fortune 50 enterprises | 10x increase | Apiiro, 2025 |
| Confirmed CVEs from AI coding tools | 74 (est. 400-700 total) | Georgia Tech, 2026 |
| Vulnerabilities in vibe-coded apps | 2,000+ across 5,600 apps | Escape.tech, 2025-2026 |
| Functionally correct AND secure AI code | Only 10.5% | arXiv, 2025 |

Every one of these studies used different methodologies, different sample sizes, and different models. They all point in the same direction.


Study 1: Georgetown CSET — The XSS Failure

Georgetown University’s Center for Security and Emerging Technology published one of the most rigorous assessments of AI code security in November 2024.

Methodology: Tested 5 LLMs (including GPT-3.5-turbo and GPT-4) using 67 prompts from the LLMSecEval dataset, designed to target the MITRE Top 25 CWE list. Generated C code was analyzed using the ESBMC formal verification tool — not pattern matching, but mathematical proof of vulnerability.

Key findings:

  • 86% of generated code failed XSS defense (CWE-80)
  • 88% was vulnerable to log injection
  • 47% contained SQL injection flaws
  • All five models — including GPT-4 — produced “similar and severe bugs”

The critical insight: these aren’t edge cases surfaced by adversarial prompts. They’re the default output when you ask an LLM to write security-relevant code. The models consistently generate code that works but isn’t safe.
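To make that failure mode concrete, here is a minimal sketch of the injection pattern in Python rather than the C that Georgetown formally verified; the SQL injection case (CWE-89) stands in for the study's CWEs, and the table, function names, and payload are all illustrative, not taken from the study.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Typical LLM default: string interpolation builds the SQL, so input
    # like "x' OR '1'='1" rewrites the query's meaning (CWE-89).
    cur = conn.execute(f"SELECT id FROM users WHERE name = '{username}'")
    return cur.fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the input as data, never as SQL.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks, injection succeeded
print(len(find_user_safe(conn, payload)))    # 0: the payload matched nothing literally
```

Both functions "work" on honest input, which is exactly why the unsafe version survives a functional test.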


Study 2: Veracode — 2.74x More XSS, Across 100+ Models

Veracode’s 2025 GenAI Code Security Report is the largest systematic study of AI code vulnerability rates to date.

Methodology: 100+ LLMs tested across 80 coding tasks in 4 languages (Java, JavaScript, Python, C#), measuring 4 CWE types.

Key findings:

| Vulnerability Type | AI vs. Human Rate |
| --- | --- |
| Cross-site scripting (XSS) | 2.74x more likely |
| Insecure object references | 1.91x more likely |
| Improper password handling | 1.88x more likely |
| Insecure deserialization | 1.82x more likely |

Language matters: Java had a 72% security failure rate. Python, C#, and JavaScript ranged from 38-45%.

The 2026 update showed limited improvement: security pass rates remain stagnant at 55% industry-wide. Even the best-performing models (GPT-5 and GPT-5 Mini) only reached 70-72%. Most models from Anthropic, Google, and others hover at 50-59%.

As Veracode concluded: “Current LLMs simply aren’t architected to maintain the kind of persistent state and inter-statement reasoning required for robust dataflow analysis.” The models don’t understand the security implications of the code they generate — they pattern-match syntax, not safety.
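A sketch of what "inter-statement reasoning" means in practice: the tainted value passes through an intermediate variable before reaching an HTML sink, so catching the XSS requires tracking dataflow across statements rather than matching a single bad line. The function names and payload here are invented for illustration.

```python
import html

def render_comment_unsafe(comment: str) -> str:
    # The tainted input flows through an intermediate variable before
    # reaching the HTML sink; spotting this requires cross-statement
    # dataflow tracking, which is what the report says LLMs lack.
    trimmed = comment.strip()
    return f"<p>{trimmed}</p>"               # CWE-80: raw interpolation into HTML

def render_comment_safe(comment: str) -> str:
    trimmed = comment.strip()
    return f"<p>{html.escape(trimmed)}</p>"  # entity-encode at the sink

payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))  # script tag survives into the page
print(render_comment_safe(payload))    # &lt;script&gt;... rendered inert
```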


Study 3: The 5,600-App Study — Vulnerabilities in Production

Escape.tech conducted the largest audit of live vibe-coded applications — apps built with Lovable, Bolt.new, and Base44 that are running in production, used by real people.

Methodology: Scanned 5,600+ publicly accessible vibe-coded applications using a layered discovery approach to identify all exposed assets — hosts, web apps, APIs, schemas.

Key findings:

| Finding Type | Count |
| --- | --- |
| Total vulnerabilities | 2,000+ |
| Exposed secrets (API keys, tokens, credentials) | 400+ |
| Exposed PII (including medical records and bank account numbers) | 175 |

Every vulnerability was in production. Not in a lab. Not in a CTF challenge. In apps processing real user data.

The pattern is consistent: AI-generated code ships with hardcoded API keys, missing authentication on API endpoints, exposed database connection strings, and no input validation. These aren’t sophisticated vulnerabilities — they’re security basics that the AI never learned to implement by default.
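The secrets finding is the easiest of these to fix. A minimal sketch of the environment-variable alternative to a hardcoded key; the `PAYMENTS_API_KEY` name and the placeholder value are hypothetical.

```python
import os

# Anti-pattern the scans keep finding: a live credential committed to source.
# API_KEY = "sk-live-abc123..."   # hardcoded secret (illustrative, not real)

def get_api_key() -> str:
    # Read the secret from the environment (or a secrets manager) and fail
    # loudly when it is missing, instead of shipping a baked-in value.
    key = os.environ.get("PAYMENTS_API_KEY")  # variable name is illustrative
    if not key:
        raise RuntimeError("PAYMENTS_API_KEY is not set")
    return key

os.environ["PAYMENTS_API_KEY"] = "example-value"  # stand-in for a real secret store
print(get_api_key())
```

The point is the failure mode: a missing secret becomes a loud runtime error, not a credential sitting in git history.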


Study 4: Georgia Tech — The CVE Radar

Georgia Tech’s Vibe Security Radar, launched May 2025, tracks CVEs directly attributable to AI coding tools by tracing vulnerability-fixing commits backward to identify who (or what) introduced the bug.

Key findings as of March 2026:

  • 74 confirmed CVEs directly attributed to AI coding tools
  • Monthly growth: 6 → 15 → 35 (January → February → March) — more than doubling each month
  • Estimated actual total: 400-700 AI-introduced CVEs across the open-source ecosystem (most AI tools don’t leave detectable metadata signatures)

The most alarming trend isn’t the current count — it’s the growth rate. If the doubling trend continues, AI-generated CVEs will become a dominant category in vulnerability databases by late 2026.


Study 5: Apiiro — Enterprise-Scale Impact

Apiiro’s research examined AI-assisted development at Fortune 50 enterprises — not startups experimenting with vibe coding, but the largest companies in the world deploying AI coding tools at scale.

Key findings:

  • AI-assisted developers produced 3-4x more code but 10x more security issues
  • 322% more privilege escalation paths
  • 153% more design flaws
  • 40% jump in secrets exposure
  • By June 2025: 10,000+ new security findings per month (10x spike in 6 months)

The paradox: simple syntax mistakes fell 76%, and logic bugs dropped 60%. AI coding tools are genuinely good at eliminating routine errors. But they simultaneously introduced architectural security flaws — the kind that are expensive to find and expensive to fix.


Study 6: CodeRabbit — AI vs. Human PRs

CodeRabbit’s December 2025 analysis compared 470 GitHub pull requests: 320 AI-co-authored versus 150 human-only, using Poisson rate ratios with 95% confidence intervals.

Key findings:

| Issue Category | AI vs. Human Rate |
| --- | --- |
| Overall issues | 1.7x more |
| Logic/correctness errors | 1.75x more |
| Readability issues | 3x+ more |
| Error handling gaps | ~2x more |
| Security findings | up to 2.74x more |

The readability finding is underappreciated. AI-generated code looks clean — consistent formatting, comprehensive comments. But the comments restate the code instead of explaining intent, the error handling blocks are copy-pasted verbatim, and the logic errors are hidden behind a veneer of polish. This creates a false confidence effect.
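A contrived example of the comment pattern reviewers describe: the first comment restates the line below it, while the second records the intent a reader actually needs. The discount function and policy are invented for illustration.

```python
def apply_discount(price: float, tier: str) -> float:
    # Restating comment (typical AI output):
    #   "multiply price by 0.9 if tier is gold"
    # Intent comment (what a reviewer needs):
    #   gold members get 10% off under the loyalty policy; all other
    #   tiers pay full price, including unknown tier strings.
    if tier == "gold":
        return price * 0.9
    return price

print(apply_discount(100.0, "gold"))
print(apply_discount(100.0, "basic"))
```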


The False Confidence Problem

Stanford’s foundational 2022 study (Perry, Srivastava, Kumar, Boneh) identified the most dangerous meta-risk: developers using AI assistants write less secure code AND believe it’s more secure.

In their study of 47 participants, those with AI assistant access:

  • Produced code with more security vulnerabilities
  • Were more likely to rate their code as secure compared to participants without AI help

This false confidence compounds every vulnerability statistic above. When developers trust AI-generated code more than their own, they review it less. The result: vulnerable code ships faster with less scrutiny.

Backslash Security’s April 2025 study confirmed the mechanism: with “naive” prompts (the way most developers actually use AI tools), all seven tested LLMs generated code vulnerable to at least 4 of 10 common CWEs. Security-focused prompting improved results significantly — Claude 3.7 Sonnet went from 6/10 to 10/10 secure outputs — but most developers don’t prompt for security explicitly.


AI-Generated vs. Human-Written Code: The Comparison Table

| Dimension | Human-Written | AI-Generated | Source |
| --- | --- | --- | --- |
| XSS vulnerability rate | Baseline | 2.74x higher | Veracode |
| SQL injection rate | Baseline | 47% fail rate | Georgetown CSET |
| Issues per PR | Baseline | 1.7x higher | CodeRabbit |
| Security issues (enterprise) | Baseline | 10x higher | Apiiro |
| Privilege escalation paths | Baseline | 322% more | Apiiro |
| Secrets exposure | Baseline | 40% more | Apiiro |
| Logic errors | Baseline | 1.75x more | CodeRabbit |
| Error handling gaps | Baseline | 2x more | CodeRabbit |
| Developer confidence in security | Accurate self-assessment | Overconfident | Stanford |
| Syntax errors | Baseline | 76% fewer | Apiiro |
| Simple logic bugs | Baseline | 60% fewer | Apiiro |

The pattern is clear: AI excels at surface-level code quality (syntax, formatting, simple bugs) while introducing deeper architectural and security issues. It trades routine errors for critical vulnerabilities.


The Technical Debt Accelerator

A 2026 arXiv study analyzed 6,275 public GitHub repositories containing 304,362 verified AI-authored commits. The finding: unresolved technical debt climbed from a few hundred issues in early 2025 to 110,000+ surviving issues by February 2026.

AI-generated code doesn’t just ship with vulnerabilities — it accumulates technical debt at an accelerated rate. The same pattern-matching that produces working code fast also produces:

  • Duplicated logic instead of abstractions
  • Over-engineered error handling that catches everything and handles nothing
  • Dependency sprawl — AI suggests popular packages, not necessarily maintained or secure ones
  • Test suites that test the mocks — green tests that would pass even if the underlying code broke
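The "tests that test the mocks" item deserves a sketch, since it is the hardest to spot in review. Using Python's `unittest.mock` with a hypothetical `client.fetch` API: the first test only verifies the mock's configuration, while the second exercises behavior the code under test actually owns.

```python
from unittest.mock import Mock

def get_username(client, user_id):
    # Hypothetical code under test: fetch a user record, return its name.
    return client.fetch(user_id)["name"]

def test_that_tests_the_mock():
    # Anti-pattern: the mock supplies the answer, so the assertion checks
    # the mock's setup, not get_username's logic. It stays green even if
    # get_username read the wrong field or mishandled real responses.
    client = Mock()
    client.fetch.return_value = {"name": "alice"}
    assert get_username(client, 1) == "alice"

def test_that_exercises_real_logic():
    # Better: feed a realistic failure shape and assert on behavior the
    # function owns, e.g. that a missing field raises instead of passing
    # garbage downstream.
    client = Mock()
    client.fetch.return_value = {}  # record with no "name" key
    try:
        get_username(client, 1)
        assert False, "expected KeyError for missing 'name'"
    except KeyError:
        pass

test_that_tests_the_mock()
test_that_exercises_real_logic()
print("both tests pass")
```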

This debt compounds. Each AI-generated module that ships without review becomes a liability that’s harder to audit, harder to patch, and harder to replace.


What This Means for Your Codebase

If you’re shipping AI-generated code — and statistically, you probably are — the question isn’t whether your codebase has vulnerabilities. It’s whether you’ve found them.

The research points to three non-negotiable practices:

1. Scan everything. Every AI-generated PR needs automated security scanning before merge. Not just linting — static analysis for the OWASP Top 10 vulnerability classes. The AppSec Santa study found 78% of confirmed vulnerabilities were caught by only one scanning tool, which means you need multiple scanners, not just one.

2. Prompt for security explicitly. Backslash Security showed that security-focused prompting dramatically improves AI code quality. Include security requirements in your prompts: “implement input validation for all user inputs,” “never hardcode API keys,” “use parameterized queries for all database access.”
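One concrete shape the "input validation" instruction should produce is allowlist validation: accept only known-good input patterns and reject everything else, rather than trying to strip characters that look dangerous. The username rule below is an invented example.

```python
import re

# Allowlist: letters, digits, underscore, 3-32 characters (illustrative rule).
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,32}$")

def validate_username(raw: str) -> str:
    # Reject anything outside the known-good shape; never try to "clean"
    # bad input, because sanitizers miss encodings that allowlists don't.
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError(f"invalid username: {raw!r}")
    return raw

print(validate_username("alice_42"))
# validate_username("x' OR '1'='1")  # would raise ValueError
```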

3. Review AI code more critically, not less. The Stanford false confidence effect is real. AI-generated code’s polished appearance makes it more dangerous to skip review, not less. The vulnerabilities are architectural, not syntactic — they won’t show up in a quick skim.


Stop Shipping Blind

The 53% statistic isn’t a scare tactic — it’s a measurement. And the measurement says your AI-generated code probably has vulnerabilities that no one has looked for.

We built AI Vyuh Code QA specifically for this problem: automated scanning of AI-generated codebases by specialized agents that understand the vulnerability patterns LLMs introduce. Five agents, five scan categories (security, dependencies, architecture, test coverage, code smells), results in under a minute.

The vibe coding revolution doesn’t have to be a security crisis. But it requires treating AI-generated code as what it is: fast first drafts that need expert review before they ship.


Further Reading

The AI Vyuh blog covers the broader pattern behind these findings: the AI-generated code quality crisis no one is talking about, and why vibe coding security risks are compounding as AI-assisted development accelerates.


Sources

  • Georgetown CSET, “Cybersecurity Risks of AI-Generated Code,” November 2024
  • Veracode, “2025 GenAI Code Security Report,” 2025
  • Veracode, “Spring 2026 GenAI Code Security Update,” 2026
  • Escape.tech, “The State of Security of Vibe Coded Apps,” 2025-2026
  • Georgia Tech SSLab, “Vibe Security Radar,” May 2025-present
  • Apiiro, “4x Velocity, 10x Vulnerabilities,” 2025
  • CodeRabbit, “State of AI vs Human Code Generation,” December 2025
  • Stanford University (Perry et al.), “Do Users Write More Insecure Code with AI Assistants?,” 2022
  • Backslash Security, LLM Code Security Study, April 2025
  • arXiv, Large-Scale GitHub AI Code Analysis, October 2025
  • arXiv, AI-Authored Technical Debt Study, 2026