AI Vyuh Code QA

Why 53% of AI-Generated Code Ships with Vulnerabilities

AI-generated code carries 2.74x more XSS vulnerabilities, and 86% fails XSS defense. What the research says about AI code security scan findings.

AI Vyuh Engineering

The vibe coding era has a security problem, and we now have enough data to quantify it.

Across eight major studies from 2024-2026 — spanning Georgetown University, Veracode, Stanford, Georgia Tech, Apiiro, CodeRabbit, Escape.tech, and Backslash Security — a consistent picture emerges: AI-generated code ships faster, but it ships with more vulnerabilities than human-written code. Not marginally more. Measurably, repeatedly, and dangerously more.

This isn’t a hit piece on AI coding tools. They’re transforming software development — 46% of GitHub code is now AI-generated, and the vibe coding market hit $4.7 billion in 2026. But shipping fast without scanning is like driving fast without brakes. Here’s what the research actually says.


The Headline Numbers

Before diving into individual studies, here’s the consolidated picture:

| Metric | Finding | Source |
| --- | --- | --- |
| AI code with critical security flaws | 45% | Veracode, 2025 |
| AI code failing XSS defense | 86% | Georgetown CSET, 2024 |
| AI code vulnerable to log injection | 88% | Georgetown CSET, 2024 |
| AI code failing SQL injection defense | 47% | Georgetown CSET, 2024 |
| XSS vulnerability rate vs. human code | 2.74x higher | Veracode, 2025 |
| Overall issues vs. human code | 1.7x higher | CodeRabbit, 2025 |
| Security issues at Fortune 50 enterprises | 10x increase | Apiiro, 2025 |
| Confirmed CVEs from AI coding tools | 74 (est. 400-700 total) | Georgia Tech, 2026 |
| Vulnerabilities in vibe-coded apps | 2,000+ across 5,600 apps | Escape.tech, 2025-2026 |
| Functionally correct AND secure AI code | Only 10.5% | arXiv, 2025 |

Every one of these studies used different methodologies, different sample sizes, and different models. They all point in the same direction.


Study 1: Georgetown CSET — The XSS Failure

Georgetown University’s Center for Security and Emerging Technology published one of the most rigorous assessments of AI code security in November 2024.

Methodology: Tested 5 LLMs (including GPT-3.5-turbo and GPT-4) using 67 prompts from the LLMSecEval dataset, designed to target the MITRE Top 25 CWE list. Generated C code was analyzed using the ESBMC formal verification tool — not pattern matching, but mathematical proof of vulnerability.

Key findings:

  • 86% of generated code failed XSS defense (CWE-80)
  • 88% was vulnerable to log injection
  • 47% contained SQL injection flaws
  • All five models — including GPT-4 — produced “similar and severe bugs”

The critical insight: these aren’t edge cases surfaced by adversarial prompts. They’re the default output when you ask an LLM to write security-relevant code. The models consistently generate code that works but isn’t safe.
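To make that failure mode concrete, here is a minimal sketch of the injection pattern in Python rather than the C that Georgetown formally verified; the SQL injection case (CWE-89) stands in for the study's CWEs, and the table, function names, and payload are all illustrative, not taken from the study.

```python
import sqlite3

def find_user_unsafe(conn, username):
    # Typical LLM default: string interpolation builds the SQL, so input
    # like "x' OR '1'='1" rewrites the query's meaning (CWE-89).
    cur = conn.execute(f"SELECT id FROM users WHERE name = '{username}'")
    return cur.fetchall()

def find_user_safe(conn, username):
    # Parameterized query: the driver treats the input as data, never as SQL.
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (username,))
    return cur.fetchall()

# Demo with an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")

payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks, injection succeeded
print(len(find_user_safe(conn, payload)))    # 0: the payload matched nothing literally
```

Both functions "work" on honest input, which is exactly why the unsafe version survives a functional test.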


Study 2: Veracode — 2.74x More XSS, Across 100+ Models

Veracode’s 2025 GenAI Code Security Report is the largest systematic study of AI code vulnerability rates to date.

Methodology: 100+ LLMs tested across 80 coding tasks in 4 languages (Java, JavaScript, Python, C#), measuring 4 CWE types.

Key findings:

| Vulnerability Type | AI vs. Human Rate |
| --- | --- |
| Cross-site scripting (XSS) | 2.74x more likely |
| Insecure object references | 1.91x more likely |
| Improper password handling | 1.88x more likely |
| Insecure deserialization | 1.82x more likely |

Language matters: Java had a 72% security failure rate. Python, C#, and JavaScript ranged from 38-45%.

The 2026 update showed limited improvement: security pass rates remain stagnant at 55% industry-wide. Even the best-performing models (GPT-5 and GPT-5 Mini) only reached 70-72%. Most models from Anthropic, Google, and others hover at 50-59%.

As Veracode concluded: “Current LLMs simply aren’t architected to maintain the kind of persistent state and inter-statement reasoning required for robust dataflow analysis.” The models don’t understand the security implications of the code they generate — they pattern-match syntax, not safety.
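A sketch of what "inter-statement reasoning" means in practice: the tainted value passes through an intermediate variable before reaching an HTML sink, so catching the XSS requires tracking dataflow across statements rather than matching a single bad line. The function names and payload here are invented for illustration.

```python
import html

def render_comment_unsafe(comment: str) -> str:
    # The tainted input flows through an intermediate variable before
    # reaching the HTML sink; spotting this requires cross-statement
    # dataflow tracking, which is what the report says LLMs lack.
    trimmed = comment.strip()
    return f"<p>{trimmed}</p>"               # CWE-80: raw interpolation into HTML

def render_comment_safe(comment: str) -> str:
    trimmed = comment.strip()
    return f"<p>{html.escape(trimmed)}</p>"  # entity-encode at the sink

payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))  # script tag survives into the page
print(render_comment_safe(payload))    # &lt;script&gt;... rendered inert
```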


Study 3: The 5,600-App Study — Vulnerabilities in Production

Escape.tech conducted the largest audit of live vibe-coded applications — apps built with Lovable, Bolt.new, and Base44 that are running in production, used by real people.

Methodology: Scanned 5,600+ publicly accessible vibe-coded applications using a layered discovery approach to identify all exposed assets — hosts, web apps, APIs, schemas.

Key findings:

| Finding Type | Count |
| --- | --- |
| Total vulnerabilities | 2,000+ |
| Exposed secrets (API keys, tokens, credentials) | 400+ |
| Exposed PII (including medical records and bank account numbers) | 175 |

Every vulnerability was in production. Not in a lab. Not in a CTF challenge. In apps processing real user data.

The pattern is consistent: AI-generated code ships with hardcoded API keys, missing authentication on API endpoints, exposed database connection strings, and no input validation. These aren’t sophisticated vulnerabilities — they’re security basics that the AI never learned to implement by default.
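The secrets finding is the easiest of these to fix. A minimal sketch of the environment-variable alternative to a hardcoded key; the `PAYMENTS_API_KEY` name and the placeholder value are hypothetical.

```python
import os

# Anti-pattern the scans keep finding: a live credential committed to source.
# API_KEY = "sk-live-abc123..."   # hardcoded secret (illustrative, not real)

def get_api_key() -> str:
    # Read the secret from the environment (or a secrets manager) and fail
    # loudly when it is missing, instead of shipping a baked-in value.
    key = os.environ.get("PAYMENTS_API_KEY")  # variable name is illustrative
    if not key:
        raise RuntimeError("PAYMENTS_API_KEY is not set")
    return key

os.environ["PAYMENTS_API_KEY"] = "example-value"  # stand-in for a real secret store
print(get_api_key())
```

The point is the failure mode: a missing secret becomes a loud runtime error, not a credential sitting in git history.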


Study 4: Georgia Tech — The CVE Radar

Georgia Tech’s Vibe Security Radar, launched May 2025, tracks CVEs directly attributable to AI coding tools by tracing vulnerability-fixing commits backward to identify who (or what) introduced the bug.

Key findings as of March 2026:

  • 74 confirmed CVEs directly attributed to AI coding tools
  • Monthly growth: 6 → 15 → 35 (January → February → March) — more than doubling each month
  • Estimated actual total: 400-700 AI-introduced CVEs across the open-source ecosystem (most AI tools don’t leave detectable metadata signatures)

The most alarming trend isn’t the current count — it’s the growth rate. If the doubling trend continues, AI-generated CVEs will become a dominant category in vulnerability databases by late 2026.


Study 5: Apiiro — Enterprise-Scale Impact

Apiiro’s research examined AI-assisted development at Fortune 50 enterprises — not startups experimenting with vibe coding, but the largest companies in the world deploying AI coding tools at scale.

Key findings:

  • AI-assisted developers produced 3-4x more code but 10x more security issues
  • 322% more privilege escalation paths
  • 153% more design flaws
  • 40% jump in secrets exposure
  • By June 2025: 10,000+ new security findings per month (10x spike in 6 months)

The paradox: simple syntax mistakes fell 76%, and logic bugs dropped 60%. AI coding tools are genuinely good at eliminating routine errors. But they simultaneously introduced architectural security flaws — the kind that are expensive to find and expensive to fix.


Study 6: CodeRabbit — AI vs. Human PRs

CodeRabbit’s December 2025 analysis compared 470 GitHub pull requests: 320 AI-co-authored versus 150 human-only, using Poisson rate ratios with 95% confidence intervals.

Key findings:

| Issue Category | AI vs. Human Rate |
| --- | --- |
| Overall issues | 1.7x more |
| Logic/correctness errors | 1.75x more |
| Readability issues | 3x+ more |
| Error handling gaps | ~2x more |
| Security findings | up to 2.74x more |

The readability finding is underappreciated. AI-generated code looks clean — consistent formatting, comprehensive comments. But the comments restate the code instead of explaining intent, the error handling blocks are copy-pasted verbatim, and the logic errors are hidden behind a veneer of polish. This creates a false confidence effect.
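A contrived example of the comment pattern reviewers describe: the first comment restates the line below it, while the second records the intent a reader actually needs. The discount function and policy are invented for illustration.

```python
def apply_discount(price: float, tier: str) -> float:
    # Restating comment (typical AI output):
    #   "multiply price by 0.9 if tier is gold"
    # Intent comment (what a reviewer needs):
    #   gold members get 10% off under the loyalty policy; all other
    #   tiers pay full price, including unknown tier strings.
    if tier == "gold":
        return price * 0.9
    return price

print(apply_discount(100.0, "gold"))
print(apply_discount(100.0, "basic"))
```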


The False Confidence Problem

Stanford’s foundational 2022 study (Perry, Srivastava, Kumar, Boneh) identified the most dangerous meta-risk: developers using AI assistants write less secure code AND believe it’s more secure.

In their study of 47 participants, those with AI assistant access:

  • Produced code with more security vulnerabilities
  • Were more likely to rate their code as secure compared to participants without AI help

This false confidence compounds every vulnerability statistic above. When developers trust AI-generated code more than their own, they review it less. The result: vulnerable code ships faster with less scrutiny.

Backslash Security’s April 2025 study confirmed the mechanism: with “naive” prompts (the way most developers actually use AI tools), all seven tested LLMs generated code vulnerable to at least 4 of 10 common CWEs. Security-focused prompting improved results significantly — Claude 3.7 Sonnet went from 6/10 to 10/10 secure outputs — but most developers don’t prompt for security explicitly.


AI-Generated vs. Human-Written Code: The Comparison Table

| Dimension | Human-Written | AI-Generated | Source |
| --- | --- | --- | --- |
| XSS vulnerability rate | Baseline | 2.74x higher | Veracode |
| SQL injection rate | Baseline | 47% fail rate | Georgetown CSET |
| Issues per PR | Baseline | 1.7x higher | CodeRabbit |
| Security issues (enterprise) | Baseline | 10x higher | Apiiro |
| Privilege escalation paths | Baseline | 322% more | Apiiro |
| Secrets exposure | Baseline | 40% more | Apiiro |
| Logic errors | Baseline | 1.75x more | CodeRabbit |
| Error handling gaps | Baseline | 2x more | CodeRabbit |
| Developer confidence in security | Accurate self-assessment | Overconfident | Stanford |
| Syntax errors | Baseline | 76% fewer | Apiiro |
| Simple logic bugs | Baseline | 60% fewer | Apiiro |

The pattern is clear: AI excels at surface-level code quality (syntax, formatting, simple bugs) while introducing deeper architectural and security issues. It trades routine errors for critical vulnerabilities.


The Technical Debt Accelerator

A 2026 arXiv study analyzed 6,275 public GitHub repositories containing 304,362 verified AI-authored commits. The finding: unresolved technical debt climbed from a few hundred issues in early 2025 to 110,000+ surviving issues by February 2026.

AI-generated code doesn’t just ship with vulnerabilities — it accumulates technical debt at an accelerated rate. The same pattern-matching that produces working code fast also produces:

  • Duplicated logic instead of abstractions
  • Over-engineered error handling that catches everything and handles nothing
  • Dependency sprawl — AI suggests popular packages, not necessarily maintained or secure ones
  • Test suites that test the mocks — green tests that would pass even if the underlying code broke
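The "tests that test the mocks" item deserves a sketch, since it is the hardest to spot in review. Using Python's `unittest.mock` with a hypothetical `client.fetch` API: the first test only verifies the mock's configuration, while the second exercises behavior the code under test actually owns.

```python
from unittest.mock import Mock

def get_username(client, user_id):
    # Hypothetical code under test: fetch a user record, return its name.
    return client.fetch(user_id)["name"]

def test_that_tests_the_mock():
    # Anti-pattern: the mock supplies the answer, so the assertion checks
    # the mock's setup, not get_username's logic. It stays green even if
    # get_username read the wrong field or mishandled real responses.
    client = Mock()
    client.fetch.return_value = {"name": "alice"}
    assert get_username(client, 1) == "alice"

def test_that_exercises_real_logic():
    # Better: feed a realistic failure shape and assert on behavior the
    # function owns, e.g. that a missing field raises instead of passing
    # garbage downstream.
    client = Mock()
    client.fetch.return_value = {}  # record with no "name" key
    try:
        get_username(client, 1)
        assert False, "expected KeyError for missing 'name'"
    except KeyError:
        pass

test_that_tests_the_mock()
test_that_exercises_real_logic()
print("both tests pass")
```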

This debt compounds. Each AI-generated module that ships without review becomes a liability that’s harder to audit, harder to patch, and harder to replace.


What This Means for Your Codebase

If you’re shipping AI-generated code — and statistically, you probably are — the question isn’t whether your codebase has vulnerabilities. It’s whether you’ve found them.

The research points to three non-negotiable practices:

1. Scan everything. Every AI-generated PR needs automated security scanning before merge. Not just linting — static analysis for the OWASP Top 10 vulnerability classes. The AppSec Santa study found 78% of confirmed vulnerabilities were caught by only one scanning tool, which means you need multiple scanners, not just one.

2. Prompt for security explicitly. Backslash Security showed that security-focused prompting dramatically improves AI code quality. Include security requirements in your prompts: “implement input validation for all user inputs,” “never hardcode API keys,” “use parameterized queries for all database access.”
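One concrete shape the "input validation" instruction should produce is allowlist validation: accept only known-good input patterns and reject everything else, rather than trying to strip characters that look dangerous. The username rule below is an invented example.

```python
import re

# Allowlist: letters, digits, underscore, 3-32 characters (illustrative rule).
USERNAME_RE = re.compile(r"^[A-Za-z0-9_]{3,32}$")

def validate_username(raw: str) -> str:
    # Reject anything outside the known-good shape; never try to "clean"
    # bad input, because sanitizers miss encodings that allowlists don't.
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError(f"invalid username: {raw!r}")
    return raw

print(validate_username("alice_42"))
# validate_username("x' OR '1'='1")  # would raise ValueError
```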

3. Review AI code more critically, not less. The Stanford false confidence effect is real. AI-generated code’s polished appearance makes it more dangerous to skip review, not less. The vulnerabilities are architectural, not syntactic — they won’t show up in a quick skim.


Stop Shipping Blind

The 53% statistic isn’t a scare tactic — it’s a measurement. And the measurement says your AI-generated code probably has vulnerabilities that no one has looked for.

We built AI Vyuh Code QA specifically for this problem: automated scanning of AI-generated codebases by specialized agents that understand the vulnerability patterns LLMs introduce. Five agents, five scan categories (security, dependencies, architecture, test coverage, code smells), results in under a minute.

The vibe coding revolution doesn’t have to be a security crisis. But it requires treating AI-generated code as what it is: fast first drafts that need expert review before they ship.


Further Reading

The AI Vyuh blog covers the broader pattern behind these findings: the AI-generated code quality crisis no one is talking about, and why vibe coding security risks are compounding as AI-assisted development accelerates.


Sources

  • Georgetown CSET, “Cybersecurity Risks of AI-Generated Code,” November 2024
  • Veracode, “2025 GenAI Code Security Report,” 2025
  • Veracode, “Spring 2026 GenAI Code Security Update,” 2026
  • Escape.tech, “The State of Security of Vibe Coded Apps,” 2025-2026
  • Georgia Tech SSLab, “Vibe Security Radar,” May 2025-present
  • Apiiro, “4x Velocity, 10x Vulnerabilities,” 2025
  • CodeRabbit, “State of AI vs Human Code Generation,” December 2025
  • Stanford University (Perry et al.), “Do Users Write More Insecure Code with AI Assistants?,” 2022
  • Backslash Security, LLM Code Security Study, April 2025
  • arXiv, Large-Scale GitHub AI Code Analysis, October 2025
  • arXiv, AI-Authored Technical Debt Study, 2026