Case Study · AI Code Quality · Vibe Coding · Scan Findings

2,873 findings across 11 repos in under 2 minutes — here's the pattern

We scanned 11 repos from one early deployment in under 2 minutes. The aggregate: 2,873 line-level findings, 135 unique issues, 2 high-severity. Here's what AI-generated code keeps doing — and what V5 looks for.

AI Vyuh Engineering · 28 April 2026

We installed V5 Code QA on an 11-repo portfolio from one of our earliest deployments. The full scan finished before someone could refill a coffee.

Here’s what came back — and what it tells you about AI-generated code in production today.

The shape of the result

Metric	Value
Repositories scanned	11
Total scan time (across all repos)	~118 seconds
Median per-repo turnaround	~7 seconds
Line-level findings (raw)	2,873
Unique issues after dedup	135
High-severity unique issues	2
Medium-severity unique issues	45
Low + informational unique issues	88
Portfolio grade	C

Two numbers matter. The 2,873 counts every line that any of our agents flagged, across every scan: the surface area of “things worth a second look.” The 135 is what you actually triage on Monday morning, after we collapse near-duplicates that share the same category, severity, file and title.

Both are honest. They answer different questions:

How much noise sits in this codebase? → 2,873
What does the team actually fix? → 135 (1 already marked reviewed, 134 open)
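Collapsing 2,873 lines into 135 issues is, at heart, a group-by on a composite key. The Python sketch below shows that idea under stated assumptions. The `Finding` class, its field names and the sample findings are hypothetical illustrations, not V5's internal schema. A production tool might also normalize titles or paths before grouping.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One line-level finding as an agent might report it (hypothetical shape)."""
    category: str
    severity: str
    file: str
    title: str
    line: int


def dedup(findings):
    """Collapse findings sharing (category, severity, file, title) into one issue.

    Returns a dict mapping each key to the sorted line numbers it covers.
    """
    issues = defaultdict(list)
    for f in findings:
        issues[(f.category, f.severity, f.file, f.title)].append(f.line)
    return {key: sorted(lines) for key, lines in issues.items()}


# Hypothetical sample: one dependency finding plus one block repeated on three lines.
raw = [
    Finding("deps", "high", "requirements.txt", "Known CVE in pinned version", 3),
    Finding("style", "low", "api/handlers.py", "Duplicated error-handling block", 40),
    Finding("style", "low", "api/handlers.py", "Duplicated error-handling block", 88),
    Finding("style", "low", "api/handlers.py", "Duplicated error-handling block", 131),
]
unique = dedup(raw)
print(f"{len(raw)} line-level findings -> {len(unique)} unique issues")
# prints "4 line-level findings -> 2 unique issues"
```

Keeping the line numbers on each grouped issue is what lets both views coexist. The issue count drives triage, and the line list still shows every place to fix.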

If you’ve ever installed a SAST tool and watched the dashboard light up with thousands of “issues” you instantly stopped reading, you know why we surface both numbers. The triage number is the one that should drive your week.

Three findings worth pulling forward

We can’t share the repos themselves. We can share the classes of finding, because they’re the same classes we see across every AI-heavy codebase we’ve looked at:

1. Hardcoded test secret in a CI fixture

A literal-looking API token committed to a fixtures file the test runner loads at startup. Not a high-entropy random string in .env.example — an actual-looking token in a path the CI pipeline reads. AI-generated test scaffolding often does this because the model doesn’t distinguish between “a placeholder for a secret” and “a secret.” The remediation is a one-line change. The detection is what saves you the next time someone clones the repo.
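A common heuristic for telling a placeholder from a secret combines three checks: a credential-like key name, a placeholder allowlist, and a Shannon-entropy threshold. The sketch below shows that generic technique, not V5's detector. The regexes, the 20-character and 3.5-bits-per-character thresholds, and the sample fixture are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical rule: a credential-like key assigned a quoted value.
SECRET_ASSIGNMENT = re.compile(
    r"""(?i)\b(api[_-]?key|token|secret|password)\b\s*[:=]\s*["']([^"']+)["']"""
)
# Values that are obviously placeholders rather than real secrets.
PLACEHOLDER = re.compile(r"(?i)^(x+|changeme|dummy|example|test|your[_-].*|<.*>|\$\{.*\})$")


def shannon_entropy(s):
    """Bits of entropy per character, from the string's character frequencies."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def looks_like_secret(value, min_len=20, min_entropy=3.5):
    """Reject known placeholders, then require a long, high-entropy value."""
    if PLACEHOLDER.match(value):
        return False
    return len(value) >= min_len and shannon_entropy(value) >= min_entropy


def scan_fixture(text):
    """Return (line_number, key) pairs for credential-like assignments."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in SECRET_ASSIGNMENT.finditer(line):
            if looks_like_secret(m.group(2)):
                hits.append((lineno, m.group(1)))
    return hits


# Hypothetical fixture: two placeholders and one value that looks real.
fixture = '''
api_key: "your_api_key_here"
token: "q8Zr2LmT5vN0aW7xK3pB9dFh"
password: "changeme"
'''
print(scan_fixture(fixture))
# [(3, 'token')]
```

Entropy alone misfires in both directions: hashes score high, and short real passwords score low. That is why the key name and the placeholder list carry much of the weight here. On the fixture side, the one-line fix is to read the value from an environment variable or CI secret instead of committing it.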

2. A `requests` dependency pinned to a version with a known CVE

The dependency was three minor versions behind a published CVE fix. The repo’s last commit was recent — this isn’t an abandoned project. It’s a healthy project that updates dependencies when something breaks, not when a CVE drops. Our agents flagged it with the upgrade path and the published advisory; the cost of the fix is an upgrade and a re-run of the test suite.
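Flagging a pin that sits behind a fix comes down to comparing the pinned version with the first fixed version listed in an advisory. In the sketch below, the `examplelib` package, its 1.5.0 pin and the 1.8.0 fix are made up to mirror the "three minor versions behind" story. They are not the actual `requests` advisory. The comparison also handles only plain numeric dotted versions.

```python
def is_vulnerable(pinned, fixed_in):
    """True if `pinned` sorts before `fixed_in`, the first version with the fix.

    Handles plain numeric versions like "1.5" or "1.5.0" only; shorter versions
    are zero-padded so that "1.8" and "1.8.0" compare as equal.
    """
    a = [int(p) for p in pinned.split(".")]
    b = [int(p) for p in fixed_in.split(".")]
    width = max(len(a), len(b))
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a < b


def check_pins(requirements_text, advisories):
    """Return (name, pinned, fixed_in) for every `name==version` pin behind its fix."""
    flagged = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and non-exact specifiers in this sketch
        name, _, version = line.partition("==")
        name, version = name.strip().lower(), version.strip()
        fixed_in = advisories.get(name)
        if fixed_in and is_vulnerable(version, fixed_in):
            flagged.append((name, version, fixed_in))
    return flagged


# Hypothetical advisory feed: package name -> first version containing the fix.
ADVISORIES = {"examplelib": "1.8.0"}

requirements = """\
examplelib==1.5.0  # three minor versions behind the fix
otherlib==2.0.1
"""

print(check_pins(requirements, ADVISORIES))
# [('examplelib', '1.5.0', '1.8.0')]
```

For real Python projects, the `packaging` library's `Version` class handles pre-releases, post-releases and epochs, which this toy comparison ignores. The advisory data should also come from a maintained database rather than a hard-coded dict.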

3. A public API handler with no tests at all

Not “low coverage.” Zero tests. The handler validated input, made a database call, and returned JSON, which is exactly the surface you’d want covered. Our test-coverage agent caught it as a structural gap, not a coverage-percentage delta. AI assistants are excellent at writing test files. They are consistently weaker at making sure those tests cover the paths that fail in production.
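Closing a gap like this means writing tests for all three things the handler does: validation, the database call and the JSON response. The sketch below uses a hypothetical `get_user_handler`, not the handler from the scan. It tests against a real in-memory SQLite database rather than a mock, so the query itself is part of what gets tested.

```python
import json
import sqlite3
import unittest


def get_user_handler(conn, raw_user_id):
    """Hypothetical public handler: validate input, query the DB, return (status, JSON)."""
    try:
        user_id = int(raw_user_id)
    except (TypeError, ValueError):
        return 400, json.dumps({"error": "user_id must be an integer"})
    if user_id <= 0:
        return 400, json.dumps({"error": "user_id must be positive"})
    row = conn.execute("SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"id": row[0], "name": row[1]})


class GetUserHandlerTest(unittest.TestCase):
    def setUp(self):
        # A real in-memory SQLite database instead of a mocked one.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        self.conn.execute("INSERT INTO users (id, name) VALUES (1, 'Asha')")

    def tearDown(self):
        self.conn.close()

    def test_returns_user_as_json(self):
        status, body = get_user_handler(self.conn, "1")
        self.assertEqual(status, 200)
        self.assertEqual(json.loads(body), {"id": 1, "name": "Asha"})

    def test_rejects_non_integer_id(self):
        status, _ = get_user_handler(self.conn, "abc")
        self.assertEqual(status, 400)

    def test_rejects_non_positive_id(self):
        status, _ = get_user_handler(self.conn, "0")
        self.assertEqual(status, 400)

    def test_missing_user_is_404(self):
        status, _ = get_user_handler(self.conn, "99")
        self.assertEqual(status, 404)
```

If you save this as, say, `test_handler.py`, it runs with `python -m unittest test_handler`. Because the tests hit a real schema, the same shape catches a renamed column or a broken query, which a mocked database call would let through.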

None of these are exotic findings. They’re the same things a careful senior engineer would catch on a thorough review — if they had two days and the will to do it.

What V5 looks for, and what “AI-aware” actually means

Most static analyzers were tuned on years of human-written code. Our agents are tuned on what AI assistants ship now:

Pattern	What it looks like in the wild
Hallucinated APIs	A function call to a method that doesn’t exist on the imported library — the model invented it because it sounded right
Mock-heavy tests	A test file that mocks the database, mocks the API, and then asserts the mock returned what you told it to return. Green build, broken production.
Over-abstraction	Three layers of wrapper functions for a single one-line operation. The model added “in case you want to swap implementations later.”
Phantom dependencies	An import statement for a package that isn’t in `package.json`. Works on the developer’s machine because something else installed it transitively.
Suspiciously uniform code	Identical error-handling blocks repeated across dozens of files. Looks reviewed. Isn’t.
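In Python, the hallucinated-API pattern can be roughly approximated with the standard library's `ast` and `importlib` modules. The approach is to parse the file, find `module.attribute` references on plainly imported modules, and check that each attribute exists on the installed module. This is a rough sketch, not how V5 does it. It imports the referenced modules, so run it only on code you trust. It also ignores `from` imports and dotted imports, and it cannot see methods called on objects.

```python
import ast
import importlib


def find_missing_attributes(source):
    """Return (line, "module.attr") for module attributes that don't exist.

    Only handles plain `import x` / `import x as y` of top-level modules, and
    imports those modules to inspect them, so run it only on trusted code.
    """
    tree = ast.parse(source)

    # Pass 1: map each local alias to the module it was imported from.
    imported = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if "." not in alias.name:  # skip dotted imports like `import os.path`
                    imported[alias.asname or alias.name] = alias.name

    # Pass 2: check every `alias.attr` reference against the real module.
    missing = []
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id in imported
        ):
            module_name = imported[node.value.id]
            try:
                module = importlib.import_module(module_name)
            except ImportError:
                continue  # not installed here; a separate dependency check owns this case
            if not hasattr(module, node.attr):
                missing.append((node.lineno, f"{module_name}.{node.attr}"))
    return sorted(missing)


# Hypothetical snippet with one invented function, `json.dumps_pretty`.
sample = '''
import json
import math as m

data = json.loads("{}")
text = json.dumps_pretty(data)
root = m.sqrt(2.0)
'''

print(find_missing_attributes(sample))
# [(6, 'json.dumps_pretty')]
```

Catching a nonexistent method on a returned object, such as a client instance, needs type information. That is where a type checker such as mypy, or a purpose-built agent, goes further than this sketch.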

In this 11-repo scan, the 45 medium-severity issues were a roughly even mix of those five patterns plus dependency drift. The 2 high-severity issues were the kind that warrant a meeting: the hardcoded secret and the CVE-bearing dependency. The remaining 88 low-severity and informational issues were rough edges, fixable in a Friday afternoon.

What this is not

This is not “AI-generated code is broken.” It’s not. The repos in this scan are running. The code works. Real users hit it. The story is more boring and more useful than that:

AI-generated code ships faster than anyone can review it line-by-line. The patterns it introduces are different from the patterns humans introduce. So the review tool has to be tuned for those patterns — not for what SAST caught in 2017.

We ran this scan in the first place because the team behind these repos wanted to know whether the V5 agents would find anything substantive in a real, working codebase. They did. The team triaged the findings and moved on.

That’s the loop. That’s what a Tier 2 audit, starting at $200, buys you. That’s what $0 buys you on a free first scan.

See your own numbers

If you want to know what 11 of your repos look like:

Install the V5 GitHub App: first scan free, no credit card, repos under 100K LOC
See your aggregate dashboard at codeqa.aivyuh.com/t/{your-installation} — every scan, every grade, every finding
If your repo is bigger than the free tier or you want a written audit, book a Tier 2 Standard Audit — $200–$500 one-off, written report

We’ll publish more pattern breakdowns from this and other deployments as the corpus grows. The goal isn’t to scare anyone — it’s to put numbers on what every team using AI assistants already suspects.

This post draws on aggregate telemetry from a Tier-0 internal-partner deployment. No tenant identity, repository name, or contributor identity is disclosed. Numbers are exact; descriptive details are deliberately generic to protect the source.