
We Scanned Our Own Codebase: 406 Findings in 35 Seconds

We ran our V5 Code QA agents on our own 15,586-line TypeScript codebase. Here's what 5 AI agents found — and what it means for AI-generated code quality.

AI Vyuh Engineering

We believe in eating our own cooking. Before offering Code QA scans to customers, we pointed our 5 AI agents at our own codebase — the V5 Vibe Code QA system itself.

Here’s what happened.

The Setup

  • Codebase: V5 Vibe Code QA monorepo
  • Size: 15,586 lines of code across 82 files
  • Primary language: TypeScript (71.4%)
  • Scan type: Tier 1 Quick Scan
  • Scan duration: 35.2 seconds
  • Scan ID: vcq-20260328-003

Overall Result: Grade D

That might seem surprising for our own product, but it’s expected. Our codebase includes intentional vulnerable test fixtures — sample code with known issues used to validate our scanning agents. A clean score would actually mean our test suite is broken.

Findings by Category

Category        Findings   Grade
Security              75       F
Dependency             7       F
Architecture          27       D
Test Coverage         53       D
Code Smells          244       A

Security: 75 findings (Grade F)

The bulk of security findings came from our test-repos/ directory — intentionally vulnerable code samples including hardcoded secrets, SQL injection patterns, and insecure defaults. In production code, we found 3 genuine medium-severity issues related to input validation on API endpoints.

Takeaway: AI-generated code frequently ships with insecure defaults. When you prompt an LLM for “a quick API endpoint,” it rarely includes rate limiting, input sanitization, or proper error handling unless explicitly asked.
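As a hedged sketch (not the V5 implementation, and all names here are illustrative), this is the kind of input validation a generated endpoint typically omits:

```typescript
// Illustrative only: the checks an AI-generated endpoint rarely includes
// unless prompted. `validateScanRequest` is a hypothetical helper.

interface ScanRequest {
  repoUrl: string;
  tier: number;
}

function validateScanRequest(body: unknown): ScanRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { repoUrl, tier } = body as Record<string, unknown>;
  if (typeof repoUrl !== "string" || !repoUrl.startsWith("https://")) {
    throw new Error("repoUrl must be an https:// URL");
  }
  if (typeof tier !== "number" || !Number.isInteger(tier) || tier < 1 || tier > 3) {
    throw new Error("tier must be an integer from 1 to 3");
  }
  return { repoUrl, tier };
}

// Malformed input is rejected at the boundary instead of flowing downstream.
console.log(validateScanRequest({ repoUrl: "https://github.com/acme/app", tier: 1 }));
```

The point is the boundary check itself, not the library used; a schema validator would serve the same role.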

Dependency: 7 findings (Grade F)

Our dependency audit caught 2 packages with known CVEs (dev dependencies, not production), 3 packages with no recent maintenance activity, and 2 license compatibility warnings. The F grade is driven by the CVE findings — any known vulnerability tanks the dependency score.

Takeaway: AI assistants suggest popular packages, not necessarily maintained ones. A dependency audit catches what npm audit misses — abandoned packages, license conflicts, and phantom dependencies that AI hallucinated.
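A phantom dependency can be caught by diffing what the source imports against what the manifest declares. A toy sketch with hypothetical names (real tooling also has to resolve workspaces and the full Node builtin list):

```typescript
// Illustrative phantom-dependency check: modules imported somewhere in the
// source tree but never declared in package.json.

function findPhantomDeps(imported: string[], declared: string[]): string[] {
  const declaredSet = new Set(declared);
  const builtins = new Set(["fs", "path", "os", "crypto", "url"]); // partial list
  return imported.filter(
    (name) =>
      !declaredSet.has(name) && // not declared
      !builtins.has(name) &&    // not a Node builtin
      !name.startsWith(".")     // not a relative import
  );
}

const phantoms = findPhantomDeps(
  ["fs", "./utils", "express", "left-pad"],
  ["express"]
);
console.log(phantoms); // left-pad is imported but never declared
```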

Architecture: 27 findings (Grade D)

Our architecture reviewer identified over-abstraction in the MCP server layer (too many indirection levels for the current scale), circular dependency risks between the orchestrator and reporter agents, and several god-functions exceeding 200 lines.

Takeaway: AI-generated code tends to over-abstract. It creates layers of indirection “just in case” — wrapper functions, unnecessary interfaces, and premature abstractions that add complexity without value.
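A condensed illustration of the pattern (hypothetical names, not taken from the MCP server layer): three layers of indirection wrapping a single call.

```typescript
// Over-abstraction in miniature: an interface, a class, and a factory,
// all standing in for one JSON.parse call.

interface IParser {
  parse(input: string): unknown;
}

class JsonParser implements IParser {
  parse(input: string): unknown {
    return JSON.parse(input);
  }
}

class ParserFactory {
  static create(): IParser {
    return new JsonParser();
  }
}

function parseConfig(input: string): unknown {
  // All of this machinery reduces to: return JSON.parse(input);
  return ParserFactory.create().parse(input);
}

console.log((parseConfig('{"tier":1}') as { tier: number }).tier); // 1
```

The abstraction only earns its keep once there is a second parser; until then it is pure indirection.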

Test Coverage: 53 findings (Grade D)

The test analyzer found mock-heavy test patterns, missing edge case coverage for error paths, and several test files that test implementation details rather than behavior. Many tests were generated by AI and look comprehensive but would pass even if the underlying code broke.

Takeaway: AI writes tests that look good but test the wrong things. The most common pattern: mocking the database/API, then testing that the mock returns what you told it to return. Green tests, broken production.
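The pattern in miniature, using hypothetical names (plain TypeScript rather than a test framework, but the shape is the same in Jest or Vitest):

```typescript
// Anti-pattern sketch: a "test" that only verifies its own mock.

interface User {
  id: number;
  name: string;
}

interface UserRepo {
  findById(id: number): User | undefined;
}

// Production function supposedly under test.
function getUser(repo: UserRepo, id: number): User | undefined {
  return repo.findById(id);
}

// The mock returns exactly what we told it to return...
const mockRepo: UserRepo = {
  findById: () => ({ id: 1, name: "Ada" }),
};

// ...so this assertion passes regardless of what getUser does with `id`.
const result = getUser(mockRepo, 1);
console.log(result?.name === "Ada"); // green, even if the real query is broken
```

A behavioral test would instead use an in-memory fake seeded with several users and assert that the *right* one comes back.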

Code Smells: 244 findings (Grade A)

Despite 244 findings, our smell detector gave us an A. Most findings were informational — consistent naming conventions, reasonable comment density, and minimal copy-paste patterns. The high count is driven by AI generation artifact detection: patterns that indicate code was generated rather than written.

Takeaway: Code smells in AI-generated code are different from human code smells. AI code is often too consistent — identical error handling blocks repeated verbatim, comments that restate the code, and suspiciously uniform formatting.
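The "too consistent" smell looks like this in miniature (hypothetical names): the same catch block pasted verbatim into every function, whether or not it makes sense there.

```typescript
// Sketch of verbatim-repeated error handling, a common AI generation artifact.

function loadScan(id: string): string {
  try {
    return `scan:${id}`;
  } catch (err) {
    console.error("An error occurred:", err); // identical block in every function
    throw err;
  }
}

function loadReport(id: string): string {
  try {
    return `report:${id}`;
  } catch (err) {
    console.error("An error occurred:", err); // identical block in every function
    throw err;
  }
}

console.log(loadScan("vcq-1")); // prints scan:vcq-1
```

Neither body can actually throw, so the handlers are dead weight; a human author would centralize or drop them.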

Severity Breakdown

Severity        Count   % of Total
Critical           14         3.4%
High               42        10.3%
Medium            100        24.6%
Low               127        31.3%
Informational     123        30.3%

The 14 critical findings were all in test fixtures. In production code, the highest severity was Medium.

What We Fixed

Running the scan led us to fix 4 real bugs:

  1. Dependency auditor only checked root package.json — fixed to search recursively through all workspace packages
  2. Reporter path resolution — SCANS_DIR was resolved incorrectly on certain OS configurations
  3. Missing Playwright dependency — needed for PDF report generation, wasn’t in package.json
  4. No offline fallback — report generation failed silently when the API was unreachable; added API-free fallback
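Fix 1 (searching workspace packages recursively) can be sketched roughly like this; `findPackageJsons` is a hypothetical name, not the auditor's actual implementation:

```typescript
// Sketch: walk a workspace and collect every package.json, skipping
// node_modules so installed dependencies' manifests are not audited.
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

function findPackageJsons(dir: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name === "node_modules") continue; // skip installed deps
      results.push(...findPackageJsons(full));
    } else if (entry.name === "package.json") {
      results.push(full);
    }
  }
  return results;
}

// Demo on a throwaway workspace layout.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "wksp-"));
fs.mkdirSync(path.join(root, "packages", "reporter"), { recursive: true });
fs.mkdirSync(path.join(root, "node_modules", "left-pad"), { recursive: true });
fs.writeFileSync(path.join(root, "package.json"), "{}");
fs.writeFileSync(path.join(root, "packages", "reporter", "package.json"), "{}");
fs.writeFileSync(path.join(root, "node_modules", "left-pad", "package.json"), "{}");

const found = findPackageJsons(root);
console.log(found.length); // root + workspace manifest; node_modules skipped
```

The original bug was simply stopping at the root manifest, which silently excluded every workspace package from the audit.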

The Point

If our own AI-built codebase has 406 findings, what does yours look like?

Every team using AI coding assistants is shipping code that hasn’t been systematically reviewed for the specific quality patterns AI introduces. Traditional linters catch syntax issues. SAST tools catch known vulnerability patterns. But neither catches AI-specific problems: hallucinated APIs, over-abstraction, mock-heavy tests, and phantom dependencies.

That’s what V5 Code QA does — in 60 seconds.


Want to scan your repo? Get a free Quick Scan →


Code quality and security are two sides of the same coin. The AI Vyuh blog explores the AI-generated code quality crisis in depth — the five failure patterns, from hidden security holes to dependency roulette, that we see across hundreds of AI-generated codebases.

Our security team ran a parallel exercise: red-teaming their own AI agent with a 7-agent automated pipeline. They found 2 critical and 1 high severity vulnerability in a system they built themselves — proof that even expert teams benefit from systematic, automated assessment.