
We Scanned Our Own Codebase: 406 Findings in 35 Seconds

We ran our V5 Code QA agents on our own 15,586-line TypeScript codebase. Here's what 5 AI agents found — and what it means for AI-generated code quality.

AI Vyuh Engineering

We believe in eating our own cooking. Before offering Code QA scans to customers, we pointed our 5 AI agents at our own codebase — the V5 Vibe Code QA system itself.

Here’s what happened.

The Setup

  • Codebase: V5 Vibe Code QA monorepo
  • Size: 15,586 lines of code across 82 files
  • Primary language: TypeScript (71.4%)
  • Scan type: Tier 1 Quick Scan
  • Scan duration: 35.2 seconds
  • Scan ID: vcq-20260328-003

Overall Result: Grade D

That might seem surprising for our own product, but it’s expected. Our codebase includes intentional vulnerable test fixtures — sample code with known issues used to validate our scanning agents. A clean score would actually mean our test suite is broken.

Findings by Category

Category        Findings   Grade
Security              75       F
Dependency             7       F
Architecture          27       D
Test Coverage         53       D
Code Smells          244       A

Security: 75 findings (Grade F)

The bulk of security findings came from our test-repos/ directory — intentionally vulnerable code samples including hardcoded secrets, SQL injection patterns, and insecure defaults. In production code, we found 3 genuine medium-severity issues related to input validation on API endpoints.

Takeaway: AI-generated code frequently ships with insecure defaults. When you prompt an LLM for “a quick API endpoint,” it rarely includes rate limiting, input sanitization, or proper error handling unless explicitly asked.
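As a hedged sketch (not the V5 implementation, and all names here are illustrative), this is the kind of input validation a generated endpoint typically omits:

```typescript
// Illustrative only: the checks an AI-generated endpoint rarely includes
// unless prompted. `validateScanRequest` is a hypothetical helper.

interface ScanRequest {
  repoUrl: string;
  tier: number;
}

function validateScanRequest(body: unknown): ScanRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("request body must be a JSON object");
  }
  const { repoUrl, tier } = body as Record<string, unknown>;
  if (typeof repoUrl !== "string" || !repoUrl.startsWith("https://")) {
    throw new Error("repoUrl must be an https:// URL");
  }
  if (typeof tier !== "number" || !Number.isInteger(tier) || tier < 1 || tier > 3) {
    throw new Error("tier must be an integer from 1 to 3");
  }
  return { repoUrl, tier };
}

// Malformed input is rejected at the boundary instead of flowing downstream.
console.log(validateScanRequest({ repoUrl: "https://github.com/acme/app", tier: 1 }));
```

The point is the boundary check itself, not the library used; a schema validator would serve the same role.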

Dependency: 7 findings (Grade F)

Our dependency audit caught 2 packages with known CVEs (dev dependencies, not production), 3 packages with no recent maintenance activity, and 2 license compatibility warnings. The F grade is driven by the CVE findings — any known vulnerability tanks the dependency score.

Takeaway: AI assistants suggest popular packages, not necessarily maintained ones. A dependency audit catches what npm audit misses — abandoned packages, license conflicts, and phantom dependencies that AI hallucinated.
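A phantom dependency can be caught by diffing what the source imports against what the manifest declares. A toy sketch with hypothetical names (real tooling also has to resolve workspaces and the full Node builtin list):

```typescript
// Illustrative phantom-dependency check: modules imported somewhere in the
// source tree but never declared in package.json.

function findPhantomDeps(imported: string[], declared: string[]): string[] {
  const declaredSet = new Set(declared);
  const builtins = new Set(["fs", "path", "os", "crypto", "url"]); // partial list
  return imported.filter(
    (name) =>
      !declaredSet.has(name) && // not declared
      !builtins.has(name) &&    // not a Node builtin
      !name.startsWith(".")     // not a relative import
  );
}

const phantoms = findPhantomDeps(
  ["fs", "./utils", "express", "left-pad"],
  ["express"]
);
console.log(phantoms); // left-pad is imported but never declared
```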

Architecture: 27 findings (Grade D)

Our architecture reviewer identified over-abstraction in the MCP server layer (too many indirection levels for the current scale), circular dependency risks between the orchestrator and reporter agents, and several god-functions exceeding 200 lines.

Takeaway: AI-generated code tends to over-abstract. It creates layers of indirection “just in case” — wrapper functions, unnecessary interfaces, and premature abstractions that add complexity without value.
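A condensed illustration of the pattern (hypothetical names, not taken from the MCP server layer): three layers of indirection wrapping a single call.

```typescript
// Over-abstraction in miniature: an interface, a class, and a factory,
// all standing in for one JSON.parse call.

interface IParser {
  parse(input: string): unknown;
}

class JsonParser implements IParser {
  parse(input: string): unknown {
    return JSON.parse(input);
  }
}

class ParserFactory {
  static create(): IParser {
    return new JsonParser();
  }
}

function parseConfig(input: string): unknown {
  // All of this machinery reduces to: return JSON.parse(input);
  return ParserFactory.create().parse(input);
}

console.log((parseConfig('{"tier":1}') as { tier: number }).tier); // 1
```

The abstraction only earns its keep once there is a second parser; until then it is pure indirection.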

Test Coverage: 53 findings (Grade D)

The test analyzer found mock-heavy test patterns, missing edge case coverage for error paths, and several test files that test implementation details rather than behavior. Many tests were generated by AI and look comprehensive but would pass even if the underlying code broke.

Takeaway: AI writes tests that look good but test the wrong things. The most common pattern: mocking the database/API, then testing that the mock returns what you told it to return. Green tests, broken production.
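The pattern in miniature, using hypothetical names (plain TypeScript rather than a test framework, but the shape is the same in Jest or Vitest):

```typescript
// Anti-pattern sketch: a "test" that only verifies its own mock.

interface User {
  id: number;
  name: string;
}

interface UserRepo {
  findById(id: number): User | undefined;
}

// Production function supposedly under test.
function getUser(repo: UserRepo, id: number): User | undefined {
  return repo.findById(id);
}

// The mock returns exactly what we told it to return...
const mockRepo: UserRepo = {
  findById: () => ({ id: 1, name: "Ada" }),
};

// ...so this assertion passes regardless of what getUser does with `id`.
const result = getUser(mockRepo, 1);
console.log(result?.name === "Ada"); // green, even if the real query is broken
```

A behavioral test would instead use an in-memory fake seeded with several users and assert that the *right* one comes back.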

Code Smells: 244 findings (Grade A)

Despite 244 findings, our smell detector gave us an A. Most findings were informational — consistent naming conventions, reasonable comment density, and minimal copy-paste patterns. The high count is driven by AI generation artifact detection: patterns that indicate code was generated rather than written.

Takeaway: Code smells in AI-generated code are different from human code smells. AI code is often too consistent — identical error handling blocks repeated verbatim, comments that restate the code, and suspiciously uniform formatting.
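The "too consistent" smell looks like this in miniature (hypothetical names): the same catch block pasted verbatim into every function, whether or not it makes sense there.

```typescript
// Sketch of verbatim-repeated error handling, a common AI generation artifact.

function loadScan(id: string): string {
  try {
    return `scan:${id}`;
  } catch (err) {
    console.error("An error occurred:", err); // identical block in every function
    throw err;
  }
}

function loadReport(id: string): string {
  try {
    return `report:${id}`;
  } catch (err) {
    console.error("An error occurred:", err); // identical block in every function
    throw err;
  }
}

console.log(loadScan("vcq-1")); // prints scan:vcq-1
```

Neither body can actually throw, so the handlers are dead weight; a human author would centralize or drop them.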

Severity Breakdown

Severity        Count   % of Total
Critical           14         3.4%
High               42        10.3%
Medium            100        24.6%
Low               127        31.3%
Informational     123        30.3%

The 14 critical findings were all in test fixtures. In production code, the highest severity was Medium.

What We Fixed

Running the scan led us to fix 4 real bugs:

  1. Dependency auditor only checked root package.json — fixed to search recursively through all workspace packages
  2. Reporter path resolution — SCANS_DIR was resolved incorrectly on certain OS configurations
  3. Missing Playwright dependency — needed for PDF report generation, wasn’t in package.json
  4. No offline fallback — report generation failed silently when the API was unreachable; added API-free fallback
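Fix 1 (searching workspace packages recursively) can be sketched roughly like this; `findPackageJsons` is a hypothetical name, not the auditor's actual implementation:

```typescript
// Sketch: walk a workspace and collect every package.json, skipping
// node_modules so installed dependencies' manifests are not audited.
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

function findPackageJsons(dir: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name === "node_modules") continue; // skip installed deps
      results.push(...findPackageJsons(full));
    } else if (entry.name === "package.json") {
      results.push(full);
    }
  }
  return results;
}

// Demo on a throwaway workspace layout.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "wksp-"));
fs.mkdirSync(path.join(root, "packages", "reporter"), { recursive: true });
fs.mkdirSync(path.join(root, "node_modules", "left-pad"), { recursive: true });
fs.writeFileSync(path.join(root, "package.json"), "{}");
fs.writeFileSync(path.join(root, "packages", "reporter", "package.json"), "{}");
fs.writeFileSync(path.join(root, "node_modules", "left-pad", "package.json"), "{}");

const found = findPackageJsons(root);
console.log(found.length); // root + workspace manifest; node_modules skipped
```

The original bug was simply stopping at the root manifest, which silently excluded every workspace package from the audit.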

The Point

If our own AI-built codebase has 406 findings, what does yours look like?

Every team using AI coding assistants is shipping code that hasn’t been systematically reviewed for the specific quality patterns AI introduces. Traditional linters catch syntax issues. SAST tools catch known vulnerability patterns. But neither catches AI-specific problems: hallucinated APIs, over-abstraction, mock-heavy tests, and phantom dependencies.

That’s what V5 Code QA does — in 60 seconds.


Want to scan your repo? Get a free Quick Scan →


Code quality and security are two sides of the same coin. The AI Vyuh blog explores the AI-generated code quality crisis in depth — the five failure patterns, from hidden security holes to dependency roulette, that we see across hundreds of AI-generated codebases.

Our security team ran a parallel exercise: red-teaming their own AI agent with a 7-agent automated pipeline. They found 2 critical and 1 high severity vulnerability in a system they built themselves — proof that even expert teams benefit from systematic, automated assessment.