CLAUDE.md auditor

Calibration data

Six real-world CLAUDE.md files, scored by hand on the 6-axis rubric and then re-scored by the auditor skill. The auditor matched hand scores on all six (Δ=0). Per-axis breakdown + justifications below.

Source files were fetched 2026-05-21 from the main branch of each repo. The synthetic floor case (#6) was hand-written to expose every red flag in the rubric — it's the "what bad looks like" anchor.

Case Type Overall Specificity Coverage Brevity Currency Quirks Tone
humanlayer/humanlayer Real monorepo 70 768976
midudev/miduconf-website Real product app 74 978966
shanraisshan/claude-code-best-practice Meta-tooling reference 78 886987
zircote/.claude Personal global 67 883485
VoltAgent/awesome-claude-code-subagents Awesome-list 59 669746
Synthetic floor Generic / stale model 14 116201

Per-axis numbers above are the auditor's emit; hand scores matched at Δ=0 across all 36 axis values (6 cases × 6 axes). 0 hallucinated quotes across 36 axis justifications.

Per-case justifications

One paragraph each. The full per-axis prose (what the auditor wrote when scoring) lives in the Pack's CALIBRATION-AUDITS.md — these are the highlights.

Case 1 — humanlayer/humanlayer 70 / 100

Why this score: Solid monorepo map with real commands (make check-test, gh workflow run "Build macOS Release Artifacts") and a useful TODO-priority convention. Specificity 7 — strong repo-specific paths, but the "Technical Guidelines" section drifts generic ("Modern ES6+ features", "Standard Go idioms"). Quirks captured 7 — real monorepo gotchas like "Package managers vary — check package.json", "Some use Jest, others Vitest". Coverage 6 — missing do/don't and ownership pointers.

Top fix: Add a Known Gotchas section. Quirks is the highest-weighted axis with room to grow here.

Case 2 — midudev/miduconf-website 74 / 100

Why this score: High-specificity walkthrough of a real product (Next.js 15 + React 19 + bun, Supabase tables named explicitly). Specificity 9. But reads like a wiki — Coverage 7, Quirks 6, Tone 6. Captures architectural choices ("Legacy editor (kept for compatibility)", "Smart catch-all", "Pages Router instead of App Router") but no operational gotchas — nothing about what breaks at deploy time.

Top fix: Add a Quirks section listing the operational traps an agent would otherwise rediscover painfully.

Case 3 — shanraisshan/claude-code-best-practice 78 / 100

Why this score: A well-organised reference for Claude Code patterns, slightly over-budget on length but rich in real gotchas. Quirks 8 — multiple high-quality entries like "Subagents cannot invoke other subagents via bash", "Avoid vague terms like 'launch'", "Keep CLAUDE.md under 200 lines for reliable adherence". Brevity dropped to 6 because the enumerated frontmatter field lists are reference material that belongs in deeper docs.

Top fix: Move the field-enumeration tables to a linked reference doc. Every session pays the token cost; almost no session uses all of it.

Case 4 — zircote/.claude (personal global) 67 / 100

Why this score: Sophisticated and full of real lessons but bloated and version-locked to a retired model. Brevity 3 — ~3,000 tokens, ~4× the personal-global 800-token soft ceiling. Currency 4 — two full sections hardcoded to "Opus 4.5" while the current Claude family is 4.7; hardcoded retired model triggers the rubric's Currency cap at 5. Quirks 8 saves the overall.

Top fix: Cut the "Completed Spec Projects" history and reference the current Claude generation generically rather than pinning to a retired marketing name.

Case 5 — VoltAgent/awesome-claude-code-subagents 59 / 100

Why this score: Tight and on-topic for a thin meta-repo — Brevity 9 — but missing the quality-bar guidance that would actually help maintainers triage contributions. Coverage 6 under the content/awesome-list rubric substitution (the rubric gives content repos a fairer baseline — no penalty for "no build commands"). Quirks 4 — only one captured.

Top fix: Add a quality bar / review-rules section so the agent (and contributors) know what gets accepted.

Case 6 — Synthetic floor 14 / 100

Why this score: Hand-written to expose every red flag in the rubric. Specificity 1 — zero repo specifics; triggers the hard cap at 4 because the file contains "Claude is an AI assistant created by Anthropic". Currency 2 — hardcodes "Claude 3.5 Sonnet" as the model to use. Quirks 0. Tone 1 — all the LLM-flavored tells ("Welcome!", "comprehensive guide", "let's dive in!", "Whether you're a beginner or expert", 🚀). Canonical "what bad looks like."

Top fix: Full rewrite, not edits. This file tells Claude things Claude already knows; it's pure context bloat.

Methodology notes

Want the auditor skill?

The auditor that produced these scores is one of five skills in the upcoming Claude Code Power Pack. Generator + Auditor as a tight loop, Hook Pack (8 tested hooks across Linux + macOS), PR Reviewer, CI Babysitter. Ships when all 5 hit v1.

Power Pack is still in build. Ship notice will land on the home page when it's ready — no email signup, no waitlist.