Per-case justifications
One paragraph each. The full per-axis prose (what the auditor wrote when scoring) lives in the Pack's CALIBRATION-AUDITS.md — these are the highlights.
Case 1 — humanlayer/humanlayer 70 / 100
Why this score: Solid monorepo map with real commands (make check-test, gh workflow run "Build macOS Release Artifacts") and a useful TODO-priority convention. Specificity 7 — strong repo-specific paths, but the "Technical Guidelines" section drifts generic ("Modern ES6+ features", "Standard Go idioms"). Quirks captured 7 — real monorepo gotchas like "Package managers vary — check package.json", "Some use Jest, others Vitest". Coverage 6 — missing do/don't and ownership pointers.
Top fix: Add a Known Gotchas section. Quirks is the highest-weighted axis with room to grow here.
Case 2 — midudev/miduconf-website 74 / 100
Why this score: High-specificity walkthrough of a real product (Next.js 15 + React 19 + bun, Supabase tables named explicitly). Specificity 9. But reads like a wiki — Coverage 7, Quirks 6, Tone 6. Captures architectural choices ("Legacy editor (kept for compatibility)", "Smart catch-all", "Pages Router instead of App Router") but no operational gotchas — nothing about what breaks at deploy time.
Top fix: Add a Quirks section listing the operational traps an agent would otherwise rediscover painfully.
Case 3 — shanraisshan/claude-code-best-practice 78 / 100
Why this score: A well-organised reference for Claude Code patterns, slightly over-budget on length but rich in real gotchas. Quirks 8 — multiple high-quality entries like "Subagents cannot invoke other subagents via bash", "Avoid vague terms like 'launch'", "Keep CLAUDE.md under 200 lines for reliable adherence". Brevity dropped to 6 because the enumerated frontmatter field lists are reference material that belongs in deeper docs.
Top fix: Move the field-enumeration tables to a linked reference doc. Every session pays the token cost; almost no session uses all of it.
Case 4 — zircote/.claude (personal global) 67 / 100
Why this score: Sophisticated and full of real lessons but bloated and version-locked to a retired model. Brevity 3 — ~3,000 tokens, ~4× the personal-global 800-token soft ceiling. Currency 4 — two full sections hardcoded to "Opus 4.5" while the current Claude family is 4.7; hardcoded retired model triggers the rubric's Currency cap at 5. Quirks 8 saves the overall.
Top fix: Cut the "Completed Spec Projects" history and reference the current Claude generation generically rather than pinning to a retired marketing name.
Case 5 — VoltAgent/awesome-claude-code-subagents 59 / 100
Why this score: Tight and on-topic for a thin meta-repo — Brevity 9 — but missing the quality-bar guidance that would actually help maintainers triage contributions. Coverage 6 under the content/awesome-list rubric substitution (the rubric gives content repos a fairer baseline — no penalty for "no build commands"). Quirks 4 — only one captured.
Top fix: Add a quality bar / review-rules section so the agent (and contributors) know what gets accepted.
Case 6 — Synthetic floor 14 / 100
Why this score: Hand-written to expose every red flag in the rubric. Specificity 1 — zero repo specifics; triggers the hard cap at 4 because the file contains "Claude is an AI assistant created by Anthropic". Currency 2 — hardcodes "Claude 3.5 Sonnet" as the model to use. Quirks 0. Tone 1 — all the LLM-flavored tells ("Welcome!", "comprehensive guide", "let's dive in!", "Whether you're a beginner or expert", 🚀). Canonical "what bad looks like."
Top fix: Full rewrite, not edits. This file tells Claude things Claude already knows; it's pure context bloat.
Want the auditor skill?
The auditor that produced these scores is one of five skills in the upcoming Claude Code Power Pack. Generator + Auditor as a tight loop, Hook Pack (8 tested hooks across Linux + macOS), PR Reviewer, CI Babysitter. Ships when all 5 hit v1.
Power Pack is still in build. Ship notice will land on the home page when it's ready — no email signup, no waitlist.