
AI Governance in the Wild

The first public governance scan of 20 major AI-assisted open source projects — RST scores, DevContract clause pass rates, and verifiable Development Receipts. Not a hall of shame. A baseline.

Governance Leaderboard

Sorted by RST health score. Click any column header to re-sort. Filter by category.

Columns: #, Repository, RST Score, Verdict, Clauses, Findings, Agent Configs, Receipt

DC-A01/A02 marked N/A for repos with no agent config files — not applicable, not failing. DC-S01 failures in example/tutorial repos reflect documentation patterns, not live credentials.

Distribution Analysis

How governance posture distributes across the 20-repo cohort.

Key Findings

What 20 repos reveal about where AI development governance stands today.

01

Star count is not a governance proxy

The two most-starred repos in the cohort (75k and 65k stars) scored 58 and 55 respectively — the lowest two scores overall. Meanwhile, pydantic-ai (8k stars) achieved the highest score at 91. Community popularity and engineering rigour are orthogonal dimensions.

02

35% of repos have detectable AI agent config files

7 of 20 repos contain .claude/, .kiro/, .clinerules, or similar agent configuration artefacts — a significant jump from 12 months ago. The AI-assisted development footprint is now visible in OSS repos at scale.

03

alwaysAllow grants are the most common agent config finding

Every repo with agent config files had at least one non-empty alwaysAllow list (DC-A02 failure). This is consistent with how these tools ship their example configurations — but it means the example becomes the default, and the default is permissive. The CVE-2025-59536 pattern is in the wild.
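The failure mode above can be sketched as a small check over an agent config file. The config shape and key names here are illustrative assumptions modelled on this class of tool, not ticketyboo-scanner's actual schema or detection logic:

```python
import json

# Hypothetical agent config body; key names are illustrative, not a
# specific tool's real schema.
config_text = """
{
  "permissions": {
    "alwaysAllow": ["Bash(*)", "Write"]
  }
}
"""

def has_blanket_always_allow(text: str) -> bool:
    """Flag any non-empty alwaysAllow list, the DC-A02 condition
    described above (a sketch, not the scanner's implementation)."""
    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "alwaysAllow" and isinstance(value, list) and value:
                    return True
                if walk(value):
                    return True
        elif isinstance(node, list):
            return any(walk(item) for item in node)
        return False
    return walk(json.loads(text))

print(has_blanket_always_allow(config_text))  # True: non-empty alwaysAllow
```

The check recurses so nested permission blocks are caught; an empty `alwaysAllow` list passes, which matches the clause's intent that scoped-but-empty grants are fine.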

04

Secret detection flags documentation patterns, not live credentials

DC-S01 failures in cookbook and tutorial repos are almost entirely documentation examples (sk-abc123… style placeholders). The scanner distinguishes high-entropy values from placeholder patterns, but the clause still fires on placeholders. This is a known tradeoff of repository-level scanning versus runtime secrets management.
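One way such a placeholder-versus-credential heuristic can work is Shannon entropy over the token body. This is a hedged sketch, not the scanner's actual rules; the `sk-` prefix handling and the 3.0-bit threshold are assumptions for illustration:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character; strings sampled from a large alphabet
    (real credentials) score high, repetitive filler scores low."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_like_placeholder(token: str) -> bool:
    """Heuristic: obvious filler or a low-entropy body suggests a
    documentation example rather than a live credential."""
    body = token.removeprefix("sk-")
    return "..." in token or "…" in token or shannon_entropy(body) < 3.0

print(looks_like_placeholder("sk-abc123..."))  # True: filler pattern
```

A real-looking key such as `sk-9fK2mX8qLp4Tz7Rw1Vb6Ny3Jd5Hg0Sc` has a body entropy near five bits per character and would not be classed as a placeholder by this sketch.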

05

Full DC-v1 compliance is achievable — 3 repos achieved it

The three repos that passed all six clauses (fastmcp, pydantic-ai, and an anthropic-cookbook-adjacent project) share a common profile: actively maintained by small, focused engineering teams with a culture of rigour. Full compliance correlates with team discipline more than with project size or funding.

06

CI presence (DC-G02) is nearly universal; test suites (DC-G03) less so

17 of 20 repos have a CI pipeline configured. But 6 of 20 fail DC-G03 (test suite present), and of those, 4 have over 10k stars. As AI-generated code accelerates output, the test coverage gap widens.

Methodology

How the benchmark was produced and what it does (and doesn't) measure.

Scanner: ticketyboo-scanner v1, deep scan mode

All 20 repos were scanned using identical settings: deep scan (all 7 analysis layers), default DevContract DC-v1 clause set, single scan per repo captured April 2026. Scans are point-in-time; results may diverge as repos evolve.

DevContract clauses applied uniformly

Six clauses from the DC-v1 default set were evaluated: DC-S01 (no hardcoded secrets), DC-A01 (no enableAllProjectMcpServers), DC-A02 (no blanket alwaysAllow), DC-G01 (README present), DC-G02 (CI pipeline present), DC-G03 (test suite present). Clauses DC-A01 and DC-A02 are marked N/A for repos without agent config files.
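The three repo-hygiene clauses (DC-G01/G02/G03) reduce to file-presence checks. The path globs below are illustrative assumptions, not the scanner's actual detection rules:

```python
import tempfile
from pathlib import Path

# Repo-hygiene subset of the DC-v1 clause set; paths are assumptions
# for illustration, not ticketyboo-scanner's real detection logic.
CLAUSES = {
    "DC-G01": lambda repo: any(repo.glob("README*")),          # README present
    "DC-G02": lambda repo: (repo / ".github" / "workflows").is_dir(),  # CI pipeline
    "DC-G03": lambda repo: any((repo / d).is_dir() for d in ("tests", "test")),
}

def evaluate(repo: Path) -> dict:
    return {clause: check(repo) for clause, check in CLAUSES.items()}

# Demo on a throwaway repo with a README and tests/ but no CI workflow:
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "README.md").write_text("# demo")
    (repo / "tests").mkdir()
    verdict = evaluate(repo)

print(verdict)  # {'DC-G01': True, 'DC-G02': False, 'DC-G03': True}
```

The secrets and agent-config clauses (DC-S01, DC-A01, DC-A02) require content inspection rather than path checks, which is why they are the ones marked N/A when the relevant files are absent.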

Development Receipts issued for all scans

Each scan produced a SHA-256 self-attested Development Receipt stored at ticketyboo.dev/api/receipt/{scan_id}. The receipt hash is tamper-evident — any modification to the receipt body will invalidate the attestation. All 20 receipts in this benchmark verified clean.
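A minimal sketch of how such tamper-evident verification can work, assuming the receipt hash is SHA-256 over a canonical JSON serialisation of the body; the field names and canonicalisation choice are assumptions for illustration, not the actual receipt format:

```python
import hashlib
import json

def receipt_hash(body: dict) -> str:
    """SHA-256 over a canonical (sorted-key, no-whitespace) JSON
    serialisation of the receipt body. Illustrative scheme only."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical receipt body; field names are assumptions.
receipt = {"scan_id": "demo-001", "rst_score": 91, "verdict": "pass"}
attested = receipt_hash(receipt)

# Any modification to the body invalidates the attestation:
tampered = {**receipt, "rst_score": 99}
print(receipt_hash(tampered) == attested)  # False
```

Canonical serialisation matters here: without sorted keys and fixed separators, two semantically identical receipts could hash differently, breaking verification.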

Context matters — example repos vs production codebases

Tutorial and cookbook repos (anthropic-cookbook, openai-cookbook) are structurally different from production frameworks. Their DC-S01 findings are almost entirely documentation examples, not exploitable credentials. We report scores factually but note this context throughout.

How does your repo compare?

Run a free scan. Get your RST score, DevContract verdict, and a verifiable Development Receipt — in under 90 seconds.