I've commissioned development work from houses on four continents. Onshore, nearshore, offshore, staff aug, fixed-price, T&M, consultancy, solo contractor. I've never once handed any of them a prompt and hoped for the best.
What I've handed them, every time, is a contract. Statement of work. Technical specification. Acceptance criteria. Definition of done. The developer on the other end signs it, builds against it, and delivers something I can evaluate against it. If they violate the contract, that's not a code review comment. That's a contractual failure.
I use Claude as a developer. I use Codex, Cursor, and a handful of other agentic tools. I treat every one of them exactly the same way I'd treat a contractor. The origin of the deliverable — human, nearshore, AI — is irrelevant to the quality gate. The contract is the contract.
What follows is what that looks like in practice, and why it produces fundamentally different results than "add some guardrails and review the PR."
What everyone's actually doing
Here is the current state of AI-assisted development governance at most organisations that have thought about it at all:
You describe what you want in a prompt. The agent executes. You review the output. You decide if it's "done." Maybe you have some guardrails that block particularly dangerous actions at runtime. Maybe you have a governance agent that reviews the PR and posts a comment.
That's an audit. The work is already done. The agent already made its decisions — about your stack, your architecture, your security posture, your cost footprint. You're reviewing the consequences of those decisions after the fact.
Nobody would commission a development house that way. You wouldn't hand Accenture a one-paragraph brief and review their output six weeks later to see if it met your standards. The reason you wouldn't do that is obvious: by the time you review it, it's too late. The decisions are baked in.
The same logic applies to AI agents, with one important difference: AI agents can operate at a velocity that makes the "review at the end" model catastrophically inadequate. An agent working for eight hours uncontrolled can make thousands of decisions that compound on each other. A human developer working for eight hours makes dozens.
Governance applied after the fact is an audit
There is a precise distinction worth drawing here, because conflating these two things is what leads to the current state of the industry.
An audit happens after the deliverable exists. It checks whether what was built meets a standard. It produces findings. It may result in remediation. It is valuable, necessary, and not sufficient.
A contract happens before the work starts. It defines the standard the work must meet. The contractor builds against it. The contract is the acceptance criteria. The audit is checking whether the contract was honoured.
Guardrails are a runtime audit. They block specific actions that violate policy. They are a last line of defence — the safety net below the trapeze. They are not a substitute for the trapeze itself being built correctly. A contractor who has read the SOW doesn't build the wrong thing. A contractor who hasn't read the SOW gets caught by the safety net after they fall.
The goal is not more safety nets. The goal is an agent that read the brief.
What a contract actually looks like
A DevContract is not a prompt. It is not a long system message. It is a structured document with machine-readable clauses that can be validated programmatically.
The mental model is Pydantic. When you define a BaseModel, you are saying: "any data that comes through this must look like this." A DevContract is the same thing applied to a development engagement: "any deliverable from this agent must look like this."
The contract is the schema. The deliverable is the payload.
{
"contract_version": "1.0",
"project": "ticketyboo-scanner",
"parties": {
"client": "fenderfonic",
"contractor": "claude-code"
},
"stack": {
"language": "python",
"language_version": "3.12",
"runtime": "aws_lambda",
"region": "eu-north-1"
},
"security": {
"auth_mechanism": "cognito_jwt",
"secrets_store": "ssm_parameter_store",
"production_deploy_gate": true
},
"cost": {
"budget_envelope": "aws_free_tier",
"forbidden_resources": [
"nat_gateway", "rds", "secrets_manager", "kms_cmk"
]
},
"quality": {
"type_hints": "required",
"test_coverage_min": 80,
"logging_standard": "module_logger_only"
},
"compliance": {
"frameworks": ["gdpr"]
},
"definition_of_done": {
"tests_pass": true,
"no_contract_violations": true,
"coverage_met": true,
"pr_comment_posted": true
}
}
Every field is typed. Every field is versioned alongside the code. Every field is machine-queryable by the agent during execution. When the contract changes — when the acceptable Python version moves from 3.11 to 3.12, when GDPR compliance becomes required, when the budget envelope tightens — the diff shows exactly what changed and who approved it.
Compare this to updating a system prompt. A system prompt change is invisible in version control. It is not auditable. It has no typed schema. It cannot be validated. It cannot be diffed in a meaningful way. It is not a contract.
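To make the Pydantic analogy concrete, here is a minimal sketch, assuming Pydantic v2, of how two of the clauses above could be typed. The class names and the loading step are illustrative rather than the actual DevContract schema; only the stack and cost clauses are modelled.

from pathlib import Path
from typing import List

from pydantic import BaseModel, Field


class StackClause(BaseModel):
    language: str
    language_version: str
    runtime: str
    region: str


class CostClause(BaseModel):
    budget_envelope: str
    forbidden_resources: List[str] = Field(default_factory=list)


class DevContract(BaseModel):
    # Only two clauses are sketched here; extra fields in the JSON are
    # ignored by Pydantic's default behaviour.
    contract_version: str
    project: str
    stack: StackClause
    cost: CostClause


# Validation is the point: a malformed contract fails before any work starts.
contract = DevContract.model_validate_json(Path("devcontract.json").read_text())

If the JSON drops a required field or changes the type of one, validation fails immediately, which is the behaviour you want from a contract and cannot get from a prompt.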
Pre-execution is the key property
The contract is delivered to the agent before the work starts. This is not a subtle point. It is the entire point.
A good contractor who has read the SOW doesn't store credentials in environment variables and wait to be told that's wrong. They check the contract before making the decision. "I need to store a secret. My contract says SSM Parameter Store only, no Secrets Manager. I'll use SSM." The violation never happens. There is nothing to catch.
This is what the contract proxy enables. During execution, the agent can query any clause before making a material decision:
Agent queries: "Can I create an RDS instance here?"
Contract proxy: DENY
clause: cost.forbidden_resources
reason: RDS is forbidden. Use DynamoDB.
action_taken: [agent proceeds with DynamoDB]
logged_to: evidence_ledger#entry_0042
Agent queries: "I'm about to deploy to production."
Contract proxy: GATE
clause: security.production_deploy_gate
reason: Production deployments require human approval per contract.
action: paused — awaiting human sign-off
Every query and verdict is appended to the evidence ledger. The agent is not blocked by surprise. It consulted the contract. It received guidance. It acted accordingly. The ledger is the record of that consultation.
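A sketch of what that consultation could look like from the agent's side, in Python. The function, the verdict values, and the in-memory ledger are assumptions for illustration; the article does not define the proxy's actual API.

from dataclasses import dataclass
from typing import Literal


@dataclass
class Verdict:
    decision: Literal["allow", "deny", "gate"]
    clause: str
    reason: str


EVIDENCE_LEDGER: list[dict] = []


def check_resource(contract: dict, resource: str) -> Verdict:
    """Consult the cost clause before creating a resource, and record the consultation."""
    forbidden = contract["cost"]["forbidden_resources"]
    if resource in forbidden:
        verdict = Verdict("deny", "cost.forbidden_resources", f"{resource} is forbidden by contract")
    else:
        verdict = Verdict("allow", "cost.forbidden_resources", f"{resource} is permitted")
    # Every consultation is appended to the ledger, whatever the outcome.
    EVIDENCE_LEDGER.append({
        "query": f"Can I create {resource}?",
        "clause": verdict.clause,
        "verdict": verdict.decision,
        "reason": verdict.reason,
    })
    return verdict


# check_resource(contract, "rds") -> deny; the agent proceeds with DynamoDB instead.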
The definition of done problem
With any AI agent tool, "done" is whatever the agent decides it is. There are no machine-readable acceptance criteria. You review the output yourself and decide. That model works for one developer on one project. It does not scale to:
- Multiple agents working in parallel on different components
- Regulated environments where "done" must be demonstrable to an auditor
- Teams where "done" must be consistent whether the contributor is human or AI
- Enterprise buyers who need to show their AI development meets contractual obligations to their customers
A DevContract makes "done" a machine-verifiable claim. The agent submits its evidence bundle. The arbitration panel checks it against the contract's definition_of_done clauses. If all clauses pass, the contract is fulfilled. The signed evidence.json is the receipt.
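Here is a sketch of how that arbitration step could evaluate the definition_of_done clauses against an evidence bundle. The bundle's field names are assumptions chosen to line up with the contract example; the actual arbitration panel format is not specified here.

def contract_fulfilled(contract: dict, evidence: dict) -> tuple[bool, list[str]]:
    """Return (fulfilled, failed_clauses) for the contract's definition_of_done section."""
    checks = {
        "tests_pass": evidence.get("tests_pass", False),
        "no_contract_violations": len(evidence.get("violations", [])) == 0,
        "coverage_met": evidence.get("coverage", 0) >= contract["quality"]["test_coverage_min"],
        "pr_comment_posted": evidence.get("pr_comment_posted", False),
    }
    failed = [
        clause for clause, required in contract["definition_of_done"].items()
        if required and not checks.get(clause, False)
    ]
    return len(failed) == 0, failed

If every required clause passes, the overall verdict can be stamped contract_fulfilled; if not, the failed clause names are the natural candidates for the violations array in the receipt.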
This is exactly how you'd handle it with a human development house. Their deliverable is accepted when it meets the specification. The acceptance is documented. The documentation is the evidence that the work was done correctly. The only difference here is that the evidence is machine-generated rather than manually assembled.
What the evidence receipt looks like
This is not a pass/fail boolean. The evidence receipt is a structured document with clause-level verdicts, a log of every contract query made during execution, and a cryptographic signature. It is what an auditor gets. It is what an enterprise buyer gets. It is what makes AI development enterprise-grade.
{
"contract_id": "ticketyboo-scanner-v1.2",
"contractor": "claude-code",
"client": "fenderfonic",
"contract_hash": "sha256:abc123...",
"clause_results": {
"stack": "pass",
"architecture": "pass",
"security": "pass",
"cost": "pass",
"quality": "pass",
"compliance": "pass",
"definition_of_done": "pass"
},
"contract_queries": [
{
"timestamp": "2026-03-30T09:12:00Z",
"query": "Can I use Secrets Manager?",
"clause": "cost.forbidden_resources",
"verdict": "deny",
"action_taken": "Used SSM Parameter Store SecureString instead"
},
{
"timestamp": "2026-03-30T09:34:00Z",
"query": "Deploy to production?",
"clause": "security.production_deploy_gate",
"verdict": "gate",
"action_taken": "Paused — human approval obtained at 09:41"
}
],
"violations": [],
"overall_verdict": "contract_fulfilled",
"signature": "sha256:def456..."
}
Note the contract_queries array. The agent consulted the contract twice during this engagement. Both times it received a verdict. Both times it acted on it. The record of those consultations is part of the evidence. This is not just "the output passed governance." This is: "the contractor checked the contract before making these specific decisions, and here is what it did."
A human development house cannot produce this level of evidence without significant additional process overhead. An AI agent produces it automatically as a side effect of working under the contract.
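As an illustration of what an auditor's check could look like: the signing scheme is not specified above, so this sketch assumes an HMAC-SHA256 over a canonical JSON serialisation of the receipt body, with the key held outside the repository (in the demo, an SSM parameter).

import hashlib
import hmac
import json


def verify_receipt(receipt: dict, signing_key: bytes) -> bool:
    """Recompute the signature over everything except the signature field itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = "sha256:" + hmac.new(signing_key, canonical, hashlib.sha256).hexdigest()
    # Any tampering with clause results, query logs, or verdicts changes the digest.
    return hmac.compare_digest(expected, receipt["signature"])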
Your .clinerules file is already a proto-contract
If you use Roo, Cursor, or any other tool that supports a rules or steering file, you are already doing a version of this. The .clinerules or .cursorrules or equivalent file in your repo defines constraints: what stack to use, what patterns to follow, what services are forbidden.
That file is a contract expressed in prose. A DevContract is the same thing expressed as a Pydantic model with machine-readable fields, typed clauses, and a validation schema. The prose version cannot be evaluated programmatically. The structured version can.
You are already one step from this. The step is formalisation — moving from natural language constraints that the agent interprets, to structured clauses that the agent validates.
Who this matters most for
Developers building personal projects with AI tools do not need this level of rigour. A .clinerules file and some common sense are sufficient.
Three groups have a material need for it:
Regulated industries. Financial services, healthcare, government. Anywhere that "show your work" is a regulatory requirement. The evidence receipt is the mechanism. The contract is the specification that the auditor can inspect. The agent's compliance is documented at clause level, not asserted in a summary.
Enterprise technology buyers. Organisations that commission large-scale development programmes and need consistent quality standards regardless of who or what built the component. AI agents delivering on enterprise programmes will not be accepted without evidence of quality governance. The contract model provides that evidence.
PE-backed and M&A contexts. In a divestiture or acquisition, the technology estate is an asset being valued. Code produced under a documented, auditable process is worth more than code produced ad hoc. An AI contractor producing signed evidence receipts creates a defensible paper trail that code produced by uncontrolled agents cannot.
The only thing that changes is who's on the other side of the table
I have run technology functions through sixteen M&A events. I have commissioned development work across dozens of suppliers and teams. I have managed divestiture programmes where the quality of the technology estate directly affected deal value.
In every one of those contexts, the mechanism for governing development work was the same: a contract with clear terms, defined acceptance criteria, and documented evidence of compliance. The supplier's geography, headcount, or employment arrangement was irrelevant to the quality gate.
An AI agent is a new kind of contractor. The contract model is not new. The governance mechanism does not need to be invented. It needs to be applied.
The contract is yours. You define the stack. You define security. You define quality. You define done. The evidence is machine-generated. The auditor gets a receipt.
The only thing that changes is who's on the other side of the table.
The whole pipeline — devcontract.json → verify modes → EvidenceFinaliser → fragment_aggregator.py → Gate verdict — is built and tested at demos/ticketyboo/. Key artefacts: specToContract.ts (VS Code command), evidenceFinaliser.ts (VS Code watcher), fragment_aggregator.py (Gate-side Python), evidence-fragment.schema.json (JSON Schema draft-07), and the signing key at SSM /ticketyboo/evidence/signing-key. The governance proxy covers the runtime enforcement layer; this article describes the contract and evidence layer above it.
ticketyboo runs five governance agents on every pull request — Security, Cost, SRE, CTO, and Dependency. Evidence signed, audit trail complete.