I've commissioned development work from houses on four continents. Onshore, nearshore, offshore, staff aug, fixed-price, T&M, consultancy, solo contractor. I've never once handed any of them a prompt and hoped for the best.
What I've handed them, every time, is a contract. Statement of work. Technical specification. Acceptance criteria. Definition of done. The developer on the other end signs it, builds against it, and delivers something I can evaluate against it. If they violate the contract, that's not a code review comment. That's a contractual failure.
I use Claude as a developer. I use Codex, Cursor, and a handful of other agentic tools. I treat every one of them exactly the same way I'd treat a contractor. The origin of the deliverable — human, nearshore, AI — is irrelevant to the quality gate. The contract is the contract.
What follows is what that looks like in practice, and why it produces fundamentally different results than "add some guardrails and review the PR."
What everyone's actually doing
Here is the current state of AI-assisted development governance at most organisations that have thought about it at all:
You describe what you want in a prompt. The agent executes. You review the output. You decide if it's "done." Maybe you have some guardrails that block particularly dangerous actions at runtime. Maybe you have a governance agent that reviews the PR and posts a comment.
That's an audit. The work is already done. The agent already made its decisions — about your stack, your architecture, your security posture, your cost footprint. You're reviewing the consequences of those decisions after the fact.
Nobody would commission a development house that way. You wouldn't hand Accenture a one-paragraph brief and review their output six weeks later to see if it met your standards. The reason you wouldn't do that is obvious: by the time you review it, it's too late. The decisions are baked in.
The same logic applies to AI agents, with one important difference: AI agents can operate at a velocity that makes the "review at the end" model catastrophically inadequate. An agent working for eight hours uncontrolled can make thousands of decisions that compound on each other. A human developer working for eight hours makes dozens.
Governance applied after the fact is an audit
There is a precise distinction worth drawing here, because conflating these two things is what leads to the current state of the industry.
An audit happens after the deliverable exists. It checks whether what was built meets a standard. It produces findings. It may result in remediation. It is valuable, necessary, and not sufficient.
A contract happens before the work starts. It defines the standard the work must meet. The contractor builds against it. The contract is the acceptance criteria. The audit is checking whether the contract was honoured.
Guardrails are a runtime audit. They block specific actions that violate policy. They are a last line of defence — the safety net below the trapeze. They are not a substitute for the trapeze itself being built correctly. A contractor who has read the SOW doesn't build the wrong thing. A contractor who hasn't read the SOW gets caught by the safety net after they fall.
The goal is not more safety nets. The goal is an agent that read the brief.
What a contract actually looks like
A DevContract is not a prompt. It is not a long system message. It is a structured document with machine-readable clauses that can be validated programmatically.
The mental model is Pydantic. When you define a BaseModel, you are saying: "any data that comes through this must look like this." A DevContract is the same thing applied to a development engagement: "any deliverable from this agent must look like this."
The contract is the schema. The deliverable is the payload.
{
"contract_version": "1.0",
"project": "ticketyboo-scanner",
"parties": {
"client": "fenderfonic",
"contractor": "claude-code"
},
"stack": {
"language": "python",
"language_version": "3.12",
"runtime": "aws_lambda",
"region": "eu-north-1"
},
"security": {
"auth_mechanism": "cognito_jwt",
"secrets_store": "ssm_parameter_store",
"production_deploy_gate": true
},
"cost": {
"budget_envelope": "aws_free_tier",
"forbidden_resources": [
"nat_gateway", "rds", "secrets_manager", "kms_cmk"
]
},
"quality": {
"type_hints": "required",
"test_coverage_min": 80,
"logging_standard": "module_logger_only"
},
"compliance": {
"frameworks": ["gdpr"]
},
"definition_of_done": {
"tests_pass": true,
"no_contract_violations": true,
"coverage_met": true,
"pr_comment_posted": true
}
}
Every field is typed. Every field is versioned alongside the code. Every field is machine-queryable by the agent during execution. When the contract changes — when the acceptable Python version moves from 3.11 to 3.12, when GDPR compliance becomes required, when the budget envelope tightens — the diff shows exactly what changed and who approved it.
Compare this to updating a system prompt. A system prompt change is invisible in version control. It is not auditable. It has no typed schema. It cannot be validated. It cannot be diffed in a meaningful way. It is not a contract.
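To make the Pydantic analogy concrete, here is a minimal sketch, assuming Pydantic v2, of how two of the clauses above could be typed. The class names and the loading step are illustrative rather than the actual DevContract schema; only the stack and cost clauses are modelled.

from pathlib import Path
from typing import List

from pydantic import BaseModel, Field


class StackClause(BaseModel):
    language: str
    language_version: str
    runtime: str
    region: str


class CostClause(BaseModel):
    budget_envelope: str
    forbidden_resources: List[str] = Field(default_factory=list)


class DevContract(BaseModel):
    # Only two clauses are sketched here; extra fields in the JSON are
    # ignored by Pydantic's default behaviour.
    contract_version: str
    project: str
    stack: StackClause
    cost: CostClause


# Validation is the point: a malformed contract fails before any work starts.
contract = DevContract.model_validate_json(Path("devcontract.json").read_text())

If the JSON drops a required field or changes the type of one, validation fails immediately, which is the behaviour you want from a contract and cannot get from a prompt.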
Pre-execution is the key property
The contract is delivered to the agent before the work starts. This is not a subtle point. It is the entire point.
A good contractor who has read the SOW doesn't store credentials in environment variables and wait to be told that's wrong. They check the contract before making the decision. "I need to store a secret. My contract says SSM Parameter Store only, no Secrets Manager. I'll use SSM." The violation never happens. There is nothing to catch.
This is what the contract proxy enables. During execution, the agent can query any clause before making a material decision:
Agent queries: "Can I create an RDS instance here?"
Contract proxy: DENY
clause: cost.forbidden_resources
reason: RDS is forbidden. Use DynamoDB.
action_taken: [agent proceeds with DynamoDB]
logged_to: evidence_ledger#entry_0042
Agent queries: "I'm about to deploy to production."
Contract proxy: GATE
clause: security.production_deploy_gate
reason: Production deployments require human approval per contract.
action: paused — awaiting human sign-off
Every query and verdict is appended to the evidence ledger. The agent is not blocked by surprise. It consulted the contract. It received guidance. It acted accordingly. The ledger is the record of that consultation.
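A sketch of what that consultation could look like from the agent's side, in Python. The function, the verdict values, and the in-memory ledger are assumptions for illustration; the article does not define the proxy's actual API.

from dataclasses import dataclass
from typing import Literal


@dataclass
class Verdict:
    decision: Literal["allow", "deny", "gate"]
    clause: str
    reason: str


EVIDENCE_LEDGER: list[dict] = []


def check_resource(contract: dict, resource: str) -> Verdict:
    """Consult the cost clause before creating a resource, and record the consultation."""
    forbidden = contract["cost"]["forbidden_resources"]
    if resource in forbidden:
        verdict = Verdict("deny", "cost.forbidden_resources", f"{resource} is forbidden by contract")
    else:
        verdict = Verdict("allow", "cost.forbidden_resources", f"{resource} is permitted")
    # Every consultation is appended to the ledger, whatever the outcome.
    EVIDENCE_LEDGER.append({
        "query": f"Can I create {resource}?",
        "clause": verdict.clause,
        "verdict": verdict.decision,
        "reason": verdict.reason,
    })
    return verdict


# check_resource(contract, "rds") -> deny; the agent proceeds with DynamoDB instead.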
The definition of done problem
With any AI agent tool, "done" is whatever the agent decides it is. There are no machine-readable acceptance criteria. You review the output yourself and decide. That model works for one developer on one project. It does not scale to:
- Multiple agents working in parallel on different components
- Regulated environments where "done" must be demonstrable to an auditor
- Teams where "done" must be consistent whether the contributor is human or AI
- Enterprise buyers who need to show their AI development meets contractual obligations to their customers
A DevContract makes "done" a machine-verifiable claim. The agent submits its evidence bundle. The arbitration panel checks it against the contract's definition_of_done clauses. If all clauses pass, the contract is fulfilled. The signed evidence.json is the receipt.
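Here is a sketch of how that arbitration step could evaluate the definition_of_done clauses against an evidence bundle. The bundle's field names are assumptions chosen to line up with the contract example; the actual arbitration panel format is not specified here.

def contract_fulfilled(contract: dict, evidence: dict) -> tuple[bool, list[str]]:
    """Return (fulfilled, failed_clauses) for the contract's definition_of_done section."""
    checks = {
        "tests_pass": evidence.get("tests_pass", False),
        "no_contract_violations": len(evidence.get("violations", [])) == 0,
        "coverage_met": evidence.get("coverage", 0) >= contract["quality"]["test_coverage_min"],
        "pr_comment_posted": evidence.get("pr_comment_posted", False),
    }
    failed = [
        clause for clause, required in contract["definition_of_done"].items()
        if required and not checks.get(clause, False)
    ]
    return len(failed) == 0, failed

If every required clause passes, the overall verdict can be stamped contract_fulfilled; if not, the failed clause names are the natural candidates for the violations array in the receipt.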
This is exactly how you'd handle it with a human development house. Their deliverable is accepted when it meets the specification. The acceptance is documented. The documentation is the evidence that the work was done correctly. The only difference here is that the evidence is machine-generated rather than manually assembled.
What the evidence receipt looks like
This is not a pass/fail boolean. The evidence receipt is a structured document with clause-level verdicts, a log of every contract query made during execution, and a cryptographic signature. It is what an auditor gets. It is what an enterprise buyer gets. It is what makes AI development enterprise-grade.
{
"contract_id": "ticketyboo-scanner-v1.2",
"contractor": "claude-code",
"client": "fenderfonic",
"contract_hash": "sha256:abc123...",
"clause_results": {
"stack": "pass",
"architecture": "pass",
"security": "pass",
"cost": "pass",
"quality": "pass",
"compliance": "pass",
"definition_of_done": "pass"
},
"contract_queries": [
{
"timestamp": "2026-03-30T09:12:00Z",
"query": "Can I use Secrets Manager?",
"clause": "cost.forbidden_resources",
"verdict": "deny",
"action_taken": "Used SSM Parameter Store SecureString instead"
},
{
"timestamp": "2026-03-30T09:34:00Z",
"query": "Deploy to production?",
"clause": "security.production_deploy_gate",
"verdict": "gate",
"action_taken": "Paused — human approval obtained at 09:41"
}
],
"violations": [],
"overall_verdict": "contract_fulfilled",
"signature": "sha256:def456..."
}
Note the contract_queries array. The agent consulted the contract twice during this engagement. Both times it received a verdict. Both times it acted on it. The record of those consultations is part of the evidence. This is not just "the output passed governance." This is: "the contractor checked the contract before making these specific decisions, and here is what it did."
A human development house cannot produce this level of evidence without significant additional process overhead. An AI agent produces it automatically as a side effect of working under the contract.
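As an illustration of what an auditor's check could look like: the signing scheme is not specified above, so this sketch assumes an HMAC-SHA256 over a canonical JSON serialisation of the receipt body, with the key held outside the repository (in the demo, an SSM parameter).

import hashlib
import hmac
import json


def verify_receipt(receipt: dict, signing_key: bytes) -> bool:
    """Recompute the signature over everything except the signature field itself."""
    body = {k: v for k, v in receipt.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    expected = "sha256:" + hmac.new(signing_key, canonical, hashlib.sha256).hexdigest()
    # Any tampering with clause results, query logs, or verdicts changes the digest.
    return hmac.compare_digest(expected, receipt["signature"])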
Your .clinerules file is already a proto-contract
If you use Roo, Cursor, or any other tool that supports a rules or steering file, you are already doing a version of this. The .clinerules or .cursorrules or equivalent file in your repo defines constraints: what stack to use, what patterns to follow, what services are forbidden.
That file is a contract expressed in prose. A DevContract is the same thing expressed as a Pydantic model with machine-readable fields, typed clauses, and a validation schema. The prose version cannot be evaluated programmatically. The structured version can.
You are already one step from this. The step is formalisation — moving from natural language constraints that the agent interprets, to structured clauses that the agent validates.
Who this matters most for
Developers building personal projects with AI tools do not need this level of rigour. A .clinerules file and some common sense are sufficient.
Three groups have a material need for it:
Regulated industries. Financial services, healthcare, government. Anywhere that "show your work" is a regulatory requirement. The evidence receipt is the mechanism. The contract is the specification that the auditor can inspect. The agent's compliance is documented at clause level, not asserted in a summary.
Enterprise technology buyers. Organisations that commission large-scale development programmes and need consistent quality standards regardless of who or what built the component. AI agents delivering on enterprise programmes will not be accepted without evidence of quality governance. The contract model provides that evidence.
PE-backed and M&A contexts. In a divestiture or acquisition, the technology estate is an asset being valued. Code produced under a documented, auditable process is worth more than code produced ad hoc. An AI contractor producing signed evidence receipts creates a defensible paper trail that code produced by uncontrolled agents cannot.
The only thing that changes is who's on the other side of the table
I have run technology functions through sixteen M&A events. I have commissioned development work across dozens of suppliers and teams. I have managed divestiture programmes where the quality of the technology estate directly affected deal value.
In every one of those contexts, the mechanism for governing development work was the same: a contract with clear terms, defined acceptance criteria, and documented evidence of compliance. The supplier's geography, headcount, or employment arrangement was irrelevant to the quality gate.
An AI agent is a new kind of contractor. The contract model is not new. The governance mechanism does not need to be invented. It needs to be applied.
The contract is yours. You define the stack. You define security. You define quality. You define done. The evidence is machine-generated. The auditor gets a receipt.
The only thing that changes is who's on the other side of the table.
The whole pipeline — devcontract.json → verify modes → EvidenceFinaliser → fragment_aggregator.py → Gate verdict — is built and tested at demos/ticketyboo/. Key artefacts: specToContract.ts (VS Code command), evidenceFinaliser.ts (VS Code watcher), fragment_aggregator.py (Gate-side Python), evidence-fragment.schema.json (JSON Schema draft-07), and the signing key at SSM /ticketyboo/evidence/signing-key. The governance proxy covers the runtime enforcement layer; this article describes the contract and evidence layer above it.
ticketyboo runs five governance agents on every pull request — Security, Cost, SRE, CTO, and Dependency. Evidence signed, audit trail complete.