The ticketyboo.dev platform was substantially built by AI agents working within a human-defined governance framework. The Terraform modules, the Lambda functions, the scanner logic, and the articles you're reading were all produced by AI agents acting on structured specifications — with humans reviewing, approving, and adjusting at key gates.

This is not a novelty. It's a repeatable engineering methodology. This article documents what we've learned about making agentic development work reliably, safely, and in a way that produces code you'd actually want to own.

What makes a development agent different from a code completion tool

Code completion (GitHub Copilot, Claude in an IDE) operates at the level of a single file or function. A development agent operates at the level of a task — it reads context across multiple files, plans a sequence of changes, executes them, and verifies the results.

The defining characteristic of an agent is the loop: Plan → Execute → Verify → Adjust. Each iteration of the loop produces a concrete artefact (a file, a command output, a test result) that informs the next iteration. Without this loop, you have autocomplete. With it, you have a collaborator that can take a task from specification to implementation.
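The loop can be sketched as a minimal control structure. This is illustrative only: `plan`, `execute`, and `verify` are hypothetical callables standing in for whatever the agent framework provides, and the artefact types are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Artefact:
    kind: str      # e.g. "file", "command_output", "test_result"
    content: str


def run_agent_loop(task: str, plan, execute, verify, max_iterations: int = 5) -> list[Artefact]:
    """Plan -> Execute -> Verify -> Adjust, feeding each artefact into the next pass."""
    artefacts: list[Artefact] = []
    for _ in range(max_iterations):
        steps = plan(task, artefacts)           # plan in light of prior artefacts
        artefacts.extend(execute(steps))        # each step yields a concrete artefact
        ok, feedback = verify(artefacts)
        if ok:
            break
        task = f"{task}\nAdjust for: {feedback}"  # fold verification feedback into the next plan
    return artefacts
```

The key property is that each pass consumes the artefacts of the previous one; remove that feedback edge and the loop degenerates into autocomplete.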

The specification-first pattern

Agentic development works best when the task is fully specified before execution begins. Ambiguity at the specification level produces ambiguous code — but unlike human ambiguity, AI ambiguity can be confident and subtly wrong.

The specification pattern we use has three documents:

requirements.md

User-facing functional requirements. Written in plain language, numbered for traceability. Each requirement has a unique ID that appears in both the design document and the task list. If a requirement can't be traced to a task, it won't get implemented.
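Traceability of this kind can be checked mechanically in CI. A minimal sketch, assuming requirement IDs follow a pattern like `REQ-001` and appear verbatim in both files (the ID format is an assumption, not the project's actual scheme):

```python
import re

# Assumed ID convention: "REQ-" followed by three digits.
REQ_ID = re.compile(r"\bREQ-\d{3}\b")


def untraced_requirements(requirements_md: str, tasks_md: str) -> set[str]:
    """Return requirement IDs that never appear in the task list."""
    required = set(REQ_ID.findall(requirements_md))
    covered = set(REQ_ID.findall(tasks_md))
    return required - covered
```

Run in CI, a non-empty result fails the build: the requirement exists but no task will implement it.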

design.md

Technical design document. Architecture diagrams (Mermaid), data models, interface definitions, key design decisions with rationale. This is the document that an agent reads to understand how to implement a requirement — not just what the output should be.

tasks.md

Atomic, ordered implementation tasks. Each task references the requirements it satisfies and the design sections it implements. Tasks are small enough that each one can be executed, reviewed, and approved independently. No task should take more than a few hundred lines of code.
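A task entry under this pattern might look like the following. The IDs, section numbers, and template are hypothetical, not the project's actual files:

```markdown
## Task 07 — Scan result pagination
Requirements: REQ-014, REQ-015
Design: design.md §4.2 (ScanResult data model)
Scope: add next-token handling to the results endpoint; ~150 LOC
Done when: unit tests pass; diff reviewed and approved
```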

Governing agent actions

The most important governance rule for agentic development: the agent can propose; only a human can approve. This applies at different granularities:

| Action type | Agent can | Human must |
| --- | --- | --- |
| Write code | Generate and apply | Review diff before merge |
| Run tests | Execute and report | Interpret failures |
| Deploy to production | Prepare and propose | Approve and trigger |
| Create AWS resources | Write Terraform, run plan | Review plan, approve apply |
| Rotate secrets | Identify compromised credentials | Rotate and propagate |
The irreversibility rule: Any action that is difficult or impossible to undo (deploying to production, deleting data, rotating credentials, modifying IAM policies) requires explicit human approval. Agents should be conservative by default — propose, don't execute, when in doubt.
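The propose/approve split can be encoded as a simple gate. This is a sketch under assumptions: the action names, the `IRREVERSIBLE` set, and the `run` callable are illustrative, not the platform's actual API.

```python
from dataclasses import dataclass

# Actions that are hard or impossible to undo always require a human in the loop.
IRREVERSIBLE = {"deploy_production", "delete_data", "rotate_credentials", "modify_iam"}


@dataclass(frozen=True)
class Proposal:
    action: str
    detail: str


def execute_action(proposal: Proposal, human_approved: bool, run) -> str:
    """Run reversible actions directly; irreversible ones only with explicit approval."""
    if proposal.action in IRREVERSIBLE and not human_approved:
        # Conservative default: propose, don't execute.
        return f"PENDING APPROVAL: {proposal.action} ({proposal.detail})"
    return run(proposal)
```

The useful property is that the conservative branch is the default: forgetting to pass approval blocks the action rather than letting it through.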

Context management: the unsolved problem

The biggest practical challenge in agentic development is context degradation. As a task gets longer, older context gets truncated or deprioritised. An agent that was given the right constraints at the start of a task may violate them by the end because the constraints have fallen outside its effective context window.

Mitigations we use include moving constraints out of the context window entirely, into automated tooling that enforces them no matter what the agent remembers.
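One complementary mitigation can be sketched as constraint pinning: re-inject the constraints at the top of every prompt and trim old conversation history instead. This is a minimal sketch; the character budget and message shapes are assumptions, not the tooling's real interface.

```python
def build_prompt(system_constraints: str, history: list[str], max_chars: int = 8000) -> str:
    """Pin constraints at the top of every prompt; drop the oldest history, never the constraints."""
    budget = max_chars - len(system_constraints)
    kept: list[str] = []
    used = 0
    for turn in reversed(history):       # keep the most recent turns first
        if used + len(turn) > budget:
            break
        kept.append(turn)
        used += len(turn)
    return "\n".join([system_constraints, *reversed(kept)])
```

Under this scheme the constraints can never fall out of the effective window; only history can.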

The coding standards contract

Agentic development produces consistent code only if the coding standards are explicit, machine-readable, and enforced by automated tools rather than post-hoc review. Having human reviewers catch style violations in agent-generated code wastes their time; the linter should catch them.

For this project, the standards contract includes:

# Python standards (enforced by ruff + mypy)
- Type hints on ALL function signatures (mypy --strict)
- Docstrings on all public functions (pydocstyle)
- logging.getLogger(__name__) — never print()
- Specific exception types — never bare except:
- All Lambda responses via _build_response() helper

# Infrastructure standards (enforced by tfsec)
- All resources tagged: Project, Environment, Owner
- No public S3 buckets
- SSE-S3 encryption on all storage
- No NAT gateways, WAF, KMS CMKs, VPC endpoints

When an agent generates code that violates these standards, CI fails and the agent is asked to fix the violation. The feedback loop is automated — no human needs to review every line for style compliance.
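A handler that satisfies the Python side of the contract might look like this. It is illustrative: `_build_response` here is a stand-in for the project's shared helper, and the event shape is an assumption.

```python
import json
import logging

logger = logging.getLogger(__name__)  # module logger, never print()


def _build_response(status_code: int, body: dict) -> dict:
    """Shape every Lambda response the same way (stand-in for the shared helper)."""
    return {"statusCode": status_code, "body": json.dumps(body)}


def handler(event: dict, context: object) -> dict:
    """Entry point: typed signature, docstring, specific exception handling."""
    try:
        repo = event["repository"]
    except KeyError:                      # specific exception type, never bare except:
        logger.warning("missing 'repository' in event")
        return _build_response(400, {"error": "repository is required"})
    logger.info("queueing scan for %s", repo)
    return _build_response(200, {"repository": repo, "status": "queued"})
```

Each rule in the contract maps to a line here, which is what makes the rules lintable in the first place.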

What ticketyboo.dev was built with

This platform was built using Roo (Claude-based), Kiro (Amazon Q-based), and direct Claude API calls for specific reasoning tasks. Each tool has different strengths: Roo is effective for implementation tasks with clear specifications; Kiro is strong for architecture review and cross-cutting constraint enforcement; direct API calls are used for quorum reasoning on design decisions.

The governance framework that shaped this development is open. The GATEKEEP specification, the .clinerules file, and the spec documents are all in the public repository. Copy them, adapt them, use them.

Related tools and articles

→ Multi-model reasoning
→ AI-assisted remediation
→ Governance as code
→ Scan your repository