The ticketyboo.dev platform was substantially built by AI agents working within a human-defined governance framework. The Terraform modules, the Lambda functions, the scanner logic, and the articles you're reading were all produced by AI agents acting on structured specifications — with humans reviewing, approving, and adjusting at key gates.

This is not a novelty. It's a repeatable engineering methodology. This article documents what we've learned about making agentic development work reliably, safely, and in a way that produces code you'd actually want to own.

What makes a development agent different from a code completion tool

Code completion (GitHub Copilot, Claude in an IDE) operates at the level of a single file or function. A development agent operates at the level of a task — it reads context across multiple files, plans a sequence of changes, executes them, and verifies the results.

The defining characteristic of an agent is the loop: Plan → Execute → Verify → Adjust. Each iteration of the loop produces a concrete artefact (a file, a command output, a test result) that informs the next iteration. Without this loop, you have autocomplete. With it, you have a collaborator that can take a task from specification to implementation.
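The loop can be sketched as a minimal control structure. This is illustrative only: `plan`, `execute`, and `verify` are hypothetical callables standing in for whatever the agent framework provides, and the artefact types are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Artefact:
    kind: str      # e.g. "file", "command_output", "test_result"
    content: str


def run_agent_loop(task: str, plan, execute, verify, max_iterations: int = 5) -> list[Artefact]:
    """Plan -> Execute -> Verify -> Adjust, feeding each artefact into the next pass."""
    artefacts: list[Artefact] = []
    for _ in range(max_iterations):
        steps = plan(task, artefacts)           # plan in light of prior artefacts
        artefacts.extend(execute(steps))        # each step yields a concrete artefact
        ok, feedback = verify(artefacts)
        if ok:
            break
        task = f"{task}\nAdjust for: {feedback}"  # fold verification feedback into the next plan
    return artefacts
```

The key property is that each pass consumes the artefacts of the previous one; remove that feedback edge and the loop degenerates into autocomplete.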

The specification-first pattern

Agentic development works best when the task is fully specified before execution begins. Ambiguity at the specification level produces ambiguous code — but unlike human ambiguity, AI ambiguity can be confident and subtly wrong.

The specification pattern we use has three documents:

requirements.md

User-facing functional requirements. Written in plain language, numbered for traceability. Each requirement has a unique ID that appears in both the design document and the task list. If a requirement can't be traced to a task, it won't get implemented.
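Traceability of this kind can be checked mechanically in CI. A minimal sketch, assuming requirement IDs follow a pattern like `REQ-001` and appear verbatim in both files (the ID format is an assumption, not the project's actual scheme):

```python
import re

# Assumed ID convention: "REQ-" followed by three digits.
REQ_ID = re.compile(r"\bREQ-\d{3}\b")


def untraced_requirements(requirements_md: str, tasks_md: str) -> set[str]:
    """Return requirement IDs that never appear in the task list."""
    required = set(REQ_ID.findall(requirements_md))
    covered = set(REQ_ID.findall(tasks_md))
    return required - covered
```

Run in CI, a non-empty result fails the build: the requirement exists but no task will implement it.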

design.md

Technical design document. Architecture diagrams (Mermaid), data models, interface definitions, key design decisions with rationale. This is the document that an agent reads to understand how to implement a requirement — not just what the output should be.

tasks.md

Atomic, ordered implementation tasks. Each task references the requirements it satisfies and the design sections it implements. Tasks are small enough that each one can be executed, reviewed, and approved independently. No task should take more than a few hundred lines of code.
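A task entry under this pattern might look like the following. The IDs, section numbers, and template are hypothetical, not the project's actual files:

```markdown
## Task 07 — Scan result pagination
Requirements: REQ-014, REQ-015
Design: design.md §4.2 (ScanResult data model)
Scope: add next-token handling to the results endpoint; ~150 LOC
Done when: unit tests pass; diff reviewed and approved
```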

Governing agent actions

The most important governance rule for agentic development: the agent can propose; only a human can approve. This applies at different granularities:

| Action type | Agent can | Human must |
| --- | --- | --- |
| Write code | Generate and apply | Review diff before merge |
| Run tests | Execute and report | Interpret failures |
| Deploy to production | Prepare and propose | Approve and trigger |
| Create AWS resources | Write Terraform, run plan | Review plan, approve apply |
| Rotate secrets | Identify compromised credentials | Rotate and propagate |
The irreversibility rule: Any action that is difficult or impossible to undo (deploying to production, deleting data, rotating credentials, modifying IAM policies) requires explicit human approval. Agents should be conservative by default — propose, don't execute, when in doubt.
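The propose/approve split can be encoded as a simple gate. This is a sketch under assumptions: the action names, the `IRREVERSIBLE` set, and the `run` callable are illustrative, not the platform's actual API.

```python
from dataclasses import dataclass

# Actions that are hard or impossible to undo always require a human in the loop.
IRREVERSIBLE = {"deploy_production", "delete_data", "rotate_credentials", "modify_iam"}


@dataclass(frozen=True)
class Proposal:
    action: str
    detail: str


def execute_action(proposal: Proposal, human_approved: bool, run) -> str:
    """Run reversible actions directly; irreversible ones only with explicit approval."""
    if proposal.action in IRREVERSIBLE and not human_approved:
        # Conservative default: propose, don't execute.
        return f"PENDING APPROVAL: {proposal.action} ({proposal.detail})"
    return run(proposal)
```

The useful property is that the conservative branch is the default: forgetting to pass approval blocks the action rather than letting it through.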

Context management: the unsolved problem

The biggest practical challenge in agentic development is context degradation. As a task gets longer, older context gets truncated or deprioritised. An agent that was given the right constraints at the start of a task may violate them by the end because the constraints have fallen outside its effective context window.

Mitigations we use include moving constraints out of the context window entirely, into automated tooling that enforces them no matter what the agent remembers.
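One complementary mitigation can be sketched as constraint pinning: re-inject the constraints at the top of every prompt and trim old conversation history instead. This is a minimal sketch; the character budget and message shapes are assumptions, not the tooling's real interface.

```python
def build_prompt(system_constraints: str, history: list[str], max_chars: int = 8000) -> str:
    """Pin constraints at the top of every prompt; drop the oldest history, never the constraints."""
    budget = max_chars - len(system_constraints)
    kept: list[str] = []
    used = 0
    for turn in reversed(history):       # keep the most recent turns first
        if used + len(turn) > budget:
            break
        kept.append(turn)
        used += len(turn)
    return "\n".join([system_constraints, *reversed(kept)])
```

Under this scheme the constraints can never fall out of the effective window; only history can.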

The coding standards contract

Agentic development produces consistent code only if the coding standards are explicit, machine-readable, and enforced by automated tools rather than post-hoc review. Having human reviewers catch style violations in agent-generated code wastes their time; the linter should catch them.

For this project, the standards contract includes:

# Python standards (enforced by ruff + mypy)
- Type hints on ALL function signatures (mypy --strict)
- Docstrings on all public functions (pydocstyle)
- logging.getLogger(__name__) — never print()
- Specific exception types — never bare except:
- All Lambda responses via _build_response() helper

# Infrastructure standards (enforced by tfsec)
- All resources tagged: Project, Environment, Owner
- No public S3 buckets
- SSE-S3 encryption on all storage
- No NAT gateways, WAF, KMS CMKs, VPC endpoints

When an agent generates code that violates these standards, CI fails and the agent is asked to fix the violation. The feedback loop is automated — no human needs to review every line for style compliance.
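A handler that satisfies the Python side of the contract might look like this. It is illustrative: `_build_response` here is a stand-in for the project's shared helper, and the event shape is an assumption.

```python
import json
import logging

logger = logging.getLogger(__name__)  # module logger, never print()


def _build_response(status_code: int, body: dict) -> dict:
    """Shape every Lambda response the same way (stand-in for the shared helper)."""
    return {"statusCode": status_code, "body": json.dumps(body)}


def handler(event: dict, context: object) -> dict:
    """Entry point: typed signature, docstring, specific exception handling."""
    try:
        repo = event["repository"]
    except KeyError:                      # specific exception type, never bare except:
        logger.warning("missing 'repository' in event")
        return _build_response(400, {"error": "repository is required"})
    logger.info("queueing scan for %s", repo)
    return _build_response(200, {"repository": repo, "status": "queued"})
```

Each rule in the contract maps to a line here, which is what makes the rules lintable in the first place.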

What ticketyboo.dev was built with

This platform was built using Roo (Claude-based), Kiro (Amazon Q-based), and direct Claude API calls for specific reasoning tasks. Each tool has different strengths: Roo is effective for implementation tasks with clear specifications; Kiro is strong for architecture review and cross-cutting constraint enforcement; direct API calls are used for quorum reasoning on design decisions.

The governance framework that shaped this development is open. The GATEKEEP specification, the .clinerules file, and the spec documents are all in the public repository. Copy them, adapt them, use them.

Related tools and articles

→ Multi-model reasoning
→ AI-assisted remediation
→ Governance as code
→ Scan your repository