The first version of the agentic fixer-bot sent everything to a mid-tier model. It worked. It was also expensive in a way that didn't scale: a one-line label fix costs the same as an architectural refactor if you don't think about routing.
The second version classified first.
## Three tiers of work
Every GitHub issue that arrives gets classified by estimated execution time before anything else happens. The classification drives everything downstream: which runtime, which model, which execution strategy.
| Tier | Estimated time | Runtime | Model |
|---|---|---|---|
| Simple | < 5 minutes | Lambda (direct) | Small model |
| Medium | 5 – 15 minutes | Step Functions | Small + mid-tier model |
| Complex | > 15 minutes | SQS → Worker | Mid-tier model |
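The routing in the table can be sketched as a small classifier. This is a hypothetical sketch, not the bot's actual code; the `Route` type and the tier/runtime/model names are illustrative, mirroring the table above.

```python
# Hypothetical sketch: route an issue by estimated execution time (minutes).
# Thresholds and names mirror the tier table; the real classifier also
# inspects the issue body, which is omitted here.
from dataclasses import dataclass

@dataclass
class Route:
    tier: str
    runtime: str
    model: str

def classify(estimated_minutes: float) -> Route:
    if estimated_minutes < 5:
        return Route("simple", "lambda", "small")
    if estimated_minutes <= 15:
        return Route("medium", "step-functions", "small+mid")
    return Route("complex", "sqs-worker", "mid")
```

The point of keeping this function tiny is that everything downstream branches on its output, so it needs to be cheap and predictable.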
Simple issues run entirely inside Lambda. No queue, no state machine, no extra infrastructure. If it's a one-line fix or a documentation update, direct Lambda execution is faster and cheaper than standing up an orchestration layer around it.
Medium issues need multi-step orchestration without hitting Lambda's 15-minute timeout. Step Functions handles the state transitions (spec generation, task execution, PR creation) as discrete steps with retry logic at each stage.
Complex issues go to an SQS queue and are picked up by long-running workers. These are the architectural changes, the major refactors, the things that might take an hour and produce fifty file changes. Workers have no effective timeout; they run until the task completes or explicitly fails.
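A worker at this tier is essentially a long-poll loop. The sketch below is an assumption about the shape of that loop, not the bot's real code: the message fields (`issue_number`, `repo`, `tier`) and the handler interface are hypothetical, and the SQS client is passed in so the parsing logic stays testable.

```python
# Hypothetical worker loop: long-poll SQS, parse the job, run it with no
# timeout, delete the message only after the handler returns.
import json

def parse_job(body: str) -> dict:
    """Extract the fields a worker needs from an SQS message body (assumed shape)."""
    msg = json.loads(body)
    return {
        "issue": msg["issue_number"],
        "repo": msg["repo"],
        "tier": msg.get("tier", "complex"),
    }

def worker_loop(sqs, queue_url: str, handle) -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling: block up to 20s per receive
        )
        for m in resp.get("Messages", []):
            handle(parse_job(m["Body"]))
            # Delete only after success so a crashed worker lets the
            # message reappear after its visibility timeout.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```

Deleting after the handler returns is what gives the "runs until it completes or explicitly fails" semantics: a crash mid-task just returns the message to the queue.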
## The nine-step flow
Once a tier is assigned, execution follows a deterministic sequence regardless of the runtime used:
1. GitHub issue created or labelled with the trigger label
2. Lambda reads SSM config: is the system enabled? Under the concurrency limit? Is a label required?
3. Validate: pass all three checks or abort
4. Spec generator runs (Claude 3.7 Sonnet): reads the issue, produces a YAML task breakdown with 5–20 tasks depending on complexity, and estimates execution time
5. Task executor runs DAG-based parallel execution: tasks without dependencies run concurrently; tasks with dependencies run sequentially after their prerequisites
6. Files created and committed to a branch (`auto-fix/issue-{number}`)
7. PR created with an auto-generated description linking back to the issue
8. GitHub Project Board synced via the GraphQL API: issue and PR both added, status updated
9. Trigger label removed; `agentic-processing` label added to prevent re-processing
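Steps 2 and 3 collapse into a single gate function. This is a hypothetical sketch: `should_process` and its arguments are illustrative, with the config keys matching the SSM parameters the system exposes.

```python
# Hypothetical sketch of the three-gate validation: enabled, under the
# concurrency cap, and (optionally) carrying the trigger label.
def should_process(config: dict, in_flight: int, labels: list[str],
                   trigger_label: str = "auto-fix") -> bool:
    if not config.get("enabled", False):
        return False  # global kill switch
    if in_flight >= int(config.get("max_concurrent", 1)):
        return False  # concurrency cap reached
    if config.get("require_label", True) and trigger_label not in labels:
        return False  # explicit opt-in label missing
    return True
```

Failing fast here is deliberate: the gate runs before any model call, so a disabled or saturated system costs nothing per incoming issue.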
Steps 4 and 5 are where most of the interesting work happens. The spec generator uses a mid-tier model even for simple issues because spec quality determines everything downstream: a badly decomposed spec produces bad code regardless of which model executes it. The task executor uses Haiku for simple tasks (rename this variable, update this comment) and Sonnet for tasks that require understanding existing code patterns.
## DAG-based parallel execution
The task executor builds a directed acyclic graph from the spec's dependency declarations. Tasks that don't depend on each other run in parallel using a thread pool. Tasks that depend on other tasks wait for their prerequisites to complete.
```yaml
# Simplified task spec structure
tasks:
  - id: add-rate-limit-middleware
    description: "Implement rate limiting middleware"
    depends_on: []
    model: haiku
  - id: add-rate-limit-tests
    description: "Write tests for the rate limit middleware"
    depends_on: [add-rate-limit-middleware]
    model: haiku
  - id: update-api-docs
    description: "Update OpenAPI spec to document rate limit headers"
    depends_on: [add-rate-limit-middleware]
    model: haiku
  - id: update-readme
    description: "Add rate limiting section to README"
    depends_on: []
    model: haiku
```
In this example, `add-rate-limit-middleware` and `update-readme` run in parallel. `add-rate-limit-tests` and `update-api-docs` both wait for the middleware task to complete, then run in parallel with each other. The total execution time is roughly middleware time + max(tests time, docs time), rather than the sum of all four.
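The scheduling loop described above can be sketched with a thread pool: submit every task whose dependencies have finished, wait for any one to complete, repeat. This is a minimal sketch under assumed names (`run_dag`, `execute`), not the bot's actual executor.

```python
# Hypothetical sketch of DAG-based parallel execution over a task spec.
# Each task is a dict with at least "id" and "depends_on".
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_dag(tasks: list[dict], execute, max_workers: int = 4) -> list[str]:
    """Run tasks respecting depends_on; return task ids in completion order."""
    pending = {t["id"]: t for t in tasks}
    done: set[str] = set()
    futures: dict = {}   # future -> task id
    finished_order: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or futures:
            # Submit every task whose prerequisites are all done.
            ready = [tid for tid, t in pending.items()
                     if set(t["depends_on"]) <= done]
            for tid in ready:
                futures[pool.submit(execute, pending.pop(tid))] = tid
            if not futures:
                raise ValueError("cycle or missing dependency in task spec")
            completed, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in completed:
                tid = futures.pop(fut)
                fut.result()  # propagate task errors
                done.add(tid)
                finished_order.append(tid)
    return finished_order
```

The `raise` guards against a malformed spec: if nothing is running and nothing is ready, the dependency graph has a cycle or references a task that doesn't exist.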
## Runtime controls via SSM
The system runs with a throttle layer in SSM Parameter Store. Five parameters control behaviour at runtime without requiring a redeploy:
- `enabled`: global on/off switch
- `max_concurrent`: maximum number of issues processing simultaneously
- `require_label`: whether an explicit trigger label is required
- `dry_run`: generates specs and plans but doesn't commit code or create PRs
- `model_override`: force all tasks to a specific model for cost testing
The SSM check happens at the start of every invocation. Changing `max_concurrent` from 1 to 5 takes effect on the next issue that arrives. No deployment, no Lambda update.
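The per-invocation read can be sketched as a fetch-and-coerce step. The parameter path `/fixer-bot/` and the string value formats are assumptions; SSM stores values as strings, so each one needs coercing into a typed config with safe defaults.

```python
# Hypothetical sketch: load the five control parameters at invocation start.
def parse_params(raw: dict) -> dict:
    """Coerce string-valued SSM parameters into typed config with safe defaults."""
    return {
        "enabled": raw.get("enabled", "false").lower() == "true",
        "max_concurrent": int(raw.get("max_concurrent", "1")),
        "require_label": raw.get("require_label", "true").lower() == "true",
        "dry_run": raw.get("dry_run", "false").lower() == "true",
        "model_override": raw.get("model_override") or None,  # "" means unset
    }

def load_config(ssm, path: str = "/fixer-bot/") -> dict:
    """Fetch all parameters under an assumed path prefix and parse them."""
    resp = ssm.get_parameters_by_path(Path=path)
    raw = {p["Name"].rsplit("/", 1)[-1]: p["Value"] for p in resp["Parameters"]}
    return parse_params(raw)
```

Defaulting `enabled` to false is a deliberate choice in this sketch: a missing or misconfigured parameter store disables the system rather than running it unthrottled.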
## What it costs
Haiku is roughly 20x cheaper than Sonnet per token at current pricing. A simple 5-task issue that routes through Haiku costs around $0.02–0.05. A complex 20-task issue that routes through Sonnet costs $0.50–2.00. Without the classifier, everything would run through Sonnet and the economics don't work at any meaningful volume.
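The per-issue arithmetic is just a sum over (model, tokens) pairs. The sketch below is illustrative: the price table is an argument rather than a constant because the actual per-token rates change, and the figures used in the test are made-up round numbers, not published pricing.

```python
# Hypothetical cost sketch: total USD cost of one issue, given which model
# handled each task and how many output tokens it produced.
def issue_cost(tasks: list[tuple[str, int]], price_per_1k: dict[str, float]) -> float:
    """tasks: (model, output_tokens) pairs; price_per_1k: USD per 1K output tokens."""
    return sum(price_per_1k[model] * tokens / 1000 for model, tokens in tasks)
```

With a ~20x price gap between the models, the routing decision dominates this sum: pushing a 20-task issue's execution tokens through the cheaper model is what keeps the simple tier in the cents range.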
Spec generation always uses a mid-tier model regardless of tier. Spec quality determines every downstream task: a cheap spec that misreads the issue produces code that has to be redone.