The first version of the agentic fixer-bot sent everything to a mid-tier model. It worked. It was also expensive in a way that didn't scale: a one-line label fix costs the same as an architectural refactor if you don't think about routing.
The second version classified first.
## Three tiers of work
Every GitHub issue that arrives gets classified by estimated execution time before anything else happens. The classification drives everything downstream: which runtime, which model, which execution strategy.
| Tier | Estimated time | Runtime | Model |
|---|---|---|---|
| Simple | < 5 minutes | Lambda (direct) | Small model |
| Medium | 5 – 15 minutes | Step Functions | Small + mid-tier model |
| Complex | > 15 minutes | SQS → Worker | Mid-tier model |
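The routing in the table can be sketched as a small classifier. This is a hypothetical sketch, not the bot's actual code; the `Route` type and the tier/runtime/model names are illustrative, mirroring the table above.

```python
# Hypothetical sketch: route an issue by estimated execution time (minutes).
# Thresholds and names mirror the tier table; the real classifier also
# inspects the issue body, which is omitted here.
from dataclasses import dataclass

@dataclass
class Route:
    tier: str
    runtime: str
    model: str

def classify(estimated_minutes: float) -> Route:
    if estimated_minutes < 5:
        return Route("simple", "lambda", "small")
    if estimated_minutes <= 15:
        return Route("medium", "step-functions", "small+mid")
    return Route("complex", "sqs-worker", "mid")
```

The point of keeping this function tiny is that everything downstream branches on its output, so it needs to be cheap and predictable.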
Simple issues run entirely inside Lambda. No queue, no state machine, no extra infrastructure. If it's a one-line fix or a documentation update, direct Lambda execution is faster and cheaper than standing up an orchestration layer around it.
Medium issues need multi-step orchestration without hitting Lambda's 15-minute timeout. Step Functions handles the state transitions (spec generation, task execution, PR creation) as discrete steps with retry logic at each stage.
Complex issues go to an SQS queue and are picked up by long-running workers. These are the architectural changes, the major refactors, the things that might take an hour and produce fifty file changes. Workers have no effective timeout; they run until the task completes or explicitly fails.
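A worker at this tier is essentially a long-poll loop. The sketch below is an assumption about the shape of that loop, not the bot's real code: the message fields (`issue_number`, `repo`, `tier`) and the handler interface are hypothetical, and the SQS client is passed in so the parsing logic stays testable.

```python
# Hypothetical worker loop: long-poll SQS, parse the job, run it with no
# timeout, delete the message only after the handler returns.
import json

def parse_job(body: str) -> dict:
    """Extract the fields a worker needs from an SQS message body (assumed shape)."""
    msg = json.loads(body)
    return {
        "issue": msg["issue_number"],
        "repo": msg["repo"],
        "tier": msg.get("tier", "complex"),
    }

def worker_loop(sqs, queue_url: str, handle) -> None:
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=20,  # long polling: block up to 20s per receive
        )
        for m in resp.get("Messages", []):
            handle(parse_job(m["Body"]))
            # Delete only after success so a crashed worker lets the
            # message reappear after its visibility timeout.
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=m["ReceiptHandle"])
```

Deleting after the handler returns is what gives the "runs until it completes or explicitly fails" semantics: a crash mid-task just returns the message to the queue.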
## The nine-step flow
Once a tier is assigned, execution follows a deterministic sequence regardless of the runtime used:
1. GitHub issue created or labelled with the trigger label
2. Lambda reads SSM config: is the system enabled? Under the concurrency limit? Is a label required?
3. Validate: pass all three checks or abort
4. Spec generator runs (Claude 3.7 Sonnet): reads the issue, produces a YAML task breakdown with 5–20 tasks depending on complexity, and estimates execution time
5. Task executor runs DAG-based parallel execution: tasks without dependencies run concurrently; tasks with dependencies run sequentially after their prerequisites
6. Files created and committed to a branch (`auto-fix/issue-{number}`)
7. PR created with an auto-generated description linking back to the issue
8. GitHub Project Board synced via the GraphQL API: issue and PR both added, status updated
9. Trigger label removed; `agentic-processing` label added to prevent re-processing
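Steps 2 and 3 collapse into a single gate function. This is a hypothetical sketch: `should_process` and its arguments are illustrative, with the config keys matching the SSM parameters the system exposes.

```python
# Hypothetical sketch of the three-gate validation: enabled, under the
# concurrency cap, and (optionally) carrying the trigger label.
def should_process(config: dict, in_flight: int, labels: list[str],
                   trigger_label: str = "auto-fix") -> bool:
    if not config.get("enabled", False):
        return False  # global kill switch
    if in_flight >= int(config.get("max_concurrent", 1)):
        return False  # concurrency cap reached
    if config.get("require_label", True) and trigger_label not in labels:
        return False  # explicit opt-in label missing
    return True
```

Failing fast here is deliberate: the gate runs before any model call, so a disabled or saturated system costs nothing per incoming issue.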
Steps 4 and 5 are where most of the interesting work happens. The spec generator uses a mid-tier model even for simple issues because spec quality determines everything downstream: a badly decomposed spec produces bad code regardless of which model executes it. The task executor uses Haiku for simple tasks (rename this variable, update this comment) and Sonnet for tasks that require understanding existing code patterns.
## DAG-based parallel execution
The task executor builds a directed acyclic graph from the spec's dependency declarations. Tasks that don't depend on each other run in parallel using a thread pool. Tasks that depend on other tasks wait for their prerequisites to complete.
```yaml
# Simplified task spec structure
tasks:
  - id: add-rate-limit-middleware
    description: "Implement rate limiting middleware"
    depends_on: []
    model: haiku
  - id: add-rate-limit-tests
    description: "Write tests for the rate limit middleware"
    depends_on: [add-rate-limit-middleware]
    model: haiku
  - id: update-api-docs
    description: "Update OpenAPI spec to document rate limit headers"
    depends_on: [add-rate-limit-middleware]
    model: haiku
  - id: update-readme
    description: "Add rate limiting section to README"
    depends_on: []
    model: haiku
```
In this example, `add-rate-limit-middleware` and `update-readme` run in parallel. `add-rate-limit-tests` and `update-api-docs` both wait for the middleware task to complete, then run in parallel with each other. The total execution time is roughly middleware time + max(tests time, docs time), rather than the sum of all four.
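The scheduling loop described above can be sketched with a thread pool: submit every task whose dependencies have finished, wait for any one to complete, repeat. This is a minimal sketch under assumed names (`run_dag`, `execute`), not the bot's actual executor.

```python
# Hypothetical sketch of DAG-based parallel execution over a task spec.
# Each task is a dict with at least "id" and "depends_on".
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_dag(tasks: list[dict], execute, max_workers: int = 4) -> list[str]:
    """Run tasks respecting depends_on; return task ids in completion order."""
    pending = {t["id"]: t for t in tasks}
    done: set[str] = set()
    futures: dict = {}   # future -> task id
    finished_order: list[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending or futures:
            # Submit every task whose prerequisites are all done.
            ready = [tid for tid, t in pending.items()
                     if set(t["depends_on"]) <= done]
            for tid in ready:
                futures[pool.submit(execute, pending.pop(tid))] = tid
            if not futures:
                raise ValueError("cycle or missing dependency in task spec")
            completed, _ = wait(futures, return_when=FIRST_COMPLETED)
            for fut in completed:
                tid = futures.pop(fut)
                fut.result()  # propagate task errors
                done.add(tid)
                finished_order.append(tid)
    return finished_order
```

The `raise` guards against a malformed spec: if nothing is running and nothing is ready, the dependency graph has a cycle or references a task that doesn't exist.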
## Runtime controls via SSM
The system runs with a throttle layer in SSM Parameter Store. Five parameters control behaviour at runtime without requiring a redeploy:
- `enabled`: global on/off switch
- `max_concurrent`: maximum number of issues processing simultaneously
- `require_label`: whether an explicit trigger label is required
- `dry_run`: generates specs and plans but doesn't commit code or create PRs
- `model_override`: force all tasks to a specific model for cost testing
The SSM check happens at the start of every invocation. Changing `max_concurrent` from 1 to 5 takes effect on the next issue that arrives. No deployment, no Lambda update.
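The per-invocation read can be sketched as a fetch-and-coerce step. The parameter path `/fixer-bot/` and the string value formats are assumptions; SSM stores values as strings, so each one needs coercing into a typed config with safe defaults.

```python
# Hypothetical sketch: load the five control parameters at invocation start.
def parse_params(raw: dict) -> dict:
    """Coerce string-valued SSM parameters into typed config with safe defaults."""
    return {
        "enabled": raw.get("enabled", "false").lower() == "true",
        "max_concurrent": int(raw.get("max_concurrent", "1")),
        "require_label": raw.get("require_label", "true").lower() == "true",
        "dry_run": raw.get("dry_run", "false").lower() == "true",
        "model_override": raw.get("model_override") or None,  # "" means unset
    }

def load_config(ssm, path: str = "/fixer-bot/") -> dict:
    """Fetch all parameters under an assumed path prefix and parse them."""
    resp = ssm.get_parameters_by_path(Path=path)
    raw = {p["Name"].rsplit("/", 1)[-1]: p["Value"] for p in resp["Parameters"]}
    return parse_params(raw)
```

Defaulting `enabled` to false is a deliberate choice in this sketch: a missing or misconfigured parameter store disables the system rather than running it unthrottled.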
## What it costs
Haiku is roughly 20x cheaper than Sonnet per token at current pricing. A simple 5-task issue that routes through Haiku costs around $0.02–0.05. A complex 20-task issue that routes through Sonnet costs $0.50–2.00. Without the classifier, everything would run through Sonnet and the economics don't work at any meaningful volume.
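The per-issue arithmetic is just a sum over (model, tokens) pairs. The sketch below is illustrative: the price table is an argument rather than a constant because the actual per-token rates change, and the figures used in the test are made-up round numbers, not published pricing.

```python
# Hypothetical cost sketch: total USD cost of one issue, given which model
# handled each task and how many output tokens it produced.
def issue_cost(tasks: list[tuple[str, int]], price_per_1k: dict[str, float]) -> float:
    """tasks: (model, output_tokens) pairs; price_per_1k: USD per 1K output tokens."""
    return sum(price_per_1k[model] * tokens / 1000 for model, tokens in tasks)
```

With a ~20x price gap between the models, the routing decision dominates this sum: pushing a 20-task issue's execution tokens through the cheaper model is what keeps the simple tier in the cents range.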
Spec generation always uses a mid-tier model regardless of tier. Spec quality determines every downstream task: a cheap spec that misreads the issue produces code that has to be redone.