A scanner that finds problems is useful. A scanner that helps you fix them is transformative. The gap between "here are your 47 findings" and "here's a reviewed, tested patch for each one" is where most automated security tools leave you — alone, with a to-do list and no roadmap.

AI-assisted remediation closes that gap. Not by blindly applying fixes, but by using AI to generate candidate patches, reason about their correctness, and surface them for human review in the appropriate context. The human remains the decision authority. The AI handles the cognitive load of writing and reasoning about code at scale.

Why remediation is harder than scanning

Scanning is essentially pattern matching — find code or configuration that matches a known vulnerable pattern. Remediation requires understanding intent: what was this code supposed to do, what was it actually doing wrong, and what's the minimal change that corrects it without introducing new problems?

That context window — the mental model of a codebase, its dependencies, its deployment environment, and its constraints — is exactly what large language models have become competent at holding. A well-prompted AI model can read a codebase, understand a finding, and propose a contextual fix that accounts for the surrounding code.

The remediation workflow

The workflow we've developed has five stages:

1. Triage and prioritise

Not all findings are equal. A hardcoded AWS key is a drop-everything emergency. A missing type hint is a low-priority housekeeping task. The first job is to sort findings by severity and exploitability, and group related findings that can be addressed in a single change (e.g. "all print() calls in api/").
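The triage step can be sketched in a few lines of Python. The finding fields (rule, path, severity) are illustrative, not a real scanner schema — the point is the sort-then-group shape: order by severity, then bundle related findings into one work item per rule and directory.

```python
# Sketch of the triage step: sort findings by severity, then group
# related ones so they can be fixed in a single change.
# The finding schema here is illustrative, not a real scanner format.
from collections import defaultdict

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(findings):
    """Return finding groups, most severe first, grouped by (rule, directory)."""
    groups = defaultdict(list)
    for f in findings:
        # Group e.g. every print() finding under api/ into one work item.
        directory = f["path"].rsplit("/", 1)[0] if "/" in f["path"] else "."
        groups[(f["rule"], directory)].append(f)
    return sorted(
        groups.values(),
        key=lambda g: min(SEVERITY_ORDER[f["severity"]] for f in g),
    )

findings = [
    {"rule": "print-logging", "path": "api/handler.py", "severity": "medium"},
    {"rule": "print-logging", "path": "api/routes.py", "severity": "medium"},
    {"rule": "hardcoded-key", "path": "config.py", "severity": "critical"},
]
for group in triage(findings):
    print(group[0]["rule"], len(group))
```

The hardcoded key surfaces first as its own work item; the two print() findings arrive as a single group that one patch can address.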

2. Generate candidate patches

For each finding (or group), an AI model generates a candidate patch. The prompt includes: the finding description, the affected file content, the repository structure, and any relevant constraints (framework version, coding standards). The model is asked to produce a minimal, targeted change — not a refactor.

# Example prompt structure for remediation
system: You are a senior Python engineer. Apply minimal, targeted fixes.
        Do not refactor beyond the scope of the finding.
        Preserve existing code style and variable names.

user: Finding: "8 occurrences of print() used for production logging in api/handler.py"
      Severity: Medium
      File content: [handler.py content]
      Coding standard: Use logging.getLogger(__name__). Never print().

      Produce a unified diff patch.
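The prompt structure above can be assembled programmatically. This sketch builds the chat-style message list that common LLM chat APIs accept; the actual API client call is omitted, and the finding dictionary is an illustrative shape, not a fixed schema.

```python
# Assemble the remediation prompt above into a chat-style message list.
# The message format matches common LLM chat APIs; the client call itself
# is omitted. The finding dict shape is illustrative.
SYSTEM_PROMPT = (
    "You are a senior Python engineer. Apply minimal, targeted fixes. "
    "Do not refactor beyond the scope of the finding. "
    "Preserve existing code style and variable names."
)

def build_remediation_messages(finding, file_content, coding_standard):
    user_prompt = (
        f"Finding: {finding['description']}\n"
        f"Severity: {finding['severity']}\n"
        f"File content:\n{file_content}\n"
        f"Coding standard: {coding_standard}\n\n"
        "Produce a unified diff patch."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt},
    ]

messages = build_remediation_messages(
    {"description": "print() used for production logging in api/handler.py",
     "severity": "Medium"},
    "def handle(req):\n    print(req)\n",
    "Use logging.getLogger(__name__). Never print().",
)
```

Keeping prompt assembly in one function makes the constraints (minimal change, preserved style, unified diff output) impossible to forget on any individual finding.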

3. Verify with a second model

The candidate patch is submitted to a second model (different architecture, different training data) for adversarial review. The reviewer model is asked: does this patch correctly fix the finding? Does it introduce any new issues? Does it break any observable behaviour? Disagreements between models are flagged for human review.
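The escalation logic is simple once the reviewer's answers are parsed. This sketch assumes each model's response has been reduced to a small verdict dictionary (an assumption for illustration): a patch goes to a human whenever the two models disagree on correctness, or the reviewer spots a new issue.

```python
# Sketch of the cross-model gate: escalate to a human whenever the
# generating model and the reviewing model disagree, or the reviewer
# flags a new problem. The verdict dicts stand in for parsed responses.
REVIEW_QUESTIONS = [
    "Does this patch correctly fix the finding?",
    "Does it introduce any new issues?",
    "Does it break any observable behaviour?",
]

def needs_human_review(generator_verdict, reviewer_verdict):
    """Flag the patch when the models disagree or new issues are reported."""
    return (
        generator_verdict["correct_fix"] != reviewer_verdict["correct_fix"]
        or reviewer_verdict.get("new_issues", False)
    )

flagged = needs_human_review({"correct_fix": True}, {"correct_fix": False})
```

Agreement is not proof of correctness — it only lets the patch proceed to the next gate, the test suite.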

4. Run the test suite

Every patch is applied to a sandbox environment and the test suite is run. This is why governance finding #1 is always "no tests detected" — without a test suite, automated remediation is unsafe. You have no harness to verify the fix didn't break something. The test suite is the trust anchor for the whole workflow.
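The sandbox gate reduces to running a sequence of commands and stopping at the first failure. The commands below (git apply, then pytest) are one plausible sequence; the directory layout and commands are illustrative.

```python
# Sketch of the sandbox gate: apply the patch in an isolated checkout,
# then run the test suite. Both steps must exit 0 for the patch to
# advance. Commands and paths are illustrative.
import subprocess

def run_gate(steps, cwd="."):
    """Run each command in order; stop and fail at the first non-zero exit."""
    for step in steps:
        result = subprocess.run(step, cwd=cwd, capture_output=True)
        if result.returncode != 0:
            return False
    return True

def patch_passes_tests(sandbox_dir, patch_file):
    # The patch must apply cleanly AND the suite must pass.
    return run_gate(
        [["git", "apply", patch_file],
         ["python", "-m", "pytest", "-q"]],
        cwd=sandbox_dir,
    )
```

A patch that fails to apply cleanly is treated the same as a test failure: it never reaches a reviewer.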

5. Human review and merge

Patches that pass model review and tests are opened as pull requests, grouped by finding category. The human reviewer sees: the finding, the original code, the patch, the model review notes, and the test results. The decision to merge is always made by a human.
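Assembling that reviewer-facing package is mechanical. This sketch builds the pull request title, body, and labels from the pipeline's artefacts; the field names and label scheme are illustrative, and the actual PR creation (via the GitHub API or gh CLI) is left out.

```python
# Sketch of assembling the PR a reviewer sees: finding, patch, model
# review notes, and test results in one body. Field names and the
# label scheme are illustrative; the API call to open the PR is omitted.
def build_pr(finding, patch, review_notes, tests_passed):
    title = f"fix({finding['category']}): {finding['description']}"
    body = "\n\n".join([
        f"Finding: {finding['description']} (severity: {finding['severity']})",
        f"Model review: {review_notes}",
        f"Tests: {'passed' if tests_passed else 'FAILED'}",
        f"Patch:\n{patch}",
    ])
    labels = [finding["category"], f"severity:{finding['severity']}"]
    return {"title": title, "body": body, "labels": labels}

pr = build_pr(
    {"category": "logging", "description": "print() in api/handler.py",
     "severity": "medium"},
    "- print(req)\n+ log.info(req)",
    "Both models agree the fix is correct.",
    tests_passed=True,
)
```

Because the labels carry category and severity, reviewers can batch their judgement calls — all the logging PRs in one sitting, the critical ones first.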

The human-in-the-loop rule: AI generates, AI verifies, AI tests — but a human approves every merge. This isn't a limitation; it's a feature. The AI handles the cognitive load of 47 findings; the human exercises judgement on each one. That division of labour is where the productivity gain comes from.

What AI handles well

Mechanical, well-specified fixes: replacing print() with proper logging, applying a coding standard across dozens of files, drafting minimal unified diffs. The common thread is volume and consistency: work that is tedious for a human but has a clear definition of done, done the same way on the first finding and the forty-seventh.

What AI handles poorly (and requires human judgement)

Anything that hinges on intent or acceptable trade-offs: deciding whether a behaviour change is safe to ship, whether a finding is a false positive in this particular context, and whether a patch that passes the tests is genuinely the right fix rather than one that merely silences the scanner. This is why model disagreements are escalated, and why every merge is a human decision.

Building a remediation pipeline for your project

You don't need a sophisticated orchestration platform to start. A simple GitHub Actions workflow that:

  1. Runs the scanner on a schedule (weekly)
  2. Calls an LLM API for each new medium/low finding
  3. Opens a PR with the candidate patch and model review notes
  4. Tags the PR with the finding category and severity

...already gives you most of the value. The scanner finds the problems. The AI drafts the solutions. You spend your review time on judgement calls, not on writing boilerplate fixes.

The ticketyboo.dev scanner output is structured specifically to feed into this workflow. Each finding includes a remediation field — a human-readable suggested fix that can be used as part of a remediation prompt.
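Consuming that output can be sketched as follows. The JSON shape (a findings list with description and remediation fields) is an assumption based on the description above, not the exact ticketyboo.dev schema — adjust the keys to match the real report.

```python
# Sketch of feeding structured scanner output into remediation prompts.
# The JSON shape (a "findings" list with "description" and "remediation"
# fields) is an assumed schema for illustration.
import json

def remediation_prompts(scan_json):
    """Yield one remediation prompt per finding in a scan report."""
    report = json.loads(scan_json)
    for finding in report["findings"]:
        yield (
            f"Finding: {finding['description']}\n"
            f"Suggested fix: {finding['remediation']}\n"
            "Produce a unified diff patch implementing this fix."
        )

scan = json.dumps({"findings": [{
    "description": "print() used for production logging in api/handler.py",
    "remediation": "Replace print() with logging.getLogger(__name__) calls.",
}]})
```

Because the remediation field is already a human-readable fix instruction, the prompt needs no extra reasoning scaffolding — the scanner has done the diagnosis, and the model only has to express it as a diff.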

Related tools and articles

→ Scan your repo — get structured findings with remediation guidance
→ Multi-model reasoning — why two models are better than one
→ Agentic development patterns
→ Governance as code