The ticketyboo.dev scanner checks GitHub repositories for governance, dependency, security, and IaC issues. Like most rule-based scanners, it started with fixed heuristics: if no README exists, raise a finding. If a .env file is committed, raise a finding. Reliable, but static. Once a rule is written, its accuracy is frozen.
The cost of frozen accuracy compounds over time. A scanner that flags "Missing CODEOWNERS file" for every single-contributor hobby repo will train users to ignore it. A scanner that misses hardcoded AWS account IDs in Terraform because that pattern wasn't in the original rules will keep missing them. The signal degrades until no one trusts it.
This demo adds a learning loop. It has four components.
The four components
Component 1: Confidence scores on findings
The scanner's Finding dataclass gets a confidence field (0.0 to 1.0, default 0.7). On each scan, after findings are deduplicated, the scanner applies confidence scores based on what it knows from the lessons document: suppressed patterns get 0.3, boosted patterns get 0.95, everything else gets category-level defaults (security 0.85, governance 0.75, etc.).
Confidence doesn't filter findings. It annotates them. A finding with 0.3 confidence still appears, but the UI can surface it differently. More importantly, it becomes a signal for the feedback loop: if a low-confidence finding is consistently marked accurate, that's a boost candidate.
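The scoring step can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the scanner's code: the Finding fields other than confidence, the name apply_confidence, exact-title matching, and giving suppression precedence over boosting are all hypothetical. Only the numbers (0.7, 0.3, 0.95, 0.85, 0.75) come from the text; categories left under "etc." fall back to the 0.7 field default here.

```python
from dataclasses import dataclass

SUPPRESSED_CONFIDENCE = 0.3
BOOSTED_CONFIDENCE = 0.95
DEFAULT_CONFIDENCE = 0.7

# Category defaults quoted in the text; any other category falls back to
# DEFAULT_CONFIDENCE in this sketch.
CATEGORY_DEFAULTS = {"security": 0.85, "governance": 0.75}


@dataclass
class Finding:
    category: str
    file: str
    title: str
    confidence: float = DEFAULT_CONFIDENCE  # 0.0 to 1.0


def apply_confidence(findings, suppress_titles, boost_titles):
    """Annotate findings in place; nothing is filtered out."""
    for f in findings:
        if f.title in suppress_titles:  # assumption: suppress wins over boost
            f.confidence = SUPPRESSED_CONFIDENCE
        elif f.title in boost_titles:
            f.confidence = BOOSTED_CONFIDENCE
        else:
            f.confidence = CATEGORY_DEFAULTS.get(f.category, DEFAULT_CONFIDENCE)
    return findings
```

Because the function only rewrites the confidence attribute, every finding still reaches the UI; a low score changes how it is presented, not whether it appears.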
Component 2: Human feedback API
One endpoint: POST /api/scan/{scan_id}/feedback. The body contains the finding ID (a stable hash of category + file + title), a verdict (accurate or false_positive), and an optional note.
Rate-limited to prevent abuse. No authentication: the scanner is already public.
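A sketch of validating that body before anything is stored. The JSON field names finding_id, verdict and note, the FeedbackRequest type and parse_feedback are assumptions for illustration; the two allowed verdicts are the ones listed above. Rate limiting is not shown.

```python
import json
from dataclasses import dataclass
from typing import Optional

VALID_VERDICTS = {"accurate", "false_positive"}


@dataclass(frozen=True)
class FeedbackRequest:
    scan_id: str
    finding_id: str
    verdict: str
    note: Optional[str] = None


def parse_feedback(scan_id: str, raw_body: str) -> FeedbackRequest:
    """Validate a feedback body for one scan; raise ValueError if it is invalid."""
    try:
        body = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    if not isinstance(body, dict):
        raise ValueError("body must be a JSON object")
    finding_id = body.get("finding_id")
    verdict = body.get("verdict")
    note = body.get("note")
    if not isinstance(finding_id, str) or not finding_id:
        raise ValueError("finding_id is required")
    # Check the type first so an unhashable value (e.g. a list) cannot reach the set lookup.
    if not isinstance(verdict, str) or verdict not in VALID_VERDICTS:
        raise ValueError(f"verdict must be one of {sorted(VALID_VERDICTS)}")
    if note is not None and not isinstance(note, str):
        raise ValueError("note must be a string")
    return FeedbackRequest(scan_id, finding_id, verdict, note)
```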
The finding ID is deterministic. The same finding (same category, file, title) will produce the same ID across different scans of the same repo. This makes it possible to correlate feedback across scan runs without storing links between them.
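The text describes the ID only as a stable hash of category + file + title. One way to get that property, sketched here as an assumption, is SHA-256 over the three fields joined by a separator and truncated to 16 hex characters. Python's built-in hash() would not work for this, because string hashing is randomized per process by default.

```python
import hashlib


def finding_id(category: str, file: str, title: str) -> str:
    # Join with the ASCII unit separator, which is unlikely to appear in the
    # fields, so that ("a", "bc", ...) and ("ab", "c", ...) do not collide.
    key = "\x1f".join((category, file, title))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
```

The same inputs give the same ID in every scan and every process, which is what lets feedback from different runs line up without stored links.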
Feedback is stored in a separate DynamoDB table (scanner-feedback) with a GSI on verdict type and a 90-day TTL. The aggregator queries both verdict types via that GSI and merges the results.
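A sketch of the item shape and the merge step, kept free of AWS calls so it runs anywhere. The attribute names (verdict as the GSI partition key, created_at, expires_at as the TTL attribute) and both function names are assumptions. DynamoDB TTL does expect a Number attribute holding a Unix epoch timestamp in seconds, which is why the expiry is stored as an integer.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

TTL_DAYS = 90
VERDICTS = ("accurate", "false_positive")


def build_feedback_item(scan_id: str, finding_id: str, verdict: str,
                        note: Optional[str] = None,
                        now: Optional[datetime] = None) -> dict:
    now = now or datetime.now(timezone.utc)
    item = {
        "scan_id": scan_id,
        "finding_id": finding_id,
        "verdict": verdict,  # GSI partition key in this sketch
        "created_at": now.isoformat(),
        # TTL attribute in epoch seconds; DynamoDB deletes the item some time after this.
        "expires_at": int((now + timedelta(days=TTL_DAYS)).timestamp()),
    }
    if note:
        item["note"] = note
    return item


def merge_verdict_pages(results_by_verdict: dict) -> list:
    """Merge per-verdict GSI query results into one list, newest first."""
    merged = []
    for verdict in VERDICTS:
        merged.extend(results_by_verdict.get(verdict, []))
    # ISO strings sort chronologically here because all timestamps share one UTC format.
    return sorted(merged, key=lambda item: item["created_at"], reverse=True)
```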
Component 3: Nightly aggregator
An EventBridge cron triggers a Lambda at 02:00 UTC daily. It reads all feedback from the last 30 days, calculates accuracy metrics, identifies patterns, calls a mid-tier model to synthesise a lessons-learned document, writes it to S3, and writes the day's accuracy and false positive rates to a metrics table.
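The metrics half of that run can be sketched as below: keep the last 30 days of feedback, then compute the two rates written to the metrics table. Defining the accuracy rate as the share of verdicts marked accurate, and the false positive rate as its complement, is an assumption about how the scanner defines them; the item shape follows the hypothetical created_at and verdict attributes used earlier.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

WINDOW_DAYS = 30


def recent_feedback(items: list, now: Optional[datetime] = None) -> list:
    """Keep feedback whose created_at falls inside the 30-day window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=WINDOW_DAYS)
    return [i for i in items if datetime.fromisoformat(i["created_at"]) >= cutoff]


def daily_metrics(items: list) -> dict:
    total = len(items)
    if total == 0:
        # No feedback yet: report nothing rather than a misleading 0% or 100%.
        return {"samples": 0, "accuracy_rate": None, "false_positive_rate": None}
    accurate = sum(1 for i in items if i["verdict"] == "accurate")
    return {
        "samples": total,
        "accuracy_rate": accurate / total,
        "false_positive_rate": (total - accurate) / total,
    }
```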
The aggregator logic is explicit about what it looks for:
- Suppress candidates: categories where FP rate > 60% (at least 5 samples)
- Boost candidates: findings with confidence < 0.6 that were marked accurate
- Watch-for patterns: recurring findings across multiple repos (from reviewer notes)
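A sketch of the first two rules, using the thresholds above: false positive rate over 60% with at least 5 samples, and confidence under 0.6 with an accurate verdict. The input shape, one dict per feedback item carrying the finding's category, title and confidence at scan time, is an assumption. The watch-for rule depends on reviewer notes and model synthesis, so it is left out.

```python
from collections import defaultdict

FP_RATE_THRESHOLD = 0.6
MIN_SAMPLES = 5
LOW_CONFIDENCE = 0.6


def suppress_candidates(items: list) -> dict:
    """Return {category: fp_rate} for categories with FP rate > 60% over >= 5 samples."""
    counts = defaultdict(lambda: [0, 0])  # category -> [false_positives, total]
    for i in items:
        c = counts[i["category"]]
        c[1] += 1
        if i["verdict"] == "false_positive":
            c[0] += 1
    return {
        cat: fp / total
        for cat, (fp, total) in counts.items()
        if total >= MIN_SAMPLES and fp / total > FP_RATE_THRESHOLD
    }


def boost_candidates(items: list) -> set:
    """Titles of low-confidence findings that reviewers marked accurate."""
    return {
        i["title"]
        for i in items
        if i["verdict"] == "accurate" and i["confidence"] < LOW_CONFIDENCE
    }
```

The minimum-sample guard matters: without it, a single false positive in a rarely seen category would read as a 100% FP rate and trigger suppression.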
The model call takes roughly 4K tokens in, 1K out. At current pricing that's about $0.01. With CloudWatch and DynamoDB, the full daily run costs under $0.02.
The prompt instructs the model to produce structured markdown with three sections: Suppress, Boost, Watch for. The structure is rigid by design. The scanner's _apply_confidence() function parses these sections line by line to extract finding titles for the suppress and boost lists.
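A sketch of that line-by-line parsing. The text says only that _apply_confidence() reads the sections line by line; the rules here (match "## " headings by prefix, take the quoted text from a bullet when there is one, otherwise the text before the first " (") and the name parse_lessons are assumptions chosen to fit the sample lessons document shown later in this post.

```python
import re

SECTION_PREFIXES = {"suppress": "## Suppress", "boost": "## Boost", "watch_for": "## Watch for"}
QUOTED = re.compile(r'"([^"]+)"')


def parse_lessons(markdown: str) -> dict:
    """Map each section name to the finding titles listed under it."""
    sections = {name: [] for name in SECTION_PREFIXES}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            # Switch section; an unknown heading stops collection until a known one appears.
            current = next((n for n, p in SECTION_PREFIXES.items() if line.startswith(p)), None)
        elif current and line.startswith("- "):
            entry = line[2:]
            m = QUOTED.search(entry)
            # Prefer an explicitly quoted title; otherwise drop the trailing "(...)" stats.
            title = m.group(1) if m else entry.split(" (", 1)[0]
            sections[current].append(title.strip())
    return sections
```

A parser this simple only works because the prompt pins the output format; a model that renamed a heading would silently empty that list, which degrades to default confidence rather than breaking the scan.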
Component 4: RAG injection
At the start of each scan, the scanner calls _load_lessons(), which does a single S3 GetObject for scanner/lessons-learned.md. If the file doesn't exist, the call returns None and the scan proceeds with defaults. Graceful degradation is the whole point: the learning loop is additive, not required.
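A sketch of that graceful-degradation path against a boto3 S3 client. The client is passed in so the function can be exercised without AWS; the function name load_lessons and the bucket argument are assumptions, and only the object key comes from the text.

```python
from typing import Optional

LESSONS_KEY = "scanner/lessons-learned.md"


def load_lessons(s3_client, bucket: str, key: str = LESSONS_KEY) -> Optional[str]:
    """Return the lessons document, or None so the scan falls back to defaults."""
    try:
        response = s3_client.get_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.NoSuchKey:
        # No aggregator run yet, or the file was removed: scan with defaults.
        return None
    return response["Body"].read().decode("utf-8")
```

In Lambda this would be called as load_lessons(boto3.client("s3"), bucket). One caveat: S3 reports a missing object as NoSuchKey only when the caller has s3:ListBucket on the bucket; without that permission it returns AccessDenied, which this sketch would not catch.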
The lessons document is passed into ScanContext as lessons_context. The scanner's deep analysis layers (SAST, secret detection, IaC) can optionally inject the relevant section into their LLM prompts. Even for shallow scans, the confidence scoring logic reads the lessons document to adjust per-finding confidence values.
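A sketch of how a deep analysis layer might use lessons_context. Only ScanContext and lessons_context come from the text; the repo field, extract_section, build_prompt, the choice of "## Watch for" as the default section and the prompt wording are hypothetical. When lessons_context is None the prompt is returned unchanged, matching the additive design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanContext:
    repo: str
    lessons_context: Optional[str] = None


def extract_section(markdown: str, heading_prefix: str) -> str:
    """Return the '## ' section(s) whose heading starts with heading_prefix."""
    out, inside = [], False
    for line in markdown.splitlines():
        if line.startswith("## "):
            inside = line.startswith(heading_prefix)
            if inside:
                out.append(line)
        elif inside:
            out.append(line)
    return "\n".join(out)


def build_prompt(base_prompt: str, ctx: ScanContext, section: str = "## Watch for") -> str:
    if not ctx.lessons_context:
        return base_prompt  # no lessons yet: scan exactly as before
    lessons = extract_section(ctx.lessons_context, section)
    if not lessons:
        return base_prompt
    return f"Lessons from previous reviews:\n{lessons}\n\n{base_prompt}"
```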
What the lessons document looks like
```markdown
# Scanner Lessons Learned
Generated: 2026-03-27 02:00 UTC
Based on: 47 scans, 89 feedback items
## Suppress (commonly false positive)
- "Missing CODEOWNERS file" in repos with < 3 contributors (FP rate: 78%)
- "No branch protection" on personal/demo repos (FP rate: 85%)
## Boost (commonly missed but accurate)
- Hardcoded AWS account IDs in terraform (accuracy: 92%, often missed)
- Missing rate limiting on public API endpoints (accuracy: 88%)
## Watch for (emerging patterns)
- Repos using deprecated Node.js 16 runtime (seen in 4 of last 10 scans)
- Missing .gitignore entries for .env files (seen in 6 of last 10 scans)
```
This is injected verbatim into the scan prompt. The model sees it as context before making any decisions. The cost of the S3 read is negligible. The benefit compounds over weeks as patterns accumulate.
What this isn't
This is not a training loop. The model weights don't change. The scanner doesn't get smarter in a deep learning sense. What changes is the context it reasons from. That's a much smaller, cheaper, more controllable mechanism, and for a tool at this scale, it's the right mechanism.
It's also not a fully automated loop. The feedback requires a human to mark findings. That's intentional. Fully automated feedback (using one model's output to improve another's input) collapses into an echo chamber faster than you'd expect. A human signal, even occasional, anchors the loop to reality.
The accuracy dashboard
The scanner-learning demo page shows the metrics written by the aggregator: accuracy rate trend, false positive rate, and the current lessons document. It also provides the feedback form for submitting verdicts directly.
With 30 days of feedback, the suppression and boost lists become meaningful. Before that, the aggregator runs but produces a mostly empty document. That's fine. The loop doesn't need to produce value on day one.
Patterns at work
This demo implements five patterns from the agentic design pattern taxonomy:
- Memory (8): scan history and feedback persisted in DynamoDB, lessons in S3
- Learning and adaptation (9): nightly aggregator improves future scan prompts from feedback
- Human-in-the-loop (13): humans mark findings; that signal anchors the learning cycle
- RAG (14): lessons document injected as context before scanning
- Evaluation and monitoring (19): accuracy and false positive rates tracked daily in DynamoDB