What Paperclip was
Paperclip was a hosted AI agent orchestrator. It ran on a Lightsail instance, managed agent definitions, scheduled heartbeats, routed LLM calls, and maintained a ticket system for agent-created work items. The ticketyboo.dev ops team — CTO, SRE, Security, and Cost agents — ran through it.
The architecture looked like this:
EventBridge schedule
→ Lightsail (Paperclip API)
→ OpenRouter (LLM call)
→ Paperclip ticket system
→ Lambda proxy (team_proxy.py)
→ Browser (team dashboard)
Five links. Each one a failure point. The Lightsail instance needed SSH access for debugging. The Paperclip API needed its own auth tokens. The LLM calls needed an OpenRouter key and had variable latency. The ticket system was Paperclip-internal, so board data couldn't be accessed without the API being up. The Lambda proxy had to translate Paperclip's response format into the frontend contract.
What failed
Nothing dramatic. The system just wasn't reliable enough for what it was doing. The Lightsail instance would occasionally become unreachable. LLM calls would time out or return unexpected formats. The Paperclip API had its own deployment cycle that didn't align with the platform's. When any link in the chain broke, the team dashboard showed stale data or nothing at all.
The deeper problem: the ops agents don't need LLMs. The SRE agent checks CloudWatch metrics against thresholds. The Cost agent queries Cost Explorer and compares against a budget. The Security agent scans IAM policies for known anti-patterns. These are deterministic operations. Wrapping them in an LLM call added latency, cost, and unpredictability without adding capability.
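The point is easy to see in code. A check like the SRE heartbeat reduces to a pure threshold comparison — the output is fully determined by the inputs, so there is nothing for a model to reason about. The metric names and thresholds below are illustrative, not the platform's real values:

```python
# Hypothetical threshold table for the SRE-style checks described above.
THRESHOLDS = {
    "lambda_error_rate": 0.01,    # alert above 1% errors
    "p95_latency_ms": 800,        # alert above 800 ms
    "cache_hit_ratio_min": 0.85,  # alert below 85% hits
}

def evaluate_metrics(metrics: dict) -> dict:
    """Compare observed metrics against fixed thresholds.

    Deterministic: same inputs always produce the same verdicts.
    """
    return {
        "lambda_error_rate": (
            "ok" if metrics["lambda_error_rate"] <= THRESHOLDS["lambda_error_rate"] else "alert"
        ),
        "p95_latency_ms": (
            "ok" if metrics["p95_latency_ms"] <= THRESHOLDS["p95_latency_ms"] else "alert"
        ),
        "cache_hit_ratio": (
            "ok" if metrics["cache_hit_ratio"] >= THRESHOLDS["cache_hit_ratio_min"] else "alert"
        ),
    }
```

Wrapping a function like this in an LLM call can only make it slower and less predictable; it cannot make it more correct.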
What replaced it
EventBridge schedule
→ Lambda (agent function)
→ DynamoDB (team-activity table)
→ Lambda proxy (team_proxy.py)
→ Browser
Three links instead of five. No Lightsail. No LLM. No external ticket system. Each agent is a Lambda function that runs on a schedule, does its checks, and writes results directly to DynamoDB. The proxy reads from DynamoDB and GitHub Issues (for the board). Done.
The agent functions share a base module (agent_base.py) that handles DynamoDB writes, TTL management, and the agent ID registry. Each agent implements a single handler function:
# agents/sre_handler.py — simplified
from agents.agent_base import write_run, write_activity, write_status

def handler(event, context):
    """SRE heartbeat: check CloudWatch, write results."""
    checks = run_health_checks()  # CloudWatch, Lambda errors, cache ratio
    for service, result in checks.items():
        write_status(service, result["status"], result.get("details"))

    status = "succeeded" if all_healthy(checks) else "failed"
    write_run("sre", status, source="scheduled", summary=summarise(checks))
    write_activity("sre", "heartbeat.invoked")
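The write helpers themselves can be sketched roughly like this — the table name, key schema, and field names below are assumptions for illustration, not the real agent_base.py:

```python
# Sketch of an agent_base-style write helper (schema names are hypothetical).
import time
import uuid

TTL_DAYS = 30  # items auto-expire via the table's DynamoDB TTL attribute

def build_run_item(agent: str, status: str, source: str, summary: str) -> dict:
    """Build the DynamoDB item for one agent run, stamped with a 30-day TTL."""
    now = int(time.time())
    return {
        "pk": f"run#{agent}",
        "sk": str(uuid.uuid4()),
        "agent": agent,
        "status": status,
        "source": source,
        "summary": summary,
        "created_at": now,
        # Epoch seconds; DynamoDB's TTL feature deletes the item after this.
        "ttl": now + TTL_DAYS * 24 * 3600,
    }

def write_run(agent, status, source="scheduled", summary=""):
    """Persist one run record to the team-activity table."""
    import boto3
    table = boto3.resource("dynamodb").Table("team-activity")
    table.put_item(Item=build_run_item(agent, status, source, summary))
```

Keeping the item construction separate from the boto3 call makes the TTL logic trivially testable without AWS credentials.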
What stayed the same
The frontend contract didn't change: team.js still calls /api/team/runs, /api/team/activity, etc.
The field allowlist in team_proxy.py still filters every response.
The agent UUIDs were preserved so the dashboard's agent registry didn't need updating.
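The allowlist filter amounts to only a few lines; the field names here are illustrative rather than the proxy's real set:

```python
# Sketch of a team_proxy.py-style field allowlist (names are assumptions).
RUN_FIELDS = {"agent", "status", "source", "summary", "created_at"}

def filter_fields(items: list, allowed: set) -> list:
    """Drop any field not on the allowlist before the response reaches the browser."""
    return [{k: v for k, v in item.items() if k in allowed} for item in items]
```

Because the filter sits in the proxy, swapping Paperclip for DynamoDB behind it changed nothing the frontend could observe.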
The governance model survived intact. The approval gates, escalation rules, and auto-reject conditions are the same declarative JSON they always were. The rules don't care whether a Paperclip process or a Lambda function evaluates them — they define what's allowed, not how it's enforced.
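To make that concrete, here is a minimal sketch of a declarative rule and an evaluator. The rule shape and field names are invented for illustration; the real governance JSON will differ, but the separation is the same — the JSON says what's allowed, and any runtime can evaluate it:

```python
import json

# A hypothetical governance rule in the declarative style described above.
RULE = json.loads("""
{
  "action": "scale_service",
  "auto_approve_if": {"max_instances": 3},
  "reject_if": {"environment": "production", "requires_approval": true}
}
""")

def evaluate(rule: dict, request: dict) -> str:
    """Return 'approve', 'reject', or 'escalate' for a request against one rule."""
    reject = rule.get("reject_if", {})
    if reject and all(request.get(k) == v for k, v in reject.items()):
        return "reject"
    auto = rule.get("auto_approve_if", {})
    if request.get("instances", 0) <= auto.get("max_instances", 0):
        return "approve"
    return "escalate"  # anything else goes to a human
```

Nothing in the evaluator cares whether it runs inside an orchestrator or a Lambda function, which is exactly why the rules survived the migration untouched.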
Board data moved from Paperclip's internal ticket system to GitHub Issues. This is better: issues are visible without the ops infrastructure being up, they have a public URL, and they integrate with the existing PR workflow.
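Mapping issues onto board columns is a small, pure transformation. A sketch of the idea, assuming GitHub's REST API issue shape (state, labels, html_url) and an invented "in-progress" label convention:

```python
# Sketch: group GitHub Issues into board columns (label names are assumptions).
def issues_to_board(issues: list) -> dict:
    """Map GitHub issue dicts onto three board columns by state and label."""
    board = {"todo": [], "in_progress": [], "done": []}
    for issue in issues:
        labels = {label["name"] for label in issue.get("labels", [])}
        if issue.get("state") == "closed":
            column = "done"
        elif "in-progress" in labels:
            column = "in_progress"
        else:
            column = "todo"
        board[column].append({"title": issue["title"], "url": issue["html_url"]})
    return board
```

Because the input is plain issue JSON, the board renders from a single unauthenticated API call even when everything else is down.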
What it costs
Before: Lightsail instance ($3.50/month, the cheapest tier) plus OpenRouter LLM calls ($2–5/month depending on heartbeat frequency).
After: Lambda invocations and DynamoDB writes, both well within Free Tier. The four agents run a combined ~150 invocations/day. At 128MB memory and sub-second execution, that's nowhere near the 1M free requests/month. DynamoDB writes with 30-day TTL auto-expire, keeping the table small.
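The headroom is easy to verify with back-of-envelope arithmetic, using the figures above and a conservative 1 s per invocation:

```python
# Free Tier headroom check (invocation count from the paragraph above;
# 1 second per invocation is a deliberately pessimistic assumption).
invocations = 150 * 30                          # ~4,500 requests/month
gb_seconds = invocations * 1.0 * (128 / 1024)   # ~562.5 GB-s/month of compute
request_headroom = 1_000_000 / invocations      # vs the 1M free requests/month
compute_headroom = 400_000 / gb_seconds         # vs the 400K free GB-s/month
```

Roughly 200x under the request cap and 700x under the compute cap, so even a large increase in heartbeat frequency stays free.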
Net saving: ~$5–8/month. Not life-changing, but on a platform that's committed to Free Tier, every dollar matters.
The pattern
If your AI ops agents are doing deterministic work — checking metrics, scanning configs, comparing values against thresholds — you probably don't need an agent orchestrator. You need scheduled functions and a results table. The orchestrator adds value when agents need to reason, plan multi-step actions, or coordinate with each other through natural language. For everything else, it's overhead.
The declarative governance model (rules as JSON, approval gates, escalation conditions) is worth keeping regardless of execution engine. Separate the rules from the runtime. The rules are the valuable part.
The old configuration lives on in tools/paperclip-config/ — agent definitions, governance rules, skills. They're useful as documentation of the declarative pattern even though the runtime that consumed them is gone.
If the articles or tools have been useful, a coffee helps keep things running.