Four servers (two media servers, a Docker host, and a Kali machine) and two NAS boxes. Real infrastructure, running real services. The question was how much of it an agent could manage without being asked.
The hive structure
| Hive | Domain | What it owns |
|---|---|---|
| Infrastructure | Servers, network, backups | 6 devices (media1, media2, docker-host, kali, NAS1, NAS2), backup scripts, repo sync |
| Security | Monitoring, anomaly detection | Security Minion v2.0 — autonomous Tier 3 worker agent |
| MCP | Tool integration | browser-use, aws-docs, aws-terraform, git, github, filesystem, memory |
| API | LLM access, cost control | OpenRouter integration, 4-tier model routing, cost tracking |
| Automation | Browser automation, data pipelines | File-based desktop integration, extraction staging, processing pipeline |
| Product | R&D, self-analysis | Pattern extraction, reference model, documentation |
Each hive owns its own budget, configuration, data, and reporting. Cross-hive communication goes through the shared workflow system — issues, tasks, decisions — not direct calls.
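A minimal sketch of the "shared workflow system, not direct calls" rule, assuming an in-memory board. `WorkItem`, `WorkflowBoard`, and the example issue are hypothetical; the real system's storage and schema aren't described here. The point it illustrates is that the Security hive never calls the Infrastructure hive. It posts an item, and Infrastructure picks it up from its own inbox.

```python
from dataclasses import dataclass, field
from typing import Literal

# Item kinds named in the text; everything else here is illustrative.
Kind = Literal["issue", "task", "decision"]


@dataclass
class WorkItem:
    kind: Kind
    source_hive: str  # hive that raised the item
    target_hive: str  # hive expected to act on it
    summary: str
    status: str = "open"


@dataclass
class WorkflowBoard:
    """Shared channel: hives post and read items here instead of calling each other."""
    items: list[WorkItem] = field(default_factory=list)

    def post(self, item: WorkItem) -> None:
        self.items.append(item)

    def inbox(self, hive: str) -> list[WorkItem]:
        # Each hive only sees open items addressed to it.
        return [i for i in self.items if i.target_hive == hive and i.status == "open"]

    def close(self, item: WorkItem) -> None:
        item.status = "closed"


board = WorkflowBoard()
board.post(WorkItem("issue", "Security", "Infrastructure", "Unexpected port 2222 on media2"))
for item in board.inbox("Infrastructure"):
    print(item.kind, item.summary)
    board.close(item)
```

Because the board is the only coupling point, a hive can be rebuilt or moved without its neighbours noticing, as long as it keeps reading and writing the same item shapes.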
The Security Minion: an autonomous Tier 3 worker
The Security Minion runs a cycle: discover, learn, monitor, suppress, report. Targets are auto-detected: Docker containers from compose files, servers by IP, NAS boxes by SMB probe. The first run builds a profile of expected ports and expected services. On every run after that, deviations from the profile are anomalies.
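The learn and monitor steps reduce to a set difference against a stored baseline. A minimal sketch, assuming discovery has already produced one observation per host; `Observation`, `build_profile`, `find_anomalies`, and the example ports and service names are hypothetical, not the minion's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """What one scan saw on one host (discovery itself is out of scope here)."""
    host: str
    open_ports: frozenset[int]
    services: frozenset[str]


def build_profile(observations: list[Observation]) -> dict[str, Observation]:
    """Learn step: the first run's observations become the expected baseline."""
    return {o.host: o for o in observations}


def find_anomalies(profile: dict[str, Observation], observations: list[Observation]) -> list[str]:
    """Monitor step: report every deviation from the stored profile."""
    anomalies: list[str] = []
    for o in observations:
        expected = profile.get(o.host)
        if expected is None:
            anomalies.append(f"{o.host}: new host, not in profile")
            continue
        for port in sorted(o.open_ports - expected.open_ports):
            anomalies.append(f"{o.host}: unexpected open port {port}")
        for port in sorted(expected.open_ports - o.open_ports):
            anomalies.append(f"{o.host}: expected port {port} is closed")
        for svc in sorted(o.services - expected.services):
            anomalies.append(f"{o.host}: unexpected service {svc}")
        for svc in sorted(expected.services - o.services):
            anomalies.append(f"{o.host}: expected service {svc} is missing")
    return anomalies


first_run = [Observation("media1", frozenset({22, 8080}), frozenset({"ssh", "http"}))]
profile = build_profile(first_run)

later_run = [Observation("media1", frozenset({22, 8080, 4444}), frozenset({"ssh", "http"}))]
print(find_anomalies(profile, later_run))  # ['media1: unexpected open port 4444']
```

Checking both directions matters: a port that disappears can mean a crashed service, not just an intrusion.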
Suppress is the step most monitoring systems skip. Without it, you stop reading the alerts. The minion tracks which alerts have been raised, which have been resolved, and which have been suppressed, so a known condition like "port 8080 always open on the media server" stops firing.
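One way to model raised, resolved, and suppressed is a small per-alert state map. A sketch under assumptions: the alert key format (`host:port:N`) and the choice to report an alert only when it first appears, or reappears after resolving, are illustrative and not confirmed by the source.

```python
from enum import Enum


class AlertState(Enum):
    RAISED = "raised"
    RESOLVED = "resolved"
    SUPPRESSED = "suppressed"


class AlertTracker:
    """Hypothetical tracker: remembers each alert's state across monitoring cycles."""

    def __init__(self) -> None:
        self.states: dict[str, AlertState] = {}

    def suppress(self, key: str) -> None:
        """Mark a known, accepted condition (e.g. a port that is always open)."""
        self.states[key] = AlertState.SUPPRESSED

    def process(self, current: set[str]) -> list[str]:
        """Take this cycle's anomalies and return only the ones worth reporting."""
        to_report = []
        for key in sorted(current):
            state = self.states.get(key)
            if state is AlertState.SUPPRESSED:
                continue  # accepted condition: never fires again
            if state is not AlertState.RAISED:
                to_report.append(key)  # new, or back after being resolved
            self.states[key] = AlertState.RAISED
        # Anything raised earlier but no longer observed is now resolved.
        for key, state in list(self.states.items()):
            if state is AlertState.RAISED and key not in current:
                self.states[key] = AlertState.RESOLVED
        return to_report


tracker = AlertTracker()
tracker.suppress("media1:port:8080")
print(tracker.process({"media1:port:8080", "media1:port:4444"}))  # ['media1:port:4444']
print(tracker.process({"media1:port:8080", "media1:port:4444"}))  # []
print(tracker.process({"media1:port:8080"}))                      # []
print(tracker.process({"media1:port:4444"}))                      # ['media1:port:4444']
```

The suppressed port never reaches the report, and a persisting alert is reported once rather than every cycle, which is what keeps the report short enough to read.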
The four-tier LLM routing model
| Tier | Model | Cost per 1M tokens | Used for |
|---|---|---|---|
| Simple | Llama 3.1 8B | $0.06 | Status checks, log parsing, data extraction |
| Analysis | Small model | $0.25 | Trend analysis, basic recommendations |
| Complex | Mid-tier model | $3.00 | Policy decisions, multi-factor analysis |
| Critical | Large model | $15.00 | High-risk changes, compliance decisions |
250x: the per-token price gap between the smallest and largest model tiers ($0.06 vs $15.00 per 1M tokens). What matters on real queries, not benchmarks, is that most of them are simple: is this service running, what does this log line mean. The Engine's issue routing system uses the same pattern with three tiers instead of four.
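The routing itself can be as small as a lookup table. A minimal sketch using the prices from the table above; the task-type names in `TASK_TIER`, the `route` and `cost_usd` helpers, and the fallback to the Complex tier for unknown task types are hypothetical choices, not the original implementation.

```python
# USD per 1M tokens, from the tier table above.
PRICE_PER_MTOK = {"simple": 0.06, "analysis": 0.25, "complex": 3.00, "critical": 15.00}

# Hypothetical task types, grouped according to the "Used for" column.
TASK_TIER = {
    "status_check": "simple",
    "log_parse": "simple",
    "extraction": "simple",
    "trend_analysis": "analysis",
    "recommendation": "analysis",
    "policy_decision": "complex",
    "multi_factor": "complex",
    "high_risk_change": "critical",
    "compliance": "critical",
}


def route(task_type: str) -> str:
    """Pick a tier; unknown task types escalate to "complex" rather than the cheapest tier."""
    return TASK_TIER.get(task_type, "complex")


def cost_usd(task_type: str, tokens: int) -> float:
    """Cost of one call at the routed tier's per-token price."""
    return PRICE_PER_MTOK[route(task_type)] * tokens / 1_000_000


print(round(PRICE_PER_MTOK["critical"] / PRICE_PER_MTOK["simple"]))  # 250
print(route("status_check"), cost_usd("status_check", 2_000))
```

Defaulting unknown work upward trades a little cost for safety: a misclassified high-risk query is worse than an overpriced simple one.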
The policy lifecycle
The agent architecture specified a five-step policy lifecycle:
- Draft — create in YAML, define rules and thresholds, document rationale, estimate impact
- Test — run in dry-run mode, collect metrics, identify false positives, assess performance
- Validate — review test results, LLM analysis of outcomes, risk assessment, stakeholder approval
- Deploy — enable in production, monitor closely, collect feedback
- Iterate — analyse effectiveness, adjust thresholds, refine rules
A policy that's never been run in dry-run mode is a hypothesis. Gatekeep's YAML structure and review/deploy CLI are the production form of this lifecycle.
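The five steps behave like a state machine with two gates: no validation without a dry run, and no deployment without approval. A sketch under assumptions; `Policy`, `TRANSITIONS`, the `block-new-ports` name, and the loop from Iterate back to Test are illustrative, and this is not Gatekeep's YAML format or CLI.

```python
from dataclasses import dataclass, field

# Allowed moves between lifecycle steps. Iterate loops back to Test so that
# adjusted thresholds are dry-run again before redeployment (an assumption).
TRANSITIONS = {
    "draft": {"test"},
    "test": {"validate", "draft"},
    "validate": {"deploy", "draft"},
    "deploy": {"iterate"},
    "iterate": {"test"},
}


@dataclass
class Policy:
    name: str
    stage: str = "draft"
    dry_runs: int = 0
    approved: bool = False
    history: list[str] = field(default_factory=lambda: ["draft"])

    def record_dry_run(self) -> None:
        if self.stage != "test":
            raise ValueError("dry runs happen in the test stage")
        self.dry_runs += 1

    def approve(self) -> None:
        if self.stage != "validate":
            raise ValueError("approval happens in the validate stage")
        self.approved = True

    def advance(self, to: str) -> None:
        if to not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage} -> {to} is not allowed")
        if to == "validate" and self.dry_runs == 0:
            raise ValueError("never dry-run: still a hypothesis")
        if to == "deploy" and not self.approved:
            raise ValueError("not approved: cannot deploy")
        if to == "test":
            # Every pass through Test needs fresh evidence and a fresh approval.
            self.dry_runs = 0
            self.approved = False
        self.stage = to
        self.history.append(to)


p = Policy("block-new-ports")
p.advance("test")
try:
    p.advance("validate")
except ValueError as e:
    print(e)  # never dry-run: still a hypothesis
p.record_dry_run()
p.advance("validate")
p.approve()
p.advance("deploy")
print(p.history)  # ['draft', 'test', 'validate', 'deploy']
```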
Risk-aware decision making
Before any change: a risk assessment. Four dimensions (impact scope, data sensitivity, reversibility, financial impact) map to four risk levels, each with a corresponding model tier and approval requirement:
- Low risk: smallest model tier, no approvals needed, optional rollback plan
- Medium risk: small model, manager approval, standard testing, rollback recommended
- High risk: mid-tier model, manager + compliance approval, thorough testing, rollback required
- Critical risk: large model, CEO + legal + compliance approval, extensive testing, rollback required
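The text doesn't say how the four dimensions combine into one level. A sketch assuming each dimension is scored 0 to 3 and the worst dimension decides; the scoring scale, the max rule, `ChangeAssessment`, and `REQUIREMENTS` are assumptions, while the per-level requirements are transcribed from the list above.

```python
from dataclasses import dataclass

LEVELS = ["low", "medium", "high", "critical"]

# Per-level requirements, transcribed from the list above.
REQUIREMENTS = {
    "low": {"tier": "simple", "approvals": [], "testing": None, "rollback": "optional"},
    "medium": {"tier": "analysis", "approvals": ["manager"], "testing": "standard", "rollback": "recommended"},
    "high": {"tier": "complex", "approvals": ["manager", "compliance"], "testing": "thorough", "rollback": "required"},
    "critical": {"tier": "critical", "approvals": ["CEO", "legal", "compliance"], "testing": "extensive", "rollback": "required"},
}


@dataclass
class ChangeAssessment:
    # Each dimension scored 0 (low) to 3 (critical); the scale is an assumption.
    impact_scope: int
    data_sensitivity: int
    reversibility: int  # 0 = trivially reversible, 3 = irreversible
    financial_impact: int

    def level(self) -> str:
        scores = (self.impact_scope, self.data_sensitivity, self.reversibility, self.financial_impact)
        if any(not 0 <= s <= 3 for s in scores):
            raise ValueError("scores must be between 0 and 3")
        # Worst dimension wins: one critical dimension makes the whole change critical.
        return LEVELS[max(scores)]

    def requirements(self) -> dict:
        return REQUIREMENTS[self.level()]


change = ChangeAssessment(impact_scope=1, data_sensitivity=2, reversibility=0, financial_impact=1)
print(change.level())                      # high
print(change.requirements()["approvals"])  # ['manager', 'compliance']
```

Taking the maximum rather than an average is the conservative choice: an irreversible change stays critical even if every other dimension is low.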
What carried forward
BuildABeast was created on 18 January 2026. Brood-Hive: 5 February. Engine: 13 February. The hive structure, tiered routing, monitoring patterns, and policy lifecycle all moved from home lab to cloud infrastructure in three weeks. The "autonomous software factory" framing came directly from the BuildABeast docs. Paperclip formalised it six weeks later.