Most AI agent projects start by asking "what can we make this agent do?" OMNI started from a different question: "how does the agent know what it can't do — and what should it do about that?"
The answer shaped the whole architecture. Instead of a single capable agent that pretends it can handle anything, OMNI is a mesh of registered capabilities — each one discrete, governed, and independently deployable — with an orchestration layer that discovers what's available and composes what's needed. When a request falls outside what's registered, OMNI says so clearly. And it records that gap as a demand signal for the next thing to build.
The capability mesh pattern
A capability is a single thing an agent can do: run a security scan, review a Terraform plan, look up a compliance rule, analyse a document, generate code for a task. Each capability is a Lambda function registered with a gateway, with a name, a description of what it does, and a contract for inputs and outputs.
The orchestration layer — the agent itself — doesn't implement any of these. It discovers them from the registry at startup, and at request time it decides which ones to invoke and in what order. Adding a new capability means registering a new Lambda. The agent picks it up automatically on next startup; no changes to the agent code.
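In miniature, the registration contract and the discovery loop look something like this. The names and schemas below are my illustration, not the real contract: in OMNI the registry sits behind AgentCore Gateway, and capabilities are Lambdas rather than in-process objects.

```python
from dataclasses import dataclass, field

# Hypothetical shapes -- in OMNI the registry lives behind AgentCore Gateway,
# but the contract each capability registers looks roughly like this.
@dataclass
class Capability:
    name: str            # e.g. "security.scan_repo"
    description: str     # what the orchestrator matches requests against
    input_schema: dict   # JSON-schema-style contract for inputs
    output_schema: dict  # ...and for outputs

@dataclass
class Registry:
    _capabilities: dict = field(default_factory=dict)

    def register(self, cap: Capability) -> None:
        self._capabilities[cap.name] = cap

    def discover(self) -> list:
        # The orchestrator calls this once at startup; a newly registered
        # Lambda appears here without any change to the agent code.
        return list(self._capabilities.values())

registry = Registry()
registry.register(Capability(
    name="security.scan_repo",
    description="Run a read-only security scan against a repository",
    input_schema={"repo_url": "string"},
    output_schema={"findings": "array"},
))
print([c.name for c in registry.discover()])
```

The point of the shape is that `register` is the only coupling point: the orchestrator never imports a capability, it only reads the contract back out of `discover`.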
This matters for team scale. Different people can own different capabilities. The security capability can be improved by the security-focused engineer without touching the orchestrator. The knowledge retrieval capability can be swapped out for a better implementation without any other change propagating through the system. Loose coupling isn't just a software principle here — it's a team operating model.
Self-awareness as a first-class feature
Every interaction OMNI handles gets classified: what domain was it in, which capabilities were invoked, did the response fully satisfy the request, and if not, why not? This classification isn't logged and forgotten — it's stored and surfaced.
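A sketch of what one stored classification record might look like. The field names are illustrative, not the production DynamoDB schema; the important field is the last one, which is where an unmet request becomes a demand signal.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Illustrative record shape for the classification store
# (DynamoDB in OMNI) -- not the actual production schema.
@dataclass
class InteractionRecord:
    domain: str                 # e.g. "security", "governance"
    capabilities_invoked: list  # which registered capabilities ran
    satisfied: bool             # did the response fully answer the request?
    gap_reason: str             # if not satisfied, why -- the demand signal
    timestamp: str

def classify(domain, invoked, satisfied, gap_reason=""):
    return InteractionRecord(
        domain=domain,
        capabilities_invoked=invoked,
        satisfied=satisfied,
        gap_reason=gap_reason,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

record = classify("security", ["security.scan_repo"], satisfied=False,
                  gap_reason="no capability for container image scanning")
print(asdict(record)["gap_reason"])
```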
The "honest admission" design rule is the most important one: when a request can't be handled by any registered capability, OMNI says so explicitly rather than generating a plausible-sounding non-answer. This turns out to be genuinely hard to get right. Models want to be helpful. The constraint against confabulation has to be enforced at the prompt level, not hoped for.
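The fallback branch itself can be sketched in code. The matching below is a naive keyword overlap standing in for OMNI's model-driven routing, so treat it as a cartoon; what matters is the shape of the miss path: refuse explicitly, and record the gap.

```python
# Sketch of the "honest admission" guard. Keyword overlap is a stand-in
# for real routing; the miss branch is the point.
gaps = []

def record_gap(request: str) -> None:
    # Demand signal: the next capability to build comes from this list.
    gaps.append(request)

def route(request: str, capabilities: dict) -> str:
    words = set(request.lower().split())
    matches = [name for name, desc in capabilities.items()
               if words & set(desc.lower().split())]
    if not matches:
        record_gap(request)
        return ("I don't have a registered capability for this. "
                "The gap has been recorded.")
    return f"invoking: {matches[0]}"

caps = {"security.scan_repo": "scan repositories for security issues"}
print(route("please scan this repository", caps))  # invoking: security.scan_repo
print(route("book me a flight", caps))             # explicit refusal, gap recorded
```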
Governance at the mesh level
Individual capabilities can have different governance requirements. A read-only security scan can run without approval. A capability that modifies infrastructure requires a governance gate — the request is classified, held, and a human approves before execution.
This tiered governance is built into the capability registration contract, not into the orchestrator. When a new capability registers itself as "tier 3 — requires approval", the orchestrator handles that automatically. Capability authors declare what governance tier their capability belongs to; they don't need to implement the approval flow themselves.
Data classification works the same way. A request that touches personal data is automatically flagged and handled according to the data classification registered for the relevant capability. The governance rules don't live in the agent; they live in the mesh contract.
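Reduced to its skeleton, the tiered gate looks something like the following. The tier names and the exact semantics of tiers 1 and 2 are my assumptions, extrapolated from the "tier 3 requires approval" example above.

```python
from enum import IntEnum

# Tier names and semantics below tiers 3 are assumptions, not OMNI's contract.
class GovernanceTier(IntEnum):
    READ_ONLY = 1  # runs without approval
    REVIEWED = 2   # logged for after-the-fact review
    APPROVAL = 3   # held until a human approves

def invoke(capability_name: str, tier: GovernanceTier, approved: bool = False) -> dict:
    # The orchestrator enforces the gate; capability authors only declare the tier.
    if tier >= GovernanceTier.APPROVAL and not approved:
        return {"status": "held", "reason": "awaiting human approval"}
    return {"status": "executed", "capability": capability_name}

print(invoke("security.scan_repo", GovernanceTier.READ_ONLY))
print(invoke("infra.apply_terraform", GovernanceTier.APPROVAL))
```

Because the check lives in `invoke`, a capability author who registers at tier 3 gets the hold-and-approve flow for free, which is the whole argument for putting governance in the mesh contract rather than in each capability.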
The architecture in practice
OMNI runs on AWS: a Lambda handler, a Strands SDK agent that talks to AgentCore Gateway for capability discovery and MCP-based tool invocation, DynamoDB for sessions and interaction history, S3 for document uploads, and a static frontend. The infrastructure is not unusual — what's different is the agent design.
At launch, OMNI has access to 54 registered capabilities across security, governance, AutoDev, knowledge, and DevOps. Some of those are mature and well-tested; some are early. The maturity of a capability is tracked in the registry, and the agent surfaces that to users — "this capability is in beta; results should be verified" is a better answer than a confident wrong one.
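Surfacing maturity is a small amount of code. The labels and warning wording below are illustrative, not OMNI's actual registry fields; the mechanism is just an annotation step between capability output and the user.

```python
# Illustrative maturity labels -- not OMNI's actual registry fields.
MATURITY_WARNINGS = {
    "beta": "this capability is in beta; results should be verified",
    "experimental": "this capability is experimental; treat output as a draft",
}

def annotate(answer: str, maturity: str) -> str:
    # Mature capabilities pass through untouched; anything else gets a caveat.
    warning = MATURITY_WARNINGS.get(maturity)
    return f"{answer}\n\n[Note: {warning}]" if warning else answer

print(annotate("No critical findings.", "beta"))
print(annotate("No critical findings.", "mature"))
```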
What I've learned from building it
The hardest part hasn't been the capability implementations — those are mostly straightforward Lambda functions. It has been the classification layer: getting consistent, meaningful classifications out of interactions that vary enormously in phrasing, intent, and domain.
The second hardest thing is managing the gap between what OMNI thinks it can do (based on capability descriptions) and what it actually can do (based on the quality of those capabilities). A capability that exists but produces poor output is worse than no capability — because the agent invokes it confidently, and the poor output looks like a confident answer. Quality gates on capability registration are something I'm still working on.
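For what it's worth, one possible shape for such a gate, sketched as a golden-test pass rate checked at registration time. This is entirely an assumption about a future design, not something OMNI does today.

```python
# Hypothetical registration-time quality gate: a capability must pass
# its own golden test cases before the registry accepts it.
def quality_gate(capability, golden_cases, min_pass_rate=0.9):
    passed = sum(1 for inputs, expected in golden_cases
                 if capability(inputs) == expected)
    rate = passed / len(golden_cases)
    return rate >= min_pass_rate, rate

# Toy capability standing in for a Lambda: uppercase a string.
ok, rate = quality_gate(str.upper, [("abc", "ABC"), ("x", "X"), ("hi", "HI")])
print(ok, rate)  # True 1.0
```

The interesting design question is who writes the golden cases: the capability author, or the consumer whose agent will invoke the capability on their behalf.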
That said, the demand signal loop is the part I'd build first if starting again. Knowing what you can't do, and letting that drive what you build next, is a more grounded way to decide a roadmap than most teams have.