Serverless agent architecture

Summary

Who it's for: Engineers choosing a runtime for agentic workloads who are weighing Lambda against containerised servers or managed agent frameworks.

3 key takeaways

The four-tier hierarchy (Orchestrator, Subagents, Skills, Tools) maps cleanly to Lambda. Each tier can be a separate function, a module within a function, or both, depending on latency and cost trade-offs.
Serverless isolation is not a feature you implement. It is a property of the infrastructure. Each Lambda execution environment handles one invocation at a time in its own memory space, so leakage between concurrent tenants is structurally impossible, not just unlikely.
The AWS Free Tier constraint forces architectural clarity. The limits are 1 million requests and 400,000 GB-seconds of compute per month. Most design decisions that respect these limits are also the right decisions for production agents.


Four tiers. Each tier knows less about the tiers below it.

Why the four tiers matter

The tier hierarchy is not an organisational preference. It's a coupling contract. The Orchestrator knows about Subagents but not about how they implement their domain logic. A Subagent knows about Skills but not about which API they call. A Skill knows about Tools but not about how the result will be used. Each tier can be replaced without touching the tiers above it.

In Lambda terms, this usually means: the Orchestrator is one function, Subagents are separate functions (or separate modules within the Orchestrator's deployment package, if the invocation cost of separate functions is not justified), Skills are Python modules, and Tools are boto3 clients or HTTP calls. The right split depends on whether the Subagent needs its own timeout, memory allocation, or IAM role. If yes, it's a separate function. If not, it's a module.
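The coupling contract can be sketched with all four tiers as modules in one deployment package. This is a minimal illustration, not the ticketyboo.dev code: every name here (`fetch_headers_tool`, `missing_headers_skill`, `security_subagent`, `orchestrator_handler`) is hypothetical, and the Tool is stubbed so the sketch runs without network access.

```python
from typing import Callable

# Tool: the only tier that touches the outside world.
# Assumption: stubbed here; a real Tool would be an HTTP call or a boto3 client.
def fetch_headers_tool(url: str) -> dict[str, str]:
    return {"content-type": "text/html", "x-frame-options": "DENY"}

# Skill: knows which Tool it needs, not how its result will be used.
def missing_headers_skill(
    url: str,
    fetch: Callable[[str], dict[str, str]] = fetch_headers_tool,
) -> list[str]:
    required = ["content-security-policy", "x-frame-options"]
    present = fetch(url)
    return [name for name in required if name not in present]

# Subagent: knows which Skills it uses, not which API they call.
def security_subagent(task: dict) -> dict:
    missing = missing_headers_skill(task["url"])
    return {"agent": "security", "findings": missing, "ok": not missing}

# Orchestrator: knows which Subagents exist, not their domain logic.
SUBAGENTS: dict[str, Callable[[dict], dict]] = {"security": security_subagent}

def orchestrator_handler(event: dict, context: object = None) -> dict:
    subagent = SUBAGENTS[event["agent"]]
    return subagent(event["task"])

result = orchestrator_handler(
    {"agent": "security", "task": {"url": "https://example.com"}}
)
print(result)
```

If a Subagent later needs its own timeout, memory allocation, or IAM role, the entry in `SUBAGENTS` is the one place that would change, for example to a function that invokes a separate Lambda. The Skill and Tool below it stay as they are.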

The ticketyboo.dev ops team has four Subagents (CTO, SRE, Security, Cost). Each is a separate Lambda function with its own EventBridge schedule, its own IAM role, and its own timeout. They are independent. The CTO agent's failure does not affect the SRE agent. That's the correct failure isolation boundary for operational telemetry.


Isolation by construction

A server-based agent that handles multiple concurrent conversations needs explicit isolation code: separate context objects, careful scoping of shared state, tests that prove one session cannot read another's memory. None of this is conceptually hard, but it is easy to get wrong, and the failure mode (cross-session state leakage) is a security problem, not just a correctness problem.

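Lambda invocations are isolated by the runtime. Each execution environment handles one invocation at a time and has its own memory address space, so a Lambda handling a request from tenant A cannot, by any code path, read the in-memory state of tenant B's concurrent invocation. This is a runtime guarantee, not a code convention. The one caveat is that global variables are scoped to the execution environment, not the invocation: a warm environment is reused for later invocations, so anything tenant-specific belongs in handler-local variables, not at module level.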
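The scoping of global variables is worth seeing concretely, because it is the one place where state can outlive an invocation. In the sketch below, calling a handler twice in the same process stands in for two sequential invocations landing on the same warm execution environment. The handler names and event fields are hypothetical.

```python
# Module-level state: created once per execution environment and reused
# by every later invocation that runs in that same warm environment.
_leaky_cache: dict[str, str] = {}

def leaky_handler(event: dict, context: object = None) -> dict:
    # Anti-pattern: tenant data written to module scope outlives the call.
    _leaky_cache[event["tenant"]] = event["secret"]
    return {"visible_tenants": sorted(_leaky_cache)}

def safe_handler(event: dict, context: object = None) -> dict:
    # Per-invocation state lives in locals and is discarded on return.
    session = {event["tenant"]: event["secret"]}
    return {"visible_tenants": sorted(session)}

# Two sequential "invocations" in one process, simulating a warm environment.
first = leaky_handler({"tenant": "tenant-a", "secret": "a-key"})
second = leaky_handler({"tenant": "tenant-b", "secret": "b-key"})
print(second["visible_tenants"])  # tenant-a's entry is still in scope

safe_second = safe_handler({"tenant": "tenant-b", "secret": "b-key"})
print(safe_second["visible_tenants"])  # only the current tenant
```

Concurrent invocations never share this module state, because each runs in its own environment. The discipline that remains is small: keep anything tenant-specific in handler locals.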

For agentic workloads, this matters more than for simple APIs. An agent handling a security assessment has access to credentials, configuration, and sensitive findings. The isolation boundary is the first line of defence. With Lambda, that boundary requires no code.

Stateless execution and external state

Lambda's stateless model forces a clean separation between compute and state. Everything that needs to persist lives outside the function: DynamoDB for structured agent telemetry, S3 for documents and scan results, SSM Parameter Store for configuration. Scheduling lives outside the function too, in EventBridge. The Lambda function itself is stateless.

This turns out to be the right model for agents. An agent's working memory is the invocation's local variables: they exist for the duration of the call and are discarded. An agent's persistent memory is an external store: it exists between calls and is explicitly written and read. The boundary between working memory and persistent memory is the Lambda invocation boundary. You can't accidentally blur it.
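A minimal sketch of that boundary, assuming a hypothetical two-method store interface (`load` and `save`). The `InMemoryStore` is a test double so the example runs locally; a DynamoDB-backed store would expose the same two methods using boto3's `get_item` and `put_item`.

```python
from typing import Protocol

class MemoryStore(Protocol):
    """Hypothetical interface for an agent's persistent memory."""
    def load(self, agent_id: str) -> dict: ...
    def save(self, agent_id: str, state: dict) -> None: ...

class InMemoryStore:
    """Test double standing in for DynamoDB or S3."""
    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def load(self, agent_id: str) -> dict:
        return dict(self._items.get(agent_id, {}))

    def save(self, agent_id: str, state: dict) -> None:
        self._items[agent_id] = dict(state)

def run_agent(event: dict, store: MemoryStore) -> dict:
    # Persistent memory: explicitly read at the start of the call.
    state = store.load(event["agent_id"])

    # Working memory: locals that vanish when the function returns.
    run_count = state.get("run_count", 0) + 1
    notes = f"run {run_count}: checked {event['target']}"

    # Persistent memory: explicitly written before returning.
    store.save(event["agent_id"], {"run_count": run_count, "last_notes": notes})
    return {"run_count": run_count}

store = InMemoryStore()
run_agent({"agent_id": "sre", "target": "api"}, store)
second_run = run_agent({"agent_id": "sre", "target": "api"}, store)
print(second_run, store.load("sre"))
```

Nothing from the first call reaches the second except what was passed through `store.save`, which is the property the invocation boundary enforces.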

The Roo context MCP server is the external persistent memory for the AI coding agents. The DynamoDB team-activity table is the persistent memory for the ops agents. The S3 bucket holds scan results and lessons documents. Nothing important lives in Lambda memory.

The Free Tier constraint as an architectural forcing function

The AWS Free Tier allows 1 million Lambda requests per month and 400,000 GB-seconds of compute. At 128 MB (0.125 GB) per function, that is 400,000 / 0.125 = 3.2 million seconds of execution time. For the ticketyboo.dev workload (four scheduled ops agents, on-demand scanner invocations, API calls), the Free Tier is not a constraint. It is a comfortable envelope.
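The 3.2 million figure is just the compute allowance divided by the memory size in GB. A short sketch of the same arithmetic at a few memory sizes, assuming 1 GB = 1024 MB as the 128 MB figure implies:

```python
FREE_TIER_GB_SECONDS = 400_000  # monthly compute allowance quoted above

def free_tier_seconds(memory_mb: int) -> float:
    """Seconds of execution per month covered at a given memory size."""
    return FREE_TIER_GB_SECONDS / (memory_mb / 1024)

for memory_mb in (128, 256, 512, 1024):
    print(f"{memory_mb:>5} MB -> {free_tier_seconds(memory_mb):>12,.0f} s/month")
```

Doubling the memory halves the seconds available, which is why right-sizing memory is the first lever.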

But the constraint matters architecturally. A design that fits within the Free Tier limits is, almost by definition, a design that has avoided unnecessary compute, unnecessary concurrency, and unnecessary complexity. The optimisations that keep a workload within the Free Tier (right-sized memory, short timeouts, stateless execution, external state) are the same optimisations that make the workload reliable and cheap in production.

Resource-aware design is not about penny-pinching. It's about understanding what your workload actually needs. An agent that runs in 250ms at 128MB does not need a 3GB container with a 30-second timeout. The tighter the envelope, the clearer the design.
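To put numbers on that comparison, the sketch below works out how many invocations per month each configuration could make before hitting a Free Tier limit. The function name is hypothetical, and the 3 GB case assumes the worst case where every call runs for its full 30-second timeout.

```python
FREE_TIER_GB_SECONDS = 400_000
FREE_TIER_REQUESTS = 1_000_000

def monthly_invocation_ceiling(duration_ms: float, memory_mb: int) -> dict:
    """Invocations per month before either Free Tier limit is reached."""
    gb_seconds_per_call = (duration_ms / 1000) * (memory_mb / 1024)
    compute_limited = int(FREE_TIER_GB_SECONDS / gb_seconds_per_call)
    ceiling = min(compute_limited, FREE_TIER_REQUESTS)
    binding = "requests" if FREE_TIER_REQUESTS <= compute_limited else "compute"
    return {
        "gb_seconds_per_call": gb_seconds_per_call,
        "compute_limited": compute_limited,
        "ceiling": ceiling,
        "binding": binding,
    }

lean = monthly_invocation_ceiling(duration_ms=250, memory_mb=128)
# Assumption: worst case, each call runs for the full 30-second timeout.
heavy = monthly_invocation_ceiling(duration_ms=30_000, memory_mb=3072)
print(lean)
print(heavy)
```

For the lean agent the compute allowance would cover 12.8 million calls, so the request count is the limit that binds. For the worst-case heavy configuration the compute allowance runs out after a few thousand calls.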

What the framework abstraction costs

Most agentic frameworks assume a server: a long-running process that maintains state, handles concurrent sessions, and provides a streaming interface. Some provide Lambda adapters. The adapters work, but they often carry assumptions (persistent connection pools, in-memory caches, background threads) that don't translate cleanly to the Lambda execution model.

The ticketyboo.dev agents are plain Python. Each Lambda handler is a module with a function. There is no framework. The tool definitions are inline. The prompts are strings. The state is DynamoDB and S3. The cost of "no framework" is a few hundred lines of boilerplate per agent. The benefit is a codebase where every execution path is visible, every dependency is explicit, and there is nothing the framework does that you cannot see.
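What "plain Python" can look like is sketched below. This is an illustration under assumptions, not the ticketyboo.dev source: the tool (`get_open_alarms`), the prompt text, and the event shape are all hypothetical, the tool is stubbed, and the model call that would choose the tool is left out so the example runs on its own.

```python
import json

def get_open_alarms(service: str) -> list[str]:
    # Hypothetical tool, stubbed; a real one might query CloudWatch via boto3.
    return [f"{service}: high error rate"]

# Tool definitions inline: description, input schema, and the callable.
TOOLS = {
    "get_open_alarms": {
        "description": "List open alarms for a service.",
        "input_schema": {
            "type": "object",
            "properties": {"service": {"type": "string"}},
            "required": ["service"],
        },
        "fn": get_open_alarms,
    },
}

# The prompt is a string. It would be sent with the model call omitted here.
SYSTEM_PROMPT = "You are an SRE agent. Use tools to inspect service health."

def dispatch_tool(name: str, arguments: dict) -> dict:
    """Run one tool call, converting any failure into a structured result."""
    tool = TOOLS.get(name)
    if tool is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": tool["fn"](**arguments)}
    except Exception as exc:  # degrade gracefully instead of crashing the agent
        return {"ok": False, "error": str(exc)}

def handler(event: dict, context: object = None) -> dict:
    # Assumption: the tool call the model requested arrives in the event.
    call = event["tool_call"]
    outcome = dispatch_tool(call["name"], call["arguments"])
    return {
        "statusCode": 200 if outcome["ok"] else 500,
        "body": json.dumps(outcome),
    }

good = handler({"tool_call": {"name": "get_open_alarms", "arguments": {"service": "api"}}})
bad = handler({"tool_call": {"name": "restart_everything", "arguments": {}}})
print(good)
print(bad)
```

Every execution path is in the file: the dictionary lookup, the call, and the two failure branches. That visibility is what the boilerplate buys.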

Patterns demonstrated

Resource-Aware Optimization (16): Lambda sizing, Free Tier constraints, model tier routing by task complexity
Multi-Agent (7): four independent ops agents with separate invocation boundaries and IAM roles
Memory Management (8): explicit separation of working memory (invocation) from persistent memory (DynamoDB, S3)
Exception Handling (12): Lambda structured error responses, graceful degradation on tool call failures
Goal Setting (11): EventBridge schedules encode when each agent should run and at what frequency

Reference: Antonio Gulli, Agentic Design Patterns with Claude (Anthropic, 2025). Patterns 16, 7, 8, 12, 11. All implementations are original.
