The Model Context Protocol adds tokens to every request. This is not a flaw. It is the cost of the abstraction. A protocol that makes tools discoverable, composable, and observable by necessity adds structure. That structure consumes tokens. The question is whether the tokens are buying something.

On a homelab or AWS Free Tier budget, the answer matters more than it does for well-funded production systems. Lambda charges for execution duration, and token counts drive inference costs. A session that carries 1,600 extra tokens of protocol overhead is not a concern at 100 sessions/month. It is a line item worth examining at 100,000 sessions/month.

Where the tokens go

An MCP session has three token overhead sources: the server description, the tool schemas, and the per-call invocation format. They are additive and they all hit the input token count.

Server description is the natural language description of what the MCP server does, included in the context when the server is initialised. A well-written server description is 200-400 tokens. A verbose one with examples and caveats can reach 600. This is a one-time cost per session, not per call.

Tool schemas are the JSON Schema definitions of each tool: its name, description, parameter names, types, and descriptions. A minimal tool schema (name, description, 2 parameters) is around 50 tokens. A well-documented tool with 5 parameters and descriptive field descriptions is 100-150 tokens. Multiply by the number of registered tools.

Invocation format is the structured representation of each tool call and its result. The call itself (tool name, parameters as JSON) adds 20-40 tokens. The result wrapper adds another 10-20. On a session with 10 tool calls, that's 300-600 tokens of invocation overhead.

# Token overhead breakdown for a 10-tool MCP server, 10-call session

server_description_tokens     = 300   # one-time, per session
tool_schemas_tokens            = 100   # average per tool
tools_registered               = 10
tool_schema_total              = tool_schemas_tokens * tools_registered  # 1,000

invocation_per_call_tokens     = 30    # average per tool call
calls_in_session               = 10
invocation_total               = invocation_per_call_tokens * calls_in_session  # 300

session_overhead_total         = (
    server_description_tokens +        # 300
    tool_schema_total +                # 1,000
    invocation_total                   # 300
)
# session_overhead_total = 1,600 tokens

# At $0.80 / 1M input tokens (small model tier):
cost_per_session_overhead      = (session_overhead_total / 1_000_000) * 0.80
# = $0.00128 per session in overhead alone

# At 1,000 sessions / month:
monthly_overhead_cost          = cost_per_session_overhead * 1_000
# = $1.28 / month

$1.28/month sounds trivial. It is trivial for most use cases. The point is not that MCP is expensive. The point is that the overhead is predictable and measurable, and that it scales linearly with tool count and session volume. Understanding it lets you make deliberate decisions about tool count, schema verbosity, and whether the abstraction is earning its cost for a given workload.

[Figure: stacked bar chart of total tokens (0-3,000) for five tasks (T1 code gen, T2 file ops, T3 search, T4 data, T5 report), Direct API (D) vs MCP (M), broken into system prompt, user message, MCP server description, tool schemas, invocation overhead, and response.]
Token composition for 5 representative tasks: Direct API (D) vs MCP (M). MCP overhead is dominated by tool schemas. Tasks with richer tool sets (T1 code gen, T4 data analysis) show the largest absolute overhead. The overhead is fixed per session, not per token of user content.

The break-even calculation

MCP overhead is not purely a cost to minimise. The schema overhead buys discoverability: the model knows what tools exist, what they accept, and how to call them correctly without bespoke prompt engineering. That's worth tokens if the tools are used repeatedly.

The break-even point is where the structure overhead is justified by the reduction in error rate and re-tries. A direct API call approach that requires 3 attempts to get the correct structured output may cost more tokens in total than an MCP call that gets it right on the first attempt because the schema is explicit.

The relevant calculation is not "MCP overhead vs zero" but "MCP overhead vs the tokens spent on error correction and retry in a direct approach."
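That comparison can be written out. The MCP numbers below come from the worked example above; the direct-call retry rate and retry cost are illustrative assumptions, not measurements, so adjust them for your own workload.

```python
# Break-even sketch: MCP setup overhead vs direct-call retry overhead.
# MCP figures follow the worked example above; RETRY_RATE and
# RETRY_COST_TOKENS are assumptions to illustrate the shape of the curve.
MCP_SETUP_TOKENS = 1_300   # server description (300) + schemas (1,000)
MCP_PER_CALL = 30          # invocation format per call
RETRY_RATE = 0.5           # assumed: half of direct calls need one retry
RETRY_COST_TOKENS = 600    # assumed: tokens burned per failed attempt

def overhead_tokens(calls: int, use_mcp: bool) -> float:
    """Session overhead in tokens, beyond the useful work itself."""
    if use_mcp:
        return MCP_SETUP_TOKENS + calls * MCP_PER_CALL
    return calls * RETRY_RATE * RETRY_COST_TOKENS

for calls in (1, 4, 5, 10, 20):
    print(calls, overhead_tokens(calls, True), overhead_tokens(calls, False))
```

Under these assumptions the curves cross between 4 and 5 calls: at 4 calls MCP carries 1,420 overhead tokens against 1,200 for direct; at 5 calls it is 1,450 against 1,500.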

[Figure: cumulative session cost ($0-$0.020) vs tool calls in session (0-20), comparing Direct API (with retry overhead) against MCP (schema-guided calls); the lines cross at a break-even point of ~4-5 calls, with direct cheaper below it and MCP cheaper above it.]
Break-even analysis for a 10-tool MCP server vs direct API calls. The MCP setup overhead (~$0.0012) is paid back after approximately 4-5 tool calls per session, because schema-guided calls reduce retry tokens. Sessions with more calls favour MCP. One-shot sessions favour direct API.

When MCP overhead is justified

Three conditions make MCP overhead worthwhile on a constrained budget:

Tools are reused across sessions. The schema overhead is paid once per session, not once per tool call. If a session makes 10 calls to the same tool set, the schema overhead per call is 100 tokens (a tenth of the 1,000-token schema cost). If a session makes 1 call and ends, the overhead per call is the full 1,000 tokens. The amortisation only works if tools are exercised.

Tool observability is a requirement. MCP tool calls are structured, logged, and inspectable. If you need an audit trail of what tools were called, with what parameters, and what they returned, that observability is built into the protocol. Building equivalent observability into direct API calls requires additional instrumentation. The token overhead may be cheaper than the engineering overhead.

The tool schema provides genuine structure the model uses. A well-written tool schema reduces ambiguity. The model knows the exact parameter names, types, and constraints. On tasks where structured output is required (tool calls that produce machine-readable results consumed by downstream steps), schema precision reduces error rates. The reduction in retries can offset the schema overhead.
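The amortisation in the first condition is simple division; a minimal sketch using the schema cost from the worked example above:

```python
# Amortised schema overhead per tool call. The schema cost comes from
# the worked example above (10 tools at ~100 tokens each).
SCHEMA_TOKENS = 1_000

def amortised_overhead(calls_in_session: int) -> float:
    """Schema tokens attributed to each call in a session."""
    return SCHEMA_TOKENS / calls_in_session

amortised_overhead(10)  # 100.0 tokens per call: overhead is diluted
amortised_overhead(1)   # 1000.0 tokens: the one-shot case pays it all
```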

When direct API calls are preferable

Three conditions favour direct API calls:

Single-purpose scripts. A script that does one thing, called once, does not benefit from tool discoverability. The MCP overhead adds tokens without providing structure that the model needs to navigate. Write the function directly, call it directly, pay only for the tokens you actually need.

One-shot tasks. If a session makes exactly one tool call and terminates, the amortisation argument fails. The schema overhead for 10 tools is present regardless of which tool is called. For one-shot tasks with a known tool, a direct API call is cheaper and simpler.

You own the entire pipeline. MCP's discoverability value is highest when the client (the model) and the server (the tool provider) are developed independently and need a contract. If you control both sides of the interface, the contract can be simpler. A direct function call with typed parameters achieves the same result with less overhead.

# Direct approach: controlled pipeline, typed interface.
# (github_client, analyser and ScanResult are illustrative placeholders.)
async def run_scan_task(repo_url: str, scan_depth: int = 3) -> ScanResult:
    """Call GitHub API and analysis functions directly."""
    repo = await github_client.get_repo(repo_url)
    files = await github_client.list_files(repo, depth=scan_depth)
    analysis = await analyser.run(files)
    return ScanResult(repo=repo, findings=analysis.findings)

# MCP approach: tool exposed for model-driven invocation
# Adds: server description tokens + schema tokens + invocation format tokens
# Justified when: model decides which tools to call, observability required,
# tool reused across many sessions

Reducing overhead without removing MCP

If MCP is the right choice for a workload but overhead needs to be managed, three levers are available:

Reduce tool count per server. Register only the tools a session actually needs. A session that only writes files doesn't need the read tools in its schema. Selective tool registration reduces schema overhead proportionally: at ~100 tokens per schema, a 5-tool server adds ~500 tokens instead of ~1,000 for a 10-tool server.
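A minimal sketch of that lever. The tool names and per-tool token counts below are illustrative assumptions; the point is that schema overhead scales with what you register.

```python
# Selective registration: schema overhead scales with the registered set.
# Tool names and token counts are illustrative assumptions.
TOOL_SCHEMA_TOKENS = {
    "write_file": 110, "read_file": 95, "list_files": 90,
    "search_code": 130, "run_report": 105,
}

def schema_overhead(registered: list[str]) -> int:
    """Total schema tokens for the tools a session actually registers."""
    return sum(TOOL_SCHEMA_TOKENS[name] for name in registered)

schema_overhead(["write_file"])             # write-only session: 110 tokens
schema_overhead(list(TOOL_SCHEMA_TOKENS))   # everything: 530 tokens
```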

Tighten tool descriptions. Tool descriptions are the most verbose part of a schema. A description that is 3 sentences instead of 6 saves 50 tokens per tool. Across 10 tools, that's 500 tokens per session. Write descriptions that are precise, not thorough.

Cache schema context. For a hosted API that supports prompt caching (Anthropic's cache_control feature), the tool schemas can be marked as cacheable. The first request pays full token cost. Subsequent requests within the cache window pay a reduced rate or zero for the cached portion. Input cache hits typically cost 10% of the standard input token rate.
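A sketch of that lever against the Anthropic Messages API. The tool name and schema here are hypothetical; the mechanism is the cache_control marker on the last tool definition, which makes the whole tool block a cacheable prefix.

```python
# Sketch: marking tool schemas cacheable with Anthropic's prompt caching.
# The tool itself is illustrative. Pass `tools` to client.messages.create();
# on cache hits the marked prefix is billed at the cache-read rate.
tools = [
    {
        "name": "scan_repo",   # hypothetical tool
        "description": "Scan a repository and return findings.",
        "input_schema": {
            "type": "object",
            "properties": {"repo_url": {"type": "string"}},
            "required": ["repo_url"],
        },
    },
]
# cache_control on the last tool caches everything up to and including it:
tools[-1]["cache_control"] = {"type": "ephemeral"}
```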

The bottom line: a 10-tool MCP server costs approximately $1.28/month in pure overhead at 1,000 sessions/month at small model pricing. That cost is justified if sessions use multiple tools, if observability is a requirement, or if the schema reduces error rates on structured tasks. It is not justified for one-shot scripts, single-tool sessions, or tightly controlled pipelines where you own both sides of the interface. Know what you're paying for.
