The FinOps Foundation tracks what organisations do with their cloud spend. In 2025, for the first time in five years, wasted cloud spend went up — to 29% of total cloud budgets.[1] The culprit was GenAI: teams deploying AI workloads without the cost management discipline they'd developed for traditional cloud services.
It's not a surprise. AI costs work differently from compute costs. They scale at the request level. They vary by model, by prompt length, by context window use. The optimisation levers are inside application code, not cloud consoles. You can't right-size a token the way you can right-size an EC2 instance.
The teams that are managing AI spend well share a common approach: they know which bucket each spend sits in before they commit. Subscribe, burst, or own. The decision criteria for each are different. The risk of getting it wrong is different. And the answer to "what do I actually own after this investment?" changes significantly across the three.
Three buckets
Subscribe
Always-on access that isn't your core IP. Foundation model APIs, monitoring, error tracking, base infrastructure.
Decision criteria: switching cost is low. Your use of it is table stakes. You're renting commodity infrastructure.
Risk of getting it wrong: building differentiation on top of something commodity. When the commodity improves, your wrapper becomes obsolete.
Burst
High-volume, short-window work. Batch inference, data processing pipelines, code review automation at scale.
Decision criteria: you can predict the shape of the work even if not the exact timing. Commit discounts and spot pricing make sense.
Risk of getting it wrong: treating burst workloads as always-on and paying for idle capacity. Or missing the window and paying on-demand rates for what should have been batch.
Own
Where your data, pipeline, evaluations, or fine-tuning is the competitive advantage.
Decision criteria: the asset is shaped to your problem. If you outsourced it, you'd own nothing at the end. High asset specificity.
Risk of getting it wrong: building proprietary infrastructure for something the market will commoditise. Build generic and the model catches up to you.
The framework isn't original — it maps roughly to how economists think about make-vs-buy decisions. High asset specificity (the asset is shaped to your problem, not the market's) pushes toward ownership. Low asset specificity pushes toward market procurement. What changes in the AI era is that the line between high and low specificity shifts quickly, as foundation models absorb more of what used to require custom builds.
The hidden multiplier
For every dollar you spend on API tokens, expect to spend $2–7 more on everything the pricing page doesn't mention: token inflation from system prompts and conversation history, retry overhead when calls fail or time out, embedding storage costs, the human review layer for high-stakes outputs, engineering time maintaining prompt templates and handling model migrations, and the cost of switching models when the provider deprecates an API version.[2]
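A back-of-envelope sketch of what that multiplier means for an all-in budget, using the $2–7 range above. The $10,000 monthly token bill is an assumed example, not a benchmark:

```python
# Total cost = visible token bill + hidden operational spend.
# The 2x-7x range comes from the figure cited in the text; the
# monthly token spend is an illustrative assumption.

def total_cost_range(token_spend: float, low_mult: float = 2.0,
                     high_mult: float = 7.0) -> tuple[float, float]:
    """Return (low, high) estimates of all-in monthly cost."""
    return (token_spend * (1 + low_mult), token_spend * (1 + high_mult))

low, high = total_cost_range(10_000)   # $10k/month in API tokens
print(f"${low:,.0f} - ${high:,.0f}")   # $30,000 - $80,000 all-in
```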
Sources: AI Courses March 2026; FinOps Foundation / Flexera 2025; CloudNuro March 2026; Workday 2026
The 40% correction tax is the one that catches people out. Workday found that 40% of the time savings attributed to AI are consumed by correcting AI-generated errors and rewriting low-quality outputs. The gain is real. It's just smaller than it looks when you're looking at raw generation speed.
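A quick worked example of how the correction tax shrinks the headline number. The raw-savings figure is an assumption for illustration; the 40% is the Workday figure cited above:

```python
# The headline time saving vs. the net saving after correction work.
# raw_hours_saved is an assumed example; correction_tax is the cited 40%.

raw_hours_saved = 100.0          # hours/month attributed to AI generation
correction_tax = 0.40            # share consumed fixing AI-generated errors
net_hours = raw_hours_saved * (1 - correction_tax)
print(net_hours)                 # 60.0 -- the gain is real, just smaller
```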
McKinsey estimates that integration and maintenance account for 40–60% of total AI operating costs. The token bill is the visible part. The operational layer — the data pipelines, the evaluation infrastructure, the human review workflows, the incident response when the model hallucinates in production — is the iceberg below the waterline.
The case study that defines the choice
Picture two companies that both chose to build. By 2026, Company A's custom support model was obsolete: OpenAI had caught up, and the thing they'd built proprietary infrastructure around was now a commodity API call. Company B's fraud detection IP, by contrast, had compounded into a defensible moat, trained on their specific transaction patterns, evaluated against their specific fraud vectors, improving with every case they saw.
The lesson isn't that you should never build. It's that what you build proprietary infrastructure around should be something the market won't commoditise under you.[3] Generic customer support was always going to be commoditised. Proprietary fraud patterns for your specific customer base were never going to be.
The test is simple: if OpenAI ships a new model tomorrow that makes this capability 10x better, does that help me or destroy my investment? If it helps you, you were probably right to subscribe. If it destroys your investment, you built something you should have rented.
The lock-in you should actually worry about
The Brookings Institution published a note in March 2026 pointing out something that doesn't get enough attention: every major AI model provider has explicit policies allowing them to disconnect competing AI application developers.[4] This isn't theoretical — it's in the terms. The platform risk is structural.
The paradox is that the lock-in runs one direction. You can be locked into a provider. The provider has no equivalent lock — open-source alternatives and competitors mean switching costs are near zero for them, not for you. The asymmetry is worth understanding before you build deeply against any single provider's APIs.
The practical response is multi-model routing — building against an abstraction layer that lets you swap the underlying model without rewriting application logic. OpenRouter is one implementation. LangChain, LiteLLM, and similar tools exist for the same reason. The cost is some abstraction overhead. The benefit is optionality when the model landscape changes, which it does approximately every six months.
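A minimal sketch of what that abstraction layer looks like. The provider functions here are hypothetical stand-ins, not real OpenRouter or LiteLLM APIs; the point is that callers depend on the router, not on any one provider:

```python
# Sketch of multi-model routing: application code calls an abstraction and
# the provider preference order lives in one place. Provider functions
# below are hypothetical stand-ins for illustration.

from typing import Callable

Provider = Callable[[str], str]

class ModelRouter:
    """Route a prompt to the first working provider in preference order."""

    def __init__(self, providers: dict[str, Provider], order: list[str]):
        self.providers = providers
        self.order = order  # swap providers here, not at the call sites

    def complete(self, prompt: str) -> str:
        last_err: Exception | None = None
        for name in self.order:
            try:
                return self.providers[name](prompt)
            except Exception as err:   # real code would narrow this
                last_err = err
        raise RuntimeError("all providers failed") from last_err

# Hypothetical providers for the sketch:
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider down")

def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"

router = ModelRouter(
    {"primary": flaky_provider, "backup": backup_provider},
    order=["primary", "backup"],
)
print(router.complete("hello"))  # echo: hello
```

When the model landscape shifts, the change is one list, not a rewrite of application logic. That is the optionality being paid for with the abstraction overhead.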
Where ticketyboo.dev sits
The spending decisions on this platform aren't interesting by scale — it's a small site — but they're illustrative of the framework in practice.
Infrastructure: AWS Free Tier throughout. Lambda, DynamoDB, CloudFront, S3, API Gateway — all within free tier limits. The infrastructure isn't core IP. It's commodity. Subscribe (or in this case, use the free tier).
Secrets management: SSM Parameter Store instead of Secrets Manager. Same capability for this use case, $0 instead of $0.40 per secret per month. The governance framework (Gatekeep) is the owned asset — the rules, the decision process, the evaluation criteria. The infrastructure hosting it is commodity.
Model access: OpenRouter for any model API usage, not direct provider contracts. Switchable. No vendor lock-in on the model layer. The evaluation framework — what "good" looks like for this specific context — is the owned asset.
None of these are large decisions at this scale. The principle they reflect is the same one that applies at enterprise scale: be deliberate about which layer you own and which layer you rent. The owned layer should be the one that's specific to your problem and compounds in value. The rented layer should be commodity infrastructure that you'd be silly to replicate.
The honest number
The most useful single data point on AI productivity comes from a clean analysis of 40 companies over 15 months: the actual measured gain in shipped pull requests as AI adoption increased was about 10%.[5]
10% is real. It compounds. Over a year, across a team, it's meaningful. It's just not 10x. The BCG study on "AI brain fry" found that the number of AI tools in use didn't correlate with increased productivity — cognitive overhead from managing multiple tools was making workers more exhausted, not more productive.
Nature published a paper in March 2026 coining the term "Foundational Uncertainty" to describe the strategic condition that GenAI creates: the capability landscape changes fast enough that established investment models become unreliable. The rational response under deep uncertainty is staged investment — commitments that preserve the option to adjust as the picture clarifies.
You're going to pay for AI infrastructure anyway. The tokens are coming. The question is whether you're paying intentionally — with clarity about what you own, what you rent, and what you're measuring — or paying reactively, chasing adoption metrics while the real cost accumulates below the waterline.
References
- [1] "Flexera 2025 State of the Cloud Report — wasted cloud spend rises to 29%". FinOps Foundation / Flexera, 2025.
- [2] "The hidden costs of AI: $2–7 for every $1 of token spend". AI Courses, March 2026.
- [3] "Build generic = lose, buy generic + build proprietary = win". Forbes, March 2026.
- [4] "AI platform lock-in and the right to disconnect". Brookings Institution, March 2026.
- [5] "40 companies, 15 months: actual AI productivity gain is 10%". Abi Noda / Dev Interrupted, November 2024–February 2026.