The FinOps Foundation tracks what organisations do with their cloud spend. In 2025, for the first time in five years, wasted cloud spend went up — to 29% of total cloud budgets.[1] The culprit was GenAI: teams deploying AI workloads without the cost management discipline they'd developed for traditional cloud services.
It's not a surprise. AI costs work differently from compute costs. They scale at the request level. They vary by model, by prompt length, by context window use. The optimisation levers are inside application code, not cloud consoles. You can't right-size a token the way you can right-size an EC2 instance.
The teams that are managing AI spend well share a common approach: they know which bucket each spend sits in before they commit. Subscribe, burst, or own. The decision criteria for each are different. The risk of getting it wrong is different. And the answer to "what do I actually own after this investment?" changes significantly across the three.
Three buckets
Subscribe
Always-on access that isn't your core IP. Foundation model APIs, monitoring, error tracking, base infrastructure.
Decision criteria: switching cost is low. Your use of it is table stakes. You're renting commodity infrastructure.
Risk of getting it wrong: building differentiation on top of something commodity. When the commodity improves, your wrapper becomes obsolete.
Burst
High-volume, short-window work. Batch inference, data processing pipelines, code review automation at scale.
Decision criteria: you can predict the shape of the work even if not the exact timing. Commit discounts and spot pricing make sense.
Risk of getting it wrong: treating burst workloads as always-on and paying for idle capacity. Or missing the window and paying on-demand rates for what should have been batch.
Own
Where your data, pipeline, evaluations, or fine-tuning is the competitive advantage.
Decision criteria: the asset is shaped to your problem. If you outsourced it, you'd own nothing at the end. High asset specificity.
Risk of getting it wrong: building proprietary infrastructure for something the market will commoditise. Build generic and the model catches up to you.
The framework isn't original — it maps roughly to how economists think about make-vs-buy decisions. High asset specificity (the asset is shaped to your problem, not the market's) pushes toward ownership. Low asset specificity pushes toward market procurement. What changes in the AI era is that the line between high and low specificity shifts quickly, as foundation models absorb more of what used to require custom builds.
The hidden multiplier
For every dollar you spend on API tokens, expect to spend $2–7 more on everything the pricing page doesn't mention: token inflation from system prompts and conversation history, retry overhead when calls fail or time out, embedding storage costs, the human review layer for high-stakes outputs, engineering time maintaining prompt templates and handling model migrations, and the cost of switching models when the provider deprecates an API version.[2]
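A back-of-envelope sketch of what that multiplier means for an all-in budget, using the $2–7 range above. The $10,000 monthly token bill is an assumed example, not a benchmark:

```python
# Total cost = visible token bill + hidden operational spend.
# The 2x-7x range comes from the figure cited in the text; the
# monthly token spend is an illustrative assumption.

def total_cost_range(token_spend: float, low_mult: float = 2.0,
                     high_mult: float = 7.0) -> tuple[float, float]:
    """Return (low, high) estimates of all-in monthly cost."""
    return (token_spend * (1 + low_mult), token_spend * (1 + high_mult))

low, high = total_cost_range(10_000)   # $10k/month in API tokens
print(f"${low:,.0f} - ${high:,.0f}")   # $30,000 - $80,000 all-in
```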
Sources: AI Courses March 2026; FinOps Foundation / Flexera 2025; CloudNuro March 2026; Workday 2026
The 40% correction tax is the one that catches people out. Workday found that 40% of the time savings attributed to AI are consumed by correcting AI-generated errors and rewriting low-quality outputs. The gain is real. It's just smaller than it looks when you're looking at raw generation speed.
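A quick worked example of how the correction tax shrinks the headline number. The raw-savings figure is an assumption for illustration; the 40% is the Workday figure cited above:

```python
# The headline time saving vs. the net saving after correction work.
# raw_hours_saved is an assumed example; correction_tax is the cited 40%.

raw_hours_saved = 100.0          # hours/month attributed to AI generation
correction_tax = 0.40            # share consumed fixing AI-generated errors
net_hours = raw_hours_saved * (1 - correction_tax)
print(net_hours)                 # 60.0 -- the gain is real, just smaller
```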
McKinsey estimates that integration and maintenance account for 40–60% of total AI operating costs. The token bill is the visible part. The operational layer — the data pipelines, the evaluation infrastructure, the human review workflows, the incident response when the model hallucinates in production — is the iceberg below the waterline.
The case study that defines the choice
Picture two companies that both chose to build. By 2026, Company A's custom support model was obsolete: OpenAI had caught up, and the thing they'd built proprietary infrastructure around was now a commodity API call. Company B's fraud detection IP, by contrast, had compounded into a defensible moat, trained on their specific transaction patterns, evaluated against their specific fraud vectors, improving with every case they saw.
The lesson isn't that you should never build. It's that what you build proprietary infrastructure around should be something the market won't commoditise under you.[3] Generic customer support was always going to be commoditised. Proprietary fraud patterns for your specific customer base were never going to be.
The test is simple: if OpenAI ships a new model tomorrow that makes this capability 10x better, does that help me or destroy my investment? If it helps you, you were probably right to subscribe. If it destroys your investment, you built something you should have rented.
The lock-in you should actually worry about
The Brookings Institution published a note in March 2026 pointing out something that doesn't get enough attention: every major AI model provider has explicit policies allowing them to disconnect competing AI application developers.[4] This isn't theoretical — it's in the terms. The platform risk is structural.
The paradox is that the lock-in runs one direction. You can be locked into a provider. The provider has no equivalent lock — open-source alternatives and competitors mean switching costs are near zero for them, not for you. The asymmetry is worth understanding before you build deeply against any single provider's APIs.
The practical response is multi-model routing — building against an abstraction layer that lets you swap the underlying model without rewriting application logic. OpenRouter is one implementation. LangChain, LiteLLM, and similar tools exist for the same reason. The cost is some abstraction overhead. The benefit is optionality when the model landscape changes, which it does approximately every six months.
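A minimal sketch of what that abstraction layer looks like. The provider functions here are hypothetical stand-ins, not real OpenRouter or LiteLLM APIs; the point is that callers depend on the router, not on any one provider:

```python
# Sketch of multi-model routing: application code calls an abstraction and
# the provider preference order lives in one place. Provider functions
# below are hypothetical stand-ins for illustration.

from typing import Callable

Provider = Callable[[str], str]

class ModelRouter:
    """Route a prompt to the first working provider in preference order."""

    def __init__(self, providers: dict[str, Provider], order: list[str]):
        self.providers = providers
        self.order = order  # swap providers here, not at the call sites

    def complete(self, prompt: str) -> str:
        last_err: Exception | None = None
        for name in self.order:
            try:
                return self.providers[name](prompt)
            except Exception as err:   # real code would narrow this
                last_err = err
        raise RuntimeError("all providers failed") from last_err

# Hypothetical providers for the sketch:
def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider down")

def backup_provider(prompt: str) -> str:
    return f"echo: {prompt}"

router = ModelRouter(
    {"primary": flaky_provider, "backup": backup_provider},
    order=["primary", "backup"],
)
print(router.complete("hello"))  # echo: hello
```

When the model landscape shifts, the change is one list, not a rewrite of application logic. That is the optionality being paid for with the abstraction overhead.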
Where ticketyboo.dev sits
The spending decisions on this platform aren't interesting by scale — it's a small site — but they're illustrative of the framework in practice.
Infrastructure: AWS Free Tier throughout. Lambda, DynamoDB, CloudFront, S3, API Gateway — all within free tier limits. The infrastructure isn't core IP. It's commodity. Subscribe (or in this case, use the free tier).
Secrets management: SSM Parameter Store instead of Secrets Manager. Same capability for this use case, $0 instead of $0.40 per secret per month. The governance framework (Gatekeep) is the owned asset — the rules, the decision process, the evaluation criteria. The infrastructure hosting it is commodity.
Model access: OpenRouter for any model API usage, not direct provider contracts. Switchable. No vendor lock-in on the model layer. The evaluation framework — what "good" looks like for this specific context — is the owned asset.
None of these are large decisions at this scale. The principle they reflect is the same one that applies at enterprise scale: be deliberate about which layer you own and which layer you rent. The owned layer should be the one that's specific to your problem and compounds in value. The rented layer should be commodity infrastructure that you'd be silly to replicate.
The honest number
The most useful single data point on AI productivity comes from a clean analysis of 40 companies over 15 months: the actual measured gain in shipped pull requests as AI adoption increased was about 10%.[5]
10% is real. It compounds. Over a year, across a team, it's meaningful. It's just not 10x. The BCG study on "AI brain fry" found that the number of AI tools in use didn't correlate with increased productivity — cognitive overhead from managing multiple tools was making workers more exhausted, not more productive.
Nature published a paper in March 2026 coining the term "Foundational Uncertainty" to describe the strategic condition that GenAI creates: the capability landscape changes fast enough that established investment models become unreliable. The rational response under deep uncertainty is staged investment — commitments that preserve the option to adjust as the picture clarifies.
You're going to pay for AI infrastructure anyway. The tokens are coming. The question is whether you're paying intentionally — with clarity about what you own, what you rent, and what you're measuring — or paying reactively, chasing adoption metrics while the real cost accumulates below the waterline.
References
- [1] "Flexera 2025 State of the Cloud Report — wasted cloud spend rises to 29%". FinOps Foundation / Flexera, 2025.
- [2] "The hidden costs of AI: $2–7 for every $1 of token spend". AI Courses, March 2026.
- [3] "Build generic = lose, buy generic + build proprietary = win". Forbes, March 2026.
- [4] "AI platform lock-in and the right to disconnect". Brookings Institution, March 2026.
- [5] "40 companies, 15 months: actual AI productivity gain is 10%". Abi Noda / Dev Interrupted, November 2024–February 2026.