The Spreadsheet Never Lies
Back when I was an IT Manager, budget time was the one part of the job I genuinely dreaded.
I was technically biased — give me an infrastructure problem over a finance meeting any day. But the spreadsheet had to be built. So out it came, year after year. Rows and columns of licence costs, support contracts, hardware refresh cycles, staff costs, cloud compute, storage per GB. Every line item accounted for, justified, and defended.
Over the years that spreadsheet grew new rows. Cloud costs arrived and changed everything — suddenly you weren’t buying hardware, you were buying consumption. Then came storage costs per GB, virtual machine sprawl, networking costs, SaaS licensing, and the ongoing headache of software nobody was using but everyone was paying for.
Good IT management has always meant knowing what things cost. Not approximately. Precisely.
Now there is a new line to add to that spreadsheet.
Token cost.
But here is the thing. If you stop at the token line, you are optimising for the meter, not the mission.
Tokens 101 — What They Actually Are
Before the cost makes sense, the concept needs to.
A token is the basic unit an LLM uses to process text. Not a word. Not a character. Something in between — a chunk of text that the model reads, processes, and responds to.
When you type a message to a chatbot, the model doesn’t read it the way you wrote it. It breaks it into tokens first — fragments of words, whole words, punctuation, spaces — and processes each one in sequence. The response it generates is also built token by token, each one predicted from everything that came before.
A rough rule of thumb for English text: one token is approximately four characters, or about three quarters of a word. The exact ratio varies by model and by language. A typical sentence of fifteen words is roughly twenty tokens. A detailed prompt of five hundred words is somewhere around six hundred and seventy tokens.
It adds up quickly. And every token processed — whether going in or coming out — carries a price.
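To make the rule of thumb concrete, here is a minimal sketch of an estimator. It is not a real tokenizer: the function name estimate_tokens is made up for illustration, and the only inputs are the two approximations above (about four characters per token, about three quarters of a word per token). Real counts depend on the model's own tokenizer.

```python
def estimate_tokens(text: str) -> dict:
    """Rough token estimates from the two rules of thumb (not a real tokenizer)."""
    chars = len(text)
    words = len(text.split())
    return {
        "by_chars": round(chars / 4),     # assumes ~4 characters per token
        "by_words": round(words / 0.75),  # assumes ~0.75 words per token
    }


# A fifteen-word sentence and a five-hundred-word prompt, built from filler words.
fifteen_words = " ".join(["word"] * 15)
five_hundred_words = " ".join(["word"] * 500)

print(estimate_tokens(fifteen_words)["by_words"])       # 20
print(estimate_tokens(five_hundred_words)["by_words"])  # 667
```

The two estimates will not always agree with each other, which is a useful reminder that this is for budgeting conversations, not for billing reconciliation.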
Tokens Are a Meter, Not a Currency
There is a phrase doing the rounds right now: “Tokens are the new currency of AI.”
It is a neat soundbite. It is also wrong in all the ways that matter if you are trying to build serious AI capability.
Saying tokens are a currency is like saying you paid your electricity bill in kilowatt hours. You didn’t. You consumed kilowatt hours. You paid in money. The kilowatt hour is a unit of consumption — a meter reading, not a medium of exchange.
Tokens are exactly the same. They measure how much work a model is doing. They are the unit on which vendors calculate your bill. But they are not currency. They are consumption — and like every unit of consumption in IT, they carry a cost that needs understanding, governing, and optimising.
The organisations that treat tokens as a vanity metric — “we consumed X billion tokens last quarter!” — are optimising for the wrong number entirely.
The AI Factory and the Cost Behind Every Token
Dell and NVIDIA use the term AI Factory deliberately — because building AI capability at scale really does look like industrial infrastructure. Data pipelines, compute clusters, model serving layers, orchestration, guardrails. A factory for producing AI output at volume.
And like any factory, every unit of output carries a cost of production.
In an AI Factory, the token is the unit of output. And behind every token sits a cost stack most organisations never fully account for.
Infrastructure — GPU and accelerator time, CPU, RAM, networking, storage, cooling, power. Whether you see this directly or it is baked into a vendor’s price per thousand tokens, it is always there.
Model and platform — licensing for proprietary models, platform margin, optional add-ons for latency, SLAs, and private endpoints. Every provider has a margin sitting in the background of every token.
Data and training — models don’t appear from nowhere. Data acquisition, cleaning, fine-tuning, retrieval pipelines, continuous evaluation. All of it is part of the cost of making your tokens useful in your specific context, not just smart in general.
People — ML engineers, platform teams, application developers, security, compliance, prompt engineers. Labour is amortised over output. From a factory lens, every token carries a share of your people cost.
Guardrails and control — orchestration, content filters, safety checks, observability, caching, A/B testing. These are the conveyor belts and safety systems of your AI Factory. They rarely appear on a per-token price card. They always appear on your balance sheet.
The vendor gives you a clean price per thousand tokens. Your real cost per thousand tokens is considerably messier — and considerably higher.
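One way to see the gap between the price card and the real number is to spread the rest of the cost stack over the tokens you actually consume. The sketch below does that under loud assumptions: every figure is invented for illustration, the overhead categories are simply the ones listed above, and fully_loaded_cost_per_1k is a hypothetical helper, not a standard formula from any vendor.

```python
def fully_loaded_cost_per_1k(vendor_price_per_1k: float,
                             monthly_overheads: dict,
                             monthly_tokens: int) -> float:
    """Vendor price plus monthly overheads amortised over monthly token volume."""
    thousands_of_tokens = monthly_tokens / 1000
    overhead_per_1k = sum(monthly_overheads.values()) / thousands_of_tokens
    return vendor_price_per_1k + overhead_per_1k


# Illustrative figures only (assumptions, not benchmarks).
overheads = {
    "people": 40_000,               # share of engineering and platform staff
    "data_and_training": 8_000,     # pipelines, fine-tuning, evaluation
    "guardrails_and_control": 2_000,
}

real_cost = fully_loaded_cost_per_1k(
    vendor_price_per_1k=0.01,
    monthly_overheads=overheads,
    monthly_tokens=2_000_000_000,
)
print(f"Fully loaded cost per 1k tokens: {real_cost:.3f}")
```

With these made-up numbers the overheads add 0.025 per thousand tokens on top of a vendor price of 0.01, so the real figure is several times the price card. Your own ratio will differ; the point is to calculate it.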
From Token Cost to Outcome Cost
Here is where the conversation needs to move.
A token is a unit of cost. It is not a unit of value. And on its own, cost per token tells you almost nothing about whether your AI investment is working.
The number that actually matters is cost per outcome.
Swap abstract token consumption for something real: tokens per resolved support ticket. Tokens per sales proposal generated. Tokens per code review completed. Tokens per knowledge worker hour saved. Now you can build a unit economic view that means something.
Cost per outcome = (Tokens per outcome ÷ 1,000 × fully loaded cost per thousand tokens) + overheads per outcome
Unit margin = Value per outcome − Cost per outcome
Once you see it this way, the conversations become sharper. A cheaper model per thousand tokens that requires three times the tokens per outcome is not a saving. A use case that looks expensive in tokens but delivers enormous value per outcome is not a problem. A system regenerating the same content repeatedly because nobody implemented caching is a straightforward fix hiding in plain sight.
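The cheaper-model trap is easy to show with arithmetic. The sketch below encodes the two formulas, dividing tokens by 1,000 so the units match a per-thousand price. The prices, token counts, and value per outcome are all assumptions chosen for illustration.

```python
def cost_per_outcome(tokens_per_outcome: float,
                     cost_per_1k_tokens: float,
                     overhead_per_outcome: float = 0.0) -> float:
    """(tokens / 1000 x fully loaded cost per 1k tokens) + overheads per outcome."""
    return tokens_per_outcome / 1000 * cost_per_1k_tokens + overhead_per_outcome


def unit_margin(value_per_outcome: float, cost: float) -> float:
    """Value per outcome minus cost per outcome."""
    return value_per_outcome - cost


# Hypothetical comparison: the small model is 60% cheaper per 1k tokens,
# but needs three times the tokens to resolve the same support ticket.
large_model = cost_per_outcome(tokens_per_outcome=2_000, cost_per_1k_tokens=0.030)
small_model = cost_per_outcome(tokens_per_outcome=6_000, cost_per_1k_tokens=0.012)

print(f"Large model: {large_model:.3f} per ticket")
print(f"Small model: {small_model:.3f} per ticket")
print(f"Margin on the large model at a value of 4.00 per ticket: "
      f"{unit_margin(4.00, large_model):.2f}")
```

Here the model with the lower price per thousand tokens ends up costing more per resolved ticket, which is exactly the case the per-token view hides.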
The Levers: Token Productivity in the AI Factory
If tokens are the output of your AI Factory, token productivity is your primary optimisation lever.
Use the right model for the job. Not everything needs your largest, most capable model. Smaller, cheaper models handle classification, routing, and simple transforms well. Reserve the heavy models for genuinely complex reasoning. A tiered approach — cheap model first, escalate only when needed — can dramatically change your cost per outcome without touching quality.
Optimise prompts and context. Long system prompts and bloated context windows feel powerful. They are also expensive. Strip repetition, keep only relevant context, use structured inputs where possible. Every unnecessary sentence in a prompt is scrap material on the factory floor — and in a high-volume system, scrap accumulates fast.
Cache intelligently. A significant proportion of enterprise AI workloads are repetitive — similar questions, standard documents, known sub-tasks. Response caching, retrieval caching, and partial caching of intermediate steps reduce tokens per outcome without any loss of quality. It is one of the highest-return optimisations available and one of the most consistently overlooked.
Design around outcomes, not demos. Demos optimise for the impressive moment. Factories optimise for throughput and margin. Start from the business outcome, the current human cost of achieving it, and the target cost with AI. Then design the system backwards from that constraint — not forwards from whatever the latest model happens to be capable of.
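Two of the levers above, caching and tiered model selection, fit together naturally. What follows is a minimal sketch, not a production design. The model functions are stubs standing in for real API calls, the confidence score is an assumed signal (real systems might use a classifier, a validation check, or a rules engine instead), and TieredResponder is a name invented for this example.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model interface: takes a prompt, returns (answer, confidence 0..1).
Model = Callable[[str], Tuple[str, float]]


class TieredResponder:
    """Check the cache first, then the cheap model, and escalate only if needed."""

    def __init__(self, cheap: Model, heavy: Model, min_confidence: float = 0.8):
        self.cheap = cheap
        self.heavy = heavy
        self.min_confidence = min_confidence
        self.cache: Dict[str, str] = {}
        self.calls = {"cache": 0, "cheap": 0, "heavy": 0}

    def answer(self, prompt: str) -> str:
        # Normalise case and whitespace so trivially different prompts share a key.
        key = " ".join(prompt.lower().split())
        if key in self.cache:
            self.calls["cache"] += 1
            return self.cache[key]

        self.calls["cheap"] += 1
        text, confidence = self.cheap(prompt)
        if confidence < self.min_confidence:
            # The cheap model was not confident enough: escalate to the heavy one.
            self.calls["heavy"] += 1
            text, _ = self.heavy(prompt)

        self.cache[key] = text
        return text


# Stub models for illustration only; real ones would call a model endpoint.
def cheap_model(prompt: str) -> Tuple[str, float]:
    if len(prompt.split()) <= 6:
        return ("short answer", 0.9)
    return ("unsure", 0.4)


def heavy_model(prompt: str) -> Tuple[str, float]:
    return ("detailed answer", 0.95)


responder = TieredResponder(cheap_model, heavy_model)
responder.answer("Reset my password")
responder.answer("reset   my PASSWORD")  # served from the cache
responder.answer("Explain the trade-offs between our three storage tiers in detail")
print(responder.calls)  # {'cache': 1, 'cheap': 2, 'heavy': 1}
```

The call counters matter as much as the routing. Without them you cannot tell how often the cache is hit or how often the heavy model is really needed, and those two ratios drive tokens per outcome.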
Token Cost as a Governance Question
This is familiar territory for anyone who has managed cloud costs or software licensing.
Token consumption is a shared resource. Different business units, different applications, and different use cases will consume it at different rates and generate very different outcomes per token. Without visibility into that consumption — tracked by application, by business unit, by use case — you have no basis for budgeting, no mechanism for chargeback, and no way to identify where usage is growing faster than the value it is generating.
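As a sketch of what that visibility could look like, the snippet below rolls a usage log up by business unit and application. The log format, the field names, and the cost figure are all assumptions; in practice the records would come from whatever gateway or logging layer sits in front of your models.

```python
from collections import defaultdict

# Hypothetical usage records, one per reporting period per application.
usage_log = [
    {"business_unit": "Support", "application": "ticket-bot", "tokens": 120_000, "outcomes": 300},
    {"business_unit": "Support", "application": "ticket-bot", "tokens": 80_000, "outcomes": 200},
    {"business_unit": "Sales", "application": "proposal-writer", "tokens": 450_000, "outcomes": 90},
]


def summarise(usage: list, cost_per_1k: float) -> dict:
    """Total tokens, cost, and tokens per outcome for each (unit, application)."""
    totals = defaultdict(lambda: {"tokens": 0, "outcomes": 0})
    for record in usage:
        key = (record["business_unit"], record["application"])
        totals[key]["tokens"] += record["tokens"]
        totals[key]["outcomes"] += record["outcomes"]

    report = {}
    for key, total in totals.items():
        report[key] = {
            "tokens": total["tokens"],
            "cost": total["tokens"] / 1000 * cost_per_1k,
            # None when no outcomes were recorded, to avoid dividing by zero.
            "tokens_per_outcome": (
                total["tokens"] / total["outcomes"] if total["outcomes"] else None
            ),
        }
    return report


report = summarise(usage_log, cost_per_1k=0.03)  # assumed fully loaded cost
for (unit, app), row in report.items():
    print(unit, app, row)
```

Once consumption is keyed this way, chargeback is a matter of multiplying through, and a rising tokens-per-outcome figure for any one application becomes something you can see rather than guess at.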
A note on agentic AI: if your organisation is moving into agentic deployments — systems that reason across multiple steps, use tools, retrieve information, and check their own work — the token cost profile changes significantly. A standard chatbot interaction might consume a few hundred tokens. An agentic workflow handling the same underlying task can consume tens of thousands. Model it separately. Budget it separately. The capability gain can be substantial, but the consumption profile is a different order of magnitude.
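A quick back-of-envelope comparison shows why the two profiles belong on separate budget lines. The volumes, token counts, and price below are assumptions picked to match the rough ranges described above, nothing more.

```python
def monthly_cost(interactions: int,
                 tokens_per_interaction: int,
                 cost_per_1k: float) -> float:
    """Monthly spend for one workload at a given cost per thousand tokens."""
    return interactions * tokens_per_interaction / 1000 * cost_per_1k


# Same number of tasks per month, same assumed cost per 1k tokens.
chat_cost = monthly_cost(interactions=10_000, tokens_per_interaction=500, cost_per_1k=0.02)
agent_cost = monthly_cost(interactions=10_000, tokens_per_interaction=30_000, cost_per_1k=0.02)

print(f"Chatbot:  {chat_cost:,.2f} per month")
print(f"Agentic:  {agent_cost:,.2f} per month")
print(f"Multiple: {agent_cost / chat_cost:.0f}x")
```

Whether that multiple is worth paying depends entirely on the value per outcome, which is why the agentic line needs its own model rather than an uplift percentage on the chatbot budget.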
Optimising for the Mission
Back at that budget spreadsheet, the discipline was always the same. Know what you consume. Know what it costs. Know who is consuming it. And know what value it is generating.
Tokens deserve exactly that discipline. Not because they are a currency. Because they are a cost — the most visible signal of the underlying economics of your AI Factory.
The token line on the bill matters. But the executives asking “what is our token budget this year?” are asking the wrong question.
The right questions are these: Which AI-enabled outcomes matter for our business? What is our target cost per outcome? What mix of models, infrastructure, and data do we need to get there? And how do we measure value per outcome — not just tokens consumed?
Tokens are how you keep score in the background. Outcomes are why you are playing.
If your AI strategy stops at tokens, you are optimising for the meter, not the mission.
