Handed a trillion tokens to deploy in a month, could you do it productively? That capacity, not the raw spend, is an honest read on how agentic you are.

Agentic AI · AI Strategy · Token Economics · Claude Code · Codex
A TRILLION TOKENS

By Amir H. Jalali · 6 min read
I have been turning a thought over for a few weeks. "Can you deploy tokens at scale?" It started in my head as half a joke, lifted straight from the gym, from "Do you even lift?" It carries the same mix of swagger and quiet insecurity. But the longer I have sat with it, the more I think there is a real question underneath, and it might be one of the more honest reads on where a company actually stands.

Here is the version I keep coming back to. If someone handed you a trillion tokens and told you to deploy them in a month, could you? Not could you afford it. Could you actually route that much model output through real work, without most of it ending up in a demo nobody opens?
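To get a feel for the scale, here is a back-of-the-envelope sketch of the sustained throughput a trillion tokens in a month implies. It assumes a 30-day month and perfectly even usage around the clock. The 100 tokens-per-second agent rate is a hypothetical figure chosen only for illustration.

```python
# Back-of-the-envelope: sustained throughput needed to deploy a token budget.
# Assumptions (not from the article): 30-day month, perfectly even usage.

SECONDS_PER_DAY = 24 * 60 * 60


def required_tokens_per_second(total_tokens: int, days: int = 30) -> float:
    """Average tokens per second needed to spend total_tokens within `days` days."""
    return total_tokens / (days * SECONDS_PER_DAY)


def concurrent_agents_needed(
    total_tokens: int, tokens_per_agent_per_second: float, days: int = 30
) -> float:
    """Number of agents that must run nonstop at the given per-agent rate."""
    return required_tokens_per_second(total_tokens, days) / tokens_per_agent_per_second


if __name__ == "__main__":
    budget = 10**12  # one trillion tokens
    print(f"{required_tokens_per_second(budget):,.0f} tokens/second sustained")
    # Hypothetical per-agent rate of 100 tokens/second, for illustration only.
    print(f"{concurrent_agents_needed(budget, 100.0):,.0f} agents running nonstop")
```

Under those assumptions the budget works out to roughly 386,000 tokens every second for the whole month, which is why the question is about wired-up work rather than about spend.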

A trillion used to be a number you reached for to sound absurd. It stopped feeling that way this month. Last week the world got its first trillionaire, Elon Musk, after the SpaceX listing, and the word is suddenly in the air. So it is a fair unit to think in.

I want to be careful here, because I argued something that sounds like the opposite not long ago. I wrote that token consumption is the wrong number to chase, that the tokenmaxxing leaderboards were measuring an input and calling it a score, that the flex was never how many tokens you could burn but how few you turned out to need. I still believe that.

So let me draw the line clearly. The trillion token question is not asking you to burn a trillion tokens for the bragging rights. It is asking whether you could deploy them productively if you wanted to. Those are completely different things. One is vanity. The other is capacity.

And capacity is the part most companies quietly fail. Not because they are being frugal. Because they have not wired enough of their actual work to a model to absorb that kind of volume. The constraint was never access. You can get an API key in five minutes. The constraint is surface area. Deploying a trillion tokens means you have a trillion tokens' worth of work a model is trusted to touch, and most organizations have connected almost none of their operations to one yet.

So the first bar is simply whether you can do it at all. This already separates the companies that talk about AI from the ones that have changed how work moves through the building.

The second bar is the one that matters, and it lives in the word productively. Spending a trillion tokens is trivial if you do not care what comes out. By Friday you could generate a trillion tokens of summaries nobody reads. Productive deployment means the tokens are doing work that would otherwise need a person, or work no person had time for, and the output feeds back into something real. That gap, between spending and spending well, is where the whole conversation actually lives.

This is why I think the question works as a proxy. How agentic a company is turns out to be very hard to measure from the outside, and the usual metrics lie to you. Number of pilots, number of licenses, how many slides have the word AI on them. None of it tells you much. Real productive token deployment is harder to fake. It reflects how many workflows have an agent in the loop, how many people have actually handed a task to a model and trusted the result, and whether the plumbing exists for an agent to do something rather than just say something.

Many companies are working toward becoming agentic right now, and the instinct is right, because the payoff is real. But agentic is not a switch. It is the slow accumulation of trusted surface area, one workflow at a time, until a meaningful share of the company's work can be done by something that does not sleep.

There is a parallel in how the best companies already think about growth. The ones that really have it figured out know their cost to acquire a customer, and they know the return on that spend inside a tight range. Once you know that number, scaling stops being scary. You pour more in because you know what comes back. Productive token deployment is heading to the same place. When you know what a few million tokens of agent work returns, deploying a trillion of them stops looking reckless and starts looking obvious.
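The customer-acquisition parallel can be made concrete with a small sketch. It takes a handful of pilot runs, computes the value returned per dollar of token spend for each, and checks whether those returns sit inside a tight range. Every figure, the `PilotRun` type, and the 25% tolerance are hypothetical choices for illustration, and how "value" gets measured (for example, hours saved times a loaded hourly rate) is left as an assumption.

```python
from dataclasses import dataclass


@dataclass
class PilotRun:
    """One measured batch of agent work (all fields are illustrative)."""
    tokens_used: int
    cost_usd: float
    value_usd: float  # assumption: value is estimated by the team, not observed directly


def return_per_dollar(run: PilotRun) -> float:
    """Dollars of value returned for each dollar of token spend."""
    return run.value_usd / run.cost_usd


def return_range(runs: list[PilotRun]) -> tuple[float, float]:
    """Lowest and highest return per dollar across the pilot runs."""
    ratios = [return_per_dollar(r) for r in runs]
    return min(ratios), max(ratios)


def is_tight(low: float, high: float, tolerance: float = 0.25) -> bool:
    """True if the spread is within `tolerance` of the low end (hypothetical threshold)."""
    return (high - low) / low <= tolerance


if __name__ == "__main__":
    # Hypothetical pilot data: a few million tokens each.
    pilots = [
        PilotRun(tokens_used=2_000_000, cost_usd=20.0, value_usd=90.0),
        PilotRun(tokens_used=3_000_000, cost_usd=30.0, value_usd=120.0),
        PilotRun(tokens_used=5_000_000, cost_usd=50.0, value_usd=210.0),
    ]
    low, high = return_range(pilots)
    print(f"return per dollar between {low:.1f}x and {high:.1f}x, tight: {is_tight(low, high)}")
```

If the range stays narrow as the pilots get larger, scaling the spend becomes a decision about volume, not a bet.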

The economics are the third part, and this is where my earlier argument comes back rather than contradicts it. Being able to deploy at scale and being able to deploy efficiently are two different capabilities, and you build them in order. First you have to be able to do it at all. Then you have to do it productively. Only then is it worth obsessing over the cost per unit of value, the prompt caching, the routing of routine work to a small model, all the things that turn an expensive habit into a sustainable one. You cannot optimize a ratio for a thing you are not yet doing. Capacity first, efficiency second. They are sequential, not opposed.
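One of the efficiency levers named above, routing routine work to a small model, can be sketched in a few lines. This is a minimal illustration, not a production router: the two tiers, their per-million-token prices, and the `is_routine` flag are all hypothetical, and no real vendor pricing or API is implied.

```python
# Hypothetical per-million-token prices; NOT real vendor pricing.
PRICE_PER_MILLION_USD = {"small": 0.50, "large": 10.00}

# A task is (tokens, is_routine). How a task gets labeled routine is assumed, not shown.
Task = tuple[int, bool]


def pick_tier(is_routine: bool) -> str:
    """Send routine work to the small tier, everything else to the large tier."""
    return "small" if is_routine else "large"


def routed_cost_usd(tasks: list[Task]) -> float:
    """Total cost when each task goes to the tier chosen by pick_tier."""
    return sum(
        tokens / 1_000_000 * PRICE_PER_MILLION_USD[pick_tier(is_routine)]
        for tokens, is_routine in tasks
    )


def unrouted_cost_usd(tasks: list[Task]) -> float:
    """Baseline: total cost when every task goes to the large tier."""
    return sum(tokens / 1_000_000 * PRICE_PER_MILLION_USD["large"] for tokens, _ in tasks)


def cost_per_unit_of_value(cost_usd: float, value_usd: float) -> float:
    """The ratio worth optimizing once the work is actually being done."""
    return cost_usd / value_usd


if __name__ == "__main__":
    # Hypothetical workload: 4M routine tokens, 1M tokens of harder work.
    workload = [(4_000_000, True), (1_000_000, False)]
    print(f"routed:   ${routed_cost_usd(workload):.2f}")
    print(f"unrouted: ${unrouted_cost_usd(workload):.2f}")
```

The ratio only means something once there is real work and real value in the denominator, which is the sense in which capacity comes before efficiency.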

I have watched my own consumption climb in a way that would have looked absurd to me a year ago. As I wired more of my own work to agents, running coding loops, handing whole tasks to Claude Code and Codex instead of doing them by hand, the constraint quietly stopped being cost. It became imagination. The bottleneck is no longer how many tokens I can afford. It is how much of my work I have figured out how to hand off.

That is the part I did not expect. The trillion token question sounds like it is about spend, but it is really about readiness. It asks how much of what you do has been made legible enough for a machine to take a real run at it.

The companies that can honestly answer yes are going to look very different in a couple of years from the ones that cannot. And the uncomfortable thing about the thought is that it is a little bit true. The number is not vanity, as long as you keep the word productively attached to it. It is a rough read on how much of your work you have actually been willing to let go of.
Generated with claude-opus-4-8 + GPT Image 2.0