SPEND TOKENS ONCE

I had tens of thousands of unread emails and a plan that felt clever for about an hour. Give Claude access to the inbox, let it read every message, decide what was junk, delete the junk. It would have worked. It would also have been one of the more wasteful things I have ever paid for.

Reading every email means pulling every email into the model's context. Every newsletter, every receipt, every years-old thread, each one costing input tokens, each one making the model stop and deliberate. You would pay for the whole archive once, watch the inbox fill back up the next week, and pay for it again.

The better version is almost boring. You do not have the model read your mail. You have a conversation with it about your mail. Which senders do I never open. Which ones are receipts I want kept but out of sight. What does the spam I actually get look like. Out of that conversation you build filters, the plain Gmail kind, and those filters run on every message from now on for nothing.

That is the whole move. Use the expensive thing briefly to produce a cheap thing that does the work.

Someone on r/ClaudeAI did exactly this and wrote it up. Instead of having the model grind through the inbox, they sat with it and built rules that prune the mail going backward and forward, at a fraction of the cost. The reframe is the important part. You are not paying Claude to clean your inbox. You are paying Claude, for one afternoon, to help you write the thing that cleans your inbox.

This is the opposite of where the culture went for a stretch this spring. In early April the word was tokenmaxxing. The Information reported on an internal Meta leaderboard, Claudeonomics, ranking something like 85,000 employees by how many tokens they consumed, handing out titles like Token Legend and Session Immortal, with the top user reportedly burning 281 billion tokens in a single month. There was even a public leaderboard that logged your coding sessions and ranked you globally by spend. For a few weeks, burning tokens was a flex.

It was always measuring the wrong thing. Token consumption is an input. It tells you how much you spent, not how much you got back. Make it the score and you reward the one behavior you should be trying to avoid.

Then the invoices arrived. By late May the same outlets that hyped it were writing the obituary. Amazon got flagged for gamifying usage. Uber reportedly burned through its entire 2026 token budget in the first four months. Microsoft pulled Claude Code from some teams. Fortune ran a piece titled, more or less, tokenmaxxing is over. The companies cheering the leaderboard in April spent May trying to get the number down.

What the cost-cutters are doing now is the inbox trick at company scale. Prompt caching so the system prompt is not re-billed on every call. Routing the routine work to a small model like Haiku and saving Opus for the parts that are genuinely hard. Pruning context so the whole history is not re-sent on every tool call. One team I saw written up took an agent bill from $87,000 in April to $24,000 in May doing the same work. None of that is a smarter model. It is the same intelligence, spent with intent.

Here is the part the leaderboard had exactly backward. Agents make this worse by default, not better. An agent re-sends its growing context on every step, so a loop that runs all day can cost many times what the same question costs once in a chat window. Left alone, an agent will happily spend tokens forever. The actual skill is keeping it from doing that.

I keep returning to the inbox because it holds the whole idea in one chore. The lazy version and the careful version end at the same clean inbox. One bills you every month and gets worse as your mail grows. The other costs an afternoon of talking and then runs free. The difference was never the model. It was whether you asked it to do the work, or to build the thing that does the work.

So before I point a model at anything repetitive now, I ask one question. Am I about to pay for this once, or every time. If the answer is every time, I usually have not finished thinking. The good version of this work tends to end the same way. The model writes a filter, a script, a small durable thing I own, and then steps out of the loop.

The leaderboards were so confident about the wrong number that it took a quarter of bills to notice. The flex was never how many tokens you could burn. It was how few you turned out to need.

thoughts

SPEND TOKENS ONCE

Related Articles

A TRILLION TOKENS

THE SKILL LAYER

CLAUDE CODE