Token cost is real cost

Token cost is real cost. Apply the same thinking to human cost, though, and it's not so different. Whether you're paying for a graduate or a senior engineer, you expect a different quality of thinking and output based on their experience.

If you want better work with AI, the lever isn't to argue about the cost. It's to spend the budget you have on effort, deliberately.

Anthropic's recent postmortem is consistent with this. They lowered the default reasoning effort to fix latency, called it the wrong tradeoff, and reverted the setting after public scrutiny and feedback.

If you want higher-quality output from AI, there are four places to explore: model, configuration, prompting, and agents.

On model. Opus is still the strongest choice for critical decisions and architectural reasoning. Sonnet is usually good enough for coding and simple, repetitive tasks. Use the right model for the task at hand. If you cheap out on the model, you can't expect quality in the output.

On configuration. /effort runs from low to max. Opus 4.7's default is xhigh. Set the level to fit the work: a quick edit doesn't need max, an architectural decision does. This is the cheapest move and the one most people skip.
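To make "set the level to fit the work" concrete, here is a minimal sketch of a lookup that pairs a kind of task with a model and an effort level. The task names, the `RunConfig` type and the `pick_config` helper are all hypothetical, and the mapping is my own assumption rather than anything a vendor recommends. It only uses the model names and effort levels mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical pairing of a model with an effort level."""
    model: str
    effort: str


# Assumed mapping, for illustration only: cheap settings for small edits,
# the most expensive settings for decisions that are costly to get wrong.
ROUTES = {
    "quick_edit": RunConfig(model="sonnet", effort="low"),
    "architecture": RunConfig(model="opus", effort="max"),
}

# Anything unclassified falls back to the stronger model at xhigh.
DEFAULT = RunConfig(model="opus", effort="xhigh")


def pick_config(task_kind: str) -> RunConfig:
    """Return the settings for a task kind, or the default if unknown."""
    return ROUTES.get(task_kind, DEFAULT)
```

The point is not the table itself but that the choice is written down once and made on purpose, instead of being whatever the tool happened to default to.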

On prompting: three patterns I find the most effective.

  1. "Ask questions if unsure." Without this you give the model no out, so it commits to one solution and closes off the alternatives, even when there's no clean answer and the tradeoffs need to be surfaced.

  2. "Time and cost are not factors here. Prefer robust, sustainable, scalable solutions, do not leave tech debt." This inverts the implicit pressure to optimise for the duration and cost of the task.

  3. "Reflect on this session and encode what you learned in claude.md or skills, so the next iteration doesn't repeat the same mistakes." This pattern is worth capturing as a skill and iterating on to see what works for you. Without it, every session starts from zero and can repeat the mistakes you already corrected in the current one.
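If you find yourself retyping these, one option is to keep them as constants and attach them to every task. A minimal sketch, assuming you build prompts as plain strings: `build_prompt` and the constant names are hypothetical, and the reflection prompt is worded along the lines of pattern 3 and kept separate because it belongs at the end of a session, not on every task.

```python
# Standing instructions attached to every task (patterns 1 and 2).
STANDING_INSTRUCTIONS = (
    "Ask questions if unsure.",
    "Time and cost are not factors here. Prefer robust, sustainable, "
    "scalable solutions, do not leave tech debt.",
)

# Sent once at the end of a session (pattern 3).
REFLECTION_PROMPT = (
    "Reflect on this session and encode what you learned in claude.md or "
    "skills, so the next iteration doesn't repeat the same mistakes."
)


def build_prompt(task: str) -> str:
    """Append the standing instructions to a task description."""
    lines = [task.strip(), ""]
    lines += [f"- {rule}" for rule in STANDING_INSTRUCTIONS]
    return "\n".join(lines)
```

Calling `build_prompt("Add retry handling to the upload client")` returns the task followed by a blank line and the two instructions as bullets.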

On agents. This deserves a whole post of its own, so I'll keep it brief: the pattern that works for me is using agents to separate concerns. One agent reviews the spec against the code (the code is the source of truth). A separate agent does code review after implementation.
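A rough sketch of that separation, under the assumption that an "agent" is just something you can hand a prompt and get text back from. The `Agent` type, the `Review` record and both prompt wordings are hypothetical placeholders, not the API of any particular tool. What matters is that each review gets its own prompt and its own inputs.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: an agent is any callable that takes a prompt and returns text.
# Swap in whatever client or CLI wrapper you actually use.
Agent = Callable[[str], str]


@dataclass(frozen=True)
class Review:
    role: str
    findings: str


def spec_review(agent: Agent, spec: str, code: str) -> Review:
    """Check the spec against the code, treating the code as source of truth."""
    prompt = (
        "Review this spec against the code. The code is the source of truth; "
        "list every place the spec disagrees with it.\n\n"
        f"SPEC:\n{spec}\n\nCODE:\n{code}"
    )
    return Review(role="spec_review", findings=agent(prompt))


def code_review(agent: Agent, diff: str) -> Review:
    """Review the implementation on its own, after the work is done."""
    prompt = f"Review this implementation for defects.\n\nDIFF:\n{diff}"
    return Review(role="code_review", findings=agent(prompt))
```

Passing a different agent instance to each function keeps the two reviews from sharing context, which is the whole reason for splitting them.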

Engineering and product teams have always had to balance speed to market against time, cost, and quality. AI is no different.

The difference is which levers you choose to use. Spend the budget on effort deliberately, and the work comes back at the level you actually want.