Bloated prompts don't just cost money — they dilute Claude's focus. Here's how to say more with less and get sharper outputs every time.
There's a common misconception that longer prompts produce better results. More context, more examples, more explanation — surely Claude will understand better? In practice, the opposite is often true. Noise degrades signal. Padding buries the instruction that actually matters.
Token efficiency isn't just about cost (though that matters too). It's about clarity. The best prompts are precise, structured, and ruthlessly edited.
Before writing a prompt, ask: what does Claude actually need to know to do this well? Everything else is overhead. You're not writing a document — you're giving instructions to an expert who can infer a lot from a little.
💡 If you can't explain what you want in three sentences, you probably don't know what you want yet. Clarify your own thinking first, then prompt.
Put the most important directive at the top, not buried in context. Claude reads sequentially — the first thing it sees shapes how it interprets everything after.
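A minimal sketch of that ordering, with a hypothetical refactoring task (the directive and context are invented for illustration):

```python
# Hypothetical example: lead with the directive, then supply context.
# Burying the ask after paragraphs of background invites misreading.

prompt = """Refactor the function below to remove the duplicated validation logic.
Keep the public signature unchanged.

Context:
- This runs in a hot path; avoid allocating per call.
- Style guide: early returns over nested conditionals.

<code>
{code_snippet}
</code>"""

first_line = prompt.splitlines()[0]  # the first thing Claude reads is the directive
```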
If you find yourself explaining your project setup, conventions, or tech stack at the start of every session — put it in CLAUDE.md. Claude reads it automatically. You stop repeating yourself. Tokens saved, every session.
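A minimal CLAUDE.md sketch; the project details here are hypothetical, yours will differ:

```markdown
# Project notes for Claude

- Stack: Node 20, Express, PostgreSQL via Prisma
- Tests: `npm test` (Vitest); run before proposing a commit
- Conventions: ESM imports, no default exports, Prettier formatting
- Do not touch files under `migrations/`
```

Keep it short: CLAUDE.md is loaded into every session, so the same efficiency rules apply to it as to any prompt.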
Instead of pasting 200 lines of code into your prompt, say "look at src/utils/parser.js" and let Claude read it. This keeps your prompt clean and lets Claude see the file in its full context rather than as a decontextualized snippet.
If you're building with the Claude API and have long system prompts or large document contexts that don't change between requests, use prompt caching. Cache reads cost 90% less than regular input tokens (writes carry a small one-time premium) and respond faster. For RAG pipelines and document-heavy workflows, this is a significant saving.
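A sketch of what that looks like with the Messages API: the large, stable system block gets a `cache_control` breakpoint so subsequent requests reuse it. The model name and context text are placeholders; with the Anthropic SDK you would pass these kwargs to `client.messages.create(...)`, shown here as a plain dict:

```python
# Sketch: mark the large, unchanging system block as cacheable.
# Everything up to and including the breakpoint block is cached;
# the changing user turn after it is billed normally.

LARGE_STABLE_CONTEXT = "...your style guide, schemas, reference docs..."

request = {
    "model": "claude-sonnet-4-5",  # placeholder model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LARGE_STABLE_CONTEXT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize section 3."}  # only this varies
    ],
}
```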
For complex tasks, instead of one massive prompt, break it into a pipeline of focused sub-agents. Each agent gets only the context it needs for its specific job. Less bloat, better results, easier to debug when something goes wrong.
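One way to sketch the pipeline idea. `call_claude` is a stand-in for a real model call, and the stage functions are hypothetical; the point is that each stage receives only the context its job needs:

```python
def call_claude(prompt: str) -> str:
    """Stand-in for a real API call; echoes the prompt for demonstration."""
    return f"[response to: {prompt[:40]}]"

def extract_requirements(ticket: str) -> str:
    # Stage 1 sees only the ticket text, not the codebase.
    return call_claude(f"List the concrete requirements in this ticket:\n{ticket}")

def draft_plan(requirements: str, relevant_files: list[str]) -> str:
    # Stage 2 sees requirements plus file paths, not full file contents.
    files = "\n".join(relevant_files)
    return call_claude(f"Plan changes for:\n{requirements}\nFiles:\n{files}")

def review_plan(plan: str) -> str:
    # Stage 3 sees only the plan it is reviewing.
    return call_claude(f"Critique this plan for risks:\n{plan}")

plan = draft_plan(extract_requirements("Add rate limiting to /login"),
                  ["src/middleware/auth.js"])
review = review_plan(plan)
```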
Vague output requests generate verbose responses. "Explain X" produces essays. "List 5 bullet points about X, max 15 words each" produces exactly what you need. Format constraints reduce output tokens too.
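As a sketch, a hypothetical helper that bakes format constraints into every request rather than leaving the output shape to chance:

```python
def constrained_prompt(topic: str, points: int = 5, max_words: int = 15) -> str:
    """Hypothetical helper: request a fixed shape instead of an open-ended essay."""
    return (
        f"List {points} bullet points about {topic}, "
        f"max {max_words} words each. No preamble, no summary."
    )

prompt = constrained_prompt("prompt caching")
```

The trailing "No preamble, no summary" trims the framing sentences models tend to add around lists.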
🚀 The compound effect: a 30% reduction in prompt tokens across 100 daily Claude Code sessions frees up significant context window and can save hundreds of dollars monthly at scale. Efficiency compounds.
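A back-of-envelope version of that claim. Every number below is a hypothetical assumption, so substitute your own session sizes and prices:

```python
# Hypothetical assumptions; substitute your own numbers.
sessions_per_day = 100
avg_input_tokens = 200_000   # assumed cumulative input tokens per multi-turn session
reduction = 0.30             # the 30% trim from the text
price_per_mtok = 3.00        # assumed input price, USD per million tokens
days_per_month = 22

tokens_saved_daily = sessions_per_day * avg_input_tokens * reduction
monthly_savings = tokens_saved_daily * days_per_month * price_per_mtok / 1_000_000

print(f"{tokens_saved_daily:,.0f} tokens/day, ${monthly_savings:,.2f}/month")
```

Under these assumptions the trim saves roughly 6 million tokens a day and a few hundred dollars a month, before counting the context window it frees in each session.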