Tamp Compression Proxy Cuts Claude Code Token Usage 52% — Zero Code Changes

Run a local proxy that automatically compresses Claude Code's API requests, cutting input token usage roughly in half without modifying your workflow.

Ala SMITH & AI Research Desk · Mar 25, 2026 · 3 min read · AI-Generated

Source: github.com, via hn_claude_code and medium_claude (multi-source)

What It Does — Automatic Token Compression for Coding Agents

Tamp is a local HTTP proxy that sits between your coding agent (Claude Code, Aider, Cursor, etc.) and the AI provider's API. It automatically compresses tool_result blocks in API requests before forwarding them upstream, achieving 52.6% fewer input tokens on average. The key insight: coding agents send massive amounts of structured data (JSON, arrays, line-numbered output) that can be aggressively compressed without losing meaning.

Source code and error messages pass through untouched; only verbose structured output is compressed. This happens transparently, so you use Claude Code exactly as before.
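To make the idea concrete, here is a minimal sketch of rewriting tool_result blocks in an Anthropic-style Messages request body. It is not Tamp's actual code: the function names and the stand-in compressor are hypothetical, and the only thing taken from the API itself is the block shape (a `tool_result` block whose `content` is either a string or a list of text blocks).

```python
import json
from typing import Callable


def compress_tool_results(body: dict, compress: Callable[[str], str]) -> dict:
    """Return a copy of a Messages-style request body with every
    tool_result text payload passed through `compress`."""
    out = json.loads(json.dumps(body))  # cheap deep copy of JSON-safe data
    for message in out.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain-string messages hold no tool_result blocks
        for block in content:
            if not isinstance(block, dict) or block.get("type") != "tool_result":
                continue  # ordinary text and tool_use blocks are left alone
            inner = block.get("content")
            if isinstance(inner, str):
                block["content"] = compress(inner)
            elif isinstance(inner, list):
                for part in inner:
                    if isinstance(part, dict) and part.get("type") == "text":
                        part["text"] = compress(part["text"])
    return out


def minify_if_json(text: str) -> str:
    """Hypothetical stand-in compressor: minify valid JSON, else pass through."""
    try:
        return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)
    except ValueError:
        return text  # not JSON (for example source code), so leave it untouched
```

Calling `compress_tool_results(request_body, minify_if_json)` returns a new body and leaves the original unmodified, which keeps the pass-through path trivial if compression is skipped.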

Setup — Two Minutes to Start Saving Tokens

Install and run Tamp with a single command:

npx @sliday/tamp

Or install globally:

curl -fsSL https://tamp.dev/setup.sh | bash

On first launch, Tamp shows an interactive prompt letting you toggle compression methods. Use -y to skip this in CI/scripts:

npx @sliday/tamp -y

Tamp runs on localhost:7778 and auto-detects your agent's API format (Anthropic, OpenAI, Gemini).

Configure Your Agent — One Environment Variable

For Claude Code:

export ANTHROPIC_BASE_URL=http://localhost:7778
claude

For Aider:

export OPENAI_API_BASE=http://localhost:7778
aider

For Cursor/Cline/Windsurf: Set the API base URL to http://localhost:7778 in your editor's settings.

That's it. Tamp compresses silently in the background while you work.
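If the agent cannot connect after you set the base URL, a quick check is whether anything is accepting connections on the proxy port. The helper below is a hypothetical convenience, not part of Tamp, and it only confirms that some process is listening on the port, not that the process is Tamp.

```python
import socket


def proxy_is_listening(host: str = "localhost", port: int = 7778,
                       timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Run `proxy_is_listening()` in a Python shell before starting the agent; a `False` result usually means the proxy is not running or was started on a different port.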

How The Compression Works — Five Optimization Stages

Tamp applies five compression stages, all enabled by default:

JSON minify — Removes unnecessary whitespace from JSON structures
TOON columnar encoding — Optimizes arrays by deduplicating repeated structures
Strip line-number prefixes — Removes 1:, 2:, etc. from numbered output
General whitespace reduction — Compresses other structured text
LLMLingua integration — Advanced semantic compression (requires Python)
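The three structural stages are simple enough to sketch in a few lines. The code below illustrates the general techniques under stated assumptions and is not Tamp's implementation: the line-number prefix pattern is a guess at the format, and `to_columnar` is a simplified columnar layout that does not follow the actual TOON format.

```python
import json
import re


def minify_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


# Assumed prefix shape: optional indent, digits, a colon, one optional space.
_LINE_PREFIX = re.compile(r"^[ \t]*\d+:[ ]?", re.MULTILINE)


def strip_line_numbers(text: str) -> str:
    """Drop '1:', '2:' style prefixes, but only if every non-empty line has one."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not all(_LINE_PREFIX.match(ln) for ln in lines):
        return text  # mixed content: safer to leave it alone
    return _LINE_PREFIX.sub("", text)


def to_columnar(rows: list[dict]) -> str:
    """Encode a uniform list of dicts as one header line plus one row per item.

    Simplified illustration of columnar encoding, not the TOON format.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    keys = list(rows[0])
    if not all(isinstance(r, dict) and list(r) == keys for r in rows):
        raise ValueError("rows must be dicts sharing the same keys in the same order")
    header = "[%d]{%s}:" % (len(rows), ",".join(keys))
    body = [",".join(json.dumps(r[k], ensure_ascii=False) for k in keys) for r in rows]
    return "\n".join([header, *body])
```

The columnar step is where arrays of similar objects shrink the most: each key name is written once in the header instead of once per element.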

To disable a stage, set the TAMP_STAGES environment variable to a comma-separated list of the stages you want to keep:

# Skip LLMLingua (no Python dependency)
TAMP_STAGES=minify,toon,strip-lines,whitespace npx @sliday/tamp -y

Why This Matters — Compounding Savings

Agents resend the full conversation history with every API call, so a tool result that Tamp shrinks once is saved again on every later turn. An in-memory cache ensures identical content is compressed only once per session. As a result, the absolute number of tokens saved keeps growing as a conversation gets longer.
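A small model shows both effects. It is a sketch under simplifying assumptions: every turn adds one tool result, the history consists only of tool results, and a single `ratio` (compressed size divided by original size) applies to all of them. The cache class is hypothetical and only illustrates compress-once-per-content; the source does not describe how Tamp keys its cache.

```python
import hashlib
from typing import Callable


def cumulative_input_tokens(result_tokens: list[int], ratio: float = 1.0) -> float:
    """Total input tokens across a session in which turn k resends results 1..k.

    `ratio` is compressed size / original size (1.0 means no proxy).
    """
    total = 0.0
    history = 0.0
    for tokens in result_tokens:
        history += tokens * ratio  # this turn's result joins the history
        total += history           # the whole history is sent again
    return total


class CompressionCache:
    """Memoize compressed output by content hash so repeats are compressed once."""

    def __init__(self, compress: Callable[[str], str]) -> None:
        self._compress = compress
        self._store: dict[str, str] = {}
        self.misses = 0  # number of times the compressor actually ran

    def get(self, text: str) -> str:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._compress(text)
        return self._store[key]
```

Under these assumptions, ten turns that each add a 1,000-token tool result cost 55,000 input tokens without compression. At the reported 52.6% average reduction (`ratio=0.474`) the same session costs about 26,070, and the gap widens with every additional turn.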

What About Codex?

Tamp originally supported Codex CLI but pulled support because Codex uses OpenAI's Responses API (POST /v1/responses) with a different request shape than Chat Completions. Codex also sends zstd-compressed bodies, adding another layer of complexity. The developers plan to revisit support once the Responses API format stabilizes.
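One way to picture the problem is a proxy that first classifies each request by its path and body encoding. The sketch below is an assumption about how such routing could look, not a description of Tamp's detection logic; the enum and function names are hypothetical, while the endpoint paths are the providers' public ones.

```python
from enum import Enum
from typing import Mapping


class ApiFormat(Enum):
    ANTHROPIC_MESSAGES = "anthropic-messages"
    OPENAI_CHAT = "openai-chat-completions"
    OPENAI_RESPONSES = "openai-responses"  # the shape Codex uses
    GEMINI = "gemini-generate-content"
    UNKNOWN = "unknown"


def classify_request(method: str, path: str) -> ApiFormat:
    """Guess the API format from the HTTP method and request path."""
    if method.upper() != "POST":
        return ApiFormat.UNKNOWN
    path = path.split("?", 1)[0].rstrip("/")
    if path.endswith("/v1/messages"):
        return ApiFormat.ANTHROPIC_MESSAGES
    if path.endswith("/chat/completions"):
        return ApiFormat.OPENAI_CHAT
    if path.endswith("/v1/responses"):
        return ApiFormat.OPENAI_RESPONSES
    if path.endswith((":generateContent", ":streamGenerateContent")):
        return ApiFormat.GEMINI
    return ApiFormat.UNKNOWN


def body_is_zstd(headers: Mapping[str, str]) -> bool:
    """True if the Content-Encoding header lists zstd."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            return "zstd" in [v.strip().lower() for v in value.split(",")]
    return False
```

A proxy built this way would need a separate parser for `OPENAI_RESPONSES` and a zstd decode step before it could inspect a Codex request at all, which matches the two obstacles described above.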

Advanced Configuration

All configuration happens through environment variables:

TAMP_STAGES — Control which compression methods run
TAMP_PORT — Change from default 7778
TAMP_LOG_LEVEL — Debug, info, warn, error
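If you launch Tamp from a wrapper script, it can help to validate these variables first. The following pre-flight check is a hypothetical helper, not part of Tamp. It assumes only what is stated above (the three variable names, the default port, and the four log levels); treating the log level as case-insensitive is an additional assumption, and unset values are reported as `None` so that Tamp's own defaults apply.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

LOG_LEVELS = ("debug", "info", "warn", "error")


@dataclass(frozen=True)
class TampConfig:
    port: int
    log_level: Optional[str]            # None: leave Tamp's default in place
    stages: Optional[tuple[str, ...]]   # None: all stages enabled


def read_config(env: Mapping[str, str] = os.environ) -> TampConfig:
    """Read and validate TAMP_* variables from an environment mapping."""
    port = int(env.get("TAMP_PORT", "7778"))
    if not 0 < port < 65536:
        raise ValueError(f"TAMP_PORT out of range: {port}")

    level = env.get("TAMP_LOG_LEVEL")
    if level is not None:
        level = level.lower()
        if level not in LOG_LEVELS:
            raise ValueError(f"TAMP_LOG_LEVEL must be one of {LOG_LEVELS}")

    raw = env.get("TAMP_STAGES")
    stages = None
    if raw is not None:
        # Stage names are passed through as written; they are not validated here.
        stages = tuple(s.strip() for s in raw.split(",") if s.strip())

    return TampConfig(port=port, log_level=level, stages=stages)
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests or CI without touching the real environment.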

Run from source if you want to modify the compression logic:

git clone https://github.com/sliday/tamp.git
cd tamp && npm install
node bin/tamp.js

The Bottom Line

Tamp delivers immediate token savings with zero workflow disruption. For teams running Claude Code at scale, fewer input tokens could translate into a significant cost reduction. For individual developers, it makes the context window go further without changing how you prompt.

Source: gentic.news · Mar 25, 2026 · author=Ala SMITH · citation.json

AI-assisted reporting. Generated by gentic.news from multiple verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

**Immediate Action:** Run `npx @sliday/tamp` in a separate terminal tab today. Set `ANTHROPIC_BASE_URL=http://localhost:7778` before your next Claude Code session. The setup takes 60 seconds and works immediately.

**Workflow Change:** No change to how you use Claude Code. Tamp operates transparently at the network layer. The only difference is your API calls now route through localhost:7778 instead of directly to Anthropic.

**When It Shines Most:** Tamp provides maximum benefit during long, iterative debugging sessions where Claude Code repeatedly analyzes large JSON outputs, file listings, or command results. Each back-and-forth compounds the savings.

**Configuration Tip:** If you don't have Python installed or want minimal dependencies, use `TAMP_STAGES=minify,toon,strip-lines,whitespace` to disable LLMLingua. You'll still get ~40% compression from the structural optimizations.
