A new research framework called TACO (Trajectory-based Automatic Context Optimization) proposes a solution to a fundamental bottleneck in long-horizon AI agents: context overload. Instead of relying on hand-tuned heuristics to decide what information to keep or discard from a growing context window, TACO allows the agent itself to learn optimal compression rules directly from its interaction history. Initial testing shows it reduces token consumption by roughly 10% while maintaining or improving accuracy on coding and terminal benchmarks.
Key Takeaways
- Researchers introduced TACO, a framework that enables terminal agents to automatically discover and refine context compression rules from their own interaction trajectories.
- This approach cuts token overhead by approximately 10% on benchmarks like TerminalBench and SWE-Bench Lite while preserving task accuracy.
What TACO Solves: The Context Bottleneck

Long-horizon terminal agents—AI systems that perform extended sequences of actions in command-line or development environments—continuously accumulate observations, commands, and outputs. This ever-expanding context is costly (increasing API latency and expense) and can drown the agent's own reasoning in noise. Manually designing rules to compress this history (e.g., "summarize every 10 steps") is brittle and doesn't generalize across different tasks or environments.
TACO addresses this by treating context compression as a learnable component of the agent's policy. The core idea is self-evolution: the agent analyzes its own successful and unsuccessful interaction trajectories to discover patterns about which pieces of context are truly necessary for future decision-making.
How It Works: Learning Compression from Trajectories
TACO functions as a wrapper that can be integrated with existing terminal agents. Its operation is a continuous loop:
- Trajectory Collection: The agent operates in an environment (e.g., a TerminalBench task), producing a trajectory of states, actions, and outcomes.
- Rule Discovery: TACO analyzes these trajectories, specifically looking for correlations between the presence (or absence) of certain context elements and the agent's eventual success or failure, and uses those correlations to hypothesize compression rules. For example, it might learn that detailed `ls -la` output can be summarized to just file names after the agent has viewed it once, or that error messages from failed compilation attempts are critical to keep until the error is resolved.
- Rule Refinement & Application: The learned rules are refined over time and applied to filter and compress the agent's active context window before it is fed into the model for the next reasoning step, reducing the number of tokens processed at each step.
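The rule-discovery step can be sketched as a simple lift statistic. The paper does not specify its exact representation, so this minimal sketch assumes trajectories are reduced to a set of context-element types plus a success flag, and it keeps the element types whose presence raises the success rate by some margin:

```python
from collections import defaultdict

def discover_keep_rules(trajectories, threshold=0.1):
    """Hypothesize which context-element types to keep, based on how
    strongly their presence correlates with task success.

    trajectories: list of (element_types: set[str], success: bool).
    Returns the element types whose presence lifts the success rate
    by at least `threshold` over their absence.
    """
    # Per element type: [successes_with, count_with, successes_without, count_without]
    stats = defaultdict(lambda: [0, 0, 0, 0])
    all_types = set().union(*(types for types, _ in trajectories))
    for types, success in trajectories:
        for etype in all_types:
            if etype in types:
                stats[etype][0] += int(success)
                stats[etype][1] += 1
            else:
                stats[etype][2] += int(success)
                stats[etype][3] += 1
    keep = set()
    for etype, (sw, nw, so, no) in stats.items():
        rate_with = sw / nw if nw else 0.0
        rate_without = so / no if no else 0.0
        if rate_with - rate_without >= threshold:
            keep.add(etype)
    return keep
```

On a toy trajectory set where runs that retained error messages succeeded and runs that only kept `ls` output did not, the function would flag `error_msg` as worth keeping while leaving `ls_output` compressible. The real framework presumably works on richer features than bare presence/absence, but the correlational core is the same.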
The framework is designed to be task-adaptive. The rules it learns for a software engineering benchmark (SWE-Bench Lite) will differ from those it learns for a system administration benchmark (TerminalBench), because the salient information differs.
Key Results: 10% Token Reduction with Preserved Accuracy
The paper presents results from testing TACO with the MiniMax-2.5 model on three benchmarks:
- TerminalBench: A benchmark for evaluating agents on real-world command-line tasks.
- SWE-Bench Lite: A subset of the popular software engineering benchmark for fixing GitHub issues.
- CompileBench: A benchmark focused on code compilation and build tasks.
The primary result is a consistent ~10% reduction in token overhead across these evaluations. Crucially, this reduction did not come at the cost of performance degradation; the paper reports that task accuracy was either preserved or improved. This suggests the learned compression rules are effectively discarding redundant or irrelevant information rather than critical clues.
| Benchmark | Token Usage | Practical Impact |
| --- | --- | --- |
| TerminalBench | ~10% fewer tokens | Lower cost & latency for CLI automation |
| SWE-Bench Lite | ~10% fewer tokens, stable/improved pass rate | More efficient long-horizon coding agents |
| CompileBench | ~10% fewer tokens | Efficient context handling in multi-step builds |

Why It Matters: A Path Beyond Hand-Tuning

For developers building production AI agents, managing context is a pressing engineering challenge. TACO offers a concrete, automated alternative to the painstaking process of manual prompt engineering and context window management. By making the agent responsible for learning what matters, it points toward more robust and generalizable systems.
The ~10% token savings is a direct cost reduction for API-based models. For long-running agents, this compounds significantly. Perhaps more importantly, it demonstrates the viability of meta-learning for agent efficiency—where the agent optimizes not just its task policy, but also its own cognitive overhead.
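To see why a per-step saving compounds over a long run, consider that an agent typically re-sends its accumulated context on every step, so total input tokens grow roughly quadratically with step count. The numbers below are purely illustrative (the per-step token count and per-1k-token price are made up, not from the paper):

```python
def run_cost(steps, tokens_per_step, price_per_1k, reduction=0.0):
    """Estimated cost of an agent run where each step re-sends the
    accumulated context. `reduction` shrinks per-step input tokens,
    e.g. 0.10 for a TACO-style ~10% saving. Pricing is illustrative.
    """
    total_tokens = 0
    context = 0
    for _ in range(steps):
        context += tokens_per_step      # context grows every step
        total_tokens += context * (1 - reduction)
    return total_tokens * price_per_1k / 1000

baseline = run_cost(steps=200, tokens_per_step=500, price_per_1k=0.003)
with_taco = run_cost(steps=200, tokens_per_step=500, price_per_1k=0.003,
                     reduction=0.10)
```

Because the reduction applies to every step of an ever-growing context, a 10% per-step cut removes 10% of a total that is itself large: in this toy run the baseline processes over ten million input tokens, so even modest percentages translate into real dollars at scale.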
agentic.news Analysis
TACO arrives as the AI community shifts focus from raw model capability to practical efficiency and cost reduction in agentic systems. This aligns with a trend we've tracked closely, including Cohere's Command R 8.5B release targeting enterprise ROI and the rise of speculative decoding techniques to speed up inference. TACO tackles the complementary problem of context efficiency, which becomes the dominant cost driver in long-horizon tasks.
The framework's wrapper-based design is strategically pragmatic. It doesn't require retraining foundation models, making it immediately applicable to the current ecosystem of models powering terminal agents, like Claude 3.5 Sonnet, GPT-4o, or the MiniMax-2.5 used in the paper. This positions TACO as a potential near-term tool for developers, similar to how retrieval-augmented generation (RAG) was adopted as a plug-in enhancement.
However, the research leaves open questions for practitioners. The 10% improvement is meaningful but not revolutionary; the real test will be its performance on truly extended, multi-session agent deployments beyond benchmark settings. Furthermore, the computational overhead of the rule discovery and refinement process itself is not detailed—if it's substantial, it could offset the token savings. TACO represents an important step toward self-optimizing agents, but its ultimate impact hinges on these practical engineering considerations and its ability to scale beyond the 10% efficiency gain.
Frequently Asked Questions
What is TACO in AI?
TACO (Trajectory-based Automatic Context Optimization) is a research framework that enables AI agents, particularly those operating in terminal environments, to automatically learn and apply rules for compressing their growing context history. It reduces the amount of information the agent needs to process at each step by identifying and retaining only the most crucial details from past interactions.
How does TACO reduce token usage?
TACO reduces token usage by analyzing an agent's past interaction trajectories to discover patterns about which pieces of context are essential for success. It then uses these learned rules to filter and summarize the agent's active context window before each reasoning step, removing redundant or irrelevant tokens. In tested benchmarks, this method achieved approximately a 10% reduction in overall token consumption.
Can I use TACO with existing AI agents like Claude or GPT?
Yes, based on the paper's description. TACO is designed as a wrapper that can be integrated with existing terminal agents. It operates by intercepting and processing the agent's context before it is passed to the underlying language model (like MiniMax-2.5, Claude, or GPT). This means it should be model-agnostic in principle, though integration effort would be required.
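The interception pattern described above can be sketched as follows. The class and rule representation here are hypothetical illustrations of the wrapper idea, not the paper's actual API:

```python
class TacoWrapper:
    """Hypothetical sketch of TACO's wrapper design: intercept the
    agent's context, apply learned compression rules, then delegate
    to any underlying model client (Claude, GPT, MiniMax, etc.)."""

    def __init__(self, model_call, rules):
        self.model_call = model_call  # any callable taking a context list
        self.rules = rules            # learned (predicate, transform) pairs

    def compress(self, context):
        out = []
        for item in context:
            for predicate, transform in self.rules:
                if predicate(item):
                    item = transform(item)  # rewrite or summarize the item
            if item is not None:            # a transform may drop items entirely
                out.append(item)
        return out

    def step(self, context):
        return self.model_call(self.compress(context))

# Example learned rule: once seen, long `ls -la` output collapses to its header.
rules = [(lambda s: s.startswith("ls -la:"), lambda s: s.splitlines()[0])]
agent = TacoWrapper(model_call=lambda ctx: len(ctx), rules=rules)
compressed = agent.compress([
    "ls -la:\ntotal 8\n-rw-r--r-- 1 user user 42 file.txt",
    "error: build failed",
])
```

Because the wrapper only touches the context list before the model call, the underlying model and agent loop remain unchanged, which is what makes the design model-agnostic in principle.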
What are the main benefits of self-evolving compression over hand-tuned rules?
Hand-tuned compression rules are brittle and require expert knowledge for each new task or environment. They often fail to generalize. Self-evolving compression, as implemented by TACO, allows the agent to adapt its compression strategy directly from experience, leading to rules that are specifically tailored to the task at hand. This results in more robust performance across diverse environments and eliminates the need for manual rule engineering.