context window
30 articles about context windows in AI news
How Claude Code's Tool Search Saves 90% of Your Context Window
Tool search automatically defers MCP tool definitions, replacing them with a single search tool that loads tools on-demand, preserving your context window for actual work.
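The deferral pattern described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not Claude Code's actual implementation: the full tool registry stays out of the model's context, and a single search function returns only the matching definitions on demand. All names here (TOOL_REGISTRY, search_tools, the example tools) are invented for illustration.

```python
# Illustrative sketch of "defer then search": keep tool definitions in a
# local registry and expose one search tool that loads matches on demand,
# instead of sending every definition into the context window up front.

TOOL_REGISTRY = {
    "read_file": "Read a file from the workspace and return its contents.",
    "run_tests": "Run the project's test suite and report failures.",
    "query_db": "Execute a read-only SQL query against the app database.",
}

def search_tools(query: str, limit: int = 3) -> list[dict]:
    """Return only the tool definitions whose name or description matches."""
    q = query.lower()
    hits = [
        {"name": name, "description": desc}
        for name, desc in TOOL_REGISTRY.items()
        if q in name.lower() or q in desc.lower()
    ]
    return hits[:limit]

# The context holds one tool (search_tools) rather than the whole registry;
# a matching definition is loaded only when the agent actually needs it.
```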
The Cognitive Divergence: AI Context Windows Expand as Human Attention Declines, Creating a Delegation Feedback Loop
A new arXiv paper documents the exponential growth of AI context windows (512 tokens in 2017 to 2M in 2026) alongside a measured decline in human sustained-attention capacity. It introduces the 'Delegation Feedback Loop' hypothesis, where easier AI delegation may further erode human cognitive practice. This is a foundational study on human-AI interaction dynamics.
Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss
Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.
Claude Code's 1M Context Window is Now Free: How to Use It Today
Claude Opus 4.6 and Sonnet 4.6 now include the full 1 million token context window at standard pricing by default in Claude Code. No premium, no extra flags.
AI Giants Poised for Breakthrough: 1 Trillion Parameter Models with Million-Token Context Windows
Industry insiders hint at imminent releases of AI models at unprecedented scale: 1 trillion parameters and 1 million token context windows. If confirmed, these would mark a substantial jump in capability over current publicly available models.
Qwen's 9B Base Model Breaks Language Barriers with 1M Context Window
Alibaba's Qwen team has released Qwen3.5-9B-Base, a multimodal foundation model supporting 201 languages with a massive 1 million token context window. The model features a hybrid DeltaNet-MoE architecture designed for efficient inference.
OpenAI's GPT-5.4: The Million-Token Context Window That Changes Everything
OpenAI's upcoming GPT-5.4 will feature a groundbreaking 1 million token context window, matching competitors like Gemini and Claude. The model introduces an 'Extreme reasoning mode' for complex tasks and represents a shift toward monthly updates.
Anthropic's Sonnet 4.6 Emerges: Mid-Tier Model with 1M Token Context Window Confirms Leaks
Anthropic's newly revealed Sonnet 4.6 model features impressive evaluations for a mid-tier AI and a groundbreaking 1M token context window, validating earlier leaks about the company's development roadmap.
How to Run Claude Code 24/7 Without Burning Your Context Window
Implement a hard 50K token session cap and a three-tier memory system (daily notes, MEMORY.md, PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.
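The cap-and-checkpoint pattern above can be sketched minimally. The 50K cap and the tier names (daily notes, MEMORY.md, PARA graph) come from the article; everything else here is an assumption — the Session class, the whitespace token estimate (a stand-in for a real tokenizer), and the checkpoint format are all illustrative.

```python
# Sketch of the article's pattern: enforce a hard per-session token budget
# and distill the session into tiered memory before starting fresh.
# The token count is a crude whitespace estimate, not a real tokenizer.

SESSION_CAP = 50_000  # hard cap from the article

class Session:
    def __init__(self):
        self.tokens = 0
        self.log: list[str] = []

    def add(self, message: str) -> bool:
        """Record a message; return False once the cap would be exceeded."""
        cost = len(message.split())  # rough token estimate
        if self.tokens + cost > SESSION_CAP:
            return False  # caller should checkpoint and open a new session
        self.tokens += cost
        self.log.append(message)
        return True

    def checkpoint(self) -> dict:
        """Distill the session into the three memory tiers (illustrative)."""
        summary = " ".join(self.log)[:500]
        return {
            "daily_note": summary,                  # tier 1: today's scratch notes
            "MEMORY.md": summary[:200],             # tier 2: durable facts
            "para_graph": {"topic": summary[:50]},  # tier 3: PARA knowledge graph entry
        }
```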
Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context
Claude Opus 4.6 and Sonnet 4.6 now support 1M tokens with no long-context premium, making massive codebase analysis cheaper than competitors.
Stop Losing Agent Context: Implement Session Memory Files in Your Claude
A simple pattern using structured markdown files to persist session state across context windows, preventing Claude Code agents from redoing work or making inconsistent decisions.
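A minimal sketch of that persistence pattern, assuming a file with "Decisions" and "Next Steps" sections — the section names, file layout, and function names are illustrative, not taken from the article: the agent writes structured markdown at the end of a session, and a fresh context window reloads prior decisions instead of re-deriving them.

```python
# Sketch: persist agent state as structured markdown so a new session can
# recover earlier decisions rather than redo work or contradict them.

from pathlib import Path

def save_session_state(path: Path, decisions: list[str], next_steps: list[str]) -> None:
    """Write session state to a markdown file with one section per concern."""
    lines = ["# Session State", "", "## Decisions"]
    lines += [f"- {d}" for d in decisions]
    lines += ["", "## Next Steps"]
    lines += [f"- {s}" for s in next_steps]
    path.write_text("\n".join(lines))

def load_decisions(path: Path) -> list[str]:
    """Recover prior decisions from the markdown file, if it exists."""
    if not path.exists():
        return []
    decisions, in_section = [], False
    for line in path.read_text().splitlines():
        if line.startswith("## "):
            in_section = line == "## Decisions"
        elif in_section and line.startswith("- "):
            decisions.append(line[2:])
    return decisions
```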
Anthropic's Claude Skills Implements 3-Layer Context Architecture to Manage Hundreds of Skills
Anthropic's Claude Skills framework employs a three-layer context management system that loads only skill metadata by default, enabling support for hundreds of specialized skills without exceeding context window limits.
Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy
Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.
Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing
Alibaba's Qwen team has released a preview of Qwen 3.6 Plus on OpenRouter with a 1 million token context window, charging $0 for both input and output tokens. This directly undercuts paid long-context offerings from Anthropic and OpenAI.
Context Cartography: Formal Framework Proposes 7 Operators to Govern LLM Context, Moving Beyond 'More Tokens'
Researchers propose 'Context Cartography,' a formal framework for managing LLM context as a structured space, defining 7 operators to move information between zones like 'black fog' and 'visible field.' It argues that simply expanding context windows is insufficient due to transformer attention limitations.
Zhipu AI Announces GLM-5.1 Series, Featuring 1M Context and 128K Output Tokens
Zhipu AI has announced the GLM-5.1 model series, featuring a 1 million token context window and support for 128K output tokens. The update includes multiple model sizes and API availability.
DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x
DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.
Why Zed's Parallel Agents Won't Fix Your Real Bottleneck (And What Will)
Zed's parallel agents cut refactoring time 60% on independent modules but introduced conflicts on shared dependencies. The bottleneck isn't speed — it's context window limits.
PerfectSquashBench Tests Image Model Anchoring Bias vs. Text Models
Wharton professor Ethan Mollick released PerfectSquashBench, a test showing image generation models exhibit stronger anchoring bias than text models, getting 'stuck' on initial directions and requiring context window clearing.
Developer Builds LLM Wiki 'Second Brain' for AI Coding Agents
A developer built an 'LLM Wiki' that feeds an AI coding agent's context window with a living knowledge base of a specific codebase. This aims to solve the agent's short-term memory problem, leading to more consistent and informed code generation.
How to Decode Anthropic's Press Releases for Better Claude Code Updates
Claude Code users should learn to filter Anthropic's technical announcements for actionable updates on model capabilities, context windows, and API pricing that affect daily development.
Atomic Chat Integrates Google TurboQuant for Local Qwen3.5-9B, Claims 3x Speed Boost on M4 MacBook Air
Atomic Chat now runs Qwen3.5-9B with Google's TurboQuant locally, claiming a 3x processing speed increase and support for 100k+ context windows on consumer hardware like the M4 MacBook Air.
Claude Code's Hidden Token Cap: How to Work Around It and Stay Productive
Anthropic is reportedly reducing the effective context window via token inflation without announcing it. Here's how Claude Code users can adapt their workflows to stay productive.
Alibaba's Qwen3-Coder-Next: The 80B Parameter Coding Agent That Only Uses 3B at Inference
Alibaba has unveiled Qwen3-Coder-Next, an 80B parameter coding agent that activates just 3B parameters during inference. It achieves competitive performance on SWE-Bench and Terminal-Bench while supporting a 256K context window.
Neural Paging: The Memory Management Breakthrough for Next-Gen AI Agents
Researchers propose Neural Paging, a hierarchical architecture that decouples symbolic reasoning from information management in AI agents. This approach dramatically reduces computational complexity for long-horizon reasoning tasks, moving from quadratic to linear scaling with context window size.
Qwen 3.5 Small Models Defy Expectations, Outperforming Giants in Key AI Benchmarks
Alibaba's Qwen 3.5 small models (4B and 9B parameters) are reportedly outperforming much larger competitors like GPT-OSS-120B on several metrics. These compact models feature a 262K context window, early-fusion vision-language training, and hybrid architecture, achieving impressive scores on MMLU-Pro and other benchmarks.
OpenAI Unleashes Real-Time Coding Revolution with GPT-5.3-Codex-Spark
OpenAI has launched GPT-5.3-Codex-Spark, its first real-time coding model, offering 15x faster generation and a 128K context window. It is currently in research preview for ChatGPT Pro users and targets real-time software development workflows.
MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks
MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.
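The core move — treating the document as an environment queried with code rather than prompt content — can be sketched in a few lines. This mirrors the described approach only in spirit; the search function, window size, and example document below are illustrative, not MIT's actual RLM interface.

```python
# Sketch: instead of stuffing a long document into the prompt, let the
# model run code that searches and slices it, so only small matching
# excerpts ever enter the context window.

def search(doc: str, needle: str, window: int = 40) -> list[str]:
    """Return short excerpts around each occurrence of needle."""
    excerpts, start = [], 0
    while (i := doc.find(needle, start)) != -1:
        excerpts.append(doc[max(0, i - window): i + len(needle) + window])
        start = i + len(needle)
    return excerpts

# A ~60K-character document with one buried fact:
doc = "..." * 10_000 + "the launch code is 4711" + "..." * 10_000
hits = search(doc, "launch code")
# Only a ~90-character excerpt reaches the model, not the whole document.
```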
Codex 'Chronicle' Research Preview Adds Memory for Daily Developer Context
A research preview of 'Chronicle' for Codex has been released. It enables the AI coding assistant to accumulate memories from a developer's daily workflow to improve context.
How Claude Code's 'Conversational Context' Beats One-Off Codex Generations
Claude Code's ability to maintain context across a coding session makes iterative development and debugging significantly faster than switching to a model optimized for single-turn completions.