Long Context
30 articles about long context in AI news
New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context
A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.
The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference
A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.
λ-RLM: 8B Parameter Model Using Typed λ-Calculus Beats 405B Performance on Long-Context Tasks
Researchers developed λ-RLM, an 8B parameter model that outperforms 405B models on long-context tasks by replacing recursive code with typed λ-calculus combinators. This approach guarantees termination and reduces latency by up to 4.1x.
Sakana AI's Doc-to-LoRA: A Hypernetwork Breakthrough for Efficient Long-Context Processing
Sakana AI introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into efficient LoRA adapters, dramatically reducing the computational costs of processing lengthy text. This innovation addresses the quadratic attention bottleneck that makes long-context AI models expensive and slow.
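The core idea, a network that maps a document to low-rank adapter weights, can be sketched in a few lines. Everything below is a toy illustration under stated assumptions: the shapes, the single linear map, and the function names are invented for clarity and are not Sakana AI's architecture (whose hypernetwork weights would be meta-learned, not random).

```python
import numpy as np

rng = np.random.default_rng(0)

def hypernetwork(doc_embedding, d_model=64, rank=4):
    """Toy sketch of the Doc-to-LoRA idea: map a document embedding to
    low-rank LoRA factors A (d_model x rank) and B (rank x d_model).
    The linear maps W_a/W_b stand in for a trained hypernetwork."""
    d_doc = doc_embedding.shape[0]
    W_a = rng.standard_normal((d_doc, d_model * rank)) * 0.01
    W_b = rng.standard_normal((d_doc, rank * d_model)) * 0.01
    A = (doc_embedding @ W_a).reshape(d_model, rank)
    B = (doc_embedding @ W_b).reshape(rank, d_model)
    return A, B

doc = rng.standard_normal(32)   # stand-in for a compressed document embedding
A, B = hypernetwork(doc)
delta_W = A @ B                 # low-rank weight update, rank <= 4
```

The payoff is that the long document is "read" once to produce `A` and `B`, after which inference pays only the cost of a rank-4 weight delta instead of attending over the full text.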
Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy
Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.
Anthropic Surpasses Google in Extended Context AI, Redefining Long-Form Reasoning
Anthropic's Claude has reportedly outperformed Google's models in maintaining attention and reasoning across extended contexts, marking a significant shift in the AI landscape where context length has become a critical competitive frontier.
Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning
Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.
How to Run Claude Code 24/7 Without Burning Your Context Window
Implement a hard 50K token session cap and a three-tier memory system (daily notes, MEMORY.md, PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.
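The cap-and-flush pattern described above can be sketched as follows. All names here (`SessionBudget`, `flush_to_memory`, the file paths) are hypothetical illustrations of the article's three-tier scheme, not a real Claude Code API.

```python
SESSION_CAP = 50_000  # hard per-session token cap from the article

class SessionBudget:
    """Track token usage for one agent session against a hard cap."""
    def __init__(self, cap=SESSION_CAP):
        self.cap = cap
        self.used = 0

    def add(self, tokens: int) -> bool:
        """Record token usage; return False once the cap is reached."""
        self.used += tokens
        return self.used < self.cap

def flush_to_memory(summary: str, tier: str = "daily") -> str:
    """Route a session summary to one of three memory tiers
    (daily notes, MEMORY.md, PARA knowledge graph); paths are illustrative."""
    paths = {"daily": "notes/daily.md", "durable": "MEMORY.md", "para": "para/graph.md"}
    path = paths[tier]
    # A real agent loop would append `summary` to the file at `path` here.
    return path

budget = SessionBudget()
if not budget.add(60_000):                       # this turn blows the cap
    flush_to_memory("session summary...", tier="durable")
    budget = SessionBudget()                     # reset with a fresh context
```

The point of the cap is that context is flushed into durable memory *before* quality degrades, so a fresh session can rehydrate from the tiers instead of dragging stale tokens forward.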
The Cognitive Divergence: AI Context Windows Expand as Human Attention Declines, Creating a Delegation Feedback Loop
A new arXiv paper documents the exponential growth of AI context windows (512 tokens in 2017 to 2M in 2026) alongside a measured decline in human sustained-attention capacity. It introduces the 'Delegation Feedback Loop' hypothesis, where easier AI delegation may further erode human cognitive practice. This is a foundational study on human-AI interaction dynamics.
Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing
Alibaba's Qwen team has released a preview of Qwen 3.6 Plus on OpenRouter with a 1 million token context window, charging $0 for both input and output tokens. This directly undercuts paid long-context offerings from Anthropic and OpenAI.
MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization
Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.
Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development
A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.
Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss
Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.
Anthropic's Pricing Revolution: Million-Token Context Now Standard for Claude AI
Anthropic has eliminated the 5x surcharge for million-token contexts in Claude 3 Opus and Claude 3.5 Sonnet, making long-context AI dramatically more affordable. This pricing overhaul removes barriers for developers analyzing large documents, codebases, and datasets.
Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context
Claude Opus 4.6 and Sonnet 4.6 now support 1M tokens with no long-context premium, making massive codebase analysis cheaper than competitors.
VSPrefill: The Vertical-Slash Breakthrough That Makes 128K Contexts Practical
Researchers have developed VSPrefill, a novel sparse attention mechanism that dramatically accelerates long-context processing in LLMs. Using lightweight indexing of vertical columns and slash diagonals, it achieves 4.95x speedup while maintaining 98.35% accuracy at 128K context lengths.
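VSPrefill's own indexing code isn't detailed in this summary, but the general vertical-slash pattern it builds on can be sketched as a mask: keep a few "vertical" key columns that every query attends to, plus a few "slash" diagonals at fixed query-key offsets. The function below is a generic illustration of that pattern, not VSPrefill itself.

```python
import numpy as np

def vertical_slash_mask(n, vertical_cols, slash_offsets):
    """Build a boolean causal attention mask keeping only selected
    vertical columns (globally attended key positions) and slash
    diagonals (fixed query-key offsets). Everything else is pruned."""
    mask = np.zeros((n, n), dtype=bool)
    mask[:, vertical_cols] = True                  # vertical: a few key columns
    q = np.arange(n)
    for off in slash_offsets:                      # slash: diagonal k = q - off
        k = q - off
        valid = k >= 0
        mask[q[valid], k[valid]] = True
    mask &= np.tril(np.ones((n, n), dtype=bool))   # enforce causality
    return mask

m = vertical_slash_mask(8, vertical_cols=[0, 1], slash_offsets=[0, 1])
```

Because only O(n) entries per chosen column/diagonal survive, the attention cost grows roughly linearly with context length instead of quadratically, which is where the reported speedup at 128K tokens comes from.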
DeepSeek's HISA: Hierarchical Sparse Attention Cuts 64K Context Indexing Cost
DeepSeek researchers introduced HISA, a hierarchical sparse attention method that replaces flat token scanning. It removes a computational bottleneck at 64K context lengths without requiring any model retraining.
Alibaba Launches Qwen3.6-Plus with 1M-Token Context, Targeting AI Agent and Coding Workloads
Alibaba Cloud has launched Qwen3.6-Plus, a new multimodal large language model featuring a 1 million-token context length. The release is a strategic move to capture developer mindshare in the competitive AI agent and coding assistant market.
Cognition Labs Launches 'Canvas for Agents': First Shared Workspace Where AI Agents Code Alongside Humans
Cognition Labs has unveiled a collaborative workspace where AI agents like Codex and Claude Code operate visibly alongside human developers. This marks a shift from AI as a tool to a visible, real-time collaborator in the creative coding process.
Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity
A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.
Claude Code v2.1.86 Fixes /compact Failures, Adds Context Usage Tracking
The latest update fixes a critical /compact bug, adds getContextUsage() for token monitoring, and improves Edit-tool reliability via seed_read_state.
OpenResearcher Paper Released: Method for Synthesizing Long-Horizon Research Trajectories for AI
The OpenResearcher paper has been released, exploring methods to synthesize long-horizon research trajectories for deep learning. This work aims to provide structured guidance for navigating complex, multi-step AI research problems.
Context Cartography: Formal Framework Proposes 7 Operators to Govern LLM Context, Moving Beyond 'More Tokens'
Researchers propose 'Context Cartography,' a formal framework for managing LLM context as a structured space, defining 7 operators to move information between zones like 'black fog' and 'visible field.' It argues that simply expanding context windows is insufficient due to transformer attention limitations.
Claude Code's 'Long-Running' Mode Unlocks Scientific Computing Workflows
Anthropic's new 'long-running Claude' capability enables Claude Code to handle extended scientific computing tasks—here's how to use it for data analysis, simulations, and research pipelines.
PlayerZero Launches AI Context Graph for Production Systems, Claims 80% Fewer Support Escalations
AI startup PlayerZero has launched a context graph that connects code, incidents, telemetry, and tickets into a single operational model. The system, backed by CEOs of Figma, Dropbox, and Vercel, aims to predict failures, trace root causes, and generate fixes before code reaches production.
RAG Fails at Boundaries, Not Search: A Critical Look at Chunking and Context Limits
An analysis argues that RAG system failures are often due to fundamental data boundary issues—chunking, context limits, and source segmentation—rather than search algorithm performance. This reframes the primary challenge for AI practitioners implementing knowledge retrieval.
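The boundary failure the analysis describes is easy to reproduce: a fact that straddles a chunk boundary appears whole in no chunk, so no search algorithm can retrieve it. The standard mitigation is overlapping chunks, sketched below; this is a minimal illustration of the problem, and real pipelines would also split on sentence or section boundaries rather than raw character offsets.

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50):
    """Split text into fixed-size chunks with overlap so content that
    straddles one chunk boundary appears intact in at least one chunk."""
    step = size - overlap
    chunks = []
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

chunks = chunk_with_overlap("a" * 500, size=200, overlap=50)
```

Overlap trades index size for recall at boundaries; it does not fix the deeper segmentation issues the analysis raises (tables split mid-row, sections split mid-argument), which need structure-aware splitting.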
Supermemory Claims ~99% on LongMemEval_s with Experimental ASMR Technique, Plans Open-Source Release
An experimental AI technique called ASMR (Agentic Search and Memory Retrieval) reportedly achieved near-perfect performance (~99%) on the LongMemEval_s benchmark. The method replaces vector search with parallel observer agents and will be open-sourced in 11 days.
Zhipu AI Announces GLM-5.1 Series, Featuring 1M Context and 128K Output Tokens
Zhipu AI has announced the GLM-5.1 model series, featuring a 1 million token context window and support for 128K output tokens. The update includes multiple model sizes and API availability.
New Research Reveals LLM-Based Recommender Agents Are Vulnerable to Contextual Bias
A new benchmark, BiasRecBench, demonstrates that LLMs used as recommendation agents in workflows like e-commerce are easily swayed by injected contextual biases, even when they can identify the correct choice. This exposes a critical reliability gap in high-stakes applications.
OpenAI Codex Gains Subagents, Anthropic Ships 1M Context at Standard Pricing
OpenAI added parallel subagents to Codex to combat 'context pollution,' while Anthropic made 1M context generally available for Claude Opus/Sonnet 4.6 with no price premium, achieving 78.3% on MRCR v2. These incremental upgrades reshape practical agentic workflows.