Long context

30 articles about long context in AI news

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

78% relevant

The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference

A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.

75% relevant

λ-RLM: 8B Parameter Model Using Typed λ-Calculus Beats 405B Performance on Long-Context Tasks

Researchers developed λ-RLM, an 8B parameter model that outperforms 405B models on long-context tasks by replacing recursive code with typed λ-calculus combinators. This approach guarantees termination and reduces latency by up to 4.1x.

99% relevant

Sakana AI's Doc-to-LoRA: A Hypernetwork Breakthrough for Efficient Long-Context Processing

Sakana AI introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into efficient LoRA adapters, dramatically reducing the computational costs of processing lengthy text. This innovation addresses the quadratic attention bottleneck that makes long-context AI models expensive and slow.

85% relevant

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

85% relevant

Anthropic Surpasses Google in Extended Context AI, Redefining Long-Form Reasoning

Anthropic's Claude has reportedly outperformed Google's models in maintaining attention and reasoning across extended contexts, marking a significant shift in the AI landscape where context length has become a critical competitive frontier.

87% relevant

Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning

Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion toward systems capable of sustained autonomous reasoning. The breakthrough makes million-token contexts practical through novel memory management and hierarchical processing.

70% relevant

How to Run Claude Code 24/7 Without Burning Your Context Window

Implement a hard 50K-token session cap and a three-tier memory system (daily notes, MEMORY.md, a PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.

100% relevant
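The cap-plus-tiers workflow in the entry above can be sketched in a few lines. This is a minimal illustration only: the names (`SESSION_CAP`, `TieredMemory`, `within_budget`) and the file layout are assumptions for the example, not part of Claude Code itself.

```python
SESSION_CAP = 50_000  # hard per-session token budget described in the article

class TieredMemory:
    """Three-tier memory: ephemeral daily notes, a distilled MEMORY.md
    layer, and a long-lived PARA-style topic store."""
    def __init__(self):
        self.daily_notes = []   # tier 1: raw notes, rotated daily
        self.memory_md = []     # tier 2: distilled facts worth keeping
        self.knowledge = {}     # tier 3: topic -> notes (PARA-style)

    def note(self, text):
        self.daily_notes.append(text)

    def promote(self, text, topic=None):
        # Distilled facts move up a tier instead of re-entering raw context.
        self.memory_md.append(text)
        if topic:
            self.knowledge.setdefault(topic, []).append(text)

def within_budget(used, incoming, cap=SESSION_CAP):
    # Refuse to grow the session past the hard cap; the agent should
    # compact or start a fresh session rather than overflow it.
    return used + incoming <= cap

mem = TieredMemory()
mem.note("ran test suite; 2 failures in parser")
mem.promote("parser fails on nested quotes", topic="parser-bugs")
print(within_budget(48_000, 1_500))  # True: still under the 50K cap
print(within_budget(48_000, 3_000))  # False: would exceed the cap
```

The point of the tiers is that only distilled facts survive a session, so the budget check never has to admit raw history back into context.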

The Cognitive Divergence: AI Context Windows Expand as Human Attention Declines, Creating a Delegation Feedback Loop

A new arXiv paper documents the exponential growth of AI context windows (512 tokens in 2017 to 2M in 2026) alongside a measured decline in human sustained-attention capacity. It introduces the 'Delegation Feedback Loop' hypothesis, where easier AI delegation may further erode human cognitive practice. This is a foundational study on human-AI interaction dynamics.

84% relevant

Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing

Alibaba's Qwen team has released a preview of Qwen 3.6 Plus on OpenRouter with a 1 million token context window, charging $0 for both input and output tokens. This directly undercuts paid long-context offerings from Anthropic and OpenAI.

97% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

74% relevant

Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development

A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.

89% relevant

Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss

Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.

85% relevant

Anthropic's Pricing Revolution: Million-Token Context Now Standard for Claude AI

Anthropic has eliminated the 5x surcharge for million-token contexts in Claude 3 Opus and Claude 3.5 Sonnet, making long-context AI dramatically more affordable. This pricing overhaul removes barriers for developers analyzing large documents, codebases, and datasets.

100% relevant

Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context

Claude Opus 4.6 and Sonnet 4.6 now support 1M tokens with no long-context premium, making massive codebase analysis cheaper than competitors.

90% relevant

VSPrefill: The Vertical-Slash Breakthrough That Makes 128K Contexts Practical

Researchers have developed VSPrefill, a novel sparse attention mechanism that dramatically accelerates long-context processing in LLMs. Using lightweight indexing of vertical columns and slash diagonals, it achieves 4.95x speedup while maintaining 98.35% accuracy at 128K context lengths.

80% relevant
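The vertical-plus-slash sparsity pattern the entry describes can be illustrated with a toy mask builder. This is a sketch under stated assumptions: the column indices, diagonal offsets, and sequence length below are invented for the example, and the actual VSPrefill indexing is considerably more involved.

```python
import numpy as np

def vertical_slash_mask(n, vertical_cols, slash_offsets):
    """Boolean (n, n) causal attention mask that keeps only:
    - 'vertical' columns: a few key positions every query attends to, and
    - 'slash' diagonals: keys at fixed backward offsets from each query."""
    mask = np.zeros((n, n), dtype=bool)
    mask[:, list(vertical_cols)] = True           # vertical stripes
    for off in slash_offsets:                     # slash diagonals
        i = np.arange(off, n)
        mask[i, i - off] = True
    mask &= np.tril(np.ones((n, n), dtype=bool))  # enforce causality
    return mask

m = vertical_slash_mask(8, vertical_cols=[0, 1], slash_offsets=[0, 1])
print(int(m.sum()))  # number of (query, key) pairs kept out of the causal 36
```

Because the kept positions are a handful of columns plus a handful of diagonals, the attended set grows roughly linearly in sequence length rather than quadratically, which is where the reported speedup comes from.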

DeepSeek's HISA: Hierarchical Sparse Attention Cuts 64K Context Indexing Cost

DeepSeek researchers introduced HISA, a hierarchical sparse attention method that replaces flat token scanning. It removes a computational bottleneck at 64K context lengths without requiring any model retraining.

85% relevant

Alibaba Launches Qwen3.6-Plus with 1M-Token Context, Targeting AI Agent and Coding Workloads

Alibaba Cloud has launched Qwen3.6-Plus, a new multimodal large language model featuring a 1 million-token context length. The release is a strategic move to capture developer mindshare in the competitive AI agent and coding assistant market.

100% relevant

Cognition Labs Launches 'Canvas for Agents': First Shared Workspace Where AI Agents Code Alongside Humans

Cognition Labs has unveiled a collaborative workspace where AI agents like Codex and Claude Code operate visibly alongside human developers. This marks a shift from AI as a tool to a visible, real-time collaborator in the creative coding process.

87% relevant

Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity

A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.

95% relevant

Claude Code v2.1.86 Fixes /compact Failures, Adds Context Usage Tracking

Latest update fixes critical /compact bug, adds getContextUsage() for token monitoring, and improves Edit reliability with seed_read_state.

95% relevant

OpenResearcher Paper Released: Method for Synthesizing Long-Horizon Research Trajectories for AI

The OpenResearcher paper has been released, exploring methods to synthesize long-horizon research trajectories for deep learning. This work aims to provide structured guidance for navigating complex, multi-step AI research problems.

85% relevant

Context Cartography: Formal Framework Proposes 7 Operators to Govern LLM Context, Moving Beyond 'More Tokens'

Researchers propose 'Context Cartography,' a formal framework for managing LLM context as a structured space, defining 7 operators to move information between zones like 'black fog' and 'visible field.' It argues that simply expanding context windows is insufficient due to transformer attention limitations.

80% relevant

Claude Code's 'Long-Running' Mode Unlocks Scientific Computing Workflows

Anthropic's new 'long-running Claude' capability enables Claude Code to handle extended scientific computing tasks—here's how to use it for data analysis, simulations, and research pipelines.

70% relevant

PlayerZero Launches AI Context Graph for Production Systems, Claims 80% Fewer Support Escalations

AI startup PlayerZero has launched a context graph that connects code, incidents, telemetry, and tickets into a single operational model. The system, backed by CEOs of Figma, Dropbox, and Vercel, aims to predict failures, trace root causes, and generate fixes before code reaches production.

87% relevant

RAG Fails at Boundaries, Not Search: A Critical Look at Chunking and Context Limits

An analysis argues that RAG system failures are often due to fundamental data boundary issues—chunking, context limits, and source segmentation—rather than search algorithm performance. This reframes the primary challenge for AI practitioners implementing knowledge retrieval.

100% relevant
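The boundary failure the analysis describes is easy to reproduce with a toy fixed-width chunker (the text and chunk size here are invented for illustration): the chunk that matches the query's keywords can end up missing the actual answer, regardless of how good the search is.

```python
def fixed_chunks(text, size):
    # Naive fixed-width chunking with no regard for sentence boundaries.
    return [text[i:i + size] for i in range(0, len(text), size)]

doc = "The rate limit is 500 requests per minute for the pro tier."
chunks = fixed_chunks(doc, 30)
# The boundary falls mid-fact: the chunk containing "rate limit" holds
# "500 requests" but not "per minute", so a retriever that matches the
# query "rate limit" returns an incomplete answer.
print(chunks)
```

No ranking improvement fixes this; only boundary-aware segmentation (sentence or section splits, overlap windows) keeps the fact intact in a single chunk.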

Supermemory Claims ~99% on LongMemEval_s with Experimental ASMR Technique, Plans Open-Source Release

An experimental AI technique called ASMR (Agentic Search and Memory Retrieval) reportedly achieved near-perfect performance (~99%) on the LongMemEval_s benchmark. The method replaces vector search with parallel observer agents and will be open-sourced in 11 days.

95% relevant

Zhipu AI Announces GLM-5.1 Series, Featuring 1M Context and 128K Output Tokens

Zhipu AI has announced the GLM-5.1 model series, featuring a 1 million token context window and support for 128K output tokens. The update includes multiple model sizes and API availability.

85% relevant

New Research Reveals LLM-Based Recommender Agents Are Vulnerable to Contextual Bias

A new benchmark, BiasRecBench, demonstrates that LLMs used as recommendation agents in workflows like e-commerce are easily swayed by injected contextual biases, even when they can identify the correct choice. This exposes a critical reliability gap in high-stakes applications.

82% relevant

OpenAI Codex Gains Subagents, Anthropic Ships 1M Context at Standard Pricing

OpenAI added parallel subagents to Codex to combat 'context pollution,' while Anthropic made 1M context generally available for Claude Opus/Sonnet 4.6 with no price premium, reporting 78.3% on MRCR v2. These incremental upgrades reshape practical agentic workflows.

85% relevant