long context

30 articles about long context in AI news

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

Mar 16, 2026 · 78% relevant

DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x

DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

Apr 24, 2026 · 100% relevant

The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference

A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.

Mar 11, 2026 · 75% relevant

ServiceNow's SynthDocBench Teases Apart VLM Long-Context Failure Modes

ServiceNow releases SynthDocBench, a controlled synthetic benchmark for long-context visual document understanding that varies length, layout, modality, and reasoning to diagnose VLM failures.

Jul 15, 2026 · 85% relevant

Grouped Query Experts cuts long-context attention cost 44%

GQE speeds long-context attention prefill 1.7–1.8× by routing tokens to 9 of 16 query heads, matching baseline accuracy at 56.04.

Jun 27, 2026 · 85% relevant
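
The figures in the Grouped Query Experts teaser above can be cross-checked with a little arithmetic. The sketch below assumes, purely for illustration, that attention prefill cost scales linearly with the number of active query heads; the function name `head_routing_estimate` is hypothetical and not taken from the paper.

```python
from fractions import Fraction

# Figures quoted in the teaser: each token is routed to 9 of 16 query heads.
ACTIVE_HEADS = 9
TOTAL_HEADS = 16


def head_routing_estimate(active: int, total: int) -> tuple[float, float]:
    """Return (fractional cost reduction, ideal speedup).

    Assumption: attention cost is proportional to the number of
    active query heads, with no routing overhead.
    """
    kept = Fraction(active, total)   # share of head-level work still done
    reduction = 1 - kept             # share of work skipped
    speedup = 1 / kept               # ideal speedup if cost is linear
    return float(reduction), float(speedup)


reduction, speedup = head_routing_estimate(ACTIVE_HEADS, TOTAL_HEADS)
print(f"cost reduction: {reduction:.2%}")
print(f"ideal speedup: {speedup:.2f}x")
```

Under that assumption, skipping 7 of 16 heads removes roughly 44% of the cost and gives an ideal speedup of 16/9, which falls inside the reported 1.7–1.8× range.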

MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks

MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.

Apr 23, 2026 · 95% relevant

λ-RLM: 8B Parameter Model Using Typed λ-Calculus Beats 405B Performance on Long-Context Tasks

Researchers developed λ-RLM, an 8B parameter model that outperforms 405B models on long-context tasks by replacing recursive code with typed λ-calculus combinators. This approach guarantees termination and reduces latency by up to 4.1x.

Mar 24, 2026 · 99% relevant

Sakana AI's Doc-to-LoRA: A Hypernetwork Breakthrough for Efficient Long-Context Processing

Sakana AI introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into efficient LoRA adapters, dramatically reducing the computational costs of processing lengthy text. This innovation addresses the quadratic attention bottleneck that makes long-context AI models expensive and slow.

Feb 27, 2026 · 85% relevant

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

Mar 31, 2026 · 85% relevant

Anthropic Surpasses Google in Extended Context AI, Redefining Long-Form Reasoning

Anthropic's Claude has reportedly outperformed Google's models in maintaining attention and reasoning across extended contexts, marking a significant shift in the AI landscape where context length has become a critical competitive frontier.

Mar 14, 2026 · 87% relevant

Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning

Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.

Feb 15, 2026 · 70% relevant

How One Developer Achieved a 46:1 Context Cache Ratio to Manage 39 Projects

The key takeaway is that maximizing Claude Code's prompt cache through long, context-dense sessions is the most effective way to scale individual productivity across multiple projects.

Apr 17, 2026 · 100% relevant

How Claude Code's 3-Tier Compaction System Saves You Money and Keeps Context

Learn how Claude Code's intelligent, three-tiered compaction system works to manage long conversations efficiently, preserving key context while optimizing for token usage and cost.

Apr 7, 2026 · 96% relevant

How to Run Claude Code 24/7 Without Burning Your Context Window

Implement a hard 50K token session cap and a three-tier memory system (daily notes, MEMORY.md, PARA knowledge graph) to prevent context bloat and memory decay in long-running Claude Code agents.

Apr 3, 2026 · 95% relevant
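
The hard session cap described in the item above could be enforced along these lines. This is a minimal sketch, not the article's implementation: the class `CappedSession`, the helper `estimate_tokens`, and the 4-characters-per-token estimate are all assumptions, and only the first memory tier (daily notes) is modelled; MEMORY.md and the PARA knowledge graph are left out.

```python
from dataclasses import dataclass, field

SESSION_TOKEN_CAP = 50_000  # hard cap quoted in the teaser


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


@dataclass
class CappedSession:
    """Hypothetical session wrapper that resets before the cap is exceeded."""

    cap: int = SESSION_TOKEN_CAP
    used: int = 0
    messages: list[str] = field(default_factory=list)
    daily_notes: list[str] = field(default_factory=list)  # tier-1 stand-in

    def add(self, message: str) -> bool:
        """Append a message; return True if the session was rolled over first."""
        cost = estimate_tokens(message)
        rolled = False
        if self.used + cost > self.cap and self.messages:
            self.roll_over()
            rolled = True
        self.messages.append(message)
        self.used += cost
        return rolled

    def roll_over(self) -> None:
        """Record a one-line note about the finished session, then reset it."""
        self.daily_notes.append(
            f"session ended: {len(self.messages)} messages, ~{self.used} tokens"
        )
        self.messages.clear()
        self.used = 0
```

A real agent would write a proper summary to disk at rollover instead of a one-line note; the point here is only that the cap is checked before each message is added, so a session never grows past it.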

The Cognitive Divergence: AI Context Windows Expand as Human Attention Declines, Creating a Delegation Feedback Loop

A new arXiv paper documents the exponential growth of AI context windows (512 tokens in 2017 to 2M in 2026) alongside a measured decline in human sustained-attention capacity. It introduces the 'Delegation Feedback Loop' hypothesis, where easier AI delegation may further erode human cognitive practice. This is a foundational study on human-AI interaction dynamics.

Mar 31, 2026 · 84% relevant

Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing

Alibaba's Qwen team has released a preview of Qwen 3.6 Plus on OpenRouter with a 1 million token context window, charging $0 for both input and output tokens. This directly undercuts paid long-context offerings from Anthropic and OpenAI.

Mar 30, 2026 · 97% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

Mar 30, 2026 · 74% relevant

Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development

A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.

Mar 23, 2026 · 89% relevant

Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss

Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.

Mar 21, 2026 · 85% relevant

Anthropic's Pricing Revolution: Million-Token Context Now Standard for Claude AI

Anthropic has eliminated the 5x surcharge for million-token contexts in Claude 3 Opus and Claude 3.5 Sonnet, making long-context AI dramatically more affordable. This pricing overhaul removes barriers for developers analyzing large documents, codebases, and datasets.

Mar 13, 2026 · 95% relevant

Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context

Claude Opus 4.6 and Sonnet 4.6 now support 1M tokens with no long-context premium, making massive codebase analysis cheaper than competitors.

Mar 13, 2026 · 90% relevant

VSPrefill: The Vertical-Slash Breakthrough That Makes 128K Contexts Practical

Researchers have developed VSPrefill, a novel sparse attention mechanism that dramatically accelerates long-context processing in LLMs. Using lightweight indexing of vertical columns and slash diagonals, it achieves 4.95x speedup while maintaining 98.35% accuracy at 128k context lengths.

Mar 6, 2026 · 80% relevant
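
The "vertical columns and slash diagonals" pattern named in the VSPrefill item above can be pictured with a small mask. The sketch below only illustrates the shape of such a sparse causal mask; how VSPrefill actually chooses its columns and diagonals is not described here, so the indices are hard-coded and the function names are hypothetical.

```python
def vertical_slash_mask(n, vertical_cols, slash_offsets):
    """Build an n x n causal mask that keeps selected key columns ("vertical")
    and selected diagonals ("slash", where offset = query_index - key_index)."""
    cols = set(vertical_cols)
    offsets = set(slash_offsets)
    return [
        [j <= i and (j in cols or (i - j) in offsets) for j in range(n)]
        for i in range(n)
    ]


def density(mask):
    """Fraction of causal (lower-triangular) entries that the mask keeps."""
    n = len(mask)
    kept = sum(sum(row) for row in mask)
    return kept / (n * (n + 1) // 2)


# Illustrative choice: keep the first key column plus the two nearest diagonals.
mask = vertical_slash_mask(8, vertical_cols=[0], slash_offsets=[0, 1])
for row in mask:
    print("".join("#" if keep else "." for keep in row))
print(f"kept {density(mask):.0%} of causal entries")
```

At realistic context lengths a handful of columns and diagonals covers only a small fraction of the causal entries, which is where a prefill speedup from this kind of sparsity would come from.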

Why Claude Code's 80.8% SWE-Bench Score and 1M Context Window Beat Codex

Claude Code's 80.8% SWE-Bench score, 1M token context, and local execution make it the top choice for senior devs; run `claude` in your terminal for complex codebase work.

Jul 12, 2026 · 85% relevant

Meituan Open-Sources 1.6T-Parameter LongCat-2.0 Trained on Domestic Chips

Meituan open-sourced 1.6T-parameter LongCat-2.0 trained on 50,000 domestic ASICs, claiming China's first full-process domestic-chip trillion-parameter model.

Jun 30, 2026 · 100% relevant

DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory

DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.

Jun 9, 2026 · 98% relevant

MiniMax M3: Sparse Attention, 1M Context, Multimodal via Together

MiniMax M3 uses sparse attention for 1M context and multimodality, with Together AI serving fast inference.

Jun 3, 2026 · 95% relevant

Microsoft Paper Probes Long-Horizon Agent Generalization Gap

Microsoft Research paper on long-horizon agent generalization identifies failure modes and proposes improvements for extended tasks.

May 6, 2026 · 75% relevant

Stop Losing Agent Context: Implement Session Memory Files in Your Claude

A simple pattern using structured markdown files to persist session state across context windows, preventing Claude Code agents from redoing work or making inconsistent decisions.

Apr 22, 2026 · 100% relevant

Codex 'Chronicle' Research Preview Adds Memory for Daily Developer Context

A research preview of 'Chronicle' for Codex has been released. It enables the AI coding assistant to accumulate memories from a developer's daily workflow to improve context.

Apr 20, 2026 · 93% relevant

Researchers Achieve Ultra-Long-Horizon Agentic Science with Cohesive AI Agents

A research team has developed AI agents capable of executing and maintaining coherent, long-horizon scientific research workflows. This addresses a core challenge in creating autonomous systems for complex discovery.

Apr 20, 2026 · 85% relevant
