Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

long context models

30 articles about long context models in AI news

MIT's RLM Handles 10M+ Tokens, Outperforms RAG on Long-Context Benchmarks

MIT researchers introduced Recursive Language Models (RLMs), which treat long documents as an external environment and use code to search, slice, and filter data, achieving 58.00 on a hard long-context benchmark versus 0.04 for standard models.

95% relevant

λ-RLM: 8B Parameter Model Using Typed λ-Calculus Beats 405B Performance on Long-Context Tasks

Researchers developed λ-RLM, an 8B parameter model that outperforms 405B models on long-context tasks by replacing recursive code with typed λ-calculus combinators. This approach guarantees termination and reduces latency by up to 4.1x.

99% relevant

Sakana AI's Doc-to-LoRA: A Hypernetwork Breakthrough for Efficient Long-Context Processing

Sakana AI introduces Doc-to-LoRA, a lightweight hypernetwork that meta-learns to compress long documents into efficient LoRA adapters, dramatically reducing the computational costs of processing lengthy text. This innovation addresses the quadratic attention bottleneck that makes long-context AI models expensive and slow.

85% relevant

Meta's QTT Method Fixes Long-Context LLM 'Buried Facts' Problem, Boosts Retrieval Accuracy

Meta researchers identified a failure mode where LLMs with 128K+ context windows miss information buried in the middle of documents. Their Query-only Test-Time Training (QTT) method adapts models at inference, significantly improving retrieval accuracy.

85% relevant

New Research Diagnoses LLMs' Struggle with Multiple Knowledge Updates in Context

A new arXiv paper reveals a persistent bias in LLMs when facts are updated multiple times within a long context. Models increasingly favor the earliest version, failing to track the latest state—a critical flaw for dynamic knowledge tasks.

78% relevant

Anthropic Surpasses Google in Extended Context AI, Redefining Long-Form Reasoning

Anthropic's Claude has reportedly outperformed Google's models in maintaining attention and reasoning across extended contexts, marking a significant shift in the AI landscape where context length has become a critical competitive frontier.

87% relevant

Beyond the Token Limit: How Claude Opus 4.6's Architectural Breakthrough Enables True Long-Context Reasoning

Anthropic's Claude Opus 4.6 represents a fundamental shift in large language model architecture, moving beyond simple token expansion to create genuinely autonomous reasoning systems. The breakthrough enables practical use of million-token contexts through novel memory management and hierarchical processing.

70% relevant

Memory Sparse Attention (MSA) Enables 100M Token Context Windows with Minimal Performance Loss

Memory Sparse Attention (MSA) is a proposed architecture that allows AI models to store and reason over massive long-term memory directly within their attention mechanism, eliminating the need for external retrieval systems. The approach reportedly enables context windows of up to 100 million tokens with minimal performance degradation.

85% relevant

The Cognitive Divergence: AI Context Windows Expand as Human Attention Declines, Creating a Delegation Feedback Loop

A new arXiv paper documents the exponential growth of AI context windows (512 tokens in 2017 to 2M in 2026) alongside a measured decline in human sustained-attention capacity. It introduces the 'Delegation Feedback Loop' hypothesis, where easier AI delegation may further erode human cognitive practice. This is a foundational study on human-AI interaction dynamics.

84% relevant

Qwen 3.6 Plus Preview Launches on OpenRouter with Free 1M Token Context, Disrupting API Pricing

Alibaba's Qwen team has released a preview of Qwen 3.6 Plus on OpenRouter with a 1 million token context window, charging $0 for both input and output tokens. This directly undercuts paid long-context offerings from Anthropic and OpenAI.

97% relevant

MemoryCD: New Benchmark Tests LLM Agents on Real-World, Lifelong User Memory for Personalization

Researchers introduce MemoryCD, the first large-scale benchmark for evaluating LLM agents' long-context memory using real Amazon user data across 12 domains. It reveals current methods are far from satisfactory for lifelong personalization.

74% relevant

Context Graph for Agentic Coding: A New Abstraction for LLM-Powered Development

A new "context graph" abstraction is emerging for AI coding agents, designed to manage project state and memory across sessions. It aims to solve the persistent context problem in long-running development tasks.

89% relevant

Anthropic's Pricing Revolution: Million-Token Context Now Standard for Claude AI

Anthropic has eliminated the 5x surcharge for million-token contexts in Claude 3 Opus and Claude 3.5 Sonnet, making long-context AI dramatically more affordable. This pricing overhaul removes barriers for developers analyzing large documents, codebases, and datasets.

95% relevant

Claude Code's 1M Context Window Is Now GA — And It's Priced Like Regular Context

Claude Opus 4.6 and Sonnet 4.6 now support 1M tokens with no long-context premium, making massive codebase analysis cheaper than competitors.

90% relevant

VSPrefill: The Vertical-Slash Breakthrough That Makes 128K Contexts Practical

Researchers have developed VSPrefill, a novel sparse attention mechanism that dramatically accelerates long-context processing in LLMs. Using lightweight indexing of vertical columns and slash diagonals, it achieves 4.95x speedup while maintaining 98.35% accuracy at 128k context lengths.

80% relevant

Engramme Building 'Large Memory Models' to Surface Personal Context

Engramme, founded by Gabriel Kreiman, is developing 'Large Memory Models' (LMMs) designed to connect to a user's digital life and surface relevant context without explicit prompting. The goal is to augment human memory by making personal data available at the right moment.

85% relevant

AI Giants Poised for Breakthrough: 1 Trillion Parameter Models with Million-Token Context Windows

Industry insiders hint at imminent releases of AI models with unprecedented scale—1 trillion parameters and 1 million token context windows. This represents a quantum leap in AI capability that could transform how we interact with technology.

85% relevant

The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference

A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.

75% relevant

DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x

DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.

100% relevant

NVIDIA Nemotron 3 Super: 120B Hybrid Mamba-Transformer MoE with 1M Context

NVIDIA has released Nemotron 3 Super, a 120B parameter open hybrid Mamba-Transformer Mixture of Experts model with 12B active parameters and 1M token context length. The company claims it delivers up to 7.5x higher throughput than similar open models.

95% relevant

SLSREC: A New Self-Supervised Model for Disentangling Long- and Short-Term User Interests in Recommendations

A new arXiv preprint introduces SLSREC, a self-supervised model that disentangles long-term user preferences from short-term intentions using contrastive learning and adaptive fusion. It outperforms state-of-the-art models on three benchmark datasets, addressing a core challenge in dynamic user modeling.

88% relevant

Memory Sparse Attention (MSA) Achieves 100M Token Context with Near-Linear Complexity

A new attention architecture, Memory Sparse Attention (MSA), breaks the 100M token context barrier while maintaining 94% accuracy at 1M tokens. It uses document-wise RoPE and end-to-end sparse attention to outperform RAG systems and frontier models.

95% relevant

Sam Altman Envisions AI That Thinks for Days: The Dawn of Super-Long-Term Reasoning

OpenAI CEO Sam Altman predicts future AI models will perform "super long-term reasoning," spending days or weeks analyzing complex, high-stakes problems. This represents a fundamental shift from today's rapid-response systems toward deliberate, extended cognitive processes.

85% relevant

DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory

DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.

98% relevant

MiniMax M3: Sparse Attention, 1M Context, Multimodal via Together

MiniMax M3 uses sparse attention for 1M context and multimodality, with Together AI serving fast inference.

95% relevant

Researchers Achieve Ultra-Long-Horizon Agentic Science with Cohesive AI Agents

A research team has developed AI agents capable of executing and maintaining coherent, long-horizon scientific research workflows. This addresses a core challenge in creating autonomous systems for complex discovery.

85% relevant

Gemini CLI Launches Subagents with Isolated Context & Custom Instructions

The Gemini CLI tool has launched a 'Subagents' feature, allowing users to run multiple specialized AI agents concurrently, each with its own isolated context and system prompt. This enables more complex, modular workflows by preventing instruction bleed between tasks.

85% relevant

HORIZON Benchmark Diagnoses Long-Horizon Failures in GPT-5 and Claude Agents

A new benchmark called HORIZON systematically analyzes where and why LLM agents like GPT-5 and Claude fail on long-horizon tasks. The study collected over 3100 agent trajectories and provides a scalable method for failure attribution, offering practical guidance for building more reliable agents.

100% relevant

MemPalace Hits 96.6% on LongMemEval, Beats Paid AI Memory Tools

MemPalace, an open-source AI memory system built by actress Milla Jovovich and developer Ben Sigman, achieved 96.6% on the LongMemEval benchmark—the highest local-only score ever recorded—using a memory palace architecture that stores all conversations verbatim.

95% relevant

Anthropic Launches Managed Agents for Long-Running AI Workflows

Anthropic has launched Managed Agents, a hosted service for creating and running long-running AI agents. This addresses core system design challenges for persistent AI workflows that operate beyond single API calls.

95% relevant