gentic.news — AI News Intelligence Platform

model limitations

30 articles about model limitations in AI news

Andrej Karpathy: AI Agent Failures Are 'Skill Issues,' Not Model Capability Problems

Andrej Karpathy argues most AI agent failures stem from poor user instructions and tooling, not model limitations. He advocates delegating 20-minute 'macro actions' to parallel agents and reviewing their work.

85% relevant

LLM Multi-Agent Framework 'Shared Workspace' Proposed to Improve Complex Reasoning via Task Decomposition

A new research paper proposes a multi-agent framework where LLMs split complex reasoning tasks across specialized agents that collaborate via a shared workspace. This approach aims to overcome single-model limitations in planning and tool use.

85% relevant
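The paper's actual framework isn't reproduced in this summary; as a rough sketch of the shared-workspace pattern it describes (all class and function names here are hypothetical), specialized agents can publish intermediate results to a common store that later agents read:

```python
# Minimal sketch of a shared-workspace multi-agent loop (hypothetical names;
# not the paper's actual API).

class SharedWorkspace:
    """A common store where agents publish and read intermediate results."""
    def __init__(self):
        self.entries = {}  # key -> posted result

    def post(self, key, value):
        self.entries[key] = value

    def read(self, key, default=None):
        return self.entries.get(key, default)


def planner(ws):
    # Decompose the complex task and publish subtasks for other agents.
    ws.post("subtasks", ["extract facts", "draft answer"])

def solver(ws):
    # Consume subtasks and publish partial solutions.
    ws.post("partials", [f"done: {t}" for t in ws.read("subtasks", [])])

def aggregator(ws):
    # Combine partial solutions into a final answer.
    ws.post("final", " | ".join(ws.read("partials", [])))


ws = SharedWorkspace()
for agent in (planner, solver, aggregator):
    agent(ws)
print(ws.read("final"))  # done: extract facts | done: draft answer
```

In the paper's setting each agent would be an LLM call; here plain functions stand in to show only the decomposition-and-collaboration flow.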

MIT and Anthropic Release New Benchmark Revealing AI Coding Limitations

Researchers from MIT and Anthropic have developed a new benchmark that systematically identifies significant limitations in current AI coding assistants. The benchmark reveals specific categories of coding tasks where large language models consistently fail, providing concrete data on their weaknesses.

78% relevant

Agentic BI Limitations in Enterprise

An analysis critiques the push for fully autonomous AI agents in business intelligence, highlighting their limitations in enterprise contexts. It proposes a practical hybrid architecture where AI augments, rather than replaces, human analysts and existing BI tools.

80% relevant

The Energy-Constrained AI Revolution: How Power Grid Limitations Are Shaping Artificial Intelligence's Future

Morgan Stanley predicts massive AI breakthroughs driven by computing power spikes, but warns of an impending energy crisis. Developers are repurposing Bitcoin mining infrastructure to bypass grid limitations as AI approaches autonomous self-improvement.

95% relevant

New Research Reveals Fundamental Limitations of Vector Embeddings for Retrieval

A new theoretical paper demonstrates that embedding-based retrieval systems have inherent limitations in representing complex relevance relationships, even with simple queries. This challenges the assumption that better training data alone can solve all retrieval problems.

97% relevant
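The paper's argument is formal and geometric; a toy demo (an illustration of the same flavor of limit, not the paper's construction) shows that with four documents embedded at the compass points of a 2-D space, no query vector can ever make the two opposite documents the top-2 retrieval set under dot-product scoring:

```python
import math

# Toy expressiveness limit for embedding retrieval: sweep query directions
# and record which top-2 result sets are achievable by dot-product scoring.
docs = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

achievable = set()
for k in range(720):
    theta = (k + 0.5) * math.pi / 360          # offset avoids exact ties
    q = (math.cos(theta), math.sin(theta))
    scores = [q[0] * d[0] + q[1] * d[1] for d in docs]
    top2 = frozenset(sorted(range(4), key=lambda i: -scores[i])[:2])
    achievable.add(top2)

# Only the four adjacent pairs are reachable; the two "opposite" pairs
# ({0,2} and {1,3}) are impossible for any query vector.
print(sorted(tuple(sorted(s)) for s in achievable))
assert frozenset({0, 2}) not in achievable
assert frozenset({1, 3}) not in achievable
```

Some relevance patterns simply cannot be expressed as nearest-neighbor sets in the embedding space, which is the kind of inherent limitation the paper formalizes; better training data cannot create top-k sets the geometry forbids.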

The Human Bottleneck: Why AI Can't Outgrow Our Limitations

New research reveals that persistent errors in AI systems stem not from insufficient scale, but from fundamental limitations in human supervision itself. The study presents a unified theory showing human feedback creates an inescapable 'error floor' that scaling alone cannot overcome.

75% relevant

The Reasoning Transparency Gap: AI Models Can't Control Their Own Thought Processes

New research reveals AI models can control their final answers 62% of the time but only control their reasoning chains 3% of the time, exposing fundamental limitations in how these systems monitor their own thought processes.

85% relevant

When AI Gets Stumped: Study Reveals Language Models' 'Brain Activity' Collapses Under Pressure

New research shows that when large language models encounter difficult questions, their internal representations dramatically shrink and simplify. This 'activity collapse' reveals fundamental limitations in how current AI processes complex reasoning tasks.

85% relevant

StaTS AI Model Revolutionizes Time Series Forecasting with Adaptive Noise Schedules

Researchers introduce StaTS, a diffusion model that learns adaptive noise schedules and uses frequency guidance for superior time series forecasting. The approach addresses key limitations in existing methods while maintaining efficiency.

75% relevant

SPARROW: A New Method for Precise Object Tracking in Video AI Models

Researchers introduce SPARROW, a technique that improves how AI models track and identify objects in videos with greater spatial precision and temporal consistency. This addresses critical limitations in current video understanding systems.

84% relevant

LeCun's Critique: Why Large Language Models Fall Short of True Intelligence

Meta's Chief AI Scientist Yann LeCun argues that LLMs lack real-world understanding despite massive training data. He highlights fundamental architectural limitations that prevent true reasoning and proposes alternative approaches to artificial intelligence.

85% relevant

Brain-OF: The First Unified AI Model That Reads Multiple Brain Signals Simultaneously

Researchers have developed Brain-OF, the first omnifunctional foundation model that jointly processes fMRI, EEG, and MEG brain signals. This unified approach overcomes previous single-modality limitations by integrating complementary spatiotemporal data through innovative architecture and pretraining techniques.

80% relevant

AI Teaches Itself to See: Adversarial Self-Play Forges Unbreakable Vision Models

Researchers propose AOT, a revolutionary self-play framework where AI models generate their own adversarial training data through competitive image manipulation. This approach overcomes the limitations of finite datasets to create multimodal models with unprecedented perceptual robustness.

75% relevant

CDNet: A New Dual-View Architecture for More Accurate Click-Through Rate Prediction

Researchers propose CDNet, a novel CTR prediction model that bridges sequential user behavior and contextual item features using fine-grained core-behavior and coarse-grained global interest views. This addresses key limitations in traditional models, balancing detail with computational efficiency.

95% relevant

StyleGallery: A Training-Free, Semantic-Aware Framework for Personalized Image Style Transfer

Researchers propose StyleGallery, a novel diffusion-based framework for image style transfer that addresses key limitations: semantic gaps, reliance on extra constraints, and rigid feature alignment. It enables personalized customization from arbitrary reference images without requiring model training.

95% relevant

New Research Proposes 'Level-2 Inverse Games' to Infer Agents' Conflicting Beliefs About Each Other

MIT researchers propose a 'level-2' inverse game theory framework to infer what each agent believes about other agents' objectives, addressing limitations of current methods that assume perfect knowledge. This has implications for modeling complex multi-agent interactions.

75% relevant

GPT-5 Shows Promise as Clinical Assistant but Can't Replace Specialized Medical AI

New research evaluates GPT-5's clinical reasoning capabilities, finding significant improvements over GPT-4o in medical text analysis but limitations in specialized imaging tasks. The study reveals generalist AI models are advancing toward integrated clinical reasoning but still trail domain-specific systems in critical diagnostic areas.

75% relevant

Multimodal Knowledge Graphs Unlock Next-Generation AI Training Data

Researchers have developed MMKG-RDS, a novel framework that synthesizes high-quality reasoning training data by mining multimodal knowledge graphs. The system addresses critical limitations in existing data synthesis methods and improves model reasoning accuracy by 9.2% with minimal training samples.

80% relevant

Wikipedia Navigation Challenge Exposes Critical Gaps in AI Planning Abilities

Researchers introduce LLM-WikiRace, a benchmark testing how well AI models navigate Wikipedia links between concepts. While top models like Gemini-3 show superhuman performance on easy tasks, success rates plummet to just 23% on hard challenges, revealing fundamental limitations in long-term planning.

70% relevant

The Text-Crutch Conundrum: How VLMs' Spatial Reasoning Depends on Reading, Not Seeing

New research reveals vision-language models struggle with basic spatial tasks when visual elements lack text labels. Three leading models performed dramatically worse identifying filled squares versus text symbols in identical grid patterns, exposing fundamental limitations in their visual processing capabilities.

70% relevant

New Benchmark Exposes Critical Gaps in AI's Ability to Navigate the Visual Web

Researchers unveil BrowseComp-V³, a challenging new benchmark testing multimodal AI's ability to perform deep web searches combining text and images. Even top models score only 36%, revealing fundamental limitations in visual-text integration and complex reasoning.

75% relevant

ESGLens: A New RAG Framework for Automated ESG Report Analysis and Scoring


ESGLens combines RAG with prompt engineering to extract structured ESG data, answer questions, and predict scores. Evaluated on ~300 reports, it achieved a Pearson correlation of 0.48 against LSEG scores. The paper highlights promise but also significant limitations.

82% relevant
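For context, the 0.48 figure is a standard Pearson correlation between predicted and reference scores. A self-contained sketch of that computation (toy numbers, not the paper's data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: predicted ESG scores vs. reference scores (made-up data).
predicted = [1.0, 2.0, 3.0, 4.0]
reference = [1.0, 3.0, 2.0, 4.0]
print(pearson(predicted, reference))  # ≈ 0.8
```

A value of 1.0 would mean perfectly linear agreement with the LSEG scores; 0.48 indicates a moderate but far-from-reliable relationship, consistent with the paper's note of significant limitations.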

IPCCF: A New Graph-Based Approach to Disentangle User Intent for Better Recommendations

A new research paper introduces Intent Propagation Contrastive Collaborative Filtering (IPCCF), a method designed to improve recommendation systems by more accurately disentangling the underlying intents behind user-item interactions. It addresses limitations in existing methods by incorporating broader graph structure and using contrastive learning for direct supervision, showing superior performance in experiments.

84% relevant

Dual-Enhancement Product Bundling

Researchers propose a dual-enhancement method for product bundling that integrates interactive graph learning with LLM-based semantic understanding. Their graph-to-text paradigm with a Dynamic Concept Binding Mechanism addresses cold-start problems and graph comprehension limitations, showing significant performance gains on benchmarks.

71% relevant

Walmart Research Proposes Unified Training for Sponsored Search Retrieval

A new arXiv preprint details Walmart's novel bi-encoder training framework for sponsored search retrieval. It addresses the limitations of using user engagement as a sole training signal by combining graded relevance labels, retrieval priors, and engagement data. The method outperformed the production system in offline and online tests.

91% relevant
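The preprint's exact objective isn't given in this summary; one common way to combine graded relevance labels with engagement data in bi-encoder training is a weighted sum of per-signal losses. A sketch under that assumption (weights and names are illustrative, not Walmart's):

```python
import math

def mse(pred, target):
    # Squared error against a human relevance grade in [0, 1].
    return (pred - target) ** 2

def bce(pred_prob, clicked):
    # Binary cross-entropy against an observed engagement (click) label.
    p = min(max(pred_prob, 1e-7), 1 - 1e-7)
    return -(clicked * math.log(p) + (1 - clicked) * math.log(1 - p))

def combined_loss(score, grade, click_prob, clicked, w_rel=0.7, w_eng=0.3):
    """Blend a graded-relevance loss with an engagement loss.

    Using engagement alone biases retrieval toward popular items; mixing in
    graded labels supervises relevance directly. Weights are hypothetical.
    """
    return w_rel * mse(score, grade) + w_eng * bce(click_prob, clicked)

loss = combined_loss(score=0.9, grade=1.0, click_prob=0.8, clicked=1)
print(round(loss, 4))  # 0.0739
```

The point of the blend is exactly what the summary states: engagement as a sole signal is a limitation, so additional supervision terms (graded labels, retrieval priors) are folded into one training objective.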

Ethan Mollick: Gemma 4 Impressive On-Device, But Agentic Workflows Doubted

Wharton professor Ethan Mollick finds Google's Gemma 4 powerful for on-device use but is skeptical about its ability to execute true agentic workflows, citing limitations in judgment and self-correction.

75% relevant

Context Cartography: Formal Framework Proposes 7 Operators to Govern LLM Context, Moving Beyond 'More Tokens'

Researchers propose 'Context Cartography,' a formal framework for managing LLM context as a structured space, defining 7 operators to move information between zones like 'black fog' and 'visible field.' It argues that simply expanding context windows is insufficient due to transformer attention limitations.

80% relevant
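The summary names two zones but not the seven operators. As a loose illustration of the zone idea (zone names come from the article; the operators below are hypothetical stand-ins, not the paper's definitions), context items can be moved between a hidden zone and the slice that actually reaches the prompt:

```python
# Loose sketch of zone-based context management. "black fog" and
# "visible field" are the article's zone names; reveal/occlude are
# invented operators for illustration only.

class ContextMap:
    def __init__(self):
        self.zones = {"black fog": [], "visible field": []}

    def add(self, item, zone="black fog"):
        self.zones[zone].append(item)

    def reveal(self, item):
        # Move an item from the hidden zone into the model's view.
        self.zones["black fog"].remove(item)
        self.zones["visible field"].append(item)

    def occlude(self, item):
        # Push an item back out of the visible context.
        self.zones["visible field"].remove(item)
        self.zones["black fog"].append(item)

    def prompt(self):
        # Only the visible field is serialized into the LLM prompt.
        return "\n".join(self.zones["visible field"])


ctx = ContextMap()
ctx.add("doc A summary")
ctx.add("doc B summary")
ctx.reveal("doc A summary")
print(ctx.prompt())  # doc A summary
```

The contrast with "more tokens" is that governance happens by moving information between zones rather than by enlarging the visible field, which attention limitations make unprofitable past a point.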

AIGQ: Taobao's End-to-End Generative Architecture for E-commerce Query Recommendation

Alibaba researchers propose AIGQ, a hybrid generative framework for pre-search query recommendations. It uses list-level fine-tuning, a novel policy optimization algorithm, and a hybrid deployment architecture to overcome traditional limitations, showing substantial online improvements on Taobao.

100% relevant

The Compute Crunch: How Processing Power Shortages Are Shaping AI's Workplace Revolution

New analysis reveals that AI's job impact is being constrained by compute limitations, particularly for agentic AI applications. This scarcity makes AI expensive, forcing companies to prioritize high-value tasks while leaving many roles to humans who remain more cost-effective.

85% relevant