
Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data

Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.

Columbia CS Professor Argues LLMs Are Fundamentally Limited for Scientific Discovery

A concise argument from Columbia University Computer Science Professor Vishal Misra is gaining traction on social media, positing a fundamental limitation of large language models (LLMs) in the realm of scientific innovation. In a post summarized by AI commentator Rohan Paul, Misra states that LLMs are incapable of generating genuinely new scientific ideas because their core operation is confined to the statistical manifold of their training data.

Key Takeaways

  • Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries.
  • True discovery requires creating new conceptual maps, a capability current architectures lack.

The Core Argument: Interpolation vs. New Map-Making

Professor Misra's thesis, as relayed, is that LLMs learn a structured, Bayesian manifold of all known data they are trained on. They perform exceptionally well at tasks that require operating within this pre-existing map—interpolating between known points, recombining concepts, and generating outputs that are plausible given the training distribution. This explains their proficiency in coding, summarizing known literature, or answering questions based on established facts.

However, true scientific discovery, Misra argues, requires stepping outside this known manifold. It involves creating entirely new conceptual maps, positing theories or relationships not implied by the existing data corpus. This act of abductive reasoning or paradigm-shifting conjecture is something current LLM architectures, which are fundamentally next-token predictors operating on historical data, cannot perform. They are tools for exploring and exploiting the known world, not for charting unknown territories.

Context and Counterpoints

This argument taps into a long-running debate in AI research about the difference between intelligence and mere pattern matching. It echoes earlier critiques from figures like Gary Marcus, who have long argued that pure statistical learning on text lacks the causal, model-based reasoning required for robust understanding and innovation.

The timing is notable, as the AI research community is intensely focused on using LLMs for scientific acceleration. Projects like Google's AlphaFold 3 and various AI-for-science initiatives demonstrate powerful applications of AI within structured scientific domains. However, these often combine LLMs with specialized symbolic reasoning, simulation, or reinforcement learning in closed environments—architectures that may circumvent the pure "manifold interpolation" limitation Misra describes.

What This Means in Practice

For AI engineers and researchers, Misra's perspective is a crucial reminder to temper expectations for autonomous AI scientists. It suggests that the most impactful near-term applications of LLMs in science will be as augmentation tools—hypothesis assistants, literature synthesizers, and experimental planners operating under human guidance and within well-defined frameworks—rather than as independent originators of novel theory.
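This "augmentation under human guidance" pattern can be sketched in a few lines. The example below is a minimal illustration, not a real system: `call_llm` is a hypothetical stub standing in for any chat-completion API, and the variable names and allowed set are invented for the demo. The key idea is that the model proposes freely, but a human-defined framework decides what passes through.

```python
# Hedged sketch of the "augmentation, not autonomy" pattern: the LLM
# proposes hypotheses, but an expert-defined constraint gates them.
from dataclasses import dataclass

# Hypothetical, expert-defined hypothesis space for this toy example.
ALLOWED_VARIABLES = {"temperature", "ph", "enzyme_activity"}

@dataclass
class Hypothesis:
    cause: str
    effect: str

def call_llm(prompt: str) -> list[Hypothesis]:
    # Stubbed model output; a real system would call and parse an LLM here.
    return [Hypothesis("temperature", "enzyme_activity"),
            Hypothesis("moon_phase", "enzyme_activity")]

def constrained_hypotheses(prompt: str) -> list[Hypothesis]:
    """Keep only proposals inside the human-defined variable space."""
    return [h for h in call_llm(prompt)
            if h.cause in ALLOWED_VARIABLES and h.effect in ALLOWED_VARIABLES]

accepted = constrained_hypotheses("What drives enzyme activity?")
print([h.cause for h in accepted])  # the out-of-scope proposal is filtered out
```

The design choice is the one the article recommends: the LLM's interpolation power generates candidates, while the boundary of acceptable output is set outside the model, by human experts.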

Agentic.news Analysis

Professor Misra's critique aligns with a growing body of expert skepticism about the ultimate ceiling of autoregressive, token-prediction models. This perspective directly contrasts with the optimistic narratives from some AI labs, like Anthropic's recent claims about Claude 3.5 Sonnet demonstrating "advanced reasoning," or OpenAI's o1 model family, which aims to enhance reasoning through search and process supervision. The core question Misra raises is whether these advances represent genuine steps toward new map-making or are merely more sophisticated interpolation within a vastly expanded data manifold.

This debate is central to understanding the trajectory of AI capabilities. If Misra is correct, achieving true scientific discovery would require a fundamental architectural shift beyond the transformer-based LLM paradigm, perhaps toward hybrid neuro-symbolic systems or AI that actively engages in physical experimentation and causal world-modeling. This connects to our previous coverage on Meta's Chameleon model, which sought to blend different modalities natively, and the ongoing research into "world models" in robotics and reinforcement learning. The limitation isn't necessarily permanent for AI as a field, but it may be a defining constraint for the current LLM era.

Frequently Asked Questions

Can LLMs help with scientific discovery at all?

Yes, but primarily as powerful tools for augmentation, not as autonomous discoverers. They can review vast literatures to surface overlooked connections, generate experimental code, propose potential hypotheses within a constrained space defined by human experts, and manage complex data. They act as force multipliers for human scientists rather than replacements.

What's the difference between a "Bayesian manifold" and creating a "new map"?

A Bayesian manifold, in this context, is a probabilistic representation of all the relationships and concepts learned from training data. The LLM navigates this map. Creating a "new map" refers to conceptualizing a framework or theory that is not a probabilistic combination of existing map points—like proposing quantum mechanics when only classical physics data exists. It requires a leap not justified by the existing statistical structure.
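The interpolation-versus-extrapolation boundary has a simple numerical analogue. The toy sketch below is not Misra's formalism; the function, ranges, and polynomial degree are invented for illustration. A flexible model fit on data from a limited region is accurate inside that region and fails badly outside it:

```python
import numpy as np

# Toy analogue of the "known manifold": training data only covers x in [-1, 1].
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = np.sin(np.pi * x_train)            # the true underlying law

# A flexible interpolator fit to the observed region.
model = np.poly1d(np.polyfit(x_train, y_train, deg=9))

def err(x: float) -> float:
    """Absolute error of the model against the true law at point x."""
    return abs(model(x) - np.sin(np.pi * x))

print(err(0.5))   # inside the training range: tiny error
print(err(3.0))   # outside the range: error blows up
```

The analogy is loose, but it captures the claim: performance inside the training distribution says little about behavior beyond it, and nothing in the fitting procedure pushes the model toward the true law outside the observed region.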

Are there any AI systems that can create "new maps"?

There are no consensus examples of AI achieving paradigm-shifting scientific discovery akin to a human genius. However, systems like DeepMind's AlphaGo (which discovered novel Go strategies) and AlphaFold (which made major advances in protein structure prediction) did create new knowledge within tightly bounded, well-specified domains. These systems often combine learning with search, simulation, and explicit objective functions, suggesting pathways that differ from pure LLM pretraining.

Does this mean LLMs will never achieve true reasoning?

Not necessarily, but it suggests that scaling current architectures with more data and compute alone may not be sufficient. Breakthroughs may require integrating LLMs with other paradigms—symbolic reasoning engines, reinforcement learning in simulated environments, or mechanisms for iterative, real-world experimentation and theory revision that go beyond text prediction.

AI Analysis

Professor Misra's argument is a sharp, necessary corrective to the often-overhyped discourse around LLMs as general reasoning engines. It correctly identifies the core operation of an LLM as sophisticated interpolation within a high-dimensional statistical manifold of training data. This is why they excel at tasks with abundant examples (code generation, text summarization) but struggle with tasks requiring true abstraction beyond the training distribution, like novel puzzle-solving or, as argued, groundbreaking science.

This perspective is supported by empirical results. Benchmarks like **SWE-Bench** (software engineering) or **GPQA** (expert-level QA) show LLMs performing well on problems that have parallels in their training corpus, but their performance often plateaus or becomes unreliable on genuinely novel, out-of-distribution challenges. The development of **chain-of-thought** and **reasoning** prompting is essentially an attempt to guide the model to better traverse its existing manifold, not to step outside of it.

For practitioners, the implication is clear: design systems with an awareness of this boundary. Use LLMs as components within larger, structured pipelines where their interpolation power is invaluable—for example, parsing research papers into a knowledge graph—but where the "leap" of hypothesis generation or experimental design is governed by other algorithms or human oversight. The next architectural frontier, hinted at by projects like **OpenAI's o1** or research into **LLM-based simulators**, may be systems that can actively query and test their predictions against external models of the world, slowly expanding their own effective manifold through interaction, not just passive training.
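The "parsing research papers into a knowledge graph" pattern mentioned above can be sketched as follows. This is a toy stand-in, not a real pipeline: `extract_triples` uses a trivial regex in place of an LLM-backed extraction call, and the gene names and sentences are invented for the demo. The point is the division of labor: the model handles extraction (interpolation over text it has seen plenty of), while downstream hypothesis generation lives outside the LLM.

```python
# Sketch of the "LLM as interpolation component" pipeline: extract
# (subject, relation, object) triples from paper text into a graph.
import re

def extract_triples(sentence: str) -> list[tuple[str, str, str]]:
    # Toy rule-based stand-in for an LLM extraction call.
    m = re.match(r"(\w+) (inhibits|activates) (\w+)", sentence)
    return [(m.group(1), m.group(2), m.group(3))] if m else []

papers = ["TP53 inhibits MDM2 under stress.", "KRAS activates MAPK signaling."]

# Accumulate extracted relations into a simple adjacency structure.
graph: dict[str, list[tuple[str, str]]] = {}
for sentence in papers:
    for subj, rel, obj in extract_triples(sentence):
        graph.setdefault(subj, []).append((rel, obj))

print(graph)  # {'TP53': [('inhibits', 'MDM2')], 'KRAS': [('activates', 'MAPK')]}
```

In a real system, the graph (not the raw LLM output) becomes the substrate that search algorithms or human scientists query when proposing experiments, keeping the "leap" step governed outside the model.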

