In a discussion at a recent event, OpenAI CEO Sam Altman made a significant prediction about the future trajectory of artificial intelligence research. He posited that the field is on the cusp of discovering a new foundational architecture that could deliver a leap in capability comparable to the shift from Long Short-Term Memory (LSTM) networks to the transformer architecture.
What Altman Said
Speaking at an event, Altman stated: “I bet there is another new architecture to find that is gonna be like as big of a gain as transformers were over LSTMs. And I think you finally have models that are smart enough to help do that kind of research.”
This brief comment, shared via social media, encapsulates two key claims:
- A Pending Paradigm Shift: Altman is betting that a yet-to-be-discovered AI model architecture exists and will offer performance gains as substantial as those realized when transformers superseded LSTMs as the dominant architecture for sequence modeling.
- AI-Assisted AI Research: He asserts that the current generation of large language models (LLMs) and other AI systems has reached a level of capability at which these models can be active collaborators in the fundamental research required to uncover this new architecture.
The Historical Precedent: From LSTMs to Transformers
To understand the scale of improvement Altman is referencing, one must recall the transition in the late 2010s. LSTM networks, a type of recurrent neural network (RNN), were the state of the art for tasks like machine translation and text generation. However, they struggled to capture long-range dependencies, and their sequential, step-by-step processing made training slow and difficult to parallelize.
The introduction of the transformer architecture in the 2017 paper "Attention Is All You Need" was revolutionary. By replacing recurrence with a self-attention mechanism, transformers enabled massive parallelization during training, could handle much longer context windows effectively, and dramatically improved performance on benchmark tasks. This architectural innovation directly enabled the subsequent explosion in scale that led to models like GPT-3, BERT, and today's frontier LLMs. The gain was not incremental; it was foundational, reshaping the entire field.
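To make that architectural contrast concrete, the minimal NumPy sketch below (illustrative only, with toy weights and dimensions, not drawn from any specific model) compares an RNN-style recurrent update, which must walk through the sequence one position at a time, with scaled dot-product self-attention, which handles every position in a few matrix multiplies.

```python
import numpy as np

def rnn_style_pass(x, W_h, W_x):
    """Recurrent processing: each step depends on the previous hidden state,
    so the T positions must be computed one after another."""
    T, d = x.shape
    h = np.zeros(d)
    outputs = []
    for t in range(T):                                  # inherently sequential
        h = np.tanh(W_h @ h + W_x @ x[t])
        outputs.append(h)
    return np.stack(outputs)

def self_attention_pass(x, W_q, W_k, W_v):
    """Scaled dot-product self-attention: every position attends to every
    other position via one (T, T) score matrix, so the whole sequence
    can be processed in parallel."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])             # pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over keys
    return weights @ V

rng = np.random.default_rng(0)
T, d = 6, 8                                             # toy sequence length and width
x = rng.normal(size=(T, d))
print(rnn_style_pass(x, rng.normal(size=(d, d)), rng.normal(size=(d, d))).shape)
print(self_attention_pass(x, rng.normal(size=(d, d)), rng.normal(size=(d, d)),
                          rng.normal(size=(d, d))).shape)
```

The loop in the first function is what kept LSTMs from fully exploiting parallel hardware; the single weighted sum in the second is what lets transformer training parallelize across the sequence dimension.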
The Implications of AI-Assisted Discovery
The second part of Altman's statement may be as consequential as the first. He suggests that the tools for this discovery are now at hand—in the form of the very AI systems we've built. This points to a rapidly emerging field sometimes called "AI for Science" or "AI-designed AI." Instead of human researchers solely hypothesizing and testing new architectures through trial and error, advanced models could be used to:
- Simulate and evaluate potential architectural designs.
- Search vast spaces of possible model configurations.
- Generate novel code or schematic designs for neural networks.
- Parse and synthesize insights from the entire corpus of machine learning literature to propose new directions.
This creates a potential feedback loop: today's transformer-based models help design their own successors, which could then accelerate progress even further.
gentic.news Analysis
Altman's prediction is a strategic signal, not just a technical opinion. It aligns with OpenAI's established trajectory of betting big on scaling existing paradigms while simultaneously exploring the next one. This follows OpenAI's previous emphasis on superalignment research—the study of how to control and align AI systems much smarter than humans. If a "post-transformer" architecture emerges with capabilities far beyond today's models, the alignment problem becomes even more acute, making foundational research into control mechanisms a parallel necessity.
This statement also contextualizes the intense competition and investment in frontier model R&D. When the CEO of the company behind ChatGPT and GPT-4 publicly bets on a new architectural leap, it underscores that the current transformer scaling curve, while still productive, is viewed by insiders as having a horizon. It contradicts any narrative that the architecture problem is "solved." This aligns with trends we've covered, such as increased research into efficient architectures (like Mamba's state-space models) and hybrid neuro-symbolic systems, which seek to move beyond pure next-token prediction.
Furthermore, Altman's comment about models being "smart enough to help" directly connects to our coverage of projects like OpenAI's "Superalignment Fast Grants" and Google DeepMind's work on using AI for mathematical discovery. The entity relationship is clear: the leading labs building the most capable models (OpenAI, Anthropic, Google DeepMind) are the ones most incentivized and equipped to use those models as research co-pilots to break through to the next frontier. The race is no longer just about scale and data; it's increasingly a meta-race to discover the new engine that will replace the transformer.
Frequently Asked Questions
What were the main advantages of transformers over LSTMs?
Transformers introduced a self-attention mechanism that allowed the model to weigh the importance of all words in a sequence simultaneously, regardless of their distance from each other. This solved the long-range dependency problem of LSTMs and, crucially, enabled massive parallelization during training. This made it feasible to train on vastly larger datasets, leading directly to the era of large language models.
Is anyone currently researching potential successor architectures to transformers?
Yes, this is an active area of academic and industrial research. Examples include:
- State-space models (e.g., Mamba): Designed for efficient long-sequence processing.
- Retentive Networks (RetNet): Proposed as a foundation for large language models with training parallelism, low-cost inference, and strong performance.
- Monarch Mixer (M2): An architecture built on structured Monarch matrices, aiming for sub-quadratic efficiency in both sequence length and model dimension.
- Hybrid models: Combining neural networks with symbolic reasoning or search algorithms.
What does "AI-assisted AI research" mean in practice?
In practice, this means using current AI models as tools in the research pipeline. This could involve using a large language model to read and summarize thousands of machine learning papers to identify underexplored ideas, using AI coding assistants to rapidly prototype new model architectures, or employing AI optimization algorithms to search for high-performing configurations within a defined design space. It automates and augments the iterative, experimental process of research.
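As a purely hypothetical sketch of the "search a defined design space" step, the snippet below runs a simple random search over toy architecture configurations. The design space, the propose_config sampler, and the proxy_score heuristic are stand-ins invented for illustration, not any real lab's tooling; in an actual pipeline, the proposal step might be an LLM generating candidate designs and the scoring step a small-scale training run.

```python
import random

# Toy design space for a candidate architecture; purely illustrative.
DESIGN_SPACE = {
    "num_layers": [4, 8, 16, 32],
    "hidden_dim": [256, 512, 1024],
    "mixing_op": ["attention", "state_space", "conv"],
}

def propose_config(rng):
    """Stand-in for a model-driven proposal step (e.g., an LLM suggesting
    a configuration); here it simply samples the space at random."""
    return {name: rng.choice(options) for name, options in DESIGN_SPACE.items()}

def proxy_score(cfg):
    """Stand-in for training a small version of the candidate and measuring
    validation quality; here it is a made-up deterministic heuristic."""
    size_penalty = cfg["num_layers"] * cfg["hidden_dim"] / 32768
    mixing_bonus = {"attention": 1.0, "state_space": 1.1, "conv": 0.9}[cfg["mixing_op"]]
    return mixing_bonus - size_penalty

def search(budget=20, seed=0):
    """Evaluate `budget` proposed configurations and keep the best one."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = propose_config(rng)
        score = proxy_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

if __name__ == "__main__":
    print(search())
```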
Does this mean transformer-based models like GPT-4 are becoming obsolete?
Not immediately. Altman's prediction is about a future discovery. Transformer-based models continue to show improved performance with scaling and architectural refinements (like Mixture of Experts). The transition from LSTMs to transformers took years, and a new architecture would need to prove itself not only in raw performance but also in training stability, efficiency, and scalability before it could displace the deeply entrenched transformer ecosystem of models, tools, and developer knowledge.