transformers
30 articles about transformers in AI news
8 AI Model Architectures Visually Explained: From Transformers to CNNs and VAEs
A visual guide maps eight foundational AI model architectures, including Transformers, CNNs, and VAEs, providing a clear reference for understanding specialized models beyond LLMs.
Graph Tokenization: A New Method to Apply Transformers to Graph Data
Researchers propose a framework that converts graph-structured data into sequences using reversible serialization and BPE tokenization. This enables standard Transformers like BERT to achieve state-of-the-art results on graph benchmarks, outperforming specialized graph models.
WiT: Waypoint Diffusion Transformers Achieve FID 2.09 on ImageNet 256×256 in 265 Epochs, Matching JiT-L/16 Efficiency
Researchers introduced WiT, a diffusion transformer that uses semantic waypoints from pretrained vision models to resolve trajectory conflicts in pixel-space flow matching. It matches the performance of JiT-L/16 at 600 epochs in just 265 epochs, achieving an FID of 2.09 on ImageNet 256×256.
Sam Altman Teases 'Massive Upgrade' AI Architecture, Compares Impact to Transformers vs. LSTM
OpenAI CEO Sam Altman said a new AI architecture is coming that represents a 'massive upgrade' comparable to the Transformer's leap over LSTM. He also stated current frontier models are now powerful enough to help research these next breakthroughs.
Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals
Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.
Google's Memory Caching Bridges RNN-Transformer Gap with O(NL) Complexity
Google's 'Memory Caching' method saves RNN memory states at segment boundaries, allowing tokens to reference past checkpoints. This O(NL) approach significantly improves RNN performance on recall tasks, narrowing the gap with Transformers.
SteerViT Enables Natural Language Control of Vision Transformer Attention Maps
Researchers introduced SteerViT, a method that modifies Vision Transformers to accept natural language instructions, enabling users to steer the model's visual attention toward specific objects or concepts while maintaining representation quality.
Sam Altman Predicts Next 'Transformer-Level' Architecture Breakthrough, Says AI Models Are Now Smart Enough to Help Find It
OpenAI CEO Sam Altman stated he believes a new AI architecture, offering gains as significant as transformers over LSTMs, is yet to be discovered. He argues current advanced models are now sufficiently capable of assisting in that foundational research.
ViTRM: Vision Tiny Recursion Model Achieves Competitive CIFAR Performance with 84x Fewer Parameters Than ViT
Researchers propose ViTRM, a parameter-efficient vision model that replaces a multi-layer ViT encoder with a single 3-layer block applied recursively. It uses up to 84x fewer parameters than Vision Transformers while maintaining competitive accuracy on CIFAR-10 and CIFAR-100.
Vision AI Breakthrough: Automated Multi-Label Annotation Unlocks ImageNet's True Potential
Researchers have developed an automated pipeline to convert ImageNet's single-label training set into a multi-label dataset without human annotation. Using self-supervised Vision Transformers, the method improves model accuracy and transfer learning capabilities, addressing long-standing limitations in computer vision benchmarks.
Kimi Team's 'Attention Residuals' Replace Fixed Summation with Softmax Attention, Boosts GPQA-Diamond by +7.5%
Researchers propose Attention Residuals, a content-dependent alternative to standard residual connections in Transformers. The method improves scaling laws, matches a baseline trained with 1.25x more compute, and adds under 2% inference overhead.
PartRAG Revolutionizes 3D Generation with Retrieval-Augmented Part-Level Control
Researchers introduce PartRAG, a breakthrough framework that combines retrieval-augmented generation with diffusion transformers for precise part-level 3D creation and editing from single images. The system achieves superior geometric accuracy while enabling localized modifications without regenerating entire objects.
MLLM Raters Show Central Tendency Bias in Clinical Scoring
Study finds GPT-5 and other MLLMs show central tendency bias in clinical scoring, compressing predictions toward scale midpoint despite prompt modifications.
Collider-Bench Tests LLM Agents on LHC Analysis Reproduction
Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.
Tariffs Threaten $200B AI Data Center Buildout, CSIS Warns
CSIS warns tariffs could raise AI data center costs 20-30%, threatening $200B US hyperscaler buildout through 2028.
AI Data Centers Face 4-Year Post-Approval Delays, PJM Data Shows
PJM data shows AI data centers face 4-year post-approval delays, longer than the queue, threatening $700B CapEx plans.
PJM Data: AI Datacenter Delays Shift to Post-Approval Phase
PJM data shows AI datacenter projects now average 7+ years to go live, with post-approval delays of 4 years outpacing queue times, driven by transmission and supply chain bottlenecks.
Perplexity Claims 3x Blackwell Inference Throughput for 70B Models
Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.
Nvidia Blackwell CLC Boosts GEMM Tile Scheduling by 15% Over Static Persistence
Nvidia Blackwell CLC delivers up to 15% higher GEMM throughput via dynamic persistent tile scheduling, fixing load imbalance without startup overhead.
Claude Code's HTML Output Beats Markdown for LLM-Readable Docs
Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.
Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell
Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.
ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples
ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.
RoundPipe: Full Fine-Tune 32B Models on a Single 24GB GPU
RoundPipe fine-tunes 32B models on a single 24GB GPU with 1.5-2.2× speedups via round-robin pipeline dispatch.
AI Data Centers Face 220GW Grid Jam, Power Infrastructure Becomes Bottleneck
PJM's 220GW interconnection queue shows AI data center growth is now constrained by power grid capacity, not compute. Hyperscalers face 3-7 year delays.
40-Author Survey Unveils 'Levels × Laws' Framework for Agent World Models
A 40-author survey introduces a 'levels × laws' framework for world models in AI agents, spanning 3 capability levels and 4 law regimes, synthesizing 400+ works. It provides a shared vocabulary for designing and evaluating world models across traditionally siloed research communities.
AI Writes New Virus DNA: Stanford and Arc Institute's DNA Language Model
A tweet reports that researchers fed a language model a DNA sequence and asked it to generate a new virus, which it did. This highlights both the power and risk of generative AI in synthetic biology.
Building a Real-World Fraud Detection System: Beyond Just Training a Model
The article provides a practical breakdown of how to build a production-ready fraud detection system, emphasizing the integration of payment models, sequence models, and shadow mode deployment. It moves beyond pure model training to focus on the operational ML system.
Sam Altman: AI inference costs dropped 1000x from o1 to GPT-5.4
Sam Altman stated AI inference costs for solving a fixed hard problem dropped ~1000x from o1 to GPT-5.4 in ~16 months, crediting cross-layer engineering optimizations, not a single breakthrough.
GenRobot Launches 6-Camera Wearable for Embodied AI Data Capture
GenRobot launched DAS Ego, a wearable with six 2MP cameras for capturing zero-distortion, 270° FOV data. They also open-sourced the 'Gen Ego Data' dataset covering 200+ skills to train models on perception-action causality.
Microsoft's Fairwater AI Data Center Launches Early, Boosts Azure Capacity
Microsoft has launched its Fairwater AI data center ahead of schedule. The facility adds significant high-performance computing capacity to Azure's AI infrastructure, crucial for training and running large models.