Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

transformers

30 articles about transformers in AI news

8 AI Model Architectures Visually Explained: From Transformers to CNNs and VAEs

A visual guide maps eight foundational AI model architectures, including Transformers, CNNs, and VAEs, providing a clear reference for understanding specialized models beyond LLMs.

85% relevant

Graph Tokenization: A New Method to Apply Transformers to Graph Data

Researchers propose a framework that converts graph-structured data into sequences using reversible serialization and BPE tokenization. This enables standard Transformers like BERT to achieve state-of-the-art results on graph benchmarks, outperforming specialized graph models.

70% relevant

WiT: Waypoint Diffusion Transformers Achieve FID 2.09 on ImageNet 256×256 in 265 Epochs, Matching JiT-L/16 Efficiency

Researchers introduced WiT, a diffusion transformer that uses semantic waypoints from pretrained vision models to resolve trajectory conflicts in pixel-space flow matching. It matches the performance of JiT-L/16 at 600 epochs in just 265 epochs, achieving an FID of 2.09 on ImageNet 256×256.

85% relevant

Sam Altman Teases 'Massive Upgrade' AI Architecture, Compares Impact to Transformers vs. LSTM

OpenAI CEO Sam Altman said a new AI architecture is coming that represents a 'massive upgrade' comparable to the Transformer's leap over LSTM. He also stated current frontier models are now powerful enough to help research these next breakthroughs.

87% relevant

Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals

Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.

92% relevant

Google's Memory Caching Bridges RNN-Transformer Gap with O(NL) Complexity

Google's 'Memory Caching' method saves RNN memory states at segment boundaries, allowing tokens to reference past checkpoints. This O(NL) approach significantly improves RNN performance on recall tasks, narrowing the gap with Transformers.

95% relevant

SteerViT Enables Natural Language Control of Vision Transformer Attention Maps

Researchers introduced SteerViT, a method that modifies Vision Transformers to accept natural language instructions, enabling users to steer the model's visual attention toward specific objects or concepts while maintaining representation quality.

85% relevant

Sam Altman Predicts Next 'Transformer-Level' Architecture Breakthrough, Says AI Models Are Now Smart Enough to Help Find It

OpenAI CEO Sam Altman stated he believes a new AI architecture, offering gains as significant as transformers over LSTMs, is yet to be discovered. He argues current advanced models are now sufficiently capable of assisting in that foundational research.

87% relevant

ViTRM: Vision Tiny Recursion Model Achieves Competitive CIFAR Performance with 84x Fewer Parameters Than ViT

Researchers propose ViTRM, a parameter-efficient vision model that replaces a multi-layer ViT encoder with a single 3-layer block applied recursively. It uses up to 84x fewer parameters than Vision Transformers while maintaining competitive accuracy on CIFAR-10 and CIFAR-100.

89% relevant

Vision AI Breakthrough: Automated Multi-Label Annotation Unlocks ImageNet's True Potential

Researchers have developed an automated pipeline to convert ImageNet's single-label training set into a multi-label dataset without human annotation. Using self-supervised Vision Transformers, the method improves model accuracy and transfer learning capabilities, addressing long-standing limitations in computer vision benchmarks.

78% relevant

Kimi Team's 'Attention Residuals' Replace Fixed Summation with Softmax Attention, Boosts GPQA-Diamond by +7.5%

Researchers propose Attention Residuals, a content-dependent alternative to standard residual connections in Transformers. The method improves scaling laws, matches a baseline trained with 1.25x more compute, and adds under 2% inference overhead.

97% relevant

PartRAG Revolutionizes 3D Generation with Retrieval-Augmented Part-Level Control

Researchers introduce PartRAG, a breakthrough framework that combines retrieval-augmented generation with diffusion transformers for precise part-level 3D creation and editing from single images. The system achieves superior geometric accuracy while enabling localized modifications without regenerating entire objects.

70% relevant

MLLM Raters Show Central Tendency Bias in Clinical Scoring

Study finds GPT-5 and other MLLMs show central tendency bias in clinical scoring, compressing predictions toward scale midpoint despite prompt modifications.

70% relevant

Collider-Bench Tests LLM Agents on LHC Analysis Reproduction

Collider-Bench tests LLM agents on reproducing LHC analyses from papers. No agent beats physicist-in-the-loop, highlighting gaps in scientific reasoning.

92% relevant

Tariffs Threaten $200B AI Data Center Buildout, CSIS Warns

CSIS warns tariffs could raise AI data center costs 20-30%, threatening $200B US hyperscaler buildout through 2028.

85% relevant

AI Data Centers Face 4-Year Post-Approval Delays, PJM Data Shows

PJM data shows AI data centers face 4-year post-approval delays, longer than the queue, threatening $700B CapEx plans.

88% relevant

PJM Data: AI Datacenter Delays Shift to Post-Approval Phase

PJM data shows AI datacenter projects now average 7+ years to go live, with post-approval delays of 4 years outpacing queue times, driven by transmission and supply chain bottlenecks.

80% relevant

Perplexity Claims 3x Blackwell Inference Throughput for 70B Models

Perplexity AI claims 3x inference throughput for 70B models on Nvidia Blackwell GPUs via FP4 and custom scheduling. The gain exceeds Nvidia's own 2x marketing claim.

85% relevant

Nvidia Blackwell CLC Boosts GEMM Tile Scheduling by 15% Over Static Persistence

Nvidia Blackwell CLC delivers up to 15% higher GEMM throughput via dynamic persistent tile scheduling, fixing load imbalance without startup overhead.

95% relevant

Claude Code's HTML Output Beats Markdown for LLM-Readable Docs

Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.

92% relevant

Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell

Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.

87% relevant

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

85% relevant

RoundPipe: Full Fine-Tune 32B Models on a Single 24GB GPU

RoundPipe fine-tunes 32B models on a single 24GB GPU with 1.5-2.2× speedups via round-robin pipeline dispatch.

85% relevant

AI Data Centers Face 220GW Grid Jam, Power Infrastructure Becomes Bottleneck

PJM's 220GW interconnection queue shows AI data center growth is now constrained by power grid capacity, not compute. Hyperscalers face 3-7 year delays.

85% relevant

40-Author Survey Unveils 'Levels × Laws' Framework for Agent World Models

A 40-author survey introduces a 'levels × laws' framework for world models in AI agents, spanning 3 capability levels and 4 law regimes, synthesizing 400+ works. It provides a shared vocabulary for designing and evaluating world models across traditionally siloed research communities.

85% relevant

AI Writes New Virus DNA: Stanford and Arc Institute's DNA Language Model

A tweet reports that researchers fed a language model a DNA sequence and asked it to generate a new virus, which it did. This highlights both the power and risk of generative AI in synthetic biology.

85% relevant

Building a Real-World Fraud Detection System: Beyond Just Training a Model

The article provides a practical breakdown of how to build a production-ready fraud detection system, emphasizing the integration of payment models, sequence models, and shadow mode deployment. It moves beyond pure model training to focus on the operational ML system.

92% relevant

Sam Altman: AI inference costs dropped 1000x from o1 to GPT-5.4

Sam Altman stated AI inference costs for solving a fixed hard problem dropped ~1000x from o1 to GPT-5.4 in ~16 months, crediting cross-layer engineering optimizations, not a single breakthrough.

85% relevant

GenRobot Launches 6-Camera Wearable for Embodied AI Data Capture

GenRobot launched DAS Ego, a wearable with six 2MP cameras for capturing zero-distortion, 270° FOV data. They also open-sourced the 'Gen Ego Data' dataset covering 200+ skills to train models on perception-action causality.

97% relevant

Microsoft's Fairwater AI Data Center Launches Early, Boosts Azure Capacity

Microsoft has launched its Fairwater AI data center ahead of schedule. The facility adds significant high-performance computing capacity to Azure's AI infrastructure, crucial for training and running large models.

92% relevant