pretraining

30 articles about pretraining in AI news

Alignment Pretraining Could Backfire, LessWrong Post Warns

LessWrong post warns synthetic alignment pretraining data could backfire in capable LLMs, leading to rebel personas.

Jun 17, 2026 · 80% relevant

SemiAnalysis: Pretraining Dead for All but Frontier Labs

@SemiAnalysis_ declares pretraining dead for non-frontier labs, calling the urge to pretrain 'Pretrainitis', a vanity-driven waste, and arguing that prompt engineering offers higher ROI.

Jun 11, 2026 · 85% relevant

GPT-5.5 'Spud' Prioritizes Pretraining Over Chain-of-Thought

A new OpenAI model, Spud (GPT-5.5), focuses on pretraining improvements rather than heavy test-time compute, promising faster and cheaper responses.

Apr 23, 2026 · 85% relevant

OpenAI Finishes GPT-5.5 'Spud' Pretraining, Halts Sora for Compute

OpenAI has finished pretraining its next major model, codenamed 'Spud' (likely GPT-5.5), built on a new architecture and data mix. The company reportedly halted its Sora video generation project entirely, sacrificing a $1B Disney investment, to prioritize compute for Spud's launch.

Apr 5, 2026 · 95% relevant

Why Deduplication Is the Most Underestimated Step in LLM Pretraining

A technical article on Medium argues that data deduplication is a critical, often overlooked step in LLM pretraining, directly impacting model performance and training cost. This is a foundational engineering concern for any team building or fine-tuning custom models.

Mar 29, 2026 · 86% relevant
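As a rough illustration of what deduplication involves, here is a sketch of two common techniques: exact deduplication by hashing normalized text, and near-duplicate filtering by Jaccard similarity over word-trigram shingles. It is an assumption-laden sketch, not the method from the Medium article. The function names and the 0.8 default threshold are hypothetical, and production pipelines typically replace the pairwise comparison with MinHash and locality-sensitive hashing to avoid its quadratic cost.

```python
import hashlib
import re
from typing import Iterable


def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace so trivially different copies match.
    return re.sub(r"\s+", " ", text.lower()).strip()


def exact_dedup(docs: Iterable[str]) -> list[str]:
    # Keep the first document for each SHA-256 digest of its normalized text.
    seen: set[str] = set()
    kept: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(doc)
    return kept


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    # Word k-grams; texts shorter than k words become a single shingle.
    words = normalize(text).split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 1.0 when both sets are empty.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_dedup(docs: Iterable[str], threshold: float = 0.8) -> list[str]:
    # Exact pass first, then drop any document too similar to one already kept.
    # This pairwise loop is O(n^2); large pipelines use MinHash + LSH instead.
    kept: list[str] = []
    kept_shingles: list[set[tuple[str, ...]]] = []
    for doc in exact_dedup(docs):
        sh = shingles(doc)
        if all(jaccard(sh, other) < threshold for other in kept_shingles):
            kept.append(doc)
            kept_shingles.append(sh)
    return kept
```

For example, `near_dedup(docs, threshold=0.7)` would drop a copy of "the quick brown fox jumps over the lazy dog" that differs only by a trailing "!", because the two texts share 6 of their 8 distinct trigrams (Jaccard 0.75).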

Cursor Trains GPT-Size Model with 10-20x Compute

Cursor trained a GPT-size model from scratch with 10-20x more compute, as announced at Compile. The move marks a shift from fine-tuning to pretraining for code generation.

Jun 21, 2026 · 91% relevant

Karpathy Joins Anthropic to Lead Recursive Self-Improvement Team

Andrej Karpathy joins Anthropic to lead a new recursive self-improvement team using Claude to accelerate pretraining, per @kimmonismus. The move signals a bet on synthetic data loops over brute-force scaling.

May 21, 2026 · 92% relevant

Genesis AI Reveals GENE-26.5: Humanoid Robot Cooks Stir-Fry, Solves Rubik's Cube

Genesis AI released GENE-26.5, a foundation model that enables a humanoid robot to autonomously cook stir-fry, solve Rubik's cubes, and organize cables. The approach combines pretraining on human data with closed-loop evaluation in simulation.

May 8, 2026 · 68% relevant

Geometric Latent Diffusion (GLD) Achieves SOTA Novel View Synthesis, Trains 4.4× Faster Than VAE

GLD repurposes features from geometric foundation models like Depth Anything 3 as a latent space for multi-view diffusion. It trains significantly faster than VAE-based approaches and achieves state-of-the-art novel view synthesis without text-to-image pretraining.

Mar 28, 2026 · 95% relevant

Tencent's Penguin-VL: A New Approach to Compact Multimodal AI

Tencent has launched Penguin-VL, a compact vision-language model that replaces traditional CLIP/SigLIP pretraining with an LLM-initialized vision encoder. The model achieves strong multimodal reasoning capabilities with just 2B and 8B parameter versions, potentially changing how smaller AI systems process images and text.

Mar 9, 2026 · 85% relevant

Brain-OF: The First Unified AI Model That Reads Multiple Brain Signals Simultaneously

Researchers have developed Brain-OF, the first omnifunctional foundation model that jointly processes fMRI, EEG, and MEG brain signals. This unified approach overcomes previous single-modality limitations by integrating complementary spatiotemporal data through innovative architecture and pretraining techniques.

Mar 2, 2026 · 80% relevant

Alibaba's Qwen-RobotNav Unifies Robot Navigation in One 2B-8B Model

Alibaba's Qwen-RobotNav unifies vision-and-language navigation (VLN), object-goal navigation (ObjectNav), tracking, and autonomous driving in a 2B-8B model, and deploys zero-shot to quadruped robots via a configurable observation protocol.

Jul 5, 2026 · 87% relevant

Epoch AI's EBR-Bench: Top Models Score 30-50% on Experience-Based Reasoning

Epoch AI's EBR-Bench tests experience-based reasoning. Top models score 30-50%, with Google Gemini 3 Pro leading at 48.2%, revealing a gap between pattern matching and true learning.

Jul 1, 2026 · 100% relevant

NanoEuler: GPT-2-Scale 116M Model Built in Pure C/CUDA From Scratch

NanoEuler is a 116M-parameter GPT-2-scale model built in pure C/CUDA from scratch. It provides a complete educational training pipeline for understanding LLMs at the lowest level.

Jun 28, 2026 · 75% relevant

OpenAI shows small doses of beneficial-trait RL improve 44 of 53 safety benchmarks — and the gains generalize

OpenAI researchers Jagadeesh, Saab, Singhal et al. published findings on June 18 showing that RL training on traits like honesty and corrigibility improved 44 of 53 safety benchmarks. Gains generalized to domains not used in training, and the model resisted harmful fine-tuning better than the baseline…

Jun 19, 2026 · 95% relevant

AI Generates Chest X-Rays Clinicians Cannot Tell Apart From Real Ones

RadiT XL, a 1.3B-parameter rectified flow transformer trained on 1.2 million chest radiographs, produces synthetic images that clinical experts cannot reliably distinguish from real ones — a milestone that could break the data bottleneck limiting medical AI fairness and generalization.

Jun 19, 2026 · 85% relevant

Qwen 2.5 7B Expresses Near-Constant Confidence Whether It Is Right or Wrong, Study Finds

A June 2026 arXiv preprint from University of Minnesota researchers tested Qwen 2.5 7B on structured clinical prediction data and found that its verbalized confidence scores are essentially uninformative: they cluster between 0.856 and 0.937 no matter how well or badly the model performs. Combining SHAP-…

Jun 19, 2026 · 92% relevant
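One common way to check whether confidence scores carry information is to measure how well they separate correct from incorrect answers, for example with the area under the ROC curve (AUROC). A value of 0.5 means no discrimination, and 1.0 means every correct answer received higher confidence than every incorrect one. This is a generic sketch, not the preprint's evaluation protocol; the function name and toy scores are hypothetical.

```python
from typing import Sequence


def auroc(confidences: Sequence[float], correct: Sequence[bool]) -> float:
    # Probability that a randomly chosen correct answer gets higher confidence
    # than a randomly chosen incorrect one; ties count as half a win.
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [True, False, True, False]
# Constant confidence: says nothing about which answers are right.
print(auroc([0.9, 0.9, 0.9, 0.9], labels))      # 0.5
# Narrow band, but ordered by correctness: perfectly informative.
print(auroc([0.93, 0.86, 0.92, 0.87], labels))  # 1.0
```

The second case shows that a narrow range alone is not the problem. Scores confined to roughly 0.86 to 0.93 could still be useful if they tracked correctness. The finding matters because the reported scores stayed in that band regardless of performance.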

NVIDIA Blackwell Sweeps MLPerf Training 6.0, GB300 Hits 1.6x Speedup

NVIDIA Blackwell swept all seven benchmarks in MLPerf Training 6.0. The GB300 NVL72 delivered a 1.6x speedup over the GB200 NVL72, using NVFP4 and 8,192 GPUs.

Jun 16, 2026 · 100% relevant

Google Titan: A New Architecture That Could Dethrone Transformers

Google's Titan architecture claims to surpass Transformers on long-context tasks via neural long-term memory, achieving 1.2x-2.5x speedups on benchmarks.

Jun 6, 2026 · 87% relevant

Prithvi-EO Fails Cross-Country Crop Yield Generalization, Paper Shows

Prithvi-EO and ViT-Base embeddings yield universally negative R² in cross-country maize yield prediction and fail to beat traditional spectral features, a failure the paper attributes to a shift in the yield distribution.

May 12, 2026 · 72% relevant
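Negative R² has a precise meaning. With observed yields $y_i$, predictions $\hat{y}_i$, and the mean of the evaluated observations $\bar{y}$, the standard coefficient of determination drops below zero exactly when the residual sum of squares exceeds the total sum of squares. In other words, the model does worse than always predicting $\bar{y}$. The numbers in the worked example are illustrative and not taken from the paper. They show how predictions that rank held-out samples correctly but carry a systematic offset, as can happen under a shift in the yield distribution, still score well below zero.

```latex
R^2 = 1 - \frac{SS_{\mathrm{res}}}{SS_{\mathrm{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad R^2 < 0 \iff SS_{\mathrm{res}} > SS_{\mathrm{tot}}

% Illustrative only: y = (2, 4, 6), \bar{y} = 4, predictions offset by +3: \hat{y} = (5, 7, 9)
SS_{\mathrm{tot}} = (2-4)^2 + (4-4)^2 + (6-4)^2 = 8, \qquad
SS_{\mathrm{res}} = 3^2 + 3^2 + 3^2 = 27, \qquad
R^2 = 1 - \tfrac{27}{8} = -2.375
```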

Sakana AI 7B Conductor Hits SOTA on GPQA-Diamond via Orchestration

Sakana AI's 7B Conductor model achieves SOTA on GPQA-Diamond and LiveCodeBench by orchestrating specialized sub-models; the work has been accepted at ICLR 2026.

May 5, 2026 · 85% relevant

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

May 4, 2026 · 85% relevant

Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals

Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.

Apr 23, 2026 · 92% relevant

Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data

Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.

Apr 21, 2026 · 87% relevant

FeCoSR: A Federated Framework for Cross-Market Sequential Recommendation

A new arXiv paper introduces FeCoSR, a federated collaboration framework for cross-market sequential recommendation. It tackles data isolation and market heterogeneity by enabling many-to-many collaborative training with a novel loss function, showing advantages over traditional transfer approaches.

Apr 16, 2026 · 82% relevant

LLM Schema-Adaptive Method Enables Zero-Shot EHR Transfer

Researchers propose Schema-Adaptive Tabular Representation Learning, an LLM-driven method that transforms structured variables into semantic statements. It enables zero-shot alignment across unseen EHR schemas and outperforms clinical baselines, including neurologists, on dementia diagnosis tasks.

Apr 15, 2026 · 99% relevant
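The core move described here, turning a table row into natural-language statements, can be sketched so that a language model sees comparable sentences even when two hospitals name the same variable differently. This is a hypothetical illustration of the general serialization idea, not the authors' implementation. The function name, sentence template, column names, and description mapping are all invented for the example, and the downstream embedding and alignment steps are omitted.

```python
from typing import Mapping, Optional


def to_statements(
    record: Mapping[str, object],
    descriptions: Optional[Mapping[str, str]] = None,
) -> list[str]:
    # Hypothetical template: one sentence per non-missing variable.
    # `descriptions` maps site-specific column codes to human-readable names;
    # unmapped columns fall back to the column name with underscores replaced.
    descriptions = descriptions or {}
    statements = []
    for column, value in record.items():
        if value is None:  # skip missing values rather than stating "None"
            continue
        label = descriptions.get(column, column.replace("_", " "))
        statements.append(f"The patient's {label} is {value}.")
    return statements


# Two hypothetical sites encoding the same information under different column names.
site_a = {"MMSE_TOT": 24, "age": 71, "apoe4_status": None}
site_b = {"mmse_score": 24, "age_years": 71}
names = {
    "MMSE_TOT": "Mini-Mental State Examination score",
    "mmse_score": "Mini-Mental State Examination score",
    "age_years": "age",
}
print(to_statements(site_a, names))
print(to_statements(site_b, names))
```

Both sites serialize to the same two sentences. Here the column-to-description mapping is written by hand purely for illustration; it is the part a schema-adaptive method would need to handle automatically for unseen schemas.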

Indian Factory Workers Wear Head Cams to Gather Embodied AI Training Data

To overcome the high cost of robot fleet data collection, companies are deploying head cameras on human factory workers. This first-person video captures the sequencing, posture, and micro-adjustments of real work, serving as a proxy for expensive robotic action data.

Apr 12, 2026 · 95% relevant

Kronos AI Outperforms Leading Time Series Models by 93% on Candlestick Data

Researchers from Tsinghua University released Kronos, an open-source foundation model trained on 12 billion candlestick records from 45 exchanges. It reportedly achieves 93% higher accuracy than leading time series models for price and volatility forecasting, requiring no fine-tuning.

Apr 11, 2026 · 95% relevant

Benchmark Shadows Study: Data Alignment Limits LLM Generalization

A controlled study finds that data distribution, not just volume, dictates LLM capability. Benchmark-aligned training inflates scores but creates narrow, brittle models, while coverage-expanding data leads to more distributed parameter adaptation and better generalization.

Apr 10, 2026 · 100% relevant

Google's TimesFM: 200M-Param Foundation Model for Zero-Shot Time Series

Google released TimesFM, a 200M-parameter foundation model for time series forecasting that works without training on user data. It's now available open-source and as a product inside BigQuery.

Apr 9, 2026 · 97% relevant
