Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

pretraining

30 articles about pretraining in AI news

GPT-5.5 'Spud' Prioritizes Pretraining Over Chain-of-Thought

A new OpenAI model, Spud (GPT-5.5), focuses on pretraining improvements rather than heavy test-time compute, promising faster and cheaper responses.

85% relevant

OpenAI Finishes GPT-5.5 'Spud' Pretraining, Halts Sora for Compute

OpenAI has finished pretraining its next major model, codenamed 'Spud' (likely GPT-5.5), built on a new architecture and data mix. The company reportedly halted its Sora video generation project entirely, sacrificing a $1B Disney investment, to prioritize compute for Spud's launch.

95% relevant

Why Deduplication Is the Most Underestimated Step in LLM Pretraining

A technical article on Medium argues that data deduplication is a critical, often overlooked step in LLM pretraining, directly impacting model performance and training cost. This is a foundational engineering concern for any team building or fine-tuning custom models.

86% relevant

Karpathy Joins Anthropic to Lead Recursive Self-Improvement Team

Andrej Karpathy joins Anthropic to lead a new recursive self-improvement team using Claude to accelerate pretraining, per @kimmonismus. The move signals a bet on synthetic data loops over brute-force scaling.

90% relevant

Genesis AI Reveals GENE-26.5: Humanoid Robot Cooks Stir-Fry, Solves Rubik's Cube

Genesis AI released GENE-26.5, a foundation model enabling a humanoid robot to autonomously cook stir-fry, solve Rubik's cubes, and organize cables. The approach uses human data pretraining and simulation closed-loop evaluation.

68% relevant

Geometric Latent Diffusion (GLD) Achieves SOTA Novel View Synthesis, Trains 4.4× Faster Than VAE

GLD repurposes features from geometric foundation models like Depth Anything 3 as a latent space for multi-view diffusion. It trains significantly faster than VAE-based approaches and achieves state-of-the-art novel view synthesis without text-to-image pretraining.

95% relevant

Tencent's Penguin-VL: A New Approach to Compact Multimodal AI

Tencent has launched Penguin-VL, a compact vision-language model that replaces traditional CLIP/SigLIP pretraining with an LLM-initialized vision encoder. The model achieves strong multimodal reasoning capabilities with just 2B and 8B parameter versions, potentially changing how smaller AI systems process images and text.

85% relevant

Brain-OF: The First Unified AI Model That Reads Multiple Brain Signals Simultaneously

Researchers have developed Brain-OF, the first omnifunctional foundation model that jointly processes fMRI, EEG, and MEG brain signals. This unified approach overcomes previous single-modality limitations by integrating complementary spatiotemporal data through innovative architecture and pretraining techniques.

80% relevant

Prithvi-EO Fails Cross-Country Crop Yield Generalization, Paper Shows

Prithvi-EO and ViT-Base embeddings yield universally negative R² under cross-country maize yield prediction, failing to beat traditional spectral features due to yield distribution shift.

72% relevant

Sakana AI 7B Conductor Hits SOTA on GPQA-Diamond via Orchestration

Sakana AI's 7B Conductor model achieves SOTA on GPQA-Diamond and LiveCodeBench via orchestration of specialized sub-models, accepted at ICLR 2026.

85% relevant

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

85% relevant

Meta's Sapiens2: 1B Human Image ViTs for Pose, Segmentation, Normals

Meta open-sourced Sapiens2 on Hugging Face, a family of vision transformers pretrained on 1 billion human images for pose estimation, segmentation, normal estimation, and point maps. The models target high-resolution human-centric perception.

92% relevant

Columbia Prof: LLMs Can't Generate New Science, Only Map Known Data

Columbia CS Professor Vishal Misra argues LLMs cannot generate new scientific ideas because they learn structured maps of known data and fail outside those boundaries. True discovery requires creating new conceptual maps, a capability current architectures lack.

87% relevant

FeCoSR: A Federated Framework for Cross-Market Sequential Recommendation

A new arXiv paper introduces FeCoSR, a federated collaboration framework for cross-market sequential recommendation. It tackles data isolation and market heterogeneity by enabling many-to-many collaborative training with a novel loss function, showing advantages over traditional transfer approaches.

82% relevant

LLM Schema-Adaptive Method Enables Zero-Shot EHR Transfer

Researchers propose Schema-Adaptive Tabular Representation Learning, an LLM-driven method that transforms structured variables into semantic statements. It enables zero-shot alignment across unseen EHR schemas and outperforms clinical baselines, including neurologists, on dementia diagnosis tasks.

99% relevant

Indian Factory Workers Wear Head Cams to Gather Embodied AI Training Data

To overcome the high cost of robot fleet data collection, companies are deploying head cameras on human factory workers. This first-person video captures the sequencing, posture, and micro-adjustments of real work, serving as a proxy for expensive robotic action data.

95% relevant

Kronos AI Outperforms Leading Time Series Models by 93% on Candlestick Data

Researchers from Tsinghua University released Kronos, an open-source foundation model trained on 12 billion candlestick records from 45 exchanges. It reportedly achieves 93% higher accuracy than leading time series models for price and volatility forecasting, requiring no fine-tuning.

95% relevant

Benchmark Shadows Study: Data Alignment Limits LLM Generalization

A controlled study finds that data distribution, not just volume, dictates LLM capability. Benchmark-aligned training inflates scores but creates narrow, brittle models, while coverage-expanding data leads to more distributed parameter adaptation and better generalization.

100% relevant

Google's TimesFM: 200M-Param Foundation Model for Zero-Shot Time Series

Google released TimesFM, a 200M-parameter foundation model for time series forecasting that works without training on user data. It's now available open-source and as a product inside BigQuery.

97% relevant

Meta's Muse Spark Hits 58% on Humanity's Last Exam, 10x More Efficient Than Llama 4

Meta Superintelligence Labs has released Muse Spark, a natively multimodal reasoning model that powers Meta AI. It scores 58% on Humanity's Last Exam and matches Llama 4 Maverick's capability with over 10x less compute.

99% relevant

ASI-Evolve: This AI Designs Better AI Than Humans Can — 105 New Architectures, Zero Human Guidance

Researchers built an AI that runs the entire research cycle on its own — reading papers, designing experiments, running them, and learning from results. It discovered 105 architectures that beat human-designed models, and invented new learning algorithms. Open-sourced.

98% relevant

Meta Halts Mercor Work After Supply Chain Breach Exposes AI Training Secrets

A supply chain attack via compromised software updates at data-labeling vendor Mercor has forced Meta to pause collaboration, risking exposure of core AI training pipelines and quality metrics used by top labs.

97% relevant

Google Open-Sources TimesFM: A 100B-Point Time Series Foundation Model for Zero-Shot Forecasting

Google has open-sourced TimesFM, a foundation model for time series forecasting trained on 100 billion real-world time points. It requires no dataset-specific training and can generate predictions instantly for domains like traffic, weather, and demand.

95% relevant

How Structured JSON Inputs Eliminated Hallucinations in a Fine-Tuned 7B Code Model

A developer fine-tuned a 7B code model on consumer hardware to generate Laravel PHP files. Hallucinations persisted until prompts were replaced with structured JSON specs, which eliminated ambiguous gap-filling errors and reduced debugging time dramatically.

92% relevant

GLM-5.1 Released by Zhipu AI, Claiming Performance Close to GPT-4o and Claude 3.5

Zhipu AI has released GLM-5.1, its latest large language model series. The company claims its top-tier model, GLM-5.1-9B/1M, achieves performance close to GPT-4o and Claude 3.5 Sonnet, narrowing the gap with leading Western models.

85% relevant

CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses

Researchers introduce CanViT, the first task- and policy-agnostic Active-Vision Foundation Model (AVFM). It achieves 38.5% mIoU on ADE20K segmentation with a single low-resolution glimpse, outperforming prior active models while using 19.5x fewer FLOPs.

91% relevant

Meta's V-JEPA 2.1 Achieves +20% Robotic Grasp Success with Dense Feature Learning from 1M+ Hours of Video

Meta researchers released V-JEPA 2.1, a video self-supervised learning model that learns dense spatial-temporal features from over 1 million hours of video. The approach improves robotic grasp success by ~20% over previous methods by forcing the model to understand precise object positions and movements.

97% relevant

ReXInTheWild Benchmark Reveals VLMs Struggle with Medical Photos: Gemini-3 Leads at 78%, MedGemma Trails at 37%

Researchers introduced ReXInTheWild, a benchmark of 955 clinician-verified questions based on 484 real medical photographs. Leading multimodal models show wide performance gaps, with Gemini-3 scoring 78% accuracy while the specialized MedGemma model achieved only 37%.

75% relevant

PRISM Study: Mid-Training on 27B Tokens Boosts Math Scores by +15 to +40 Points, Enables Effective RL

A comprehensive study shows mid-training on 27B high-quality tokens consistently improves reasoning in LLMs. This 'retention-aware' phase restructures 90% of weights, creating a configuration where RL can succeed.

88% relevant

Kimi Team's 'Attention Residuals' Replace Fixed Summation with Softmax Attention, Boosts GPQA-Diamond by +7.5%

Researchers propose Attention Residuals, a content-dependent alternative to standard residual connections in Transformers. The method improves scaling laws, matches a baseline trained with 1.25x more compute, and adds under 2% inference overhead.

97% relevant