Skip to content
gentic.news — AI News Intelligence Platform
Connecting to the Living Graph…

training

30 articles about training in AI news

xAI Drops JAX, Builds Custom C Training Framework After <10% MFU

xAI dropped JAX for GPU training after <10% MFU, building a custom C framework with Grok Build. NVIDIA's JAX team loses its biggest customer.

89% relevant

LLM-EDT: Dual-Phase Training Boosts Cross-Domain Rec by 12.4%

LLM-EDT improves cross-domain sequential recommendation by up to 12.4% using dual-phase training and LLM-based item generation.

74% relevant

Cerebras WSE-3 Claims 10x Training Speed Over Nvidia H100 on GPT-Scale Model

Cerebras claims 10x training speed over Nvidia H100 for GPT-3-scale models using WSE-3. Benchmark lacks power and cost data, limiting independent verification.

64% relevant

Nebius Claims First NVIDIA GB300 Exemplar Cloud for Training

Nebius becomes first cloud provider validated as NVIDIA Exemplar Cloud on GB300 for training, targeting hyperscale AI workloads.

94% relevant

Vibe Training: SLM Replaces LLM-as-a-Judge, 8x Faster, 50% Fewer Errors

Plurai introduces 'vibe training,' using adversarial agent swarms to distill a small language model (SLM) for evaluating and guarding production AI agents. The SLM outperforms standard LLM-as-a-judge setups with ~8x faster inference and ~50% fewer evaluation errors.

86% relevant

Google Splits TPU Line: 8t for Training, 8i for Inference

At Cloud Next 2026, Google introduced two new AI chips — TPU 8t for training and TPU 8i for inference — splitting its custom silicon for the first time. OpenAI, Anthropic, and Meta are buying multi-gigawatt TPU capacity, signaling a crack in NVIDIA's 81% market share.

100% relevant

GPT-5.5 'Spud' Prioritizes Pretraining Over Chain-of-Thought

A new OpenAI model, Spud (GPT-5.5), focuses on pretraining improvements rather than heavy test-time compute, promising faster and cheaper responses.

85% relevant

Building a Real-World Fraud Detection System: Beyond Just Training a Model

The article provides a practical breakdown of how to build a production-ready fraud detection system, emphasizing the integration of payment models, sequence models, and shadow mode deployment. It moves beyond pure model training to focus on the operational ML system.

92% relevant

Apple Releases DFNDR-12M Dataset, Claims 5x CLIP Training Efficiency

Apple has open-sourced DFNDR-12M, a multimodal dataset of 12.8 million image-text pairs with synthetic captions and pre-computed embeddings. The company claims it enables up to 5x training efficiency over standard CLIP datasets.

85% relevant

Gur Singh Claims 7 M4 MacBooks Match A100, Calls Cloud GPU Training a 'Scam'

Developer Gur Singh posted that seven M4 MacBooks (2.9 TFLOPS each) match an NVIDIA A100's performance, calling cloud GPU training a 'scam' and advocating for distributed, consumer-hardware approaches.

77% relevant

AirTrain Enables Distributed ML Training on MacBooks Over Wi-Fi

Developer @AlexanderCodes_ open-sourced AirTrain, a tool that enables distributed ML training across Apple Silicon MacBooks using Wi-Fi by syncing gradients every 500 steps instead of every step. This makes personal device training feasible for models up to 70B parameters without cloud GPU costs.

95% relevant

Shopify Engineering Teases 'Autoresearch' Beyond Model Training in 2026 Preview

Shopify Engineering has previewed a 2026 perspective suggesting 'autoresearch'—automated research processes—will have applications extending beyond just training AI models. This signals a broader operational automation strategy for the e-commerce giant.

100% relevant

LLM-HYPER: A Training-Free Framework for Cold-Start Ad CTR Prediction

A new arXiv paper introduces LLM-HYPER, a framework that treats large language models as hypernetworks to generate parameters for click-through rate estimators in a training-free manner. It uses multimodal ad content and few-shot prompting to infer feature weights, drastically reducing the cold-start period for new promotional ads and has been deployed on a major U.S. e-commerce platform.

96% relevant

MiniMax Open-Sources M2.7 Model, Details 'Self-Evolution' Training

Chinese AI firm MiniMax has open-sourced its M2.7 model. The key detail from its blog is a 'self-evolution' training process, likened to AlphaGo's self-play, for iterative improvement.

89% relevant

xAI's Grok 4.2 at 0.5T Params, Colossus 2 Training Models up to 10T

A tweet from AI researcher Rohan Paul states xAI's current Grok 4.2 model uses 0.5 trillion parameters. In parallel, the Colossus 2 project is training a suite of seven models ranging from 1 trillion to 10 trillion parameters.

85% relevant

Anthropic Faces Backlash Over Alleged Unauthorized Email Training for Claude

Anthropic is accused of training its Claude AI on a company's private email database without permission. This raises severe data privacy and legal questions for enterprise AI.

89% relevant

Walmart Research Proposes Unified Training for Sponsored Search Retrieval

A new arXiv preprint details Walmart's novel bi-encoder training framework for sponsored search retrieval. It addresses the limitations of using user engagement as a sole training signal by combining graded relevance labels, retrieval priors, and engagement data. The method outperformed the production system in offline and online tests.

99% relevant

Meta's New Training Recipe: Small Models Should Learn from a Single Expert

Meta AI researchers propose a novel training recipe for small language models: instead of learning from many large 'expert' models simultaneously, they should be trained sequentially on one expert at a time. This method, detailed in a new paper, reportedly improves final model performance and training efficiency.

85% relevant

NVIDIA Advances AI Robotics with Simulation-First Training, Isaac & Jetson

NVIDIA showcased AI robotics advances using foundation models and synthetic environments for training, enabling scalable deployment in real-world sectors like agriculture and solar. Key platforms are the Isaac simulator and Jetson edge AI hardware.

85% relevant

Tiny 9M Parameter LLM Tutorial Runs on Colab, Demystifies Transformer Training

A developer shared a complete tutorial for training a ~9M parameter transformer language model from scratch, including tokenizer, training, and inference, all runnable on Google Colab in minutes.

85% relevant

OpenAI Finishes GPT-5.5 'Spud' Pretraining, Halts Sora for Compute

OpenAI has finished pretraining its next major model, codenamed 'Spud' (likely GPT-5.5), built on a new architecture and data mix. The company reportedly halted its Sora video generation project entirely, sacrificing a $1B Disney investment, to prioritize compute for Spud's launch.

95% relevant

Meta Halts Mercor Work After Supply Chain Breach Exposes AI Training Secrets

A supply chain attack via compromised software updates at data-labeling vendor Mercor has forced Meta to pause collaboration, risking exposure of core AI training pipelines and quality metrics used by top labs.

97% relevant

Video of Massive AI Training Lab in China Sparks Debate on Automation's Scale

A social media post showcasing a vast Chinese AI training lab has reignited discussions about job displacement, underscoring the tangible infrastructure powering the current AI surge.

85% relevant

HIVE Framework Introduces Hierarchical Cross-Attention for Vision-Language Pre-Training, Outperforms Self-Attention on MME and GQA

A new paper introduces HIVE, a hierarchical pre-training framework that connects vision encoders to LLMs via cross-attention across multiple layers. It outperforms conventional self-attention methods on benchmarks like MME and GQA, improving vision-language alignment.

84% relevant

VMLOps Launches 'Algorithm Explorer' for Real-Time Visualization of AI Training Dynamics

VMLOps released Algorithm Explorer, an interactive tool that visualizes ML training in real-time, showing gradients, weights, and decision boundaries. It combines math, visuals, and code to aid debugging and education.

85% relevant

Why Deduplication Is the Most Underestimated Step in LLM Pretraining

A technical article on Medium argues that data deduplication is a critical, often overlooked step in LLM pretraining, directly impacting model performance and training cost. This is a foundational engineering concern for any team building or fine-tuning custom models.

86% relevant

Fine-Tuning LLMs While You Sleep: How Autoresearch and Red Hat Training Hub Outperformed the HINT3 Benchmark

Automated fine-tuning tools now let you run hundreds of training experiments overnight for under $50. Here's how Autoresearch and Red Hat's platform outperformed HINT3, and the tools you can use today.

95% relevant

NVIDIA's PivotRL Cuts Agent RL Training Costs 5.5x, Matches Full RL Performance on SWE-Bench

NVIDIA researchers introduced PivotRL, a post-training method that achieves competitive agent performance with end-to-end RL while using 5.5x less wall-clock time. The framework identifies high-signal 'pivot' turns in existing trajectories, avoiding costly full rollouts.

99% relevant

Training-Free Polynomial Graph Filtering: A New Paradigm for Ultra-Fast Multimodal Recommendation

Researchers propose a training-free graph filtering method for multimodal recommendation that fuses text, image, and interaction data without neural network training. It achieves up to 22.25% higher accuracy and runs in under 10 seconds, dramatically reducing computational overhead.

80% relevant

LeWorldModel: Yann LeCun's Team Achieves Stable World Model Training with 15M Parameters, No Training Tricks

Researchers including Yann LeCun introduce LeWorldModel, a 15M-parameter world model that learns scene dynamics from raw pixels without complex training stabilization tricks. It trains in hours on one GPU and plans 48x faster than foundation-model-based alternatives.

87% relevant