sample efficiency
30 articles about sample efficiency in AI news
OPID: Agents Learn From Hindsight Without External Memory
OPID lets agents learn hierarchical skills from hindsight, improving sample efficiency on ALFWorld, WebShop, Search QA without external memory at inference.
UniVidX Generates Video From 1,000 Samples, SIGGRAPH 2026
UniVidX generates omni-directional video from <1,000 training samples, using diffusion priors with stochastic masking, accepted at SIGGRAPH 2026.
Apple Releases DFNDR-12M Dataset, Claims 5x CLIP Training Efficiency
Apple has open-sourced DFNDR-12M, a multimodal dataset of 12.8 million image-text pairs with synthetic captions and pre-computed embeddings. The company claims it enables up to 5x training efficiency over standard CLIP datasets.
WiT: Waypoint Diffusion Transformers Achieve FID 2.09 on ImageNet 256×256 in 265 Epochs, Matching JiT-L/16 Efficiency
Researchers introduced WiT, a diffusion transformer that uses semantic waypoints from pretrained vision models to resolve trajectory conflicts in pixel-space flow matching. It matches the performance of JiT-L/16 at 600 epochs in just 265 epochs, achieving an FID of 2.09 on ImageNet 256×256.
Terence Tao: AI's 'Brute-Test' Approach to Math Research Could Narrow Human Efficiency Gap
Mathematician Terence Tao observes AI can synthesize millions of papers and brute-force test ideas, while humans rely on pattern recognition from few examples. He suggests the gap may narrow as AI systems develop world models, causal reasoning, and active learning.
Goal-Driven Data Optimization: Training Multimodal AI with 95% Less Data
Researchers introduce GDO, a framework that optimizes multimodal instruction tuning by selecting high-utility training samples. It achieves faster convergence and higher accuracy using 5-7% of the data typically required. This addresses compute inefficiency in training vision-language models.
GPT-5.5 Pro Leapfrogs on Epoch Benchmark; Base Model Beats Prior Pro
A tweet from @kimmonismus reveals GPT-5.5 Pro shows significant Epoch benchmark gains, and the non-Pro GPT-5.5 surpasses GPT-5.4 Pro, suggesting major efficiency improvements at OpenAI.
ReCast: A New RL Technique That Fixes Sparse-Hit Learning in Generative
Researchers propose ReCast, a 'repair-then-contrast' framework that fixes a fundamental flaw in group-based RL for generative recommendation: many sampled groups never become learnable. ReCast restores learnability for zero-reward groups and replaces normalization with contrastive updates, achieving up to 36.6% improvement in Pass@1 and 16.6x faster actor updates.
DigitalOcean's Signal Sampling Finds Top Agent Trajectories Without LLM Cost
DigitalOcean's paper introduces lightweight behavioral signals to rank 80k agent-user trajectories, achieving 82% informativeness in sampled reviews compared to 54% for random sampling, with no LLM overhead.
Optimizing Luxury Discovery: A Smarter Pre-Ranking Engine for Personalization
New research tackles inefficiency in recommendation pipelines by intelligently separating 'easy' from 'hard' customer matches. This heterogeneity-aware pre-ranking can boost personalization accuracy while controlling computational costs, directly applicable to luxury product discovery and clienteling.
Multimodal Knowledge Graphs Unlock Next-Generation AI Training Data
Researchers have developed MMKG-RDS, a novel framework that synthesizes high-quality reasoning training data by mining multimodal knowledge graphs. The system addresses critical limitations in existing data synthesis methods and improves model reasoning accuracy by 9.2% with minimal training samples.
Diffusion Models Accelerated: New AI Framework Makes Autonomous Driving Predictions 100x Faster
Researchers have developed cVMDx, a diffusion-based AI model that predicts highway trajectories 100x faster than previous approaches. By using DDIM sampling and Gaussian Mixture Models, it provides multimodal, uncertainty-aware predictions crucial for autonomous vehicle safety. The breakthrough addresses key efficiency and robustness challenges in real-world driving scenarios.
CLIPoint3D Bridges the 3D Reality Gap: How Language Models Are Revolutionizing Point Cloud Adaptation
Researchers have developed CLIPoint3D, a novel framework that leverages frozen CLIP backbones for few-shot unsupervised 3D point cloud domain adaptation. The approach achieves 3-16% accuracy gains over conventional methods while dramatically improving efficiency by avoiding heavy trainable encoders.
10M-Parameter GRAM Model Beats 3x Larger Rivals with Parallel Reasoning
GRAM uses stochastic recursion to explore multiple reasoning paths in parallel, achieving 97% on hard Sudoku with 10M parameters, outperforming deterministic models 3x its size.
CMU Benchmark: Claude Mythos Hits 9.9/16 on V8 Exploits, GPT-5.5 Trails at 5.5
CMU's ExploitBench shows Claude Mythos scores 9.9/16 on V8 exploits vs GPT-5.5's 5.5, but costs $36,428 per run — 12x more. The cost-performance tradeoff is the real story.
DataArc-SynData-Toolkit: Open-Source Framework for Multimodal Synthetic Data
DataArc-SynData-Toolkit is an open-source framework for multimodal synthetic data, aiming to lower technical barriers for LLM training. It features a configuration-driven pipeline with visual interface and modular architecture.
Unsloth × NVIDIA Cut LLM Fine-Tuning ~25% — Three Glue-Code Wins on Blackwell
Daniel & Michael Han at Unsloth, in collaboration with NVIDIA, published a joint guide quantifying three glue-code optimizations that combine for ~25% faster LLM training on B200 Blackwell hardware. The wins target overhead around the main kernels — caching packed-sequence metadata, double-buffered gradient checkpoint reloads, and a cheaper GPT-OSS MoE router using argsort + bincount. All three are merged via public PRs.
How a Custom Multimodal Transformer Beat a Fine-Tuned LLM for Attribute
LeBonCoin's ML team built a custom late-fusion transformer that uses pre-computed visual embeddings and character n-gram text vectors to predict ad attributes. It outperformed a fine-tuned VLM while running on CPU with sub-200ms latency, offering calibrated probabilities and 15-minute retraining cycles.
DeepMind’s New VAE Matches Stable Diffusion at 10x Resolution
DeepMind’s new VAE produces 1024x1024 images with quality comparable to Stable Diffusion’s 256x256 output, potentially replacing the standard VAE in generative pipelines. This cuts the token count by 10x, enabling faster generation and lower memory usage.
AFMRL: Using MLLMs to Generate Attributes for Better Product Retrieval in
AFMRL uses MLLMs to generate product attributes, then uses those attributes to train better multimodal representations for e-commerce retrieval. Achieves SOTA on large-scale datasets.
Pinterest's MIQPS: A Data-Driven Approach to URL Normalization for Content
Pinterest's engineering team details the MIQPS algorithm, which dynamically identifies 'important' vs. 'noise' query parameters per domain by testing if their removal changes a page's visual fingerprint. This solves the costly problem of ingesting and processing duplicate product pages from varied merchant URLs.
Fei-Fei Li Explains Why 'Open the Top Drawer' Is a Hard AI Problem
AI pioneer Fei-Fei Li breaks down why a simple instruction like 'open the top drawer and watch out for the vase' represents a major unsolved challenge in robotics, requiring robust perception, commonsense reasoning, and efficient learning from sparse rewards.
Paper Proposes 'Artificial Scientist' as New AGI Definition
A new paper defines AGI as an 'artificial scientist'—a system that adapts as generally as a human scientist under computational limits. This reframes the goal from passing benchmarks to autonomous planning, causal learning, and exploration.
Principal Engineer: Claude Code Rushes, Codex Deliberate; Guardrails Are Key
A senior engineer with 100 hours in Claude Code and 20 in Codex reports Claude often rushes to patch, while Codex is more deliberate. The real product is the guardrail system—docs and review loops—not the AI itself.
Google's Auto-Diagnose AI Hits 90% Accuracy Debugging Test Failures
Google researchers built Auto-Diagnose, an LLM tool that analyzes failure logs to suggest root causes. It achieved 90.14% accuracy in evaluation and was used on over 52,000 distinct failing tests after company-wide deployment.
Bi-Predictability: A New Real-Time Metric for Monitoring LLM
A new arXiv paper introduces 'bi-predictability' (P), an information-theoretic measure, and a lightweight Information Digital Twin (IDT) architecture to monitor the structural integrity of multi-turn LLM conversations in real-time. It detects a 'silent uncoupling' regime where outputs remain semantically sound but the conversational thread degrades, offering a scalable tool for AI assurance.
SPPO: Sequence-Level PPO Cuts RL Training Time 5.9x for Math Reasoning
Researchers introduced SPPO, a sequence-level PPO algorithm that reformulates reasoning as a contextual bandit. It achieves a 5.9x speedup over GRPO while matching performance on AIME, AMC, and MATH benchmarks at 1.5B and 7B scales.
Tsinghua Researchers Diagnose On-Policy Distillation Failures, Propose Fixes
Researchers from Tsinghua University have pinpointed two necessary conditions for successful on-policy distillation: compatible thinking patterns and novel teacher capabilities. They propose two recovery methods to salvage failing distillation runs.
Pinterest Details 'Request-Level Deduplication' to Scale Massive
Pinterest's engineering team published a detailed technical breakdown of 'request-level deduplication'—a family of techniques that eliminate redundant processing of user data across thousands of candidate items in their recommendation system. This approach was critical to scaling their Foundation Model by 100x while controlling infrastructure costs.
AI Hiring Systems Drive 42.5% Graduate Underemployment, Frustrating Job Seekers
Young graduates face a 42.5% underemployment rate, the highest since 2020, with AI hiring systems creating a frustrating layer of resume optimization before human review. This occurs as broader AI adoption in business is still in its early stages.