model distillation
30 articles about model distillation in AI news
Apple Reportedly Gains Full Internal Access to Google's Gemini for On-Device Model Distillation
A report claims Apple's AI deal with Google includes full internal model access, enabling distillation of Gemini's reasoning into smaller, on-device models. This would allow Apple to build specialized, efficient AI without relying solely on cloud APIs.
OpenAI, Anthropic, Google Form Alliance to Block Chinese Model Distillation
OpenAI, Anthropic, and Google are collaborating through the Frontier Model Forum to share intelligence and prevent Chinese firms from distilling their advanced AI models. This formalizes defensive measures in the US-China AI race.
Anthropic's Distillation Allegations Reveal AI's Uncharted Legal Frontier
Anthropic's claims that Chinese AI firms used thousands of fake accounts to extract capabilities from Claude models highlight the legal grey area of model distillation. The incident coincides with Anthropic relaxing its safety policies amid Pentagon pressure.
Subliminal Transfer Study Shows AI Agents Inherit Unsafe Behaviors Despite
New research demonstrates unsafe behavioral traits in AI agents can transfer subliminally through model distillation, with students inheriting deletion biases despite rigorous keyword filtering. This exposes a critical security flaw in agent training pipelines.
The AI Espionage Era: How Chinese Firms Launched Industrial-Scale Attacks on Claude
Anthropic reveals three massive AI model distillation campaigns by Chinese competitors who allegedly used 24,000 fake accounts to extract Claude's capabilities through 16 million exchanges. The alleged industrial-scale intellectual property theft highlights growing tensions in the global AI race.
The AI Espionage Frontier: Anthropic Exposes Systematic Claude Data Extraction by Chinese AI Labs
Anthropic has revealed that Chinese AI companies DeepSeek, Moonshot, and MiniMax allegedly used 24,000 fake accounts to execute 16 million queries against Claude's API, systematically extracting its capabilities through model distillation techniques. This sophisticated operation bypassed access restrictions and targeted Claude's reasoning, programming, and tool usage functions.
Aligning Language Models from User Interactions: A Self-Distillation Method for Continuous Learning
Researchers propose a method to align LLMs using raw, multi-turn user conversations. By applying self-distillation on follow-up messages, models improve without explicit feedback, enabling personalization and continual adaptation from deployment data.
RLSD Unifies Self-Distillation & Verifiable Rewards to Fix RL Leakage
Researchers propose RLSD, a method merging on-policy self-distillation with verifiable rewards to fix information leakage and training instability in language model reinforcement learning.
Zero-Shot Cross-Domain Knowledge Distillation: A YouTube-to-Music Case Study
Google researchers detail a case study transferring knowledge from YouTube's massive video recommender to a smaller music app, using zero-shot cross-domain distillation to boost ranking models without training a dedicated teacher. This offers a practical blueprint for improving low-traffic AI systems.
New Pipeline Enables Lossless Distillation of Transformer LLMs into Hybrid xLSTM Architectures
Researchers developed a distillation pipeline that transfers transformer LLM knowledge into hybrid xLSTM models. The distilled students match or exceed teacher models like Llama, Qwen, and Olmo on downstream tasks.
Tsinghua Researchers Diagnose On-Policy Distillation Failures, Propose Fixes
Researchers from Tsinghua University have pinpointed two necessary conditions for successful on-policy distillation: compatible thinking patterns and novel teacher capabilities. They propose two recovery methods to salvage failing distillation runs.
DRKL: Diversity-Aware Reverse KL Divergence Fixes Overconfidence in LLM Distillation
A new paper proposes Diversity-aware Reverse KL (DRKL), a fix for the overconfidence and reduced diversity caused by the popular Reverse KL divergence in LLM distillation. DRKL consistently outperforms existing objectives across multiple benchmarks.
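The summary does not give DRKL's formula, so the sketch below shows only the two standard objectives it builds on: forward KL(teacher || student) and reverse KL(student || teacher) over a toy three-token vocabulary. The distributions are made-up illustrative values, not data from the paper. Reverse KL is mode-seeking: it penalizes a student that piles probability onto one of the teacher's likely tokens less than forward KL does, which is the overconfidence and diversity loss DRKL targets.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, as commonly used to soften teacher logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: List[float], q: List[float]) -> float:
    """KL(p || q) over a shared vocabulary; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def forward_kl(teacher: List[float], student: List[float]) -> float:
    """KL(teacher || student): mass-covering, punishes a student that misses teacher modes."""
    return kl(teacher, student)


def reverse_kl(teacher: List[float], student: List[float]) -> float:
    """KL(student || teacher): mode-seeking, tolerates a student that collapses onto one mode."""
    return kl(student, teacher)


# Illustrative toy distributions (hypothetical values).
teacher = [0.45, 0.45, 0.10]
overconfident = [0.90, 0.05, 0.05]  # student that collapses onto the first token
print(forward_kl(teacher, overconfident), reverse_kl(teacher, overconfident))
```

Here forward KL is about 0.75 while reverse KL is about 0.48, so a reverse-KL-trained student feels less pressure to spread probability back onto the second token the teacher also rates highly.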
Embedding Matching Distills Genomic Models 200x, Matches mRNA-Bench Performance
A new distillation framework transfers mRNA representations from a large genomic foundation model to a specialized model 200x smaller. It uses embedding-level distillation, outperforming logit-based methods and competing with larger models on mRNA-bench.
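The summary contrasts embedding-level with logit-level distillation but does not state the loss. A minimal sketch of the embedding-level idea, assuming a cosine-distance loss and a linear projection `W` that maps the smaller student's embeddings into the teacher's dimension; both choices, and the names `project` and `embedding_distill_loss`, are hypothetical. In practice `W` would be learned jointly with the student rather than fixed.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Vector) -> Vector:
    """Map a student embedding (length d_student) to the teacher's space; W is d_teacher x d_student."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cos(a, b): 0 for parallel vectors, 2 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def embedding_distill_loss(W: Matrix, student_batch: List[Vector], teacher_batch: List[Vector]) -> float:
    """Mean cosine distance between projected student embeddings and teacher embeddings."""
    losses = [cosine_distance(project(W, s), t) for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)


# Toy example: 2-dim student, 3-dim teacher (hypothetical values).
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(embedding_distill_loss(W, [[1.0, 2.0]], [[2.0, 4.0, 6.0]]))
```

Unlike a logit-matching loss, nothing here depends on the teacher's output head, which is why this style of objective suits representation models whose value lies in their embeddings.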
Anthropic Exposes Massive AI Model Theft Operation Targeting Claude
Anthropic has uncovered sophisticated 'distillation' campaigns by Chinese AI firms DeepSeek, Moonshot, and MiniMax, who allegedly used thousands of fraudulent accounts to copy Claude's capabilities. The operation generated over 16 million exchanges to replicate Claude's reasoning and coding strengths.
SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld
SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.
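GRPO (Group Relative Policy Optimization) scores each sampled response against the other responses drawn for the same prompt instead of using a learned value model. The sketch below shows only that group-normalized advantage, assuming population standard deviation and a small epsilon (implementations differ on both). SDAR's self-distillation gate is not reproduced because the summary does not specify it.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Advantage of each response = (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses to one prompt with binary verifiable rewards (hypothetical values).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

When every response in a group earns the same reward, all advantages are zero and the group contributes no policy-gradient signal, one source of the instability that methods such as SDAR try to counter.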
Kuaishou's Dual-Rerank: A New Industrial Framework for High-Stakes
Researchers from Kuaishou introduce Dual-Rerank, a framework designed for industrial-scale generative reranking. It addresses two coupled problems, a structural trade-off between autoregressive (AR) and non-autoregressive (NAR) models and an optimization gap between supervised learning (SL) and reinforcement learning (RL), through Sequential Knowledge Distillation and List-wise Decoupled Reranking Optimization. A/B tests on production traffic show significant improvements in user satisfaction and watch time with reduced latency.
DIET: A New Framework for Continually Distilling Streaming Datasets in Recommender Systems
Researchers propose DIET, a framework for streaming dataset distillation in recommender systems. It maintains a compact, evolving dataset (1-2% of original size) that preserves training-critical signals, reducing model iteration costs by up to 60x while maintaining performance trends.
PSAD: A New Framework for Efficient Personalized Reranking in Recommender Systems
Researchers propose PSAD, a novel reranking framework using semi-autoregressive generation and online knowledge distillation to balance ranking quality with low-latency inference. It addresses key deployment challenges for generative reranking models in production systems.
SymTorch Bridges the Gap Between Black Box AI and Human Understanding
Researchers introduce SymTorch, a framework that automatically converts neural network components into interpretable mathematical equations. This symbolic distillation approach could make AI systems more transparent while potentially accelerating inference, with early tests showing 8.3% throughput improvements in language models.
Distilled Agentic Workflow Runs at 100x Lower Inference Cost
A new paper reports that distilling an agentic workflow cuts inference cost by 100x, though it provides few benchmark details to support the claim.
NVIDIA Open-Sources Motion Diffusion Model for Humanoid Robots
NVIDIA open-sourced Kimono, a motion diffusion model for humanoid robots, trained on 700 hours of motion capture data. It generates 3D human and robot motions from text prompts, supports keyframe and end-effector control, and runs on Unitree G1.
Modly Desktop App Generates 3D Models from Images, Runs Locally
A developer has launched Modly, a desktop application that creates 3D models from images and processes them entirely on a user's local machine, eliminating cloud dependency.
Why the Best Generative AI Projects Start With the Most Powerful Model —
The article suggests that while initial AI projects leverage the broad capabilities of large foundation models, the most successful implementations eventually transition to smaller, more targeted systems. This reflects a maturation from experimentation to production optimization.
LPM 1.0: 17B-Parameter Diffusion Model Generates 60K-Second AI Avatar Videos
Researchers introduced LPM 1.0, a 17B-parameter real-time diffusion model that generates infinite-length conversational videos with stable identity, achieving over 60,000 seconds of consistent character performance.
Meta's New Training Recipe: Small Models Should Learn from a Single Expert
Meta AI researchers propose a novel training recipe for small language models: instead of learning from many large 'expert' models simultaneously, they should be trained sequentially on one expert at a time. This method, detailed in a new paper, reportedly improves final model performance and training efficiency.
FGR-ColBERT: A New Retrieval Model That Pinpoints Relevant Text Spans Efficiently
A new arXiv paper introduces FGR-ColBERT, a modified ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM. It achieves high token-level accuracy while preserving retrieval efficiency, offering a practical alternative to post-retrieval LLM analysis.
Diffusion Recommender Models Fail Reproducibility Test: Study Finds 'Illusion of Progress' in Top-N Recommendation Research
A reproducibility study of nine recent diffusion-based recommender models finds only 25% of reported results are reproducible. Well-tuned simpler baselines outperform the complex models, revealing a conceptual mismatch and widespread methodological flaws in the field.
Google Announces Gemini 3.1 Flash Live: A New Real-Time AI Model
Google has announced Gemini 3.1 Flash Live, a new model variant focused on real-time, low-latency AI interactions. The announcement came via a developer tweet, indicating a potential push for faster, more responsive AI applications.
CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses
Researchers introduce CanViT, the first task- and policy-agnostic Active-Vision Foundation Model (AVFM). It achieves 38.5% mIoU on ADE20K segmentation with a single low-resolution glimpse, outperforming prior active models while using 19.5x fewer FLOPs.
The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference
A new paper introduces the 'qs inequality,' showing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them up to 4.5x slower than dense models. The research shows that MoE training efficiency does not translate into inference performance, especially with long contexts.