model distillation

30 articles about model distillation in AI news

Apple Reportedly Gains Full Internal Access to Google's Gemini for On-Device Model Distillation

A report claims Apple's AI deal with Google includes full internal model access, enabling distillation of Gemini's reasoning into smaller, on-device models. This would allow Apple to build specialized, efficient AI without relying solely on cloud APIs.

Mar 25, 202695% relevant

OpenAI, Anthropic, Google Form Alliance to Block Chinese Model Distillation

OpenAI, Anthropic, and Google are collaborating through the Frontier Model Forum to share intelligence and prevent Chinese firms from distilling their advanced AI models. This formalizes defensive measures in the US-China AI race.

Apr 7, 202685% relevant

Anthropic's Distillation Allegations Reveal AI's Uncharted Legal Frontier

Anthropic's claims that Chinese AI firms used thousands of fake accounts to extract capabilities from Claude models highlight the legal grey area of model distillation. The incident coincides with Anthropic relaxing its safety policies amid Pentagon pressure.

Feb 24, 202675% relevant

Subliminal Transfer Study Shows AI Agents Inherit Unsafe Behaviors Despite

New research demonstrates unsafe behavioral traits in AI agents can transfer subliminally through model distillation, with students inheriting deletion biases despite rigorous keyword filtering. This exposes a critical security flaw in agent training pipelines.

Apr 20, 2026100% relevant

The AI Espionage Era: How Chinese Firms Launched Industrial-Scale Attacks on Claude

Anthropic reveals three massive AI model distillation campaigns by Chinese competitors who used 24,000 fake accounts to extract Claude's capabilities through 16 million exchanges. This industrial-scale intellectual property theft highlights growing tensions in the global AI race.

Feb 24, 202685% relevant

The AI Espionage Frontier: Anthropic Exposes Systematic Claude Data Extraction by Chinese AI Labs

Anthropic has revealed that Chinese AI companies DeepSeek, Moonshot, and MiniMax allegedly used 24,000 fake accounts to execute 16 million queries against Claude's API, systematically extracting its capabilities through model distillation techniques. This sophisticated operation bypassed access restrictions and targeted Claude's reasoning, programming, and tool usage functions.

Feb 23, 202680% relevant

Aligning Language Models from User Interactions: A Self-Distillation Method for Continuous Learning

Researchers propose a method to align LLMs using raw, multi-turn user conversations. By applying self-distillation on follow-up messages, models improve without explicit feedback, enabling personalization and continual adaptation from deployment data.

Mar 16, 202677% relevant

RLSD Unifies Self-Distillation & Verifiable Rewards to Fix RL Leakage

Researchers propose RLSD, a method merging on-policy self-distillation with verifiable rewards to fix information leakage and training instability in language model reinforcement learning.

Apr 6, 202685% relevant

Zero-Shot Cross-Domain Knowledge Distillation: A YouTube-to-Music Case Study

Google researchers detail a case study transferring knowledge from YouTube's massive video recommender to a smaller music app, using zero-shot cross-domain distillation to boost ranking models without training a dedicated teacher. This offers a practical blueprint for improving low-traffic AI systems.

Apr 1, 202696% relevant

New Pipeline Enables Lossless Distillation of Transformer LLMs into Hybrid xLSTM Architectures

Researchers developed a distillation pipeline that transfers transformer LLM knowledge into hybrid xLSTM models. The distilled students match or exceed teacher models like Llama, Qwen, and Olmo on downstream tasks.

Mar 22, 202685% relevant

Tsinghua Researchers Diagnose On-Policy Distillation Failures, Propose Fixes

Researchers from Tsinghua University have pinpointed two necessary conditions for successful on-policy distillation: compatible thinking patterns and novel teacher capabilities. They propose two recovery methods to salvage failing distillation runs.

Apr 15, 202685% relevant

DRKL: Diversity-Aware Reverse KL Divergence Fixes Overconfidence in LLM Distillation

A new paper proposes Diversity-aware Reverse KL (DRKL), a fix for the overconfidence and reduced diversity caused by the popular Reverse KL divergence in LLM distillation. DRKL consistently outperforms existing objectives across multiple benchmarks.

Apr 2, 202680% relevant

Embedding Matching Distills Genomic Models 200x, Matches mRNA-Bench Performance

A new distillation framework transfers mRNA representations from a large genomic foundation model to a specialized model 200x smaller. It uses embedding-level distillation, outperforming logit-based methods and competing with larger models on mRNA-bench.

Apr 13, 202686% relevant

Anthropic Exposes Massive AI Model Theft Operation Targeting Claude

Anthropic has uncovered sophisticated 'distillation' campaigns by Chinese AI firms DeepSeek, Moonshot, and MiniMax, who allegedly used thousands of fraudulent accounts to copy Claude's capabilities. The operation generated over 16 million exchanges to replicate Claude's reasoning and coding strengths.

Feb 23, 202695% relevant

SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld

SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.

May 15, 202685% relevant

Kuaishou's Dual-Rerank: A New Industrial Framework for High-Stakes

Researchers from Kuaishou introduce Dual-Rerank, a framework designed for industrial-scale generative reranking. It addresses the dual dilemma of structural trade-offs (AR vs. NAR models) and optimization gaps (SL vs. RL) through Sequential Knowledge Distillation and List-wise Decoupled Reranking Optimization. A/B tests on production traffic show significant improvements in user satisfaction and watch time with reduced latency.

Apr 10, 202682% relevant

DIET: A New Framework for Continually Distilling Streaming Datasets in Recommender Systems

Researchers propose DIET, a framework for streaming dataset distillation in recommender systems. It maintains a compact, evolving dataset (1-2% of original size) that preserves training-critical signals, reducing model iteration costs by up to 60x while maintaining performance trends.

Mar 27, 202688% relevant

PSAD: A New Framework for Efficient Personalized Reranking in Recommender Systems

Researchers propose PSAD, a novel reranking framework using semi-autoregressive generation and online knowledge distillation to balance ranking quality with low-latency inference. It addresses key deployment challenges for generative reranking models in production systems.

Mar 10, 202685% relevant

SymTorch Bridges the Gap Between Black Box AI and Human Understanding

Researchers introduce SymTorch, a framework that automatically converts neural network components into interpretable mathematical equations. This symbolic distillation approach could make AI systems more transparent while potentially accelerating inference, with early tests showing 8.3% throughput improvements in language models.

Feb 26, 202670% relevant

Distilled Agentic Workflow Runs at 100x Lower Inference Cost

A new paper shows agentic workflow distillation achieving 100x lower inference cost, but lacks benchmark details.

May 22, 202687% relevant

NVIDIA Open-Sources Motion Diffusion Model for Humanoid Robots

NVIDIA open-sourced Kimono, a motion diffusion model for humanoid robots, trained on 700 hours of motion capture data. It generates 3D human and robot motions from text prompts, supports keyframe and end-effector control, and runs on Unitree G1.

Apr 23, 202685% relevant

Modly Desktop App Generates 3D Models from Images, Runs Locally

A developer has launched Modly, a desktop application that creates 3D models from images and processes them entirely on a user's local machine, eliminating cloud dependency.

Apr 20, 202689% relevant

Why the Best Generative AI Projects Start With the Most Powerful Model —

The article suggests that while initial AI projects leverage the broad capabilities of large foundation models, the most successful implementations eventually transition to smaller, more targeted systems. This reflects a maturation from experimentation to production optimization.

Apr 16, 202672% relevant

LPM 1.0: 17B-Parameter Diffusion Model Generates 60K-Second AI Avatar Videos

Researchers introduced LPM 1.0, a 17B-parameter real-time diffusion model that generates infinite-length conversational videos with stable identity, achieving over 60,000 seconds of consistent character performance.

Apr 12, 202695% relevant

Meta's New Training Recipe: Small Models Should Learn from a Single Expert

Meta AI researchers propose a novel training recipe for small language models: instead of learning from many large 'expert' models simultaneously, they should be trained sequentially on one expert at a time. This method, detailed in a new paper, reportedly improves final model performance and training efficiency.

Apr 9, 202685% relevant

FGR-ColBERT: A New Retrieval Model That Pinpoints Relevant Text Spans Efficiently

A new arXiv paper introduces FGR-ColBERT, a modified ColBERT retrieval model that integrates fine-grained relevance signals distilled from an LLM. It achieves high token-level accuracy while preserving retrieval efficiency, offering a practical alternative to post-retrieval LLM analysis.

Apr 2, 202672% relevant

Diffusion Recommender Models Fail Reproducibility Test: Study Finds 'Illusion of Progress' in Top-N Recommendation Research

A reproducibility study of nine recent diffusion-based recommender models finds only 25% of reported results are reproducible. Well-tuned simpler baselines outperform the complex models, revealing a conceptual mismatch and widespread methodological flaws in the field.

Mar 30, 202682% relevant

Google Announces Gemini 3.1 Flash Live: A New Real-Time AI Model

Google has announced Gemini 3.1 Flash Live, a new model variant focused on real-time, low-latency AI interactions. The announcement came via a developer tweet, indicating a potential push for faster, more responsive AI applications.

Mar 26, 202695% relevant

CanViT: First Active-Vision Foundation Model Hits 45.9% mIoU on ADE20K with Sequential Glimpses

Researchers introduce CanViT, the first task- and policy-agnostic Active-Vision Foundation Model (AVFM). It achieves 38.5% mIoU on ADE20K segmentation with a single low-resolution glimpse, outperforming prior active models while using 19.5x fewer FLOPs.

Mar 25, 202691% relevant

The Hidden Cost of Mixture-of-Experts: New Research Reveals Why MoE Models Struggle at Inference

A groundbreaking paper introduces the 'qs inequality,' revealing how Mixture-of-Experts architectures suffer a 'double penalty' during inference that can make them 4.5x slower than dense models. The research shows training efficiency doesn't translate to inference performance, especially with long contexts.

Mar 11, 202675% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety