hugging papers
30 articles about hugging papers in AI news
Recursive Multi-Agent Systems Top Hugging Papers; Eywa Bridges LLMs and Scientific Models
Recursive Multi-Agent Systems leads Hugging Papers with 242 upvotes. Eywa and OneManCompany signal a move from chat-based to structural agent collaboration.
Hugging Face OCRs 27,000 arXiv Papers to Markdown with Open 5B Model
Hugging Face CEO Clement Delangue announced the OCR conversion of 27,000 arXiv papers to Markdown using an open 5B-parameter model and 16 parallel jobs on L40S GPUs. This demonstrates a scalable, open-source pipeline for large-scale academic document processing.
LOCUS-v1: 2.2M US Laws Hit HuggingFace via AI Pipeline
LOCUS-v1, a dataset of 2.2M US laws built via an AI pipeline, has been released on Hugging Face. It is billed as the first comprehensive legal database of its kind, but quality and validation metrics remain undisclosed.
Hugging Face Launches 'Kernels' Hub for GPU Code, Like GitHub for AI Hardware
Hugging Face has launched 'Kernels,' a new section on its Hub for sharing and discovering optimized GPU kernels. This treats performance-critical code as a first-class artifact, similar to AI models.
754B-Parameter AI Model Hits Hugging Face, Weighs 1.51TB
An unidentified 754-billion-parameter AI model has been uploaded to the Hugging Face platform, occupying 1.51TB of storage. That makes it one of the largest publicly accessible model repositories by size.
Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard
Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.
Black Forest Labs Unleashes FLUX.2 klein: Sub-Second AI Image Generation Hits Hugging Face
Black Forest Labs has released FLUX.2 klein on Hugging Face, delivering state-of-the-art image generation and editing in under a second. The model runs on consumer GPUs with 13GB of VRAM, putting high-speed image generation within reach of more users.
Google Releases Magenta RealTime 2 for Open-Weight Music Generation
Google released Magenta RealTime 2 on Hugging Face, the only open-weights model for real-time continuous music generation on device with ~200ms latency.
30B-A3B Reasoning Model Hits Gold Medal on Physics, Math Olympiads
A 30B-A3B reasoning model from @stingning achieves gold-medal-level results on physics and math Olympiads and has been released on Hugging Face.
Massive Open-Source Dataset of Computer Screen Recordings Released to Train AI Agents
Researchers have released the world's largest open-source dataset of computer-use recordings on Hugging Face. The collection contains 48,478 screen recording videos totaling approximately 12,300 hours of professional software usage, licensed under CC-BY-4.0 for AI training and evaluation.
NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding
NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to accelerate inference for Moonshot's reasoning models. The approach aims to speed up generation while maintaining accuracy.
Microsoft's VibeVoice-ASR Shatters Transcription Limits with 60-Minute Single-Pass Processing
Microsoft has released VibeVoice-ASR on Hugging Face, a speech recognition model that transcribes 60-minute audio in a single pass, with speaker diarization, timestamps, and multilingual support across 50+ languages, all without configuration.
AI Research Breakthroughs: From Video Reasoning to Self-Stopping Models
This week's top AI papers reveal major advances in video understanding, reasoning efficiency, and agent training. Researchers introduced a massive video reasoning dataset, models that know when to stop thinking, and techniques for improving AI agents without full retraining.
Mirage: Microsoft's 10.57x faster video gen skips RGB render loop
Microsoft's Mirage stores 3D scenes as latent tokens, achieving 10.57x faster video generation and 55x less memory, with SOTA WorldScore consistency.
DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory
DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.
Larger models learn rare skills by forgetting them less, new paper shows
New paper from Stanford, MIT, Harvard, and Anthropic shows larger models learn rare skills because they forget them less during training, tested on OLMo models from 4M to 4B parameters.
dMoE Cuts Active Experts from 69.5 to 14.6, Retains 99.11% Performance
dMoE reduces active experts from 69.5 to 14.6 in diffusion LLMs, retaining 99.11% performance while cutting memory 80% and speeding inference 1.66×.
JetBrains Open-Sources Mellum2: 12B MoE at 2.5B Active Params
JetBrains open-sourced Mellum2, a 12B MoE model with 2.5B active params, trained from scratch for code and reasoning.
SpatialBench: New Benchmark Tests Foundation Models on 3D Tasks
SpatialBench, a new benchmark from ropedia_ai, evaluates spatial foundation models across 7 tasks and 5 datasets, testing depth estimation, surface normal prediction, and 3D object detection.
Microsoft SkillOpt Trains Agent Skills in Text Space, Best or Tied-Best in 52/52 Settings
Microsoft's SkillOpt trains agent skills in text space, achieving best or tied-best results in all 52 settings across 6 benchmarks and 7 models.
Alibaba + Nanjing Univ Claim 9.36X Faster Million-Token Prefill vs FlashAttention-2
Alibaba and Nanjing University claim 9.36x faster million-token prefill than FlashAttention-2, targeting a key bottleneck in long-context LLM inference.
ByteDance Lance 3B MoE Beats 7B Models on Multimodal Benchmarks
ByteDance released Lance, a 3B multimodal MoE model that beats 7B+ models on benchmarks through multi-task synergy and specialized pathways.
Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim
A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.
SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld
SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.
Prithvi-EO Fails Cross-Country Crop Yield Generalization, Paper Shows
Prithvi-EO and ViT-Base embeddings produce negative R² across the board in cross-country maize yield prediction, failing to beat traditional spectral features because of yield distribution shift.
Microsoft Paper Probes Long-Horizon Agent Generalization Gap
Microsoft Research paper on long-horizon agent generalization identifies failure modes and proposes improvements for extended tasks.
AllenAI's MolmoAct2: 720-Hour Bimanual Dataset, Beats GPT-5 on Robotics
AllenAI released MolmoAct2, an open robotics model with a 720-hour bimanual dataset, beating GPT-5 and Gemini Robotics on success rate (89.4% vs 82.1%) with 40% lower latency.
Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels
Ctx2Skill discovers skills from context via multi-agent self-play without labels. Its outputs plug into any LM, targeting the bottleneck of manual prompt engineering.
New CASIA Benchmark Exposes Fragmented Face Swapping Evaluation
CASIA researchers released a face swapping survey and benchmark on April 27, 2026, aiming to standardize evaluation across fragmented GAN and diffusion model methods.
ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples
ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines when trained on 8B samples.