hugging papers

30 articles about hugging papers in AI news

Recursive Multi-Agent Systems Top Hugging Papers; Eywa Bridges LLMs and Scientific Models

Recursive Multi-Agent Systems leads Hugging Papers with 242 upvotes. Eywa and OneManCompany signal a move from chat-based to structural agent collaboration.

May 3, 202689% relevant

Hugging Face OCRs 27,000 arXiv Papers to Markdown with Open 5B Model

Hugging Face CEO Clement Delangue announced the OCR conversion of 27,000 arXiv papers to Markdown using an open 5B-parameter model and 16 parallel jobs on L40S GPUs. This demonstrates a scalable, open-source pipeline for large-scale academic document processing.

Apr 14, 202685% relevant

LOCUS-v1: 2.2M US Laws Hit HuggingFace via AI Pipeline

LOCUS-v1, a dataset of 2.2M US laws built via AI pipeline, released on HuggingFace. First comprehensive legal database of its kind, but quality and validation metrics remain undisclosed.

Jun 21, 202685% relevant

Hugging Face Launches 'Kernels' Hub for GPU Code, Like GitHub for AI Hardware

Hugging Face has launched 'Kernels,' a new section on its Hub for sharing and discovering optimized GPU kernels. This treats performance-critical code as a first-class artifact, similar to AI models.

Apr 14, 202685% relevant

754B-Parameter AI Model Hits Hugging Face, Weighs 1.51TB

An unidentified 754-billion-parameter AI model has been uploaded to the Hugging Face platform, consuming 1.51TB of space. This represents one of the largest publicly accessible model repositories by size.

Apr 7, 202685% relevant

Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard

Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.

Mar 27, 202695% relevant

Black Forest Labs Unleashes FLUX.2 klein: Sub-Second AI Image Generation Hits Hugging Face

Black Forest Labs has released FLUX.2 klein on Hugging Face, delivering state-of-the-art image generation and editing in under a second. The model runs on consumer GPUs with just 13GB VRAM, making high-speed AI art creation dramatically more accessible.

Mar 14, 202685% relevant

Google Releases Magenta RealTime 2 for Open-Weight Music Generation

Google released Magenta RealTime 2 on Hugging Face, the only open-weights model for real-time continuous music generation on device with ~200ms latency.

Jun 3, 202685% relevant

30B-A3B Reasoning Model Hits Gold Medal on Physics, Math Olympiads

30B-A3B reasoning model from @stingning achieves gold-medal level on physics and math Olympiads, released on Hugging Face.

May 16, 202687% relevant

Massive Open-Source Dataset of Computer Screen Recordings Released to Train AI Agents

Researchers have released the world's largest open-source dataset of computer-use recordings on Hugging Face. The collection contains 48,478 screen recording videos totaling approximately 12,300 hours of professional software usage, licensed under CC-BY-4.0 for AI training and evaluation.

Mar 14, 202697% relevant

NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding

NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to dramatically accelerate inference for Moonshot's reasoning models. This breakthrough promises blazing-fast performance while maintaining accuracy.

Mar 12, 202689% relevant

Microsoft's VibeVoice-ASR Shatters Transcription Limits with 60-Minute Single-Pass Processing

Microsoft has released VibeVoice-ASR on Hugging Face, a revolutionary speech recognition model that transcribes 60-minute audio in one pass with speaker diarization, timestamps, and multilingual support across 50+ languages without configuration.

Mar 2, 202685% relevant

AI Research Breakthroughs: From Video Reasoning to Self-Stopping Models

This week's top AI papers reveal major advances in video understanding, reasoning efficiency, and agent training. Researchers introduced a massive video reasoning dataset, models that know when to stop thinking, and techniques for improving AI agents without full retraining.

Mar 1, 202695% relevant

Mirage: Microsoft's 10.57x faster video gen skips RGB render loop

Microsoft's Mirage stores 3D scenes as latent tokens, achieving 10.57x faster video generation and 55x less memory, with SOTA WorldScore consistency.

Jun 9, 202692% relevant

DeepSeek-V4 Hits 500K Context with 90% Less KV Cache via FlashMemory

DeepSeek-V4 achieves 500K context with 90% less KV cache via FlashMemory's lookahead sparse attention, keeping only 13.5% of cache in GPU memory without retraining.

Jun 9, 202698% relevant

Larger models learn rare skills by forgetting them less, new paper shows

New paper from Stanford, MIT, Harvard, and Anthropic shows larger models learn rare skills because they forget them less during training, tested on OLMo models from 4M to 4B parameters.

Jun 8, 202688% relevant

dMoE Cuts Active Experts from 69.5 to 14.6, Retains 99.11% Performance

dMoE reduces active experts from 69.5 to 14.6 in diffusion LLMs, retaining 99.11% performance while cutting memory 80% and speeding inference 1.66×.

Jun 7, 202685% relevant

JetBrains Open-Sources Mellum2: 12B MoE at 2.5B Active Params

JetBrains open-sourced Mellum2, a 12B MoE model with 2.5B active params, trained from scratch for code and reasoning.

Jun 6, 202690% relevant

SpatialBench: New Benchmark Tests Foundation Models on 3D Tasks

SpatialBench, a new benchmark from ropedia_ai, evaluates spatial foundation models across 7 tasks and 5 datasets, testing depth estimation, surface normal prediction, and 3D object detection.

May 27, 202691% relevant

Microsoft SkillOpt Trains Agent Skills in Text Space, Beats 52/52 Benchmarks

Microsoft's SkillOpt trains agent skills in text space, achieving best or tied-best results in all 52 settings across 6 benchmarks and 7 models.

May 25, 202689% relevant

Alibaba + Nanjing Univ Claim 9.36X Faster Million-Token Prefill vs FlashAttention-2

Alibaba + Nanjing Univ claim 9.36X faster million-token prefill vs FlashAttention-2, targeting the key bottleneck in long-context LLM inference.

May 25, 202685% relevant

ByteDance Lance 3B MoE Beats 7B Models on Multimodal Benchmarks

ByteDance released Lance, a 3B multimodal MoE model that beats 7B+ models on benchmarks through multi-task synergy and specialized pathways.

May 19, 202690% relevant

Hybrid A*+RL Agent Beats Pure End-to-End in Unity SR-71 Sim

A hybrid A* + deep RL agent in Unity, trained over 5M PPO steps, switches between classical path planning and learned evasion to navigate an SR-71 through a maze while dodging missiles.

May 16, 202684% relevant

SDAR: Self-Distilled RL Stabilizes Multi-Turn LLM Agents, +9.4% on ALFWorld

SDAR gates self-distillation within GRPO to stabilize multi-turn LLM agent training, yielding +9.4% on ALFWorld and gains on WebShop and Search-QA across Qwen2.5 and Qwen3 models.

May 15, 202685% relevant

Prithvi-EO Fails Cross-Country Crop Yield Generalization, Paper Shows

Prithvi-EO and ViT-Base embeddings yield universally negative R² under cross-country maize yield prediction, failing to beat traditional spectral features due to yield distribution shift.

May 12, 202672% relevant

Microsoft Paper Probes Long-Horizon Agent Generalization Gap

Microsoft Research paper on long-horizon agent generalization identifies failure modes and proposes improvements for extended tasks.

May 6, 202675% relevant

AllenAI's MolmoAct2: 720-Hour Bimanual Dataset, Beats GPT-5 on Robotics

AllenAI released MolmoAct2, an open robotics model with a 720-hour bimanual dataset, beating GPT-5 and Gemini Robotics on success rate (89.4% vs 82.1%) with 40% lower latency.

May 5, 202695% relevant

Ctx2Skill: Self-Play Framework Lets LMs Discover Skills Without Labels

Ctx2Skill discovers skills from context via multi-agent self-play without labels. Outputs plug into any LM, targeting manual prompt engineering bottlenecks.

May 5, 202685% relevant

New CASIA Benchmark Exposes Fragmented Face Swapping Evaluation

CASIA researchers released a face swapping survey and benchmark on April 27, 2026, aiming to standardize evaluation across fragmented GAN and diffusion model methods.

May 5, 202674% relevant

ByteDance GenLIP: ViT Predicts Language Tokens Directly with 8B Samples

ByteDance's GenLIP trains ViTs to predict language tokens directly with a single autoregressive objective, outperforming baselines on 8B samples.

May 4, 202685% relevant

Explore More

AI Agents Large Language Models Claude Code OpenAI RAG MCP Fine-tuning Benchmarks Open Source AI AI Safety