Hugging Face
30 articles about Hugging Face in AI news
754B-Parameter AI Model Hits Hugging Face, Weighs 1.51TB
An unidentified 754-billion-parameter AI model has been uploaded to the Hugging Face platform, consuming 1.51TB of space. This represents one of the largest publicly accessible model repositories by size.
Cohere Transcribe: 2B-Parameter Open-Source ASR Model Achieves 5.42% WER, Topping Hugging Face Leaderboard
Cohere released Transcribe, a 2B-parameter open-source speech recognition model. It claims a 5.42% average word error rate, beating OpenAI Whisper v3 and topping the Hugging Face Open ASR Leaderboard.
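Word error rate figures like the 5.42% above are word-level edit distance divided by reference length. A minimal sketch of the metric (for intuition only, not the leaderboard's exact implementation):

```python
# Minimal word error rate (WER): Levenshtein distance over words,
# normalized by reference length. Illustrative sketch only.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

score = wer("the cat sat on the mat", "the cat sat on a mat")
print(score)  # one substitution out of six words -> 1/6
```

A leaderboard average is this value computed per utterance and aggregated across benchmark datasets.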
Baidu's Qianfan-OCR End-to-End Document Intelligence Model Released on Hugging Face
Baidu has released Qianfan-OCR, an end-to-end document intelligence model, on Hugging Face. The model appears to be a unified framework for optical character recognition and document understanding tasks.
Mistral Deletes Magistral, Pixtral, and Devst Models from Hugging Face Hub
Mistral AI has removed three of its models—Magistral (reasoning), Pixtral (multimodal), and Devst—from the Hugging Face Hub. The deletions, confirmed via the platform's commit history, were unannounced, leaving developers to speculate about the company's strategy.
Black Forest Labs Unleashes FLUX.2 klein: Sub-Second AI Image Generation Hits Hugging Face
Black Forest Labs has released FLUX.2 klein on Hugging Face, delivering state-of-the-art image generation and editing in under a second. The model runs on consumer GPUs with just 13GB VRAM, making high-speed AI art creation dramatically more accessible.
NVIDIA Releases NVPanoptix-3D on Hugging Face: Single-Image 3D Indoor Scene Reconstruction
NVIDIA has open-sourced NVPanoptix-3D, a model that reconstructs complete 3D indoor scenes—including panoptic segmentation, depth, and geometry—from a single RGB image in one forward pass.
NVIDIA Releases Brain MRI Generation Model on Hugging Face: 3D Latent Diffusion for T1, FLAIR, T2, and SWI Scans
NVIDIA has open-sourced a 3D latent diffusion model for generating high-resolution brain MRI scans across four modalities. The model claims state-of-the-art FID scores and 33× faster inference than prior methods.
Alibaba's Qwen3.5-Omni Launches with Script-Level Captioning, Audio-Visual Vibe Coding, and Real-Time Web Search
Alibaba's Qwen team has released Qwen3.5-Omni, a multimodal model focused on interpreting images, audio, and video with new capabilities like script-level captioning and 'vibe coding'. It's open-access on Hugging Face but does not generate media.
Massive Open-Source Dataset of Computer Screen Recordings Released to Train AI Agents
Researchers have released the world's largest open-source dataset of computer-use recordings on Hugging Face. The collection contains 48,478 screen recording videos totaling approximately 12,300 hours of professional software usage, licensed under CC-BY-4.0 for AI training and evaluation.
NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding
NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to dramatically accelerate inference for Moonshot's reasoning models. Because the full model verifies every drafted token, the speedup comes without changing output quality.
Microsoft's VibeVoice-ASR Shatters Transcription Limits with 60-Minute Single-Pass Processing
Microsoft has released VibeVoice-ASR on Hugging Face, a revolutionary speech recognition model that transcribes 60-minute audio in one pass with speaker diarization, timestamps, and multilingual support across 50+ languages without configuration.
Hugging Face Launches Daily Papers SKILL.md for AI Agents to Read, Search, and Fetch Research Papers
Hugging Face released Daily Papers SKILL.md, a tool enabling AI agents to read paper content as markdown, search papers, find linked models/datasets, and fetch papers via API.
How AI Agents Are Learning to Scrape the Web and Fine-Tune Models in One Go
A developer has integrated web scraping capabilities into Hugging Face's fine-tuning skill, enabling AI agents to collect data from protected platforms and automatically train custom models. This addresses a major bottleneck in AI development workflows.
Open-Source Code Editor 'Cline' Integrates Claude Opus, GPT-4, and Gemini Pro via Single API
Developer Hasan Tohar announced 'Cline', an open-source code editor that integrates multiple top-tier AI models through a unified interface. The tool allows switching between Claude Opus, GPT-4, and Gemini Pro without managing separate API keys or subscriptions.
Seed1.8 Model Card Released: A 1.8B Parameter Foundation Model for Generalized Real-World AI Agents
Researchers have introduced Seed1.8, a 1.8 billion parameter foundation model designed for generalized real-world agency. It maintains strong LLM and vision-language capabilities while adding unified interfaces for search, code execution, and GUI interaction.
LlamaFactory Enables No-Code Fine-Tuning for 100+ LLMs Including Llama 4, Qwen, and DeepSeek
The LlamaFactory project eliminates traditional fine-tuning complexity with a point-and-click interface, supporting over 100 models. This reduces setup from hours of boilerplate code and CUDA debugging to a visual workflow.
Open-Source Web UI 'LLM Studio' Enables Local Fine-Tuning of 500+ Models, Including GGUF and Multimodal
LLM Studio, a free and open-source web interface, allows users to fine-tune over 500 large language models locally on their own hardware. It supports GGUF-quantized models, vision, audio, and embedding models across Mac, Windows, and Linux.
OpenBMB Launches VoxCPM 2, an Open-Source TTS Model Rivaling Qwen3-TTS
OpenBMB has launched VoxCPM 2, an open-source text-to-speech AI model from China. The release is positioned as a direct competitor to Alibaba's Qwen3-TTS, expanding the open-source TTS landscape.
Tiny 9M Parameter LLM Tutorial Runs on Colab, Demystifies Transformer Training
A developer shared a complete tutorial for training a ~9M parameter transformer language model from scratch, including tokenizer, training, and inference, all runnable on Google Colab in minutes.
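The core of any such from-scratch tutorial is the attention block. A single-head causal self-attention forward pass in NumPy gives the flavor; this is an illustrative sketch, not the tutorial author's code:

```python
# Single-head causal self-attention, the building block a from-scratch
# transformer tutorial assembles into layers. Illustrative sketch.
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq, seq)
    mask = np.triu(np.ones(scores.shape, bool), 1)   # future positions
    scores[mask] = -np.inf                           # causal masking
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (seq, d_head)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = causal_self_attention(x, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

The causal mask is what makes the model trainable as a next-token predictor: position i can only mix information from positions 0..i.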
Efficient Universal Perception Encoder (EUPE) Family Challenges DINOv2
Researchers introduced the Efficient Universal Perception Encoder (EUPE), a family of compact vision models that achieve performance rivaling the larger DINOv2. This could enable high-quality visual understanding on resource-constrained devices.
Stanford Releases Free LLM & Transformer Cheatsheets Covering LoRA, RAG, MoE
Stanford University has released a free, open-source collection of cheatsheets covering core LLM concepts from self-attention to RAG and LoRA. This provides a consolidated technical reference for engineers and researchers.
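One of the concepts such cheatsheets cover, LoRA, fits in a few lines: instead of updating a frozen weight matrix W, train a low-rank product BA so the adapted layer computes x(W + BA). A NumPy sketch for intuition, not any particular library's API:

```python
# LoRA in miniature: freeze W, train only the low-rank factors B and A.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.normal(size=(d_in, d_out))          # frozen pretrained weight
A = rng.normal(size=(rank, d_out)) * 0.01   # trainable, small init
B = np.zeros((d_in, rank))                  # trainable, zero init

def lora_forward(x):
    # B @ A starts at zero, so the adapted layer initially reproduces
    # the pretrained behavior exactly, then learns a low-rank delta.
    return x @ W + x @ B @ A

x = rng.normal(size=(2, d_in))
assert np.allclose(lora_forward(x), x @ W)  # identical before training

full = d_in * d_out
lora = rank * (d_in + d_out)
print(f"trainable params: {lora} vs full fine-tune {full}")  # 512 vs 4096
```

The parameter count rank*(d_in + d_out) versus d_in*d_out is why LoRA makes fine-tuning large models feasible on modest hardware.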
GPT4All Hits 77K GitHub Stars, Adds DeepSeek R1 for Free Local AI
The GPT4All project has surpassed 77,000 GitHub stars as it adds support for distilled DeepSeek R1 models, enabling reasoning-capable AI to run locally on consumer CPUs with zero API costs.
daVinci-LLM 3B Model Matches 7B Performance, Fully Open-Sourced
The daVinci-LLM team has open-sourced a 3 billion parameter model trained on 8 trillion tokens. Its performance matches typical 7B models, challenging the assumption that parameter count alone determines capability.
DeepSeek's HISA: Hierarchical Sparse Attention Cuts 64K Context Indexing Cost
DeepSeek researchers introduced HISA, a hierarchical sparse attention method that replaces flat token scanning. It removes a computational bottleneck at 64K context lengths without requiring any model retraining.
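The general idea of hierarchical sparse attention can be sketched for a single query vector: score cheap block-level summaries first, then run exact attention only inside the top-scoring blocks rather than scanning every token. This toy illustrates the two-stage pattern, not DeepSeek's actual implementation:

```python
# Two-stage sparse attention toy: coarse block index, then exact
# attention over selected blocks only. Illustrative, not HISA itself.
import numpy as np

def sparse_attend(q, K, V, block=8, top_blocks=2):
    """q: (d,); K, V: (n, d); n assumed divisible by block."""
    n, d = K.shape
    Kb = K.reshape(n // block, block, d)
    # Stage 1: one mean-pooled summary key per block (coarse index).
    summaries = Kb.mean(axis=1)                       # (n_blocks, d)
    keep = np.argsort(summaries @ q)[-top_blocks:]    # best blocks only
    # Stage 2: exact softmax attention restricted to those blocks.
    idx = (keep[:, None] * block + np.arange(block)).ravel()
    scores = K[idx] @ q / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
K = rng.normal(size=(64, 16))
V = rng.normal(size=(64, 16))
q = rng.normal(size=16)
out = sparse_attend(q, K, V)  # scores 8 summaries + 16 tokens, not 64
print(out.shape)              # (16,)
```

At 64K context the savings compound: the coarse stage touches thousands of block summaries instead of tens of thousands of individual keys, which is the bottleneck such methods remove.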
Gemma 4 Ported to MLX-Swift, Runs Locally on Apple Silicon
Google's Gemma 4 language model has been ported to the MLX-Swift framework by a community developer, making it available for local inference on Apple Silicon Macs and iOS devices through the LocallyAI app.
mlx-vlm v0.4.4 Launches with Falcon-Perception 300M, TurboQuant Metal Kernels & 1.9x Decode Speedup
The mlx-vlm library v0.4.4 adds support for TII's Falcon-Perception 300M vision model and introduces TurboQuant Metal kernels, achieving up to 1.9x faster decoding with 89% KV cache savings on Apple Silicon.
VMLOps Launches Free 230+ Lesson AI Engineering Course with Production-Ready Tool Portfolio
VMLOps has launched a free, hands-on AI engineering course spanning 20 phases and 230+ lessons. It uniquely culminates in students building a portfolio of usable tools, agents, and MCP servers, not just theoretical knowledge.
Open-Source AI Assistant Runs Locally on MacBook Air M4 with 16GB RAM, No API Keys Required
A developer showcased a complete AI assistant running entirely on a MacBook Air M4 with 16GB RAM, using open-source models with no cloud API calls. This demonstrates the feasibility of capable local AI on consumer-grade Apple Silicon hardware.
Generative World Renderer: 4M+ RGB/G-Buffer Frames from Cyberpunk 2077 & Black Myth: Wukong Released for Inverse Graphics
A new framework and dataset extracts over 4 million synchronized RGB and G-buffer frames from Cyberpunk 2077 and Black Myth: Wukong, enabling AI models to learn inverse material decomposition and controllable game environment editing.
New Research: Fine-Tuned LLMs Outperform GPT-5 for Probabilistic Supply Chain Forecasting
Researchers introduced an end-to-end framework that fine-tunes large language models (LLMs) to produce calibrated probabilistic forecasts of supply chain disruptions. The model, trained on realized outcomes, significantly outperforms strong baselines like GPT-5 on accuracy, calibration, and precision. This suggests a pathway for creating domain-specific forecasting models that generate actionable, decision-ready signals.