Gemma
30 articles about Gemma in AI news
Google Gemma 4: 3x Faster Inference with MTP Drafters
Google claims up to 3x faster inference for Gemma 4 via multi-token prediction (MTP) drafters, but has released no benchmark numbers or architectural details.
Gemma 4 Hits 50M Downloads in Weeks, Google's Fastest Launch
Gemma 4 has been downloaded more than 50 million times within weeks of release, making it Google's fastest open-model launch and outpacing Gemma 3's early adoption by roughly 3x.
Developer Swaps Dash Cam Analysis for Gemma 4 & Falcon Perception
A developer announced they are replacing their entire dash cam video analysis system with Google's Gemma 4 and Falcon Perception models, signaling a practical shift towards newer, specialized multimodal models for real-time edge applications.
Gemma 4 Integrates SAM 3.1 for Subject-Aware Image Masking
A new demo shows Google's Gemma 4 vision-language model using Meta's SAM 3.1 to identify and segment primary subjects in complex scenes, like a child with dogs. This represents a practical integration of specialized vision models into multimodal reasoning workflows.
Unsloth Offers Free Fine-Tuning for Google Gemma 4 via Colab Notebook
Unsloth has released a Colab notebook enabling free fine-tuning of Google's Gemma 4 model. This simplifies the process of customizing a state-of-the-art open-weight LLM using just a browser.
MedGemma 1.5 Technical Report Released, Details Multimodal Medical AI
Google DeepMind has published the technical report for MedGemma 1.5, detailing the architecture and capabilities of its open-source, multimodal medical AI model. This follows the initial Med-PaLM 2 release and represents a significant step in making specialized medical AI more accessible.
Google's Gemma 4B Model Runs on Nintendo Switch at 1.5 Tokens/Second
A developer successfully ran Google's 4-billion parameter Gemma language model on a Nintendo Switch, achieving 1.5 tokens/second inference. This demonstrates the increasing feasibility of running small LLMs on consumer-grade edge hardware.
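A quick back-of-envelope calculation shows why a 4B-parameter model is plausible on Switch-class hardware. The arithmetic below is a sketch; the article does not state what quantization was used, so the bit widths are assumptions.

```python
# Rough weight-memory estimate for a 4B-parameter model at different
# quantization levels. The quantization choices are illustrative
# assumptions, not details from the report.

def model_weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in bytes: params * bits / 8."""
    return n_params * bits_per_weight / 8

params = 4e9  # Gemma 4B
for bits in (16, 8, 4):
    gb = model_weight_bytes(params, bits) / 1e9
    print(f"{bits}-bit weights: ~{gb:.1f} GB")
# 16-bit: ~8.0 GB, 8-bit: ~4.0 GB, 4-bit: ~2.0 GB
```

At 4-bit quantization the weights fit in roughly 2 GB, which is within reach of the Switch's 4 GB of shared memory, consistent with the slow but working 1.5 tokens/second result.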
MLX-LM v0.9.0 Adds Better Batching, Supports Gemma 4 on Apple Silicon
Apple's MLX-LM framework released version 0.9.0 with enhanced server batching and support for Google's Gemma 4 model, improving local LLM inference efficiency on Apple Silicon. This update addresses a key performance bottleneck for developers running models locally on Mac hardware.
Gemma 4 + Falcon Perception Enables Vision-Action Agent Pipeline
A developer shared a pipeline in which Gemma 4 interprets images, Falcon Perception segments objects and attaches metadata, and Gemma 4 then reasons over the results to call tools. This demonstrates a modular approach to vision-language-action agents.
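The described loop can be sketched as three swappable stages. Every function body below is a hypothetical stand-in: the article names the stages (Gemma 4 describes, Falcon Perception segments, Gemma 4 calls tools) but no concrete API, so the names and return shapes are illustrative only.

```python
# Sketch of a modular vision-language-action loop. All model calls
# are replaced by stand-in stubs; only the pipeline shape is real.
from typing import Callable

def describe_scene(image: bytes) -> str:
    # Stand-in for a Gemma 4 vision-language call.
    return "a child playing with two dogs in a yard"

def segment_objects(image: bytes) -> list[dict]:
    # Stand-in for Falcon Perception: per-object labels plus metadata.
    return [{"label": "child", "bbox": (10, 20, 80, 160)},
            {"label": "dog", "bbox": (90, 40, 60, 50)}]

def choose_tool(description: str, objects: list[dict],
                tools: dict[str, Callable]) -> str:
    # Stand-in for Gemma 4's reasoning step: pick and invoke a tool
    # based on the segmentation metadata.
    if any(o["label"] == "child" for o in objects):
        return tools["alert_guardian"](description)
    return tools["log_event"](description)

tools = {"alert_guardian": lambda d: f"ALERT: {d}",
         "log_event": lambda d: f"LOG: {d}"}

image = b"...raw pixels..."
result = choose_tool(describe_scene(image), segment_objects(image), tools)
print(result)  # ALERT: a child playing with two dogs in a yard
```

The design point of such pipelines is that each stage can be swapped independently: a different segmenter or a different tool-calling model slots in without touching the other stages.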
Ethan Mollick: Gemma 4 Impressive On-Device, But Agentic Workflows Doubted
Wharton professor Ethan Mollick finds Google's Gemma 4 powerful for on-device use but is skeptical about its ability to execute true agentic workflows, citing limitations in judgment and self-correction.
Gemma 4 Integrated into Android Studio for AI-Assisted App Development
Google has integrated its Gemma 4 language model into Android Studio's Agent mode, providing developers with AI-assisted coding features like refactoring and feature development within the official Android IDE.
Gemma 4 Ported to MLX-Swift, Runs Locally on Apple Silicon
Google's Gemma 4 language model has been ported to the MLX-Swift framework by a community developer, making it available for local inference on Apple Silicon Macs and iOS devices through the LocallyAI app.
Gemma 4 26B A4B Hits 45.7 tokens/sec Decode Speed on MacBook Air via MLX Community
A community benchmark shows the Gemma 4 26B A4B model running at 45.7 tokens/sec decode speed on a MacBook Air using the MLX framework. This highlights rapid progress in efficient local deployment of mid-size language models on consumer Apple Silicon.
Google Launches Fully Open-Source Gemma 4 AI Models Under Apache 2.0 License
Google has released Gemma 4, a new family of open-source AI models available under the permissive Apache 2.0 license. The models are designed to run locally on various devices including servers, phones, and Raspberry Pi, marking Google's renewed commitment to the open-source AI ecosystem.
Google Releases Fully Open-Source Gemma 4 AI Model for Local Device Deployment
Google has launched Gemma 4, a fully open-source AI model family available under the Apache 2.0 license. The release marks Google's re-entry into the competitive open-source AI landscape with models optimized for local deployment, including on mobile devices.
Atomic Chat Launches Hermes Agent: A Free, Local Agent Stack Powered by Gemma 4
Atomic Chat has launched Hermes Agent, an open-source agent stack powered by Google's Gemma 4 model that runs entirely locally and is free to use. This makes advanced AI agent functionality accessible without cloud dependencies or API costs.
Google's Gemma 4 Models Lead in Small-Scale Open LLM Performance, According to Developer Analysis
Independent developer analysis ranks Google's Gemma 4 models as the current top-performing open-source small language models, with a clear lead over alternatives in overall model behavior.
Gemma 4 Demonstrates Self-Terminating Loop Detection in Code Execution, User Reports
A developer shared an observation that Google's Gemma 4 model recognized it was stuck in an infinite loop during a coding task and stopped itself. This represents a potential advance in AI's ability to monitor and control its own execution state.
Google Releases Gemma 4 Family Under Apache 2.0, Featuring 2B to 31B Models with MoE and Multimodal Capabilities
Google has released the Gemma 4 family of open-weight models, derived from Gemini 3 technology. The four models, ranging from 2B to 31B parameters and including a Mixture-of-Experts variant, are available under a permissive Apache 2.0 license and feature multimodal processing.
Google Gemma 4 Model Reportedly in Testing, Signaling Next-Gen Open-Weight LLM Release
A developer reports that Google's Gemma 4 model is 'incoming' and currently being tested. This suggests the next iteration of Google's open-weight language model family is nearing release.
ReXInTheWild Benchmark Reveals VLMs Struggle with Medical Photos: Gemini-3 Leads at 78%, MedGemma Trails at 37%
Researchers introduced ReXInTheWild, a benchmark of 955 clinician-verified questions based on 484 real medical photographs. Leading multimodal models show wide performance gaps, with Gemini-3 scoring 78% accuracy while the specialized MedGemma model achieved only 37%.
MiRA Framework Boosts Gemma3-12B to 43% Success Rate on WebArena-Lite, Surpassing GPT-4 and WebRL
Researchers propose MiRA, a milestone-based RL framework that improves long-horizon planning in LLM agents. It boosts Gemma3-12B's web navigation success from 6.4% to 43%, outperforming GPT-4-Turbo (17.6%) and the previous SOTA WebRL (38.4%).
Fine-Tuning Gemma 3 1B-IT for Financial Reasoning with QLoRA
A technical guide details using QLoRA and reasoning-augmented data to fine-tune Google's Gemma 3 1B-IT model for financial analysis. This demonstrates a method to specialize small language models for complex, domain-specific tasks.
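Part of why QLoRA makes this practical even for small budgets is the tiny trainable-parameter count of LoRA adapters. The arithmetic below is a sketch; the layer counts and dimensions are illustrative values, not Gemma 3 1B-IT's actual configuration.

```python
# LoRA adds two low-rank matrices per adapted weight: A (r x d_in)
# and B (d_out x r), so r * (d_in + d_out) trainable parameters.
# The model shape numbers here are illustrative assumptions.

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters for one LoRA-adapted weight matrix."""
    return r * (d_in + d_out)

d_model, r, n_layers = 1152, 16, 26  # illustrative, not Gemma's config
# Adapting the attention q and v projections of every layer:
trainable = 2 * n_layers * lora_params(d_model, d_model, r)
print(f"~{trainable / 1e6:.1f}M trainable params")  # ~1.9M
```

Roughly 2M trainable parameters against a 1B-parameter base, combined with 4-bit quantization of the frozen weights, is what lets this kind of domain fine-tune run in a single consumer GPU's memory.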
Google's Gemma 4 Emerges: The Next Generation of Open AI Models
Google has announced the upcoming release of Gemma 4, the next iteration of its open-source AI model family. This development signals Google's continued commitment to accessible AI technology and intensified competition in the open model space.
Atomic Chat's TurboQuant Enables Gemma 4 Local Inference on 16GB MacBook Air
Atomic Chat's new TurboQuant algorithm aggressively compresses the KV cache, allowing models requiring 32GB+ RAM to run on 16GB MacBook Airs at 25 tokens/sec, advancing local AI deployment.
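The leverage here comes from how large the KV cache grows at long contexts. The formula below is the standard per-token cache size; the layer and head counts are illustrative assumptions, since no architectural details for Gemma 4 have been published.

```python
# KV cache size: 2 (K and V) * seq_len * layers * kv_heads * head_dim
# * bytes per value. Model shape numbers are illustrative assumptions.

def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_val):
    return 2 * seq_len * n_layers * n_kv_heads * head_dim * bytes_per_val

shape = dict(seq_len=32_768, n_layers=48, n_kv_heads=8, head_dim=128)
fp16 = kv_cache_bytes(**shape, bytes_per_val=2)    # unquantized cache
int4 = kv_cache_bytes(**shape, bytes_per_val=0.5)  # 4-bit quantized
print(f"fp16 cache: {fp16 / 2**30:.1f} GiB, "
      f"4-bit cache: {int4 / 2**30:.1f} GiB")
# fp16 cache: 6.0 GiB, 4-bit cache: 1.5 GiB
```

For a hypothetical model of this shape, a 32K context costs ~6 GiB of cache at fp16 but ~1.5 GiB at 4 bits, which is the kind of saving that moves a workload from a 32GB machine onto a 16GB MacBook Air.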
SAEs Predict Agent Tool Failures Before Execution, Paper Shows
A paper shows that probes built on sparse autoencoder (SAE) features can predict agent tool-call failures before execution, with experiments on GPT-OSS and Gemma 3. The approach adds internal observability that current external monitoring methods lack.
mlx-vlm v0.5.0 Adds Continuous Batching, Distributed Inference for Apple Silicon
mlx-vlm v0.5.0 adds continuous batching, speculative decoding, and distributed inference on Apple Silicon. The release, built by 21 contributors, adds support for Qwen3.5, Kimi K2.5, Gemma 4 video, and other new models.
MLX-VLM Adds Continuous Batching, OpenAI API, and Vision Cache for Apple Silicon
The next release of MLX-VLM will introduce continuous batching, an OpenAI-compatible API, and vision feature caching for multimodal models running locally on Apple Silicon. These optimizations promise up to 228x speedups on cache hits for models like Gemma 4.
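Vision feature caching is essentially memoization of the expensive vision encoder, keyed on image content. The sketch below shows the idea with a stand-in encoder; it is not MLX-VLM's implementation, and `cached_features` is a hypothetical helper.

```python
# Memoizing a vision encoder on a content hash: identical images
# reuse the cached features instead of re-running the encoder.
# encode_image / cached_features are illustrative stand-ins.
import hashlib
from functools import lru_cache

calls = {"encoder": 0}

@lru_cache(maxsize=128)
def cached_features(image_digest: str) -> tuple:
    calls["encoder"] += 1        # expensive encoder path taken
    return (0.1, 0.2, 0.3)       # stand-in for a real feature vector

def encode_image(image: bytes) -> tuple:
    # Key the cache on a content hash so identical frames hit it.
    return cached_features(hashlib.sha256(image).hexdigest())

img = b"same frame twice"
encode_image(img)
encode_image(img)                # cache hit: encoder not re-run
print(calls["encoder"])  # 1
```

In chat-style workloads where the same image is referenced across many turns, skipping re-encoding on every turn is what makes order-of-magnitude (here, claimed 228x) speedups on cache hits plausible.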
Google Launches AI Edge Eloquent: Free, Offline-First Dictation App on iOS
Google has quietly launched AI Edge Eloquent, a free, subscription-less dictation app for iOS. It uses a Gemma-based speech recognition model to process audio locally, removing filler words and self-corrections to produce cleaner text.
Google's AI Edge Gallery Arrives on iPhone: A Privacy-First Revolution in On-Device Intelligence
Google AI Edge Gallery has launched on iOS, bringing true on-device function calling to iPhones for the first time. Powered by the compact 270M parameter FunctionGemma model, it enables natural voice commands to trigger phone actions like calendar events and flashlight toggles, completely offline.