kimi
30 articles about kimi in AI news
NVIDIA Nemotron 3 Ultra: 550B Open-Weight Model Challenges GLM, Kimi
NVIDIA released Nemotron 3 Ultra, a 550B open-weight model claiming near-SOTA performance, competing with GLM-5.1 and Kimi K2.6. No benchmarks have been published yet.
Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup
Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next-fastest GPU cloud, validated by an independent third party.
Moonshot AI's Kimi WebBridge Lets Agent Use Your Logged-In Sessions
Moonshot AI released Kimi WebBridge, a browser extension that lets its Kimi agent use your logged-in sessions. This marks a shift from sandboxed agents to identity-aware autonomous web operations.
CoreWeave Tops Kimi K2.6 Inference Speed
CoreWeave tops 10 other providers on speed and price-performance for Moonshot AI's Kimi K2.6 in an Artificial Analysis benchmark.
Kimi 2.6 Thinking Shows Promise as Open Weights Model, Lags Behind Closed SoTA
An initial evaluation of Moonshot AI's Kimi 2.6 Thinking model finds it generates extensive reasoning traces but delivers only 'okay-ish' results on creative and coding tasks, highlighting the persistent open vs. closed model gap.
Moonshot AI's Kimi K2.6 Hits 58.6% on SWE-Bench Pro, Leads Open-Source Coding
Moonshot AI released Kimi K2.6, an open-source coding model achieving 58.6% on SWE-Bench Pro and 54.0% on HLE with tools. This positions it as a top-tier open alternative to proprietary models like Claude 3.5 Sonnet.
Stealth 100B Model Appears on OpenRouter, Possibly DeepSeek or Kimi
A new, unannounced 100-billion-parameter AI model has appeared on the OpenRouter API platform. Its origin is unknown, but observers speculate it could be a variant from DeepSeek or an update to Kimi's code model.
Kimi 2.6 Code Model Teased in Leaked Image, Suggesting Moonshot AI Update
A screenshot circulating online appears to show a 'Kimi 2.6' code model interface, suggesting Moonshot AI is preparing an update to its Kimi Chat platform focused on coding tasks.
Alibaba's Qwen3.6-Plus Reportedly Under Half the Size of Kimi K2.5, Nears Claude Opus 4.5 Performance
Alibaba's Tongyi Lab announced Qwen3.6-Plus, a model reportedly under half the size of Moonshot's Kimi K2.5 while approaching Claude Opus 4.5 performance, signaling major efficiency gains in China's LLM race.
Fireworks AI Launches 'Fire Pass' with Kimi K2.5 Turbo at 250 Tokens/Second
Fireworks AI has launched a new 'Fire Pass' subscription offering access to Kimi K2.5 Turbo at speeds up to 250 tokens/second. The service includes a free trial followed by a $7 weekly subscription.
Moonshot AI Launches Kimi Slides: AI Tool Converts Notes into Investor-Ready Presentations
Moonshot AI has launched Kimi Slides, an AI-powered presentation generator that converts unstructured notes into investor-ready slide decks. The tool is positioned as a direct competitor to high-cost freelance presentation designers.
Kimi Launches 'Kimi Slides' AI Presentation Tool, Claims 5-Minute Investor Deck Creation
Moonshot AI's Kimi chatbot has launched a new feature called Kimi Slides that generates investor-ready presentations from messy notes in 5 minutes, positioning itself against professional design services.
Kimi 2.5's 1T Parameter MoE Model Runs on 96GB Mac Hardware via SSD Streaming
Developers have demonstrated that Kimi 2.5's 1 trillion parameter Mixture-of-Experts model can run on Mac hardware with just 96GB RAM by streaming expert weights from SSD, with only 32B parameters active per token.
Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks
Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.
Moonshot AI's Kimi Introduces Attention Residuals to Mitigate Deep-Layer Information Loss in LLMs
Moonshot AI's Kimi team proposes Attention Residuals, a novel mechanism replacing standard residual connections. It allows each layer to attend to and selectively retrieve information from any previous layer, improving performance on long-context reasoning tasks.
Kimi's Selective Layer Communication Improves Training Efficiency by ~25% with Minimal Inference Overhead
Kimi has developed a method that replaces uniform residual connections with selective information routing between layers in deep AI models. This improves training stability and achieves ~25% better compute efficiency with negligible inference slowdown.
NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding
NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to accelerate inference for Moonshot's reasoning models. The release aims to speed up generation while maintaining accuracy.
Cursor AI Meets Kimi K2.5: The Rapid Prototyping Revolution in Software Development
The integration of Cursor AI's code editor with Kimi's K2.5 model enables developers to transform simple prompts into functional applications in under a minute, dramatically accelerating the prototyping phase and lowering barriers to software creation.
Kimi's Meteoric Rise: How Moonshot AI's Chatbot Became China's Fastest $10B Unicorn
Moonshot AI's Kimi chatbot generated more revenue in just 20 days than in all of 2025, achieving a $10 billion valuation in just over two years. This explosive growth signals a major shift in China's AI landscape and global AI competition.
Kimi Launches OpenClaw-Powered Workspace: China's Browser-Based AI Revolution
Kimi has unveiled Kimi Claw, a browser-based AI workspace featuring 24/7 operation, 5,000+ community skills, 40GB cloud storage, and native OpenClaw integration. This development represents China's growing influence in accessible, cloud-native AI tools.
Kimi Team's 'Attention Residuals' Replace Fixed Summation with Softmax Attention, Boosts GPQA-Diamond by +7.5%
Researchers propose Attention Residuals, a content-dependent alternative to standard residual connections in Transformers. The method improves scaling laws, matches a baseline trained with 1.25x more compute, and adds under 2% inference overhead.
Moonshot AI, State Bank Launch First AI-Native Credit Card in China
Moonshot AI's Kimi has launched what is billed as the world's first AI-native credit card with a state-owned bank. The card converts spending into compute credits.
mlx-vlm v0.5.0 Adds Continuous Batching, Distributed Inference for Apple Silicon
mlx-vlm v0.5.0 adds continuous batching, speculative decoding, and distributed inference for Apple Silicon. The release, with work from 21 contributors, supports Qwen3.5, Kimi K2.5, Gemma 4 video, and other new models.
Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models
A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, unlocking free access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.
DeepSeek V4 Begins Limited Rollout with Fast, Expert, Vision Modes
DeepSeek V4 is reportedly in limited gray-scale (staged-rollout) testing with a new interface offering Fast, Expert, and Vision modes. This mirrors competitor Kimi's tiered system and suggests a move towards performance-based rate limiting.
Moonshot AI CEO Yang Zhilin Advocates for Attention Residuals in LLM Architecture
Yang Zhilin, founder of Moonshot AI, argues for the architectural value of attention residuals in large language models. This technical perspective comes from the creator of the popular Kimi Chat model.
Alibaba Cloud's $3 Coding Plan Disrupts AI Development Market
Alibaba Cloud has launched a unified coding subscription offering four frontier AI models for just $3, potentially reshaping how developers access and use coding assistants. The plan includes Qwen 3.5-Plus, Kimi K2.5, MiniMax M2.5, and GLM-5 in a single package.
SWE-Explore: AI coding agents find files but miss 81-86% of critical lines
The SWE-Explore benchmark shows that Claude Code and Codex cover only 14-19% of critical lines even when they find the right file. Stronger models do not fix this structural weakness.
Chinese Lab's Free MoE Model Matches GPT-5.5 on Agentic Coding
A Chinese lab released an Apache-2.0 open-weights MoE model matching GPT-5.5 on agentic coding. This free model challenges proprietary AI's lead with a sparse MoE architecture.
Cursor's Composer 2.5 matches Opus 4.7, GPT-5.5 at fraction of cost
Cursor's Composer 2.5 scores 79.8% on SWE-Bench Multilingual at $0.50/M tokens, matching Opus 4.7 and GPT-5.5 at 30x lower cost.