gentic.news — AI News Intelligence Platform

kimi

30 articles about kimi in AI news

NVIDIA Nemotron 3 Ultra: 550B Open-Weight Model Challenges GLM, Kimi

NVIDIA released Nemotron 3 Ultra, a 550B open-weight model claiming near-SOTA performance, competing with GLM-5.1 and Kimi K2.6. No benchmarks yet.

87% relevant

Cerebras Hits 981 Tokens/sec on 1T-Parameter Kimi K2.6, Claims 6.7× GPU Cloud Speedup

Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model, a 6.7× speedup over the next-fastest GPU cloud, validated by an independent third party.

93% relevant

Moonshot AI's Kimi WebBridge Lets Agent Use Your Logged-In Sessions

Moonshot AI released Kimi WebBridge, a browser extension that lets its Kimi agent use your logged-in sessions. This marks a shift from sandboxed agents to identity-aware autonomous web operations.

92% relevant

CoreWeave Tops Kimi K2.6 Inference Speed

CoreWeave tops 10 other providers on speed and price-performance for Moonshot AI's Kimi K2.6 in an Artificial Analysis benchmark.

81% relevant

Kimi 2.6 Thinking Shows Promise as Open Weights Model, Lags Behind Closed SoTA

An initial evaluation of Moonshot AI's Kimi 2.6 Thinking model finds it generates extensive reasoning traces but delivers only 'okay-ish' results on creative and coding tasks, highlighting the persistent open vs. closed model gap.

100% relevant

Moonshot AI's Kimi K2.6 Hits 58.6% on SWE-Bench Pro, Leads Open-Source Coding

Moonshot AI released Kimi K2.6, an open-source coding model achieving 58.6% on SWE-Bench Pro and 54.0% on HLE with tools. This positions it as a top-tier open alternative to proprietary models like Claude 3.5 Sonnet.

100% relevant

Stealth 100B Model Appears on OpenRouter, Possibly DeepSeek or Kimi

A new, unannounced 100-billion-parameter AI model has appeared on the OpenRouter API platform. Its origin is unknown, but observers speculate it could be a variant from DeepSeek or an update to Kimi's code model.

85% relevant

Kimi 2.6 Code Model Teased in Leaked Image, Suggesting Moonshot AI Update

A screenshot circulating online appears to show a 'Kimi 2.6' code model interface, suggesting Moonshot AI is preparing an update to its Kimi Chat platform focused on coding tasks.

85% relevant

Alibaba's Qwen3.6-Plus Reportedly Under Half the Size of Kimi K2.5, Nears Claude Opus 4.5 Performance

Alibaba's Tongyi Lab announced Qwen3.6-Plus, a model reportedly under half the size of Moonshot's Kimi K2.5 while approaching Claude Opus 4.5 performance, signaling major efficiency gains in China's LLM race.

95% relevant

Fireworks AI Launches 'Fire Pass' with Kimi K2.5 Turbo at 250 Tokens/Second

Fireworks AI has launched a new 'Fire Pass' subscription offering access to Kimi K2.5 Turbo at speeds up to 250 tokens/second. The service includes a free trial followed by a $7 weekly subscription.

85% relevant

Moonshot AI Launches Kimi Slides: AI Tool Converts Notes into Investor-Ready Presentations

Moonshot AI has launched Kimi Slides, an AI-powered presentation generator that converts unstructured notes into investor-ready slide decks. The tool is positioned as a direct competitor to high-cost freelance presentation designers.

85% relevant

Kimi Launches 'Kimi Slides' AI Presentation Tool, Claims 5-Minute Investor Deck Creation

Moonshot AI's Kimi chatbot has launched a new feature called Kimi Slides that generates investor-ready presentations from messy notes in 5 minutes, positioning itself against professional design services.

85% relevant

Kimi 2.5's 1T Parameter MoE Model Runs on 96GB Mac Hardware via SSD Streaming

Developers have demonstrated that Kimi 2.5's 1 trillion parameter Mixture-of-Experts model can run on Mac hardware with just 96GB RAM by streaming expert weights from SSD, with only 32B parameters active per token.

85% relevant

Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks

Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.

95% relevant

Moonshot AI's Kimi Introduces Attention Residuals to Mitigate Deep-Layer Information Loss in LLMs

Moonshot AI's Kimi team proposes Attention Residuals, a novel mechanism replacing standard residual connections. It allows each layer to attend to and selectively retrieve information from any previous layer, improving performance on long-context reasoning tasks.

89% relevant

Kimi's Selective Layer Communication Improves Training Efficiency by ~25% with Minimal Inference Overhead

Kimi has developed a method that replaces uniform residual connections with selective information routing between layers in deep AI models. This improves training stability and achieves ~25% better compute efficiency with negligible inference slowdown.

87% relevant

NVIDIA's Kimi-K2.5 Eagle Head: Supercharging Moonshot's Reasoning with Speculative Decoding

NVIDIA has released the Kimi-K2.5 Eagle head on Hugging Face, implementing Eagle-3 speculative decoding to dramatically accelerate inference for Moonshot's reasoning models. This breakthrough promises blazing-fast performance while maintaining accuracy.

89% relevant

Cursor AI Meets Kimi K2.5: The Rapid Prototyping Revolution in Software Development

The integration of Cursor AI's code editor with Kimi's K2.5 model enables developers to transform simple prompts into functional applications in under a minute, dramatically accelerating the prototyping phase and lowering barriers to software creation.

85% relevant

Kimi's Meteoric Rise: How Moonshot AI's Chatbot Became China's Fastest $10B Unicorn

Moonshot AI's Kimi chatbot generated more revenue in just 20 days than in all of 2025, achieving a $10 billion valuation in just over two years. This explosive growth signals a major shift in China's AI landscape and global AI competition.

75% relevant

Kimi Launches OpenClaw-Powered Workspace: China's Browser-Based AI Revolution

Kimi has unveiled Kimi Claw, a browser-based AI workspace featuring 24/7 operation, 5,000+ community skills, 40GB cloud storage, and native OpenClaw integration. This development represents China's growing influence in accessible, cloud-native AI tools.

85% relevant

Kimi Team's 'Attention Residuals' Replace Fixed Summation with Softmax Attention, Boosts GPQA-Diamond by +7.5%

Researchers propose Attention Residuals, a content-dependent alternative to standard residual connections in Transformers. The method improves scaling laws, matches a baseline trained with 1.25x more compute, and adds under 2% inference overhead.

97% relevant

Moonshot AI, State Bank Launch First AI-Native Credit Card in China

Moonshot AI's Kimi has launched what is billed as the world's first AI-native credit card with a state-owned bank, converting spending into compute credits.

90% relevant

mlx-vlm v0.5.0 Adds Continuous Batching, Distributed Inference for Apple Silicon

mlx-vlm v0.5.0 adds continuous batching, speculative decoding, and distributed inference for Apple Silicon. The release, which had 21 contributors, supports Qwen3.5, Kimi K2.5, Gemma 4 video, and other new models.

87% relevant

Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models

A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, unlocking free access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.

91% relevant

DeepSeek V4 Begins Limited Rollout with Fast, Expert, Vision Modes

DeepSeek V4 is reportedly in limited gray-scale testing with a new interface offering Fast, Expert, and Vision modes. This mirrors competitor Kimi's tiered system and suggests a move towards performance-based rate limiting.

85% relevant

Moonshot AI CEO Yang Zhilin Advocates for Attention Residuals in LLM Architecture

Yang Zhilin, founder of Moonshot AI, argues for the architectural value of attention residuals in large language models. This technical perspective comes from the creator of the popular Kimi Chat model.

85% relevant

Alibaba Cloud's $3 Coding Plan Disrupts AI Development Market

Alibaba Cloud has launched a unified coding subscription offering four frontier AI models for just $3, potentially reshaping how developers access and use coding assistants. The plan includes Qwen 3.5-Plus, Kimi K2.5, MiniMax M2.5, and GLM-5 in a single package.

85% relevant

SWE-Explore: AI coding agents find files but miss 81-86% of critical lines

The SWE-Explore benchmark shows that Claude Code and Codex cover only 14-19% of critical lines despite finding the right file. Model strength doesn't fix the structural weakness.

92% relevant

Chinese Lab's Free MoE Model Matches GPT-5.5 on Agentic Coding

A Chinese lab released an Apache-2.0 open-weights MoE model matching GPT-5.5 on agentic coding. This free model challenges proprietary AI's lead with a sparse MoE architecture.

100% relevant

Cursor's Composer 2.5 matches Opus 4.7, GPT-5.5 at fraction of cost

Cursor's Composer 2.5 scores 79.8% on SWE-Bench Multilingual at $0.50/M tokens, matching Opus 4.7 and GPT-5.5 at 30x lower cost.

95% relevant