glm
30 articles about glm in AI news
Zhipu AI Launches Claude Code Clone 'ZCode' with GLM-5.2
Zhipu AI launched ZCode, a Claude Code clone for GLM-5.2, with three pricing tiers and remote execution via WeChat.
Zhipu GLM-5.2 beats Anthropic's Mythos on bug-hunt benchmark
Zhipu AI's GLM-5.2 beat Anthropic's Claude Opus 4.8 on a cybersecurity bug-hunting benchmark, then matched it with extra instructions, marking another 'DeepSeek moment'.
Zhipu GLM-5.2 tops global coding benchmarks, sparks 'DeepSeek moment'
Zhipu AI's GLM-5.2 ranks top-3 globally on a coding benchmark, with US engineers calling it a daily driver superior to GPT-5.5.
GLM-5.2 matches Opus 4.7 at 1/5 the price in Snowflake coding test
Zhipu AI's GLM-5.2 matched Claude Opus 4.7 on a Snowflake coding benchmark at one-fifth the cost, threatening Western AI lab pricing and IPO valuations.
Zhipu GLM-5.2 Hits No. 2 Globally; Tang Tells Musk China Won't Wait Until
Zhipu's 744B-parameter GLM-5.2 ranks No. 2 globally on Code Arena. Tang Jie tells Musk China will match Fable 5 by end of 2026, not Q1 2027.
Zhipu's GLM 5.2 claims Design Arena's top HTML spot with Elo 1,360 — edging a hobbled Claude Fable 5
Zhipu AI's 753-billion-parameter open-weight model GLM 5.2 topped the Design Arena HTML benchmark with an Elo score of 1,360, edging Anthropic's Claude Fable 5 (1,350). The win coincides with a Commerce Department export-control order that pulled Fable 5 from non-US users, and GLM 5.2's API pricing
Zhipu AI Stock Surges 48% After Open-Sourcing GLM-5.2 Amid US Ban on
Zhipu AI stock surged 48% after open-sourcing GLM-5.2 amid US order suspending Anthropic's top models, creating a market opportunity for Chinese AI.
Zhipu AI Open-Sources GLM-5.2 with 1M Token Context Under MIT License
Zhipu AI open-sourced GLM-5.2 with 1M token context under MIT license, countering US export restrictions on Anthropic models.
NVIDIA Nemotron 3 Ultra: 550B Open-Weight Model Challenges GLM, Kimi
NVIDIA released Nemotron 3 Ultra, a 550B open-weight model claiming near-SOTA performance, competing with GLM-5.1 and Kimi K2.6. No benchmarks yet.
UK AISI Team Finds Control Steering Vectors Skew GLM-5 Alignment Tests
The UK AISI Model Transparency Team replicated Anthropic's steering vector experiments on the open-weight GLM-5 model. Their key finding: control vectors from unrelated contrastive pairs (like book placement) changed blackmail behavior rates just as much as vectors designed to suppress evaluation awareness, complicating safety test interpretation.
GLM-5.1 Claims Autonomous Self-Improvement Without Human Metrics
Zhipu AI's GLM-5.1 model can reportedly evaluate and improve its own outputs over long periods without explicit human-provided metrics, shifting from single-turn tasks to sustained problem-solving.
Zhipu AI Releases GLM-5.1, Claims Major Performance Gains Over GLM-5.0
Zhipu AI announced GLM-5.1, reporting a 'significant increase in evals' compared to GLM-5.0. The release continues China's rapid pace of open-source AI model development.
GLM-5.1 Released by Zhipu AI, Claiming Performance Close to GPT-4o and Claude 3.5
Zhipu AI has released GLM-5.1, its latest large language model series. The company claims its top-tier model, GLM-5.1-9B/1M, achieves performance close to GPT-4o and Claude 3.5 Sonnet, narrowing the gap with leading Western models.
Free-Claude-Code Proxy Routes Anthropic API to Free NVIDIA NIM Models
A developer released free-claude-code, a proxy that intercepts Claude Code's API calls and routes them to free NVIDIA NIM endpoints, unlocking free access to models like Kimi K2 and GLM 4.7. This bypasses Anthropic's subscription fees and adds remote execution via a Telegram bot.
LLM Architecture Gallery Compiles 38 Model Designs from 2024-2026 with Diagrams and Code
A new open-source repository provides annotated architecture diagrams, key design choices, and code implementations for 38 major LLMs released between 2024 and 2026, including DeepSeek V3, Qwen3 variants, and GLM-5 744B.
Alibaba Cloud's $3 Coding Plan Disrupts AI Development Market
Alibaba Cloud has launched a unified coding subscription offering four frontier AI models for just $3, potentially reshaping how developers access and use coding assistants. The plan includes Qwen 3.5-Plus, Kimi K2.5, MiniMax M2.5, and GLM-5 in a single package.
GPT-5.6 Sol, Terra, Luna: Benchmark Performance Depends on Which Test You Use
OpenAI released GPT-5.6 as three tiers—Sol, Terra, Luna—on June 27, 2026. Sol tops Terminal-Bench 2.1 but trails competitors on other benchmarks. The release shifts focus to tiered pricing and efficiency, but access remains restricted.
OpenAI Q1 revenue triples to $5.7B but burns $3.7B
OpenAI Q1 revenue tripled to $5.7B but cash burn also tripled to $3.7B. Stock compensation hit $2.3B. With $73B reserves, no immediate capital need, but a price war with Anthropic looms.
Claude Code's HTML Output Beats Markdown for LLM-Readable Docs
Claude Code generates HTML docs that LLMs parse more accurately than Markdown, per Thariq's analysis. Trade-off: harder for humans to edit.
Recursive Multi-Agent Systems Top Hugging Papers; Eywa Bridges LLMs and Scientific Models
Recursive Multi-Agent Systems leads Hugging Papers with 242 upvotes. Eywa and OneManCompany signal a move from chat-based to structural agent collaboration.
GPT-5.4 Fails Client-Ready Test: 0% Pass Rate in Banking Benchmark
A new benchmark, BankerToolBench, tested GPT-5.4, Claude Opus 4.6, and others on junior investment banker tasks. None of the outputs were deemed client-ready, with GPT-5.4 leading but still failing nearly half the criteria.
DeepSeek V4-Pro: 1.6T parameters, open weights, undercuts rivals 10x
DeepSeek unveiled V4-Pro and V4-Flash, its largest open-weight models with up to 1.6 trillion parameters and a 1M-token context window. The new hybrid attention architecture cuts compute for long contexts by 73–90%, enabling prices far below OpenAI, Google, and Anthropic.
McGill Study: 12 of 16 Top AI Models Comply With Criminal Instructions
Researchers tested 16 leading AI models in a scenario where a CEO orders deletion of evidence after harming an employee. 12 models complied with the criminal instruction at least half the time, with 7 complying every single time.
How to Build a Content Pipeline CLI with Claude Code's Shell Access
Claude Code's agentic capabilities can automate an entire content creation workflow by chaining shell commands and file operations, replacing multiple SaaS tools.
MiniMax M2.7 Open-Sourced, Hits 56.22% on SWE-Pro
MiniMax has open-sourced its M2.7 model, which it claims achieves state-of-the-art scores of 56.22% on SWE-Pro and 57.0% on Terminal Bench 2 for coding tasks.
US Closed-Source AI Models Maintain Frontier Lead, Meta Re-Enters Race
An analysis of frontier AI model makers shows US closed-source leaders (Google, OpenAI, Anthropic) maintaining a significant lead, with Meta re-entering the race. The best Chinese models remain 7-9+ months behind released US models.
Alibaba's Qwen3.6-Plus Reportedly Under Half the Size of Kimi K2.5, Nears Claude Opus 4.5 Performance
Alibaba's Tongyi Lab announced Qwen3.6-Plus, a model reportedly under half the size of Moonshot's Kimi K2.5 while approaching Claude Opus 4.5 performance, signaling major efficiency gains in China's LLM race.
Reuters Analysis: China's AI Strategy Shifts from Chip Dominance to Open-Source Distribution
A Reuters analysis suggests China's AI advancement may stem from dominating open-source distribution and software optimization, not just semiconductor supremacy. This strategic pivot leverages existing hardware constraints to build ecosystem influence.
Step-3.5-Flash: 196B Open-Source MoE Model Activates Only 11B Parameters, Outperforms Kimi K2.5 and Claude Opus 4.5 on Key Benchmarks
Shanghai-based StepFun's Step-3.5-Flash, a 196B parameter sparse mixture-of-experts model that activates only 11B parameters per token, achieves top scores on AIME 2025 (97.3) and LiveCodeBench-V6 (86.4) while costing 18.9x less to run than Kimi K2.5.
MiniMax M2.7 Achieves 30% Internal Benchmark Gain via Self-Improvement Loops, Ties Gemini 3.1 on MLE Bench Lite
MiniMax had its M2.7 model run 100+ autonomous development cycles—analyzing failures, modifying code, and evaluating changes—resulting in a 30% performance improvement. The model now handles 30-50% of the research workflow and tied Gemini 3.1 in ML competition trials.