Zhipu AI's GLM-5.2 nearly matched Claude Opus 4.7 in a Snowflake benchmark, solving 66% of 103 coding tasks to Opus's 67%, at less than one-fifth the output token price. The Chinese model's price of $4.40 per million output tokens versus Opus's $25 puts direct pressure on the flagship coding use case of Anthropic and OpenAI.
Key facts
- GLM-5.2 solved 66% of 103 coding tasks vs Opus 4.7's 67%.
- GLM costs $4.40/M output tokens; Opus costs $25/M.
- First-attempt accuracy: Opus 53.7%, GLM 47.6%.
- GLM used 860M tokens vs Opus's 439M tokens.
- GLM averaged 99 iterations per task vs Opus's 80.
Snowflake CEO Sridhar Ramaswamy published results Monday from a hands-on test comparing GLM-5.2 and Claude Opus 4.7 on 103 coding tasks requiring code that works across both DuckDB and Snowflake. With three attempts per task, the models were nearly tied: 66% solved for GLM-5.2 versus 67% for Opus 4.7, according to The Decoder.
First-attempt accuracy tells a different story. Opus hit 53.7% while GLM managed 47.6%, a 6.1-point gap that points to less consistent output from GLM. The Chinese model also burned through 860 million tokens, nearly double Opus's 439 million, and averaged 99 iterations per task versus Opus's 80.
The price gap that matters
The competitive picture flips on cost. GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens, according to Zhipu's official price sheet. Third-party resellers undercut even that. Claude Opus 4.7 runs $5 per million input tokens and $25 per million output tokens. GPT-5.5 costs $5 input and $30 output.
GLM's higher token consumption erodes some of that advantage, but not enough to close the gap. At $4.40/M output, even doubling token usage works out to about $8.80 for the work Opus does with one million output tokens, still well below $25/M. For high-volume coding workloads, that arithmetic shifts enterprise procurement decisions.
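The arithmetic above can be sketched with the token totals and list prices quoted in this article. The benchmark write-up does not report how each model's tokens split between input and output, so the `output_share` parameter below is an assumption, applied equally to both models; the helper names are hypothetical, and caching or reseller discounts are ignored.

```python
# Prices in USD per million tokens, as quoted in the article.
GLM_PRICE = {"input": 1.40, "output": 4.40}
OPUS_PRICE = {"input": 5.00, "output": 25.00}

# Total tokens consumed across the benchmark, in millions (from the article).
GLM_TOKENS_M = 860
OPUS_TOKENS_M = 439


def total_cost(tokens_m, price, output_share):
    """Cost in USD for tokens_m million tokens, given the fraction billed as output.

    output_share is an ASSUMPTION: the source does not report the input/output split.
    """
    output_m = tokens_m * output_share
    input_m = tokens_m - output_m
    return input_m * price["input"] + output_m * price["output"]


def cost_ratio(output_share):
    """GLM cost divided by Opus cost, assuming both models share the same output fraction."""
    glm = total_cost(GLM_TOKENS_M, GLM_PRICE, output_share)
    opus = total_cost(OPUS_TOKENS_M, OPUS_PRICE, output_share)
    return glm / opus


if __name__ == "__main__":
    # Sweep the unknown split from "all input" to "all output".
    for share in (0.0, 0.1, 0.5, 1.0):
        print(f"output share {share:.0%}: GLM costs {cost_ratio(share):.0%} of Opus")
```

Under these assumptions, GLM's total bill lands between roughly 34% of Opus's (if every token were billed as output) and roughly 55% (if every token were billed as input). The saving is real, but smaller than the headline per-token price gap suggests once the doubled token usage is counted.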
Where GLM wins and loses
Ramaswamy noted that GLM's strength is validating code reliably across both platforms simultaneously, and one task was solved only by GLM. Its weakness: giving up too early and obsessively checking the wrong things. On one task, GLM fired off 411 tool calls over 24 minutes (checking row counts, distributions, null values, and column types) and still failed all three attempts. Opus solved the same task with 49 calls in 9 minutes.
"The claim that GLM produces cleaner code didn't hold up," Ramaswamy said. More checks don't lead to more correct results. Still, Snowflake's team is excited about GLM-5.2 and wants to make it available to customers.
The valuation stress test
The real story isn't benchmark scores. It's what happens to Western AI valuations if coding, the flagship enterprise use case for both Anthropic and OpenAI, faces sustained price compression. OpenAI filed IPO paperwork in June 2026 [per prior reporting]. Anthropic targets a 2026 IPO at a $1T+ valuation. The two companies have raised $11.5B+ and $40B+, respectively, with infrastructure commitments tied to those numbers.
If enterprise customers can get 90% of the coding capability at 20% of the cost from a Chinese model, procurement teams will notice. GLM-5.2's token inefficiency matters less when the unit price is that low.
What to watch
Watch for Snowflake's formal GLM-5.2 availability announcement and whether enterprise customers shift coding workloads. Also track Anthropic and OpenAI's next pricing moves: both face IPO pressure and may need to defend their coding revenue margins against Chinese pricing.

Source: the-decoder.com