GLM-5.2 matches Opus 4.7 at 1/5 the price in Snowflake coding test

Zhipu AI's GLM-5.2 matched Claude Opus 4.7 on a Snowflake coding benchmark at one-fifth the cost, threatening Western AI lab pricing and IPO valuations.

Source: the-decoder.com, via the_decoder and scmp_tech (multi-source)
How does Zhipu AI's GLM-5.2 compare to Anthropic's Claude Opus 4.7 in Snowflake's coding benchmark?

Snowflake CEO Sridhar Ramaswamy found Zhipu AI's GLM-5.2 solved 66% of 103 coding tasks versus Claude Opus 4.7's 67%, at $4.40 per million output tokens versus $25 for Opus, though GLM consumed nearly twice the tokens per task.

TL;DR

GLM-5.2 solved 66% vs Opus 4.7's 67% in a 103-task coding benchmark · First-attempt accuracy: Opus 53.7%, GLM 47.6% · GLM costs $4.40/M output tokens vs Opus's $25/M

Zhipu AI's GLM-5.2 nearly matched Claude Opus 4.7 in a Snowflake benchmark, solving 66% of 103 coding tasks versus Opus's 67%, at roughly one-fifth the output token price. The Chinese model's price of $4.40 per million output tokens, versus $25 for Opus, puts direct pressure on the flagship coding use case of both Anthropic and OpenAI.

Key facts

  • GLM-5.2 solved 66% of 103 coding tasks vs Opus 4.7's 67%.
  • GLM costs $4.40/M output tokens; Opus costs $25/M.
  • First-attempt accuracy: Opus 53.7%, GLM 47.6%.
  • GLM used 860M tokens vs Opus's 439M tokens.
  • GLM averaged 99 iterations per task vs Opus's 80.

Snowflake CEO Sridhar Ramaswamy published results Monday from a hands-on test comparing GLM-5.2 and Claude Opus 4.7 on 103 coding tasks requiring code that works across both DuckDB and Snowflake. With three attempts per task, the models were nearly tied: 66% solved for GLM-5.2 versus 67% for Opus 4.7, according to The Decoder.

First-attempt accuracy tells a different story. Opus hit 53.7% while GLM managed 47.6%, a 6.1-point gap that reveals GLM's output consistency problem. The Chinese model also burned through 860 million tokens — nearly double Opus's 439 million — and averaged 99 iterations per task versus Opus's 80.

The price gap that matters

The competitive picture flips on cost. GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens, according to Zhipu's official price sheet. Third-party resellers undercut even that. Claude Opus 4.7 runs $5 input and $25 output. GPT-5.5 costs $5 input and $30 output.

GLM's higher token consumption erodes some of that advantage, but not enough to close the gap. Using twice as many tokens at $4.40/M is equivalent to paying about $8.80/M for the same work, still well below $25/M. For high-volume coding workloads, that arithmetic shifts enterprise procurement decisions.
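The cost arithmetic can be sketched roughly as follows. The prices, token totals, and solve rates are the figures reported in this article; the split between input and output tokens is not reported, so the sketch treats that split as an assumption and shows both extremes (all tokens billed at the input rate, all at the output rate). The names `PRICES`, `TOKENS_M`, `SOLVED`, and `run_cost` are illustrative only, and caching or volume discounts are ignored.

```python
# Illustrative arithmetic only. Prices and token totals are taken from the article;
# the input/output split of those tokens is NOT reported, so it is a free parameter.

PRICES = {  # USD per million tokens: (input, output)
    "GLM-5.2": (1.40, 4.40),
    "Opus 4.7": (5.00, 25.00),
}
TOKENS_M = {"GLM-5.2": 860, "Opus 4.7": 439}  # total tokens used, in millions
SOLVED = {  # tasks solved out of 103, derived from the reported solve rates
    "GLM-5.2": round(0.66 * 103),
    "Opus 4.7": round(0.67 * 103),
}


def run_cost(model: str, output_share: float) -> float:
    """Total USD cost if `output_share` of all tokens is billed at the output rate."""
    price_in, price_out = PRICES[model]
    blended = output_share * price_out + (1.0 - output_share) * price_in
    return TOKENS_M[model] * blended


for share in (0.0, 1.0):  # assumed extremes: all-input and all-output billing
    for model in PRICES:
        total = run_cost(model, share)
        per_solved = total / SOLVED[model]
        print(f"{model} (output share {share:.0%}): "
              f"${total:,.0f} total, ${per_solved:,.2f} per solved task")
```

Under these assumptions GLM-5.2's total bill comes to roughly a third of Opus 4.7's if every token is billed at the output rate, and a little over half if every token is billed at the input rate. The real figure depends on the actual token mix, which the source does not give.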

Where GLM wins and loses

Ramaswamy noted that GLM's strength is reliably validating code across both platforms simultaneously. One task was solved only by GLM. Its weaknesses: giving up too early and obsessively checking the wrong things. On one task, GLM fired off 411 tool calls over 24 minutes (checking row counts, distributions, null values, and column types) and still failed all three attempts. Opus solved the same task with 49 calls in 9 minutes.

"The claim that GLM produces cleaner code didn't hold up," Ramaswamy said. More checks don't lead to more correct results. Still, Snowflake's team is excited about GLM-5.2 and wants to make it available to customers.

The valuation stress test

The real story isn't benchmark scores. It's what happens to Western AI valuations if coding, the flagship enterprise use case for both Anthropic and OpenAI, faces sustained price compression. OpenAI filed IPO paperwork in June 2026 [per prior reporting]. Anthropic targets a 2026 IPO at a $1T+ valuation. The two companies have raised $11.5B+ and $40B+, respectively, with infrastructure commitments tied to those numbers.

If enterprise customers can get roughly 90% of the coding capability at about 20% of the per-token price from a Chinese model, procurement teams will notice. GLM-5.2's token inefficiency matters less when the unit price is that low.

What to watch

Watch for Snowflake's formal GLM-5.2 availability announcement and whether enterprise customers shift coding workloads. Also track Anthropic and OpenAI's next pricing moves — both face IPO pressure and may need to defend their coding revenue margins against Chinese pricing.

Opus 4.7 is the better model, but GLM is competitive in Snowflake's code benchmark and costs far less. | Image: via X

AI-assisted reporting. Generated by gentic.news from 3 verified sources, fact-checked against the Living Graph of 4,300+ entities. Edited by Ala SMITH.

AI Analysis

This benchmark is a stress test for the Western AI pricing thesis. Anthropic and OpenAI have built their revenue models around premium pricing for coding, the single highest-value enterprise use case. GLM-5.2's results show that Chinese models have closed the quality gap enough that price becomes the deciding factor for cost-sensitive workloads. The token inefficiency is real but secondary: a 5.7x gap in per-token output price means GLM could burn 3x the tokens and still be cheaper.

The timing compounds the pressure. Both Anthropic and OpenAI have IPO paperwork filed or in preparation for late 2026. Their valuations assume continued revenue acceleration. If enterprise buyers start arbitraging coding workloads to Chinese models, the revenue growth narrative fractures. The infrastructure buildout tied to those valuations ($14B from Google into Anthropic alone, plus OpenAI's chip deals with Broadcom) becomes harder to justify.

Ramaswamy's finding that the claim that GLM produces "cleaner code" didn't hold up is important. The model's tendency to over-check and burn tokens suggests it's less efficient than Opus. But for bulk coding tasks where cost per solved problem matters more than per-token efficiency, GLM-5.2 is already competitive. The next version will likely close the consistency gap.
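The break-even reasoning in this analysis can be checked with a short sketch. It uses only the list prices and token totals quoted in the article; `break_even_multiplier` is a hypothetical helper name, and the calculation assumes a workload billed entirely at one rate (all output or all input), which is a simplification.

```python
def break_even_multiplier(cheap_price: float, premium_price: float) -> float:
    """How many times more tokens the cheaper model can use before costing the same."""
    return premium_price / cheap_price


# List prices in USD per million tokens, as quoted in the article.
output_ratio = break_even_multiplier(4.40, 25.00)  # GLM-5.2 vs Opus 4.7, output tokens
input_ratio = break_even_multiplier(1.40, 5.00)    # GLM-5.2 vs Opus 4.7, input tokens

# Token multiplier actually observed in the Snowflake test (860M vs 439M tokens).
observed = 860 / 439

print(f"break-even at output prices: {output_ratio:.2f}x")
print(f"break-even at input prices:  {input_ratio:.2f}x")
print(f"observed token multiplier:   {observed:.2f}x")
```

The observed multiplier sits below both break-even points, which is consistent with the claim that GLM-5.2 remains cheaper despite its heavier token use. The margin is narrower at input prices than at output prices.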