Is GLM-5.2 cheaper than the frontier?

A working breakdown of LLM API pricing across Anthropic, OpenAI, Google, and Z.ai, at every reasoning effort. Where the popular claim breaks down, where it could technically be true, and what the data says about intelligence.


What the data shows

Three numbers that answer the question. Source: first-party pricing and Artificial Analysis Intelligence Index v4.1.

vs GPT-5.5
6.0×

cheaper per request at medium reasoning effort

vs Claude Opus 4.8
5.1×

cheaper per request at medium reasoning effort

AA Intelligence Index v4.1
51

leading open-weights model, tied with Opus 4.8 and GPT-5.5


What people are saying

A widely-shared take from a respected voice in the space argues GLM-5.2 is more expensive than the proprietary frontier at medium reasoning effort. Worth examining.

Theo is right about the first half. GLM-5.2 does surpass GPT-5.4 and every Gemini model on the Intelligence Index. The second half, that the proprietary frontier at medium effort is both cheaper and smarter, breaks when you run the math.


The pricing math

Reasoning-effort parameters control how many tokens a model spends thinking. They do not change the per-token rate.

The reasoning_effort parameter on GPT-5.5, the extended-thinking budget on Claude, and reasoning_effort on GLM-5.2 all do the same thing. They control how many tokens the model spends thinking. The per-token rate stays fixed regardless of effort mode.

| Model | Provider | Input /MTok | Cache hit /MTok | Output /MTok |
|---|---|---|---|---|
| GLM-5.2 | Z.ai | $1.40 | $0.26 | $4.40 |
| Gemini 3 Pro | Google | $2.00 | $0.20 | $12.00 |
| Claude Opus 4.8 | Anthropic | $5.00 | $0.50 | $25.00 |
| GPT-5.5 | OpenAI | $5.00 | $0.50 | $30.00 |
Per-token pricing published June 21, 2026. Anthropic cache write is billed separately at $6.25/MTok; OpenAI and Google cache write is free.

For Opus 4.8 at medium effort to come out cheaper than GLM-5.2, Opus 4.8 would need to emit roughly 5.7× fewer reasoning tokens than GLM-5.2, enough to close the per-token gap. In practice, GLM-5.2 uses more reasoning tokens at medium effort than Opus 4.8 does. The per-token rate gap is large enough that GLM-5.2 still wins.
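The break-even figure follows directly from the output rates in the pricing table. Here is a minimal sketch of that arithmetic, assuming output (reasoning) tokens dominate the bill and ignoring input and cache costs. The function name `break_even_token_ratio` is illustrative only.

```python
# Output-token prices in USD per million tokens, copied from the pricing table.
OUTPUT_PRICE_PER_MTOK = {
    "GLM-5.2": 4.40,
    "Claude Opus 4.8": 25.00,
    "GPT-5.5": 30.00,
}


def break_even_token_ratio(cheap_model: str, pricey_model: str) -> float:
    """How many times more output tokens the cheaper-per-token model can emit
    before its output bill equals the pricier model's output bill."""
    return OUTPUT_PRICE_PER_MTOK[pricey_model] / OUTPUT_PRICE_PER_MTOK[cheap_model]


for rival in ("Claude Opus 4.8", "GPT-5.5"):
    ratio = break_even_token_ratio("GLM-5.2", rival)
    print(f"GLM-5.2 vs {rival}: break-even at {ratio:.1f}x the output tokens")
```

Under these assumptions, GLM-5.2 only loses on cost if it emits more than about 5.7 times the output tokens Opus 4.8 does for the same task.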


Cost at a real medium-reasoning workload

20K input tokens, 70% prompt-cache hit, 10K output including reasoning. Reasoning tokens are billed as output on every provider.

Cost per request

Sorted cheapest first. Bar length is cost. Color intensity shows where each model ranks.

Opus 4.8 → GLM-5.2

$0.29 → $0.06

5.1× more expensive on the same workload.

GPT-5.5 → GLM-5.2

$0.34 → $0.06

6.0× more expensive on the same workload.

Gemini 3 Pro → GLM-5.2

$0.13 → $0.06

2.4× more expensive on the same workload.
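The per-request figures above can be reproduced from the pricing table. The sketch below is one way to do the arithmetic, assuming 20K input tokens with a 70% cache hit rate and 10K output tokens (the output volume that matches the quoted costs), and ignoring cache-write charges. The names `Rates` and `cost_per_request` are illustrative and do not come from any provider SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens, copied from the pricing table."""
    input: float
    cache_hit: float
    output: float


RATES = {
    "GLM-5.2": Rates(1.40, 0.26, 4.40),
    "Gemini 3 Pro": Rates(2.00, 0.20, 12.00),
    "Claude Opus 4.8": Rates(5.00, 0.50, 25.00),
    "GPT-5.5": Rates(5.00, 0.50, 30.00),
}


def cost_per_request(
    rates: Rates,
    input_tokens: int = 20_000,
    cache_hit_share: float = 0.70,
    output_tokens: int = 10_000,  # assumed: includes reasoning tokens
) -> float:
    """Cost in USD of one request; cache-write fees are not modeled."""
    cached = input_tokens * cache_hit_share
    fresh = input_tokens - cached
    total = (
        fresh * rates.input
        + cached * rates.cache_hit
        + output_tokens * rates.output
    )
    return total / 1_000_000


baseline = cost_per_request(RATES["GLM-5.2"])
# Print models sorted cheapest first, with the multiple relative to GLM-5.2.
for name, rates in sorted(RATES.items(), key=lambda kv: cost_per_request(kv[1])):
    cost = cost_per_request(rates)
    print(f"{name:<16} ${cost:.2f}  {cost / baseline:.1f}x")
```

Changing `output_tokens` per model is a quick way to test how much extra reasoning GLM-5.2 can afford before the gap closes.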


Smarter? No. They’re tied.

Artificial Analysis Intelligence Index v4.1, a weighted composite of 9 evaluations including GDPval-AA v2, Terminal-Bench v2.1, HLE, GPQA Diamond, and AA-Omniscience.

AA Intelligence Index v4.1

The four frontier models land within two points of each other. Only Claude Fable 5, at $10 and $50 per MTok, clearly leads.

GLM-5.2 at 51 is the leading open-weights model on the index. On GDPval-AA v2 specifically, the real-world agentic work benchmark, GLM-5.2 scores 1524, ahead of GPT-5.5 (xhigh) at 1514. The Intelligence Index lead belongs to Claude Fable 5 (~60), which costs twice as much per token as Opus 4.8 and roughly 7 to 11× as much as GLM-5.2.


The catch: more output tokens

Theo followed up with a fair point. Cheaper per token does not mean cheaper in time.

In his follow-up, Theo added: “the volume of them means you’ll spend much more time waiting for results.” That’s true, and worth being upfront about.

Output tokens per task

GLM-5.2 emits the most output tokens of any leading open-weights model. Lower per-token cost, but more tokens to wait for.

At max thinking effort, GLM-5.2 emits 43k output tokens per Intelligence Index task (37k reasoning, 6k answer). For comparison, MiniMax-M3 emits 24k, Kimi K2.6 emits 35k, and DeepSeek V4 Pro (max) emits 37k. That makes GLM-5.2 roughly 15% to 80% more verbose than its open-weights peers, with the widest gap against the leanest of them, MiniMax-M3.
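The verbosity gap can be checked from the token counts quoted above. This is a small sketch of that calculation, using only the figures in the text; the helper name `extra_verbosity_pct` is illustrative.

```python
# Output tokens per Intelligence Index task, in thousands, as quoted in the text.
GLM_TOKENS_K = 43
PEER_TOKENS_K = {
    "MiniMax-M3": 24,
    "Kimi K2.6": 35,
    "DeepSeek V4 Pro (max)": 37,
}


def extra_verbosity_pct(glm_k: float, peer_k: float) -> float:
    """Percentage by which GLM-5.2's output volume exceeds a peer's."""
    return (glm_k / peer_k - 1) * 100


for peer, tokens_k in PEER_TOKENS_K.items():
    pct = extra_verbosity_pct(GLM_TOKENS_K, tokens_k)
    print(f"GLM-5.2 emits {pct:.0f}% more output tokens than {peer}")
```

On these numbers the gap is about 79% against MiniMax-M3, 23% against Kimi K2.6, and 16% against DeepSeek V4 Pro (max).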

For batch jobs and overnight pipelines, this is irrelevant. The cost advantage dominates. For latency-sensitive interactive use, Opus 4.8 or GPT-5.5 medium will feel snappier, even at higher cost, because they finish thinking in fewer tokens. That is a real trade-off, not a footnote.


Where the claim could be technically true

Three narrow cases where “Opus 4.8 medium beats GLM-5.2 medium on cost” can hold up, if you squint.

Narrow case 1

Batch API

Anthropic and OpenAI offer 50% discounts on batch jobs. Z.ai has no published batch tier. The gap shrinks to roughly 2.5× instead of 5×.

Narrow case 2

Subscription vs API

Claude Max or ChatGPT Pro at $200/mo bundles effectively unlimited usage of the top model. That pricing is not comparable to per-token API rates.

Narrow case 3

Single benchmarks

On individual evals (HLE, certain coding tasks), Opus 4.8 medium or GPT-5.5 medium can outperform GLM-5.2. The composite index places them as tied.
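The batch-API case is easy to sanity-check. The sketch below applies the 50% batch discount to the per-request cost multiples from the workload section, assuming the discount covers the whole request and that GLM-5.2 is billed at its standard rate. The names `STANDARD_GAP` and `batch_gap` are illustrative.

```python
# Cost multiples vs GLM-5.2 on the medium-reasoning workload, from the text.
STANDARD_GAP = {
    "Claude Opus 4.8": 5.1,
    "GPT-5.5": 6.0,
}

# Batch discount as a fraction of the standard price (assumed to apply to the
# whole request). GLM-5.2 gets no discount: no batch tier is published.
BATCH_DISCOUNT = 0.50


def batch_gap(model: str) -> float:
    """Cost multiple vs standard-rate GLM-5.2 when `model` runs via batch."""
    return STANDARD_GAP[model] * (1 - BATCH_DISCOUNT)


for model in STANDARD_GAP:
    print(f"{model} (batch) costs {batch_gap(model):.2f}x GLM-5.2")
```

Even with the discount the multiple stays above 1, so on this workload batch pricing narrows the gap without reversing it.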


The hype is right

For the first time, an open-weights model is a head-on competitor to the proprietary frontier. That matters.

GLM-5.2 is cheaper than Opus 4.8 and GPT-5.5 at every published reasoning-effort level. It is roughly tied with both on the Artificial Analysis Intelligence Index v4.1. It comes from Z.ai under an MIT license, which means self-hosting, fine-tuning, and zero vendor lock-in are real options. None of the proprietary frontier models offer that.

That combination of cost, intelligence, and availability is good news for builders who don’t want to depend on a single provider. The hype is correct. Set expectations on latency, then ship.


Sources

Every claim on this page ties back to a first-party source. No invented numbers.

Last updated June 21, 2026. Pricing pulled from first-party pricing pages on this date. Intelligence Index v4.1 published June 15, 2026.


Content crafted by The Spiel Engine, from a comparison session, edited and re-designed to HTML.