z-ai

Z.ai: GLM 4.7 Flash

GLM 4.7 Flash is a text-input model from Z.ai with a 202,752-token context window and a 16,384-token output ceiling. It supports tool use and reasoning, which makes it usable for multi-step workflows and agentic tasks. Structured output support is unconfirmed, and it accepts no image or audio input, so pipelines requiring those modalities will need a different option. At $0.06 per million input tokens and $0.40 per million output tokens, it sits at the budget end of the market. Its blended benchmark score of 54.1 across four benchmarks is modest overall, though its agentic score of 75.9 stands out against weaker results in coding (42.7) and general capability (49.7). That profile suits developers who need long-context, tool-calling support at low cost and whose workloads lean toward agentic orchestration rather than coding or broad reasoning tasks. Coverage is limited to four benchmarks, so treat performance claims in other areas as unverified.

Quality Score
99/100
price + capability + benchmarks
Input Price
$0.06
per 1M tokens
Output Price
$0.40
per 1M tokens
Context Window
202,752
tokens
Model ID
z-ai/glm-4.7-flash
Vendor
z-ai
Tokenizer
Other
Input Modalities
text
Output Modalities
text
Max Output
16,384 tokens
Tool Calling
✓ supported
Structured Output
✓ supported
Reasoning Mode
✓ supported
Vision
text only
Audio
no
Moderated
no

Similar models