Z.ai: GLM 4.7 Flash
GLM 4.7 Flash is a text-input model from Z.ai with a 202,752-token context window and a 16,384-token output ceiling. It supports tool use and reasoning, which makes it usable for multi-step workflows and agentic tasks. Structured output support is unconfirmed, and it accepts no image or audio input, so pipelines requiring those modalities will need a different option. At $0.06 per million input tokens and $0.40 per million output tokens, it sits at the budget end of the market. Its blended benchmark score of 54.1 across four benchmarks is modest overall, though its agentic score of 75.9 stands out against weaker results in coding (42.7) and general capability (49.7). That profile suits developers who need long-context, tool-calling support at low cost and whose workloads lean toward agentic orchestration rather than coding or broad reasoning tasks. Coverage is limited to four benchmarks, so treat performance claims in other areas as unverified.
- Model ID
- z-ai/glm-4.7-flash
- Vendor
- z-ai
- Tokenizer
- Other
- Input Modalities
- text
- Output Modalities
- text
- Max Output
- 16,384 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- text only
- Audio
- no
- Moderated
- no