Z.ai: GLM 5V Turbo

GLM 5V Turbo is a multimodal model from Z.ai that accepts text, images, and video as input and produces text output. It offers a 202,752-token context window and can generate up to 131,072 tokens per response. The model supports tool use and reasoning; structured output is listed as supported in the specifications, but that support is not confirmed. It is a closed, paid offering with no open-weights version available. At $1.20 per million input tokens and $4.00 per million output tokens, it sits in a mid-range price bracket. Its blended benchmark score of 56.8 is drawn from only one independent benchmark, so treat that figure with caution rather than as a settled measure of capability. Teams that need video understanding alongside long-context text and tool-calling workflows may find it worth evaluating, but buyers who prioritize well-validated performance data should wait for broader benchmark coverage before committing.
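To make the listed prices concrete, here is a minimal sketch of a per-request cost estimate. It uses only the two per-million-token rates quoted above; the function name `estimate_cost` is hypothetical, and the sketch assumes billing is strictly linear in token count. How image and video inputs are converted to tokens is not stated in this listing, so those token counts must come from elsewhere.

```python
from decimal import Decimal

# Rates from the listing, in US dollars per 1,000,000 tokens.
INPUT_PRICE_PER_M = Decimal("1.20")
OUTPUT_PRICE_PER_M = Decimal("4.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in dollars for one request.

    Assumes (not confirmed by the listing) that cost scales linearly
    with token count and that no other fees apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = INPUT_PRICE_PER_M * input_tokens + OUTPUT_PRICE_PER_M * output_tokens
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Example: a 100,000-token prompt with a 5,000-token response.
    print(estimate_cost(100_000, 5_000))
```

Decimal arithmetic is used instead of floats so that small per-request amounts do not pick up rounding noise when summed over many calls.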

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $1.20 per 1M tokens
Output Price: $4.00 per 1M tokens
Context Window: 202,752 tokens
Model ID: z-ai/glm-5v-turbo
Vendor: z-ai
Tokenizer: Other
Input Modalities: image, text, video
Output Modalities: text
Max Output: 131,072 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no
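
The context window and max output figures above can be combined into a rough budgeting check. The sketch below assumes that prompt and completion tokens share the 202,752-token window, which is common but is not stated in this listing; the function name `max_completion_budget` is hypothetical.

```python
# Limits from the listing.
CONTEXT_WINDOW = 202_752  # total tokens
MAX_OUTPUT = 131_072      # tokens per response


def max_completion_budget(prompt_tokens: int) -> int:
    """Return how many output tokens can still be requested.

    Assumption (not confirmed by the listing): the prompt and the
    completion must fit in the context window together.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Cap at the per-response limit and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))


if __name__ == "__main__":
    for prompt in (10_000, 100_000, 250_000):
        print(prompt, max_completion_budget(prompt))
```

Under that assumption, the full 131,072-token output is only available when the prompt is at most 71,680 tokens (202,752 minus 131,072).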

Category rankings

Where Z.ai: GLM 5V Turbo places in the one category where it is ranked.

#     Category                                      Score
21    Video Summarization (Video · of 25 ranked)    139

Similar models