qwen

Qwen: Qwen3 VL 32B Instruct

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Try on OpenRouter → Estimate cost

Quality Score

91/100

composite of price, context, capability

Input Price

$0.10

per 1M tokens

Output Price

$0.42

per 1M tokens

Context Window

131,072

tokens

Model ID: qwen/qwen3-vl-32b-instruct
Vendor: qwen
Tokenizer: Qwen
Input Modalities: text, image
Output Modalities: text
Max Output: 32,768 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: not supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Similar models

qwen

Qwen: Qwen3 VL 32B Instruct

Similar models

Qwen: Qwen3 VL 8B Instruct

Qwen: Qwen3 VL 30B A3B Instruct

Qwen: Qwen3 30B A3B Thinking 2507

Qwen: Qwen3 Next 80B A3B Thinking

Qwen: Qwen3 235B A22B

Qwen: Qwen VL Max