qwen
Qwen: Qwen3 VL 32B Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
Quality Score
91/100
composite of price, context, capability
Input Price
$0.10
per 1M tokens
Output Price
$0.42
per 1M tokens
Context Window
131,072
tokens
- Model ID
- qwen/qwen3-vl-32b-instruct
- Vendor
- qwen
- Tokenizer
- Qwen
- Input Modalities
- text, image
- Output Modalities
- text
- Max Output
- 32,768 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- not supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Similar models
qwen
Qwen: Qwen3 VL 8B Instruct
$0.08 in / $0.50 out
131,072 ctx
91
qwen
Qwen: Qwen3 VL 30B A3B Instruct
$0.13 in / $0.52 out
131,072 ctx
91
qwen
Qwen: Qwen3 30B A3B Thinking 2507
$0.08 in / $0.40 out
131,072 ctx
91
qwen
Qwen: Qwen3 Next 80B A3B Thinking
$0.10 in / $0.78 out
131,072 ctx
90
qwen
Qwen: Qwen3 235B A22B
$0.46 in / $1.82 out
131,072 ctx
90
qwen
Qwen: Qwen VL Max
$0.52 in / $2.08 out
131,072 ctx
90