StepFun: Step 3.7 Flash
Step 3.7 Flash is a multimodal model from StepFun that accepts text, image, and video inputs with a 256,000-token context window that extends to the same length on output. It supports tool use and reasoning, giving it a footprint suited to agentic workflows, though structured output support is unconfirmed. On benchmarks, the model scores 48.0 on a blended basis across three benchmarks, with a notably stronger coding result (61.6) than its agentic score (35.5), suggesting it is more reliable for code-focused tasks than for multi-step autonomous work. At $0.20 per million input tokens and $1.15 per million output tokens, it sits in a budget-to-mid range tier, making it worth shortlisting for teams that need long-context, multimodal processing at moderate cost, particularly for coding use cases. Benchmark coverage is limited to three tests, so buyers requiring broader evidence of general capability should treat those scores as a partial picture.
- Model ID
- stepfun/step-3.7-flash
- Vendor
- stepfun
- Tokenizer
- Other
- Input Modalities
- text, image, video
- Output Modalities
- text
- Max Output
- 256,000 tokens
- Tool Calling
- ✓ supported
- Structured Output
- ✓ supported
- Reasoning Mode
- ✓ supported
- Vision
- ✓ accepts images
- Audio
- no
- Moderated
- no
Strong choice for
Category rankings
Where StepFun: Step 3.7 Flash places across the 17 categories it ranks in. How we rank →
| # | Category | Score |
|---|---|---|
| #2 | Video Auto-TaggingVideo · of 25 ranked | 123 |
| #15 | Image CaptioningVision · of 25 ranked | 120 |
| #15 | Video SummarizationVideo · of 25 ranked | 144 |
| #15 | Real-Time ChatLatency · of 25 ranked | 117 |
| #16 | Customer SupportBusiness · of 25 ranked | 131 |
| #19 | Bulk Data LabelingData · of 25 ranked | 133 |
| #19 | Short-Form SummarizationWriting · of 25 ranked | 128 |
| #19 | Email DraftingWriting · of 25 ranked | 124 |
| #19 | Language LearningEducation · of 25 ranked | 124 |
| #19 | Chat CompanionPersonal · of 25 ranked | 128 |
| #20 | Trivia & General KnowledgePersonal · of 25 ranked | 118 |
| #20 | Dataset AnnotationResearch · of 25 ranked | 140 |
| #21 | JSON ExtractionData · of 25 ranked | 142 |
| #22 | Browser AutomationAgents · of 25 ranked | 152 |
| #24 | Social Media PostsWriting · of 25 ranked | 119 |
| #24 | Voice Assistant BackendVoice · of 25 ranked | 123 |
| #25 | Diagram ExtractionVision · of 25 ranked | 140 |