StepFun: Step 3.7 Flash

Step 3.7 Flash is a multimodal model from StepFun that accepts text, image, and video inputs, with a 256,000-token context window and a maximum output of the same length. It supports tool use and reasoning, giving it a footprint suited to agentic workflows; structured output is listed as supported but remains unconfirmed. The model scores 48.0 on a blended basis across three benchmarks, with a notably stronger coding result (61.6) than agentic result (35.5), suggesting it is more reliable for code-focused tasks than for multi-step autonomous work. At $0.20 per million input tokens and $1.15 per million output tokens, it sits in the budget-to-mid-range tier, making it worth shortlisting for teams that need long-context, multimodal processing at moderate cost, particularly for coding use cases. Because benchmark coverage is limited to three tests, buyers who need broader evidence of general capability should treat those scores as a partial picture.

Quality Score: 100/100 (price + capability + benchmarks)
Input Price: $0.20 per 1M tokens
Output Price: $1.15 per 1M tokens
Context Window: 256,000 tokens
Model ID: stepfun/step-3.7-flash
Vendor: stepfun
Tokenizer: Other
Input Modalities: text, image, video
Output Modalities: text
Max Output: 256,000 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no
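
The listed per-token prices can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the function name `estimate_cost` and the constants are hypothetical names introduced here, and it assumes flat pricing at the listed rates with no caching discounts, separate reasoning-token charges, or image/video surcharges, none of which this listing specifies.

```python
# Listed prices for stepfun/step-3.7-flash, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 1.15


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: flat pricing at the listed rates; real billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Example: a long-context request with 200,000 input and 4,000 output tokens.
print(f"${estimate_cost(200_000, 4_000):.4f}")  # prints $0.0446
```

Under these assumptions, input tokens dominate the bill for long-context requests with short answers, while the higher output rate matters more for generation-heavy workloads.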

Category rankings

Where StepFun: Step 3.7 Flash places across the 17 categories it ranks in.

#    Category                    Group      Ranked  Score
#2   Video Auto-Tagging          Video      of 25   123
#15  Image Captioning            Vision     of 25   120
#15  Video Summarization         Video      of 25   144
#15  Real-Time Chat              Latency    of 25   117
#16  Customer Support            Business   of 25   131
#19  Bulk Data Labeling          Data       of 25   133
#19  Short-Form Summarization    Writing    of 25   128
#19  Email Drafting              Writing    of 25   124
#19  Language Learning           Education  of 25   124
#19  Chat Companion              Personal   of 25   128
#20  Trivia & General Knowledge  Personal   of 25   118
#20  Dataset Annotation          Research   of 25   140
#21  JSON Extraction             Data       of 25   142
#22  Browser Automation          Agents     of 25   152
#24  Social Media Posts          Writing    of 25   119
#24  Voice Assistant Backend     Voice      of 25   123
#25  Diagram Extraction          Vision     of 25   140

Similar models