stepfun

StepFun: Step 3.7 Flash

Step 3.7 Flash is a multimodal model from StepFun that accepts text, image, and video inputs with a 256,000-token context window that extends to the same length on output. It supports tool use and reasoning, giving it a footprint suited to agentic workflows, though structured output support is unconfirmed. On benchmarks, the model scores 48.0 on a blended basis across three benchmarks, with a notably stronger coding result (61.6) than its agentic score (35.5), suggesting it is more reliable for code-focused tasks than for multi-step autonomous work. At $0.20 per million input tokens and $1.15 per million output tokens, it sits in a budget-to-mid range tier, making it worth shortlisting for teams that need long-context, multimodal processing at moderate cost, particularly for coding use cases. Benchmark coverage is limited to three tests, so buyers requiring broader evidence of general capability should treat those scores as a partial picture.

Query via API → View on stepfun → Estimate cost

Quality Score

100/100

price + capability + benchmarks

Input Price

$0.20

per 1M tokens

Output Price

$1.15

per 1M tokens

Context Window

256,000

tokens

Model ID: stepfun/step-3.7-flash
Vendor: stepfun
Tokenizer: Other
Input Modalities: text, image, video
Output Modalities: text
Max Output: 256,000 tokens
Tool Calling: ✓ supported
Structured Output: ✓ supported
Reasoning Mode: ✓ supported
Vision: ✓ accepts images
Audio: no
Moderated: no

Strong choice for

Video

Video Auto-Tagging

Bulk video metadata generation.

Category rankings

Where StepFun: Step 3.7 Flash places across the 17 categories it ranks in. How we rank →

#	Category	Score
#2	Video Auto-TaggingVideo · of 25 ranked	123
#15	Image CaptioningVision · of 25 ranked	120
#15	Video SummarizationVideo · of 25 ranked	144
#15	Real-Time ChatLatency · of 25 ranked	117
#16	Customer SupportBusiness · of 25 ranked	131
#19	Bulk Data LabelingData · of 25 ranked	133
#19	Short-Form SummarizationWriting · of 25 ranked	128
#19	Email DraftingWriting · of 25 ranked	124
#19	Language LearningEducation · of 25 ranked	124
#19	Chat CompanionPersonal · of 25 ranked	128
#20	Trivia & General KnowledgePersonal · of 25 ranked	118
#20	Dataset AnnotationResearch · of 25 ranked	140
#21	JSON ExtractionData · of 25 ranked	142
#22	Browser AutomationAgents · of 25 ranked	152
#24	Social Media PostsWriting · of 25 ranked	119
#24	Voice Assistant BackendVoice · of 25 ranked	123
#25	Diagram ExtractionVision · of 25 ranked	140

Similar models

stepfun

StepFun: Step 3.5 Flash

$0.09 in / $0.30 out

262,144 ctx

99