head-to-head

StepFun: Step 3.7 Flash vs xAI: Grok 4.20

Side-by-side comparison of specs, pricing, benchmark scores, and task rankings. Updated 2026-06-23.

Spec | StepFun: Step 3.7 Flash | xAI: Grok 4.20
Vendor | stepfun | x-ai
Quality Score | 100 | 100
Benchmark Score | 48.0 | 61.5
Input Price | $0.20/M | $1.25/M
Output Price | $1.15/M | $2.50/M
Context Window | 256,000 | 2,000,000
Max Output | 256,000 | -
Tool Calling
Structured Output
Reasoning Mode
Vision
Audio | - | -

Benchmark Scores

Benchmark | StepFun: Step 3.7 Flash | xAI: Grok 4.20
ai_index | 49.1 | 61.0
ai_index_agentic | 35.5 | -
ai_index_coding | 61.6 | -
eqbench | - | 55.8
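
To make the price gap concrete, here is a minimal sketch that converts the listed per-million-token prices into a per-request cost. The prices come from the table above; the workload (10,000 input tokens, 2,000 output tokens) and the names `PRICES` and `request_cost` are illustrative assumptions, not part of either vendor's API.

```python
from decimal import Decimal

# Prices in USD per million tokens, copied from the spec table above.
PRICES = {
    "StepFun: Step 3.7 Flash": {"input": Decimal("0.20"), "output": Decimal("1.15")},
    "xAI: Grok 4.20": {"input": Decimal("1.25"), "output": Decimal("2.50")},
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the listed per-million-token prices."""
    price = PRICES[model]
    total = price["input"] * input_tokens + price["output"] * output_tokens
    return total / MILLION


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.4f}")
    # StepFun: Step 3.7 Flash: $0.0043
    # xAI: Grok 4.20: $0.0175
```

For this assumed workload the Grok 4.20 request costs roughly four times as much. The ratio shifts with the input/output mix, since the input prices differ by more than the output prices.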

Who wins by task?

Task | StepFun: Step 3.7 Flash | xAI: Grok 4.20
SQL Generation | 152 | 144
Code Review | 145 | 150
Code Completion | 129 | 122
Code Refactoring | 143 | 153
Bug Fixing | 154 | 154
Unit Test Generation | 138 | 135
Code Documentation | 132 | 141
Regex Writing | 129 | 127
CI/CD Pipelines | 131 | 131
Frontend Component Design | 135 | 131
Data Analysis | 149 | 136
CSV / Spreadsheet Cleanup | 140 | 139
ETL Scripting | 137 | 142
JSON Extraction | 142 | 123
Bulk Data Labeling | 133 | 120
OCR / Document Parsing | 137 | 135
Table Extraction from PDFs | 137 | 135
Long-Document Summarization | 141 | 154
Short-Form Summarization | 128 | 119
Blog Post Writing | 129 | 132

Scores reflect capability match, benchmark data, and pricing for each task.
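
The task table can be summarized as a simple win count. The sketch below copies the twenty score pairs from the table and compares them row by row; the names `SCORES` and `tally` are illustrative, and the comparison is a plain greater-than check that ignores how large each margin is.

```python
# (task, Step 3.7 Flash score, Grok 4.20 score), copied from the task table.
SCORES = [
    ("SQL Generation", 152, 144),
    ("Code Review", 145, 150),
    ("Code Completion", 129, 122),
    ("Code Refactoring", 143, 153),
    ("Bug Fixing", 154, 154),
    ("Unit Test Generation", 138, 135),
    ("Code Documentation", 132, 141),
    ("Regex Writing", 129, 127),
    ("CI/CD Pipelines", 131, 131),
    ("Frontend Component Design", 135, 131),
    ("Data Analysis", 149, 136),
    ("CSV / Spreadsheet Cleanup", 140, 139),
    ("ETL Scripting", 137, 142),
    ("JSON Extraction", 142, 123),
    ("Bulk Data Labeling", 133, 120),
    ("OCR / Document Parsing", 137, 135),
    ("Table Extraction from PDFs", 137, 135),
    ("Long-Document Summarization", 141, 154),
    ("Short-Form Summarization", 128, 119),
    ("Blog Post Writing", 129, 132),
]


def tally(rows):
    """Count tasks won by each model, plus ties, by comparing the two scores."""
    result = {"step": 0, "grok": 0, "tie": 0}
    for _task, step, grok in rows:
        if step > grok:
            result["step"] += 1
        elif grok > step:
            result["grok"] += 1
        else:
            result["tie"] += 1
    return result


if __name__ == "__main__":
    print(tally(SCORES))  # {'step': 12, 'grok': 6, 'tie': 2}
```

By this count Step 3.7 Flash leads on 12 of the 20 tasks, Grok 4.20 on 6, with 2 ties (Bug Fixing and CI/CD Pipelines). Several of those leads are only one or two points wide.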

Related comparisons

MoonshotAI: Kimi K2.7 Code vs StepFun: Step 3.7 Flash
MoonshotAI: Kimi K2.7 Code vs xAI: Grok 4.20
Qwen: Qwen3.7 Plus vs StepFun: Step 3.7 Flash
Qwen: Qwen3.7 Plus vs xAI: Grok 4.20
MiniMax: MiniMax M3 vs StepFun: Step 3.7 Flash
MiniMax: MiniMax M3 vs xAI: Grok 4.20
StepFun: Step 3.7 Flash vs xAI: Grok Build 0.1
StepFun: Step 3.7 Flash vs Google: Gemini 3.5 Flash