
StepFun: Step 3.7 Flash vs xAI: Grok 4.20

Side-by-side comparison of specs, pricing, benchmark scores, and task rankings. Updated 2026-06-12.

Spec                        StepFun: Step 3.7 Flash   xAI: Grok 4.20
Vendor                      stepfun                   x-ai
Quality Score               100                       100
Benchmark Score             74.4                      74.7
Input Price ($/1M tokens)   $0.20                     $1.25
Output Price ($/1M tokens)  $1.15                     $2.50
Context Window (tokens)     256,000                   2,000,000
Max Output (tokens)         256,000                   -
Tool Calling
Structured Output
Reasoning Mode
Vision
Audio                       -                         -
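Because both prices are quoted per million tokens, the cost of a workload is a weighted sum of its input and output token counts. A minimal sketch, assuming the prices above are USD list prices per 1M tokens and ignoring any caching or batch discounts (none are listed on this page); the 10M-input / 2M-output workload is hypothetical.

```python
# Per-million-token list prices copied from the spec table above (USD).
# Assumption: no caching, batch, or volume discounts apply.
PRICES = {
    "StepFun: Step 3.7 Flash": {"input": 0.20, "output": 1.15},
    "xAI: Grok 4.20": {"input": 1.25, "output": 2.50},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at list price."""
    price = PRICES[model]
    return (input_tokens / 1_000_000) * price["input"] + (
        output_tokens / 1_000_000
    ) * price["output"]


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for name in PRICES:
        print(f"{name}: ${workload_cost(name, 10_000_000, 2_000_000):.2f}")
```

For this hypothetical workload the estimate is $4.30 for Step 3.7 Flash versus $17.50 for Grok 4.20, even though their overall benchmark scores (74.4 vs 74.7) are nearly identical.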
Benchmark Scores

Benchmark                   StepFun: Step 3.7 Flash   xAI: Grok 4.20
ai_index                    70.3                      81.4
ai_index_agentic            98.2                      88.9
ai_index_coding             61.2                      69.6
eqbench                     -                         55.8

Who wins by task?

Task                          StepFun: Step 3.7 Flash   xAI: Grok 4.20
SQL Generation                163                       171
Code Review                   156                       170
Code Completion               130                       122
Code Refactoring              151                       168
Bug Fixing                    171                       185
Unit Test Generation          146                       154
Code Documentation            136                       147
Regex Writing                 135                       137
CI/CD Pipelines               137                       146
Frontend Component Design     142                       146
Data Analysis                 166                       170
CSV / Spreadsheet Cleanup     143                       153
ETL Scripting                 144                       157
JSON Extraction               143                       136
Bulk Data Labeling            133                       125
OCR / Document Parsing        139                       145
Table Extraction from PDFs    139                       145
Long-Document Summarization   148                       166
Short-Form Summarization      131                       124
Blog Post Writing             135                       143
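To turn the task table into a single head-to-head count, each row can be scored as a win for whichever model has the higher number. A minimal sketch, assuming a higher task score is better (implied by "Who wins by task?" but not stated explicitly); the function names are illustrative, not part of any published tooling.

```python
# Task scores copied from the table above: (StepFun: Step 3.7 Flash, xAI: Grok 4.20).
TASK_SCORES = {
    "SQL Generation": (163, 171),
    "Code Review": (156, 170),
    "Code Completion": (130, 122),
    "Code Refactoring": (151, 168),
    "Bug Fixing": (171, 185),
    "Unit Test Generation": (146, 154),
    "Code Documentation": (136, 147),
    "Regex Writing": (135, 137),
    "CI/CD Pipelines": (137, 146),
    "Frontend Component Design": (142, 146),
    "Data Analysis": (166, 170),
    "CSV / Spreadsheet Cleanup": (143, 153),
    "ETL Scripting": (144, 157),
    "JSON Extraction": (143, 136),
    "Bulk Data Labeling": (133, 125),
    "OCR / Document Parsing": (139, 145),
    "Table Extraction from PDFs": (139, 145),
    "Long-Document Summarization": (148, 166),
    "Short-Form Summarization": (131, 124),
    "Blog Post Writing": (135, 143),
}


def tally_wins(scores: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Count per-task wins, assuming the higher score wins a row."""
    wins = {"Step 3.7 Flash": 0, "Grok 4.20": 0, "tie": 0}
    for step, grok in scores.values():
        if step > grok:
            wins["Step 3.7 Flash"] += 1
        elif grok > step:
            wins["Grok 4.20"] += 1
        else:
            wins["tie"] += 1
    return wins


def step_wins(scores: dict[str, tuple[int, int]]) -> list[str]:
    """Tasks where Step 3.7 Flash scores higher, in table order."""
    return [task for task, (step, grok) in scores.items() if step > grok]


if __name__ == "__main__":
    print(tally_wins(TASK_SCORES))
    print(step_wins(TASK_SCORES))
```

With these numbers Grok 4.20 takes 16 of the 20 rows and Step 3.7 Flash takes 4: Code Completion, JSON Extraction, Bulk Data Labeling, and Short-Form Summarization. There are no ties.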

Task scores combine capability match, benchmark data, and pricing for each task.

Related comparisons

Qwen: Qwen3.7 Plus vs StepFun: Step 3.7 Flash
Qwen: Qwen3.7 Plus vs xAI: Grok 4.20
MiniMax: MiniMax M3 vs StepFun: Step 3.7 Flash
MiniMax: MiniMax M3 vs xAI: Grok 4.20
StepFun: Step 3.7 Flash vs xAI: Grok Build 0.1
StepFun: Step 3.7 Flash vs Google: Gemini 3.5 Flash
StepFun: Step 3.7 Flash vs Google: Gemini 3.1 Flash Lite
StepFun: Step 3.7 Flash vs xAI: Grok 4.3