xAI: Grok 4.20 vs OpenAI: GPT-5.4 Mini
Side-by-side comparison of specs, pricing, benchmark scores, and task rankings. Updated 2026-06-16.
| | xAI: Grok 4.20 | OpenAI: GPT-5.4 Mini |
|---|---|---|
| Vendor | x-ai | openai |
| Quality Score | 100 | 100 |
| Benchmark Score | 74.7 | 85.9 |
| Input Price | $1.25/M | $0.75/M |
| Output Price | $2.50/M | $4.50/M |
| Context Window | 2,000,000 | 400,000 |
| Max Output | - | 128,000 |
| Tool Calling | ✓ | ✓ |
| Structured Output | ✓ | ✓ |
| Reasoning Mode | ✓ | ✓ |
| Vision | ✓ | ✓ |
| Audio | - | - |
| Benchmark Scores | | |
| ai_index | 81.4 | 80.7 |
| ai_index_agentic | 88.9 | 97.1 |
| ai_index_coding | 69.6 | 84.9 |
| eqbench | 55.8 | - |
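Because the two models trade places on input and output pricing, which one is cheaper depends on the token mix of a request. The sketch below is a minimal illustration that assumes the listed prices are in USD per million tokens and that billing is linear in token count; the function names `request_cost` and `cheaper_model` are hypothetical and not part of either vendor's API.

```python
# Prices copied from the table above, assumed to be USD per million tokens.
PRICES = {
    "xAI: Grok 4.20": {"input": 1.25, "output": 2.50},
    "OpenAI: GPT-5.4 Mini": {"input": 0.75, "output": 4.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cheaper_model(input_tokens: int, output_tokens: int) -> str:
    """Name of the model with the lower estimated cost for this token mix."""
    costs = {m: request_cost(m, input_tokens, output_tokens) for m in PRICES}
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # A prompt-heavy request and an output-heavy request.
    for tokens_in, tokens_out in [(10_000, 500), (1_000, 1_000)]:
        print(tokens_in, tokens_out, cheaper_model(tokens_in, tokens_out))
```

Setting the two cost expressions equal (1.25·i + 2.50·o = 0.75·i + 4.50·o) gives a break-even at i = 4·o. Under these assumptions, Grok 4.20 is cheaper when a request has fewer than four input tokens per output token, and GPT-5.4 Mini is cheaper for more prompt-heavy requests.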
Who wins by task?
| Task | xAI: Grok 4.20 | OpenAI: GPT-5.4 Mini |
|---|---|---|
| SQL Generation | 171 | 171 |
| Code Review | 170 | 166 |
| Code Completion | 122 | 132 |
| Code Refactoring | 168 | 163 |
| Bug Fixing | 185 | 181 |
| Unit Test Generation | 154 | 153 |
| Code Documentation | 147 | 142 |
| Regex Writing | 137 | 136 |
| CI/CD Pipelines | 146 | 144 |
| Frontend Component Design | 146 | 146 |
| Data Analysis | 170 | 171 |
| CSV / Spreadsheet Cleanup | 153 | 152 |
| ETL Scripting | 157 | 153 |
| JSON Extraction | 136 | 146 |
| Bulk Data Labeling | 125 | 133 |
| Long-Document Summarization | 166 | 159 |
| Short-Form Summarization | 124 | 131 |
| Blog Post Writing | 143 | 140 |
Each task score combines capability match, benchmark data, and pricing.
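To answer the heading's question in aggregate, the per-task scores can be tallied into wins and ties. The sketch below copies the scores from the task table; the `tally_wins` helper is hypothetical, and a higher score is assumed to mean a better fit for the task.

```python
# (Grok 4.20 score, GPT-5.4 Mini score) per task, copied from the table above.
TASK_SCORES = {
    "SQL Generation": (171, 171),
    "Code Review": (170, 166),
    "Code Completion": (122, 132),
    "Code Refactoring": (168, 163),
    "Bug Fixing": (185, 181),
    "Unit Test Generation": (154, 153),
    "Code Documentation": (147, 142),
    "Regex Writing": (137, 136),
    "CI/CD Pipelines": (146, 144),
    "Frontend Component Design": (146, 146),
    "Data Analysis": (170, 171),
    "CSV / Spreadsheet Cleanup": (153, 152),
    "ETL Scripting": (157, 153),
    "JSON Extraction": (136, 146),
    "Bulk Data Labeling": (125, 133),
    "Long-Document Summarization": (166, 159),
    "Short-Form Summarization": (124, 131),
    "Blog Post Writing": (143, 140),
}


def tally_wins(scores: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Count tasks won by each model, assuming a higher score wins."""
    result = {"xAI: Grok 4.20": 0, "OpenAI: GPT-5.4 Mini": 0, "tie": 0}
    for grok, gpt in scores.values():
        if grok > gpt:
            result["xAI: Grok 4.20"] += 1
        elif gpt > grok:
            result["OpenAI: GPT-5.4 Mini"] += 1
        else:
            result["tie"] += 1
    return result


if __name__ == "__main__":
    print(tally_wins(TASK_SCORES))
```

On these numbers Grok 4.20 leads in 11 of the 18 tasks, GPT-5.4 Mini in 5, and 2 are tied. Most of Grok's leads are by 5 points or fewer, while GPT-5.4 Mini's largest leads (Code Completion, JSON Extraction) are by 10 points.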
Related comparisons
MoonshotAI: Kimi K2.7 Code vs xAI: Grok 4.20
MoonshotAI: Kimi K2.7 Code vs OpenAI: GPT-5.4 Mini
Qwen: Qwen3.7 Plus vs xAI: Grok 4.20
Qwen: Qwen3.7 Plus vs OpenAI: GPT-5.4 Mini
MiniMax: MiniMax M3 vs xAI: Grok 4.20
MiniMax: MiniMax M3 vs OpenAI: GPT-5.4 Mini
StepFun: Step 3.7 Flash vs xAI: Grok 4.20
StepFun: Step 3.7 Flash vs OpenAI: GPT-5.4 Mini