Best AI model for Cheap Bulk Inference (2026)

Lowest cost per million tokens for high-volume jobs. Ranked from 346 live models in the OpenRouter catalog, weighted for low cost and low latency.

| # | Model | Model ID | Score | In / 1M | Out / 1M | Context |
|---|-------|----------|-------|---------|----------|---------|
| 1 | Pareto Code Router | openrouter/pareto-code | 1000138 | $-1000000.00 | $-1000000.00 | 200,000 |
| 2 | Body Builder (beta) | openrouter/bodybuilder | 1000138 | $-1000000.00 | $-1000000.00 | 128,000 |
| 3 | Auto Router | openrouter/auto | 1000138 | $-1000000.00 | $-1000000.00 | 2,000,000 |
| 4 | Google: Gemma 4 26B A4B (free) | google/gemma-4-26b-a4b-it:free | 138 | Free | Free | 262,144 |
| 5 | Google: Gemma 4 31B (free) | google/gemma-4-31b-it:free | 138 | Free | Free | 262,144 |
| 6 | Qwen: Qwen3.5-9B | qwen/qwen3.5-9b | 137 | $0.10 | $0.15 | 262,144 |
| 7 | Google: Gemma 4 26B A4B | google/gemma-4-26b-a4b-it | 137 | $0.07 | $0.35 | 262,144 |
| 8 | ByteDance Seed: Seed-2.0-Mini | bytedance-seed/seed-2.0-mini | 137 | $0.10 | $0.40 | 262,144 |
| 9 | Qwen: Qwen3.5-Flash | qwen/qwen3.5-flash-02-23 | 137 | $0.07 | $0.26 | 1,000,000 |
| 10 | ByteDance Seed: Seed 1.6 Flash | bytedance-seed/seed-1.6-flash | 137 | $0.07 | $0.30 | 262,144 |
| 11 | Google: Gemini 2.5 Flash Lite Preview 09-2025 | google/gemini-2.5-flash-lite-preview-09-2025 | 137 | $0.10 | $0.40 | 1,048,576 |
| 12 | OpenAI: GPT-5 Nano | openai/gpt-5-nano | 137 | $0.05 | $0.40 | 400,000 |
| 13 | Google: Gemini 2.5 Flash Lite | google/gemini-2.5-flash-lite | 137 | $0.10 | $0.40 | 1,048,576 |
| 14 | OpenAI: GPT-4.1 Nano | openai/gpt-4.1-nano | 137 | $0.10 | $0.40 | 1,047,576 |
| 15 | Google: Gemini 2.0 Flash Lite | google/gemini-2.0-flash-lite-001 | 137 | $0.07 | $0.30 | 1,048,576 |
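
Because input and output tokens are priced separately, the cheapest model for a bulk job depends on the job's input/output mix, not on a single headline price. The sketch below estimates job cost from the per-1M prices of three rows in the table above. The function names `job_cost` and `cheapest` and the token volumes are hypothetical, chosen only for illustration, and the prices are whatever the table listed at the time and may have changed.

```python
# Per-1M-token prices in USD as (input, output), copied from the table above.
PRICES = {
    "qwen/qwen3.5-9b": (0.10, 0.15),
    "openai/gpt-5-nano": (0.05, 0.40),
    "google/gemini-2.0-flash-lite-001": (0.07, 0.30),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost: (tokens / 1M) * per-1M price, summed over both directions."""
    in_price, out_price = PRICES[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price


def cheapest(input_tokens: int, output_tokens: int) -> list[str]:
    """Model IDs sorted from lowest to highest estimated cost for this workload."""
    return sorted(PRICES, key=lambda m: job_cost(m, input_tokens, output_tokens))


if __name__ == "__main__":
    # Hypothetical workloads: one input-heavy, one output-heavy.
    workloads = {
        "input-heavy (1B in, 100M out)": (1_000_000_000, 100_000_000),
        "output-heavy (100M in, 1B out)": (100_000_000, 1_000_000_000),
    }
    for label, (tokens_in, tokens_out) in workloads.items():
        print(label)
        for model in cheapest(tokens_in, tokens_out):
            print(f"  {model}: ${job_cost(model, tokens_in, tokens_out):.2f}")
```

With these assumed volumes, `openai/gpt-5-nano` comes out cheapest for the input-heavy job and `qwen/qwen3.5-9b` for the output-heavy one, so it is worth plugging in your own token counts before choosing from the ranking.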

How we ranked these

For Cheap Bulk Inference, we weight models on low cost and low latency. A higher score is better. Scores combine OpenRouter's model metadata (context length, modality support, tool calling, structured output, reasoning capability) with public pricing.
