Voice · best for

Top picks for Transcription (2026)

Speech-to-text accuracy and speed. Ranked from 335 live models on the OpenRouter catalog, weighted for audio input, low latency, requires_audio.

What this is Ranked by capability match + real benchmark scores (Aider Polyglot, Artificial Analysis Intelligence Index) + live pricing. Models need the right specs for Transcription, then benchmark performance refines the order. Full methodology →
#ModelScoreIn / 1MOut / 1MContext
1 Google: Gemini 3.1 Flash Litegoogle/gemini-3.1-flash-lite 123 $0.25 $1.50 1,048,576 Details →
2 NVIDIA: Nemotron 3 Nano Omni (free)nvidia/nemotron-3-nano-omni-30b-a3b-reasoning:free 123 Free Free 256,000 Details →
3 Xiaomi: MiMo-V2.5xiaomi/mimo-v2.5 123 $0.14 $0.28 1,048,576 Details →
4 Google: Gemini 3.1 Flash Lite Previewgoogle/gemini-3.1-flash-lite-preview 123 $0.25 $1.50 1,048,576 Details →
5 Google: Gemini 3 Flash Previewgoogle/gemini-3-flash-preview 123 $0.50 $3.00 1,048,576 Details →
6 Google: Gemini 2.5 Flash Lite Preview 09-2025google/gemini-2.5-flash-lite-preview-09-2025 123 $0.10 $0.40 1,048,576 Details →
7 Google: Gemini 2.5 Flash Litegoogle/gemini-2.5-flash-lite 123 $0.10 $0.40 1,048,576 Details →
8 Google: Gemini 2.5 Flashgoogle/gemini-2.5-flash 123 $0.30 $2.50 1,048,576 Details →
9 Google: Gemini 3.5 Flashgoogle/gemini-3.5-flash 115 $1.50 $9.00 1,048,576 Details →
10 Google Gemini Pro Latest~google/gemini-pro-latest 115 $2.00 $12.00 1,048,576 Details →
11 Google Gemini Flash Latest~google/gemini-flash-latest 115 $1.50 $9.00 1,048,576 Details →
12 Google: Gemini 3.1 Pro Preview Custom Toolsgoogle/gemini-3.1-pro-preview-customtools 115 $2.00 $12.00 1,048,756 Details →
13 Google: Gemini 3.1 Pro Previewgoogle/gemini-3.1-pro-preview 115 $2.00 $12.00 1,048,576 Details →
14 Google: Gemini 2.5 Progoogle/gemini-2.5-pro 115 $1.25 $10.00 1,048,576 Details →
15 Google: Gemini 2.5 Pro Preview 06-05google/gemini-2.5-pro-preview 115 $1.25 $10.00 1,048,576 Details →

How we ranked these

For Transcription, we weight models on audio input, low latency, requires_audio. Scores combine each model's public specs with independent benchmark results (Aider Polyglot coding scores, Artificial Analysis intelligence/coding/agentic indices) and live pricing. See full methodology →

About Transcription

Transcription is the conversion of spoken audio into written text using AI speech recognition. You need this when you have recorded conversations, meetings, interviews, or lectures that require searchable, editable text records. What separates high-performing models from weak ones is accuracy on accented speech, background noise handling, and punctuation placement. Whisper and similar transformer-based models excel at diverse audio conditions; older RNN approaches fail noticeably on overlapping speakers or poor audio quality. Speed matters in production: cloud APIs add latency (500ms to 2 seconds per minute of audio), while local models run faster but require GPU memory. Real-world accuracy typically ranges from 85 percent on clean studio audio to 60-70 percent on noisy field recordings. # WHEN_TO_USE Use this when you have audio recordings that need to become searchable text, like interviews, podcasts, meetings, or lectures you want indexed or archived without manual typing. # FAQ_Q1 What is the best AI model for transcription accuracy? # FAQ_A1 OpenAI's Whisper and Anthropic's backend models currently lead on mixed-condition audio. Whisper handles accents and background noise better than older alternatives like DeepSpeech, achieving 85-95 percent accuracy on standard English speech. For specialized domains (medical, legal), fine-tuned models often outperform general ones but require more setup. # FAQ_Q2 How much does AI transcription cost, and is it faster than manual transcription? # FAQ_A2 API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

When to use: Use this when you have audio recordings that need to become searchable text, like interviews, podcasts, meetings, or lectures you want indexed or archived without manual typing. # FAQ_Q1 What is the best AI model for transcription accuracy? # FAQ_A1 OpenAI's Whisper and Anthropic's backend models currently lead on mixed-condition audio. Whisper handles accents and background noise better than older alternatives like DeepSpeech, achieving 85-95 percent accuracy on standard English speech. For specialized domains (medical, legal), fine-tuned models often outperform general ones but require more setup. # FAQ_Q2 How much does AI transcription cost, and is it faster than manual transcription? # FAQ_A2 API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

Common questions

What is the best AI model for transcription accuracy? # FAQ_A1 OpenAI's Whisper and Anthropic's backend models currently lead on mixed-condition audio. Whisper handles accents and background noise better than older alternatives like DeepSpeech, achieving 85-95 percent accuracy on standard English speech. For specialized domains (medical, legal), fine-tuned models often outperform general ones but require more setup. # FAQ_Q2 How much does AI transcription cost, and is it faster than manual transcription? # FAQ_A2 API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

OpenAI's Whisper and Anthropic's backend models currently lead on mixed-condition audio. Whisper handles accents and background noise better than older alternatives like DeepSpeech, achieving 85-95 percent accuracy on standard English speech. For specialized domains (medical, legal), fine-tuned models often outperform general ones but require more setup. # FAQ_Q2 How much does AI transcription cost, and is it faster than manual transcription? # FAQ_A2 API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

How much does AI transcription cost, and is it faster than manual transcription? # FAQ_A2 API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

API pricing ranges from $0.01 to $0.25 per minute depending on the provider and model used. AI transcription is 50-100x faster than human typing, completing a 60-minute recording in 30-120 seconds depending on whether you use cloud or local processing, versus 4-6 hours of manual work.

Related tasks