Specialized | OpenAI

Whisper v3

Context

30-second audio windows (longer recordings are processed as a sequence of windows)

Pricing

$0.006/minute via OpenAI API; free to self-host (you pay only for compute)

Modalities

audio, text

Released

Nov 2023

Overview
OpenAI's open-source speech recognition model, supporting transcription in roughly 100 languages and translation of speech into English. Whisper v3 approaches human-level accuracy on well-resourced languages and holds up on diverse audio inputs, including accented speech, background noise, and technical jargon.
Why it matters
Whisper effectively commoditized speech-to-text. Before Whisper, accurate transcription typically required paid commercial APIs or a separate specialized model for each language. Now any developer can run near-human-level transcription locally without license or API fees. This enabled a wave of voice-powered features: meeting summarizers, podcast search, accessibility tools, and voice-controlled interfaces. For CTOs, Whisper's open-source nature means no per-minute API fees at scale; the remaining cost is the compute needed to run the model, which matters for applications processing thousands of hours of audio. Its multilingual coverage also makes it a common default for global products.
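To put the per-minute pricing in perspective, here is a rough cost sketch in Python. The API rate is the $0.006/minute figure listed under Pricing; the GPU price and the real-time speed factor for self-hosting are hypothetical placeholders, not measured values, so substitute your own numbers.

```python
API_RATE_USD_PER_MINUTE = 0.006  # rate listed under Pricing


def api_cost_usd(audio_hours: float,
                 rate_per_minute: float = API_RATE_USD_PER_MINUTE) -> float:
    """Cost of sending `audio_hours` of audio through the hosted API."""
    return audio_hours * 60 * rate_per_minute


def self_hosted_cost_usd(audio_hours: float,
                         gpu_usd_per_hour: float,
                         realtime_factor: float) -> float:
    """Compute-only cost of self-hosting.

    `realtime_factor` is how many hours of audio one GPU transcribes per
    wall-clock hour. Both it and `gpu_usd_per_hour` are assumptions you
    must measure or look up for your own setup.
    """
    gpu_hours = audio_hours / realtime_factor
    return gpu_hours * gpu_usd_per_hour


if __name__ == "__main__":
    hours = 5000
    print(f"API:         ${api_cost_usd(hours):.2f}")
    # Hypothetical: a $1.00/hour GPU running at 10x real time.
    print(f"Self-hosted: ${self_hosted_cost_usd(hours, 1.0, 10.0):.2f}")
```

The self-hosted figure covers compute only; engineering and operations time are not included in this estimate.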

Key strengths

  • Near-human accuracy on well-resourced languages, with coverage of roughly 100 languages
  • Fully open-source: no license or API fees for self-hosted inference
  • Handles noisy audio, accents, and jargon well
  • Both transcription and translation into English in one model
  • Extensive community tooling and integrations
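As an illustration of using one model for both transcription and translation, the sketch below assumes the open-source `openai-whisper` Python package (with ffmpeg installed) and its `load_model` / `transcribe` calls. The helper names `load_default_model` and `transcribe_and_translate` are made up for this example, and the audio path is a placeholder.

```python
from typing import Any, Dict


def load_default_model(name: str = "large-v3") -> Any:
    """Load a Whisper checkpoint; weights are downloaded on first use."""
    # Imported lazily so the rest of the module works without the package.
    import whisper  # pip install openai-whisper
    return whisper.load_model(name)


def transcribe_and_translate(model: Any, audio_path: str) -> Dict[str, str]:
    """Run the same model twice: once per task."""
    # task="transcribe" keeps the text in the spoken language.
    original = model.transcribe(audio_path, task="transcribe")
    # task="translate" asks the model for English output instead.
    english = model.transcribe(audio_path, task="translate")
    return {
        "language": original["language"],
        "transcript": original["text"].strip(),
        "english": english["text"].strip(),
    }
```

A typical call would look like `transcribe_and_translate(load_default_model(), "meeting.mp3")`, where `meeting.mp3` stands in for your own file. Passing the model in as an argument lets you load it once and reuse it across many files.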
