We’re excited to announce that Distil-Whisper (distil-whisper-large-v3-en) is now available to the developer community on GroqCloud™ Developer Console. This compressed version of OpenAI’s Whisper model complements the existing Automatic Speech Recognition (ASR) model, Whisper Large V3, and is designed to provide faster and less expensive English speech recognition while maintaining comparable accuracy.
Distil-Whisper offers a remarkable balance of speed and accuracy. Compared to Whisper Large V3, Distil-Whisper is faster, currently running at a 240x real-time speed factor, and 51% smaller, with only 756 million parameters versus Whisper Large V3’s 1.55 billion. Despite this reduction in size, Distil-Whisper performs well, achieving a Word Error Rate (WER) within 2.4% of Whisper Large V3 on short-form transcriptions.
The distilled model excels in robustness to noise and shows reduced hallucination, with 1.3x fewer instances of repeated 5-gram word duplicates and a 2.1% reduction in insertion error rate compared to Whisper Large V3. With its compatibility with popular Whisper libraries, Distil-Whisper is an attractive option for commercial applications seeking to improve transcription efficiency without sacrificing quality.
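For readers less familiar with the metrics above, here is a minimal sketch of how WER and an insertion error rate are commonly computed: a word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The function name and the simple lowercase/whitespace normalization are illustrative assumptions, not the evaluation pipeline behind the figures quoted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> dict:
    """Word-level edit distance between a reference and a hypothesis.

    Assumption: lowercase + whitespace splitting is a simplification;
    published benchmarks usually apply a fuller text normalizer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = (total_edits, substitutions, deletions, insertions)
    # needed to turn ref[:i] into hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every hypothesis word

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # words match, no edit
                continue
            t, s, d, n = dp[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, n)
            t, s, d, n = dp[i - 1][j]
            deletion = (t + 1, s, d + 1, n)
            t, s, d, n = dp[i][j - 1]
            insertion = (t + 1, s, d, n + 1)
            # Tuples compare on total edits first, so this keeps a
            # minimum-edit alignment (the breakdown may not be unique).
            dp[i][j] = min(substitution, deletion, insertion)

    total, subs, dels, ins = dp[-1][-1]
    return {
        "wer": total / len(ref),
        "insertion_rate": ins / len(ref),
        "substitutions": subs,
        "deletions": dels,
        "insertions": ins,
    }
```

A repeated word in the output, the kind of duplication associated with hallucination, shows up as an insertion: comparing "the cat sat on the mat" against "the cat sat on on the mat" yields one insertion over six reference words.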
Enterprises can leverage Distil-Whisper to build a range of innovative AI applications, such as:
- Real-time customer service chatbots that can quickly and accurately transcribe customer inquiries and respond with personalized solutions
- Automated speech-to-text systems for industries like healthcare, finance, and education, where accurate transcription is critical
- Voice-controlled interfaces for smart homes, cars, and other devices, where fast and accurate speech recognition is essential
- Transcribing audio and video recordings – such as interviews, lectures, podcasts, and TV shows – for media professionals, enabling them to focus on editing, analysis, and other tasks
- Meeting assistants that, in conjunction with LLMs, transcribe and summarize meeting recordings, creating a list of action items and decisions
- Insurance workflows that simplify claims processing and improve service by transcribing recordings of interviews, phone calls, and other interactions with customers
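The transcription step shared by the use cases above can be sketched roughly as follows. This assumes the Groq Python SDK's OpenAI-style `audio.transcriptions.create` call and the model ID named in this announcement. The helper name `transcribe_file` and the file name in the usage comment are hypothetical, and the SDK interface may differ between versions.

```python
import os


def transcribe_file(client, path: str,
                    model: str = "distil-whisper-large-v3-en") -> str:
    """Send one audio file for transcription and return the text.

    `client` is assumed to expose audio.transcriptions.create(...) and to
    return an object with a `.text` attribute, as the Groq SDK does.
    """
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(
            file=(os.path.basename(path), audio.read()),
            model=model,
        )
    return result.text


# Hypothetical usage (requires `pip install groq` and a GROQ_API_KEY):
#
#   from groq import Groq
#   client = Groq()
#   print(transcribe_file(client, "meeting.m4a"))
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves credential handling to the caller.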
Pricing
We’re offering Distil-Whisper at a competitive price point. The new model will be available at a cost of $0.02 per hour of audio transcribed, making it an attractive option for developers and enterprises looking to improve their English speech recognition capabilities without breaking the bank.
With the release of Distil-Whisper, we will be increasing the cost of Whisper Large V3 for on-demand GroqCloud™ users to $0.111 per hour, effective October 1, 2024. We will also be implementing a minimum per-request charge on all ASR models, equivalent to transcribing 10 seconds of audio, for requests shorter than 10 seconds. For Distil-Whisper, at $0.02 per hour, this works out to roughly $0.0000556 per request, or about $1.00 per 18,000 requests if all of the requests are shorter than 10 seconds.
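To see how the minimum charge plays out, here is a small sketch that bills each request for at least 10 seconds of audio at the hourly rates quoted above. The function and constant names are illustrative, and it assumes billing is otherwise linear in audio duration, which this post does not spell out.

```python
MIN_BILLED_SECONDS = 10            # minimum billed duration per request
DISTIL_WHISPER_PER_HOUR = 0.02     # USD per hour of audio
WHISPER_LARGE_V3_PER_HOUR = 0.111  # USD per hour, effective October 1, 2024


def request_cost(audio_seconds: float, price_per_hour: float) -> float:
    """Cost in USD of one request, applying the 10-second minimum.

    Assumption: cost scales linearly with billed seconds.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    billed_seconds = max(audio_seconds, MIN_BILLED_SECONDS)
    return billed_seconds / 3600 * price_per_hour


# A 4-second clip is billed as 10 seconds on either model.
short_clip = request_cost(4, DISTIL_WHISPER_PER_HOUR)
print(f"One short Distil-Whisper request: ${short_clip:.7f}")
print(f"18,000 short requests: ${18_000 * short_clip:.2f}")
```

For workloads dominated by very short clips, batching or concatenating audio before sending it may be worth considering, since every request under 10 seconds costs the same.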
Comparison Table
Not sure which model is best for your usage? See below for a quick comparison table of the two models to help guide your decision.
*Prices effective October 1, 2024
Performance
Artificial Analysis has included our Distil-Whisper performance in their latest independent speech-to-text benchmark. Dive into some of the results below.
Speed Factor
Measured as input audio seconds transcribed per second of processing time, Groq clocks in at a speed factor of 240x real-time, the fastest implementation of Whisper models.
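As a quick illustration of what a speed factor means in practice, the sketch below converts an audio duration into an approximate processing time. The function name is illustrative, and the estimate deliberately ignores upload time, queueing, and network latency, so real end-to-end times will be longer.

```python
def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Approximate transcription time for a given real-time speed factor.

    speed_factor = audio seconds transcribed per second of processing.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


one_hour = 60 * 60
print(processing_seconds(one_hour, 240))  # 3600 / 240 = 15.0 seconds
```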
Price
Artificial Analysis defines price as USD per 1,000 minutes of audio, which puts the Groq price at $0.333, based on offering Distil-Whisper at $0.02 per hour transcribed.
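The unit conversion behind that figure is simple enough to check by hand; a short sketch, with an illustrative function name:

```python
def price_per_1000_minutes(price_per_hour: float) -> float:
    """Convert a USD-per-hour price into USD per 1,000 minutes of audio."""
    return price_per_hour / 60 * 1000


print(round(price_per_1000_minutes(0.02), 3))  # 0.333
```

The same helper can be used to put other hourly prices on the benchmark's scale when comparing models.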
With the addition of Distil-Whisper, developers can now access this powerful model and start building efficient and accurate speech recognition applications using Groq. Start building today on GroqCloud™ Developer Console.