Largest, Most Capable ASR Model Now Faster on GroqCloud™

Written by:
Groq

Up to 100MB File Size & Faster Speed for Whisper Large v3 Powered by Groq 

Automatic Speech Recognition (ASR) and Speech-To-Text (STT) models are driving demand for voice as an input modality alongside text and changing the way we interact with AI, making it easier than ever to direct AI toward your needs. For end users, the impact is already being felt: from dictation to real-time captions to customer support, the use cases for ASR and STT are everywhere. Developers building on GroqCloud™ are using ASR models like Whisper Large v3 to power their apps. Examples include:

  • Stream of Thought by Dan Hernandez – you can start recording and use your voice to draft a quick email, generate a prompt for other AI tools, create a social media post, or any other kind of text that you’d want to pull out and use in another tool within your workflow.
  • Brainy Read by Misbah Syed – combines the power of Whisper and a vision model to analyze videos and generate content in the backend, which is then displayed in a Notion-like interface.

To deliver a natural voice experience, these use cases require ultra-low-latency inference. That makes a major speed milestone for whisper-large-v3, a powerful ASR model available on GroqCloud today, even more exciting. More speed for our customers means more impact for end users.

Our team achieved a significant boost to throughput, with a real-time speed factor of 299x. That’s not all: our Distil and Turbo models have also received substantial upgrades. More good news? The new performance doesn’t impact quality – our Word Error Rate (WER) remains unaffected at 10.3%, meaning you can count on the same quality standards. To top it all off, we’re excited to announce that our paid tier GroqCloud customers can now process files up to 100MB in size, as long as they’re provided via a URL. We can’t wait to see the innovative applications that our customers will develop with these new improvements. In the meantime, explore the highlights from Artificial Analysis’ official benchmarks below. 

Performance

Artificial Analysis has independently benchmarked Whisper Large v3 on Groq at a real-time speed factor of 299x. This is ~120 T/sec faster than our previous performance benchmark.
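
To put that number in perspective, a real-time speed factor is generally read as seconds of audio transcribed per second of processing time. Under that reading, the rough sketch below estimates how long a clip would take to process. The function name is ours for illustration, and the estimate ignores upload time, network latency, and queuing, so treat it as a back-of-the-envelope figure rather than a guarantee.

```python
def estimated_processing_seconds(audio_seconds: float, speed_factor: float = 299.0) -> float:
    """Estimate processing time from a real-time speed factor.

    Assumes speed factor = audio seconds transcribed per second of processing.
    The 299x default is the benchmark figure quoted in this post.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return audio_seconds / speed_factor


# One hour of audio at 299x
print(round(estimated_processing_seconds(3600), 2))  # 12.04
```

In other words, at 299x an hour-long recording works out to roughly 12 seconds of processing under these assumptions.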

Feature Update

We’re excited to share that the maximum file size for Whisper Large v3 is now 100MB for paid-tier GroqCloud™ customers. This is up from 40MB previously. Note that you must provide the file via a URL to use the full limit.

Pricing

Groq offers Whisper Large v3 at $0.111 per hour transcribed. Please note that for ASR models, Groq charges a minimum of 10 seconds per request. See all Groq pricing here
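
As a rough illustration of how the hourly rate and the 10-second minimum interact, here is a minimal sketch. It assumes billing scales linearly with audio duration above the minimum and applies no rounding. Neither detail is stated here, so check the pricing page before relying on it. The function and constant names are hypothetical.

```python
PRICE_PER_HOUR_USD = 0.111   # Whisper Large v3 rate quoted in this post
MIN_BILLED_SECONDS = 10.0    # per-request minimum for ASR models


def estimated_cost_usd(audio_seconds: float) -> float:
    """Estimate the cost of one transcription request.

    Assumption: cost is linear in duration, with short clips billed
    as MIN_BILLED_SECONDS. Actual invoicing may round differently.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    billed_seconds = max(audio_seconds, MIN_BILLED_SECONDS)
    return billed_seconds / 3600.0 * PRICE_PER_HOUR_USD


# A 3-second clip is billed like a 10-second one under this assumption
print(estimated_cost_usd(3) == estimated_cost_usd(10))  # True
```

The practical takeaway: for very short clips, batching audio into fewer, longer requests may reduce cost, since each request is billed for at least 10 seconds.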

Quality

The quality of ASR models like Whisper Large v3 is measured by Word Error Rate (WER), the percentage of words transcribed incorrectly. Groq has made no compromises to its previous quality benchmark despite the speed increase, maintaining a WER of 10.3% for its latest benchmark.
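
For readers who want to measure WER on their own audio, the sketch below uses the common definition: word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The lowercase-and-split normalization is our simplifying assumption. The benchmark quoted above may normalize text differently, so numbers from this sketch are not directly comparable to the 10.3% figure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    punctuation is left untouched.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four
print(word_error_rate("turn on the lights", "turn on the light"))  # 0.25
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, which is why it is an error rate rather than a strict percentage of the reference.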

Whisper Model Comparisons on Groq

The Whisper model is ideal for applications such as:

  • Real-time customer service chatbots that need to quickly transcribe customer inquiries and respond with personalized solutions
  • Automated speech-to-text systems for industries like finance and education, where fast and accurate transcription is critical
  • Voice-controlled interfaces for smart homes, cars, and other devices, where rapid speech recognition is essential
  • Audio and video recording transcriptions – such as interviews, lectures, podcasts, and TV shows – for media professionals, enabling them to focus on editing, analysis, and other tasks
  • Transcription and summarization (in conjunction with LLMs) that can be used on meeting recordings to create a list of action items and decisions, or to simplify insurance claims processing and improve service by transcribing recordings of interviews, phone calls, and other customer interactions

Not sure which model is best for your usage? See a quick comparison of the three models as well as a decision tree to help guide you. You can explore our full ASR Model Guide here
