On-demand Pricing for Tokens-as-a-Service
Groq powers leading openly available AI models.
Other models, including fine-tuned models, are available for specific customer requests. Send us your inquiries here.
Large Language Models (LLMs)
AI Model | Current Speed (Tokens per Second) | Input Token Price (per Million Tokens) | Output Token Price (per Million Tokens) | ||
---|---|---|---|---|---|
DeepSeek R1 Distill Llama 70B | 275 | $0.75 (1.33M / $1)* | $0.99 (1.01M / $1)* | Try Now | Model Card |
DeepSeek R1 Distill Qwen 32B 128k | 140 | $0.69 (1.45M / $1)* | $0.69 (1.45M / $1)* | Try Now | Model Card |
Qwen 2.5 32B Instruct 128k | 200 | $0.79 (1.27M / $1)* | $0.79 (1.27M / $1)* | Try Now | Model Card |
Qwen 2.5 Coder 32B Instruct 128k | 390 | $0.79 (1.27M / $1)* | $0.79 (1.27M / $1)* | Try Now | Model Card |
Qwen QwQ 32B (Preview) 128k | 400 | $0.29 (3.44M / $1)* | $0.39 (2.56M / $1)* | Try Now | Model Card |
Mistral Saba 24B | 330 | $0.79 (1.27M / $1)* | $0.79 (1.27M / $1)* | Try Now | |
Llama 3.2 1B (Preview) 8k | 3100 | $0.04 (25M / $1)* | $0.04 (25M / $1)* | Try Now | Model Card |
Llama 3.2 3B (Preview) 8k | 1600 | $0.06 (17M / $1)* | $0.06 (17M / $1)* | Try Now | Model Card |
Llama 3.3 70B Versatile 128k | 275 | $0.59 (1.69M / $1)* | $0.79 (1.27M / $1)* | Try Now | Model Card |
Llama 3.1 8B Instant 128k | 750 | $0.05 (20M / $1)* | $0.08 (12.5M / $1)* | Try Now | Model Card |
Llama 3 70B 8k | 330 | $0.59 (1.69M / $1)* | $0.79 (1.27M / $1)* | Try Now | Model Card |
Llama 3 8B 8k | 1250 | $0.05 (20M / $1)* | $0.08 (12.5M / $1)* | Try Now | Model Card |
Gemma 2 9B 8k | 500 | $0.20 (5M / $1)* | $0.20 (5M / $1)* | Try Now | Model Card |
Llama Guard 3 8B 8k | 765 | $0.20 (5M / $1)* | $0.20 (5M / $1)* | Try Now | Model Card |
Llama 3.3 70B SpecDec 8k | 1600 | $0.59 (1.69M / $1)* | $0.99 (1.01M / $1)* | Try Now | Model Card |
*Approximate number of tokens per $1.
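As a rough illustration of how the per-million-token prices above translate into a per-request cost, here is a minimal Python sketch. The function names `llm_cost_usd` and `tokens_per_dollar` are hypothetical helpers, not part of any Groq SDK, and the token counts are made-up example values. The prices are taken from the Llama 3.3 70B Versatile row.

```python
def llm_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate the cost of one request from per-million-token prices (hypothetical helper)."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def tokens_per_dollar(price_per_m):
    """Convert a per-million-token price into the approximate tokens-per-$1 figure."""
    return 1_000_000 / price_per_m


# Llama 3.3 70B Versatile: $0.59 input, $0.79 output per million tokens (from the table).
# The token counts below are illustrative assumptions.
example_cost = llm_cost_usd(10_000, 2_000, input_price_per_m=0.59, output_price_per_m=0.79)
print(f"Estimated request cost: ${example_cost:.5f}")
print(f"Input tokens per $1: {tokens_per_dollar(0.59) / 1e6:.2f}M")
```

The second helper reproduces the parenthesised "tokens per $1" figures in the table, which are rounded.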
Automatic Speech Recognition (ASR) Models
AI Model | Speed Factor | Price (per Hour Transcribed) | ||
---|---|---|---|---|
Whisper V3 Large | 189x | $0.111* | Try Now | Model Card |
Whisper Large v3 Turbo | 216x | $0.04* | Try Now | Model Card |
Distil-Whisper | 250x | $0.02* | Try Now | Model Card |
*For ASR models above, Groq charges a minimum of 10 seconds per request.
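The 10-second minimum matters mostly for short clips. The sketch below estimates the cost of a transcription request under that rule. It assumes billing is linear in audio duration above the minimum, which the page does not state, and `asr_cost_usd` is a hypothetical helper name.

```python
def asr_cost_usd(audio_seconds, price_per_hour, min_billed_seconds=10.0):
    """Estimate ASR cost for one request (hypothetical helper).

    Assumption: duration is billed linearly per second, with each request
    billed for at least `min_billed_seconds` of audio.
    """
    billed_seconds = max(audio_seconds, min_billed_seconds)
    return billed_seconds / 3600 * price_per_hour


# Whisper Large v3 Turbo is listed at $0.04 per hour transcribed.
short_clip = asr_cost_usd(3, price_per_hour=0.04)      # billed as 10 seconds
long_clip = asr_cost_usd(1800, price_per_hour=0.04)    # 30 minutes of audio
print(f"3 s clip: ${short_clip:.6f}, 30 min clip: ${long_clip:.4f}")
```

Under this assumption, a 3-second clip costs the same as a 10-second one, so batching very short utterances into fewer requests may reduce cost.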
Vision Models
AI Model | Input Token Price (per M tokens) | Output Token Price (per M tokens) | ||
---|---|---|---|---|
Llama 3.2 11B Vision 8k (Preview) | $0.18* | $0.18* | Try Now | Model Card |
Llama 3.2 90B Vision 8k (Preview) | $0.90* | $0.90* | Try Now | Model Card |
*For vision models, images are billed at 6,400 tokens per image.
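To see how the per-image charge combines with text tokens, here is a minimal sketch. It assumes image tokens are billed at the input token price, which the page does not specify, and `vision_cost_usd` is a hypothetical helper name.

```python
TOKENS_PER_IMAGE = 6_400  # from the pricing note: images are billed at 6,400 tokens each


def vision_cost_usd(num_images, text_input_tokens, output_tokens,
                    input_price_per_m, output_price_per_m):
    """Estimate the cost of one vision request (hypothetical helper).

    Assumption: image tokens are charged at the input token price.
    """
    input_tokens = text_input_tokens + num_images * TOKENS_PER_IMAGE
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


# Llama 3.2 11B Vision: $0.18 per million tokens for both input and output.
# One image, a 100-token prompt, and a 300-token answer (illustrative values).
cost = vision_cost_usd(1, 100, 300, input_price_per_m=0.18, output_price_per_m=0.18)
print(f"Estimated vision request cost: ${cost:.6f}")
```

For both listed vision models the input and output prices are equal, so the input-price assumption does not change the total here.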
Batch API
The Batch API is now available for Dev Tier customers and is currently offered at a 25% discount. Batch processing lets you run thousands of API requests at scale by submitting your workload to Groq as a single batch, which we process with a 24-hour turnaround.
Now through the end of April 2025, we’re doubling our discount on Batch Processing to 50% off for all paid GroqCloud customers!
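The effect of the batch discount on a workload can be sketched as follows. This assumes the discount applies uniformly to what the same requests would cost on demand, which the page implies but does not spell out. `batch_cost_usd` is a hypothetical helper name.

```python
def batch_cost_usd(on_demand_cost_usd, discount=0.25):
    """Apply a batch discount to an on-demand cost estimate (hypothetical helper).

    Assumption: the discount is a flat fraction of the on-demand cost.
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return on_demand_cost_usd * (1.0 - discount)


# A workload estimated at $10.00 on demand (illustrative value).
standard = batch_cost_usd(10.00)                    # standard 25% batch discount
promotional = batch_cost_usd(10.00, discount=0.50)  # promotional 50% discount
print(f"Standard batch: ${standard:.2f}, promotional batch: ${promotional:.2f}")
```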
Learn more about Batch pricing and how to get started here.
For enterprise API solutions or on-prem deployments, please fill out the form on our Enterprise Access Page.