Smart, Fast, and Affordable
Unmatched Price Performance
Fast responses, scalable performance, and costs you can plan for.
Large Language Models
AI Model | Current Speed (Tokens per Second) | Input Token Price (Per Million Tokens) | Output Token Price (Per Million Tokens) |
---|---|---|---|
GPT OSS 20B 128k | 1,000 TPS | $0.10 (10M / $1)* | $0.50 (2M / $1)* |
GPT OSS 120B 128k | 500 TPS | $0.15 (6.67M / $1)* | $0.75 (1.33M / $1)* |
Kimi K2-0905 1T 256k | 200 TPS | $1.00 (1M / $1)* | $3.00 (333,333 / $1)* |
Llama 4 Scout (17Bx16E) 128k | 594 TPS | $0.11 (9.09M / $1)* | $0.34 (2.94M / $1)* |
Llama 4 Maverick (17Bx128E) 128k | 562 TPS | $0.20 (5M / $1)* | $0.60 (1.6M / $1)* |
Llama Guard 4 12B 128k | 325 TPS | $0.20 (5M / $1)* | $0.20 (5M / $1)* |
Qwen3 32B 131k | 662 TPS | $0.29 (3.44M / $1)* | $0.59 (1.69M / $1)* |
Llama 3.3 70B Versatile 128k | 394 TPS | $0.59 (1.69M / $1)* | $0.79 (1.27M / $1)* |
Llama 3.1 8B Instant 128k | 840 TPS | $0.05 (20M / $1)* | $0.08 (12.5M / $1)* |
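To make the per-million-token prices concrete, here is a minimal Python sketch that estimates the cost of a single request. It assumes cost scales linearly with token counts, as the table implies. The dictionary keys and the function names `request_cost` and `tokens_per_dollar` are illustrative, not official model IDs or part of any API, and only three rows of the table are copied in.

```python
from decimal import Decimal

# (input price, output price) in USD per million tokens, copied from the table above.
# Keys are illustrative labels, not official API model IDs.
PRICES_PER_MILLION = {
    "GPT OSS 20B": (Decimal("0.10"), Decimal("0.50")),
    "Llama 3.3 70B Versatile": (Decimal("0.59"), Decimal("0.79")),
    "Llama 3.1 8B Instant": (Decimal("0.05"), Decimal("0.08")),
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request under linear per-token pricing."""
    input_price, output_price = PRICES_PER_MILLION[model]
    return (input_tokens * input_price + output_tokens * output_price) / MILLION


def tokens_per_dollar(price_per_million: Decimal) -> Decimal:
    """Convert a per-million-token price into tokens per USD (the figure in parentheses)."""
    return MILLION / price_per_million
```

For example, `request_cost("Llama 3.3 70B Versatile", 2000, 500)` evaluates to 0.001575 USD, and `tokens_per_dollar(Decimal("0.10"))` gives the 10M tokens per dollar shown for GPT OSS 20B input.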
Text-to-Speech Models
AI Model | Characters/s | Price (Per M Characters) |
---|---|---|
PlayAI Dialog v1.0 | 140 | $50.00 |
Automatic Speech Recognition (ASR) Models
AI Model | Speed Factor | Price (Per Hour Transcribed) |
---|---|---|
Whisper V3 Large | 217x | $0.111* |
Whisper Large v3 Turbo | 228x | $0.04* |
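The ASR figures can be turned into a rough cost and turnaround estimate. The sketch below assumes that the speed factor means audio is processed that many times faster than real time, and that billing is proportional to audio duration. The table states neither, and the sketch ignores any minimum billing duration or rounding rules. The function name `transcription_estimate` is hypothetical.

```python
def transcription_estimate(audio_seconds: float,
                           price_per_hour: float,
                           speed_factor: float) -> tuple[float, float]:
    """Return (estimated USD cost, estimated processing seconds) for one audio file.

    Assumes pro-rata billing by audio duration and that speed_factor is a
    multiple of real time; both are assumptions, not documented behavior.
    """
    audio_hours = audio_seconds / 3600
    cost = audio_hours * price_per_hour
    processing_seconds = audio_seconds / speed_factor
    return cost, processing_seconds


# One hour of audio on Whisper Large v3 Turbo ($0.04 per hour, 228x).
cost, seconds = transcription_estimate(3600, 0.04, 228)
print(f"cost: ${cost:.3f}, processing time: about {seconds:.1f} s")
```

Under these assumptions, an hour of audio costs $0.04 and takes roughly 16 seconds to process on Whisper Large v3 Turbo.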
Prompt Caching
Model | Uncached Input Tokens (Per M Tokens) | Cached Input Tokens (Per M Tokens) | Output Tokens (Per M Tokens) |
---|---|---|---|
moonshotai/kimi-k2-instruct-0905 | $1.00 | $0.50 | $3.00 |
openai/gpt-oss-120b | $0.15 | $0.075 | $0.75 |
openai/gpt-oss-20b | $0.10 | $0.05 | $0.50 |
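In every row above, cached input tokens cost half as much as uncached ones. A small sketch of how that affects a request's cost follows, assuming input tokens split cleanly into cached and uncached portions billed at the two rates. The function name `cost_with_caching` and its parameters are illustrative, and how a request's cached token count is reported is outside the scope of this sketch.

```python
from decimal import Decimal

# USD per million tokens, copied from the prompt caching table above.
CACHING_PRICES = {
    "moonshotai/kimi-k2-instruct-0905": {
        "uncached": Decimal("1.00"), "cached": Decimal("0.50"), "output": Decimal("3.00")},
    "openai/gpt-oss-120b": {
        "uncached": Decimal("0.15"), "cached": Decimal("0.075"), "output": Decimal("0.75")},
    "openai/gpt-oss-20b": {
        "uncached": Decimal("0.10"), "cached": Decimal("0.05"), "output": Decimal("0.50")},
}


def cost_with_caching(model: str, input_tokens: int,
                      cached_tokens: int, output_tokens: int) -> Decimal:
    """Estimate USD cost when cached_tokens of the input are billed at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    prices = CACHING_PRICES[model]
    uncached_tokens = input_tokens - cached_tokens
    total = (uncached_tokens * prices["uncached"]
             + cached_tokens * prices["cached"]
             + output_tokens * prices["output"])
    return total / Decimal(1_000_000)
```

For a request to `openai/gpt-oss-120b` with 10,000 input tokens and 1,000 output tokens, the estimate drops from 0.00225 USD with no cache hits to 0.00165 USD when 8,000 of the input tokens are cached.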
Built In Tools (Compound)
Tool | Price | Parameter |
---|---|---|
Basic Search | $5 / 1000 requests | web_search |
Advanced Search | $8 / 1000 requests | web_search |
Visit Website | $1 / 1000 requests | visit_website |
Code Execution | $0.18 / hour | code_interpreter |
Browser Automation | $0.08 / hour | browser_automation |
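Tool charges mix two units: a price per 1,000 requests and a price per hour. The sketch below totals both kinds of usage. It assumes hourly tools are billed pro rata by the second, which the table does not state. The dictionary keys are illustrative labels rather than the `Parameter` values, because Basic Search and Advanced Search share the `web_search` parameter; the function name `tool_cost` is hypothetical.

```python
from decimal import Decimal

# USD per 1,000 requests, copied from the table above (keys are illustrative labels).
PRICE_PER_1000_REQUESTS = {
    "basic_search": Decimal("5"),
    "advanced_search": Decimal("8"),
    "visit_website": Decimal("1"),
}

# USD per hour, copied from the table above.
PRICE_PER_HOUR = {
    "code_execution": Decimal("0.18"),
    "browser_automation": Decimal("0.08"),
}


def tool_cost(request_counts: dict[str, int], seconds_used: dict[str, int]) -> Decimal:
    """Estimate total USD tool charges from request counts and seconds of usage.

    Assumes hourly tools are billed pro rata by the second (an assumption).
    """
    total = Decimal(0)
    for tool, count in request_counts.items():
        total += Decimal(count) * PRICE_PER_1000_REQUESTS[tool] / Decimal(1000)
    for tool, seconds in seconds_used.items():
        total += Decimal(seconds) * PRICE_PER_HOUR[tool] / Decimal(3600)
    return total
```

As a usage example, 200 basic searches, 50 website visits, and 30 minutes of code execution, passed as `tool_cost({"basic_search": 200, "visit_website": 50}, {"code_execution": 1800})`, come to 1.14 USD under these assumptions.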
Built In Tools (GPT-OSS)
Tool | Price | Parameter |
---|---|---|
Browser Search - Basic Search | $5 / 1000 requests | browser_search - browser.search |
Browser Search - Visit Website | $1 / 1000 requests | browser_search - browser.open |
Code Execution - Python | $0.18 / hour | code_interpreter - python |
About Our Pricing
No Surprise Inference Bills
Other inference providers spike costs without warning. Some hide behind elastic pricing. Groq pricing is linear and predictable, with no hidden costs or idle infrastructure. Every new user is growth, not risk, and you can keep margins secure.
Get started for free and upgrade as your needs grow. The pricing for our core models is listed above; all prices are in USD. Other models, including fine-tuned models, are available on request. Send us your inquiries here.
Compound Systems
Intelligent Tool Selection Across Multiple Models
Compound AI systems are powered by multiple openly available models already supported in GroqCloud. They intelligently and selectively use tools to answer user queries, starting with web search and code execution. Pricing is passed through to the underlying models and server-side tools that are part of the compound AI system.
Batch API
Process Large-Scale Workloads Asynchronously
Batch processing lets you run thousands of API requests at scale by submitting your workload to Groq as an asynchronous batch. Batches cost 50% less, do not count against your standard rate limits, and complete within a processing window of 24 hours to 7 days.
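To see what the 50% batch discount means for a large workload, the following sketch compares standard and batch pricing. It assumes the discount applies uniformly to input and output token charges and that every request in the batch has the same token counts. The function name `batch_workload_cost` and its parameters are illustrative.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.50")  # "50% lower cost", as stated above
MILLION = Decimal(1_000_000)


def batch_workload_cost(num_requests: int,
                        input_tokens_per_request: int,
                        output_tokens_per_request: int,
                        input_price_per_million: Decimal,
                        output_price_per_million: Decimal) -> tuple[Decimal, Decimal]:
    """Return (standard USD cost, batch USD cost) for a uniform workload.

    Assumes the discount applies equally to input and output tokens (an assumption).
    """
    per_request = (input_tokens_per_request * input_price_per_million
                   + output_tokens_per_request * output_price_per_million) / MILLION
    standard = num_requests * per_request
    batch = standard * (1 - BATCH_DISCOUNT)
    return standard, batch


# 10,000 requests at the listed Llama 3.1 8B Instant prices ($0.05 in, $0.08 out).
standard, batch = batch_workload_cost(10_000, 1_000, 200, Decimal("0.05"), Decimal("0.08"))
print(f"standard: ${standard}, batch: ${batch}")
```

With these inputs the standard cost is 0.66 USD and the batch cost is 0.33 USD.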
For enterprise API solutions or on-prem deployments, please fill out the form on our Enterprise Access Page.