Fast AI Inference
Multiple Modalities
Text
Llama 3.3 70B Spec Decode
at 1,718 T/s
Audio
Distil-Whisper
at 203 T/s
Image
Llama 3.2 11B (Vision)
at 751 T/s
A New Class of Inference Solution
Historically, most AI investment has focused on training. We are now at an inflection point where trained models must move into production, to inference: applying a model to real-world input at speeds that deliver instant results. To meet this moment, Groq created LPU AI inference technology, providing the speed developers need and the scalability industry requires.
Meet the LPU
Groq LPU AI inference technology is built for speed, affordability, and energy efficiency. Groq created the LPU (Language Processing Unit) as a new category of AI processor because demand for inference is accelerating and legacy technology can’t deliver the instant speed, scalability, and low latency the market requires.
How the LPU Is Different from a GPU
The LPU is fundamentally different from a GPU, which was originally designed for graphics processing.
- The Groq Compiler is in control, not secondary to the hardware
- Compute and memory are co-located on the chip, eliminating resource bottlenecks
- A kernel-less compiler makes compiling new models easy and fast
- No caches or switches means seamless scalability
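The practical upshot of a compiler-controlled design can be sketched in a few lines of code. The toy scheduler below is purely illustrative, not Groq's actual toolchain: it assigns every operation a fixed start cycle at "compile" time, so execution is fully deterministic and needs no runtime caches, arbitration, or dynamic dispatch.

```python
# Toy illustration of compile-time static scheduling (NOT Groq's compiler):
# every op gets a fixed start cycle before execution begins, so the runtime
# needs no caches, switches, or dynamic dispatch.

OPS = [("load", 2), ("matmul", 4), ("add", 1), ("store", 2)]  # (name, cycles)

def compile_schedule(ops):
    """Assign each op a deterministic start cycle, entirely at compile time."""
    schedule, cycle = [], 0
    for name, latency in ops:
        schedule.append((cycle, name))
        cycle += latency  # the next op starts exactly when this one finishes
    return schedule, cycle  # total runtime is known before execution starts

schedule, total = compile_schedule(OPS)
for start, name in schedule:
    print(f"cycle {start:2d}: {name}")
print(f"total: {total} cycles, known statically")
```

Because the schedule is fixed ahead of time, performance is predictable down to the cycle, which is what makes scaling across chips seamless: every chip knows exactly when its data arrives.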
Groq Speed Is Instant
Independent benchmarks show Groq performance on models like Llama 3.1 8B and Llama 3.3 70B at context lengths from 10k to 100k tokens. Where others shortcut context length to compete on speed, Groq delivers performance, quality, and cost advantages without cutting corners.
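To put those throughput figures in concrete terms, here is a quick back-of-the-envelope calculation using the card figures quoted at the top of this page:

```python
# Back-of-the-envelope latency implied by a throughput figure.
throughput_tps = 1718     # Llama 3.3 70B with speculative decoding, tokens/sec
completion_tokens = 500   # an illustrative response length

seconds = completion_tokens / throughput_tps
print(f"{completion_tokens} tokens at {throughput_tps} T/s ~ {seconds:.2f} s")
# -> 500 tokens at 1718 T/s ~ 0.29 s
```

At that rate, a full 500-token answer streams out in under a third of a second, which is what "instant" means in practice.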
Leading Openly-Available AI Models
Take advantage of fast AI inference performance for leading openly-available Large Language Models and Automatic Speech Recognition models, including:
- Llama 3 8B & 70B
- Llama 3.1 8B
- Llama 3.2 1B & 3B
- Llama 3.2 11B (Vision) & 90B (Vision)
- Llama 3.3 70B
- Mixtral 8x7B
- Gemma 2 9B
- Whisper Large V3
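For the speech recognition models above, transcription is a single API call. A minimal sketch, assuming the groq Python SDK (pip install groq), a GROQ_API_KEY environment variable, and an illustrative local file named audio.m4a:

```python
# Minimal sketch: transcribing audio with Whisper Large V3 on Groq.
# Assumes the `groq` SDK is installed and GROQ_API_KEY is set;
# the filename is illustrative.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

with open("audio.m4a", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        file=("audio.m4a", audio_file),
        model="whisper-large-v3",
    )

print(transcription.text)
```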
Fast Inference API for Half a Million Developers
Groq inference is available now to our community of over half a million developers. There is no waitlist for an API key and we’re giving away five billion tokens per day for free.
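Getting started takes a few lines of code. A minimal sketch, assuming the groq Python SDK and an illustrative model ID of llama-3.3-70b-versatile:

```python
# Minimal sketch: a chat completion against the Groq API.
# Assumes the `groq` SDK is installed and GROQ_API_KEY is set;
# the model ID is illustrative.
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

chat_completion = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[
        {"role": "user", "content": "Explain LPU inference in one sentence."},
    ],
)

print(chat_completion.choices[0].message.content)
```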