
Groq offers industry-leading speed, quality, and pricing
Llama 3.3 70B Spec Decode at 1,521 T/s
Distil-Whisper at 325 T/s
Llama 3.2 11B (Vision) at 751 T/s
Historically, most AI investment has focused on training. We are now at an inflection point where trained models must move into production for inference: using input data to solve real-world challenges, at speeds that deliver instant results. To meet this moment, Groq created LPU™ AI inference technology, providing the speed developers need and the scalability industry requires.
Groq LPU AI inference technology is built for speed, affordability, and energy efficiency. Groq created the LPU (Language Processing Unit) as a new category of AI processor because demand for inference is accelerating and legacy technology can’t deliver the instant speed, scalability, and low latency the market requires.
The LPU is fundamentally different from a GPU, which was originally designed for graphics processing.
Independent benchmarks show Groq performance on models like Llama 3.1 8B and Llama 3.3 70B at context lengths from 10k to 100k tokens. Where others shorten context length to compete on speed, Groq never cuts corners on performance, quality, or cost.
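If you want to sanity-check throughput yourself, a rough measurement takes only a few lines. The sketch below uses the groq Python package to stream a chat completion and estimate output tokens per second; the model ID and the ~4-characters-per-token heuristic are illustrative assumptions, not official benchmark methodology.

```python
import time

from groq import Groq  # pip install groq; reads GROQ_API_KEY from the environment

client = Groq()

first_token_at = None
pieces = []

# Stream a completion and time the generated text as it arrives.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # illustrative model ID; check Groq's current model list
    messages=[{"role": "user", "content": "Summarize the history of the transistor."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        pieces.append(delta)

if first_token_at is not None:
    elapsed = time.perf_counter() - first_token_at
    # Rough heuristic: ~4 characters per token for English text.
    approx_tokens = len("".join(pieces)) / 4
    print(f"~{approx_tokens / max(elapsed, 1e-9):.0f} tokens/sec after first token")
```

Measuring from the first streamed token onward separates generation throughput from time-to-first-token; a rigorous benchmark would also use the tokenizer's actual token counts rather than a character heuristic.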
Take advantage of fast AI inference for leading GenAI models across text, audio, and vision modalities, from providers including Meta, DeepSeek, Qwen, Mistral, Google, OpenAI, and more.
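As a quick illustration of what that looks like in practice, here is a minimal sketch using the groq Python package against its OpenAI-compatible endpoints, with one text call and one audio transcription. The model IDs and the meeting.wav file are placeholders to swap for your own.

```python
from groq import Groq  # pip install groq; reads GROQ_API_KEY from the environment

client = Groq()

# Text: chat completion against a Llama model.
chat = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder model ID; see the console for current options
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
)
print(chat.choices[0].message.content)

# Audio: speech-to-text with a Whisper-family model.
with open("meeting.wav", "rb") as audio:  # placeholder file
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # placeholder model ID
        file=audio,
    )
print(transcript.text)
```

Because the endpoints follow the OpenAI-compatible shape, switching between text, audio, and vision models is largely a matter of changing the model ID and the request payload.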