
Groq offers industry-leading speed, quality, and pricing
Llama 3.3 70B Spec Decode at 1,521 T/s
Distil-Whisper at 325 T/s
Llama 3.2 11B (Vision) at 751 T/s
Historically, most AI investment has focused on training. We are now at an inflection point where trained models must move into production for inference: using input data to solve real-world challenges, at speeds that deliver instant results. To meet this moment, Groq created LPU™ AI inference technology, providing the speed developers need and the scalability industry requires.
Groq LPU AI inference technology is built for speed, affordability, and energy efficiency. Groq created the LPU (Language Processing Unit) as a new category of AI processor because demand for inference is accelerating and legacy technology can’t deliver the instant speed, scalability, and low latency the market requires.
The LPU is fundamentally different from a GPU, which was originally designed for graphics processing.
Independent benchmarks show Groq performance on models like Llama 3.1 8B and Llama 3.3 70B at context lengths from 10k to 100k tokens. Where others shorten context length to compete on speed, Groq never cuts corners on performance, quality, or cost.
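If you want to sanity-check throughput yourself, a rough measurement takes only a few lines. The sketch below uses the groq Python package to stream a chat completion and estimate output tokens per second; the model ID and the ~4-characters-per-token heuristic are illustrative assumptions, not official benchmark methodology.

```python
import time

from groq import Groq  # pip install groq; reads GROQ_API_KEY from the environment

client = Groq()

first_token_at = None
pieces = []

# Stream a completion and time the generated text as it arrives.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # illustrative model ID; check Groq's current model list
    messages=[{"role": "user", "content": "Summarize the history of the transistor."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        pieces.append(delta)

if first_token_at is not None:
    elapsed = time.perf_counter() - first_token_at
    # Rough heuristic: ~4 characters per token for English text.
    approx_tokens = len("".join(pieces)) / 4
    print(f"~{approx_tokens / max(elapsed, 1e-9):.0f} tokens/sec after first token")
```

Measuring from the first streamed token onward separates generation throughput from time-to-first-token; a rigorous benchmark would also use the tokenizer's actual token counts rather than a character heuristic.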
Take advantage of fast AI inference for leading GenAI models across text, audio, and vision modalities, from providers including Meta, DeepSeek, Qwen, Mistral, Google, OpenAI, and more.
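As a quick illustration of what that looks like in practice, here is a minimal sketch using the groq Python package against its OpenAI-compatible endpoints, with one text call and one audio transcription. The model IDs and the meeting.wav file are placeholders to swap for your own.

```python
from groq import Groq  # pip install groq; reads GROQ_API_KEY from the environment

client = Groq()

# Text: chat completion against a Llama model.
chat = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # placeholder model ID; see the console for current options
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
)
print(chat.choices[0].message.content)

# Audio: speech-to-text with a Whisper-family model.
with open("meeting.wav", "rb") as audio:  # placeholder file
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",  # placeholder model ID
        file=audio,
    )
print(transcript.text)
```

Because the endpoints follow the OpenAI-compatible shape, switching between text, audio, and vision models is largely a matter of changing the model ID and the request payload.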