
Blog

Inside the LPU
Deconstructing Groq's Speed
Legacy hardware forces a choice: faster inference with degraded quality, or accurate inference with unacceptable latency. The tradeoff exists because GPU architectures are optimized for training workloads, not inference. The LPU, purpose-built hardware for inference, preserves quality while eliminating the architectural bottlenecks that create latency in the first place.
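The latency the post digs into is easy to observe directly. Here is a minimal sketch, assuming the groq Python SDK (`pip install groq`), a GROQ_API_KEY environment variable, and an illustrative model name, that measures time-to-first-token separately from total generation time against GroqCloud:

```python
# Minimal latency probe against GroqCloud.
# Assumptions: `pip install groq`, GROQ_API_KEY set in the environment,
# and an illustrative model name (swap in any model available on GroqCloud).
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
first_token_at = None
chunks = 0

# Stream the completion so time-to-first-token can be observed
# separately from total generation time.
stream = client.chat.completions.create(
    model="llama-3.3-70b-versatile",  # illustrative model name
    messages=[{"role": "user", "content": "Explain the LPU in one sentence."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        chunks += 1

total = time.perf_counter() - start
print(f"time to first token: {first_token_at - start:.3f}s")
print(f"total time: {total:.3f}s across {chunks} streamed chunks")
```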


GPT‑OSS Improvements: Prompt Caching & Lower Pricing

Introducing Remote MCP Support in Beta on GroqCloud

Introducing the Next Generation of Compound on GroqCloud

Introducing Kimi K2‑0905 on GroqCloud

Introducing Prompt Caching on GroqCloud

Day Zero Support for OpenAI Open Models

OpenBench: Open, Reproducible Evals

Build Faster with Groq + Hugging Face

GroqCloud™ Now Supports Qwen3 32B

Introducing GroqCloud™ LoRA Fine-Tune Support: Unlock Efficient Model Adaptation for Enterprises

From Speed to Scale: How Groq Is Optimized for MoE & Other Large Models

How to Build Your Own AI Research Agent with One Groq API Call

The Official Llama API, Accelerated by Groq

Now in Preview: Groq’s First Compound AI System

Llama 4 Live Today on Groq — Build Fast at the Lowest Cost, Without Compromise

Build Fast with Text-to-Speech

Groq & Vercel Partner To Make Building Fast and Simple

Batch Processing with GroqCloud™ for AI Inference Workloads

Build Fast with Word-Level Timestamping

A Guide to Reasoning with Qwen QwQ 32B

What is a Language Processing Unit?

Qwen QwQ 32B Running Same Day As Release