
Blog

Inside the LPU
Deconstructing Groq's Speed
Legacy hardware forces a choice: faster inference with degraded quality, or accurate inference with unacceptable latency. This tradeoff exists because GPU architectures are optimized for training workloads. The LPU, purpose-built hardware for inference, preserves quality while eliminating the architectural bottlenecks that create latency in the first place.
GPT‑OSS Improvements: Prompt Caching & Lower Pricing
Introducing Remote MCP Support in Beta on GroqCloud
Introducing the Next Generation of Compound on GroqCloud
Introducing Kimi K2‑0905 on GroqCloud
Introducing Prompt Caching on GroqCloud
Day Zero Support for OpenAI Open Models
Inside the LPU: Deconstructing Groq’s Speed
OpenBench: Open, Reproducible Evals
Build Faster with Groq + Hugging Face
GroqCloud™ Now Supports Qwen3 32B
Introducing GroqCloud™ LoRA Fine-Tune Support: Unlock Efficient Model Adaptation for Enterprises
From Speed to Scale: How Groq Is Optimized for MoE & Other Large Models
How to Build Your Own AI Research Agent with One Groq API Call
The Official Llama API, Accelerated by Groq
Now in Preview: Groq’s First Compound AI System
Llama 4 Live Today on Groq — Build Fast at the Lowest Cost, Without Compromise
Build Fast with Text-to-Speech
Groq & Vercel Partner To Make Building Fast and Simple
Batch Processing with GroqCloud™ for AI Inference Workloads
Build Fast with Word-Level Timestamping
A Guide to Reasoning with Qwen QwQ 32B
What is a Language Processing Unit?
Qwen QwQ 32B Running Same Day As Release