08/20/2025 · Groq

Introducing Prompt Caching on GroqCloud

Fast, Low Cost, and Seamless AI Inference for Repetitive Workloads

Prompt caching is rolling out on GroqCloud, starting with Kimi K2-Instruct. It works by reusing computations for prompts that start with the same prefix, so developers only pay full price for the differences. The result is a 50% cost savings on cached tokens and dramatically faster response times, with no code changes required.

Ideal for chatbots, retrieval-augmented generation, code assistants, and any workflow with stable, reusable prompt components, prompt caching works automatically on every API request, making your AI workflows faster and cheaper right out of the box.

Why Prompt Caching Matters

Instant Speed‑Ups

  • Reduced latency for any request that shares an identical token prefix with a recent request.

50% Token‑Cost Savings

  • All input tokens in the matched prefix get a 50% discount; tokens after the first difference between prompts are charged at full price.

Perfect for Repetitive Workflows

  • System instructions, tool definitions, and few‑shot examples are re‑used across calls.

How It Works (No Code Changes Required)

  1. Prefix Matching: The system identifies matching prefixes from recent requests. Prefixes can include system prompts, tool definitions, few-shot examples, and more. Note: prefixes can only match up to the first difference, even if later parts of the prompt are the same!
  2. Cache Hit: If a matching prefix is found, the cached computation is reused, dramatically reducing latency and cutting token costs by 50% for the cached portion.
  3. Cache Miss: If no match exists, your prompt is processed normally, with the prefix temporarily cached for potential future matches.
  4. Automatic Expiration: All cached data automatically expires within a few hours.

No code changes, no new SDK calls. Once rolled out, prompt caching works automatically, starting with moonshotai/kimi‑k2‑instruct; support for additional models is coming soon.
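To make the most of caching, put the stable parts of your prompt first so they form a reusable prefix. Here's a minimal sketch using the Groq Python SDK's OpenAI-compatible chat interface; the system prompt and helper function are hypothetical, but the message ordering is the point:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Stable content goes first: the identical system prompt (plus any tool
# definitions or few-shot examples) forms the cacheable prefix.
SYSTEM_PROMPT = "You are a support assistant for Acme Corp. Answer concisely."

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="moonshotai/kimi-k2-instruct",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # identical across calls: eligible for a cache hit
            {"role": "user", "content": question},         # varies per call: billed at the uncached rate
        ],
    )
    return response.choices[0].message.content

# The first call is a cache miss; later calls sharing the prefix can hit the cache.
print(ask("How do I reset my password?"))
print(ask("What are your support hours?"))
```

Because matching stops at the first difference, anything that changes per request (timestamps, user IDs, retrieved context) should come after the stable prefix, not before it.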

Pricing

Note: there is no extra fee for the caching feature itself; the 50% discount applies only when a cache hit occurs.
| Model | Uncached Input Tokens (Per Million Tokens) | Cached Input Tokens (Per Million Tokens) | Output Tokens (Per Million Tokens) |
| --- | --- | --- | --- |
| moonshotai/kimi‑k2‑instruct | $1.00 | $0.50 | $3.00 |
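As a rough illustration of what the discount means in practice, here's a quick sketch of the cost arithmetic for a single hypothetical request (token counts invented; prices from the table above):

```python
# Hypothetical request: 10,000 input tokens, 8,000 of which match a cached
# prefix, plus 500 output tokens. Prices are dollars per million tokens.
UNCACHED_INPUT, CACHED_INPUT, OUTPUT = 1.00, 0.50, 3.00

cost = (
    8_000 / 1_000_000 * CACHED_INPUT      # cached prefix at 50% off
    + 2_000 / 1_000_000 * UNCACHED_INPUT  # remainder at full price
    + 500 / 1_000_000 * OUTPUT            # output tokens, unaffected by caching
)
print(f"${cost:.4f}")  # $0.0075, versus $0.0115 with no cache hit
```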

Ready to Cut Latency and Slash Token Costs?

Start experimenting with prompt caching today on GroqCloud and experience faster and cheaper AI workflows right out of the box.

To learn more about prompt caching and best practices, check out our developer documentation.