09/04/2025 · Benjamin Klieger

Introducing the Next Generation of Compound on GroqCloud

Our agentic AI system, rolling out in general availability


Today we’re announcing that Compound Beta, Groq’s first agent and compound AI system, is moving to general availability as Compound on GroqCloud. With Compound, developers can integrate agentic AI which can conduct research, execute code, control browsers, navigate the web on their behalf. Compound from Groq uniquely delivers leading quality and low latency at low cost. Now in general availability, developers can expect production-grade stability and increased rate limits.

Since the launch of beta, we’ve seen more than 100k developers use Compound generating more than 5M requests and thousands of active customers. With their feedback, we have improved Compound, shifting the vision from tool-enabled models towards an agentic operator.

New Version Available

We’re also releasing a new version that is our smartest, most capable system yet. This includes improved tooling, prompting, and models to deliver ~25% higher accuracy and ~50% fewer mistakes across SimpleQA (n=250) and RealtimeEval (n=257), boosting Compound to frontier performance on both benchmarks, surpassing both OpenAI’s Web Search Preview and Perplexity Sonar*.

This image is a graph that shows Compound performance on Humaneval run with epochs=3.
Figure 1: Compound performance on Humaneval run with epochs=3.
This image shows a graph of Compound performance on RealtimeEval which benchmarks real-time information factuality. Accuracy is measured as the F-score from the first and only run of each system with default settings.
Figure 2: Compound performance on RealtimeEval which benchmarks real-time information factuality. Accuracy is measured as the F-score from the first and only run of each system with default settings.

The new version upgrades Compound’s default models to leverage the new frontier open source intelligence from OpenAI’s gpt-oss-120b in combination with Llama models. Compound is also upgraded with new and more advanced tools, including the ability to get contents from a specific web page, the ability to leverage Wolfram Alpha’s knowledge base and intelligence engine, and the ability to, in parallel, spin up and control up to 10 web browsers at a time to visit search result websites to gather more content for better answer generation.

"Compared to alternatives like Perplexity’s Sonar API and Gemini’s Grounded Search, Compound has been both more powerful and far more cost-effective. Compound has been key in enabling us to move beyond simple entity recognition to building enriched, context-aware knowledge graphs that connect our user’ content in powerful ways."

Paul Richards, CEO, getrecall.ai, Product Hunt #1 Product of the Day, trusted by 400k+ customers

This version is production‑ready and will become the default on October 1st, 2025. You can try it today by setting the Groq-Model-Version header to "latest".

Here’s everything you need to know: how Compound works, new features, pricing, and how to start building today.

What is Compound?

Compound blends openly available foundation models (the same models that power Groq’s other services) with built‑in, server‑side tool use. The result is a single, low‑cost API call that can:

  • Search the web for real‑time information
  • Execute code or run Wolfram‑Alpha calculations on the fly
  • Spin up multiple browsers in parallel to harvest data from several sources

All of this happens inside Groq’s high‑throughput inference engine, so you get high‑performance, low‑latency responses without any client‑side orchestration.

"Using Compound for fast research at high volume produces superior results, faster, especially when the end user is waiting on an answer. Compound is more efficient than building other solutions and saves us the engineering cycles of having to build out the tooling ourselves, which takes a lot of time!"

Andre Dean Smith, Co-Founder and CEO of ScreenApp, AI-Powered Meeting Assistant with over 1M users

How does Compound work? You send a single request, just the user query and any optional context.

  1. Compound’s orchestration layer decides which tools (web search, Wolfram, code exec, browsers) are needed and how many times to call them.
  2. All tool calls run server‑side on Groq’s inference fleet, keeping latency low and eliminating client‑side coordination.

The model iteratively consumes tool outputs, refines its reasoning, and finally returns a single, polished answer.

“With Groq’s Compound, we can build agentic search workflows that deliver highly curated results unmatched by traditional search engines. We use Compound because it can get AI Search done fast and efficiently with data privacy.”

Tom Bendien, Founder and CEO of GT Edge AI

What’s New In the Latest Version

New Capability Benefits for Developers

Advanced Search – enables more context pulled from web search results

Adds richer context, boosting answer accuracy and depth

Visit Website Ability – retrieve the contents from a specific URL

Provides deeper context, improving answer quality on niche topics

Parallel Browser Automation – up to 10 browsers are launched and controlled simultaneously to get deeper search results

Gathers richer evidence sets, cutting hallucinations and boosting factuality

Wolfram‑Alpha Integration – invoke Wolfram’s computational knowledge engine

Solves math, scientific, and engineering problems that require precise computation

OSS 120B Model – upgraded default answering model to gpt-oss-120b

Increases output quality and instruction following abilities

Enhanced Markdown Rendering – richer, structured output via the OSS model

Makes downstream consumption (docs, dashboards, chat UIs) more compelling

Versioning – developers can lock to a specific Compound version

Provides stability while still letting you test new features on a convenient schedule

By the Numbers

Benchmarks include RealtimeEval, a benchmark designed to evaluate tool‑using systems on current events and live data, and benchmarks like HumanEval and SimpleQA. Both HumanEval and SimpleQA can be run within OpenBench, a provider‑agnostic, open‑source evaluation infrastructure for language models, and RealtimeEval is coming to OpenBench soon.

A graph that shows performance on RealtimeEval which benchmarks real-time information factuality.
Figure 3: Performance on RealtimeEval which benchmarks real-time information factuality.
A graph that shows Compound performance on the first 250 samples of SimpleQA as set by Openbench with limit=250, which benchmarks ability to search for information.
Figure 4: Compound performance on the first 250 samples of SimpleQA as set by Openbench with limit=250, which benchmarks ability to search for information.
A graph that shows Compound performance on Humaneval run with epochs=3.
Figure 5: Compound performance on Humaneval run with epochs=3.

Pricing & Billing

Component Current GA priceAfter September 15th

Model tokens

Charged per token used by the underlying model (same rates as existing Groq models)

No change - Still charged per token used by the underlying model (same rates as existing Groq models)

Tool usage

Free during beta

Starting September 15 th 2025, we will charge a flat fee per tool invocation (web‑search, browser automation, and code execution). Detailed pricing is on our pricing page.

Getting Started

We’re excited to see how developers will combine Compound’s autonomous tool use with Groq’s ultra‑fast inference. Whether you’re building a research assistant, a financial‑analysis bot, or a next‑gen customer‑support agent, Compound now gives you a production‑grade, single‑call solution.

Start building today – the new version of Compound, now Generally Available, is live on GroqCloud.