Apr 29, 2025

Official Llama API Now Fastest via Groq Inference

The official Llama API is now accelerated by Groq. Served on the world’s most efficient inference chip, it’s the fastest way to run the world’s most trusted openly available models with no tradeoffs. In collaboration with Meta, a limited free preview is live now.

What Is It?

The official Llama API is an upcoming developer platform accelerated by Groq. It's not a wrapper. It’s not a copy. It’s the real thing, served directly from Meta and accelerated by Groq's purpose-built inference hardware. With Llama 4 and more available now, you can start building with zero setup.

Why it Matters to Builders

The Llama API, accelerated by Groq, offers several benefits to builders, including:

First-party access: You're using Meta models, served the way they were meant to be. Optimized, up-to-date, and fully integrated into the Llama roadmap.
Performance you can trust: No guessing. No degraded replicas. Just official models with consistent quality and performance, predictable latency, and reliable support.
Ready for scale: Whether you're building a weekend project or an enterprise-grade product, Groq infrastructure is fast, reliable, and offered at a price point that won't skyrocket when your user base does.
More than an endpoint, it's an edge: You get the latest models, faster responses, lower costs, and inference built for real-world AI applications.

Headache-free Migration

Migrating to the official Llama API, accelerated by Groq, is easy. No new libraries or SDKs are required. Just change three lines of code to turbocharge your app.

For example, if you're currently using a different API, you can simply update your code to use the official Llama API:

1from openai import OpenAI
2
3client = OpenAI(
4     api_key=LLAMA_API_KEY,
5     base_url="https://api.llama.com/compat/v1/"
6)

Private & Secure

The official Llama API offers the same privacy and security users have come to expect from Groq. The security and privacy of your content and data is a top priority for both Meta and Groq. The official Llama API does not use your prompts or model responses to train Meta AI models. When you’re ready, the models you build on the official Llama API are yours to take with you wherever you want to host them – we don’t keep them locked on our servers. These features will be first introduced to select customers, with plans for a broader rollout in the coming weeks and months – opening up new possibilities for developers to build custom models for every kind of use case.

Additional Supported Features

No code Playground
Streaming
Tool calling
JSON structured output
Python and Typescript SDKs
OpenAI endpoint compatibility
Built in moderation via safeguard models

Getting Started

To get started with the official Llama API, accelerated by Groq, simply request early experimental access at llama.developer.meta.com. Once approved, you can just select the Groq model names in the API and get a streamlined experience with all usage tracked in one location.

We're excited to bring this powerful combination to the developer community, and we can't wait to see what you'll build with the official Llama API, accelerated by Groq.