Meta’s Llama 4 Scout and Maverick models are live today on GroqCloud™, giving developers and enterprises day-zero access to the most advanced open-source AI models available.
Today, Meta released the first models in the Llama 4 herd, which will enable people to build more personalized multimodal experiences. With Llama 4 Scout and Llama 4 Maverick available today on GroqCloud for both free users and paid customers, developers can start building cutting-edge multimodal applications right away.
Groq Performance & Pricing
Our vertically integrated GroqCloud platform and inference-first architecture deliver unmatched performance and pricing. With Llama 4 models, developers can run cutting-edge multimodal workloads while keeping costs low and latency predictable.
Llama 4 Scout is currently running at over 460 tokens per second, and Llama 4 Maverick is launching today. Stay tuned for official third-party benchmarks from Artificial Analysis.
Groq is offering the first models of the Llama 4 herd at the following pricing:
- Llama 4 Scout: $0.11 / M input tokens and $0.34 / M output tokens
- Llama 4 Maverick: $0.50 / M input tokens and $0.77 / M output tokens
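The per-token rates above translate directly into per-request costs. As a quick sanity check, here is a small sketch (the model keys and helper name are our own, not an official API) that estimates the cost of a single request from its token counts:

```python
# Hypothetical helper for estimating request cost from the published
# Llama 4 per-token pricing. Rates are USD per million tokens.
PRICING = {
    "llama-4-scout":    {"input": 0.11, "output": 0.34},
    "llama-4-maverick": {"input": 0.50, "output": 0.77},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    rates = PRICING[model]
    return (input_tokens * rates["input"]
            + output_tokens * rates["output"]) / 1_000_000

# Example: a Scout request with 1,000 input and 500 output tokens.
print(f"${estimate_cost('llama-4-scout', 1_000, 500):.5f}")  # → $0.00028
```

At these rates, a million such Scout requests would still cost only a few hundred dollars, which is the point of the low-cost positioning above.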
About Llama 4
The new Llama 4 models are Meta’s first models that use a Mixture of Experts (MoE) architecture. In MoE models, a single token activates only a fraction of the total parameters. MoE architectures are more compute efficient for model training and inference and, given a fixed training FLOPs budget, deliver higher quality models compared to dense architectures.
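To make the "only a fraction of parameters activate" idea concrete, here is a toy sketch of MoE routing (an illustrative simplification, not Meta's implementation): a gating network scores every expert for a given token, but only the top-scoring expert's feed-forward network actually runs, so the other experts' parameters stay idle for that token.

```python
import math
import random

NUM_EXPERTS = 4  # toy value; Llama 4 Scout uses 16 experts, Maverick 128

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def expert(i, token_embedding):
    # Stand-in for expert i's feed-forward network.
    return [x * (i + 1) for x in token_embedding]

def moe_layer(token_embedding, gate_weights):
    # Gate: one score per expert, computed from the token embedding.
    scores = [sum(w * x for w, x in zip(row, token_embedding))
              for row in gate_weights]
    probs = softmax(scores)
    top = max(range(NUM_EXPERTS), key=lambda i: probs[i])
    # Only the selected expert runs; its output is weighted by its gate prob.
    return [probs[top] * y for y in expert(top, token_embedding)], top

random.seed(0)
gate = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_EXPERTS)]
output, chosen = moe_layer([0.5, -0.2, 0.9], gate)
print(f"routed to expert {chosen}")
```

This is why a model like Maverick can hold 400 billion total parameters while spending compute on only 17 billion per token.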
Llama 4 models are designed with native multimodality, incorporating early fusion to seamlessly integrate text and vision tokens into a unified model backbone.
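The early-fusion idea can be sketched in a few lines (an assumed simplification, not Meta's code): text tokens and image-patch tokens are embedded into the same vector space and concatenated into one sequence, so the shared backbone sees no modality boundary.

```python
EMBED_DIM = 4  # toy embedding width

def embed_text(token_ids):
    # Stand-in text embedding: one vector per token id.
    return [[float(t)] * EMBED_DIM for t in token_ids]

def embed_image_patches(patches):
    # Stand-in vision embedding: mean pixel value broadcast per patch.
    return [[sum(p) / len(p)] * EMBED_DIM for p in patches]

def early_fusion(token_ids, patches):
    # One unified sequence of text and vision tokens for a single backbone.
    return embed_text(token_ids) + embed_image_patches(patches)

seq = early_fusion([7, 3], [[0.1, 0.3], [0.8, 0.6]])
print(len(seq), len(seq[0]))  # → 4 4
```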
Meta aims to develop the most helpful, useful models for developers while protecting against and mitigating the most severe risks. This includes integrating mitigations at each layer of model development, from pre-training to post-training, as well as tunable system-level mitigations that shield developers from adversarial users. In doing so, Meta is helping empower developers to create helpful, safe, and adaptable experiences for their Llama-supported applications.
Llama 4 Scout & Maverick
These latest Llama models from Meta include smaller and larger options to accommodate a range of use cases and developer needs.
Llama 4 Scout is a leading multimodal model, more powerful than the Llama 3 models. With 17 billion active parameters, 16 experts, and 109 billion total parameters, it delivers state-of-the-art performance for its class.
Llama 4 Maverick contains 17 billion active parameters, 128 experts, and 400 billion total parameters, offering high quality at a lower price than Llama 3.3 70B. It delivers industry-leading performance in image and text understanding, with support for 12 languages, enabling sophisticated AI applications that bridge language barriers. As the workhorse model for general assistant and chat use cases, Llama 4 Maverick excels at precise image understanding and creative writing, offering state-of-the-art intelligence at high speed, optimized for response quality, tone, and refusal behavior.
Build Fast with Llama 4 on GroqCloud
Try Llama 4 via GroqChat, the GroqCloud Developer Console, or the API.
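For API access, GroqCloud exposes an OpenAI-compatible chat completions endpoint. A minimal sketch using only the Python standard library is below; note the model ID is an assumption on our part, so check the Developer Console for the exact name available to your account.

```python
import json
import os
import urllib.request

API_URL = "https://api.groq.com/openai/v1/chat/completions"
MODEL = "meta-llama/llama-4-scout-17b-16e-instruct"  # assumed model ID

def build_payload(prompt: str) -> dict:
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_llama(prompt: str) -> str:
    """Send one chat request; requires GROQ_API_KEY in the environment."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['GROQ_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__" and os.environ.get("GROQ_API_KEY"):
    print(ask_llama("Say hello in one word."))
```

The same request shape works from any language with an HTTP client, or via official OpenAI-compatible SDKs pointed at the Groq base URL.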
Start building today on GroqCloud – sign up for free access here or scale without rate limits by upgrading to a GroqCloud paid tier.