Groq Supercharges Fast AI Inference For Meta Llama 3.1

Written by:
Groq
New Largest and Most Capable Openly Available Foundation Model 405B Running on GroqCloud™

Mountain View, Calif. – July 23, 2024 – Groq, a leader in fast AI inference, launched Llama 3.1 models powered by its LPU™ AI inference technology. Groq is proud to partner with Meta on this key industry launch, and run the latest Llama 3.1 models, including 405B Instruct, 70B Instruct, and 8B Instruct, at Groq speed. The three models are available on GroqCloud Dev Console, a community of over 300K developers already building on Groq® systems, and on GroqChat for the general public.

Mark Zuckerberg, Founder & CEO, Meta, shared:

“I’m really excited to see Groq’s ultra-low-latency inference for cloud deployments of the Llama 3.1 models. This is an awesome example of how our commitment to open source is driving innovation and progress in AI. By making our models and tools available to the community, companies like Groq can build on our work and help push the whole ecosystem forward.” 

The Llama 3.1 models are a significant step forward in terms of capabilities and functionality. As the largest and most capable openly available Large Language Model to date, Llama 3.1 405B rivals industry-leading closed-source models. For the first time, enterprises, startups, researchers, and developers can access a model of this scale and capability without proprietary restrictions, enabling unprecedented collaboration and innovation. With Groq, AI innovators can now tap into the immense potential of Llama 3.1 405B running at unprecedented speeds on GroqCloud to build more sophisticated and powerful applications. 

With Llama 3.1, including 405B, 70B, and 8B Instruct models, the AI community gains access to increased context length up to 128K and support across eight languages. Llama 3.1 405B is in a class of its own, with unmatched flexibility, control, and state-of-the-art capabilities in general knowledge, steerability, math, tool use, and multilingual translation. Llama 3.1 405B will unlock new capabilities, such as synthetic data generation and model distillation, and deliver new security and safety tools to further the shared Meta and Groq commitment to build an open and responsible AI ecosystem.

With unprecedented inference speeds for large openly available models like Llama 3.1 405B, developers are able to unlock new use cases that rely on agentic workflows to provide a seamless, yet personalized, human-like response for use cases such as: patient coordination and care; dynamic pricing by analyzing market demand and adjusting prices in real-time; predictive maintenance using real-time sensor data; and customer service by responding to customer inquiries and resolving issues in seconds.

GroqCloud has grown to over 300,000 developers in five months, underscoring the importance of speed when it comes to building the next generation of AI-powered applications at a fraction of the cost of GPUs.

To experience Llama 3.1 models running at Groq speed, visit groq.com, and learn more about this launch from Groq and Meta.

About Groq

Groq builds fast AI inference technology. Groq® LPU™ AI inference technology is a hardware and software platform that delivers exceptional AI compute speed, quality, and energy efficiency. Groq, headquartered in Silicon Valley, provides cloud and on-prem solutions at scale for AI applications. The LPU and related systems are designed, fabricated, and assembled in North America. Try Groq speed at groq.com.

Media Contact

Megan Hartwick
[email protected]

Never miss a Groq update! Sign up below for our latest news.

The latest Groq news. Delivered to your inbox.

The latest Groq Developer news. Delivered to your inbox.