Why Groq

Groq is on a mission to set the standard for GenAI inference speed, helping real-time AI applications come to life today.

FAQts

An LPU Inference Engine, with LPU standing for Language Processing Unit™, is a new type of end-to-end processing unit system that provides the fastest inference for computationally intensive applications with a sequential component, such as AI language applications built on large language models (LLMs).

The LPU is designed to overcome the two main LLM bottlenecks: compute density and memory bandwidth. An LPU has greater compute capacity than GPUs and CPUs for LLM workloads, which reduces the time needed to calculate each word and allows sequences of text to be generated much faster. Additionally, eliminating external memory bottlenecks enables the LPU Inference Engine to deliver orders of magnitude better performance on LLMs than GPUs.

For a more technical read about our architecture, download our award-winning ISCA 2020 and 2022 papers.

Groq supports standard machine learning (ML) frameworks such as PyTorch, TensorFlow, and ONNX for inference. Groq does not currently support ML training with the LPU Inference Engine.
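As one illustration of what framework support looks like in practice, here is a minimal sketch of exporting a trained PyTorch model to ONNX, a format that inference toolchains can consume. It uses only standard PyTorch APIs; the model and file names are placeholders, and nothing here is Groq-specific.

    # Minimal sketch: export a PyTorch model to ONNX for downstream inference tooling.
    # Standard PyTorch API only; the model and file names are placeholders.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
    model.eval()  # inference mode

    dummy_input = torch.randn(1, 128)  # example input with the expected shape
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",            # artifact handed to the inference toolchain
        input_names=["input"],
        output_names=["logits"],
        opset_version=17,
    )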

For custom development, the GroqWare™ suite, including Groq Compiler, offers a push-button experience to get models up and running quickly. For workload optimization, we offer hand-coding to the Groq architecture and fine-grained control of any GroqChip™ processor, enabling customers to develop custom applications and maximize performance.

We’re excited you want to get started with Groq. Here are some of the fastest ways to get up and running:

  • GroqCloud: Request API access to run LLM applications with token-based pricing (a minimal request sketch follows this list)
  • Groq Compiler: Compile your current application to see detailed performance, latency, and power utilization metrics. Request access via our Customer Portal.
  • Interested in purchasing on-prem hardware? Contact us
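For the GroqCloud item above, the sketch below shows what a request could look like, assuming an OpenAI-compatible chat-completions endpoint and an API key obtained through GroqCloud; the URL and model name are illustrative assumptions, not authoritative.

    # Minimal sketch of querying a hosted LLM over an OpenAI-compatible
    # chat-completions endpoint. The URL and model name are assumptions.
    import os
    import requests

    api_key = os.environ["GROQ_API_KEY"]  # assumed environment variable
    url = "https://api.groq.com/openai/v1/chat/completions"  # assumed endpoint

    resp = requests.post(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        json={
            "model": "llama2-70b-4096",  # illustrative model identifier
            "messages": [
                {"role": "user", "content": "Explain what an LPU is in one sentence."}
            ],
        },
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])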

The Latest on Groq

Get up to speed on all the Groq happenings.

Interested in covering Groq? Reach out to our PR team.


Game-Changing Tech

Groq created and offers the first LPU™ Inference Engine.
We are the only provider that has created a “sand to sky” solution – from the silicon to the cloud and everything in between. Architected from the ground up, our solutions are built for precise, energy-efficient, and repeatable inference performance at scale.

GroqCloud™

GroqCloud is powered by a scaled network of Language Processing Units.

Leverage popular open-source LLMs like Meta AI’s Llama 2 70B, running up to 18x faster than on other leading providers.

GroqRack™

The backbone of low latency, large-scale deployments.

  • 42U rack with up to 64 interconnected chips
  • End-to-end latency of only 1.6µs
  • Near-linear multi-server and multi-rack scalability without the need for external switches
  • 35kW max power consumption

Connect with Sales to learn about on-prem solutions

GroqNode™

Unprecedented low latency meets uncompromised scalability.

  • 4U rack-ready scalable compute system featuring eight interconnected GroqCard™ accelerators
  • 4kW max power consumption

GroqCard™

Ready, set, done. Guaranteed low latency.

  • A single chip in a standard PCIe Gen 4×16 form factor providing hassle-free server integration
  • 375W max power consumption (240W average)

Available through our partner Bittware.

Language Processing Unit

A new class of processor for a new class of workloads.
  • Exceptional sequential performance
  • Single core architecture
  • Synchronous networking that is maintained in large-scale deployments
  • Ability to auto-compile LLMs with more than 50B parameters
  • Instant memory access
  • High accuracy that is maintained even at lower precision levels
Simplified LPU architecture