Announcements

Posts

We’re thrilled to have @MGKarch on our team to help customers cut through the noise and understand how to solve their biggest #LLM challenges.

We wanted to properly introduce @GroqInc to all of our new followers! 👋
We offer purpose-built inference solutions for real-time #AI at scale. Our hardware & software ecosystem includes the world’s first Language Processing Unit™ system for AI, Groq™ Compiler, and more.


Products

Performance, predictability, scalability, and accuracy are the DNA of Groq’s entire product suite.

GroqChip Processor

Compute

The revolutionary, fully deterministic GroqChip™ processor is the core of scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip was designed to reduce data movement for predictable, bottleneck-free, low-latency performance. Featuring 16 chip-to-chip interconnects and 230 MB of SRAM, this standalone chip provides flexible integration into embedded applications.

Performance

Up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900 MHz

GroqCard Accelerator

Accelerate

For plug-and-play, low-latency, scalable performance, the GroqCard™ accelerator packages a single GroqChip™ into a standard PCIe Gen 4 x16 form factor for hassle-free server integration. Featuring up to 11 RealScale™ chip-to-chip connections alongside an internal software-defined network, GroqCard enables near-linear multi-server and multi-rack scalability without the need for external switches.

Performance

Up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900 MHz

GroqNode Server

Scale

For large-scale deployments, GroqNode™ server provides a rack-ready, scalable compute system. Integrating a set of eight GroqCard™ accelerators with chip-to-chip connections, dual server-class CPUs, and up to 1 TB of DRAM in a 4U server chassis, GroqNode is built to enable high-performance, low-latency deployment of large deep learning models.

Performance

Up to 6 POPS (INT8) and 1.5 PFLOPS (FP16) at 900 MHz
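As a rough sanity check, the node-level figures are consistent with simply aggregating the eight GroqCard accelerators linearly (assuming no per-card derating):

\[
8 \times 750\,\text{TOPS} = 6{,}000\,\text{TOPS} = 6\,\text{POPS}, \qquad 8 \times 188\,\text{TFLOPS} \approx 1.5\,\text{PFLOPS}
\]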

GroqRack Compute Cluster

Extend

For data center deployments, GroqRack provides an extensible accelerator network. Combining eight GroqNode™ servers, GroqRack features up to 64 interconnected GroqChip™ processors. The result is a deterministic network with an end-to-end latency of only 1.6µs for a single rack, ideal for massive workloads and designed to scale out to an entire data center.

Performance

Up to 48 POPS (INT8) and 12 PFLOPS (FP16) at 900 MHz
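Likewise, the rack-level figures follow from aggregating eight GroqNode servers, again assuming simple linear scaling:

\[
8 \times 6\,\text{POPS} = 48\,\text{POPS}, \qquad 8 \times 1.5\,\text{PFLOPS} = 12\,\text{PFLOPS}, \qquad 8 \times 8 = 64\ \text{GroqChip processors per rack}
\]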

GroqWare Suite

Simplify

The foundation of our software-defined hardware approach is the GroqWare™ suite. Groq Compiler, Groq API, and utilities make up a versatile software stack built to efficiently run a wide array of HPC and ML workloads. Groq Compiler, co-developed with the TSP architecture, is an efficient and flexible tool for deploying state-of-the-art deep learning models trained in PyTorch, TensorFlow, and ONNX. Groq API provides customers granular control of GroqChip. Finally, Groq provides utility tools such as the GroqView™ profiler and runtime to not only enhance the developer workflow but simplify it altogether.
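As a rough illustration of the first step in that workflow, the sketch below exports a small PyTorch model to ONNX so the resulting artifact can be handed to Groq Compiler. The model is a hypothetical stand-in, the export call is standard PyTorch, and the actual compiler invocation is not shown because it depends on the GroqWare tooling installed on your system.

```python
# Minimal sketch: export a trained PyTorch model to ONNX for compilation.
# The model here is a stand-in; any network trained in PyTorch would follow
# the same pattern. The resulting .onnx file is the artifact a tool such as
# Groq Compiler would consume (the exact compiler invocation is not shown).
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical model used only to demonstrate the export step."""

    def __init__(self, in_features: int = 128, num_classes: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


model = TinyClassifier().eval()
example_input = torch.randn(1, 128)  # batch of 1, 128 features

# Standard PyTorch ONNX export; the .onnx file can then be passed to the
# downstream compiler of your choice.
torch.onnx.export(
    model,
    example_input,
    "tiny_classifier.onnx",
    input_names=["input"],
    output_names=["logits"],
    opset_version=13,
)
```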
