


Performance, predictability, scalability, and accuracy are the DNA of Groq’s entire product suite.

GroqChip Processor


The revolutionary, fully deterministic GroqChip processor is the core of scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip was designed to reduce data movement, delivering predictable, low-latency performance without bottlenecks. Featuring 16 chip-to-chip interconnects and 230 MB of SRAM, this standalone chip provides flexible integration into embedded applications.


Up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900 MHz

GroqCard Accelerator


For plug-and-play, low-latency, scalable performance, the GroqCard accelerator packages a single GroqChip™ into a standard PCIe Gen4 x16 form factor, providing hassle-free server integration. Featuring up to 11 RealScale™ chip-to-chip connections alongside an internal software-defined network, GroqCard enables near-linear multi-server and multi-rack scalability without the need for external switches.


Up to 750 TOPS (INT8) and 188 TFLOPS (FP16) at 900 MHz

GroqNode Server


For large-scale deployments, the GroqNode server provides a rack-ready, scalable compute system. A set of eight GroqCard™ accelerators features integrated chip-to-chip connections alongside dual server-class CPUs and up to 1 TB of DRAM in a 4U server chassis. GroqNode is built to enable high-performance, low-latency deployment of large deep learning models.


Up to 6 POPS (INT8) and 1.5 PFLOPS (FP16) at 900 MHz

GroqRack Compute Cluster


For data center deployments, GroqRack provides an extensible accelerator network. Combining the power of eight GroqNode™ servers, GroqRack features up to 64 interconnected chips. The result is a deterministic network with an end-to-end latency of only 1.6 µs for a single rack, ideal for massive workloads and designed to scale out to an entire data center.


Up to 48 POPS (INT8) and 12 PFLOPS (FP16) at 900 MHz
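
The headline figures for GroqNode and GroqRack follow from the per-chip numbers by simple multiplication. The sketch below is illustrative arithmetic only, assuming perfectly linear scaling of peak throughput across chips. The function name `aggregate_peak`, the constant names, and the aggregate SRAM totals are introduced here for illustration and are not part of any Groq tool or published specification.

```python
# Per-chip peak figures quoted for GroqChip (INT8 and FP16 at 900 MHz).
CHIP_TOPS_INT8 = 750
CHIP_TFLOPS_FP16 = 188
CHIP_SRAM_MB = 230

CHIPS_PER_NODE = 8   # eight GroqCards per GroqNode, one GroqChip each
NODES_PER_RACK = 8   # eight GroqNodes per GroqRack


def aggregate_peak(num_chips):
    """Return naive peak totals, assuming perfectly linear scaling (an assumption)."""
    return {
        "chips": num_chips,
        "int8_pops": num_chips * CHIP_TOPS_INT8 / 1000,      # TOPS -> POPS
        "fp16_pflops": num_chips * CHIP_TFLOPS_FP16 / 1000,  # TFLOPS -> PFLOPS
        "sram_gb": num_chips * CHIP_SRAM_MB / 1000,          # MB -> GB (decimal)
    }


node = aggregate_peak(CHIPS_PER_NODE)
rack = aggregate_peak(CHIPS_PER_NODE * NODES_PER_RACK)

print("GroqNode:", node)
print("GroqRack:", rack)
```

Under that assumption, eight chips give 6 POPS and about 1.5 PFLOPS, and 64 chips give 48 POPS and about 12 PFLOPS, matching the quoted "up to" figures. These are peak numbers; sustained throughput on a real workload depends on the model and how it is compiled.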

GroqWare Suite


The foundation of our software-defined hardware approach is the GroqWare™ suite. Groq Compiler, Groq API, and Utilities make up a versatile software stack built to efficiently run a wide array of HPC and ML workloads. Groq Compiler, co-developed with the TSP architecture, is an efficient and flexible tool for deploying state-of-the-art deep learning models from PyTorch, TensorFlow, and ONNX. Groq API provides customers with granular control of GroqChip. Finally, Groq provides utility tools such as the GroqView™ profiler and a runtime to not only enhance the developer workflow, but simplify it altogether.
