Performance, predictability, scalability, and accuracy are the DNA of Groq’s entire product suite.
GroqChip™ Processor
The revolutionary, fully deterministic GroqChip™ processor is the core of scalable performance. Built from the ground up to accelerate AI, ML, and HPC workloads, GroqChip is designed to minimize data movement for predictable, bottleneck-free, low-latency performance. Featuring 16 chip-to-chip interconnects and 230MB of SRAM, this standalone chip integrates flexibly into embedded applications.
GroqNode™ Server
For large-scale deployments, the GroqNode™ server provides a rack-ready, scalable compute system. GroqNode combines a set of eight GroqCard™ accelerators with integrated chip-to-chip connections, dual server-class CPUs, and up to 1TB of DRAM in a 4U server chassis. It is built for high-performance, low-latency deployment of large deep learning models.
GroqRack™ Compute Cluster
For data center deployments, GroqRack provides an extensible accelerator network. Combining eight GroqNode™ servers, GroqRack features up to 64 interconnected chips. The result is a deterministic network with an end-to-end latency of only 1.6µs for a single rack, ideal for massive workloads and designed to scale out to an entire data center.
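The scale-out numbers above can be checked with a quick arithmetic sketch. The constants come from the descriptions in this document (eight GroqCards per GroqNode, eight GroqNodes per GroqRack); `racks_needed` is an illustrative helper for data-center sizing, not a Groq tool:

```python
# Scale-out arithmetic from the product descriptions above.
CHIPS_PER_NODE = 8   # eight GroqCard accelerators per GroqNode
NODES_PER_RACK = 8   # eight GroqNode servers per GroqRack

chips_per_rack = CHIPS_PER_NODE * NODES_PER_RACK
print(chips_per_rack)  # 64 interconnected chips, matching the spec

def racks_needed(total_chips: int) -> int:
    """Racks required to host a target chip count (ceiling division)."""
    return -(-total_chips // chips_per_rack)

print(racks_needed(1024))  # a 1,024-chip deployment spans 16 racks
```

Because the interconnect is deterministic, the single-rack end-to-end latency figure (1.6µs) holds regardless of workload, which is what makes this kind of static capacity planning meaningful.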
GroqWare™ Suite
The foundation of our software-defined hardware approach is the GroqWare™ suite. Groq Compiler, Groq API, and utilities make up a versatile software stack built to efficiently run a wide array of HPC and ML workloads. Groq Compiler, co-developed with the TSP architecture, is an efficient and flexible tool for deploying state-of-the-art deep learning models trained in PyTorch, TensorFlow, and ONNX. Groq API provides customers granular control of GroqChip. Finally, Groq provides utility tools such as the GroqView™ profiler and runtime that not only enhance the developer workflow but simplify it altogether.