Announcements

Posts

@GroqInc will attend the #AISummit #NewYork this December 7-8.🙌

Why not meet them at booth 119 during the exhibition to discover what their products have to offer, including the GroqChip Processor and GroqCard Accelerator?

Find out who’s attending: http://spr.ly/6018M0ivp

We are thankful for all the Groq stars and the amazing future we are building together and for others. Happy Thanksgiving!
#Thanksgiving #Thanksgiving2022

Insights

Groq at ISCA 2022

Written by:
Groq

Startup Presents Second Paper on Novel Approach to Large-scale Machine Learning

In June 2022, Groq presented its second paper in three years, A Software-defined Tensor Streaming Multiprocessor for Large-Scale Machine Learning, at the 2022 International Symposium on Computer Architecture (ISCA). Want to learn more? 

  • Read the abstract below
  • Download the paper
  • Watch the paper’s overview presentation by Dennis Abts, Groq Chief Architect and Fellow

Abstract

We describe our novel commercial software-defined approach for large-scale interconnection networks of tensor streaming processing (TSP) elements. The system architecture includes packaging, routing, and flow control of the interconnection network of TSPs. We describe the communication and synchronization primitives of a bandwidth-rich substrate for global communication. This scalable communication fabric provides the backbone for large-scale systems based on a software-defined Dragonfly topology, ultimately yielding a parallel machine learning system with elasticity to support a variety of workloads, both training and inference. We extend the TSP’s producer-consumer stream programming model to include global memory which is implemented as logically shared, but physically distributed SRAM on-chip memory. Each TSP contributes 220 MiBytes to the global memory capacity, with the maximum capacity limited only by the network’s scale — the maximum number of endpoints in the system. The TSP acts as both a processing element (endpoint) and network switch for moving tensors across the communication links. We describe a novel software-controlled networking approach that avoids the latency variation introduced by dynamic contention for network links. We describe the topology, routing and flow control to characterize the performance of the network that serves as the fabric for a large-scale parallel machine learning system with up to 10,440 TSPs and more than 2 TeraBytes of global memory accessible in less than 3 microseconds of end-to-end system latency.
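As a quick sanity check on the figures quoted in the abstract, the aggregate global memory follows directly from the per-chip SRAM contribution and the maximum endpoint count. A minimal sketch, using only the two numbers the abstract states (220 MiB per TSP, up to 10,440 TSPs):

```python
# Back-of-the-envelope check of the abstract's global memory figures.
# Inputs taken from the abstract: each TSP contributes 220 MiB of
# on-chip SRAM, and the largest configuration has 10,440 TSPs.

TSP_SRAM_MIB = 220    # per-chip contribution to global memory
MAX_TSPS = 10_440     # maximum endpoints in the Dragonfly fabric

total_mib = TSP_SRAM_MIB * MAX_TSPS
total_bytes = total_mib * 2**20  # MiB -> bytes

print(f"{total_mib:,} MiB ≈ {total_bytes / 1e12:.2f} TB")
# → 2,296,800 MiB ≈ 2.41 TB
```

This is consistent with the abstract's claim of "more than 2 TeraBytes" of global memory, and illustrates the point that capacity scales linearly with the number of endpoints in the system.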

