We introduce the Tensor Streaming Processor (TSP) architecture, a functionally-sliced microarchitecture in which memory units are interleaved with vector and matrix deep-learning functional units, taking advantage of the dataflow locality of deep-learning operations. The TSP is built on two key observations: (1) machine-learning workloads exhibit abundant data parallelism, which maps readily to tensors in hardware, and (2) a deterministic processor with a stream programming model enables precise reasoning about, and control of, hardware components to achieve good performance and power efficiency. The TSP is designed to exploit the parallelism inherent in machine-learning workloads, including instruction-level parallelism, memory concurrency, and data and model parallelism, and it guarantees determinism by eliminating all reactive elements from the hardware, such as arbiters and caches. Early ResNet50 image-classification results demonstrate 20.4K processed images per second with a batch size of one, a 4× improvement over other modern GPUs and accelerators. Our first ASIC implementation of the TSP architecture yields a computational density of more than 1 TOp/s per mm² of silicon: a 25×29 mm 14nm chip operating at a nominal clock frequency of 900 MHz. The TSP demonstrates a novel hardware-software approach to achieving fast yet predictable performance on machine-learning workloads within a desired power envelope. The architecture can be deployed across a broad range of datacenter applications, from ML to HPC, where low latency and high throughput are critical facets of total cost of ownership.
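The determinism claim above rests on static scheduling: every functional slice executes its instruction at a cycle fixed by the compiler, so timing never depends on runtime contention. The following is an illustrative sketch of that idea, not Groq's actual ISA or toolchain; the `Instruction` type, slice names, and operations are hypothetical and chosen only to show a statically scheduled stream program with no reactive elements.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    cycle: int   # issue cycle fixed at compile time (no arbitration at runtime)
    unit: str    # hypothetical functional slice: "MEM", "VXM" (vector), "MXM" (matrix)
    op: str
    args: tuple

def run(program):
    """Execute instructions strictly in their scheduled cycle order.
    Because every issue cycle is fixed ahead of time, both the result
    and its timing are a pure function of the program."""
    streams = {}
    for inst in sorted(program, key=lambda i: i.cycle):
        if inst.op == "load":        # MEM slice: materialize a stream
            data, out = inst.args
            streams[out] = list(data)
        elif inst.op == "vadd":      # VXM slice: element-wise add
            a, b, out = inst.args
            streams[out] = [x + y for x, y in zip(streams[a], streams[b])]
        elif inst.op == "matvec":    # MXM slice: matrix-vector product
            m, v, out = inst.args
            streams[out] = [sum(w * x for w, x in zip(row, streams[v]))
                            for row in m]
    return streams

# A tiny compile-time schedule: two loads, a vector add, a matrix multiply.
program = [
    Instruction(0, "MEM", "load", ([1, 2], "s0")),
    Instruction(0, "MEM", "load", ([3, 4], "s1")),
    Instruction(1, "VXM", "vadd", ("s0", "s1", "s2")),
    Instruction(2, "MXM", "matvec", ([[1, 0], [0, 1]], "s2", "s3")),
]
result = run(program)
# s2 = [4, 6]; s3 = identity @ s2 = [4, 6]
```

In real hardware the same property removes the need for caches and arbiters: because nothing in the datapath reacts to runtime events, the compiler can reason about exact cycle-level behavior.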