Today the Linley Group released its latest Microprocessor Report titled “Groq Rocks Neural Networks”, which concludes that Groq’s “TSP stands out in both peak performance and ResNet-50 throughput,” and that “Groq’s [deep-learning] accelerator is the fastest available on the merchant market.” The Linley Group’s report provides the most detailed overview of the novel Groq architecture available to date. You can download a copy of the Microprocessor Report below.
In the few weeks since our interview with the Linley Group, we’ve been able to improve the performance of our ResNet-50 v2 implementation. The TSP can now reach 21,700 images per second (IPS) of core compute for ResNet-50 when running at 900 MHz. On real data, including I/O, Groq hardware clocks in at 18,900 IPS with a latency of 0.05 ms at batch size 1. This level of inference performance exceeds that of other commercially available neural network architectures, with throughput that more than doubles the ResNet-50 score of the incumbent GPU-based architecture. For real-time workloads that are sensitive to response time and rely on small batches, the TSP’s batch size 1 performance is up to 17x faster than competing architectures.
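As a rough sanity check, the throughput and latency figures above can be related with simple arithmetic. The Python sketch below assumes images are processed strictly one at a time at batch size 1, so that the time per image is the reciprocal of throughput. That assumption is ours, not something stated in the report, and a pipelined design could report a latency that differs from this estimate. The function name is illustrative only.

```python
def per_image_time_ms(images_per_second: float) -> float:
    """Time per image in milliseconds, assuming strictly sequential processing."""
    return 1000.0 / images_per_second


# Figures quoted in this post.
core_ips = 21_700        # core compute only
end_to_end_ips = 18_900  # real data, including I/O

# Estimated time per image at the end-to-end rate.
print(round(per_image_time_ms(end_to_end_ips), 4))  # 0.0529

# Fraction of core-compute throughput retained once I/O is included.
print(round(end_to_end_ips / core_ips, 3))  # 0.871
```

Under that assumption, 18,900 IPS works out to roughly 0.053 ms per image, in line with the quoted 0.05 ms latency, and the end-to-end rate is about 87% of the core compute rate.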
With the Groq architecture providing a substantial performance advantage over GPU-based solutions, engineering managers can deploy machine learning platforms that offer twice the inference performance without doubling infrastructure costs. Reducing the number of deployed systems will lower power usage, save datacenter space, and significantly decrease system complexity.
To learn more about Groq, sign up for our mailing list at https://groq.com/contact/.
ResNet-50 is an image-classification network widely used as an inference benchmark, and ResNet-50 v1.5 is part of the MLPerf benchmark suite for measuring the performance of machine learning accelerators.