10/21/2019 · Groq

World, Meet Groq

Businesses and governmental entities are increasingly turning to compute-intensive applications, such as machine learning and artificial intelligence (AI), to enhance the experiences of customers, increase competitive advantage, and improve security and safety in communities. However, achieving and maintaining the high-performance processing that these workloads require is extremely difficult, largely due to the growing complexity of hardware processor models.

Gaining the benefits of AI, predictive intelligence, and smart infrastructure will require a much simpler and more scalable processing architecture that can sustainably accelerate the performance of compute-intensive workloads. At Groq, we believe a less complex chip design is the answer.

A simpler processing architecture

The complexity of current processor architectures is the primary inhibitor of developer productivity, and it hinders the adoption of AI applications and other compute-heavy workloads. At the same time, Moore’s law is slowing, making it harder to deliver ever-greater compute performance.

Groq is introducing a new, simpler processing architecture designed specifically for the performance requirements of machine learning applications and other compute-intensive workloads. The simpler hardware saves developer resources by eliminating the need for profiling, and it makes AI solutions easier to deploy at scale.

Groq is taking bold steps to develop software and hardware products that defy conventional approaches. Our vision of a simpler, high-performance architecture for machine learning and other demanding workloads is based on three key areas of technology innovation:

Software-defined hardware: Inspired by a software-first mindset, Groq’s chip architecture provides a new processing paradigm in which the control of execution and data flows is moved from the hardware to the compiler. All execution planning happens in software, freeing up valuable silicon space for additional processing capabilities. This approach allows Groq to fundamentally bypass the constraints of traditional, hardware-focused architectural models.
Silicon innovation: Groq’s simplified architecture removes extraneous circuitry from the chip to achieve a more efficient silicon design with more performance per square millimeter. This eliminates the need for caching, core-to-core communication, and speculative and out-of-order execution. Higher compute density is achieved by increasing total cross-chip bandwidth and devoting a higher percentage of total transistors to computation.
Maximizing developer velocity: The simplicity of the Groq system architecture eliminates the need for hand optimization, profiling and the specialized device knowledge that dominates traditional hardware-centric design approaches. Groq instead focuses on the compiler, enabling software requirements to drive the hardware specification. At compile time, developers know memory usage, model efficiency and latency, thereby simplifying production and speeding deployment. This results in a better developer experience with push-button performance, allowing users to focus on their algorithm and deploy solutions faster.
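
To make the compile-time claim above concrete, here is a minimal sketch of a statically scheduled program. It is a toy model under stated assumptions, not Groq's compiler or API: the `Op` type, the `compile_schedule` function, and all cycle and byte counts are hypothetical. It assumes that every operation has a fixed cost and that operations run strictly one after another, in which case total latency and peak memory can be computed before anything executes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """A hypothetical operation with costs that are fixed and known up front."""
    name: str
    cycles: int        # assumed fixed execution time, in cycles
    bytes_needed: int  # assumed memory occupied while the op runs


def compile_schedule(ops):
    """Assign each op a fixed start cycle, back to back.

    Because nothing is decided at run time in this toy model, the total
    latency and peak memory fall out of the schedule itself.
    """
    schedule = []
    cycle = 0
    for op in ops:
        schedule.append((cycle, op))
        cycle += op.cycles
    total_latency = cycle
    peak_memory = max((op.bytes_needed for op in ops), default=0)
    return schedule, total_latency, peak_memory


# Illustrative program; the numbers are made up for the example.
program = [
    Op("load_weights", 4, 2048),
    Op("matmul", 16, 4096),
    Op("activation", 2, 1024),
    Op("store_result", 3, 512),
]

schedule, total_latency, peak_memory = compile_schedule(program)

for start, op in schedule:
    print(f"cycle {start:>3}: {op.name}")
print("latency (cycles):", total_latency)
print("peak memory (bytes):", peak_memory)
```

In this sketch the schedule is the whole story: there is no cache, arbiter, or speculation that could change the timing at run time, so the reported figures need no profiling run to confirm. Real hardware and real compilers are far more involved, and the sequential, one-op-at-a-time model here is a simplification chosen for readability.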

Groq products provide the flexibility to quickly adapt to the diverse, real-world set of computations required to build the next generation of compute technologies. By simplifying the deployment and execution of machine learning, Groq makes it possible to extend the advantages of AI applications and insights to a much broader audience. The entire system – the software and hardware – substantially simplifies and improves the experience for all who use Groq’s technology.

Groq is ideal for deep learning inference processing across a wide range of AI applications, but it is critical to understand that the Groq chip is a general-purpose, Turing-complete compute architecture. It is an ideal platform for any high-performance, low-latency, compute-intensive workload.