Artificial intelligence (AI) is one of the most hyped buzzwords in technology today. But while everyone may be talking about AI, a much smaller number of people are successfully doing AI. 

According to a McKinsey report cited by Forbes, nearly three-quarters of over 2,000 organizations surveyed expect to increase investments in AI in the future, but just one in five respondents claimed they had already successfully rolled out AI in more than one process.

In large part, this slow uptake of AI is due to the extreme difficulty of achieving and maintaining the high-performance processing that AI workloads require. 

Why is inference so hard?

Part of the challenge of achieving high-performance compute processing is managing rapidly increasing volumes of data. Data scientists estimate that the volume of data is doubling every two years and will reach 44 zettabytes by 2020. At that scale, there will be more than 40 times as many bytes of data as there are stars in the observable universe.

In addition, matching human-like inference performance with neural networks will require exponential increases in model complexity and computing throughput. However, faster, more efficient neural net processing won’t come from scaling up the number of physical processors, as investments in traditional server clusters are reaching a computational cost wall.

Meanwhile, standard computing architectures like CPUs and GPUs are crowded with hardware features and elements that offer no benefit to inference performance. To perform more and more operations per second, chips have become larger and much more complex, with multiple cores, multiple threads, on-chip networks, and complicated control circuitry. In trying to accelerate software performance and output, developers of machine learning models struggle with complicated programming models, security problems, and loss of visibility into compiler control due to layers of processing abstraction. Achieving higher machine learning performance within these constraints relies on laborious hand-tuning that is based on intimate knowledge of the hardware architecture.

Some chipmakers seek to address these computational challenges with ever more complex chips. Meanwhile, inference has reached a bottleneck.

Groq has a different, much simpler solution.

Groq introduces a new processing architecture designed specifically for the performance requirements of machine learning applications and other compute-intensive workloads. Groq’s chip design reduces the complexity of traditional hardware-focused development, so developers can focus on algorithms (or solving other problems) instead of adapting their solutions to the hardware.

Inspired by a software-first mindset, Groq’s overall product architecture provides an innovative and unique approach to accelerated computation. In Groq’s architecture, the compiler choreographs the operation of the hardware.

This software-defined architecture approach enables Groq to leapfrog the constraints of chips designed using traditional, hardware-focused architectural models. All execution planning happens in software, freeing up valuable silicon real estate and providing additional memory bandwidth and transistors for performance.
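
To make the idea of planning execution in software a little more concrete, here is a minimal toy sketch of static scheduling in Python. It is not Groq's compiler or instruction set. The unit names ("memory", "matmul", "vector"), the latencies, and the helper compile_schedule are all hypothetical and chosen only for illustration. The point it tries to show is that when every operation has a fixed, known latency, a compiler can assign each one a start cycle ahead of time, so the hardware needs no runtime arbitration logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    unit: str         # hypothetical functional unit the op runs on
    latency: int      # fixed cycle count, assumed known at compile time
    deps: tuple = ()  # names of ops whose results this op consumes


def compile_schedule(ops):
    """Assign each op a start cycle ahead of time.

    Assumes `ops` is already listed in dependency order.
    Returns (schedule, total_cycles), where schedule is a sorted
    list of (start_cycle, unit, op_name) tuples.
    """
    finish = {}     # op name -> cycle at which its result is ready
    unit_free = {}  # unit -> first cycle at which the unit is idle
    schedule = []
    for op in ops:
        # An op can start once its inputs are ready and its unit is idle.
        ready = max((finish[d] for d in op.deps), default=0)
        start = max(ready, unit_free.get(op.unit, 0))
        finish[op.name] = start + op.latency
        unit_free[op.unit] = finish[op.name]
        schedule.append((start, op.unit, op.name))
    return sorted(schedule), max(finish.values(), default=0)


# A tiny, made-up program: load two operands, multiply, activate, store.
program = [
    Op("load_a", "memory", 2),
    Op("load_b", "memory", 2),
    Op("matmul", "matmul", 4, ("load_a", "load_b")),
    Op("relu", "vector", 1, ("matmul",)),
    Op("store", "memory", 2, ("relu",)),
]

schedule, total_cycles = compile_schedule(program)
for start, unit, name in schedule:
    print(f"cycle {start:2d}: {unit:<6} {name}")
print("total cycles:", total_cycles)
```

In this sketch the whole timeline, including the total cycle count, is known before anything runs. That is the property a compiler-choreographed design relies on, in contrast to hardware that decides what to issue next while the program is executing.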

Groq’s simpler design also eliminates the cores and other hardware that crowd traditional chip architectures with “dark silicon” – areas that are dedicated to functionalities that offer no processing advantage for AI or machine learning.

When it comes to the performance and efficiency of machine learning and compute-intensive processing, Groq believes simplicity will win, because it leads to a more streamlined architecture that delivers greater throughput and greater ease of use, providing a much better overall solution for both developers and customers.

The result is a design architecture that is simpler to use and capable of much higher performance. Because of its simplicity of design, Groq is positioned to be the only hardware and software solution with a sustainable performance advantage beyond the limitations of process scaling.