The Next Generation of Computing is here.

Building the computer for the next generation of high-performance machine learning. Groq hardware is designed to be both high performance and highly responsive. Groq's simplified architecture drives incredible performance at batch size 1. Whether you have one image or a million, Groq hardware responds faster.


Compute More. Consume Less.

Groq's superior architecture delivers more compute cycles per server, maximum performance at batch size 1, and zero context-switching overhead. The result is blazing fast compute using 50% less energy than the nearest competitor, driving down total cost of ownership while reducing your CO2 footprint.


Powerfully Predictable.

Groq hardware has the fastest ResNet-50 performance of any commercially available hardware. For perspective, in the time it takes a GPU to retrieve a single byte from memory, Groq hardware can perform over 400,000 multiplications.
Only the Groq architecture provides power and performance information at compile time. What does that mean? Groq makes fast work of all kinds of work. There is no need to waste time profiling your code on hardware: you can optimize using the compiler, know power consumption and completion time before you ever run the model, cap power below a chosen level, and bound the time a model takes to execute.

You shouldn't have to choose between performance and responsiveness.

Until now, every accelerator forced a tradeoff between fastest response and maximum performance. Not anymore. See for yourself how Groq saves on costs up front, and on maintenance down the road, by delivering incredible performance with fewer servers.

Number of Servers Deploying at Max Performance

Move the slider to see how small changes in the percentage of your workload that requires responsiveness (i.e., minimum latency) cause big changes in the number of servers other architectures must deploy.


Notice that while it can take up to 162* of our competitors' servers to reach maximum performance and responsiveness, Groq does it with just 10*. (*2 cards per server)
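The comparison above can be sketched as simple arithmetic. This is an illustrative calculation only, using the figures quoted on this page (162 competitor servers versus 10 Groq servers, 2 cards per server); the function and variable names are our own, not part of any Groq tooling.

```python
# Illustrative sketch of the server/card comparison quoted above.
CARDS_PER_SERVER = 2  # per the footnote: 2 cards per server

def total_cards(servers, cards_per_server=CARDS_PER_SERVER):
    """Total accelerator cards across a deployment."""
    return servers * cards_per_server

competitor_servers = 162  # worst case quoted above
groq_servers = 10

print(total_cards(competitor_servers))    # 324 competitor cards
print(total_cards(groq_servers))          # 20 Groq cards
print(competitor_servers / groq_servers)  # 16.2x fewer servers
```

At the quoted figures, that is 324 cards versus 20 cards for the same maximum performance and responsiveness.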

Learn More



% of Responsive Workload

See the proof behind the claim here.


Be part of the future of simplified, high-performance compute.

Read more @GroqInc on Twitter.