The Next Generation of Computing is here.
Building the computer for the next generation of high-performance machine learning. Groq hardware is designed to be both high performance and highly responsive. Groq’s new simplified architecture drives incredible performance at batch size 1. Whether you have one image or a million, Groq hardware responds faster.
Compute More. Consume Less.
Groq’s superior architecture gives you more compute cycles per server, maximum performance at batch size 1, and zero context-switching overhead. The result is blazing-fast compute with 50% less energy than the nearest competitor, driving down total cost of ownership while reducing your CO2 footprint.
Groq hardware has the fastest ResNet-50 performance of any commercially available hardware. In the time it takes a GPU to retrieve a single byte from memory, Groq hardware can perform over 400,000 multiplications.
Only the Groq architecture provides power and performance information at compile time. What does that mean? No need to waste time profiling your code on hardware. You can optimize using the compiler, know a model’s power consumption and completion time before it ever runs, cap power below a chosen level, and bound the time it takes to execute. Groq makes fast work of all kinds of work.
You shouldn’t have to choose between performance and responsiveness.
Until now, every accelerator required a tradeoff between fastest response and maximum performance. Not anymore.
See for yourself how Groq saves on costs up front, and on maintenance down the road, by delivering incredible performance with fewer servers.
Number of Servers Deploying at Max Performance
Move the slider to see how small changes in the percentage of your workload that is responsive (i.e., requires minimum latency) drive big changes in the number of servers other architectures must deploy.
% of Responsive Workload