Announcements

Posts

Our fabulous EAs think of fun ways for GroqStars to meet one another, no matter where they live around the world. Speed friendship participants discovered a host of common interests, from cooking and hiking to woodworking and meditation.
#teambuilding #geoagnostic #remotework

Watch @DennisAbts, @GroqInc Chief Architect and Fellow, present on Groq’s #ISCA2022 paper, A Software-defined Tensor Streaming Multiprocessor for Large-Scale Machine Learning: http://youtu.be/mUsBORr-T8E
#WhyGroq #architecture #machinelearning #ml

Insights

Year of the Compiler

Written by
Jeremy Fowers

I’ve had an eye-opening month and I have three stories to share. Groq recently kicked off the release of our “early adopter” SDK, establishing Groq Compiler as the primary means of programming GroqChip™ accelerators.

Now, I’ve built my career on kernel optimization – meticulously coding important workloads specifically for targeted hardware. Sometimes Verilog has felt a little too high level for my needs. While I’m happy to attend events here like Groq-a-thon – an all-day Groq hackathon – to try out the early adopter SDK, I’m also confident my habits and experience will send me back to my kernel optimizing ways the next morning.

But change is in the air. First, we get a drop of over 100 LSTM-based models from a customer. Before I can do so much as load up VS Code, my teammate Lev has kicked off a distributed cluster to run Groq Compiler on the entire drop. 

Not only do some compiled programs beat my hand-coded benchmark, but the whole set comes in at an average of 16x speedup over Nvidia A100 and redefines what is possible within the customer’s latency requirement. No kernel engineering needed here.

The next week, another customer sends us a model that mixes LSTM and Transformer layers. I could code this by hand, but I give Groq Compiler first crack at the problem. After a little slicing and dicing of the ONNX file – this compiler is good but not perfect yet – I am looking at a result that offers over 100x speedup compared to the reference implementation. 

Finally, my teammate Chetan asks me if we should try supporting his favorite Transformer, ELECTRA. I’ve never heard of it before, and nobody at Groq has worked on it yet, but I may as well take a look. ELECTRA is an improved version of BERT that changes up the hyperparameters and adds a projection layer at the top. These differences sound benign, but adapting an optimized handwritten BERT into an optimized ELECTRA could still take considerable effort.

So I put it through Groq Compiler, and it just worked. Same performance advantage as our BERT.

This week a friend asked me if I’m worried about my skill set being obsolete. I told him no, hung up the phone, and went merrily back to learning PyTorch.

Interested in seeing the Groq Compiler in action? Reach out to [email protected] to learn how you can participate in the early adopter program.

Header image credit: Photo by Behnam Norouzi on Unsplash

Image one credit: Photo by Yuichi Kageyama on Unsplash 

Image two credit: Photo by charlesdeluvio on Unsplash