12 Hours Later, Groq Deploys Llama 3 Instruct (8 & 70B) by Meta AI on Its LPU™ Inference Engine

Written by:
Groq
Llama 3 Now Available to Developers via GroqChat and GroqCloud™

Here’s what’s happened in the last 36 hours:

Throughput 

Groq serves Llama 3 70B at 284 tokens per second, 3-11x faster than other providers.

Throughput vs. Price

While ArtificialAnalysis.ai used a blended (input/output) price of $0.64 per 1M tokens, Groq currently offers Llama 3 70B at $0.59 per 1M input tokens and $0.79 per 1M output tokens.
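The blended figure can be reproduced from the two per-token prices above. A minimal sketch, assuming a 3:1 input-to-output token blend (the ratio is an assumption; it is not stated here):

```python
def blended_price(input_price: float, output_price: float,
                  input_ratio: int = 3, output_ratio: int = 1) -> float:
    """Blend per-1M-token input/output prices by a token-count ratio."""
    total = input_ratio + output_ratio
    return (input_price * input_ratio + output_price * output_ratio) / total

# Groq's Llama 3 70B pricing per 1M tokens; 3:1 blend is an assumption
price = blended_price(0.59, 0.79)
print(f"${price:.2f} per 1M tokens")  # → $0.64 per 1M tokens
```

With a 3:1 blend, Groq's prices reproduce the $0.64 blended figure exactly.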

Latency vs. Throughput 

For latency, measured as seconds until the first token chunk is received, versus throughput, measured in tokens per second, Groq comes in at 0.3 seconds and 282 tokens per second.

Total Response Time

Measured as the time to receive 100 output tokens, calculated from the latency and throughput metrics above, Groq clocks in at roughly 0.6 seconds.
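This total can be sanity-checked from the two metrics above: time to first token, plus the time to stream the output tokens at the measured rate. A minimal sketch (the quoted 0.6 s figure implies some rounding):

```python
def total_response_time(latency_s: float, throughput_tps: float,
                        n_tokens: int = 100) -> float:
    """Time to first token plus time to stream n_tokens at the given rate."""
    return latency_s + n_tokens / throughput_tps

# Groq's measured latency (0.3 s) and throughput (282 tokens/s)
t = total_response_time(0.3, 282)
print(f"{t:.2f} s")  # ~0.65 s, consistent with the ~0.6 s figure above
```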

Summary Table of Key Comparison Metrics

Metric                                  | Groq (Llama 3 70B)
Throughput                              | 284 tokens per second
Price per 1M tokens                     | $0.59 input / $0.79 output
Latency (time to first token chunk)     | 0.3 seconds
Total response time (100 output tokens) | 0.6 seconds

Thank You to Our Developer Community!

We’re already seeing members of our developer community share reactions, applications, and side-by-side comparisons. Check out our X page for more examples, join our Discord community, and try it for yourself on the GroqCloud™ Console.

Special shoutouts to those captured here:
