DeepSeek-R1-Distill-Llama-70b, a fine-tuned version of Llama 3.3 70B trained on samples generated by DeepSeek-R1, is now live on GroqCloud™ for instant reasoning, with the full 128K context window enabled. You can try it at console.groq.com. Please note the initial release of this model on GroqCloud is in preview mode, meaning we recommend it for evaluation purposes only until it is listed as a production model (coming soon).
DeepSeek has raised the bar on model quality all the while sharing a blueprint by making their work open-source. This is a game-changer. We expect to see huge leaps in model capabilities as others build off their work. As models improve, the possibilities are endless and only limited by our creativity. This means the demand for compute will be massive so we’re adding capacity at Groq every day.
Ian Andrews | CRO, Groq
Why Distill Llama 70B?
DeepSeek-R1 – the largest and most capable model in the DeepSeek suite – was distilled into the Llama 70B architecture. The resulting model outperforms the original Llama 70B on benchmarks and in human evaluation, and is particularly strong at tasks requiring mathematical and factual precision.
DeepSeek-R1-Distill-Llama-70b delivers top-tier performance on MATH-500 (94.5%), the best among all distilled models, and achieves a strong score of 86.7% on AIME 2024 (an exam designed to challenge the brightest high school math students in America), making it a top choice for advanced mathematical reasoning. It is also highly capable at coding and scientific reasoning, outscoring OpenAI's o1-mini and GPT-4o on GPQA Diamond (65.2%) and LiveCodeBench (57.5%).
Groq believes DeepSeek-R1-Distill-Llama-70b strikes the right balance of performance and price – economics that actually work at scale. We can't wait to see what people unlock and build with this new release.
Reasoning Models
Reasoning models are unique because they introduce a chain-of-thought (CoT) thinking phase before generating an answer. Because they are specifically trained to do this, they have improved reasoning performance at inference time. This means reasoning models excel at complex problem-solving tasks that require step-by-step analysis, logical deduction, structured thinking and solution validation.
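To make this concrete, here's a minimal sketch of calling the model and separating the thinking phase from the final answer. It assumes the Groq Python client (`pip install groq`) and the `deepseek-r1-distill-llama-70b` model ID on GroqCloud; R1-style models emit their chain of thought between `<think>` tags, which can be split from the answer:

```python
import re
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "What is 119 * 7 - 3? Show your steps."}],
)

text = response.choices[0].message.content

# R1-style models wrap their chain of thought in <think>...</think>
# before emitting the final answer; split the two apart.
match = re.match(r"(?s)\s*<think>(.*?)</think>\s*(.*)", text)
if match:
    reasoning, answer = match.groups()
    print("Reasoning:", reasoning.strip())
    print("Answer:", answer.strip())
else:
    print(text)
```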
Reasoning-based inference generates a high volume of tokens, so it takes longer to produce the output. Speed is essential: fast responses keep users engaged, while lagging responses cause frustration. Groq offers fast AI inference so that CoT models can deliver the instant reasoning capabilities critical for high-quality, real-time applications.
DeepSeek-R1 in particular is a special reasoning model because it proves you can significantly improve LLM reasoning with pure reinforcement learning (RL) and no labeled data, as demonstrated by DeepSeek-R1-Zero. DeepSeek-R1 then built on that work to improve readability, providing the clarity and precision users need in results. If you want to learn more about the DeepSeek-R1 training process, check this out.
Many thought model scaling hit a wall, but as Groq CEO and Founder Jonathan Ross shared in his 2025 predictions, we’re in a new law-defying time. Model quality, not just model size, matters now more than ever. Groq is ready for the new wave of models and we’re ready to run them fast.
Why Speed Matters for Reasoning
Reasoning models are capable of complex decision-making, with explicit reasoning chains generated as part of the token output – this makes ultra-low latency, fast inference essential. Complex problems often require multiple chains of reasoning tokens, where each step builds on previous results. Low latency compounds its benefits across these chains, turning minutes of reasoning into a response in seconds.
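In practice, the easiest way to take advantage of that speed is to stream tokens as they are generated, so users watch the reasoning unfold instead of waiting for the full output. Here's a minimal sketch, assuming the same Groq Python client and model ID as above:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# Stream tokens as they are generated so the reasoning chain
# appears in real time rather than after the full completion.
stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Outline a 3-step plan to verify a proof by induction."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```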
Your Data + DeepSeek = Secure on GroqCloud™
Groq is a US-based company, and DeepSeek-R1-Distill-Llama-70b on GroqCloud is deployed entirely on our own infrastructure. When you send a query (inputs/outputs) to any model instance via GroqCloud™, we temporarily store it in memory and clear it as soon as your session is completed. We do this because we don't do training – we do inference. While Groq has no incentive to keep your data, customers who want storage can work with their own provider to meet their application needs.
This means your data won’t be sent to any DeepSeek servers in China. We take privacy seriously at Groq and you can learn more about that commitment at trust.groq.com.
Ready, Set, Build
Here's how to get the best out of DeepSeek-R1-Distill-Llama-70b on GroqCloud:
- Temperature & Token Management: Early testing indicates the model performs best with temperature settings between 0.5 and 0.7; lower values (closer to 0.5) produce more consistent mathematical proofs, while higher values allow for more creative problem-solving approaches. Monitor and adjust your token usage based on the complexity of your reasoning tasks – while the default max_completion_tokens is 1024, complex proofs may require higher limits (see the sketch after this list).
- Prompt Engineering: To ensure accurate, step-by-step reasoning while maintaining high performance, DeepSeek-R1 works best when all instructions are included directly in user messages rather than system prompts. Structure your prompts to request explicit validation steps and intermediate calculations, and where possible prefer zero-shot prompting over few-shot prompting, as the sketch below also demonstrates.
- 2x Rate Limits for Dev Tier: Rate limits are now 2X higher for Dev Tier customers building with DeepSeek-R1-Distill-Llama-70b. Time to ship!
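Putting those recommendations together, here's a minimal sketch, again assuming the Groq Python client and the `deepseek-r1-distill-llama-70b` model ID; the prompt, temperature, and token limit are illustrative starting points rather than fixed requirements:

```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

# All instructions go directly in the user message (no system prompt),
# zero-shot, with explicit validation steps requested.
prompt = (
    "Prove that the sum of the first n odd numbers is n^2. "
    "Show each intermediate step and validate the result for n = 4."
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.6,             # within the recommended 0.5-0.7 band
    max_completion_tokens=4096,  # raised above the 1024 default for a longer proof
)

print(response.choices[0].message.content)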
Stay tuned for more updates on DeepSeek-R1-Distill-Llama-70b running on GroqCloud.