
How Willow Achieved Zero Downtime and 500ms Faster AI Responses with Groq

Willow Voice is an AI-powered dictation tool that is fast, accurate, and works on any app. Willow turns your voice into clear, formatted, and personalized writing while keeping everything secure and in your control. With dictation now playing a central role in how people interact with large language models (LLMs) and productivity apps, Willow is at the forefront of transforming how we talk to technology.

But to maintain that edge, Willow needed infrastructure that was faster, more reliable, and built to scale as quickly as their ambitions. That’s where Groq came in.

The challenge: How reliability became Willow’s #1 priority

In the world of AI-powered tools, one thing matters more than speed: reliability. For voice-driven apps like Willow, uptime isn’t optional; it’s critical. Sure, shaving a few hundred milliseconds off your response time is great, but if your service goes down mid-sentence, speed doesn’t matter.

“Uptime is the lifeblood of our product,” says Lawrence Liu, CTO and Co-founder of Willow. “If the service goes down, even for a short time, we risk losing trust and losing users.”

In its early days, Willow self-hosted its LLMs, taking full responsibility for scaling, uptime, and incident response. The team had previously partnered with larger providers, but found that most were focused on model fine-tuning, not on building reliable, high-uptime infrastructure.

Two issues kept coming up:

  • Latency and throughput: Longer prompts meant longer wait times, something users felt immediately.
  • Reliability: Willow was experiencing weekly outages, often tied to public GPU instability. These outages frustrated users and led to churn.

“We were sending way too many, ‘Sorry, our servers are down’ emails,” Lawrence admitted. “That’s not the kind of communication you want to be known for.”

Willow’s team knew they had powerful tech, but to grow, they needed infrastructure that could keep up: fast, stable, scalable, and able to handle real-time voice input without blinking.

Enter Groq.

The solution: Groq-powered performance and peace of mind

Willow made the switch to Groq to power their custom fine-tuned version of Llama 3.1 8B. By running their own LoRA fine-tune on a GroqCloud dedicated instance, they gained full control over model performance without the burden of managing infrastructure themselves.
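For developers curious what this looks like in practice, GroqCloud exposes an OpenAI-compatible chat completions endpoint. The sketch below builds an illustrative request payload for a dictation-cleanup task; the model ID, system prompt, and helper function are assumptions for illustration (a dedicated-instance fine-tune would use whatever model ID Groq assigns to that deployment), not Willow's actual setup.

```python
import json

# Groq's OpenAI-compatible chat completions endpoint.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_dictation_request(transcript: str,
                            model: str = "llama-3.1-8b-instant") -> dict:
    """Build an OpenAI-style chat completions payload.

    The model ID and system prompt here are illustrative stand-ins;
    a LoRA fine-tune on a dedicated instance would use the ID Groq
    assigns to that deployment.
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Rewrite the raw dictation transcript as "
                        "clear, formatted text."},
            {"role": "user", "content": transcript},
        ],
        "temperature": 0.2,
    }

payload = build_dictation_request("uh can you send the report by friday thanks")
# Actually sending it requires an API key, e.g.:
#   headers = {"Authorization": f"Bearer {GROQ_API_KEY}"}
#   requests.post(GROQ_CHAT_URL, headers=headers, json=payload)
print(json.dumps(payload, indent=2))
```

Because the endpoint follows the OpenAI wire format, existing client code can typically be pointed at GroqCloud by changing only the base URL, API key, and model ID.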

One of the biggest technical hurdles Willow anticipated was handling throughput and longer prompts. Typically, the more tokens you send to an LLM, the slower the response. But the unique architecture behind the LPU that powers GroqCloud enables developers to scale workloads from thousands to millions of tokens instantly. And Groq’s engineering team worked closely with Willow to implement fine-tuned models with speculative decoding, dramatically improving latency across the board.
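Speculative decoding itself is a general technique: a small, cheap draft model proposes several tokens at once, and the larger target model verifies them in one pass, keeping the longest agreeing prefix. The toy sketch below uses simple next-token functions as stand-ins for real models (all names and the example sentence are illustrative, not Willow's or Groq's implementation), but it shows why the output always matches plain target-model decoding while needing fewer target steps.

```python
def speculative_decode(target, draft, prompt, max_tokens=10, k=4):
    """Greedy speculative decoding with stand-in 'models'.

    `target` and `draft` map a token list to the next token; real
    systems use LLMs, these are toy functions for illustration.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_tokens:
        # 1. Draft model cheaply proposes k tokens ahead.
        proposed, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposed.append(tok)
            ctx.append(tok)
        # 2. Target model verifies; accept the agreeing prefix.
        accepted, ctx = 0, list(out)
        for tok in proposed:
            if target(ctx) == tok:
                out.append(tok)
                ctx.append(tok)
                accepted += 1
            else:
                break
        if accepted < len(proposed):
            # On first disagreement, take the target's own token, so
            # output always equals pure target-model decoding.
            out.append(target(ctx))
    return out[:len(prompt) + max_tokens]

SENTENCE = "the quick brown fox jumps over the lazy dog".split()

def target(ctx):
    # Toy "target model": deterministically continues SENTENCE.
    return SENTENCE[len(ctx) % len(SENTENCE)]

def draft(ctx):
    # Toy "draft model": agrees with target except on one word.
    tok = target(ctx)
    return "cat" if tok == "fox" else tok

result = speculative_decode(target, draft, ["the"], max_tokens=8)
print(" ".join(result))  # the quick brown fox jumps over the lazy dog
```

Even with the draft model wrong on one word, the verified output is identical to what the target model alone would produce; the speedup comes from verifying several draft tokens per target pass instead of generating them one at a time.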

“We expected latency to increase linearly with longer token counts, but with Groq, it didn’t,” said Lawrence. “That was a huge win.” Speed wasn’t the only benefit. “Since switching to Groq, we’ve had zero downtime,” he added. “That’s been transformational for our users and our team.”

Groq's deterministic performance means you always know what you're going to get, even under heavy workloads.

The impact: Faster responses, happier customers

Switching to GroqCloud brought Willow three critical improvements:

  1. Zero downtime
  2. Noticeably lower latency (300–500 ms faster)
  3. Reduced support requests

“We haven’t had to send a single, ‘Our servers are down’ message since we moved our workloads to Groq,” said Lawrence. “That’s huge for customer trust.”

Latency improvements were just as impactful. In a voice-first product, speed makes the difference between a useful tool and a frustrating one.

“That speed really unlocks new workflows,” he explained. “If you can get a quick, accurate response, you’re more likely to use dictation to send a Slack message or fire off a short email.”

With uninterrupted uptime, Willow’s team no longer worries about emergency fixes or user complaints. This consistency has directly led to improved retention and fewer spikes in churn. Users now experience 300–500 millisecond faster response times, a difference that may seem minor, but in real-time communication it’s transformative.

Groq also delivered unexpected value to Willow’s developers. One standout feature was how easy it is to swap out model weights. “We used to dread retraining models. Now, uploading and deploying new model weights is nearly real-time and on-demand. It’s so much easier.”

Willow went from signing with Groq to going live in production in just three weeks, with hands-on support from Groq’s engineering team every step of the way.

Unlocking the future of office productivity

With Groq handling the heavy lifting, Willow is evolving from a voice dictation app into a productivity platform, especially for workplace communication. “We’re seeing more users dictate emails, send Slack messages, even summarize meetings with voice,” Lawrence said. “Groq’s performance gives us the confidence to push those use cases further.”

The reduced latency and increased reliability make it easy for users to adopt voice input as a natural part of their workflow, not just for prompts to a language model, but for actual communication with coworkers. Even without formal engagement tracking, the results are clear:

  • Higher user retention
  • Improved user experience with fewer issues
  • Smoother daily usage
  • Positive feedback on speed and reliability

When asked what changed the most since switching to Groq, Lawrence didn’t hesitate: “The biggest benefits we've seen are reliable uptime, much better latency, and excellent support. Customers are happier, and we’re no longer worrying about server outages. It’s been a night-and-day difference.”

What used to be an unreliable experience is now a fast, dependable tool that users trust and enjoy using.

What’s next: Scaling voice across Silicon Valley and beyond

Willow has big plans for growth. The team wants to become the go-to dictation provider in Silicon Valley, then expand nationwide as voice input becomes more common across the workplace.

And Groq is a key part of that vision.

“Groq is becoming known as the infrastructure provider for low-latency AI,” said Lawrence. “We’re proud to be building with them.”

The partnership isn’t just about performance; it’s a shared mission to reshape how people communicate with software. Together, Willow and Groq are unlocking real-time AI experiences that feel effortless and human.

Final thoughts: A true win-win partnership

For Willow, Groq delivered exactly what they needed: zero downtime, ultra-low latency, seamless developer tools, and scalable performance for LLMs.

And for Groq, Willow showcases what’s possible when infrastructure actually meets the demands of modern AI applications.

“At the end of the day, our users just want a tool that works, and with Groq, it does,” Lawrence concluded.