Why Recall Switched to Groq: Fast, Intelligent Knowledge Retrieval, 10X Lower Cost

Every day, we come across content worth remembering: a quote that hits home, a podcast that sparks an idea, or an article that shifts our thinking. But in a world of endless tabs and information overload, those moments of insight often get lost in the chaos.

That’s where Recall comes in. Founded by Paul Richards, Igor Gligorevic and Sankari Nair, Recall was born out of a simple frustration: consuming great content only to forget it later or never knowing where to find it again. What began as a personal Evernote workflow for organization content, turned into a personal knowledge engine that lets you save content from anywhere, automatically summarize and organize it, and surface meaningful connections across everything you’ve collected. It also helps you interact with and retain what you’ve learned through chat-based Q&A, spaced repetition, and active recall.

From podcasts and YouTube videos to articles, notes, and PDFs, Recall acts like a secure, self-organizing extension of your mind, turning discovery into memory by helping you remember what you've learned over time.

Turning a viral idea into a real-time knowledge engine

Paul, a self-taught coder, made a post to launch the company on Hacker News titled, “A tool to remember all the sh*t you care about.” It hit #1 ranking and thousands of users poured in. An investor wired him money before he even had a registered company. Recall became a real company. And that’s when the hard part started. Behind the simplicity of “save, ask, remember,” is a staggering amount of AI inference.

As Recall scaled from a side project to a funded startup, speed became non-negotiable. Early users loved the idea of summarizing and chatting with their saved content, but only if responses felt instantaneous. “If it takes one minute to summarize a five-minute article, the value disappears for our users,” Paul explains. “Features like knowledge-graph linking, chat interactions, and auto-tagging simply aren’t viable without high throughput and low latency.”

At the same time, Recall needed to build a knowledge graph that could accurately understand and link concepts, like knowing whether “Apple” meant fruit or a trillion-dollar company, across each user’s stored content. Existing solutions like Google’s Grounded Search API and Perplexity’s Sonar API offered accuracy but at a steep cost and slow speed, threatening the company’s margins and scalability.

To succeed, Recall needed:

Fast inference for summaries, transcription, and chat responses
Low-cost, accurate entity extraction to power its knowledge graph
Scalable infrastructure that didn’t require a dedicated machine-learning ops team

Recall had proven the concept. Now it needed a breakthrough in speed, cost, and scalability to turn a viral idea into a sustainable, intelligent product.

Building Recall’s real-time intelligence on Groq

To overcome lagging speed and escalating costs, Paul benchmarked Recall’s workloads across leading AI infrastructure providers, including Groq, Together AI, Replicate, Fireworks, and Google Vertex AI. Groq immediately stood out because it delivered dramatically higher token throughput and real-time performance.

Today, Recall runs open-source models on GroqCloud as well as Compound, Groq's web research agent, in production. Compound conducts research, executes code, controls browsers, and navigates the web on its own. The combination of Compound and open source models on Groq powers the Recall intelligence engine:

Summarization and formatting: Recall runs Llama 3.3 and Llama 4 models on Groq to generate instant summaries, bullet points, and chat responses, eliminating the lag that once slowed user engagement.
Audio processing: With Whisper on Groq, Recall transcribes YouTube videos and podcasts that lack transcripts, turning audio into searchable, structured content.
Entity extraction and knowledge graph: Recall replaced Google’s costly grounded search API with Groq Compound, achieving fast, low-cost entity extraction and linking—essential for connecting concepts across each user’s knowledge graph.

“Groq Compound is what unlocked our knowledge graph. We moved from simple entity extraction to truly understanding relationships between ideas, and we did it 10x cheaper than using Google’s grounded search,” says Igor Gligorevic, Recall’s Co-founder & CTO.

By building on Groq, Recall unlocked real-time summarization, transcription, and contextual search, transforming a powerful idea into a scalable, sustainable platform for personal knowledge.

From slow and costly to instant and scalable

Groq’s fast inference and low-latency architecture allowed Recall users to click "Summarize" and get answers before they’d even finished a sip of coffee.

Before Groq, summaries took seconds or longer, users dropped off while waiting, and inference costs ballooned. Google’s $3-per-1K entity searches made deeper features impossible to sustain.

After Groq, summaries felt instant, transforming the user experience from passive to interactive. Groq Compound cut inference costs to a fraction of Google’s model, making advanced contextual features viable for the first time.

With Groq, Recall was able to:

Reduce inference costs by 10x
Summarize 10,000+ hours of content daily
Run Llama 3.3/4 models at 280 tokens per seconds
Switch to Whisper on Groq for audio and video transcription
Evolve its knowledge graphs from static to contextual

“We process millions of minutes of audio,” said Paul. “Groq is the only platform we’ve used that didn’t choke when we scaled. It didn’t just make us faster, it made Recall a more sustainable business. With ChatGPT and traditional APIs, every query was eating into our margins. Groq gave us instant inference at a cost that finally made sense. Groq brought response times down to milliseconds—and that kind of speed is the difference between a tool people try and a tool people rely on every day.”

These performance gains unlocked a new product category—a real-time, personal knowledge engine.

The results followed fast:

10,000+ paying users and 1,000+ enterprise customers
Earned Product of the Day, Week, and Month on Product Hunt

What began as a viral prototype is now a thriving AI platform powered by Groq’s speed, scalability, and precision, turning slow and costly processes into instant, affordable performance.

Looking ahead: Redefining how we learn and remember

Today, professionals across industries rely on Recall to turn information overload into insight. Whether mastering new topics, accelerating research, or organizing personal interests, users experience a faster, more intuitive way to capture and connect what matters most.

Powered by Groq, Recall delivers instant summaries, contextual chat, and a dynamic knowledge graph that evolves with every new piece of content. Together, Recall and Groq are redefining what’s possible in knowledge management: instant, intelligent, and infinitely scalable.

Turning a viral idea into a real-time knowledge engine

Building Recall’s real-time intelligence on Groq

From slow and costly to instant and scalable

The results followed fast:

Looking ahead: Redefining how we learn and remember

Build Fast