Your AI finally remembers.

Recallr is the #1 memory layer for conversational AI agents. Persistent, versioned, and temporally aware knowledge, powering chatbots, voice agents, and copilots that never forget.

Accuracy on LongMemEval: 97.5%
Over strongest baseline (temporal reasoning): +46.6pp
Low-latency recall: <400ms

97.5% accuracy.
Best in Class.

Evaluated on LongMemEval (oracle) across 500 questions spanning six memory task types.

  • Recallr (Agentic)
  • Recallr (Balanced)
  • Recallr (Low-Latency)
  • Mem0
  • Mem0 (Graph)
  • Supermemory

Temporal Reasoning Dominance

97.0% (Recallr) vs 50.4% (Mem0 Graph) vs 27.1% (Supermemory)

+46.6pp improvement over the nearest competitor. Recallr knows the difference between when something happened and when you mentioned it. So 'last summer' means last summer, not last Tuesday.

Knowledge Update: 97.4%

97.4% (Recallr) vs 76.9% (Mem0) vs 60.3% (Supermemory)

When facts change, old versions are archived, not erased. Your agent always has the latest truth, and the full history of how it got there.

Fast when you need it. Deep when it matters.

Latency percentiles for Recallr strategies compared to Mem0 and Supermemory

Strategy                Min      P25      P50      P75      P95      Max
Recallr (Low-Latency)   0.234s   0.265s   0.299s   0.338s   0.408s   0.750s
Recallr (Balanced)      1.032s   1.132s   1.198s   1.286s   1.575s   3.548s
Recallr (Agentic)       5.125s   6.194s   6.997s   7.765s   8.619s   20.095s
Mem0                    0.489s   0.504s   0.786s   0.967s   1.787s   6.171s
Mem0 (Graph)            0.697s   0.746s   0.961s   1.987s   2.692s   10.458s
Supermemory             0.392s   0.851s   1.301s   1.876s   3.293s   4.242s

[Figure: latency range per strategy on a 0s to 20s axis, labeled with P95 values: Recallr (Low-Latency) 0.41s, Recallr (Balanced) 1.57s, Recallr (Agentic) 8.62s, Mem0 1.79s, Mem0 (Graph) 2.69s, Supermemory 3.29s. Reference lines mark the voice threshold (0.3s) and the chat threshold (1.5s).]

<400ms

Low-Latency Recall

Real-time voice and chat — answer before the user finishes thinking.

~1.5s

Balanced Recall

General production workloads — thorough retrieval without perceptible delay.

~8s

Agentic Recall

Deep reasoning over months of memory — when completeness matters more than speed.

Two loops.
Infinite memory.

Asynchronous Memory Curation

Every conversation leaves a richer graph than it found.

While you sleep

Runs after the turn, off the critical path — so deeper processing never adds latency.
Extracts facts, flags contradictions, and versions every change — the graph grows richer each turn.

Example

You: "I moved to Delhi last month."

Previous memory updated. "Location: Bengaluru" archived. "Location: Delhi" marked current. Version chain preserved.

Synchronous Context Retrieval

The right memory, at the right depth, before the LLM responds.

Fast → Deep · Auto-Recall

Blocking by design — the LLM waits for context, so retrieval speed is response speed.
Depth adapts to the query — low-latency for voice, balanced for chat, agentic for accuracy.
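
One way to read "depth adapts to the query" is as a latency-budget lookup. The sketch below is an assumption about how a caller might choose a strategy, not Recallr's actual routing logic. It uses the P95 latencies from the latency table on this page. The strategy names `balanced` and `agentic` are guessed by analogy with the `low_latency` header value in the integration example, and `pick_strategy` is a hypothetical helper.

```python
# P95 recall latency in seconds, copied from the latency table on this page.
# The keys "balanced" and "agentic" are assumed names, not confirmed values.
P95_SECONDS = {
    "low_latency": 0.408,
    "balanced": 1.575,
    "agentic": 8.619,
}


def pick_strategy(budget_seconds: float) -> str:
    """Return the deepest strategy whose P95 latency fits the budget.

    Falls back to the fastest strategy when no P95 fits the budget.
    """
    fitting = [name for name, p95 in P95_SECONDS.items() if p95 <= budget_seconds]
    if not fitting:
        return min(P95_SECONDS, key=P95_SECONDS.get)
    return max(fitting, key=P95_SECONDS.get)


if __name__ == "__main__":
    for budget in (0.3, 2.0, 30.0):
        print(budget, pick_strategy(budget))
```

Budgeting against P95 rather than the median is a conservative choice: a strategy is selected only if 95% of its recalls would finish within the budget.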

Example

You: "What restaurants did I like when I lived in Bengaluru?"

Location history traversed. Past preferences recalled across 3 sessions. Response grounded in memory, not hallucination.

Two lines of code.
Permanent memory.

A simple drop-in proxy for your existing agents. Route your OpenAI, Anthropic, or Gemini calls through Recallr. No SDK changes, no architecture rewrites, no new abstractions.

from openai import OpenAI

client = OpenAI(
    # Replaces the default base_url="https://api.openai.com/v1"
    base_url='https://api.recallrai.com/api/v1/forward/https://api.openai.com/v1',
    api_key='sk-...',
    # Added: Recallr headers sent with every request
    default_headers={
        'X-Recallr-API-Key': 'rai-...',
        'X-Recallr-Project-Id': 'your-project-id',
        'X-Recallr-Allow-New-User-Creation': 'true',
        'X-Recallr-Session-Timeout-Seconds': '600',
    },
)

# Memory is automatically injected on every call
response = client.chat.completions.create(
    model="gpt-5.5",
    messages=[{"role": "user", "content": "My name is Mayank"}],
    # Added: per-call user identity and recall strategy
    extra_headers={
        'X-Recallr-User-Id': 'user-123',
        'X-Recallr-Recall-Strategy': 'low_latency',
    },
)

print(response.choices[0].message.content)
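
To keep the proxy settings in one place, the URL and headers from the snippet above could be built by small helpers. This is a sketch, and all three function names are hypothetical. The header names and the forward URL prefix are copied from the snippet. Reusing the same prefix for other providers' base URLs is an assumption that this page does not confirm.

```python
# Forward prefix copied from the integration snippet on this page.
RECALLR_FORWARD_PREFIX = "https://api.recallrai.com/api/v1/forward/"


def recallr_base_url(upstream_base_url: str) -> str:
    """Wrap an upstream base URL so requests are routed through the proxy."""
    return RECALLR_FORWARD_PREFIX + upstream_base_url


def recallr_client_headers(
    api_key: str,
    project_id: str,
    allow_new_users: bool = True,
    session_timeout_seconds: int = 600,
) -> dict:
    """Headers set once on the client (HTTP header values must be strings)."""
    return {
        "X-Recallr-API-Key": api_key,
        "X-Recallr-Project-Id": project_id,
        "X-Recallr-Allow-New-User-Creation": "true" if allow_new_users else "false",
        "X-Recallr-Session-Timeout-Seconds": str(session_timeout_seconds),
    }


def recallr_call_headers(user_id: str, strategy: str = "low_latency") -> dict:
    """Headers passed per request, identifying the user and recall strategy."""
    return {
        "X-Recallr-User-Id": user_id,
        "X-Recallr-Recall-Strategy": strategy,
    }


if __name__ == "__main__":
    print(recallr_base_url("https://api.openai.com/v1"))
    print(recallr_client_headers("rai-...", "your-project-id"))
    print(recallr_call_headers("user-123"))
```

The first helper's result would go to `base_url`, the second to `default_headers`, and the third to `extra_headers` on each call.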

Zero refactoring

Your existing SDK calls keep working. You only change the base URL and add headers.

Any model

OpenAI, Anthropic, Gemini. One proxy.

Instant memory

Every user gets a persistent profile.

Built for conversational AI agents that need to remember everything.

Every domain where context compounds over time. Every agent that should never ask twice.

Healthcare · Longitudinal

“Clinical Memory”

Remembers medication reactions, contradictory symptoms, and what the patient said last month versus what they’re saying now. The doctor gets context. The patient gets heard.

Education · Personalization

“Personalized Mentor”

Knows which concepts clicked, which needed three tries, and how you learn best. Every session builds on the last.

Legal · Case Management

“Case Memory for Legal AI”

Tracks every witness statement, flags contradictions across depositions months apart, and never loses the thread of a multi-year case. The brief writes itself.

Customer Support · Context

“Support That Never Asks Twice”

Remembers every ticket, every frustration, every resolution. When a customer returns, the agent already knows their history, preferences, and what was promised last time.

Sales & CRM · Pipeline

“Persistent Sales Context”

Tracks every interaction, objection, and milestone across long sales cycles. Reps get instant recall at every touchpoint — no more re-reading CRM notes before calls.

E-Commerce · Personalization

“The Store That Knows You”

Remembers size preferences, past purchases, style evolution, and gift recipients. Every recommendation builds on real history, not just last-click behavior.

Chronic Illness · Patient Autonomy

"The AI That Remembers Your Body"

For patients managing chronic conditions, every new doctor visit starts from zero. Recallr gives the AI the full longitudinal picture: symptoms reported six months ago, medications that caused side effects, what "bad days" actually look like for this specific person.

When reported symptoms contradict what was said before, the conflict is surfaced, not silently overwritten. The patient's history becomes a living record, not a forgotten intake form.

Patient Memory Visualization — Oct 2024 → Jan 2025 → Mar 2025

Conflict detected at Oct 2024 · Versions diverged

See exactly what you save.

We give $20 to free accounts every month.

Chat exchanges / session

10

User + assistant pairs per session

Sessions / day

5

Per active user, on average

Simulation window

60 days

Cost projection horizon

Both costs include upstream LLM fees. The naive approach re-sends the full history as context on every call, so cumulative cost grows quadratically. Recallr runs a fixed memory pipeline per session, so cumulative cost grows linearly.
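
The quadratic versus linear claim can be sketched with a toy model. The pricing behind the calculator is not stated on this page, so the two unit costs below are back-fitted assumptions: they are chosen so that the 60-day totals match the $541.80 and $199.80 figures shown here, using the default inputs (10 exchanges per session, 5 sessions per day). All function names are hypothetical.

```python
EXCHANGES_PER_SESSION = 10
SESSIONS_PER_DAY = 5
DAYS = 60

NAIVE_TOTAL = 541.80    # 60-day total quoted on this page
RECALLR_TOTAL = 199.80  # 60-day total quoted on this page

CALLS_PER_DAY = EXCHANGES_PER_SESSION * SESSIONS_PER_DAY
TOTAL_CALLS = CALLS_PER_DAY * DAYS
TOTAL_SESSIONS = SESSIONS_PER_DAY * DAYS

# Assumption: call k re-sends k exchanges of history, so the cumulative
# naive cost after n calls is unit * n * (n + 1) / 2 (quadratic in n).
NAIVE_UNIT_COST = NAIVE_TOTAL / (TOTAL_CALLS * (TOTAL_CALLS + 1) / 2)

# Assumption: every session costs the same fixed amount (linear in sessions).
RECALLR_COST_PER_SESSION = RECALLR_TOTAL / TOTAL_SESSIONS


def naive_cumulative(day: int) -> float:
    """Cumulative naive cost at the end of the given day."""
    n = CALLS_PER_DAY * day
    return NAIVE_UNIT_COST * n * (n + 1) / 2


def recallr_cumulative(day: int) -> float:
    """Cumulative Recallr cost at the end of the given day."""
    return RECALLR_COST_PER_SESSION * SESSIONS_PER_DAY * day


def breakeven_day() -> float:
    """Fractional day at which the two cumulative cost curves cross."""
    per_call = RECALLR_COST_PER_SESSION / EXCHANGES_PER_SESSION
    # Solve unit * n * (n + 1) / 2 = per_call * n for n > 0.
    n = 2 * per_call / NAIVE_UNIT_COST - 1
    return n / CALLS_PER_DAY


if __name__ == "__main__":
    for day in (1, 15, 30, 60):
        print(day, round(naive_cumulative(day), 2), round(recallr_cumulative(day), 2))
    print("breakeven day:", round(breakeven_day(), 1))
```

Under this back-fitted model the curves cross a little after day 22, in line with the breakeven reported on this page. Before that point the naive approach is cheaper because the history is still short.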

Without Recallr

$541.80

full history re-sent every call

With Recallr

$199.80

Breakdown (included in the total above):

  • Your LLM (compressed context): $60.30
  • Recallr memory pipeline: $139.50

You save

$342.00

63% cheaper than naive

Breakeven

Day 22

Recallr cheaper from here on

[Figure: Cumulative LLM cost from day 1 to day 60, Naive (full history re-sent) vs Recallr (memory pipeline), with the breakeven marked at day 22.]

Frequently Asked Questions

How is Recallr different from RAG?

RAG retrieves static document chunks. Recallr maintains a versioned knowledge graph that evolves over time, with conflict resolution and temporal provenance. It knows what changed, when, and why. RAG just knows what was stored.

How does Recallr handle conflicting information?

Recallr classifies the conflict type (temporal update, correction, preference change, or contradiction) and applies the appropriate resolution strategy. For ambiguous conflicts, it can prompt the user for clarification rather than silently overwriting.

Does Recallr work with my existing LLM provider?

Yes. The memory layer is model-agnostic. Route your OpenAI, Anthropic, or Gemini calls through Recallr with two lines of code. No SDK changes, no architecture rewrites.

How does memory versioning work?

Each memory entity maintains a linked-list version chain. New information creates a new version node with a "supersedes" edge to the previous version, along with dual timestamps (event time and ingestion time) for temporal provenance.
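
A minimal sketch of such a version chain, assuming a singly linked list in which each version points at the one it supersedes. The class and field names are illustrative and are not Recallr's actual schema. The dates are made up to mirror the "I moved to Delhi last month" example.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Optional


@dataclass
class MemoryVersion:
    value: str
    event_time: datetime       # when the fact became true in the world
    ingestion_time: datetime   # when the system learned about it
    supersedes: Optional[MemoryVersion] = None  # edge to the previous version


class MemoryEntity:
    """One entity (for example "Location") with its chain of versions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.current: Optional[MemoryVersion] = None

    def update(self, value: str, event_time: datetime,
               ingestion_time: datetime) -> MemoryVersion:
        # The old head is kept and linked from the new head, never deleted.
        version = MemoryVersion(value, event_time, ingestion_time,
                                supersedes=self.current)
        self.current = version
        return version

    def history(self) -> Iterator[MemoryVersion]:
        """Walk the chain from the current version back to the oldest."""
        node = self.current
        while node is not None:
            yield node
            node = node.supersedes


if __name__ == "__main__":
    location = MemoryEntity("Location")
    # Hypothetical dates for illustration only.
    location.update("Bengaluru", datetime(2022, 1, 1), datetime(2024, 10, 1))
    # Said in March, about a move that happened in February.
    location.update("Delhi", datetime(2025, 2, 15), datetime(2025, 3, 10))
    for version in location.history():
        print(version.value, version.event_time.date(), version.ingestion_time.date())
```

Storing both timestamps is what lets "last month" resolve to the event date rather than the date of the conversation.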

How much latency does Recallr add?

Zero for ingestion. Curation runs asynchronously. For recall: <400ms (Low-Latency), ~1.5s (Balanced), or ~8s (Agentic). Choose the strategy that fits your use case.

Can I query what the system knew at a past point in time?

Yes. You can query the graph at any point in time using temporal filters. Retrieve what the system knew at a specific date, trace the evolution of a fact, or audit the full version history of any entity.
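
The difference between "what the system knew" and "what was true" at a given date can be sketched with two filters over the dual timestamps. This is an in-memory illustration with hypothetical names and made-up dates. It does not show Recallr's query API, which this page does not document.

```python
from datetime import datetime
from typing import NamedTuple, Optional, Sequence


class Version(NamedTuple):
    value: str
    event_time: datetime       # when the fact became true
    ingestion_time: datetime   # when the system learned it


def known_as_of(versions: Sequence[Version], when: datetime) -> Optional[Version]:
    """Most recently ingested version the system had already seen at `when`."""
    visible = [v for v in versions if v.ingestion_time <= when]
    return max(visible, key=lambda v: v.ingestion_time, default=None)


def true_as_of(versions: Sequence[Version], when: datetime) -> Optional[Version]:
    """Version whose event time is the latest one not after `when`."""
    started = [v for v in versions if v.event_time <= when]
    return max(started, key=lambda v: v.event_time, default=None)


if __name__ == "__main__":
    # Hypothetical history for one "Location" entity.
    history = [
        Version("Bengaluru", datetime(2022, 1, 1), datetime(2024, 10, 1)),
        Version("Delhi", datetime(2025, 2, 15), datetime(2025, 3, 10)),
    ]
    audit_date = datetime(2025, 3, 1)
    print(known_as_of(history, audit_date))  # filtered by ingestion time
    print(true_as_of(history, audit_date))   # filtered by event time
```

On the audit date in this example the two filters disagree: the move had already happened, but the system had not yet been told about it.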

Stop building agents

that forget.

Start Building