Your AI finally
remembers.

Recallr is a versioned memory graph for conversational agents. No more amnesia. No more stale facts. Just persistent, self-consistent, evolving knowledge — at any scale.

98.9%
Accuracy on LongMemEval
<300ms
Low-latency recall
[ BENCHMARK & LATENCY ]

98.9% accuracy.
Best in Class.

Evaluated on LongMemEval across 500 questions spanning six memory task types. Compared against Mem0, Zep, and Supermemory on identical configurations.

98.9%

Average accuracy

100.0%

Knowledge-Update accuracy (Agentic)

Accuracy by task type — Recallr (Agentic) vs Mem0 (Graph), the strongest baseline:

Task type                       Recallr (Agentic)   Mem0 (Graph)
Single-Session User (n=70)      98.6%               84.3%
Single-Session Pref. (n=30)     96.7%               80.0%
Knowledge Update (n=78)         100.0%              76.9%
Temporal Reasoning (n=133)      99.2%               48.1%
Multi-Session (n=133)           98.5%               65.4%
Single-Session Asst. (n=56)     89.3%               21.4%
Temporal Reasoning Dominance

99.2% vs 48.1% (Mem0) vs 44.4% (Zep)

+51.1pp improvement over the nearest competitor. Recallr knows the difference between when something happened and when you mentioned it. So 'last summer' means last summer, not last Tuesday.

Knowledge Update: 100%

100.0% Agentic  ·  96.2% Balanced

When facts change, old versions are archived, not erased. Your agent always has the latest truth, and the full history of how it got there.

LATENCY

Fast when you need it. Deep when it matters.

Strategy                 Min      P25      P50      P75      P95      Max
Recallr (Low-Latency)    0.234s   0.265s   0.298s   0.338s   0.396s   0.750s
Recallr (Balanced)       1.032s   1.134s   1.197s   1.286s   1.543s   3.548s
Recallr (Agentic)        5.125s   6.182s   6.987s   7.765s   8.432s   20.095s
Mem0 (Graph)             0.697s   0.734s   0.945s   1.987s   2.578s   10.458s
Mem0 (Non-Graph)         0.489s   0.489s   0.789s   0.967s   1.765s   6.187s
Zep                      0.489s   0.892s   1.345s   1.987s   3.416s   4.225s
Supermemory              0.392s   0.845s   1.298s   1.876s   3.214s   4.242s
[P95 latency by strategy — Recallr (Low-Latency) 0.40s · Recallr (Balanced) 1.54s · Recallr (Agentic) 8.43s · Mem0 (Graph) 2.58s · Mem0 (Non-Graph) 1.76s · Zep 3.42s · Supermemory 3.21s — voice threshold 0.3s, chat threshold 1.5s]
<300ms

Low-Latency Recall

Real-time voice and chat — answer before the user finishes thinking.

~1.2s

Balanced Recall

General production workloads — thorough retrieval without perceptible delay.

~7s

Agentic Recall

Deep reasoning over months of memory — when completeness matters more than speed.

[ ARCHITECTURE ]

Two loops.
Infinite memory.

ASYNC · POST-INTERACTION

Asynchronous Memory Curation

Every conversation leaves a richer graph than it found.

while you sleep

Conversations become structured knowledge — not raw text logs.

Contradictions get flagged, not silently overwritten.

Facts evolve over time — every version preserved, none lost.

EXAMPLE

You: "I moved to Delhi last month."

→ Previous memory updated.

  "Location: Bengaluru" archived.

  "Location: Delhi" marked current.

  Version chain preserved.

continuous learning · non-destructive · zero latency impact
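The archive-not-erase behavior described above can be sketched in a few lines. The dict-based store and the `curate` function are illustrative assumptions for this sketch, not Recallr's actual internals:

```python
# Minimal sketch of non-destructive memory curation: a new fact archives
# (never erases) any version it supersedes.
def curate(store: dict, key: str, new_value: str, archive: list) -> None:
    """Record a new fact; archive any superseded version."""
    if key in store and store[key]["value"] != new_value:
        old = store[key]
        old["status"] = "archived"   # contradiction flagged, not overwritten
        archive.append(old)          # full version history preserved
    store[key] = {"value": new_value, "status": "current"}

store, archive = {}, []
curate(store, "location", "Bengaluru", archive)
curate(store, "location", "Delhi", archive)

print(store["location"])  # → {'value': 'Delhi', 'status': 'current'}
print(archive)            # → [{'value': 'Bengaluru', 'status': 'archived'}]
```

Because curation runs after the interaction, none of this bookkeeping sits on the response path.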
SYNC · REAL-TIME

Synchronous Context Retrieval

The right memory, at the right depth, before the LLM responds.

FAST → DEEP

<300ms · Low-Latency
~1.2s · Balanced
5–8s · Agentic

Auto-Recall routes every query to the right strategy.

Retrieval depth adapts to query complexity — automatically.

Graph traversal surfaces connected memories, not just direct matches.

Session summaries provide episodic context when facts aren’t enough.

EXAMPLE

You: "What restaurants did I like when I lived in Bengaluru?"

→ Location history traversed.

  Past preferences recalled across 3 sessions.

  Response grounded in memory — not hallucination.

adaptive speed · graph-aware · auto-routing
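As a rough illustration of how complexity-based routing might work: the strategy names below match the page, but the cue heuristics and the `route` function are invented for this sketch (Recallr's actual router is not public):

```python
# Illustrative sketch of complexity-based strategy routing.
from enum import Enum

class RecallStrategy(Enum):
    LOW_LATENCY = "low_latency"  # <300ms: direct fact lookup
    BALANCED = "balanced"        # ~1.2s: lookup plus graph neighborhood
    AGENTIC = "agentic"          # 5-8s: multi-hop reasoning over the graph

MULTI_HOP_CUES = ("when i lived", "across", "history", "compare", "evolve")
TEMPORAL_CUES = ("last summer", "last year", "ago", "since", "used to")

def route(query: str) -> RecallStrategy:
    """Pick a recall strategy from rough query complexity."""
    q = query.lower()
    if any(cue in q for cue in MULTI_HOP_CUES):
        return RecallStrategy.AGENTIC      # needs graph traversal
    if any(cue in q for cue in TEMPORAL_CUES):
        return RecallStrategy.BALANCED     # needs temporal filtering
    return RecallStrategy.LOW_LATENCY      # simple fact lookup

print(route("Where does the user live?").value)  # → low_latency
print(route("What restaurants did I like when I lived in Bengaluru?").value)
# → agentic
```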
[ SEE IT IN ACTION ]

Memory that works like you think.

recallr — memory_graph.py
# Initialize Recallr memory agent
from recallr import MemoryGraph, RecallStrategy

agent = MemoryGraph(user_id="maya_01")

# Session 1 — March 2024
agent.ingest(conversation=[
  {"role": "user", "content": "I live in Bengaluru"},
])
# → Memory stored: "user lives in Bengaluru" [v1]

# Session 2 — August 2024
agent.ingest(conversation=[
  {"role": "user", "content": "I moved to Delhi last week"},
])
# → Version chain updated: v1 → v2 [TEMPORAL_CONFLICT resolved]
# "user now lives in Delhi" [CURRENT]

# Query — any time
context = agent.recall(
  query="Where does the user live?",
  strategy=RecallStrategy.AUTO
)
# → Retrieved: "user lives in Delhi" [98.9% accuracy]
[memory graph — live: "Location: Bengaluru" stored as v1]
[ INTEGRATION ]

Two lines of code.
Permanent memory.

A dead-simple, drop-in replacement for your existing agents. Route your OpenAI, Anthropic, or Gemini calls through Recallr — no SDK changes, no architecture rewrites, no new abstractions.

  from openai import OpenAI

  client = OpenAI(
-     base_url="https://api.openai.com/v1",
+     base_url="https://api.recallrai.com/api/v1/forward/https://api.openai.com/v1",  # ← this is all that matters
      api_key="sk-...",
+     default_headers={
+         "X-Recallr-API-Key": "rai-...",
+         "X-Recallr-Project-Id": "your-project-id",
+         "X-Recallr-Allow-New-User-Creation": "true",
+         "X-Recallr-Session-Timeout-Seconds": "600",
+     },
  )

  # Memory is automatically injected on every call
  response = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[{"role": "user", "content": "My name is Mayank"}],
+     extra_headers={
+         "X-Recallr-User-Id": "user-123",
+         "X-Recallr-Recall-Strategy": "low_latency",
+     },
  )

  print(response.choices[0].message.content)

+9 additions · −3 deletions

Zero refactoring

Your existing code works unchanged.

Any model

OpenAI, Anthropic, Gemini. One proxy.

Instant memory

Every user gets a persistent profile.

[ USE CASES ]

Memory for every long-running agent.

Clinical Context Persistence

Patient preferences, medication history, appointment tracking across months of interactions — with conflict detection when contradictory symptoms emerge.

healthcare · longitudinal

Adaptive Learning Memory

Track concepts learned, misconceptions corrected, preferred explanation styles — versioned to show learning progression over time.

education · personalization

Codebase & Preference Recall

Remember architectural decisions, preferred patterns, project context across sessions. Never explain your stack twice.

developer tools · productivity

CRM-Grade Memory

Customer preferences, past issues, escalation history — retrieved in <300ms even at millions-of-users scale.

enterprise · CRM
[ COST CALCULATOR ]

See exactly what you save.

We give $5 to free accounts every month.

Session
  Chat exchanges / session: 10 (user + assistant pairs per session)
  Tokens / exchange: 400
  Sessions / day: 5
  Simulation window: 60 days
  The naive approach re-sends ALL past sessions — this drives quadratic growth.

Recallr Pipeline
  Memories extracted / session: 5 (new records written to the graph DB — drives stage 2 cost)

LLM Pricing
  Input cost: $3 / M tokens
  Output cost: $15 / M tokens

Results
  Breakeven: Day 22
  Savings: $342.00 (63.1% cheaper)
  Naive total: $541.80
  Recallr total: $199.80

[Cumulative LLM cost chart, Day 1–60: naive (full history) grows to $541.80, Recallr to $199.80 — breakeven at Day 22]

Naive approach: every session re-sends the entire raw conversation history as context, so cost grows quadratically with the number of sessions. Recallr: fixed pipeline overhead per session regardless of history size, so cost grows linearly.
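The calculator's arithmetic can be reproduced directly. This sketch assumes only input tokens are billed (at $3/M) and infers Recallr's fixed per-session overhead from the stated $199.80 total rather than modeling pipeline stages:

```python
# Reproducing the cost calculator's quadratic-vs-linear comparison.
SESSIONS = 5 * 60                  # 5 sessions/day for 60 days
TOKENS_PER_SESSION = 10 * 400      # 10 exchanges x 400 tokens
INPUT_PRICE = 3 / 1_000_000        # dollars per input token
RECALLR_PER_SESSION = 199.80 / SESSIONS  # inferred fixed overhead

# Naive: session k re-sends all k-1 past sessions plus the new one,
# so the running total grows quadratically.
naive = sum(k * TOKENS_PER_SESSION * INPUT_PRICE for k in range(1, SESSIONS + 1))
# Recallr: constant cost per session, so the running total grows linearly.
recallr = RECALLR_PER_SESSION * SESSIONS

# Breakeven: first session where the naive running total catches up.
cum_naive = cum_recallr = 0.0
breakeven_session = None
for k in range(1, SESSIONS + 1):
    cum_naive += k * TOKENS_PER_SESSION * INPUT_PRICE
    cum_recallr += RECALLR_PER_SESSION
    if breakeven_session is None and cum_naive >= cum_recallr - 1e-6:
        breakeven_session = k
breakeven_day = -(-breakeven_session // 5)  # ceiling division: sessions → days

print(f"naive ${naive:.2f}, recallr ${recallr:.2f}, breakeven day {breakeven_day}")
# → naive $541.80, recallr $199.80, breakeven day 22
```

The totals and the Day 22 breakeven match the calculator's displayed values.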

[ FAQ ]

Common questions.

How is Recallr different from a RAG system?

RAG retrieves static document chunks. Recallr maintains a versioned knowledge graph that evolves over time, with conflict resolution and temporal provenance. It knows what changed, when, and why — RAG just knows what was stored.

What happens when conflicting information is detected?

Recallr classifies the conflict type (temporal update, correction, preference change, or contradiction) and applies the appropriate resolution strategy. For ambiguous conflicts, it can prompt user clarification rather than silently overwriting.
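The taxonomy above can be illustrated with a toy classifier. The cue lists and the `classify_conflict` function are invented for this sketch; Recallr's actual classifier is not public:

```python
# Illustrative sketch of conflict-type classification.
CORRECTION_CUES = ("actually", "i meant", "that's wrong", "correction")
TEMPORAL_CUES = ("moved", "changed", "no longer", "now")
PREFERENCE_CUES = ("prefer", "favorite", "like better")

def classify_conflict(new_statement: str) -> str:
    """Map a conflicting statement to one of the four conflict types."""
    s = new_statement.lower()
    if any(c in s for c in CORRECTION_CUES):
        return "correction"         # the old version was never true
    if any(c in s for c in TEMPORAL_CUES):
        return "temporal_update"    # the old version was true, now superseded
    if any(c in s for c in PREFERENCE_CUES):
        return "preference_change"  # tastes shifted; both versions were valid
    return "contradiction"          # ambiguous: may prompt user clarification

print(classify_conflict("I moved to Delhi last week"))    # → temporal_update
print(classify_conflict("Actually, it's spelled Mayank")) # → correction
```

Each type maps to a different resolution: corrections invalidate the old version, temporal updates archive it, and contradictions can trigger a clarifying question instead of a silent overwrite.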

Can Recallr work with any LLM?

Yes. The memory layer is model-agnostic. While the default stack uses Claude 3.5 Sonnet for memory generation, any LLM can query the graph via the recall API. Swap models without losing memory.

How does versioning work in practice?

Each memory entity maintains a linked-list version chain. New information creates a new version node with a "supersedes" edge to the previous version, along with dual timestamps (event time and ingestion time) for temporal provenance.
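A minimal sketch of that chain: each new version links to the one it supersedes and carries both timestamps. Field names here are assumptions for illustration, not Recallr's actual schema:

```python
# Linked-list version chain with dual timestamps for temporal provenance.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class MemoryVersion:
    value: str
    event_time: datetime       # when the fact became true in the world
    ingestion_time: datetime   # when the system learned about it
    supersedes: Optional["MemoryVersion"] = None  # edge to the prior version

v1 = MemoryVersion("lives in Bengaluru", datetime(2022, 6, 1), datetime(2024, 3, 2))
v2 = MemoryVersion("lives in Delhi", datetime(2024, 7, 25), datetime(2024, 8, 1),
                   supersedes=v1)

def history(v: Optional[MemoryVersion]) -> list:
    """Walk supersedes edges from the current version back to the first."""
    out = []
    while v is not None:
        out.append(v.value)
        v = v.supersedes
    return out

print(history(v2))  # → ['lives in Delhi', 'lives in Bengaluru']
```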

What's the latency overhead?

Zero for ingestion — curation runs asynchronously. For recall: <300ms (Low-Latency), 900-1500ms (Balanced), or 5-8s (Agentic). Choose the strategy that fits your use case.

Is the memory graph queryable historically?

Yes. You can query the graph at any point in time using temporal filters. Retrieve what the system knew at a specific date, trace the evolution of a fact, or audit the full version history of any entity.
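An "as-of" query over such a chain reduces to walking back through ingestion timestamps. The `Fact` type and `as_of` function are illustrative, not the actual Recallr query API:

```python
# Sketch of a point-in-time query: what did the system know at time t?
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Fact:
    value: str
    ingestion_time: datetime
    previous: Optional["Fact"] = None

head = Fact("lives in Delhi", datetime(2024, 8, 1),
            previous=Fact("lives in Bengaluru", datetime(2024, 3, 2)))

def as_of(v: Optional[Fact], t: datetime) -> Optional[str]:
    """Return the latest version that had already been ingested at time t."""
    while v is not None:
        if v.ingestion_time <= t:
            return v.value
        v = v.previous
    return None  # nothing was known yet

print(as_of(head, datetime(2024, 5, 1)))  # → lives in Bengaluru
print(as_of(head, datetime(2024, 9, 1)))  # → lives in Delhi
```

Filtering on event time instead of ingestion time would answer the complementary question: what was true at time t, rather than what was known.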

[ GET STARTED ]

Stop building agents
that forget.

Join the waitlist for early API access.

Early access to API docs · No spam