Two memory interfaces — flat KV and the layered paradigm honest about its shape.
v0.15 shipped a flat memory Backend interface — caller-keyed Get/Set, faithful round-trip, deterministic embedding, vector Search overlay, per-(provider, model) Stats — with Mem9 as the first external implementation. v0.16 ships MemoryLayer alongside it. Same Memory tool, two retrieval paradigms, one capability probe deciding which surface a given backend serves. The first shape isn't wrong. The second shape isn't new. What's new is the honest engineering: they're different paradigms, and the substrate now treats them as such instead of pretending one shape fits both.
This post is the story of why the flat interface alone wasn't enough, the product survey that forced the question, and how shipping a second capability instead of bending the first one preserved the bits of v0.15 that worked while opening the door to mem0 (and re-targeting Mem9 at the paradigm it was always actually serving).
What v0.15 shipped — and why we needed more
RFC I (memory ranking + pluggable backends) landed in v0.15 with three substantive additions to the Memory tool. A native hybrid ranker on memory.search — score = α·cosine + β·recency_decay + γ·source_weight + δ·log(1 + access_count), with the default α=1 (everything else 0) reproducing the prior pure-cosine behavior so nobody saw a regression. Search-time dedup as a post-rank filter — drop candidates within cosine distance threshold of an already-retained higher-ranked entry, with three modes (drop, merge, keep) per call. And the seam this post is mostly about: a pluggable memory.Backend interface with the in-process sqlite-vec + Postgres path refactored behind it as the default implementation, plus a MemoryBackendDef substrate primitive — the sixth content-addressed Def after AgentDef / SkillDef / MCPServerDef / ScheduleDef / WebhookDef — for operators to register external backends and bind specific AgentDefs to them.
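As a rough illustration of the ranker and the drop-mode dedup filter described above, here is a minimal Go sketch. It is not the shipped implementation: the `Candidate` fields, `DefaultHalfLife`, the function names, and the exponential half-life form of `recency_decay` are all assumptions made for the example. Only the score formula and the default weights come from the text.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// Weights holds the four coefficients of the hybrid score.
type Weights struct{ Alpha, Beta, Gamma, Delta float64 }

// DefaultWeights reproduces pure-cosine ranking: alpha=1, everything else 0.
var DefaultWeights = Weights{Alpha: 1}

// DefaultHalfLife is a hypothetical decay constant, not taken from the post.
var DefaultHalfLife = 24 * time.Hour

// Candidate is an illustrative search hit; the field names are assumptions.
type Candidate struct {
	Key          string
	Vec          []float64     // stored embedding, used for dedup
	Cosine       float64       // similarity to the query
	Age          time.Duration // time since the entry was written
	SourceWeight float64
	AccessCount  int
}

// recencyDecay is ASSUMED to be an exponential half-life curve.
func recencyDecay(age, halfLife time.Duration) float64 {
	return math.Exp2(-float64(age) / float64(halfLife))
}

// Score computes alpha*cosine + beta*recency_decay + gamma*source_weight
// + delta*log(1 + access_count).
func Score(c Candidate, w Weights, halfLife time.Duration) float64 {
	return w.Alpha*c.Cosine +
		w.Beta*recencyDecay(c.Age, halfLife) +
		w.Gamma*c.SourceWeight +
		w.Delta*math.Log(1+float64(c.AccessCount))
}

// cosineSim returns the cosine similarity of two equal-length vectors.
func cosineSim(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// RankAndDrop sorts by score (descending), then drops any candidate whose
// cosine distance to an already-retained, higher-ranked entry is below
// threshold. This models the "drop" mode only.
func RankAndDrop(cs []Candidate, w Weights, halfLife time.Duration, threshold float64) []Candidate {
	ranked := append([]Candidate(nil), cs...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return Score(ranked[i], w, halfLife) > Score(ranked[j], w, halfLife)
	})
	var kept []Candidate
	for _, c := range ranked {
		dup := false
		for _, k := range kept {
			if 1-cosineSim(c.Vec, k.Vec) < threshold {
				dup = true
				break
			}
		}
		if !dup {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	cs := []Candidate{
		{Key: "a", Vec: []float64{1, 0}, Cosine: 0.9},
		{Key: "b", Vec: []float64{0.99, 0.01}, Cosine: 0.8}, // near-duplicate of "a"
		{Key: "c", Vec: []float64{0, 1}, Cosine: 0.5},
	}
	for _, c := range RankAndDrop(cs, DefaultWeights, DefaultHalfLife, 0.05) {
		fmt.Println(c.Key) // prints "a" then "c"; "b" is dropped as a duplicate of "a"
	}
}
```

With the default weights the recency, source, and access terms are all multiplied by zero, so the ordering is exactly the cosine ordering, which is the no-regression property the default was chosen for.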
Mem9 was the first external backend on that seam. Open-source Apache-2.0 Go, REST API, X-API-Key auth — a clean fit for what the interface promised. And it shipped stub-tested, because the alpha v1alpha2 API was thin and we had to assume some wire-shape details ahead of contract verification. The implementation banner said so plainly: "verify against the real v1alpha2 API before production."
Then we did the verification. And the survey that became RFC K.
The product survey that became RFC K
The brief was small: is Mem9 actually the right horse, or is there a more mature external memory product we should lean on? Five candidates against the flat-KV contract, scored on maturity (stars, releases, recency), self-hosting story, embedding ownership, and fit against loomcycle's Backend:
| Product | Paradigm | Maturity | Self-host | Fit verdict |
|---|---|---|---|---|
| mem0 | memory-layer (LLM-extract) | 57.2k ★, daily commits, funded company | ✓ Apache-2.0 | poor as KV; strong as layer |
| Zep / Graphiti | memory-layer (temporal KG) | ~25k ★ ecosystem, self-host deprecated | ✗ cloud lock-in | disqualified for self-hostable runtime |
| Letta / MemGPT | hybrid agent framework | 23.1k ★, active | ✓ Apache-2.0 | not a memory store — needs an agent |
| OpenAI Conversations / Vector Stores | transcript + file-chunk | n/a (managed) | ✗ proprietary | no memory API; embedder hard-pinned |
| Mem9 (shipped) | memory-layer (LLM-extract) | 1.1k ★, zero tagged releases, alpha API | ✓ Apache-2.0 | poor as KV — its actual paradigm is layer |
Every single one was paradigm-mismatched against the flat-KV interface. Not for maturity reasons — for shape reasons. Four of the five (mem0, Zep, Mem9, the only real OpenAI memory feature) are LLM-extract memory layers: you call add(messages) with conversation turns, the server runs an LLM to extract durable facts (ADD/UPDATE/DELETE/NOOP), and you query later via search(natural_language_query). The fifth (Letta) is agent-state, not a standalone memory store at all.
And the live Mem9 v1alpha2 docs refuted the stub-tested wire shape point by point: write takes content or messages (not a caller key), identity is a server-assigned UUID (no caller-chosen address), ingest returns 202 Accepted (not a synchronous deterministic write), "smart" mode lets the LLM mutate or delete the input (not a faithful round-trip), and there's no Stats endpoint. The shipped backend had been writing against an interface the live API doesn't expose.
The instinct was right: Mem9 was the wrong horse for what we'd built. The obvious fix, swapping in a different product, wasn't.
Why "just swap Mem9 for mem0" was the wrong move
The temptation when a backend doesn't fit is to swap implementations. The harder question is whether the interface fits. Four axes told the answer:
| Axis | Flat Backend | Memory-layer products |
|---|---|---|
| Identity | caller-chosen key | server-assigned UUID — there is no Get(key) |
| Fidelity | what you Set is what you Get | LLM rewrites / merges / may delete input in smart mode — round-trip is intentionally not faithful, that's the product's value |
| Synchrony | Set returns {Embedded, EmbedWarning} synchronously | add returns 202/PENDING; extraction completes later; read-after-write not guaranteed |
| Embedding accounting | provider-agnostic Embedder substrate + per-(provider, model) Stats | embeds with the server's own model; no per-model telemetry; silently bypasses the v0.9.0 reembed / dimension-mismatch machinery |
You cannot make an LLM-extract product satisfy Backend.Get/Set faithfully. Every adaptation we sketched forced the same four compromises. A key→UUID side-index to fake addressability. An infer=false "raw" mode — which disables the only reason to pay for a memory layer in the first place. Async-Set lying about Embedded because there's no synchronous answer to give. An unimplemented Stats. That's the trap: using 20% of the product (its vector search) while fighting the other 80%.
Swapping mem0 in for Mem9 doesn't escape the trap. It just relocates it.
Two interfaces, one tool
The fix RFC K locked: keep the flat Backend interface frozen as the canonical contract; add MemoryLayer as an optional capability alongside it. Every backend implements at least one of the two and may implement both, and the tool probes capabilities once at wire-time to route ops correctly:
```go
// internal/memory/layer.go (NEW)
type MemoryLayer interface {
	// Add ingests conversation messages. The backend MAY run an LLM
	// to extract and reconcile durable facts. Returns a handle for
	// the (often async) ingestion; AddResult.Status reflects whether
	// extraction completed.
	Add(ctx, scope, scopeID, msgs []LayerMessage, opts AddOptions) (AddResult, error)

	// Recall runs a natural-language semantic search over extracted
	// facts. Distinct from Backend.Search: results are derived facts
	// with server-assigned IDs, not caller-keyed entries.
	Recall(ctx, scope, scopeID, q RecallQuery) (RecallResult, error)
}

type Capabilities struct {
	KV           bool // satisfies the flat Backend contract faithfully
	VectorSearch bool // Backend.Search supported
	Stats        bool // Backend.Stats returns real per-(provider, model) rows
	MemoryLayer  bool // implements MemoryLayer (add / recall)
}
```
The Memory tool gains two ops mirroring the interface, memory.add and memory.recall, both routed through MemoryLayer. The existing six KV ops (get, set, delete, list, search, bulkInsert) keep routing through Backend. A backend that lacks MemoryLayer doesn't get memory.add as a degraded behavior; it gets ErrCapabilityUnsupported, mirroring the existing vector_unsupported refusal. Fail closed; never silently no-op.
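The fail-closed routing can be sketched roughly as follows. This is a simplified stand-in, not the real tool: the `Tool` type, `NewTool`, the trimmed `Add` signature (string handle, no options), and the two toy backends are hypothetical. Only `ErrCapabilityUnsupported`, the `Capabilities` fields, and the probe-once-then-route behavior come from the text.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ErrCapabilityUnsupported is returned when an op needs a capability
// the bound backend does not advertise.
var ErrCapabilityUnsupported = errors.New("memory: capability unsupported")

type Capabilities struct {
	KV, VectorSearch, Stats, MemoryLayer bool
}

type LayerMessage struct{ Role, Content string }

// MemoryLayer is trimmed to one method with a simplified signature
// (an assumption for the sketch; the real interface carries more).
type MemoryLayer interface {
	Add(ctx context.Context, scopeID string, msgs []LayerMessage) (handle string, err error)
}

// Tool keeps the result of the one-time capability probe.
type Tool struct {
	layer MemoryLayer // nil when the backend does not serve MemoryLayer
}

// NewTool probes once at wire-time: the backend must both implement the
// interface and advertise the capability.
func NewTool(backend any, caps Capabilities) *Tool {
	t := &Tool{}
	if l, ok := backend.(MemoryLayer); ok && caps.MemoryLayer {
		t.layer = l
	}
	return t
}

// Add routes memory.add. There is no degraded path: without the
// capability the call fails closed.
func (t *Tool) Add(ctx context.Context, scopeID string, msgs []LayerMessage) (string, error) {
	if t.layer == nil {
		return "", fmt.Errorf("memory.add: %w", ErrCapabilityUnsupported)
	}
	return t.layer.Add(ctx, scopeID, msgs)
}

// kvOnly models a Backend-only implementation such as the in-process store.
type kvOnly struct{}

// layerOnly models a pure memory-layer backend.
type layerOnly struct{}

func (layerOnly) Add(ctx context.Context, scopeID string, msgs []LayerMessage) (string, error) {
	return "pending-1", nil
}

func main() {
	ctx := context.Background()
	msgs := []LayerMessage{{Role: "user", Content: "I prefer dark mode."}}

	_, err := NewTool(kvOnly{}, Capabilities{KV: true, VectorSearch: true, Stats: true}).Add(ctx, "run-1", msgs)
	fmt.Println(errors.Is(err, ErrCapabilityUnsupported)) // prints "true"

	handle, err := NewTool(layerOnly{}, Capabilities{MemoryLayer: true}).Add(ctx, "run-1", msgs)
	fmt.Println(handle, err) // prints "pending-1 <nil>"
}
```

Wrapping the sentinel with `%w` keeps the op name in the message while letting callers match it with `errors.Is`.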
The in-process sqlite-vec backend implements Backend only. It is the canonical KV+vector implementation, and nothing else changes about it. A pure memory-layer backend (mem0, Mem9 smart-mode) implements MemoryLayer only and declines the Backend surface by advertising KV=false in its capabilities. We stopped forcing LLM-extract products into a KV mold; we exposed their real shape.
Mem9, re-targeted at its actual paradigm
The Mem9 backend that shipped in v0.15 stub-tested against a wire shape the live API doesn't expose. The honest move in v0.16: demote it to PREVIEW (LOOMCYCLE_MEM9_PREVIEW=1 gate), keep the verified parts (X-API-Key auth, body size cap, the tenancy-prefix isolation that prevents shared-key cross-tenant leaks), and re-implement against MemoryLayer — which is Mem9's natural paradigm. The smart-mode write becomes Add(messages); the natural-language search becomes Recall(query). The flat-KV claim is retired; the capability probe reports {KV: false, MemoryLayer: true}.
One isolation invariant matters here and got specific tests: a shared-key Mem9 deployment serving multiple loomcycle tenants must never let tenant A's memory.add reach tenant B's memory.recall. The Mem9 session_id field gets tenant-prefixed on every call — the same prefix isolation the KV path uses. A PR #302 hardening pass closed the original cross-tenant leak in the KV variant before that path was retired; the MemoryLayer path inherits the fix.
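A minimal sketch of that invariant, assuming a `tenant/session` layout for the prefixed value. The layout, the helper name `scopedSessionID`, and the in-memory `sharedStore` standing in for a shared-key Mem9 deployment are all hypothetical; the point is only that the prefix is applied on both the add and the recall path.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// scopedSessionID builds the tenant-prefixed value sent as session_id.
// The "tenant/session" layout is an assumption for illustration.
func scopedSessionID(tenantID, sessionID string) (string, error) {
	if tenantID == "" || strings.Contains(tenantID, "/") {
		// An empty or separator-bearing tenant ID could collide with
		// another tenant's prefix, so refuse rather than guess.
		return "", errors.New("invalid tenant id")
	}
	return tenantID + "/" + sessionID, nil
}

// sharedStore stands in for one Mem9 deployment behind one API key.
type sharedStore struct{ bySession map[string][]string }

// tenantClient is the only path to the store; it prefixes on every call.
type tenantClient struct {
	tenantID string
	store    *sharedStore
}

// Add writes a fact under the tenant-scoped session.
func (c tenantClient) Add(sessionID, fact string) error {
	sid, err := scopedSessionID(c.tenantID, sessionID)
	if err != nil {
		return err
	}
	c.store.bySession[sid] = append(c.store.bySession[sid], fact)
	return nil
}

// Recall reads only from the tenant-scoped session.
func (c tenantClient) Recall(sessionID string) ([]string, error) {
	sid, err := scopedSessionID(c.tenantID, sessionID)
	if err != nil {
		return nil, err
	}
	return c.store.bySession[sid], nil
}

func main() {
	store := &sharedStore{bySession: map[string][]string{}}
	a := tenantClient{tenantID: "tenant-a", store: store}
	b := tenantClient{tenantID: "tenant-b", store: store}

	if err := a.Add("s1", "prefers dark mode"); err != nil {
		fmt.Println("add failed:", err)
		return
	}
	leaked, _ := b.Recall("s1")
	fmt.Println(len(leaked)) // prints "0": same session name, different tenant
	own, _ := a.Recall("s1")
	fmt.Println(len(own)) // prints "1"
}
```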
No production user is affected by the demotion. Mem9 had zero tagged releases when v0.15 shipped; it has zero tagged releases now. The shipped backend never worked against a live server. The PREVIEW gate is the honest framing.
What v0.16 added alongside the second interface
Three smaller things worth naming. The PR #302 cross-tenant-leak hardening on the KV variant closed a real isolation gap before retirement: a shared_key_with_prefix tenancy strategy must apply the prefix on every read, write, list, and delete, not just the writes, to keep tenant A's keys out of tenant B's listings. Tests pin the closure at every op. The same every-op pattern was applied to the SSRF guards on the optional fallback-on-error path and to the request-body size enforcement on every method.
PR #303 added runtime suites covering Schedules + Webhooks + Memory dedup end-to-end, including a 30-min soak. Memory-dedup specifically: 10K writes, top-k recall, dedup-drop counts and embedding-call counts asserted at the substrate level. No drift in dedup behavior under 30 continuous minutes of writes.
PR #304 fixed a webhook trust-boundary subtlety from QA: a signature-valid replay (a duplicate delivery_id within the dedup TTL) had been returning 401. The 401 misled legitimate senders into rotating their secret on an issue that's actually idempotency, not auth. The right contract — what GitHub and Stripe expect — is 200 deduped with the original run_id when recoverable. A new verdict=accepted_replay distinguishes a fresh accept from a deduped re-send for triage. Caught by the whole-feature QA report; the response semantics matter as much as the security.
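The corrected contract can be sketched roughly like this. The `Receiver` type, the `Result` shape, `DedupTTL`, and the `accepted` / `rejected` verdict names are hypothetical, as is returning 200 for a fresh accept; only the `accepted_replay` verdict, the 401 for a bad signature, and the 200-with-original-run_id replay behavior come from the text.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type Verdict string

const (
	VerdictAccepted       Verdict = "accepted"        // hypothetical name
	VerdictAcceptedReplay Verdict = "accepted_replay" // named in the post
	VerdictRejected       Verdict = "rejected"        // hypothetical name
)

// DedupTTL is an illustrative value, not the shipped default.
var DedupTTL = 10 * time.Minute

type Result struct {
	Status  int
	Verdict Verdict
	RunID   string
	Deduped bool
}

type delivery struct {
	runID string
	seen  time.Time
}

// Receiver remembers delivery IDs for ttl so replays map to the original run.
type Receiver struct {
	mu   sync.Mutex
	ttl  time.Duration
	now  func() time.Time
	seen map[string]delivery
}

func NewReceiver(ttl time.Duration) *Receiver {
	return &Receiver{ttl: ttl, now: time.Now, seen: map[string]delivery{}}
}

// Handle separates auth failures (401) from idempotent replays (200 deduped).
// startRun is called under the lock for simplicity; a real receiver would
// likely avoid holding a lock across run creation.
func (r *Receiver) Handle(deliveryID string, sigValid bool, startRun func() string) Result {
	if !sigValid {
		return Result{Status: http.StatusUnauthorized, Verdict: VerdictRejected}
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	now := r.now()
	if d, ok := r.seen[deliveryID]; ok && now.Sub(d.seen) < r.ttl {
		// Signature-valid duplicate: an idempotency event, not an auth failure.
		return Result{Status: http.StatusOK, Verdict: VerdictAcceptedReplay, RunID: d.runID, Deduped: true}
	}
	runID := startRun()
	r.seen[deliveryID] = delivery{runID: runID, seen: now}
	return Result{Status: http.StatusOK, Verdict: VerdictAccepted, RunID: runID}
}

func main() {
	r := NewReceiver(DedupTTL)
	n := 0
	start := func() string { n++; return fmt.Sprintf("run-%d", n) }

	first := r.Handle("delivery-1", true, start)
	replay := r.Handle("delivery-1", true, start)
	forged := r.Handle("delivery-2", false, start)

	fmt.Println(first.Status, first.Verdict, first.RunID)    // 200 accepted run-1
	fmt.Println(replay.Status, replay.Verdict, replay.RunID) // 200 accepted_replay run-1
	fmt.Println(forged.Status, forged.Verdict)               // 401 rejected
}
```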
The architectural insight is small and important: interfaces lie when they advertise more than they support. v0.15's flat Backend was honest about its own shape, but it implied — by being the only shape — that every external backend ought to fit it. The mature memory products don't, and forcing them in disrespects what makes them mature in the first place (the LLM-extract paradigm IS the product). The fix wasn't picking a better-fitting product; it was admitting the interface had been answering only half the question. Add the second half; let backends declare what they actually serve; fail closed when they don't.
What's next for memory
Three concrete follow-ups. First, the mem0 backend itself. RFC K's ranked recommendation is that if we adopt one external memory-layer product, it's mem0 (57k stars, daily commits, funded company, published LoCoMo and LongMemEval numbers, clean Apache-2.0 self-host, async add with event-id polling that maps cleanly onto AddResult.Pending). Implementation is straightforward now that the MemoryLayer seam exists: an Authorization: Token <key> header (not Bearer), the async ingest contract, the v1/memories/search recall surface. Two-week scope post-v0.16.
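To make the header detail concrete, here is a small sketch of building a recall request. It only constructs the request and never sends it. The `Token` scheme and the `v1/memories/search` path come from the text above; the POST method, the `{"query": ...}` body shape, the placeholder base URL, and the function name `NewRecallRequest` are assumptions to be checked against the mem0 API reference before use.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

// NewRecallRequest builds (but does not send) a search request.
// ASSUMPTIONS: POST method and a JSON body with a single "query" field.
func NewRecallRequest(baseURL, apiKey, query string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return nil, err
	}
	url := strings.TrimRight(baseURL, "/") + "/v1/memories/search"
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	// "Token", not "Bearer": the scheme called out in the post.
	req.Header.Set("Authorization", "Token "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// The host below is a placeholder, not a real endpoint.
	req, err := NewRecallRequest("https://mem0.example.invalid/", "demo-key", "what does the user prefer?")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.Path)         // prints "POST /v1/memories/search"
	fmt.Println(req.Header.Get("Authorization")) // prints "Token demo-key"
}
```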
Second, the memory-eval harness — the one I called load-bearing in RFC I and haven't shipped yet. Without measured precision / recall / dedup-rate / evidence-quality numbers against a held-out dataset, every ranker tuning PR is dice-rolling. Building this is the gate to confidently saying "v0.16 retrieval is better than v0.15" with data, not vibes.
Third, the /ui/memory introspection tab — operator-readable table with filters, the "show recalls for run X" overlay joined against OTEL traces. The current "what's in memory and what got recalled" answer is SQL or curl; this is the right next operator-surface to ship.
Companion reading: the RFC I writeup that introduced the flat Backend seam (MEMORY-BACKENDS.md), the bundled Context.help memory-layer topic on any post-v0.16 build for the add/recall wire shapes, and Three MCP tokens in one run for the per-run credentials channel external backends authenticate through.