Diego Rodrigues de Sa e Souza 91b6983564
Release v3.8.1 (#2441)
Release v3.8.1 — feature flags settings page, bracketed combo names, security hardening, multi-driver SQLite
2026-05-21 01:29:12 -03:00


title: Memory System
version: 3.8.1
lastUpdated: 2026-05-13

Memory System

Source of truth: src/lib/memory/ and src/app/api/memory/
Last updated: 2026-05-13 (v3.8.0)

OmniRoute provides persistent conversational memory keyed by API key (and optionally session id). Memories are extracted automatically from LLM responses via lightweight regex pattern matching and injected back into subsequent requests as a leading system message (or first user message for providers that reject the system role).

Memory is scoped per API key, not per user — every request authenticated with the same API key shares the same memory pool, with optional further scoping by sessionId.

Architecture

Client → /v1/chat/completions (apiKeyInfo resolved upstream)
  → handleChatCore() [open-sse/handlers/chatCore.ts]
    → resolveMemoryOwnerId(apiKeyInfo)        # extracts id
    → getMemorySettings()                     # cached settings
    → shouldInjectMemory(body, {enabled})     # gate
    → retrieveMemories(apiKeyId, config)      # SQL + optional FTS5
    → injectMemory(body, memories, provider)  # system or user message
  → upstream provider call
  → on response: extractFacts(text, apiKeyId, sessionId)  # non-blocking
    → setImmediate → createMemory(fact) per match

The injection and extraction call-sites are wired in open-sse/handlers/chatCore.ts (look for retrieveMemories, injectMemory, and extractFacts).

Storage Layers

Primary: SQLite (memories table)

Created by migration 015_create_memories.sql:

| Column | Type | Notes |
| --- | --- | --- |
| id | TEXT PRIMARY KEY | UUID generated via crypto.randomUUID() |
| api_key_id | TEXT NOT NULL | Owning API key |
| session_id | TEXT | Optional per-conversation scope |
| type | TEXT NOT NULL | One of factual, episodic, procedural, semantic |
| key | TEXT | Stable upsert key, e.g. preference:i_prefer_python |
| content | TEXT NOT NULL | The actual fact text |
| metadata | TEXT | JSON blob (category, extractedAt, source, ...) |
| created_at / updated_at | TEXT | ISO 8601 strings |
| expires_at | TEXT | Optional expiry; NULL means permanent |
| memory_id | INTEGER UNIQUE | Added by 023_fix_memory_fts_uuid.sql to bridge UUIDs ↔ FTS5 rowids |

Indexes: api_key_id, session_id, type, expires_at, plus the unique memory_id index.

Upsert semantics: createMemory() looks for an existing row with the same (api_key_id, key) and updates it in place when found (merging metadata via shallow spread). This keeps the table from growing unbounded for repeated preference statements.
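The upsert rule can be modelled in a few lines. The sketch below is an in-memory stand-in, not the SQLite code in store.ts: upsertMemory, MemoryRow, and the choice to overwrite content on a hit are assumptions; only the (api_key_id, key) match and the shallow metadata spread come from the description above.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical in-memory model of the (api_key_id, key) upsert.
// The real createMemory() runs against SQLite; this mirrors only the semantics.
interface MemoryRow {
  id: string;
  apiKeyId: string;
  key: string;
  content: string;
  metadata: Record<string, unknown>;
  updatedAt: string;
}

const rows = new Map<string, MemoryRow>(); // id -> row

function upsertMemory(
  apiKeyId: string,
  key: string,
  content: string,
  metadata: Record<string, unknown> = {},
): MemoryRow {
  // Reuse an existing row owned by the same API key with the same stable key.
  for (const row of rows.values()) {
    if (row.apiKeyId === apiKeyId && row.key === key) {
      row.content = content; // assumption: the new text replaces the old
      row.metadata = { ...row.metadata, ...metadata }; // shallow merge, new keys win
      row.updatedAt = new Date().toISOString();
      return row;
    }
  }
  const row: MemoryRow = {
    id: randomUUID(),
    apiKeyId,
    key,
    content,
    metadata,
    updatedAt: new Date().toISOString(),
  };
  rows.set(row.id, row);
  return row;
}
```

Repeating the same preference under the same key therefore touches one row instead of appending another, while a different API key gets its own row.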

Full-text Search (memory_fts virtual table)

022_add_memory_fts5.sql creates an FTS5 virtual table over content and key. 023_fix_memory_fts_uuid.sql fixes a real-world bug where the UUID primary key did not join to FTS5's integer rowid — the migration adds the memory_id column, recreates the FTS table, and wires triggers (memory_fts_ai, memory_fts_ad, memory_fts_au) that keep FTS in sync on INSERT, DELETE, and UPDATE.

The FTS table is used by retrieval.ts for the semantic and hybrid strategies (see below). The retrieval code guards with hasTable("memory_fts") and falls back to chronological order if the FTS table is missing or the FTS query throws.
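A minimal sketch of that guard, assuming a hypothetical MemoryDb interface whose hasTable, ftsSearch, and recentRows methods stand in for the real SQLite queries (the actual helpers in retrieval.ts may be shaped differently). It also folds in the zero-row fallback that the semantic strategy applies.

```typescript
interface Row {
  id: string;
  createdAt: string;
  content: string;
}

// Hypothetical stand-in for the SQLite access used by retrieval.ts.
interface MemoryDb {
  hasTable(name: string): boolean;
  ftsSearch(query: string): Row[]; // JOIN memory_fts ... MATCH ?; may throw
  recentRows(limit: number): Row[]; // ORDER BY created_at DESC LIMIT ?
}

function searchWithFallback(db: MemoryDb, query: string | undefined): Row[] {
  if (query && db.hasTable("memory_fts")) {
    try {
      const hits = db.ftsSearch(query);
      if (hits.length > 0) return hits; // FTS produced results: use them
    } catch {
      // Malformed MATCH syntax or a broken FTS index: fall through.
    }
  }
  return db.recentRows(100); // chronological fallback
}
```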

Optional: Qdrant (vector store)

src/lib/memory/qdrant.ts implements an optional Qdrant integration for true semantic memory:

  • upsertSemanticMemoryPoint() — embed key + content with the configured embedding model, ensure the collection exists (creates cosine-distance vectors on first use), and upsert a point with payload {memoryId, apiKeyId, sessionId, key, content, metadata, createdAtUnix, expiresAtUnix}.
  • searchSemanticMemory(query, topK, scope) — embed the query, search the collection filtered by kind = "omniroute_memory" and optionally by apiKeyId / sessionId. Caps topK to [1, 20].
  • deleteSemanticMemoryPoint(id) — single point delete.
  • cleanupSemanticMemoryPoints({retentionDays}) — bulk delete points whose expiresAtUnix is in the past or whose createdAtUnix is older than the retention cutoff. Counts first so the dashboard can show actual numbers.
  • checkQdrantHealth(): GET /readyz health probe that also reports latency.
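The search scope and the cleanup rule translate naturally into Qdrant payload filters. The sketch below builds them in Qdrant's JSON filter syntax (must and should clauses with match and range conditions). The helper names are hypothetical, and because this page does not show the filters qdrant.ts actually sends, treat the exact field layout as an assumption drawn from the payload listed above.

```typescript
type Condition =
  | { key: string; match: { value: string } }
  | { key: string; range: { lt: number } };

interface Filter {
  must?: Condition[];
  should?: Condition[];
}

// Clamp to [1, 20]; treating non-finite input as 1 is an assumption.
function clampTopK(topK: number): number {
  if (!Number.isFinite(topK)) return 1;
  return Math.min(20, Math.max(1, Math.floor(topK)));
}

// Search: always restrict to OmniRoute memory points, then narrow by scope.
function buildSearchFilter(scope: { apiKeyId?: string; sessionId?: string }): Filter {
  const must: Condition[] = [{ key: "kind", match: { value: "omniroute_memory" } }];
  if (scope.apiKeyId) must.push({ key: "apiKeyId", match: { value: scope.apiKeyId } });
  if (scope.sessionId) must.push({ key: "sessionId", match: { value: scope.sessionId } });
  return { must };
}

// Cleanup: a point matches if it has expired OR is older than the retention cutoff.
function buildCleanupFilter(nowUnix: number, retentionDays: number): Filter {
  const cutoff = nowUnix - retentionDays * 86_400;
  return {
    should: [
      { key: "expiresAtUnix", range: { lt: nowUnix } },
      { key: "createdAtUnix", range: { lt: cutoff } },
    ],
  };
}
```

Using should for cleanup means either condition alone is enough to select a point, which matches the "expired or past retention" wording.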

TODO: The chat pipeline (chatCore.ts) and the in-tree retrieveMemories() implementation do not currently call upsertSemanticMemoryPoint or searchSemanticMemory. The Qdrant integration is feature-flagged via qdrantEnabled in settings, but at the time of writing the searchSemanticMemory results are not fused into retrieval — the semantic/hybrid retrieval strategies use SQLite FTS5 only. The settings UI in dashboard/settings → MemorySkillsTab exposes Qdrant config, health, search test, and cleanup, but the corresponding /api/settings/qdrant, /api/settings/qdrant/health, /api/settings/qdrant/search, and /api/settings/qdrant/cleanup routes are referenced from the UI but not present under src/app/api/settings/qdrant/ (only embedding-models/ is wired). Treat Qdrant as preview/optional plumbing.

Memory Types

MemoryType (src/lib/memory/types.ts):

| Type | Used for |
| --- | --- |
| factual | Preferences, stable user facts, behavioral patterns |
| episodic | Decisions tied to a specific moment ("I chose Postgres") |
| procedural | Workflow / how-to memory (reserved; no auto-extractor today) |
| semantic | Reserved for vector-store entries |

MemoryConfig retrieval strategy is one of exact, semantic, or hybrid, and scope is one of session, apiKey, or global. The default scope from getMemorySettings() is apiKey.

Fact Extraction (extraction.ts)

Extraction is regex-based, not LLM-based — it runs in-process with setImmediate() so it never blocks the response stream:

  • Preference patterns → MemoryType.FACTUAL (e.g. I prefer …, I really like …, my favorite is …, I hate …)
  • Decision patterns → MemoryType.EPISODIC (e.g. I'll use …, I chose …, I went with …, I'm going to adopt …)
  • Pattern patterns → MemoryType.FACTUAL (e.g. I usually …, I always …, I tend to …)

Each match is sanitised (trim, whitespace-collapse, capped at 500 chars), deduplicated within the batch via a stable factKey(category, content), and stored via createMemory() with metadata {category, extractedAt, source: "llm_response"}. Input text is capped at 64 KiB (MAX_EXTRACTION_TEXT_LENGTH) — when longer, the tail of the text is used so the most recent assistant content always participates.

extractFactsFromText(text) is exported for tests and returns the structured facts without storing them.
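To make the extraction flow concrete, here is a sketch of the tail-cap, match, sanitise, and dedupe steps. The three regexes are illustrative assumptions (the real pattern lists in extraction.ts are longer), extractFactsFromTextSketch is a hypothetical name chosen so it is not mistaken for the exported function, and the factKey body (category plus lower-cased content) is a guess at what a "stable" key looks like.

```typescript
type Category = "preference" | "decision" | "pattern";
type FactType = "factual" | "episodic";

interface ExtractedFact {
  category: Category;
  type: FactType;
  content: string;
}

const MAX_EXTRACTION_TEXT_LENGTH = 64 * 1024;
const MAX_FACT_LENGTH = 500;

// Illustrative patterns only; each stops at the end of the sentence.
const PATTERNS: Array<[Category, FactType, RegExp]> = [
  ["preference", "factual", /\bI (?:really )?(?:prefer|like|hate) [^.!?\n]+/gi],
  ["decision", "episodic", /\bI(?:'ll use| chose| went with) [^.!?\n]+/gi],
  ["pattern", "factual", /\bI (?:usually|always|tend to) [^.!?\n]+/gi],
];

function sanitize(s: string): string {
  return s.trim().replace(/\s+/g, " ").slice(0, MAX_FACT_LENGTH);
}

// Hypothetical stable key used for in-batch deduplication.
function factKey(category: Category, content: string): string {
  return `${category}:${content.toLowerCase()}`;
}

function extractFactsFromTextSketch(text: string): ExtractedFact[] {
  // Keep the tail so the most recent assistant content always participates.
  const input =
    text.length > MAX_EXTRACTION_TEXT_LENGTH ? text.slice(-MAX_EXTRACTION_TEXT_LENGTH) : text;
  const seen = new Set<string>();
  const facts: ExtractedFact[] = [];
  for (const [category, type, re] of PATTERNS) {
    for (const m of input.matchAll(re)) {
      const content = sanitize(m[0]);
      const k = factKey(category, content);
      if (content.length > 0 && !seen.has(k)) {
        seen.add(k);
        facts.push({ category, type, content });
      }
    }
  }
  return facts;
}
```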

Retrieval (retrieval.ts)

retrieveMemories(apiKeyId, config) is the main entry point. It:

  1. Normalises and validates the config through MemoryConfigSchema.
  2. Returns [] immediately when enabled is false or maxTokens <= 0.
  3. Clamps maxTokens to [1, 8000].
  4. Detects whether the modern memories table exists (vs the legacy memory table) so older databases keep working.
  5. Builds the base query with expiry guard (expires_at IS NULL OR datetime(expires_at) > datetime('now')), optional session scope, and optional retentionDays cutoff.
  6. Branches on strategy:
    • exact (default): chronological ORDER BY created_at DESC LIMIT 100.
    • semantic: if config.query and memory_fts exists, JOIN memory_fts MATCH ? and order by FTS rank; fall back to chronological when FTS returns 0 rows.
    • hybrid: union of FTS results (higher relevance) and the chronological set, deduplicated by id.
  7. Computes a keyword relevance score (getRelevanceScore) over content, key, and metadata JSON when a query is provided. Rows with zero score are filtered out.
  8. Sorts by score desc, then createdAt desc.
  9. Walks the ranked list and accepts entries while a running estimateTokens(content) (≈ length / 4) stays under the budget. Always returns at least one entry when any matched.
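Steps 7 to 9 can be sketched as below. getRelevanceScoreSketch is a simplified, hypothetical keyword counter that ignores the metadata JSON; estimateTokens here rounds length / 4 up, which is an assumption about the exact rounding; and the walk stops at the first entry that would overflow the budget.

```typescript
interface Memory {
  id: string;
  key: string;
  content: string;
  createdAt: string; // ISO 8601, so string order matches time order
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // "≈ length / 4"; rounding up is an assumption
}

// Hypothetical scorer: counts query terms found in content or key.
function getRelevanceScoreSketch(m: Memory, query: string): number {
  const haystack = `${m.content} ${m.key}`.toLowerCase();
  return query
    .toLowerCase()
    .split(/\s+/)
    .filter((term) => term.length > 0 && haystack.includes(term)).length;
}

function selectWithinBudget(rows: Memory[], maxTokens: number, query?: string): Memory[] {
  if (maxTokens <= 0) return []; // step 2
  const budget = Math.min(8000, Math.max(1, maxTokens)); // step 3
  const ranked = rows
    .map((m) => ({ m, score: query ? getRelevanceScoreSketch(m, query) : 0 }))
    .filter((r) => !query || r.score > 0) // step 7: drop zero-score rows when querying
    .sort((a, b) => b.score - a.score || b.m.createdAt.localeCompare(a.m.createdAt)); // step 8

  const picked: Memory[] = [];
  let used = 0;
  for (const { m } of ranked) {
    // step 9: accept entries while the running total fits the budget
    const cost = estimateTokens(m.content);
    if (used + cost > budget) {
      if (picked.length === 0) picked.push(m); // never return empty when something matched
      break;
    }
    picked.push(m);
    used += cost;
  }
  return picked;
}
```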

estimateTokens is exported and used by retrieval, summarisation, and the MCP omniroute_memory_search tool.

Injection (injection.ts)

injectMemory(request, memories, provider):

  1. Joins all memory contents into a single Memory context: … string.
  2. Picks a strategy by provider name:
    • System message (default for OpenAI, Anthropic, Gemini, …): prepends a {role: "system", content: memoryText} ahead of any existing system messages so user system prompts still take precedence.
    • User message (fallback): used for providers in PROVIDERS_WITHOUT_SYSTEM_MESSAGE (o1, o1-mini, o1-preview, glm, glmt, glm-cn, zai, qianfan). These reject the system role and would otherwise fail with HTTP 400 (cf. issue #1701 for GLM/Zhipu).
  3. Logs the count, strategy, and model under memory.injection.injected.

providerSupportsSystemMessage(provider) is exported for callers that need to make routing decisions of their own. Unknown providers default to true (system role allowed) for safety.
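A sketch of the two injection strategies, assuming a simplified OpenAI-style message shape. The provider list is copied from the text above; lower-casing provider names, the newline separator between memories, and the Sketch suffix on the function name are assumptions, not details taken from injection.ts.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  model: string;
  messages: ChatMessage[];
}

const PROVIDERS_WITHOUT_SYSTEM_MESSAGE = new Set([
  "o1", "o1-mini", "o1-preview", "glm", "glmt", "glm-cn", "zai", "qianfan",
]);

function providerSupportsSystemMessage(provider: string): boolean {
  // Unknown providers are treated as supporting the system role.
  return !PROVIDERS_WITHOUT_SYSTEM_MESSAGE.has(provider.toLowerCase());
}

function injectMemorySketch(request: ChatRequest, memories: string[], provider: string): ChatRequest {
  if (memories.length === 0) return request;
  const memoryText = `Memory context: ${memories.join("\n")}`;
  const role = providerSupportsSystemMessage(provider) ? "system" : "user";
  // Prepend ahead of existing messages; the original request is not mutated.
  return { ...request, messages: [{ role, content: memoryText }, ...request.messages] };
}
```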

Settings (settings.ts)

Memory configuration is stored in the DB settings table, not in env vars. getMemorySettings() reads from getSettings() and caches the result in-process; invalidateMemorySettingsCache() is called by the settings PUT route after writes.

| DB key | Type | Default | UI control |
| --- | --- | --- | --- |
| memoryEnabled | boolean | true | Memory on/off |
| memoryMaxTokens | integer | 2000 (range 0–16000) | Token budget for injection |
| memoryRetentionDays | integer | 30 (range 1–365) | Retention window |
| memoryStrategy | enum | "hybrid" (one of recent, semantic, hybrid) | Retrieval strategy |
| skillsEnabled | boolean | false | Toggles per-key skill injection (see SKILLS.md) |

Note: the UI strategy "recent" maps to the internal "exact" retrieval strategy via toMemoryRetrievalConfig() (chronological order).
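The defaults, ranges, and the recent → exact mapping can be expressed as one normalisation step. This is a sketch under stated assumptions: normalizeMemorySettings and toMemoryRetrievalConfigSketch are hypothetical names, out-of-range numbers are clamped rather than rejected, and the shape of the returned retrieval config is invented for illustration.

```typescript
type UiStrategy = "recent" | "semantic" | "hybrid";
type RetrievalStrategy = "exact" | "semantic" | "hybrid";

interface MemorySettings {
  enabled: boolean;
  maxTokens: number;
  retentionDays: number;
  strategy: UiStrategy;
  skillsEnabled: boolean;
}

const UI_STRATEGIES: readonly string[] = ["recent", "semantic", "hybrid"];
const isUiStrategy = (v: unknown): v is UiStrategy =>
  typeof v === "string" && UI_STRATEGIES.includes(v);

// Numbers outside [min, max] are clamped; non-numbers fall back to the default.
function clampInt(v: unknown, min: number, max: number, fallback: number): number {
  return typeof v === "number" && Number.isFinite(v)
    ? Math.min(max, Math.max(min, Math.round(v)))
    : fallback;
}

function normalizeMemorySettings(raw: Record<string, unknown>): MemorySettings {
  const enabled = raw.memoryEnabled;
  const strategy = raw.memoryStrategy;
  return {
    enabled: typeof enabled === "boolean" ? enabled : true,
    maxTokens: clampInt(raw.memoryMaxTokens, 0, 16000, 2000),
    retentionDays: clampInt(raw.memoryRetentionDays, 1, 365, 30),
    strategy: isUiStrategy(strategy) ? strategy : "hybrid",
    skillsEnabled: raw.skillsEnabled === true,
  };
}

// The UI label "recent" becomes the internal "exact" (chronological) strategy.
function toMemoryRetrievalConfigSketch(s: MemorySettings) {
  const retrievalStrategy: RetrievalStrategy = s.strategy === "recent" ? "exact" : s.strategy;
  return {
    enabled: s.enabled,
    maxTokens: s.maxTokens,
    retentionDays: s.retentionDays,
    retrievalStrategy,
    scope: "apiKey" as const, // default scope per the MemoryConfig notes above
  };
}
```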

Qdrant-related DB keys (qdrantEnabled, qdrantHost, qdrantPort, qdrantApiKey, qdrantCollection default "omniroute_memory", qdrantEmbeddingModel default "openai/text-embedding-3-small") are read by normalizeQdrantConfig() in qdrant.ts.

No MEMORY_* or QDRANT_* env vars exist today — everything is per-instance DB settings. OMNIROUTE_MEMORY_MB (commented out in .env.example) is unrelated and refers to Node heap sizing.

Summarisation (summarization.ts)

summarizeMemories(apiKeyId, sessionId?, maxTokens = 4000) compacts older content when the running token total over a key's memories exceeds the budget. It iterates rows DESC by created_at, keeps rows that fit, and for the rest replaces content in place with the first three sentences of the original. tokensSaved is the difference in estimateTokens between old and new content.

This routine is available but not called automatically in the current chat pipeline — call it from a cron, an admin action, or MemoryConfig.autoSummarize glue if you need ongoing compaction. The data loss is one-way: original text is overwritten.
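The compaction rule can be sketched as a pure function over already-loaded rows. The naive sentence splitter and the summarizeSketch name are assumptions, and the real summarization.ts writes the shortened text back to SQLite rather than returning copies.

```typescript
interface Entry {
  id: string;
  content: string;
  createdAt: string; // ISO 8601
}

const estimateTokens = (t: string): number => Math.ceil(t.length / 4);

// Naive splitter (assumption): a sentence ends at ".", "!" or "?".
function firstSentences(text: string, n = 3): string {
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
  return sentences.slice(0, n).join("").trim();
}

function summarizeSketch(entries: Entry[], maxTokens = 4000): { entries: Entry[]; tokensSaved: number } {
  const newestFirst = [...entries].sort((a, b) => b.createdAt.localeCompare(a.createdAt));
  let used = 0;
  let tokensSaved = 0;
  const out = newestFirst.map((e) => {
    const cost = estimateTokens(e.content);
    if (used + cost <= maxTokens) {
      used += cost; // fits: keep verbatim
      return e;
    }
    const shortened = firstSentences(e.content);
    tokensSaved += cost - estimateTokens(shortened);
    return { ...e, content: shortened }; // one-way: the original text is discarded
  });
  return { entries: out, tokensSaved };
}
```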

REST API

All endpoints require management auth (requireManagementAuth).

| Method | Path | Description |
| --- | --- | --- |
| GET | /api/memory | Paginated list with filters: apiKeyId, type, sessionId, q, limit, page, offset. Response includes stats.total and stats.byType |
| POST | /api/memory | Create entry (Zod-validated: content, key, optional type, sessionId, apiKeyId, metadata, expiresAt). Calls createMemory(), which upserts on (apiKeyId, key) |
| GET | /api/memory/[id] | Fetch a single entry by UUID |
| DELETE | /api/memory/[id] | Delete an entry; returns 404 when missing |
| GET | /api/memory/health | Runs verifyExtractionPipeline("health-check"), a create→list→delete round trip that confirms the store is alive. Returns {working, latencyMs, error?} |
| GET | /api/settings/memory | Current normalised MemorySettings |
| PUT | /api/settings/memory | Update one or more of enabled, maxTokens, retentionDays, strategy, skillsEnabled |
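The health check is a create → list → delete round trip whose cleanup must run even when a step fails. A sketch of that shape, using a hypothetical MemoryStore interface (verify.ts may call the store functions directly):

```typescript
// Hypothetical store interface; not the real store.ts exports.
interface MemoryStore {
  create(apiKeyId: string, key: string, content: string): Promise<string>; // returns id
  list(apiKeyId: string): Promise<Array<{ id: string }>>;
  remove(id: string): Promise<void>;
}

async function verifyRoundTrip(store: MemoryStore, apiKeyId = "health-check") {
  const started = Date.now();
  let id: string | undefined;
  try {
    id = await store.create(apiKeyId, `health:${started}`, "round-trip probe");
    const createdId = id;
    const found = (await store.list(apiKeyId)).some((r) => r.id === createdId);
    if (!found) throw new Error("created entry not visible in list");
    return { working: true, latencyMs: Date.now() - started };
  } catch (err) {
    return { working: false, latencyMs: Date.now() - started, error: String(err) };
  } finally {
    // Runs on success and failure alike, so probe rows never accumulate.
    if (id) await store.remove(id).catch(() => undefined);
  }
}
```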

The /api/memory list query supports either page-based pagination (parsePaginationParams) or raw offset — when offset is present it takes precedence and a derived page is computed for the response shape.
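A sketch of the offset-over-page precedence rule. The default limit of 20 and the derived page formula floor(offset / limit) + 1 are assumptions; parsePaginationParams() itself is not reproduced here.

```typescript
interface Pagination {
  limit: number;
  offset: number;
  page: number;
}

function resolvePagination(params: URLSearchParams, defaultLimit = 20): Pagination {
  const limitRaw = Number(params.get("limit"));
  const limit = Number.isInteger(limitRaw) && limitRaw > 0 ? limitRaw : defaultLimit;

  // An explicit offset wins over page; the page is derived for the response shape.
  const offsetParam = params.get("offset");
  if (offsetParam !== null) {
    const offset = Math.max(0, Math.floor(Number(offsetParam)) || 0);
    return { limit, offset, page: Math.floor(offset / limit) + 1 };
  }

  const pageRaw = Number(params.get("page"));
  const page = Number.isInteger(pageRaw) && pageRaw > 0 ? pageRaw : 1;
  return { limit, offset: (page - 1) * limit, page };
}
```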

MCP Tools (open-sse/mcp-server/tools/memoryTools.ts)

When the MCP server is enabled, three memory tools are registered:

  • omniroute_memory_search: {apiKeyId, query?, type?, maxTokens?, limit?} → wraps retrieveMemories() with retrievalStrategy: "exact", optionally filters by type, and reports totalTokens.
  • omniroute_memory_add: {apiKeyId, sessionId?, type, key, content, metadata?} → wraps createMemory().
  • omniroute_memory_clear: {apiKeyId, type?, olderThan?} → lists matching entries, optionally filters by created-before timestamp, then deletes each via deleteMemory().
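The selection logic of omniroute_memory_clear can be sketched independently of the MCP plumbing. The list and deleteOne callbacks are hypothetical stand-ins (list is assumed to be already scoped to the apiKeyId), and rejecting an unparseable olderThan is an assumed behaviour.

```typescript
interface StoredMemory {
  id: string;
  type: string;
  createdAt: string; // ISO 8601
}

async function clearMemories(
  list: () => Promise<StoredMemory[]>, // assumed pre-scoped to one apiKeyId
  deleteOne: (id: string) => Promise<boolean>, // true when a row was removed
  opts: { type?: string; olderThan?: string } = {},
): Promise<number> {
  const cutoff = opts.olderThan ? Date.parse(opts.olderThan) : undefined;
  if (cutoff !== undefined && Number.isNaN(cutoff)) {
    throw new Error("olderThan must be an ISO timestamp");
  }
  const targets = (await list()).filter(
    (m) =>
      (!opts.type || m.type === opts.type) &&
      (cutoff === undefined || Date.parse(m.createdAt) < cutoff),
  );
  let deleted = 0;
  for (const m of targets) {
    if (await deleteOne(m.id)) deleted++; // delete one entry at a time
  }
  return deleted;
}
```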

See MCP-SERVER.md for transport and scope details.

Dashboard

src/app/(dashboard)/dashboard/memory/page.tsx provides:

  • Real-time list, search, and pagination (debounced 300 ms).
  • Type filter (factual / episodic / procedural / semantic / all).
  • Add-memory modal (key, content, type).
  • Delete per row.
  • JSON export of the current page; JSON import via file picker.
  • A green/red health dot driven by GET /api/memory/health.
  • Stat cards: totalEntries, tokensUsed, hitRate (the latter two come from the API stats payload).

Memory and Qdrant settings live under /dashboard/settings → Memory & Skills (MemorySkillsTab.tsx).

Caching

src/lib/memory/store.ts keeps an in-process, approximately LRU cache for getMemory(id) reads (MEMORY_CACHE_TTL = 5 min, MEMORY_MAX_CACHE_SIZE = 10 000; when the cache is full, the oldest 20 % of entries are evicted). Separately, src/lib/memory/cache.ts provides a generic key/value memoryCache layer with get/set/invalidate methods for callers that want their own scoped cache (1 000-entry LRU, default TTL 5 min).
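A sketch of a TTL cache with the "evict the oldest 20 %" policy. The defaults mirror the store.ts constants named above, but the class, its injectable clock, and the use of insertion order as a proxy for age are assumptions; unlike a strict LRU, reads here do not refresh an entry's position.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; storedAt: number }>();
  private readonly maxSize: number;
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(maxSize = 10_000, ttlMs = 5 * 60_000, now: () => number = () => Date.now()) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (this.now() - hit.storedAt > this.ttlMs) {
      this.entries.delete(key); // expired: drop lazily on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key); // re-insert so this key counts as newest
    if (this.entries.size >= this.maxSize) {
      // Map iterates in insertion order, so the first keys are the oldest.
      const evictCount = Math.max(1, Math.floor(this.maxSize * 0.2));
      let i = 0;
      for (const k of this.entries.keys()) {
        if (i++ >= evictCount) break;
        this.entries.delete(k);
      }
    }
    this.entries.set(key, { value, storedAt: this.now() });
  }

  invalidate(key: string): void {
    this.entries.delete(key);
  }

  get size(): number {
    return this.entries.size;
  }
}
```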

Privacy & Lifecycle

  • Memory ownership is the API key id (resolveMemoryOwnerId in chatCore.ts). Without an apiKeyInfo.id neither retrieval nor injection nor extraction runs.
  • Entries whose expires_at lies in the past are filtered out of retrieval (rows with a NULL expires_at never expire); entries older than retentionDays are excluded by the created_at >= cutoff clause in retrieveMemories.
  • For hard deletion, use DELETE /api/memory/[id] or omniroute_memory_clear.
  • Extraction is fire-and-forget via setImmediate; failures are logged under memory.extraction.background.failed and never surface to the caller.
  • Verification round-trips (verifyExtractionPipeline) clean up their own test entries in a finally block.
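The ownership gate and the fire-and-forget behaviour can be sketched together. resolveMemoryOwnerIdSketch and scheduleExtraction are hypothetical names, and the injectable log callback replaces whatever logger chatCore.ts really uses; only the setImmediate scheduling and the memory.extraction.background.failed event name come from the list above.

```typescript
// Hypothetical shape; the real apiKeyInfo object carries more fields.
interface ApiKeyInfo {
  id?: string;
}

function resolveMemoryOwnerIdSketch(info: ApiKeyInfo | null | undefined): string | null {
  const id = info?.id;
  return typeof id === "string" && id.length > 0 ? id : null;
}

function scheduleExtraction(
  info: ApiKeyInfo | null | undefined,
  text: string,
  extract: (text: string, ownerId: string) => Promise<void>,
  log: (event: string, err: unknown) => void = (event, err) => console.warn(event, err),
): boolean {
  const ownerId = resolveMemoryOwnerIdSketch(info);
  if (!ownerId) return false; // no API key id: skip memory work entirely
  setImmediate(() => {
    // Wrapping in a promise also catches synchronous throws from extract().
    Promise.resolve()
      .then(() => extract(text, ownerId))
      .catch((err) => log("memory.extraction.background.failed", err));
  });
  return true; // returns before any extraction work has run
}
```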

See Also

  • SKILLS.md — the skillsEnabled setting injects tool definitions alongside memory.
  • MCP-SERVER.md — MCP transport / scopes.
  • API_REFERENCE.md — broader API surface.
  • Tuto_Qdrant.md — repository-root Qdrant setup tutorial (integration currently dormant — see status banner at top of that file).
  • Source modules:
    • src/lib/memory/types.ts, schemas.ts
    • src/lib/memory/store.ts, retrieval.ts, injection.ts
    • src/lib/memory/extraction.ts, summarization.ts, verify.ts
    • src/lib/memory/settings.ts, qdrant.ts, cache.ts
    • src/lib/db/migrations/015_create_memories.sql, 022_add_memory_fts5.sql, 023_fix_memory_fts_uuid.sql
    • src/app/api/memory/route.ts, [id]/route.ts, health/route.ts
    • src/app/api/settings/memory/route.ts
    • open-sse/handlers/chatCore.ts (injection / extraction wiring)
    • open-sse/mcp-server/tools/memoryTools.ts