OmniRoute/docs/frameworks/MEMORY.md

---
title: "Memory System"
version: 3.8.1
lastUpdated: 2026-05-13
---

# Memory System

> **Source of truth:** `src/lib/memory/` and `src/app/api/memory/`
> **Last updated:** 2026-05-13 — v3.8.0

OmniRoute provides persistent conversational memory keyed by API key (and
optionally session id). Memories are extracted automatically from LLM responses
via lightweight regex pattern matching and injected back into subsequent
requests as a leading system message (or first user message for providers that
reject the system role).

Memory is **scoped per API key**, not per user — every request authenticated
with the same API key shares the same memory pool, with optional further
scoping by `sessionId`.

## Architecture

```
Client → /v1/chat/completions (apiKeyInfo resolved upstream)
  → handleChatCore() [open-sse/handlers/chatCore.ts]
    → resolveMemoryOwnerId(apiKeyInfo)        # extracts id
    → getMemorySettings()                     # cached settings
    → shouldInjectMemory(body, {enabled})     # gate
    → retrieveMemories(apiKeyId, config)      # SQL + optional FTS5
    → injectMemory(body, memories, provider)  # system or user message
  → upstream provider call
  → on response: extractFacts(text, apiKeyId, sessionId)  # non-blocking
    → setImmediate → createMemory(fact) per match
```

The injection and extraction call-sites are wired in
`open-sse/handlers/chatCore.ts` (look for `retrieveMemories`, `injectMemory`,
and `extractFacts`).

## Storage Layers

### Primary: SQLite (`memories` table)

Created by migration `015_create_memories.sql`:

| Column                      | Type               | Notes                                                                |
| --------------------------- | ------------------ | -------------------------------------------------------------------- |
| `id`                        | `TEXT PRIMARY KEY` | UUID generated via `crypto.randomUUID()`                             |
| `api_key_id`                | `TEXT NOT NULL`    | Owning API key                                                       |
| `session_id`                | `TEXT`             | Optional per-conversation scope                                      |
| `type`                      | `TEXT NOT NULL`    | One of `factual`, `episodic`, `procedural`, `semantic`               |
| `key`                       | `TEXT`             | Stable upsert key, e.g. `preference:i_prefer_python`                 |
| `content`                   | `TEXT NOT NULL`    | The actual fact text                                                 |
| `metadata`                  | `TEXT`             | JSON blob (category, extractedAt, source, ...)                       |
| `created_at` / `updated_at` | `TEXT`             | ISO 8601 strings                                                     |
| `expires_at`                | `TEXT`             | Optional expiry; `NULL` means permanent                              |
| `memory_id`                 | `INTEGER UNIQUE`   | Added by `023_fix_memory_fts_uuid.sql` to bridge UUIDs ↔ FTS5 rowids |

Indexes: `api_key_id`, `session_id`, `type`, `expires_at`, plus the unique
`memory_id` index.

**Upsert semantics**: `createMemory()` looks for an existing row with the same
`(api_key_id, key)` and updates it in place when found (merging `metadata` via
shallow spread). This keeps the table from growing unbounded for repeated
preference statements.

### Full-text Search (`memory_fts` virtual table)

`022_add_memory_fts5.sql` creates an FTS5 virtual table over `content` and
`key`. `023_fix_memory_fts_uuid.sql` fixes a real-world bug where the UUID
primary key did not join to FTS5's integer rowid — the migration adds the
`memory_id` column, recreates the FTS table, and wires triggers
(`memory_fts_ai`, `memory_fts_ad`, `memory_fts_au`) that keep FTS in sync on
INSERT, DELETE, and UPDATE.

Used by `retrieval.ts` for the `semantic` and `hybrid` strategies (see below).
The retrieval code guards with `hasTable("memory_fts")` and falls back to
chronological order if the FTS table is missing or the FTS query throws.

### Optional: Qdrant (vector store)

`src/lib/memory/qdrant.ts` implements an optional Qdrant integration for true
semantic memory:

- `upsertSemanticMemoryPoint()` — embed `key + content` with the configured
  embedding model, ensure the collection exists (creates cosine-distance
  vectors on first use), and upsert a point with payload `{memoryId,
apiKeyId, sessionId, key, content, metadata, createdAtUnix, expiresAtUnix}`.
- `searchSemanticMemory(query, topK, scope)` — embed the query, search the
  collection filtered by `kind = "omniroute_memory"` and optionally by
  `apiKeyId` / `sessionId`. Caps `topK` to `[1, 20]`.
- `deleteSemanticMemoryPoint(id)` — single point delete.
- `cleanupSemanticMemoryPoints({retentionDays})` — bulk delete points whose
  `expiresAtUnix` is in the past or whose `createdAtUnix` is older than the
  retention cutoff. Counts first so the dashboard can show actual numbers.
- `checkQdrantHealth()` — `GET /readyz` health probe with latency.

> **TODO**: The chat pipeline (`chatCore.ts`) and the in-tree `retrieveMemories()`
> implementation do not currently call `upsertSemanticMemoryPoint` or
> `searchSemanticMemory`. The Qdrant integration is feature-flagged via
> `qdrantEnabled` in settings, but at the time of writing the
> `searchSemanticMemory` results are not fused into retrieval — the
> `semantic`/`hybrid` retrieval strategies use SQLite FTS5 only. The settings UI
> in `dashboard/settings → MemorySkillsTab` exposes Qdrant config, health,
> search test, and cleanup, but the corresponding `/api/settings/qdrant`,
> `/api/settings/qdrant/health`, `/api/settings/qdrant/search`, and
> `/api/settings/qdrant/cleanup` routes are referenced from the UI but **not
> present** under `src/app/api/settings/qdrant/` (only `embedding-models/` is
> wired). Treat Qdrant as preview/optional plumbing.

## Memory Types

`MemoryType` (`src/lib/memory/types.ts`):

| Type         | Used for                                                     |
| ------------ | ------------------------------------------------------------ |
| `factual`    | Preferences, stable user facts, behavioral patterns          |
| `episodic`   | Decisions tied to a specific moment ("I chose Postgres")     |
| `procedural` | Workflow / how-to memory (reserved; no auto-extractor today) |
| `semantic`   | Reserved for vector-store entries                            |

`MemoryConfig` retrieval strategy is one of `exact`, `semantic`, or `hybrid`,
and scope is one of `session`, `apiKey`, or `global`. The default scope from
`getMemorySettings()` is `apiKey`.

## Fact Extraction (`extraction.ts`)

Extraction is **regex-based**, not LLM-based — it runs in-process with
`setImmediate()` so it never blocks the response stream:

- **Preference patterns** → `MemoryType.FACTUAL`
  (e.g. `I prefer …`, `I really like …`, `my favorite is …`, `I hate …`)
- **Decision patterns** → `MemoryType.EPISODIC`
  (e.g. `I'll use …`, `I chose …`, `I went with …`, `I'm going to adopt …`)
- **Pattern patterns** → `MemoryType.FACTUAL`
  (e.g. `I usually …`, `I always …`, `I tend to …`)

Each match is sanitised (`trim`, whitespace-collapse, capped at 500 chars),
deduplicated within the batch via a stable `factKey(category, content)`, and
stored via `createMemory()` with metadata
`{category, extractedAt, source: "llm_response"}`. Input text is capped at
64 KiB (`MAX_EXTRACTION_TEXT_LENGTH`) — when longer, the **tail** of the text
is used so the most recent assistant content always participates.

`extractFactsFromText(text)` is exported for tests and returns the structured
facts without storing them.

## Retrieval (`retrieval.ts`)

`retrieveMemories(apiKeyId, config)` is the main entry point. It:

1. Normalises and validates the config through `MemoryConfigSchema`.
2. Returns `[]` immediately when `enabled` is false or `maxTokens <= 0`.
3. Clamps `maxTokens` to `[1, 8000]`.
4. Detects whether the modern `memories` table exists (vs the legacy `memory`
   table) so older databases keep working.
5. Builds the base query with expiry guard
   (`expires_at IS NULL OR datetime(expires_at) > datetime('now')`), optional
   session scope, and optional `retentionDays` cutoff.
6. Branches on strategy:
   - **`exact`** (default): chronological `ORDER BY created_at DESC LIMIT 100`.
   - **`semantic`**: if `config.query` and `memory_fts` exists, JOIN
     `memory_fts MATCH ?` and order by FTS rank; fall back to chronological
     when FTS returns 0 rows.
   - **`hybrid`**: union of FTS results (higher relevance) and the
     chronological set, deduplicated by id.
7. Computes a keyword relevance score (`getRelevanceScore`) over
   `content`, `key`, and `metadata` JSON when a query is provided. Rows with
   zero score are filtered out.
8. Sorts by score desc, then `createdAt` desc.
9. Walks the ranked list and accepts entries while a running
   `estimateTokens(content)` (≈ `length / 4`) stays under the budget. Always
   returns at least one entry when any matched.

`estimateTokens` is exported and used by retrieval, summarisation, and the MCP
`omniroute_memory_search` tool.

## Injection (`injection.ts`)

`injectMemory(request, memories, provider)`:

1. Joins all memory contents into a single `Memory context: …` string.
2. Picks a strategy by provider name:
   - **System message** (default for OpenAI, Anthropic, Gemini, …) — prepends
     a `{role: "system", content: memoryText}` ahead of any existing system
     messages so user system prompts still take precedence.
   - **User message** (fallback) — for providers in
     `PROVIDERS_WITHOUT_SYSTEM_MESSAGE`: `o1`, `o1-mini`, `o1-preview`,
     `glm`, `glmt`, `glm-cn`, `zai`, `qianfan`. These reject the system role
     and would 400 otherwise (cf. issue #1701 for GLM/Zhipu).
3. Logs the count, strategy, and model under `memory.injection.injected`.

`providerSupportsSystemMessage(provider)` is exported for callers that need to
make routing decisions of their own. Unknown providers default to `true`
(system role allowed) for safety.

## Settings (`settings.ts`)

Memory configuration is **stored in the DB settings table**, not in env vars.
`getMemorySettings()` reads from `getSettings()` and caches the result
in-process; `invalidateMemorySettingsCache()` is called by the settings PUT
route after writes.

| DB key                | Type    | Default                                            | UI control                                      |
| --------------------- | ------- | -------------------------------------------------- | ----------------------------------------------- |
| `memoryEnabled`       | boolean | `true`                                             | Memory on/off                                   |
| `memoryMaxTokens`     | integer | `2000` (range `0–16000`)                           | Token budget for injection                      |
| `memoryRetentionDays` | integer | `30` (range `1–365`)                               | Retention window                                |
| `memoryStrategy`      | enum    | `"hybrid"` (one of `recent`, `semantic`, `hybrid`) | Retrieval strategy                              |
| `skillsEnabled`       | boolean | `false`                                            | Toggles per-key skill injection (see SKILLS.md) |

Note: the UI strategy `"recent"` maps to the internal `"exact"` retrieval
strategy via `toMemoryRetrievalConfig()` (chronological order).

Qdrant-related DB keys (`qdrantEnabled`, `qdrantHost`, `qdrantPort`,
`qdrantApiKey`, `qdrantCollection` default `"omniroute_memory"`,
`qdrantEmbeddingModel` default `"openai/text-embedding-3-small"`) are read by
`normalizeQdrantConfig()` in `qdrant.ts`.

No `MEMORY_*` or `QDRANT_*` env vars exist today — everything is per-instance
DB settings. `OMNIROUTE_MEMORY_MB` (commented out in `.env.example`) is
unrelated and refers to Node heap sizing.

## Summarisation (`summarization.ts`)

`summarizeMemories(apiKeyId, sessionId?, maxTokens = 4000)` compacts older
content when the running token total over a key's memories exceeds the
budget. It iterates rows DESC by `created_at`, keeps rows that fit, and for
the rest replaces `content` in place with the first three sentences of the
original. `tokensSaved` is the difference in `estimateTokens` between old and
new content.

This routine is **available but not called automatically** in the current
chat pipeline — call it from a cron, an admin action, or
`MemoryConfig.autoSummarize` glue if you need ongoing compaction. The data
loss is one-way: original text is overwritten.

## REST API

All endpoints require management auth (`requireManagementAuth`).

| Method   | Path                   | Description                                                                                                                                                                  |
| -------- | ---------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `GET`    | `/api/memory`          | Paginated list with filters: `apiKeyId`, `type`, `sessionId`, `q`, `limit`, `page`, `offset`. Response includes `stats.total` and `stats.byType`                             |
| `POST`   | `/api/memory`          | Create entry (Zod-validated: `content`, `key`, optional `type`, `sessionId`, `apiKeyId`, `metadata`, `expiresAt`). Calls `createMemory()` which upserts on `(apiKeyId, key)` |
| `GET`    | `/api/memory/[id]`     | Fetch a single entry by UUID                                                                                                                                                 |
| `DELETE` | `/api/memory/[id]`     | Delete an entry; returns 404 when missing                                                                                                                                    |
| `GET`    | `/api/memory/health`   | Runs `verifyExtractionPipeline("health-check")` — round-trip create→list→delete to confirm the store is alive. Returns `{working, latencyMs, error?}`                        |
| `GET`    | `/api/settings/memory` | Current normalised `MemorySettings`                                                                                                                                          |
| `PUT`    | `/api/settings/memory` | Update one or more of `enabled`, `maxTokens`, `retentionDays`, `strategy`, `skillsEnabled`                                                                                   |

The `/api/memory` list query supports either `page`-based pagination
(`parsePaginationParams`) **or** raw `offset` — when `offset` is present it
takes precedence and a derived `page` is computed for the response shape.

## MCP Tools (`open-sse/mcp-server/tools/memoryTools.ts`)

When the MCP server is enabled, three memory tools are registered:

- `omniroute_memory_search` — `{apiKeyId, query?, type?, maxTokens?, limit?}`
  → wraps `retrieveMemories()` with `retrievalStrategy: "exact"`, optionally
  filters by `type`, and reports `totalTokens`.
- `omniroute_memory_add` — `{apiKeyId, sessionId?, type, key, content,
metadata?}` → wraps `createMemory()`.
- `omniroute_memory_clear` — `{apiKeyId, type?, olderThan?}` → lists matching
  entries, optionally filters by created-before timestamp, then deletes each
  via `deleteMemory()`.

See [MCP-SERVER.md](./MCP-SERVER.md) for transport and scope details.

## Dashboard

`src/app/(dashboard)/dashboard/memory/page.tsx` provides:

- Real-time list, search, and pagination (debounced 300 ms).
- Type filter (`factual` / `episodic` / `procedural` / `semantic` / all).
- Add-memory modal (key, content, type).
- Delete per row.
- JSON export of the current page; JSON import via file picker.
- A green/red health dot driven by `GET /api/memory/health`.
- Stat cards: `totalEntries`, `tokensUsed`, `hitRate` (the latter two come
  from the API stats payload).

Memory and Qdrant settings live under
`/dashboard/settings → Memory & Skills` (`MemorySkillsTab.tsx`).

## Caching

`src/lib/memory/store.ts` keeps an in-process LRU-ish cache
(`MEMORY_CACHE_TTL = 5 min`, `MEMORY_MAX_CACHE_SIZE = 10 000`, with 20 %
oldest eviction) for `getMemory(id)` reads, plus a generic key/value
`memoryCache` layer (`src/lib/memory/cache.ts`) with `get`/`set`/`invalidate`
methods used by callers that want their own scoped cache (1 000-entry LRU,
default TTL 5 min).

## Privacy & Lifecycle

- Memory ownership is the API key id (`resolveMemoryOwnerId` in
  `chatCore.ts`). Without an `apiKeyInfo.id` neither retrieval nor injection
  nor extraction runs.
- Entries with a future `expires_at` are filtered out of retrieval; old
  entries beyond `retentionDays` are excluded by the
  `created_at >= cutoff` clause in `retrieveMemories`.
- For hard deletion, use `DELETE /api/memory/[id]` or `omniroute_memory_clear`.
- Extraction is fire-and-forget via `setImmediate`; failures are logged under
  `memory.extraction.background.failed` and never surface to the caller.
- Verification round-trips (`verifyExtractionPipeline`) clean up their own
  test entries in a `finally` block.

## See Also

- [SKILLS.md](./SKILLS.md) — the `skillsEnabled` setting injects tool
  definitions alongside memory.
- [MCP-SERVER.md](./MCP-SERVER.md) — MCP transport / scopes.
- [API_REFERENCE.md](../reference/API_REFERENCE.md) — broader API surface.
- [Tuto_Qdrant.md](../../Tuto_Qdrant.md) — repository-root Qdrant setup tutorial (integration currently dormant — see status banner at top of that file).
- Source modules:
  - `src/lib/memory/types.ts`, `schemas.ts`
  - `src/lib/memory/store.ts`, `retrieval.ts`, `injection.ts`
  - `src/lib/memory/extraction.ts`, `summarization.ts`, `verify.ts`
  - `src/lib/memory/settings.ts`, `qdrant.ts`, `cache.ts`
  - `src/lib/db/migrations/015_create_memories.sql`,
    `022_add_memory_fts5.sql`, `023_fix_memory_fts_uuid.sql`
  - `src/app/api/memory/route.ts`, `[id]/route.ts`, `health/route.ts`
  - `src/app/api/settings/memory/route.ts`
  - `open-sse/handlers/chatCore.ts` (injection / extraction wiring)
  - `open-sse/mcp-server/tools/memoryTools.ts`