---
title: "MemScore"
description: "A composite metric for comparing memory providers across quality, latency, and token efficiency"
---
## What is MemScore?
MemScore is a composite metric that captures three dimensions of memory provider performance in a single line:
```
accuracy% / latencyMs / contextTok
```
For example:
```
85% / 120ms / 1500tok
```
This tells you the provider achieved **85% accuracy**, with an average search latency of **120ms**, while sending an average of **1,500 tokens** of context to the answering model per question.
## Components
| Component | What it measures | Source |
|-----------|-----------------|--------|
| **Quality** | Answer accuracy as a percentage | `(correct / total) * 100` from judge evaluations |
| **Latency** | Average search response time in milliseconds | Mean of all search phase durations |
| **Tokens** | Average context tokens sent to the answering model | Client-side token count of retrieved context per question |
<Note>
MemScore is not a single number — it's a triple. This is intentional. Collapsing quality, latency, and cost into one score hides important tradeoffs. A provider with 90% accuracy at 5,000 tokens is very different from one with 90% accuracy at 500 tokens.
</Note>
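As a concrete illustration of how the three components combine, here is a minimal TypeScript sketch. The `QuestionResult` shape and `formatMemScore` helper are hypothetical stand-ins for illustration, not MemoryBench's actual internals:

```typescript
// Illustrative sketch only: the QuestionResult shape and helper name
// are hypothetical, not MemoryBench's real types.
interface QuestionResult {
  correct: boolean;      // judge verdict for this question
  searchMs: number;      // search phase duration in milliseconds
  contextTokens: number; // tokens in the retrieved context string
}

function formatMemScore(results: QuestionResult[]): string {
  const n = results.length; // assumes at least one question was run
  const quality = Math.round(
    (results.filter((r) => r.correct).length / n) * 100,
  );
  const latencyMs = Math.round(
    results.reduce((sum, r) => sum + r.searchMs, 0) / n,
  );
  const contextTokens = Math.round(
    results.reduce((sum, r) => sum + r.contextTokens, 0) / n,
  );
  return `${quality}% / ${latencyMs}ms / ${contextTokens}tok`;
}
```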
## How token counting works
MemoryBench counts tokens client-side using provider-specific tokenizers:
| Model provider | Tokenizer | Method |
|----------------|-----------|--------|
| **OpenAI** | `js-tiktoken` | Exact count using `o200k_base` or `cl100k_base` encoding |
| **Anthropic** | `@anthropic-ai/tokenizer` | Exact count using Anthropic's tokenizer |
| **Google** | Approximation | `Math.ceil(text.length / 4)` |
Three token values are tracked per question:
- **`promptTokens`** — Total tokens in the full prompt (instructions + context + question)
- **`basePromptTokens`** — Tokens in the prompt without any retrieved context
- **`contextTokens`** — Tokens in just the retrieved context string
MemScore uses `contextTokens` because it isolates what the memory provider actually contributed to the prompt.
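As a rough sketch of the counting logic described in the table above (the `countContextTokens` helper and its `provider` switch are illustrative assumptions, not MemoryBench's actual code):

```typescript
import { getEncoding } from "js-tiktoken";
import { countTokens } from "@anthropic-ai/tokenizer";

// Illustrative only: MemoryBench's real counting code may differ.
function countContextTokens(
  text: string,
  provider: "openai" | "anthropic" | "google",
): number {
  switch (provider) {
    case "openai":
      // Exact count; choose the encoding matching the answering model.
      return getEncoding("o200k_base").encode(text).length;
    case "anthropic":
      // Exact count using Anthropic's published tokenizer package.
      return countTokens(text);
    case "google":
      // No client-side tokenizer, so approximate at ~4 chars per token.
      return Math.ceil(text.length / 4);
  }
}
```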
## Where MemScore appears
### CLI output
After a benchmark run completes, MemScore is printed in the summary:
```
SUMMARY:
Total Questions: 50
Correct: 43
Accuracy: 86.00%
Quality: 86%
Latency: 145ms (avg)
Tokens: 1,823 (avg context sent to answering model)
MemScore: 86% / 145ms / 1823tok
```
### Web UI
The MemScore card appears at the top of the run overview page. Per-question token counts are shown next to each model answer in both the question list and detail views.
### Report JSON
The `report.json` file includes both a display string and structured components:
```json
{
  "memscore": "86% / 145ms / 1823tok",
  "memscoreComponents": {
    "quality": 86,
    "latencyMs": 145,
    "contextTokens": 1823
  },
  "tokens": {
    "totalTokens": 142500,
    "basePromptTokens": 21000,
    "contextTokens": 91150,
    "avgTokensPerQuestion": 2850,
    "avgBasePromptTokens": 420,
    "avgContextTokens": 1823
  }
}
```
Use `memscoreComponents` for programmatic comparisons — it avoids parsing the display string.
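For instance, a comparison script can read the structured fields directly. This is a minimal sketch; the report path and the trimmed type are assumptions for illustration:

```typescript
import { readFileSync } from "node:fs";

// Sketch only: the path and this trimmed type are illustrative assumptions.
interface MemScoreComponents {
  quality: number;       // accuracy percentage
  latencyMs: number;     // average search latency in milliseconds
  contextTokens: number; // average context tokens per question
}

const report = JSON.parse(readFileSync("report.json", "utf8")) as {
  memscoreComponents: MemScoreComponents;
};

const { quality, latencyMs, contextTokens } = report.memscoreComponents;
console.log(`${quality}% / ${latencyMs}ms / ${contextTokens}tok`);
```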
## Comparing providers
MemScore is most useful when comparing providers on the same benchmark:
```bash
bun run src/index.ts compare -p supermemory,mem0,zep -b locomo -j gpt-4o
```
Each provider's report will include its own MemScore, making it easy to see tradeoffs at a glance:
| Provider | MemScore |
|----------|----------|
| Provider A | `88% / 145ms / 1200tok` |
| Provider B | `82% / 80ms / 2400tok` |
| Provider C | `85% / 110ms / 1800tok` |
In this example, Provider A has the highest accuracy and the leanest context, but the slowest search. Provider B is the fastest, yet it sends the most context while landing at the lowest accuracy, suggesting its retrieval is less precise. Provider C sits in the middle on all three axes. There is no single "winner": the right choice depends on whether you prioritize quality, speed, or token efficiency.
## Backward compatibility
Runs from before MemScore was added will still work. If token data is not present in the checkpoint, the `memscore`, `memscoreComponents`, and `tokens` fields will be `undefined` in the report. The CLI and web UI gracefully skip the MemScore display when data is unavailable.
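If you consume reports from mixed-age runs, treat these fields as optional. A tiny sketch (the `memScoreDisplay` helper is hypothetical):

```typescript
// Sketch: reports from pre-MemScore runs omit these fields entirely.
function memScoreDisplay(report: { memscore?: string }): string {
  return report.memscore ?? "MemScore unavailable (run predates token tracking)";
}
```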