sillytavern-character-memory/docs/plans/2026-02-26-automated-testing-design.md

Automated Testing Design

Goal

Add automated testing for CharMemory's extraction pipeline using a 1000-message test chat fixture. Two tiers: deterministic snapshot tests for the processing pipeline, and live integration tests against a real LLM.

Current State

  • Vitest with 71 passing unit tests in test/unit/ covering lib.js pure functions
  • lib.js exports pure functions (parsing, serialization, escaping, format detection)
  • package.json has test:snapshot and test:live scripts wired up, but no test files exist yet
  • index.js has extraction pipeline logic tightly coupled to SillyTavern globals
  • 1000-message JSONL test chat at /Users/davidsayed/repos/st-test-chatlog/output/

Design

Step 1: Extract pure logic from index.js into lib.js

Three new functions:

stripNonDiegetic(text) — The 5 regex operations currently inline in collectRecentMessages() (lines 2031-2036). Removes code blocks, <details> sections, markdown tables, and HTML tags, and collapses excessive newlines.
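A minimal sketch of what stripNonDiegetic could look like once extracted — the five regex patterns here are illustrative assumptions, not the extension's exact patterns:

```javascript
// Sketch of stripNonDiegetic; the five regexes are illustrative
// stand-ins for the patterns currently inline in index.js.
function stripNonDiegetic(text) {
  return text
    // 1. fenced code blocks
    .replace(/```[\s\S]*?```/g, '')
    // 2. <details> sections (reasoning traces, tool output, etc.)
    .replace(/<details[\s\S]*?<\/details>/gi, '')
    // 3. markdown table rows (lines of |-delimited cells)
    .replace(/^\|.*\|\s*$/gm, '')
    // 4. remaining HTML tags
    .replace(/<[^>]+>/g, '')
    // 5. collapse 3+ consecutive newlines down to 2
    .replace(/\n{3,}/g, '\n\n');
}
```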

formatChatMessages(chatArray, startIndex, endIndex) — Message filtering and formatting extracted from collectRecentMessages(). Takes a plain array of ST message objects, filters out empty/system-only messages, applies stripNonDiegetic(), returns formatted text. The caller (collectRecentMessages in index.js) handles reading from getContext() and passes the array in.
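A sketch of the extracted shape, assuming SillyTavern's usual message fields (name, mes, is_system); the stripNonDiegetic stand-in below is simplified so the example is self-contained:

```javascript
// Sketch of formatChatMessages over plain ST message objects;
// field names ({ name, mes, is_system }) are an assumption here.
function formatChatMessages(chatArray, startIndex, endIndex) {
  return chatArray
    .slice(startIndex, endIndex)
    // drop system messages and messages with no visible text
    .filter((msg) => !msg.is_system && msg.mes && msg.mes.trim() !== '')
    // one "Speaker: text" paragraph per message
    .map((msg) => `${msg.name}: ${stripNonDiegetic(msg.mes).trim()}`)
    .join('\n\n');
}

// Simplified stand-in; the real stripNonDiegetic lives in lib.js.
function stripNonDiegetic(text) {
  return text.replace(/<[^>]+>/g, '').replace(/\n{3,}/g, '\n\n');
}
```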

substitutePromptTemplate(template, vars) — Template variable substitution from buildExtractionPrompt(). Replaces {{charName}}, {{charCard}}, {{existingMemories}}, {{recentMessages}}, {{participants}}. The caller handles reading the template from settings and getting the character card from ST globals.
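The substitution itself can be a one-liner; this sketch assumes the replacement mechanics (the variable names come from the design above):

```javascript
// Sketch of substitutePromptTemplate. Unknown {{keys}} are left
// intact so a template typo is visible rather than silently blank.
function substitutePromptTemplate(template, vars) {
  return template.replace(/\{\{(\w+)\}\}/g, (match, key) =>
    Object.prototype.hasOwnProperty.call(vars, key) ? String(vars[key]) : match
  );
}
```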

Once these functions are extracted, index.js delegates to the lib.js versions. No behavior change.

Step 2: Snapshot tests (npm run test:snapshot)

File: test/integration/snapshot.test.js

Test fixture: copy the 1000-message JSONL chat into test/fixtures/flux-chat.jsonl.

Tests:

  1. stripNonDiegetic — Feed messages containing code blocks, tables, HTML, <details> sections. Snapshot the cleaned output.
  2. formatChatMessages — Load the JSONL, process two chunks (messages 0–20 and 20–50). Snapshot the formatted text. Verifies filtering, stripping, and formatting stability.
  3. substitutePromptTemplate — Build a prompt using processed messages, mock character card, empty existing memories. Snapshot the final prompt. Verifies the prompt the LLM receives is correct.
  4. parseMemories round-trip — Parse a sample LLM response fixture, re-serialize, verify no data loss.

All deterministic. Run in milliseconds.
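The round-trip property behind test 4 can be sketched with toy parse/serialize stand-ins — the real parseMemories and its serialization counterpart in lib.js handle more than this, and the block shape here is an assumption:

```javascript
// Toy round-trip sketch: parse <memory> blocks, re-serialize,
// and expect byte-identical output for canonical input.
function parseMemories(text) {
  const blocks = [];
  const re = /<memory chat="([^"]*)" date="([^"]*)">([\s\S]*?)<\/memory>/g;
  let m;
  while ((m = re.exec(text)) !== null) {
    blocks.push({
      chat: m[1],
      date: m[2],
      bullets: m[3].trim().split('\n').map((line) => line.replace(/^- /, '')),
    });
  }
  return blocks;
}

function serializeMemories(blocks) {
  return blocks
    .map((b) =>
      `<memory chat="${b.chat}" date="${b.date}">\n` +
      b.bullets.map((x) => `- ${x}`).join('\n') +
      `\n</memory>`
    )
    .join('\n');
}
```

In the snapshot suite the equivalent check is simply serialize(parse(fixture)) compared against the fixture itself, which catches silent data loss in either direction.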

Step 3: Live LLM tests (npm run test:live)

File: test/integration/live.test.js

Flow: Load JSONL → formatChatMessages → substitutePromptTemplate → call LLM → parseMemories → assert quality.

LLM backend configured via env var: TEST_LLM_URL (default: http://127.0.0.1:1234/v1). Works with LM Studio, Ollama, KoboldCpp, llama.cpp.
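All four backends speak the OpenAI-compatible chat completions API, so the call can be sketched as below — the model name and request shape are assumptions (local servers typically ignore or remap the model field):

```javascript
// Sketch of the live-test LLM call. TEST_LLM_URL comes from the
// design; the request body is the standard OpenAI chat format.
function buildChatRequest(
  prompt,
  baseUrl = process.env.TEST_LLM_URL || 'http://127.0.0.1:1234/v1'
) {
  return {
    url: `${baseUrl}/chat/completions`,
    body: {
      model: 'local', // placeholder; LM Studio/Ollama map or ignore this
      messages: [{ role: 'user', content: prompt }],
      temperature: 0, // as deterministic as the backend allows
    },
  };
}

async function callLLM(prompt) {
  const { url, body } = buildChatRequest(prompt);
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```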

Assertions (structural, not exact content):

  • Response contains at least 1 <memory> block
  • Each block has chat and date attributes
  • Each block has at least 1 bullet
  • No character card trait leakage (bullets don't parrot the character description)
  • Total bullet count is reasonable for the input size
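The structural checks above could be phrased as a plain predicate that the vitest expectations wrap — the parsed-block shape ({ chat, date, bullets }) and the thresholds are assumptions:

```javascript
// Sketch of the live-test structural assertions as one predicate
// returning a list of problems (empty list = pass).
function validateMemoryBlocks(blocks, { minBullets = 1, maxBullets = 100 } = {}) {
  const problems = [];
  if (blocks.length < 1) problems.push('no <memory> blocks found');
  for (const [i, b] of blocks.entries()) {
    if (!b.chat) problems.push(`block ${i}: missing chat attribute`);
    if (!b.date) problems.push(`block ${i}: missing date attribute`);
    if (!b.bullets || b.bullets.length < minBullets)
      problems.push(`block ${i}: fewer than ${minBullets} bullets`);
  }
  const total = blocks.reduce((n, b) => n + (b.bullets?.length ?? 0), 0);
  if (total > maxBullets) problems.push(`implausible bullet count: ${total}`);
  return problems;
}
```

The trait-leakage check is the one assertion this sketch omits; it would additionally compare each bullet against sentences from the character card and flag long verbatim overlaps.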

File structure

test/
  fixtures/
    flux-chat.jsonl
  unit/                      (existing, unchanged)
    parsing.test.js
    escaping.test.js
    format-detection.test.js
    utils.test.js
  integration/
    snapshot.test.js
    live.test.js

Changes to existing files

  • lib.js — Add the 3 new exported functions (stripNonDiegetic, formatChatMessages, substitutePromptTemplate)
  • index.js — Replace inline logic with lib.js calls (refactor, no behavior change)
  • package.json — No changes needed (scripts already defined)