* move memory agent to directory structure

* chromadb settings rework

* memory agent improvements
embedding presets
support switching embeddings without restart
support custom sentence transformer embeddings

* toggle to hide / show disabled clients

* add memory debug tools

* chromadb no longer needs its dedicated config entry

* add missing emits

* fix initial value

* hidden disabled clients no longer cause enumeration issues with client actions

* improve memory agent error handling and hot reloading

* more memory agent error handling

* DEBUG_MEMORY_REQUESTS off

* relock

* sim suite: fix issue with removing or changing characters

* relock

* fix issue where actor dialogue editor would break with multiple characters in the scene

* remove cruft

* implement interrupt function

* margin adjustments

* fix rubber banding issue in world editor when editing certain text fields

* status notification when re-importing vectordb due to embeddings change

* properly open new client context on agent actions

* move jiggle apply to the end of prompt tune stack

* narrator agent length limit and jiggle settings added - also improve post generation cleanup

* progress story prompt improvements

* narrator prompt and cleanup tweaks

* prompt tweak

* revert

* autocomplete dialogue improvements

* Unified process (#141)

* progress to unified process

* --dev arg

* use gunicorn to serve built frontend

* gunicorn config adjustments

* remove dist from gitignore

* revert

* uvicorn instead

* save decode

* graceful shutdown

* refactor unified process

* clean up frontend log messages

* more logging fixes

* 0.27.0

* startup message

* clean up scripts a bit

* fixes to update.bat

* fixes to install.bat

* sim suite supports generation cancellation

* debug

* simplify narrator prompts

* prompt tweaks

* unified docker file

* update docker compose config for unified docker file

* cruft

* fix startup in linux docker

* download punkt so it's available

* prompt tweaks

* fix bug where editing the scene outline would wipe message history

* add o1 models

* add sampler, scheduler and cfg config to a1111 visualizer

* update installation docs

* visualizer configurable timeout

* memory agent docs

* docs

* relock

* relock

* fix issue where changing embeddings on immutable scene would hang

* remove debug message

* take torch install out of poetry since conditionals don't work.

* torch gets installed through some dependency so put it back into poetry, but reinstall with cuda if cuda support exists

* fix install syntax

* no need for torchvision

* torch cuda install added to linux install script

* add torch cuda install to update.bat

* docs

* docs

* relock

* fix install.sh

* handle torch+cuda install in docker

* docs

* typo
Committed by veguAI on 2024-09-23 12:55:34 +03:00 (via GitHub)
parent 2c8b4b8186
commit bb1cf6941b
95 changed files with 4,339 additions and 2,721 deletions

Dockerfile (new file, 86 lines added)

@@ -0,0 +1,86 @@
# Stage 1: Frontend build
FROM node:21 AS frontend-build
ENV NODE_ENV=development
WORKDIR /app
# Copy the frontend directory contents into the container at /app
COPY ./talemate_frontend /app
# Install all dependencies and build
RUN npm install && npm run build
# Stage 2: Backend build
FROM python:3.11-slim AS backend-build
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
bash \
gcc \
&& rm -rf /var/lib/apt/lists/*
# Install poetry
RUN pip install poetry
# Copy poetry files
COPY pyproject.toml poetry.lock* /app/
# Create a virtual environment
RUN python -m venv /app/talemate_env
# Activate virtual environment and install dependencies
RUN . /app/talemate_env/bin/activate && \
poetry config virtualenvs.create false && \
poetry install --no-dev --no-root
# Copy the Python source code
COPY ./src /app/src
# Conditional PyTorch+CUDA install
ARG CUDA_AVAILABLE=false
RUN . /app/talemate_env/bin/activate && \
if [ "$CUDA_AVAILABLE" = "true" ]; then \
echo "Installing PyTorch with CUDA support..." && \
pip uninstall torch torchaudio -y && \
pip install torch~=2.4.1 torchaudio~=2.4.1 --index-url https://download.pytorch.org/whl/cu121; \
fi
# Stage 3: Final image
FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y \
bash \
&& rm -rf /var/lib/apt/lists/*
# Copy virtual environment from backend-build stage
COPY --from=backend-build /app/talemate_env /app/talemate_env
# Copy Python source code
COPY --from=backend-build /app/src /app/src
# Copy Node.js build artifacts from frontend-build stage
COPY --from=frontend-build /app/dist /app/talemate_frontend/dist
# Copy the frontend WSGI file if it exists
COPY frontend_wsgi.py /app/frontend_wsgi.py
# Copy base config
COPY config.example.yaml /app/config.yaml
# Copy essentials
COPY scenes templates chroma* /app/
# Set PYTHONPATH to include the src directory
ENV PYTHONPATH=/app/src:$PYTHONPATH
# Make ports available to the world outside this container
EXPOSE 5050
EXPOSE 8080
# Use bash as the shell, activate the virtual environment, and run backend server
CMD ["poetry run src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050 --frontend-host 0.0.0.0 --frontend-port 8080"]


@ -1,25 +0,0 @@
# Use an official Python runtime as a parent image
FROM python:3.11-slim
# Set the working directory in the container
WORKDIR /app
# Copy the current directory contents into the container at /app
COPY ./src /app/src
# Copy poetry files
COPY pyproject.toml /app/
# If there's a poetry lock file, include the following line
COPY poetry.lock /app/
# Install poetry
RUN pip install poetry
# Install dependencies
RUN poetry install --no-dev
# Make port 5050 available to the world outside this container
EXPOSE 5050
# Run backend server
CMD ["poetry", "run", "python", "src/talemate/server/run.py", "runserver", "--host", "0.0.0.0", "--port", "5050"]


@ -1,23 +0,0 @@
# Use an official node runtime as a parent image
FROM node:20
# Make sure we are in a development environment (this isn't a production ready Dockerfile)
ENV NODE_ENV=development
# Echo that this isn't a production ready Dockerfile
RUN echo "This Dockerfile is not production ready. It is intended for development purposes only."
# Set the working directory in the container
WORKDIR /app
# Copy the frontend directory contents into the container at /app
COPY ./talemate_frontend /app
# Install all dependencies
RUN npm install
# Make port 8080 available to the world outside this container
EXPOSE 8080
# Run frontend server
CMD ["npm", "run", "serve"]

@@ -1,27 +1,21 @@
 version: '3.8'
 services:
-  talemate-backend:
+  talemate:
     build:
       context: .
-      dockerfile: Dockerfile.backend
+      dockerfile: Dockerfile
+      args:
+        - CUDA_AVAILABLE=${CUDA_AVAILABLE:-false}
     ports:
-      - "5050:5050"
+      - "${FRONTEND_PORT:-8080}:8080"
+      - "${BACKEND_PORT:-5050}:5050"
     volumes:
+      # can uncomment for dev purposes
+      #- ./src/talemate:/app/src/talemate
       - ./config.yaml:/app/config.yaml
       - ./scenes:/app/scenes
       - ./templates:/app/templates
       - ./chroma:/app/chroma
     environment:
       - PYTHONUNBUFFERED=1
+      - PYTHONPATH=/app/src:$PYTHONPATH
+    command: ["/bin/bash", "-c", "source /app/talemate_env/bin/activate && python src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050 --frontend-host 0.0.0.0 --frontend-port 8080"]
-  talemate-frontend:
-    build:
-      context: .
-      dockerfile: Dockerfile.frontend
-    ports:
-      - "8080:8080"
-    #volumes:
-    #  - ./talemate_frontend:/app

@@ -10,7 +10,12 @@
 1. copy config file
     1. linux: `cp config.example.yaml config.yaml`
     1. windows: `copy config.example.yaml config.yaml`
-1. `docker compose up`
+1. If your host has a CUDA compatible Nvidia GPU
+    1. Windows (via PowerShell): `$env:CUDA_AVAILABLE="true"; docker compose up`
+    1. Linux: `CUDA_AVAILABLE=true docker compose up`
+1. If your host does **NOT** have a CUDA compatible Nvidia GPU
+    1. Windows: `docker compose up`
+    1. Linux: `docker compose up`
 1. Navigate your browser to http://localhost:8080
 !!! note

@@ -2,13 +2,21 @@
 ## Quick install instructions
 !!! warning
-    python 3.12 and node.js v21 are currently not supported.
+    python 3.12 is currently not supported.
+### Dependencies
+1. node.js and npm - see instructions [here](https://nodejs.org/en/download/package-manager/)
+1. python 3.10 or 3.11 - see instructions [here](https://www.python.org/downloads/)
+### Installation
 1. `git clone https://github.com/vegu-ai/talemate.git`
 1. `cd talemate`
 1. `source install.sh`
-1. Start the backend: `python src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050`.
-1. Open a new terminal, navigate to the `talemate_frontend` directory, and start the frontend server by running `npm run serve`.
+    - When asked if you want to install pytorch with CUDA support choose `y` if you have
+      a CUDA compatible Nvidia GPU and have installed the necessary drivers.
+1. `source start.sh`
 If everything went well, you can proceed to [connect a client](../../connect-a-client).

@@ -1,10 +1,11 @@
 ## Quick install instructions
 !!! warning
-    python 3.12 and node.js v21 are currently not supported.
+    python 3.12 is currently not supported
 1. Download and install Python 3.10 or Python 3.11 from the [official Python website](https://www.python.org/downloads/windows/).
-1. Download and install Node.js v20 from the [official Node.js website](https://nodejs.org/en/download/). This will also install npm.
+    - [Click here for direct link to python 3.11.9 download](https://www.python.org/downloads/release/python-3119/)
+1. Download and install Node.js from the [official Node.js website](https://nodejs.org/en/download/prebuilt-installer). This will also install npm.
 1. Download the Talemate project to your local machine. Download from [the Releases page](https://github.com/vegu-ai/talemate/releases).
 1. Unpack the download and run `install.bat` by double clicking it. This will set up the project on your local machine.
 1. Once the installation is complete, you can start the backend and frontend servers by running `start.bat`.
@@ -17,13 +18,12 @@ If everything went well, you can proceed to [connect a client](../../connect-a-c
 ### How to Install Python 3.10 or 3.11
 1. Visit the official Python website's download page for Windows at [https://www.python.org/downloads/windows/](https://www.python.org/downloads/windows/).
-2. Click on the link for the Latest Python 3 Release - Python 3.10.x.
-3. Scroll to the bottom and select either Windows x86-64 executable installer for 64-bit or Windows x86 executable installer for 32-bit.
+2. Find the latest version of Python 3.10 or 3.11 and click on one of the download links. (You will likely want the Windows installer (64-bit))
 4. Run the installer file and follow the setup instructions. Make sure to check the box that says Add Python 3.10 to PATH before you click Install Now.
 ### How to Install npm
-1. Download Node.js from the official site [https://nodejs.org/en/download/](https://nodejs.org/en/download/).
+1. Download Node.js from the official site [https://nodejs.org/en/download/prebuilt-installer](https://nodejs.org/en/download/prebuilt-installer).
 2. Run the installer (the .msi installer is recommended).
 3. Follow the prompts in the installer (Accept the license agreement, click the NEXT button a bunch of times and accept the default installation settings).

@@ -6,6 +6,9 @@ To load the introductory `Infinity Quest` scenario, simply click on its entry in
 ![Load infinity quest](/talemate/img/0.26.0/getting-started-load-screen.png)
+!!! info "First time may take a moment"
+    When you load a scenario for the first time, Talemate will need to initialize the long term memory model, which likely means a download. Just be patient and it will be ready soon.
 ## Interacting with the scenario
 After a moment of loading, you will see the scenario's introductory message and be able to send a text interaction.

(9 new binary image files added; content not shown in the diff.)

@ -1,60 +0,0 @@
# ChromaDB
Talemate uses ChromaDB to maintain long-term memory. The default embeddings used are really fast but also not incredibly accurate. If you want to use more accurate embeddings you can use the instructor embeddings or the openai embeddings. See below for instructions on how to enable these.
In my testing so far, instructor-xl has proved to be the most accurate (even more-so than openai)
### Local instructor embeddings
If you want chromaDB to use the more accurate (but much slower) instructor embeddings add the following to `config.yaml`:
**Note**: The `xl` model takes a while to load even with cuda. Expect a minute of loading time on the first scene you load.
```yaml
chromadb:
embeddings: instructor
instructor_device: cpu
instructor_model: hkunlp/instructor-xl
```
### Instructor embedding models
- `hkunlp/instructor-base` (smallest / fastest)
- `hkunlp/instructor-large`
- `hkunlp/instructor-xl` (largest / slowest) - requires about 5GB of memory
You will need to restart the backend for this change to take effect.
**NOTE** - The first time you do this it will need to download the instructor model you selected. This may take a while, and the talemate backend will be un-responsive during that time.
Once the download is finished, if talemate is still un-responsive, try reloading the front-end to reconnect. When all fails just restart the backend as well. I'll try to make this more robust in the future.
### GPU support
If you want to use the instructor embeddings with GPU support, you will need to install pytorch with CUDA support.
To do this on windows, run `install-pytorch-cuda.bat` from the project directory. Then change your device in the config to `cuda`:
```yaml
chromadb:
embeddings: instructor
instructor_device: cuda
instructor_model: hkunlp/instructor-xl
```
## OpenAI embeddings
First make sure your openai key is specified in the `config.yaml` file
```yaml
openai:
api_key: <your-key-here>
```
Then add the following to `config.yaml` for chromadb:
```yaml
chromadb:
embeddings: openai
openai_model: text-embedding-3-small
```


@ -0,0 +1,81 @@
# Embeddings
You can manage your available embeddings through the application settings.
![Open settings](/talemate/img/0.26.0/open-settings.png)
In the settings dialogue go to **:material-tune: Presets** and then **:material-cube-unfolded: Embeddings**.
## Pre-configured Embeddings
### all-MiniLM-L6-v2
The default ChromaDB embedding. Also the default for the Memory agent unless changed.
Fast, but the least accurate.
### Alibaba-NLP/Gte-Base-En-V1.5
Sentence transformer model that is decently fast and accurate and will likely become the default for the Memory agent in the future.
### Instructor Models
Instructor embeddings, coming in three sizes: `base`, `large`, and `xl`. XL is the most accurate but also has the biggest memory footprint and is the slowest. Using `cuda` is recommended for the `xl` and `large` models.
### OpenAI text-embedding-3-small
OpenAI's current text embedding model. Fast and accurate, but not free.
## Adding an Embedding
You can add new embeddings by clicking the **:material-plus: Add new** button.
Select the embedding type and then enter the model name. When using sentence-transformer, make sure the model name matches the name of the model repository on Huggingface, for example `Alibaba-NLP/gte-base-en-v1.5`.
![Add new embedding](/talemate/img/0.27.0/embedding-settings-new-1.png)
!!! warning "New embeddings require a download"
When you add a new embedding model and use it for the first time in the Memory agent, Talemate will download the model from Huggingface. This can take a while, depending on the size of the model and your internet connection.
You can track the download in the talemate process window. A better UX based download progress bar is planned for a future release.
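For orientation, the snippet below is a rough sketch of what happens behind the scenes when a custom sentence-transformer entry is used: the preset resolves to chromadb's `SentenceTransformerEmbeddingFunction`, the same call the Memory agent uses in this release. The model name and device shown here are placeholders.

```python
# Rough sketch (assumes chromadb and sentence-transformers >= 2.7 are installed).
from chromadb.utils import embedding_functions

ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="Alibaba-NLP/gte-base-en-v1.5",  # must match the Huggingface repository name
    device="cuda",                              # or "cpu"
    trust_remote_code=True,                     # only for models whose creator you trust
)

# The scene collection is then created with this embedding function and the
# configured distance function (see "Distance Function" below), e.g.
# client.get_or_create_collection("scene-memory", embedding_function=ef,
#                                 metadata={"hnsw:space": "cosine"})
```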
## Editing an Embedding
![Edit embedding](/talemate/img/0.27.0/embedding-settings-edit.png)
Select the existing embedding from the left side bar and you may change the following properties:
##### Trust Remote Code
For custom sentence-transformer models, you may need to toggle this on. This can be a security risk, so only do this if you trust the model's creator. It basically allows remote code execution.
!!! warning
Only trust models from reputable sources.
##### Device
The device to use for the embeddings. This can be either `cpu` or `cuda`. Note that this can also be overridden in the Memory agent settings.
##### Distance
The maximum distance for results to be considered a match. Different embeddings may require different distances, so if you find low accuracy, try changing this value.
##### Distance Mod
A multiplier for the distance. This can be used to fine-tune the distance without changing the actual distance value. Generally you should leave this at 1.
##### Distance Function
The function to use for calculating the distance. The default is `Cosine Similarity`, but you can also use `Inner Product` or `Squared L2`. The selected embedding may require a specific distance function, so if you find low accuracy, try changing this value.
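As a rough illustration (not Talemate's exact code), the three options correspond to ChromaDB's `l2`, `ip` and `cosine` spaces, and a result only counts as a match while its distance stays below Distance multiplied by Distance Mod:

```python
# Illustrative sketch; the vector values and threshold are made up.
import numpy as np

a = np.array([0.1, 0.7, 0.2])  # query embedding (assumed already computed)
b = np.array([0.2, 0.6, 0.1])  # stored document embedding

squared_l2 = float(np.sum((a - b) ** 2))                                         # "Squared L2"
inner_product = 1.0 - float(np.dot(a, b))                                        # "Inner Product" distance
cosine = 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))     # "Cosine Similarity" distance

max_distance = 1.0 * 1.0          # Distance * Distance Mod from the settings above
is_match = cosine < max_distance  # only matching results are added to the context
```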
##### Fast
This is just a tag to mark the embedding as fast. It doesn't actually do anything, but can be useful for sorting later on.
##### GPU Recommendation
This is a tag to mark the embedding as needing a GPU. It doesn't actually do anything, but can be useful for sorting later on.
##### Local
This is a tag to mark the embedding as local. It doesn't actually do anything, but can be useful for sorting later on.

@@ -2,4 +2,4 @@
 Manages long term memory via embeddings.
-Currently only supports [ChromaDB](/talemate/user-guide/agents/memory/chromadb) as a memory story.
+Currently only supports ChromaDB as a backend, but support for additional backends is planned.


@ -0,0 +1,14 @@
# Settings
![Memory agent settings](/talemate/img/0.27.0/memory-agent-settings.png)
##### Embeddings
Select which embedding to use. Embeddings themselves are managed through the [Application Settings](/talemate/agents/memory/embeddings).
!!! info "openAI"
If you are using the OpenAI API, you will need to have an API key and set it in the application config. See [here](/apis/openai.md) for setting up the OpenAI API key.
###### Device
The device to use for the embeddings. This can be either `cpu` or `cuda`.


@ -0,0 +1,23 @@
# Testing Embeddings
You can test the performance of the selected embedding by using talemate normally and then inspecting the memory requests in the debug tools view.
![Open debug tools](/talemate/img/0.27.0/open-debug-tools.png)
Once the debug tools are open, select the :material-processor: Memory tab.
Then wait for the next talemate generation (for example, a conversation) and all the memory requests will be shown in the list.
![Testing memory 1](/talemate/img/0.27.0/testing-memory-1.png)
In this particular example we are asking Kaira when we first met, and the expectation is for the memory agent to find and return the memory of the first meeting.
Click the memory request to see the details.
![Testing memory 2](/talemate/img/0.27.0/testing-memory-2.png)
Up to 10 results are shown; however, only those that fall within the acceptable distance are included in the context.
Selected entries will have their distance function colored green, while the others range from yellow to red.
If you find that accuracy is lacking you may need to tweak the [Embedding settings](/talemate/user-guide/agents/memory/embeddings).

@@ -15,7 +15,7 @@ Once your AUTOAMTIC1111 API is running (check with your browser) you can set the
 ## Settings
-![Visual agent automatic1111 settings](/talemate/img/0.26.0/visual-agent-a1111-settings.png)
+![Visual agent automatic1111 settings](/talemate/img/0.27.0/automatic1111-settings.png)
 ##### API URL
@@ -25,6 +25,18 @@ The url of the API, if following this example, should be `http://localhost:7861`
 The number of steps to use for image generation. More steps will result in higher quality images but will take longer to generate.
+##### Sampling Method
+Which sampling method to use for image generation.
+##### Schedule Type
+Which scheduler to use for image generation.
+##### CFG Scale
+CFG scale for image generation.
 ##### Model type
 Differentiates between `SD1.5` and `SDXL` models. This will dictate the resolution of the image generation and actually matters for the quality so make sure this is set to the correct model type for the model you are using.

@@ -1,6 +1,6 @@
 # Settings
-![Visual agent settings](/talemate/img/0.26.0/visual-agent-settings.png)
+![Visual agent settings](/talemate/img/0.27.0/visual-agent-settings.png)
 ##### Client
@@ -27,6 +27,10 @@ The style to use for image generation. Prompts will be automatically adjusted to
 More styles will be added in the future and support for custom styles will be added as well.
+##### Image generation timeout
+The maximum time to wait for image generation to complete. If the image generation takes longer than this, it will be cancelled. Defaults to 300 seconds.
 ##### Automatic Setup
 Some clients support both text and image generation. If this setting is enabled, the visualizer will automatically set itself up to use the backend of the client you have selected. This is currently only supported by KoboldCpp.

@@ -56,3 +56,13 @@ You can turn this off by disabling the auto progress setting, either in the game
 ![Tool bar](/talemate/img/0.26.0/getting-started-ui-element-tools.png)
 A set of tools to help you interact with the scenario. Find out more about the various actions in the [Scene Tools](/talemate/user-guide/scenario-tools) section of the user guide.
+## Cancel Generation
+Sometimes Talemate will be generating a response (or go through a chain of generations) and you want to cancel it. You can do this by hitting the **:material-stop-circle-outline:** button that will appear in the scene tools bar.
+![Cancel generation](/talemate/img/0.27.0/cancel-generation.png)
+!!! info
+    While the generation is cancelled immediately, the current inference request will still be processed by the LLM backend. The Talemate UI will be responsive but the LLM api may require some time to finish the request.

frontend_wsgi.py (new file, 28 lines added)

@@ -0,0 +1,28 @@
import os

from fastapi import FastAPI, Request
from fastapi.responses import HTMLResponse, FileResponse
from fastapi.staticfiles import StaticFiles
from starlette.exceptions import HTTPException

# Get the directory of the current file
current_dir = os.path.dirname(os.path.abspath(__file__))

# Construct the path to the dist directory
dist_dir = os.path.join(current_dir, "talemate_frontend", "dist")

app = FastAPI()

# Serve static files, but exclude the root path
app.mount("/", StaticFiles(directory=dist_dir, html=True), name="static")


@app.get("/", response_class=HTMLResponse)
async def serve_root():
    index_path = os.path.join(dist_dir, "index.html")
    if os.path.exists(index_path):
        with open(index_path, "r") as f:
            content = f.read()
        return HTMLResponse(content=content)
    else:
        raise HTTPException(status_code=404, detail="index.html not found")


# This is the ASGI application
application = app
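A minimal sketch of how this ASGI app could be served on its own with uvicorn; the host and port are assumptions mirroring the defaults used by the unified process, which normally serves the frontend for you:

```python
# Hypothetical standalone usage of the frontend app (not part of this commit).
import uvicorn

if __name__ == "__main__":
    # serves talemate_frontend/dist on port 8080, matching --frontend-port 8080
    uvicorn.run("frontend_wsgi:application", host="0.0.0.0", port=8080)
```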


@ -1,6 +0,0 @@
REM activate the virtual environment
call talemate_env\Scripts\activate
REM install pytouch+cuda
pip uninstall torch -y
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

@@ -48,12 +48,33 @@ python -m pip install "poetry==1.7.1" "rapidfuzz>=3" -U
 REM use poetry to install dependencies
 python -m poetry install
+REM installing torch
+echo Installing PyTorch...
+echo Checking for CUDA availability...
+REM we use nvcc to check for CUDA availability
+REM if cuda exists: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+nvcc --version >nul 2>&1
+IF ERRORLEVEL 1 (
+    echo CUDA not found. Keeping PyTorch installation without CUDA support...
+) ELSE (
+    echo CUDA found. Installing PyTorch with CUDA support...
+    REM uninstall existing torch, torchaudio
+    python -m pip uninstall torch torchaudio -y
+    python -m pip install torch~=2.4.1 torchaudio~=2.4.1 --index-url https://download.pytorch.org/whl/cu121
+)
 REM copy config.example.yaml to config.yaml only if config.yaml doesn't exist
 IF NOT EXIST config.yaml copy config.example.yaml config.yaml
 REM navigate to the frontend directory
+echo Installing frontend dependencies...
 cd talemate_frontend
-npm install
+call npm install
+echo Building frontend...
+call npm run build
 REM return to the root directory
 cd ..

@@ -1,26 +1,49 @@
 #!/bin/bash
 # create a virtual environment
+echo "Creating a virtual environment..."
 python3 -m venv talemate_env
 # activate the virtual environment
+echo "Activating the virtual environment..."
 source talemate_env/bin/activate
 # install poetry
+echo "Installing poetry..."
 pip install poetry
 # use poetry to install dependencies
+echo "Installing dependencies..."
 poetry install
+# ask whether to install torch with CUDA support
+read -p "Do you want to install PyTorch with CUDA support? (y/n): " cuda
+# install torch with CUDA support if the user wants to
+# pip install torch~=2.4.1 torchaudio~=2.4.1 --index-url https://download.pytorch.org/whl/cu121
+# if not, torch with cpu is already installed so nothing needs to be done
+if [ $cuda == "y" ]; then
+    echo "Installing PyTorch with CUDA support..."
+    # uninstall torch and torchaudio
+    pip uninstall torch torchaudio -y
+    pip install torch~=2.4.1 torchaudio~=2.4.1 --index-url https://download.pytorch.org/whl/cu121
+fi
 # copy config.example.yaml to config.yaml only if config.yaml doesn't exist
 if [ ! -f config.yaml ]; then
+    echo "Copying config.example.yaml to config.yaml..."
     cp config.example.yaml config.yaml
 fi
 # navigate to the frontend directory
+echo "Updating the frontend..."
 cd talemate_frontend
 npm install
+# build the frontend
+echo "Building the frontend..."
+npm run build
 # return to the root directory
 cd ..

poetry.lock (generated, 3,821 lines changed; diff suppressed because it is too large)

@@ -4,7 +4,7 @@ build-backend = "poetry.masonry.api"
 [tool.poetry]
 name = "talemate"
-version = "0.25.6"
+version = "0.27.0"
 description = "AI-backed roleplay and narrative tools"
 authors = ["FinalWombat"]
 license = "GNU Affero General Public License v3.0"
@@ -50,10 +50,11 @@ numpy = "^1"
 # ChromaDB
 chromadb = ">=0.4.17,<1"
 InstructorEmbedding = "^1.0.1"
-torch = ">=2.1.0"
-torchaudio = ">=2.3.0"
+torch = "^2.4.0"
+torchaudio = "^2.4.0"
 # locked for instructor embeddings
-sentence-transformers="==2.2.2"
+#sentence-transformers="==2.2.2"
+sentence_transformers=">=2.7.0"
 [tool.poetry.dev-dependencies]
 pytest = "^6.2"


@ -448,7 +448,7 @@ def game(TM):
def call_remove_ai_character(self, call:str, inject:str) -> str: def call_remove_ai_character(self, call:str, inject:str) -> str:
TM.signals.status("busy", "Simulation suite removing character.", as_scene_message=True) TM.signals.status("busy", "Simulation suite removing character.", as_scene_message=True)
character_name = TM.agents.creator.determine_character_name(instructions=f"{inject} - what is the name of the character being removed?", allowed_names=TM.scene.npc_character_names()) character_name = TM.agents.creator.determine_character_name(instructions=f"{inject} - what is the name of the character being removed?", allowed_names=TM.scene.npc_character_names)
npc = TM.scene.get_character(character_name) npc = TM.scene.get_character(character_name)
@ -466,7 +466,7 @@ def game(TM):
def call_change_ai_character(self, call:str, inject:str) -> str: def call_change_ai_character(self, call:str, inject:str) -> str:
TM.signals.status("busy", "Simulation suite altering character.", as_scene_message=True) TM.signals.status("busy", "Simulation suite altering character.", as_scene_message=True)
character_name = TM.agents.creator.determine_character_name(instructions=f"{inject} - what is the name of the character receiving the changes (before the change)?", allowed_names=TM.scene.npc_character_names()) character_name = TM.agents.creator.determine_character_name(instructions=f"{inject} - what is the name of the character receiving the changes (before the change)?", allowed_names=TM.scene.npc_character_names)
if character_name in self.added_npcs: if character_name in self.added_npcs:
# we dont want to change the character if it was just added # we dont want to change the character if it was just added
@ -577,4 +577,14 @@ def game(TM):
TM.signals.status("success", "Simulation suite updated world state.", as_scene_message=True) TM.signals.status("success", "Simulation suite updated world state.", as_scene_message=True)
SimulationSuite().run() SimulationSuite().run()
def on_generation_cancelled(TM, exc):
"""
Called when user pressed the cancel button during the simulation suite
loop.
"""
TM.signals.status("success", "Simulation suite instructions cancelled", as_scene_message=True)
rounds = TM.game_state.get_var("instr.rounds", 0)
TM.log.debug("SIMULATION SUITE: command cancelled", rounds=rounds)

@@ -2,4 +2,6 @@ from .agents import Agent
 from .client import TextGeneratorWebuiClient
 from .tale_mate import *
-VERSION = "0.26.0"
+from .version import VERSION
+
+__version__ = VERSION


@ -17,6 +17,10 @@ import talemate.util as util
from talemate.agents.context import ActiveAgent from talemate.agents.context import ActiveAgent
from talemate.emit import emit from talemate.emit import emit
from talemate.events import GameLoopStartEvent from talemate.events import GameLoopStartEvent
from talemate.context import active_scene
from talemate.client.context import (
ClientContext
)
__all__ = [ __all__ = [
"Agent", "Agent",
@ -81,17 +85,23 @@ def set_processing(fn):
@wraps(fn) @wraps(fn)
async def wrapper(self, *args, **kwargs): async def wrapper(self, *args, **kwargs):
with ActiveAgent(self, fn): with ClientContext():
try: scene = active_scene.get()
await self.emit_status(processing=True)
return await fn(self, *args, **kwargs) if scene:
finally: scene.continue_actions()
with ActiveAgent(self, fn):
try: try:
await self.emit_status(processing=False) await self.emit_status(processing=True)
except RuntimeError as exc: return await fn(self, *args, **kwargs)
# not sure why this happens finally:
# some concurrency error? try:
log.error("error emitting agent status", exc=exc) await self.emit_status(processing=False)
except RuntimeError as exc:
# not sure why this happens
# some concurrency error?
log.error("error emitting agent status", exc=exc)
return wrapper return wrapper

@@ -1,4 +1,5 @@
 import contextvars
+import uuid
 from typing import TYPE_CHECKING, Callable
 import pydantic
@@ -14,6 +15,7 @@ class ActiveAgentContext(pydantic.BaseModel):
     agent: object
     fn: Callable
     agent_stack: list = pydantic.Field(default_factory=list)
+    agent_stack_uid: str | None = None
     class Config:
         arbitrary_types_allowed = True
@@ -36,8 +38,10 @@ class ActiveAgent:
         if previous_agent:
             self.agent.agent_stack = previous_agent.agent_stack + [str(self.agent)]
+            self.agent.agent_stack_uid = previous_agent.agent_stack_uid
         else:
             self.agent.agent_stack = [str(self.agent)]
+            self.agent.agent_stack_uid = str(uuid.uuid4())
         self.token = active_agent.set(self.agent)

@@ -106,7 +106,7 @@ class ConversationAgent(Agent):
                 min=32,
                 max=512,
                 step=32,
-            ),
+            # ),
             "instructions": AgentActionConfig(
                 type="text",
                 label="Instructions",


@ -2,20 +2,31 @@ from __future__ import annotations
import asyncio import asyncio
import functools import functools
import os import hashlib
import shutil import uuid
from typing import TYPE_CHECKING, Callable, List, Optional, Union from typing import Callable
import structlog import structlog
from chromadb.config import Settings from chromadb.config import Settings
import talemate.events as events import talemate.events as events
import talemate.util as util import talemate.util as util
from talemate.agents.base import set_processing from talemate.agents.base import (
Agent,
AgentAction,
AgentActionConfig,
AgentDetail,
set_processing,
)
from talemate.config import load_config from talemate.config import load_config
from talemate.context import scene_is_loading from talemate.context import scene_is_loading, active_scene
from talemate.emit import emit from talemate.emit import emit
from talemate.emit.signals import handlers from talemate.emit.signals import handlers
from talemate.agents.memory.context import memory_request, MemoryRequest
from talemate.agents.memory.exceptions import (
EmbeddingsModelLoadError,
SetDBError,
)
try: try:
import chromadb import chromadb
@ -30,8 +41,7 @@ if not chromadb:
log.info("ChromaDB not found, disabling Chroma agent") log.info("ChromaDB not found, disabling Chroma agent")
from .base import Agent, AgentDetail from talemate.agents.registry import register
class MemoryDocument(str): class MemoryDocument(str):
def __new__(cls, text, meta, id, raw): def __new__(cls, text, meta, id, raw):
@ -51,7 +61,44 @@ class MemoryAgent(Agent):
""" """
agent_type = "memory" agent_type = "memory"
verbose_name = "Long-term memory" verbose_name = "Memory"
def __init__(self, scene, **kwargs):
self.db = None
self.scene = scene
self.memory_tracker = {}
self.config = load_config()
self._ready_to_add = False
handlers["config_saved"].connect(self.on_config_saved)
self.actions = {
"_config": AgentAction(
enabled=True,
label="Configure",
description="Memory agent configuration",
config={
"embeddings": AgentActionConfig(
type="text",
value="default",
label="Embeddings",
choices=self.get_presets,
description="Which embeddings to use",
),
"device": AgentActionConfig(
type="text",
value="cpu",
label="Device",
description="Which device to use for embeddings (for local embeddings)",
note="Making changes to the embeddings or the device while a scene is loaded will cause the memory database to be re-imported. Depending on the size of the model and scene this may take a while.",
choices=[
{"value": "cpu", "label": "CPU"},
{"value": "cuda", "label": "CUDA"},
]
),
},
),
}
@property @property
def readonly(self): def readonly(self):
@ -65,29 +112,130 @@ class MemoryAgent(Agent):
@property @property
def db_name(self): def db_name(self):
raise NotImplementedError() raise NotImplementedError()
@property
def get_presets(self):
return [
{"value": k, "label": f"{v['embeddings']}: {v['model']}"} for k,v in self.config.get("presets", {}).get("embeddings", {}).items()
]
@property
def embeddings_config(self):
_embeddings = self.actions["_config"].config["embeddings"].value
return self.config.get("presets", {}).get("embeddings", {}).get(_embeddings, {})
@property
def embeddings(self):
return self.embeddings_config.get("embeddings", "sentence-transformer")
@property
def using_openai_embeddings(self):
return self.embeddings == "openai"
@classmethod @property
def config_options(cls, agent=None): def using_instructor_embeddings(self):
return {} return self.embeddings == "instructor"
@property
def using_sentence_transformer_embeddings(self):
return self.embeddings == "default" or self.embeddings == "sentence-transformer"
@property
def using_local_embeddings(self):
return self.embeddings in [
"instructor",
"sentence-transformer",
"default"
]
@property
def max_distance(self) -> float:
distance = float(self.embeddings_config.get("distance", 1.0))
distance_mod = float(self.embeddings_config.get("distance_mod", 1.0))
return distance * distance_mod
@property
def model(self):
return self.embeddings_config.get("model")
@property
def distance_function(self):
return self.embeddings_config.get("distance_function", "l2")
def __init__(self, scene, **kwargs): @property
self.db = None def device(self) -> str:
self.scene = scene return self.actions["_config"].config["device"].value
self.memory_tracker = {}
self.config = load_config()
self._ready_to_add = False
handlers["config_saved"].connect(self.on_config_saved) @property
def trust_remote_code(self) -> bool:
return self.embeddings_config.get("trust_remote_code", False)
@property
def fingerprint(self) -> str:
"""
Returns a unique fingerprint for the current configuration
"""
return f"{self.embeddings}-{self.model.replace('/','-')}-{self.distance_function}-{self.device}-{self.trust_remote_code}".lower()
async def apply_config(self, *args, **kwargs):
_fingerprint = self.fingerprint
await super().apply_config(*args, **kwargs)
fingerprint_changed = _fingerprint != self.fingerprint
# have embeddings or device changed?
if fingerprint_changed:
log.warning("memory agent", status="embedding function changed", old=_fingerprint, new=self.fingerprint)
await self.handle_embeddings_change()
@set_processing
async def handle_embeddings_change(self):
scene = active_scene.get()
if not scene or not scene.get_helper("memory"):
return
self.close_db(scene)
emit("status", "Re-importing context database", status="busy")
await scene.commit_to_memory()
if not scene.immutable_save:
await scene.save(auto=True)
emit("status", "Context database re-imported", status="success")
def on_config_saved(self, event): def on_config_saved(self, event):
loop = asyncio.get_running_loop()
openai_key = self.openai_api_key openai_key = self.openai_api_key
fingerprint = self.fingerprint
self.config = load_config() self.config = load_config()
if fingerprint != self.fingerprint:
log.warning("memory agent", status="embedding function changed", old=fingerprint, new=self.fingerprint)
loop.run_until_complete(self.handle_embeddings_change())
if openai_key != self.openai_api_key: if openai_key != self.openai_api_key:
loop = asyncio.get_running_loop()
loop.run_until_complete(self.emit_status()) loop.run_until_complete(self.emit_status())
@set_processing
async def set_db(self): async def set_db(self):
raise NotImplementedError() loop = asyncio.get_running_loop()
try:
await loop.run_in_executor(None, self._set_db)
except EmbeddingsModelLoadError:
raise
except Exception as e:
log.error("memory agent", error="failed to set db", details=e)
if "torchvision::nms does not exist" in str(e):
raise SetDBError("The embeddings you are trying to use require the `torchvision` package to be installed")
raise SetDBError(str(e))
def close_db(self): def close_db(self):
raise NotImplementedError() raise NotImplementedError()
@ -180,12 +328,6 @@ class MemoryAgent(Agent):
""" """
raise NotImplementedError() raise NotImplementedError()
def _delete(self, meta: dict):
"""
Delete an object from the memory
"""
raise NotImplementedError()
@set_processing @set_processing
async def delete(self, meta: dict): async def delete(self, meta: dict):
""" """
@ -201,13 +343,20 @@ class MemoryAgent(Agent):
loop = asyncio.get_running_loop() loop = asyncio.get_running_loop()
await loop.run_in_executor(None, self._delete, meta) await loop.run_in_executor(None, self._delete, meta)
def _delete(self, meta: dict):
"""
Delete an object from the memory
"""
raise NotImplementedError()
@set_processing @set_processing
async def get(self, text, character=None, **query): async def get(self, text, character=None, **query):
loop = asyncio.get_running_loop() with MemoryRequest(query=text, query_params=query) as active_memory_request:
active_memory_request.max_distance = self.max_distance
return await loop.run_in_executor( return await asyncio.to_thread(self._get, text, character, **query)
None, functools.partial(self._get, text, character, **query) #return await loop.run_in_executor(
) # None, functools.partial(self._get, text, character, **query)
#)
def _get(self, text, character=None, **query): def _get(self, text, character=None, **query):
raise NotImplementedError() raise NotImplementedError()
@ -229,22 +378,6 @@ class MemoryAgent(Agent):
super().connect(scene) super().connect(scene)
scene.signals["archive_add"].connect(self.on_archive_add) scene.signals["archive_add"].connect(self.on_archive_add)
def add_chunks(self, lines: list[str], chunk_size=200):
current_chunk = []
current_size = 0
for line in lines:
current_size += util.count_tokens(line)
if current_size > chunk_size:
self.add("\n".join(current_chunk))
current_chunk = [line]
current_size = util.count_tokens(line)
else:
current_chunk.append(line)
if current_chunk:
self.add("\n".join(current_chunk))
async def memory_context( async def memory_context(
self, self,
name: str, name: str,
@ -274,6 +407,7 @@ class MemoryAgent(Agent):
break break
return memory_context return memory_context
@set_processing
async def query( async def query(
self, self,
query: str, query: str,
@ -294,6 +428,7 @@ class MemoryAgent(Agent):
except IndexError: except IndexError:
return None return None
@set_processing
async def multi_query( async def multi_query(
self, self,
queries: list[str], queries: list[str],
@ -334,9 +469,6 @@ class MemoryAgent(Agent):
return memory_context return memory_context
from .registry import register
@register(condition=lambda: chromadb is not None) @register(condition=lambda: chromadb is not None)
class ChromaDBMemoryAgent(MemoryAgent): class ChromaDBMemoryAgent(MemoryAgent):
requires_llm_client = False requires_llm_client = False
@ -372,9 +504,21 @@ class ChromaDBMemoryAgent(MemoryAgent):
"embeddings": AgentDetail( "embeddings": AgentDetail(
icon="mdi-cube-unfolded", icon="mdi-cube-unfolded",
value=self.embeddings, value=self.embeddings,
description="The embeddings type.",
).model_dump(),
"model": AgentDetail(
icon="mdi-brain",
value=self.model,
description="The embeddings model.", description="The embeddings model.",
).model_dump(), ).model_dump(),
} }
if self.using_local_embeddings:
details["device"] = AgentDetail(
icon="mdi-memory",
value=self.device,
description="The device to use for embeddings.",
).model_dump()
if self.embeddings == "openai" and not self.openai_api_key: if self.embeddings == "openai" and not self.openai_api_key:
# return "No OpenAI API key set" # return "No OpenAI API key set"
@ -387,48 +531,6 @@ class ChromaDBMemoryAgent(MemoryAgent):
return details return details
@property
def embeddings(self):
"""
Returns which embeddings to use
will read from TM_CHROMADB_EMBEDDINGS env variable and default to 'default' using
the default embeddings specified by chromadb.
other values are
- openai: use openai embeddings
- instructor: use instructor embeddings
for `openai`:
you will also need to provide an `OPENAI_API_KEY` env variable
for `instructor`:
you will also need to provide which instructor model to use with the `TM_INSTRUCTOR_MODEL` env variable, which defaults to hkunlp/instructor-xl
additionally you can provide the `TM_INSTRUCTOR_DEVICE` env variable to specify which device to use, which defaults to cpu
"""
embeddings = self.config.get("chromadb").get("embeddings")
assert embeddings in [
"default",
"openai",
"instructor",
], f"Unknown embeddings {embeddings}"
return embeddings
@property
def USE_OPENAI(self):
return self.embeddings == "openai"
@property
def USE_INSTRUCTOR(self):
return self.embeddings == "instructor"
@property @property
def db_name(self): def db_name(self):
return getattr(self, "collection_name", "<unnamed>") return getattr(self, "collection_name", "<unnamed>")
@ -437,38 +539,25 @@ class ChromaDBMemoryAgent(MemoryAgent):
def openai_api_key(self): def openai_api_key(self):
return self.config.get("openai", {}).get("api_key") return self.config.get("openai", {}).get("api_key")
def make_collection_name(self, scene): def make_collection_name(self, scene) -> str:
if self.USE_OPENAI: # generate plain text collection name
model_name = self.config.get("chromadb").get( collection_name = f"{self.fingerprint}"
"openai_model", "text-embedding-3-small"
) # chromadb collection names have the following rules:
if model_name == "text-embedding-ada-002": # Expected collection name that (1) contains 3-63 characters, (2) starts and ends with an alphanumeric character, (3) otherwise contains only alphanumeric characters, underscores or hyphens (-), (4) contains no two consecutive periods (..) and (5) is not a valid IPv4 address
suffix = "-openai"
else:
suffix = f"-openai-{model_name}"
elif self.USE_INSTRUCTOR:
suffix = "-instructor"
model = self.config.get("chromadb").get(
"instructor_model", "hkunlp/instructor-xl"
)
if "xl" in model:
suffix += "-xl"
elif "large" in model:
suffix += "-large"
else:
suffix = ""
return f"{scene.memory_id}-tm{suffix}"
# Step 1: Hash the input string using MD5
md5_hash = hashlib.md5(collection_name.encode()).hexdigest()
# Step 2: Ensure the result is exactly 32 characters long
hashed_collection_name = md5_hash[:32]
return f"{scene.memory_id}-tm-{hashed_collection_name}"
async def count(self): async def count(self):
await asyncio.sleep(0) await asyncio.sleep(0)
return self.db.count() return self.db.count()
@set_processing
async def set_db(self):
loop = asyncio.get_running_loop()
await loop.run_in_executor(None, self._set_db)
def _set_db(self): def _set_db(self):
self._ready_to_add = False self._ready_to_add = False
@ -485,20 +574,21 @@ class ChromaDBMemoryAgent(MemoryAgent):
log.info( log.info(
"chromadb agent", status="setting up db", collection_name=collection_name "chromadb agent", status="setting up db", collection_name=collection_name
) )
if self.USE_OPENAI: distance_function = self.distance_function
collection_metadata = {"hnsw:space": distance_function}
device = self.actions["_config"].config["device"].value
model_name = self.model
if self.using_openai_embeddings:
if not openai_key: if not openai_key:
raise ValueError( raise ValueError(
"You must provide an the openai ai key in the config if you want to use it for chromadb embeddings" "You must provide an the openai ai key in the config if you want to use it for chromadb embeddings"
) )
model_name = self.config.get("chromadb").get(
"openai_model", "text-embedding-3-small"
)
log.info( log.info(
"crhomadb", "chromadb",
status="using openai", embeddings="OpenAI",
openai_key=openai_key[:5] + "...", openai_key=openai_key[:5] + "...",
model=model_name, model=model_name,
) )
@ -507,38 +597,49 @@ class ChromaDBMemoryAgent(MemoryAgent):
model_name=model_name, model_name=model_name,
) )
self.db = self.db_client.get_or_create_collection( self.db = self.db_client.get_or_create_collection(
collection_name, embedding_function=openai_ef collection_name, embedding_function=openai_ef, metadata=collection_metadata
) )
elif self.USE_INSTRUCTOR: elif self.using_instructor_embeddings:
instructor_device = self.config.get("chromadb").get(
"instructor_device", "cpu"
)
instructor_model = self.config.get("chromadb").get(
"instructor_model", "hkunlp/instructor-xl"
)
log.info( log.info(
"chromadb", "chromadb",
status="using instructor", embeddings="Instructor-XL",
model=instructor_model, model=model_name,
device=instructor_device, device=device,
) )
# ef = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-mpnet-base-v2")
ef = embedding_functions.InstructorEmbeddingFunction( ef = embedding_functions.InstructorEmbeddingFunction(
model_name=instructor_model, device=instructor_device model_name=model_name, device=device
) )
log.info("chromadb", status="embedding function ready") log.info("chromadb", status="embedding function ready")
self.db = self.db_client.get_or_create_collection( self.db = self.db_client.get_or_create_collection(
collection_name, embedding_function=ef collection_name, embedding_function=ef, metadata=collection_metadata
) )
log.info("chromadb", status="instructor db ready") log.info("chromadb", status="instructor db ready")
else: else:
log.info("chromadb", status="using default embeddings") log.info(
self.db = self.db_client.get_or_create_collection(collection_name) "chromadb",
embeddins="SentenceTransformer",
model=model_name,
device=device,
distance_function=distance_function
)
try:
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
model_name=model_name,
trust_remote_code=self.trust_remote_code,
device=device
)
except ValueError as e:
if "`trust_remote_code=True` to remove this error" in str(e):
raise EmbeddingsModelLoadError(model_name, "Model requires `Trust remote code` to be enabled")
raise EmbeddingsModelLoadError(model_name, str(e))
self.db = self.db_client.get_or_create_collection(
collection_name, embedding_function=ef, metadata=collection_metadata
)
self.scene._memory_never_persisted = self.db.count() == 0 self.scene._memory_never_persisted = self.db.count() == 0
log.info("chromadb agent", status="db ready") log.info("chromadb agent", status="db ready")
@ -698,23 +799,25 @@ class ChromaDBMemoryAgent(MemoryAgent):
_results = self.db.query(query_texts=[text], where=where, n_results=limit) _results = self.db.query(query_texts=[text], where=where, n_results=limit)
# import json #import json
# print(json.dumps(_results["ids"], indent=2)) #print(json.dumps(_results["ids"], indent=2))
# print(json.dumps(_results["distances"], indent=2)) #print(json.dumps(_results["distances"], indent=2))
results = [] results = []
max_distance = 1.5 max_distance = self.max_distance
if self.USE_INSTRUCTOR:
max_distance = 1 closest = None
elif self.USE_OPENAI:
max_distance = 1 active_memory_request = memory_request.get()
for i in range(len(_results["distances"][0])): for i in range(len(_results["distances"][0])):
distance = _results["distances"][0][i] distance = _results["distances"][0][i]
doc = _results["documents"][0][i] doc = _results["documents"][0][i]
meta = _results["metadatas"][0][i] meta = _results["metadatas"][0][i]
active_memory_request.add_result(doc, distance, meta)
if not meta: if not meta:
log.warning("chromadb agent get", error="no meta", doc=doc) log.warning("chromadb agent get", error="no meta", doc=doc)
@ -725,6 +828,11 @@ class ChromaDBMemoryAgent(MemoryAgent):
# skip pin_only entries # skip pin_only entries
if meta.get("pin_only", False): if meta.get("pin_only", False):
continue continue
if closest is None:
closest = {"distance": distance, "doc": doc}
elif distance < closest["distance"]:
closest = {"distance": distance, "doc": doc}
if distance < max_distance: if distance < max_distance:
date_prefix = self.convert_ts_to_date_prefix(ts) date_prefix = self.convert_ts_to_date_prefix(ts)
@ -736,14 +844,19 @@ class ChromaDBMemoryAgent(MemoryAgent):
doc = MemoryDocument(doc, meta, _results["ids"][0][i], raw) doc = MemoryDocument(doc, meta, _results["ids"][0][i], raw)
results.append(doc) results.append(doc)
else: active_memory_request.accept_result(str(doc), distance, meta)
break
# log.debug("crhomadb agent get", result=results[-1], distance=distance) # log.debug("crhomadb agent get", result=results[-1], distance=distance)
if len(results) > limit: if len(results) > limit:
break break
log.debug("chromadb agent get", closest=closest, max_distance=max_distance)
self.last_query = {
"query": text,
"closest": closest,
}
return results return results
def convert_ts_to_date_prefix(self, ts): def convert_ts_to_date_prefix(self, ts):

New module: talemate.agents.memory.context (99 lines added)

@@ -0,0 +1,99 @@
"""
Context manager that collects and tracks memory agent requests
for profiling and debugging purposes
"""
import contextvars
import pydantic
import structlog
import time
from talemate.emit import emit
from talemate.agents.context import active_agent
__all__ = [
"MemoryRequest",
"start_memory_request"
"MemoryRequestState"
"memory_request"
]
log = structlog.get_logger()
DEBUG_MEMORY_REQUESTS = False
class MemoryRequestResult(pydantic.BaseModel):
doc: str
distance: float
meta: dict = pydantic.Field(default_factory=dict)
class MemoryRequestState(pydantic.BaseModel):
query:str
results: list[MemoryRequestResult] = pydantic.Field(default_factory=list)
accepted_results: list[MemoryRequestResult] = pydantic.Field(default_factory=list)
query_params: dict = pydantic.Field(default_factory=dict)
closest_distance: float | None = None
furthest_distance: float | None = None
max_distance: float | None = None
def add_result(self, doc:str, distance:float, meta:dict):
self.results.append(MemoryRequestResult(doc=doc, distance=distance, meta=meta))
self.closest_distance = min(self.closest_distance, distance) if self.closest_distance is not None else distance
self.furthest_distance = max(self.furthest_distance, distance) if self.furthest_distance is not None else distance
def accept_result(self, doc:str, distance:float, meta:dict):
self.accepted_results.append(MemoryRequestResult(doc=doc, distance=distance, meta=meta))
@property
def closest_text(self):
return str(self.results[0].doc) if self.results else None
memory_request = contextvars.ContextVar("memory_request", default=None)
class MemoryRequest:
def __init__(self, query:str, query_params:dict=None):
self.query = query
self.query_params = query_params
def __enter__(self):
self.state = MemoryRequestState(query=self.query, query_params=self.query_params)
self.token = memory_request.set(self.state)
self.time_start = time.time()
return self.state
def __exit__(self, *args):
self.time_end = time.time()
if DEBUG_MEMORY_REQUESTS:
max_length = 50
query = self.state.query[:max_length]+"..." if len(self.state.query) > max_length else self.state.query
log.debug("MemoryRequest", number_of_results=len(self.state.results), query=query)
log.debug("MemoryRequest", number_of_accepted_results=len(self.state.accepted_results), query=query)
for result in self.state.results:
# distance to 2 decimal places
log.debug("MemoryRequest RESULT", distance=f"{result.distance:.2f}", doc=result.doc[:max_length]+"...")
agent_context = active_agent.get()
emit("memory_request", data=self.state.model_dump(), meta={
"agent_stack": agent_context.agent_stack if agent_context else [],
"agent_stack_uid": agent_context.agent_stack_uid if agent_context else None,
"duration": self.time_end - self.time_start,
}, websocket_passthrough=True)
memory_request.reset(self.token)
return False
# decorator that opens a memory request context
def start_memory_request(query):
    def decorator(fn):
        async def wrapper(*args, **kwargs):
            with MemoryRequest(query):
                return await fn(*args, **kwargs)
        return wrapper
    return decorator
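For orientation, a minimal sketch of how the context manager above might wrap a memory query. The agent method name (agent.get), the import path and the query text are assumptions for illustration, not taken from this diff:

from talemate.agents.memory.context import MemoryRequest  # assumed module path

async def query_with_tracking(agent, text: str, limit: int = 10):
    # open a MemoryRequest so every candidate and accepted result is tracked
    with MemoryRequest(text, query_params={"limit": limit}) as state:
        results = await agent.get(text, limit=limit)
    # on exit the request has already emitted a "memory_request" event;
    # state still exposes the collected data for further inspection
    return results, state.closest_distance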


@ -0,0 +1,18 @@
__all__ = [
'EmbeddingsModelLoadError',
'MemoryAgentError',
'SetDBError'
]
class MemoryAgentError(Exception):
pass
class SetDBError(OSError, MemoryAgentError):
def __init__(self, details:str):
super().__init__(f"Memory Agent - Failed to set up the database: {details}")
class EmbeddingsModelLoadError(ValueError, MemoryAgentError):
def __init__(self, model_name:str, details:str):
super().__init__(f"Memory Agent - Failed to load embeddings model {model_name}: {details}")


@ -8,6 +8,10 @@ from typing import TYPE_CHECKING, Callable, List, Optional, Union
import structlog import structlog
import talemate.client as client import talemate.client as client
from talemate.client.context import (
client_context_attribute,
set_client_context_attribute,
)
import talemate.emit.async_signals import talemate.emit.async_signals
import talemate.util as util import talemate.util as util
from talemate.agents.base import Agent, AgentAction, AgentActionConfig, AgentEmission from talemate.agents.base import Agent, AgentAction, AgentActionConfig, AgentEmission
@ -18,6 +22,8 @@ from talemate.events import GameLoopActorIterEvent
from talemate.prompts import Prompt from talemate.prompts import Prompt
from talemate.scene_message import NarratorMessage from talemate.scene_message import NarratorMessage
from talemate.instance import get_agent
from .registry import register from .registry import register
if TYPE_CHECKING: if TYPE_CHECKING:
@ -75,15 +81,32 @@ class NarratorAgent(Agent):
self.actions = { self.actions = {
"generation_override": AgentAction( "generation_override": AgentAction(
enabled=True, enabled=True,
label="Generation Override", label="Generation Settings",
description="Override generation parameters",
config={ config={
"length": AgentActionConfig(
type="number",
label="Max. Generation Length (tokens)",
description="Maximum number of tokens to generate for narrative text. Some narrative actions generate longer or shorter texts. This value is used as a maximum limit.",
value=192,
min=32,
max=1024,
step=32,
),
"instructions": AgentActionConfig( "instructions": AgentActionConfig(
type="text", type="text",
label="Instructions", label="Instructions",
value="Never wax poetic.", value="Never wax poetic.",
description="Extra instructions to give to the AI for narrative generation.", description="Extra instructions to give to the AI for narrative generation.",
), ),
"jiggle": AgentActionConfig(
type="number",
label="Jiggle (Increased Randomness)",
description="If > 0.0 will cause certain generation parameters to have a slight random offset applied to them. The bigger the number, the higher the potential offset.",
value=0.0,
min=0.0,
max=1.0,
step=0.1,
),
}, },
), ),
"auto_break_repetition": AgentAction( "auto_break_repetition": AgentAction(
@ -138,12 +161,24 @@ class NarratorAgent(Agent):
} }
@property @property
def extra_instructions(self): def extra_instructions(self) -> str:
if self.actions["generation_override"].enabled: if self.actions["generation_override"].enabled:
return self.actions["generation_override"].config["instructions"].value return self.actions["generation_override"].config["instructions"].value
return "" return ""
def clean_result(self, result): @property
def jiggle(self) -> float:
if self.actions["generation_override"].enabled:
return self.actions["generation_override"].config["jiggle"].value
return 0.0
@property
def max_generation_length(self) -> int:
if self.actions["generation_override"].enabled:
return self.actions["generation_override"].config["length"].value
return 128
def clean_result(self, result:str, ensure_dialog_format:bool=True, force_narrative:bool=True) -> str:
""" """
Cleans the result of a narration Cleans the result of a narration
""" """
@ -157,13 +192,36 @@ class NarratorAgent(Agent):
cleaned = [] cleaned = []
for line in result.split("\n"): for line in result.split("\n"):
log.debug("clean_result", line=line)
character_dialogue_detected = False
for character_name in character_names: for character_name in character_names:
if line.startswith(f"{character_name}:"): if line.lower().startswith(f"{character_name}:"):
character_dialogue_detected = True
elif line.startswith(f"{character_name.upper()}"):
character_dialogue_detected = True
if character_dialogue_detected:
break break
if character_dialogue_detected:
break
cleaned.append(line) cleaned.append(line)
result = "\n".join(cleaned) result = "\n".join(cleaned)
# result = util.strip_partial_sentences(result)
result = util.strip_partial_sentences(result)
if force_narrative:
if "*" not in result and '"' not in result:
result = f"*{result.strip()}*"
if ensure_dialog_format:
result = util.ensure_dialog_format(result)
return result return result
def connect(self, scene): def connect(self, scene):
@ -259,17 +317,18 @@ class NarratorAgent(Agent):
}, },
) )
response = response.strip("*") response = self.clean_result(response.strip())
response = util.strip_partial_sentences(response)
response = f"*{response.strip('*')}*"
return response return response
@set_processing @set_processing
async def progress_story(self, narrative_direction: str = None): async def progress_story(self, narrative_direction: str | None = None):
""" """
Narrate the scene Narrate scene progression, moving the plot forward.
Arguments:
- narrative_direction: A string describing the direction the narrative should take. If not provided, will attempt to subtly move the story forward.
""" """
scene = self.scene scene = self.scene
@ -302,13 +361,7 @@ class NarratorAgent(Agent):
self.scene.log.info("progress_story", response=response) self.scene.log.info("progress_story", response=response)
response = self.clean_result(response.strip()) response = self.clean_result(response.strip())
response = response.strip().strip("*")
response = f"*{response}*"
if response.count("*") % 2 != 0:
response = response.replace("*", "")
response = f"*{response}*"
return response return response
@set_processing @set_processing
@ -331,11 +384,11 @@ class NarratorAgent(Agent):
"extra_instructions": self.extra_instructions, "extra_instructions": self.extra_instructions,
}, },
) )
log.info("narrate_query", response=response) response = self.clean_result(
response = self.clean_result(response.strip()) response.strip(),
log.info("narrate_query (after clean)", response=response) ensure_dialog_format=False,
if as_narrative: force_narrative=as_narrative
response = f"*{response}*" )
return response return response
@ -357,8 +410,7 @@ class NarratorAgent(Agent):
}, },
) )
response = self.clean_result(response.strip()) response = self.clean_result(response.strip(), ensure_dialog_format=False, force_narrative=True)
response = f"*{response}*"
return response return response
@ -434,7 +486,6 @@ class NarratorAgent(Agent):
log.info("narrate_time_passage", response=response) log.info("narrate_time_passage", response=response)
response = self.clean_result(response.strip()) response = self.clean_result(response.strip())
response = f"*{response}*"
return response return response
@ -496,7 +547,6 @@ class NarratorAgent(Agent):
) )
response = self.clean_result(response.strip().strip("*")) response = self.clean_result(response.strip().strip("*"))
response = f"*{response}*"
return response return response
@ -520,7 +570,6 @@ class NarratorAgent(Agent):
) )
response = self.clean_result(response.strip().strip("*")) response = self.clean_result(response.strip().strip("*"))
response = f"*{response}*"
return response return response
@ -544,7 +593,6 @@ class NarratorAgent(Agent):
log.info("paraphrase", narration=narration, response=response) log.info("paraphrase", narration=narration, response=response)
response = self.clean_result(response.strip().strip("*")) response = self.clean_result(response.strip().strip("*"))
response = f"*{response}*"
return response return response
@ -629,10 +677,21 @@ class NarratorAgent(Agent):
kind=kind, kind=kind,
agent_function_name=agent_function_name, agent_function_name=agent_function_name,
) )
character_names = [f"\n{c.name}:" for c in self.scene.get_characters()]
# depending on conversation format in the context, stopping strings
# for character names may change format
conversation_agent = get_agent("conversation")
if conversation_agent.conversation_format == "movie_script":
character_names = [f"\n{c.name.upper()}\n" for c in self.scene.get_characters()]
else:
character_names = [f"\n{c.name}:" for c in self.scene.get_characters()]
if prompt_param.get("extra_stopping_strings") is None: if prompt_param.get("extra_stopping_strings") is None:
prompt_param["extra_stopping_strings"] = [] prompt_param["extra_stopping_strings"] = []
prompt_param["extra_stopping_strings"] += character_names prompt_param["extra_stopping_strings"] += character_names
self.set_generation_overrides(prompt_param)
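To illustrate the two stopping-string formats produced above (the character names are made up):

# conversation_format == "movie_script"
movie_script_stops = ["\nALICE\n", "\nBOB\n"]
# any other conversation format
chat_style_stops = ["\nAlice:", "\nBob:"]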
def allow_repetition_break( def allow_repetition_break(
self, kind: str, agent_function_name: str, auto: bool = False self, kind: str, agent_function_name: str, auto: bool = False
@ -641,3 +700,17 @@ class NarratorAgent(Agent):
return False return False
return True return True
def set_generation_overrides(self, prompt_param: dict):
if not self.actions["generation_override"].enabled:
return
prompt_param["max_tokens"] = min(prompt_param.get("max_tokens", 256), self.max_generation_length)
if self.jiggle > 0.0:
nuke_repetition = client_context_attribute("nuke_repetition")
if nuke_repetition == 0.0:
# we only apply the agent override if some other mechanism isn't already
# setting the nuke_repetition value
nuke_repetition = self.jiggle
set_client_context_attribute("nuke_repetition", nuke_repetition)

View file

@ -78,6 +78,15 @@ class VisualBase(Agent):
label="Default Style", label="Default Style",
description="The default style to use for visual processing", description="The default style to use for visual processing",
), ),
"timeout": AgentActionConfig(
type="number",
value=300,
label="Image generation timeout",
min=1,
max=900,
step=50,
description="Timeout in seconds. If the backend does not generate an image within this time, it will be considered failed.",
),
}, },
), ),
"automatic_setup": AgentAction( "automatic_setup": AgentAction(
@ -95,6 +104,7 @@ class VisualBase(Agent):
label="Process in Background", label="Process in Background",
description="Process renders in the background", description="Process renders in the background",
), ),
"_prompts": AgentAction( "_prompts": AgentAction(
enabled=True, enabled=True,
container=True, container=True,
@ -165,6 +175,10 @@ class VisualBase(Agent):
self.actions["_config"].config["default_style"].value, Style() self.actions["_config"].config["default_style"].value, Style()
) )
@property
def generate_timeout(self):
return self.actions["_config"].config["timeout"].value
@property @property
def ready(self): def ready(self):
return self.backend_ready return self.backend_ready


@ -21,6 +21,43 @@ from .style import STYLE_MAP, Style
log = structlog.get_logger("talemate.agents.visual.automatic1111") log = structlog.get_logger("talemate.agents.visual.automatic1111")
SAMPLING_METHODS = [
{"value": "DPM++ 2M", "label": "DPM++ 2M"},
{"value": "DPM++ SDE", "label": "DPM++ SDE"},
{"value": "DPM++ 2M SDE", "label": "DPM++ 2M SDE"},
{"value": "DPM++ 2M SDE Heun", "label": "DPM++ 2M SDE Heun"},
{"value": "DPM++ 2S a", "label": "DPM++ 2S a"},
{"value": "DPM++ 3M SDE", "label": "DPM++ 3M SDE"},
{"value": "Euler a", "label": "Euler a"},
{"value": "Euler", "label": "Euler"},
{"value": "LMS", "label": "LMS"},
{"value": "Heun", "label": "Heun"},
{"value": "DPM2", "label": "DPM2"},
{"value": "DPM2 a", "label": "DPM2 a"},
{"value": "DPM fast", "label": "DPM fast"},
{"value": "DPM adaptive", "label": "DPM adaptive"},
{"value": "Restart", "label": "Restart"},
]
SAMPLING_METHODS = sorted(SAMPLING_METHODS, key=lambda x: x["label"])
SAMPLING_SCHEDULES = [
{"value": "Automatic", "label": "Automatic"},
{"value": "Uniform", "label": "Uniform"},
{"value": "Karras", "label": "Karras"},
{"value": "Exponential", "label": "Exponential"},
{"value": "polyPolyexponentialexponential", "label": "Polyexponential"},
{"value": "SGM Uniform", "label": "SGM Uniform"},
{"value": "KL Optimal", "label": "KL Optimal"},
{"value": "Align Your Steps", "label": "Align Your Steps"},
{"value": "Simple", "label": "Simple"},
{"value": "Normal", "label": "Normal"},
{"value": "DDIM", "label": "DDIM"},
{"value": "Beta", "label": "Beta"},
]
SAMPLING_SCHEDULES = sorted(SAMPLING_SCHEDULES, key=lambda x: x["label"])
@register(backend_name="automatic1111", label="AUTOMATIC1111") @register(backend_name="automatic1111", label="AUTOMATIC1111")
class Automatic1111Mixin: class Automatic1111Mixin:
@ -52,6 +89,29 @@ class Automatic1111Mixin:
step=1, step=1,
description="number of render steps", description="number of render steps",
), ),
"sampling_method": AgentActionConfig(
type="text",
choices=SAMPLING_METHODS,
label="Sampling Method",
description="The sampling method to use",
value="DPM++ 2M",
),
"schedule_type": AgentActionConfig(
type="text",
value="automatic",
choices=SAMPLING_SCHEDULES,
label="Schedule Type",
description="The sampling schedule to use",
),
"cfg": AgentActionConfig(
type="number",
value=7,
label="CFG Scale",
description="CFG scale",
min=1,
max=30,
step=0.5,
),
"model_type": AgentActionConfig( "model_type": AgentActionConfig(
type="text", type="text",
value="sdxl", value="sdxl",
@ -76,6 +136,18 @@ class Automatic1111Mixin:
else: else:
return self.automatic1111_default_render_settings return self.automatic1111_default_render_settings
@property
def automatic1111_sampling_method(self):
return self.actions["automatic1111"].config["sampling_method"].value
@property
def automatic1111_schedule_type(self):
return self.actions["automatic1111"].config["schedule_type"].value
@property
def automatic1111_cfg(self):
return self.actions["automatic1111"].config["cfg"].value
async def automatic1111_generate(self, prompt: Style, format: str): async def automatic1111_generate(self, prompt: Style, format: str):
url = self.api_url url = self.api_url
resolution = self.resolution_from_format( resolution = self.resolution_from_format(
@ -88,13 +160,16 @@ class Automatic1111Mixin:
"steps": render_settings.steps, "steps": render_settings.steps,
"width": resolution.width, "width": resolution.width,
"height": resolution.height, "height": resolution.height,
"cfg_scale": self.automatic1111_cfg,
"sampler_name": self.automatic1111_sampling_method,
"scheduler": self.automatic1111_schedule_type
} }
log.info("automatic1111_generate", payload=payload, url=url) log.info("automatic1111_generate", payload=payload, url=url)
async with httpx.AsyncClient() as client: async with httpx.AsyncClient() as client:
response = await client.post( response = await client.post(
url=f"{url}/sdapi/v1/txt2img", json=payload, timeout=90 url=f"{url}/sdapi/v1/txt2img", json=payload, timeout=self.generate_timeout
) )
r = response.json() r = response.json()
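Putting the new options together, the txt2img payload assembled above ends up looking roughly like this. Prompt text, steps and resolution are placeholders, not values taken from this diff:

payload = {
    "prompt": "a quiet harbor at dawn",
    "negative_prompt": "",
    "steps": 25,
    "width": 1024,
    "height": 1024,
    "cfg_scale": 7,               # automatic1111_cfg
    "sampler_name": "DPM++ 2M",   # automatic1111_sampling_method
    "scheduler": "Automatic",     # automatic1111_schedule_type
}
# sent to f"{url}/sdapi/v1/txt2img" with timeout=self.generate_timeout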


@ -287,7 +287,7 @@ class ComfyUIMixin:
log.info("comfyui_generate", payload=payload, url=url) log.info("comfyui_generate", payload=payload, url=url)
async with httpx.AsyncClient() as client: async with httpx.AsyncClient() as client:
response = await client.post(url=f"{url}/prompt", json=payload, timeout=90) response = await client.post(url=f"{url}/prompt", json=payload, timeout=self.generate_timeout)
log.info("comfyui_generate", response=response.text) log.info("comfyui_generate", response=response.text)


@ -112,6 +112,7 @@ class OpenAIImageMixin:
quality=self.openai_quality, quality=self.openai_quality,
n=1, n=1,
response_format="b64_json", response_format="b64_json",
timeout=self.generate_timeout,
) )
await self.emit_image(response.data[0].b64_json) await self.emit_image(response.data[0].b64_json)


@ -6,6 +6,7 @@ import ipaddress
import logging import logging
import random import random
import time import time
import asyncio
from typing import Callable, Union from typing import Callable, Union
import pydantic import pydantic
@ -22,7 +23,7 @@ from talemate.client.context import client_context_attribute
from talemate.client.model_prompts import model_prompt from talemate.client.model_prompts import model_prompt
from talemate.context import active_scene from talemate.context import active_scene
from talemate.emit import emit from talemate.emit import emit
from talemate.exceptions import SceneInactiveError from talemate.exceptions import SceneInactiveError, GenerationCancelled
# Set up logging level for httpx to WARNING to suppress debug logs. # Set up logging level for httpx to WARNING to suppress debug logs.
logging.getLogger("httpx").setLevel(logging.WARNING) logging.getLogger("httpx").setLevel(logging.WARNING)
@ -469,12 +470,6 @@ class ClientBase:
def tune_prompt_parameters(self, parameters: dict, kind: str): def tune_prompt_parameters(self, parameters: dict, kind: str):
parameters["stream"] = False parameters["stream"] = False
if client_context_attribute(
"nuke_repetition"
) > 0.0 and self.jiggle_enabled_for(kind):
self.jiggle_randomness(
parameters, offset=client_context_attribute("nuke_repetition")
)
fn_tune_kind = getattr(self, f"tune_prompt_parameters_{kind}", None) fn_tune_kind = getattr(self, f"tune_prompt_parameters_{kind}", None)
if fn_tune_kind: if fn_tune_kind:
@ -485,6 +480,13 @@ class ClientBase:
agent_context.agent.inject_prompt_paramters( agent_context.agent.inject_prompt_paramters(
parameters, kind, agent_context.action parameters, kind, agent_context.action
) )
if client_context_attribute(
"nuke_repetition"
) > 0.0 and self.jiggle_enabled_for(kind):
self.jiggle_randomness(
parameters, offset=client_context_attribute("nuke_repetition")
)
def tune_prompt_parameters_conversation(self, parameters: dict): def tune_prompt_parameters_conversation(self, parameters: dict):
conversation_context = client_context_attribute("conversation") conversation_context = client_context_attribute("conversation")
@ -553,6 +555,59 @@ class ClientBase:
"status", message="Error during generation (check logs)", status="error" "status", message="Error during generation (check logs)", status="error"
) )
return "" return ""
def _generate_task(self, prompt: str, parameters: dict, kind: str):
"""
Creates an asyncio task to generate text from the given prompt and parameters.
"""
return asyncio.create_task(self.generate(prompt, parameters, kind))
def _poll_interrupt(self):
"""
Creates a task that continuously checks active_scene.cancel_requested and
completes once cancellation is requested.
"""
async def poll():
while True:
scene = active_scene.get()
if not scene or not scene.active or scene.cancel_requested:
break
await asyncio.sleep(0.3)
return GenerationCancelled("Generation cancelled")
return asyncio.create_task(poll())
async def _cancelable_generate(self, prompt: str, parameters: dict, kind: str) -> str | GenerationCancelled:
"""
Queues the generation task and the poll task to be run concurrently.
If the poll task completes before the generation task, the generation task
will be cancelled.
If the generation task completes before the poll task, the poll task will
be cancelled.
"""
task_poll = self._poll_interrupt()
task_generate = self._generate_task(prompt, parameters, kind)
done, pending = await asyncio.wait(
[task_poll, task_generate],
return_when=asyncio.FIRST_COMPLETED
)
# cancel the remaining task
for task in pending:
task.cancel()
# return the result of the completed task
return done.pop().result()
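A short sketch of how the race resolves when the user interrupts: scene.interrupt() (added later in this diff) sets cancel_requested, the poll task finishes first and returns a GenerationCancelled instance, and the pending generation task is cancelled. The kind argument below is illustrative:

async def example_generate(client, prompt: str, params: dict) -> str:
    result = await client._cancelable_generate(prompt, params, "narrate")
    if isinstance(result, GenerationCancelled):
        raise result  # mirrors what send_prompt() does below
    return result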
async def send_prompt( async def send_prompt(
self, self,
@ -609,8 +664,15 @@ class ClientBase:
parameters=prompt_param, parameters=prompt_param,
) )
prompt_sent = self.repetition_adjustment(finalized_prompt) prompt_sent = self.repetition_adjustment(finalized_prompt)
response = await self.generate(prompt_sent, prompt_param, kind)
response = await self._cancelable_generate(prompt_sent, prompt_param, kind)
if isinstance(response, GenerationCancelled):
# generation was cancelled
raise response
#response = await self.generate(prompt_sent, prompt_param, kind)
response, finalized_prompt = await self.auto_break_repetition( response, finalized_prompt = await self.auto_break_repetition(
finalized_prompt, prompt_param, response, kind, retries finalized_prompt, prompt_param, response, kind, retries
) )
@ -646,8 +708,10 @@ class ClientBase:
) )
return response return response
except GenerationCancelled as e:
raise
except Exception as e: except Exception as e:
self.log.error("send_prompt error", e=e) self.log.exception("send_prompt error", e=e)
emit( emit(
"status", message="Error during generation (check logs)", status="error" "status", message="Error during generation (check logs)", status="error"
) )


@ -22,6 +22,9 @@ def model_to_dict_without_defaults(model_instance):
for field_name, field in model_instance.__class__.__fields__.items(): for field_name, field in model_instance.__class__.__fields__.items():
if field.default == model_dict.get(field_name): if field.default == model_dict.get(field_name):
del model_dict[field_name] del model_dict[field_name]
# special case for conversation context, dont copy if talking_character is None
if field_name == "conversation" and model_dict.get(field_name).get("talking_character") is None:
del model_dict[field_name]
return model_dict return model_dict


@ -31,6 +31,8 @@ SUPPORTED_MODELS = [
"gpt-4o-2024-05-13", "gpt-4o-2024-05-13",
"gpt-4o", "gpt-4o",
"gpt-4o-mini", "gpt-4o-mini",
"o1-preview",
"o1-mini",
] ]
# any model starting with gpt-4- is assumed to support 'json_object' # any model starting with gpt-4- is assumed to support 'json_object'
@ -39,6 +41,8 @@ JSON_OBJECT_RESPONSE_MODELS = [
"gpt-4o", "gpt-4o",
"gpt-4o-mini", "gpt-4o-mini",
"gpt-3.5-turbo-0125", "gpt-3.5-turbo-0125",
"o1-preview",
"o1-mini",
] ]
@ -70,7 +74,7 @@ def num_tokens_from_messages(messages: list[dict], model: str = "gpt-3.5-turbo-0
"Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613." "Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613."
) )
return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613") return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
elif "gpt-4" in model: elif "gpt-4" in model or "o1" in model:
print( print(
"Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613." "Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613."
) )


@ -6,6 +6,7 @@ from typing import TYPE_CHECKING, Any, ClassVar, Dict, Optional, TypeVar, Union
import pydantic import pydantic
import structlog import structlog
import yaml import yaml
from enum import Enum
from pydantic import BaseModel, Field from pydantic import BaseModel, Field
from typing_extensions import Annotated from typing_extensions import Annotated
@ -179,13 +180,6 @@ class TTSConfig(BaseModel):
voices: list[TTSVoiceSamples] = pydantic.Field(default_factory=list) voices: list[TTSVoiceSamples] = pydantic.Field(default_factory=list)
class ChromaDB(BaseModel):
instructor_device: str = "cpu"
instructor_model: str = "default"
openai_model: str = "text-embedding-3-small"
embeddings: str = "default"
class RecentScene(BaseModel): class RecentScene(BaseModel):
name: str name: str
path: str path: str
@ -193,6 +187,66 @@ class RecentScene(BaseModel):
date: str date: str
cover_image: Union[Asset, None] = None cover_image: Union[Asset, None] = None
class EmbeddingFunctionPreset(BaseModel):
embeddings: str = "sentence-transformer"
model: str = "all-MiniLM-L6-v2"
trust_remote_code: bool = False
device: str = "cpu"
distance: float = 1.5
distance_mod: int = 1
distance_function: str = "l2"
fast: bool = True
gpu_recommendation: bool = False
local: bool = True
custom: bool = False
def generate_chromadb_presets() -> dict[str, EmbeddingFunctionPreset]:
"""
Returns a dict of default embedding presets
"""
return {
"default": EmbeddingFunctionPreset(),
"Alibaba-NLP/gte-base-en-v1.5": EmbeddingFunctionPreset(
embeddings="sentence-transformer",
model="Alibaba-NLP/gte-base-en-v1.5",
trust_remote_code=True,
distance=1,
distance_function="cosine",
),
"openai": EmbeddingFunctionPreset(
embeddings="openai",
model="text-embedding-3-small",
distance=1,
local=False,
),
"hkunlp/instructor-xl": EmbeddingFunctionPreset(
embeddings="instructor",
model="hkunlp/instructor-xl",
distance=1,
local=True,
fast=False,
gpu_recommendation=True,
),
"hkunlp/instructor-large": EmbeddingFunctionPreset(
embeddings="instructor",
model="hkunlp/instructor-large",
distance=1,
local=True,
fast=False,
gpu_recommendation=True,
),
"hkunlp/instructor-base": EmbeddingFunctionPreset(
embeddings="instructor",
model="hkunlp/instructor-base",
distance=1,
local=True,
fast=True,
),
}
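As a sketch, a custom sentence-transformer preset could be registered next to the shipped defaults like this. The model name and device are examples, not part of this diff:

custom = EmbeddingFunctionPreset(
    embeddings="sentence-transformer",
    model="BAAI/bge-small-en-v1.5",  # example model
    distance=1,
    distance_function="cosine",
    device="cuda",                   # assumes a GPU is available
    custom=True,
)
presets = generate_chromadb_presets()
presets["BAAI/bge-small-en-v1.5"] = custom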
class InferenceParameters(BaseModel): class InferenceParameters(BaseModel):
temperature: float = 1.0 temperature: float = 1.0
@ -247,6 +301,9 @@ class InferencePresets(BaseModel):
class Presets(BaseModel): class Presets(BaseModel):
inference_defaults: InferencePresets = InferencePresets() inference_defaults: InferencePresets = InferencePresets()
inference: InferencePresets = InferencePresets() inference: InferencePresets = InferencePresets()
embeddings_defaults: dict[str, EmbeddingFunctionPreset] = pydantic.Field(default_factory=generate_chromadb_presets)
embeddings: dict[str, EmbeddingFunctionPreset] = pydantic.Field(default_factory=generate_chromadb_presets)
def gnerate_intro_scenes(): def gnerate_intro_scenes():
@ -400,8 +457,6 @@ class Config(BaseModel):
google: GoogleConfig = GoogleConfig() google: GoogleConfig = GoogleConfig()
chromadb: ChromaDB = ChromaDB()
elevenlabs: ElevenLabsConfig = ElevenLabsConfig() elevenlabs: ElevenLabsConfig = ElevenLabsConfig()
coqui: CoquiConfig = CoquiConfig() coqui: CoquiConfig = CoquiConfig()
@ -474,11 +529,14 @@ def save_config(config, file_path: str = "./config.yaml"):
# we dont want to persist the following, so we drop them: # we dont want to persist the following, so we drop them:
# - presets.inference_defaults # - presets.inference_defaults
try: # - presets.embeddings_defaults
if "inference_defaults" in config["presets"]:
config["presets"].pop("inference_defaults") config["presets"].pop("inference_defaults")
except KeyError:
pass if "embeddings_defaults" in config["presets"]:
config["presets"].pop("embeddings_defaults")
# for normal presets we only want to persist if they have changed # for normal presets we only want to persist if they have changed
for preset_name, preset in list(config["presets"]["inference"].items()): for preset_name, preset in list(config["presets"]["inference"].items()):
if not preset.get("changed"): if not preset.get("changed"):


@ -20,6 +20,7 @@ AgentStatus = signal("agent_status")
RequestAgentStatus = signal("request_agent_status") RequestAgentStatus = signal("request_agent_status")
ClientBootstraps = signal("client_bootstraps") ClientBootstraps = signal("client_bootstraps")
PromptSent = signal("prompt_sent") PromptSent = signal("prompt_sent")
MemoryRequest = signal("memory_request")
RemoveMessage = signal("remove_message") RemoveMessage = signal("remove_message")
@ -71,4 +72,5 @@ handlers = {
"image_generation_failed": ImageGenerationFailed, "image_generation_failed": ImageGenerationFailed,
"autocomplete_suggestion": AutocompleteSuggestion, "autocomplete_suggestion": AutocompleteSuggestion,
"spice_applied": SpiceApplied, "spice_applied": SpiceApplied,
"memory_request": MemoryRequest,
} }


@ -33,6 +33,12 @@ class ResetScene(TalemateInterrupt):
pass pass
class GenerationCancelled(TalemateInterrupt):
"""
Raised when the user cancels the current generation and control should return to them
"""
pass
class RenderPromptError(TalemateError): class RenderPromptError(TalemateError):
""" """
@ -68,4 +74,4 @@ class UnknownDataSpec(TalemateError):
Exception to raise when the data spec is unknown Exception to raise when the data spec is unknown
""" """
pass pass


@ -19,8 +19,10 @@ nest_asyncio.apply()
DEV_MODE = True DEV_MODE = True
def empty_function(*args, **kwargs):
pass
def compile_scene_module(module_code: str, **kwargs): def compile_scene_module(module_code: str, **kwargs) -> dict[str, callable]:
# Compile the module code using RestrictedPython # Compile the module code using RestrictedPython
compiled_code = compile_restricted( compiled_code = compile_restricted(
module_code, filename="<scene instructions>", mode="exec" module_code, filename="<scene instructions>", mode="exec"
@ -45,7 +47,10 @@ def compile_scene_module(module_code: str, **kwargs):
# Execute the compiled code with the restricted globals # Execute the compiled code with the restricted globals
exec(compiled_code, restricted_globals, safe_locals) exec(compiled_code, restricted_globals, safe_locals)
return safe_locals.get("game") return {
"game": safe_locals.get("game"),
"on_generation_cancelled": safe_locals.get("on_generation_cancelled", empty_function)
}
class GameInstructionsMixin: class GameInstructionsMixin:
@ -153,11 +158,18 @@ class GameInstructionsMixin:
# read the file into _module property # read the file into _module property
with open(module_path, "r") as f: with open(module_path, "r") as f:
module_code = f.read() module_code = f.read()
scene_modules = compile_scene_module(module_code)
if "game" not in scene_modules:
raise ValueError(f"`game` function not found in scene module {module_path}")
scene._module = GameInstructionScope( scene._module = GameInstructionScope(
director=self, director=self,
log=log, log=log,
scene=scene, scene=scene,
module_function=compile_scene_module(module_code), module_function=scene_modules["game"],
on_generation_cancelled=scene_modules.get("on_generation_cancelled", empty_function)
) )
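A minimal sketch of a scene instructions module under the new contract: compile_scene_module() looks up a required game function and an optional on_generation_cancelled hook. The bodies and the TM parameter name are placeholders:

def game(TM):
    # regular per-turn scene instructions
    TM.emit_status("info", "scene logic ran")

def on_generation_cancelled(TM, exc):
    # called when the user interrupts generation while the module is running
    TM.emit_status("warning", "generation cancelled by user")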
async def scene_has_module(self, scene: "Scene"): async def scene_has_module(self, scene: "Scene"):


@ -8,6 +8,7 @@ import talemate.game.engine.api as scoped_api
from talemate.client.base import ClientBase from talemate.client.base import ClientBase
from talemate.emit import emit from talemate.emit import emit
from talemate.instance import get_agent from talemate.instance import get_agent
from talemate.exceptions import GenerationCancelled
if TYPE_CHECKING: if TYPE_CHECKING:
from talemate.agents.director import DirectorAgent from talemate.agents.director import DirectorAgent
@ -53,6 +54,7 @@ class GameInstructionScope:
log: object, log: object,
scene: "Scene", scene: "Scene",
module_function: callable, module_function: callable,
on_generation_cancelled: callable = None,
): ):
client = director.client client = director.client
@ -73,6 +75,7 @@ class GameInstructionScope:
self.agents.world_state = scoped_api.agent_world_state.create(scene) self.agents.world_state = scoped_api.agent_world_state.create(scene)
self.agents.visual = scoped_api.agent_visual.create(scene) self.agents.visual = scoped_api.agent_visual.create(scene)
self.module_function = module_function self.module_function = module_function
self.on_generation_cancelled = on_generation_cancelled
# set assert_scene_active as a method on all scoped api objects # set assert_scene_active as a method on all scoped api objects
@ -89,7 +92,11 @@ class GameInstructionScope:
self.agents.visual.assert_scene_active = assert_scene_active self.agents.visual.assert_scene_active = assert_scene_active
def __call__(self): def __call__(self):
self.module_function(self) try:
self.module_function(self)
except GenerationCancelled as exc:
if callable(self.on_generation_cancelled):
self.on_generation_cancelled(self, exc)
def emit_status(self, status: str, message: str, **kwargs): def emit_status(self, status: str, message: str, **kwargs):
if kwargs: if kwargs:


@ -11,7 +11,11 @@
{% endfor %} {% endfor %}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
<|SECTION:TASK|> <|SECTION:TASK|>
{% if not can_coerce -%}
Continue {{ character.name }}'s unfinished line in this screenplay. Continue {{ character.name }}'s unfinished line in this screenplay.
{%- else -%}
Write {{ character.name }}'s next line in this screenplay.
{%- endif %}
Your response MUST only be the new parts of {{ character.name }}'s dialogue, not the entire line. Your response MUST only be the new parts of {{ character.name }}'s dialogue, not the entire line.
Your response MUST be short. (10 - 15 words) Your response MUST be short. (10 - 15 words)
@ -20,11 +24,12 @@ Your response MUST NOT include the provided text, only the new parts.
All actions and prose must be contained within * markers. All actions and prose must be contained within * markers.
All spoken word must be contained within " markers. All spoken word must be contained within " markers.
Continue this text: {{ character.name }}: {{ input }}
{% if not can_coerce -%} {% if not can_coerce -%}
Continue this text: {{ character.name }}: {{ input }}
Continuation: Continuation:
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
{%- else -%} {%- else -%}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
{{ bot_token }}{{ input }} {{ bot_token }}{{ character.name.upper() }}
{{ input }}
{%- endif -%} {%- endif -%}


@ -96,7 +96,12 @@ You must only generate one line of example dialogue for {{ character_name }}.
{% endif -%} {% endif -%}
{#- SCENE INTRO -#} {#- SCENE INTRO -#}
{% elif context_typ == "scene intro" %} {% elif context_typ == "scene intro" %}
{{ action_task }} introduction for the scene. This is the first text that is shown to {{ scene.get_player_character().name }} when they enter the scene. It must setup an interesting entry point for them to participate in the scene and interact with {% if scene.num_npc_characters() %}the other characters and the environment.{% else %}the environment.{% endif %} {{ action_task }} introduction for the scene. This is the first text that is shown to {{ scene.get_player_character().name or "the reader" }} when they start the scene.
It must contain enough context for the reader to dive right in. Assume that the reader has not looked at the character or scene descriptions.
It must setup an interesting entry point for the reader to participate in the scene and interact with {% if scene.num_npc_characters() %}the other characters and the environment.{% else %}the environment.{% endif %}
{#- GENERAL CONTEXT -#} {#- GENERAL CONTEXT -#}
{% else %} {% else %}
{% if context_name.endswith("?") -%} {% if context_name.endswith("?") -%}


@ -13,9 +13,8 @@
<|SECTION:INFORMATION|> <|SECTION:INFORMATION|>
{{ query_memory("How old is {character.name}?") }} {{ query_memory("How old is {character.name}?") }}
{{ query_memory("What does {character.name} look like?") }} {{ query_memory("What does {character.name} look like? Provide a visual description.") }}
{{ query_scene("Where is {character.name} and what is {character.name} doing?") }} {{ query_scene("Where is {character.name}? What is {character.name} doing? What is {character.name} wearing?") }}
{{ query_scene("what is {character.name} wearing? Be explicit.") }}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
<|SECTION:TASK|> <|SECTION:TASK|>


@ -14,28 +14,21 @@ Player Character: {{ player_character.name }}
{% endfor %} {% endfor %}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
<|SECTION:TASK|> <|SECTION:TASK|>
YOU MUST WRITE FROM THE PERSPECTIVE OF THE NARRATOR. Maintain the existing writing style consistently throughout your narration.
Advance the scene through vivid narration. Focus on the protagonist's actions, thoughts, and surroundings.
Continue the current dialogue by narrating the progression of the scene. Maintain continuity with the overall context. Prioritize scene progression.
Use sensory details and internal monologue for immersion.
If the scene is over, narrate the beginning of the next scene. Adopt an informal, conversational tone similar to 90s adventure games.
Narrate as an omniscient storyteller, describing scenes and characters' inner experiences.
Consider the entire context and honor the sequentiality of the scene. Continue based on the final state of the dialogue. Generate descriptive prose and internal thoughts. Avoid direct speech.
Begin the next scene if the current one has ended.
Progression of the scene is important. The last line is the most important, the first line is the least important. Speak only as the narrator, guiding the reader through the story world.
Be creative and generate something new and interesting, but stay true to the setting and context of the story so far.
Use an informal and colloquial register with a conversational tone. Overall, the narrative is informal, conversational, natural, and spontaneous, with a sense of immediacy.
Narration style should be that of a 90s point and click adventure game. You are omniscient and can describe the scene in detail.
YOU MUST WRITE FROM THE PERSPECTIVE OF THE NARRATOR.
Only generate new narration. Avoid including any character's internal thoughts or dialogue.
Remember: You are the all-seeing narrator. Immerse the reader in the story through your descriptions and insights.
{% if narrative_direction %} {% if narrative_direction %}
Directions for new narration: {{ narrative_direction }} Directions for new narration: {{ narrative_direction }}
These are directions and the events described have not happened yet, you are writing the narrative based on the directions.
{% endif %} {% endif %}
Write 2 to 4 sentences. {{ extra_instructions }} Write 2 to 4 sentences. {{ extra_instructions }}


@ -21,20 +21,20 @@ Instruction: Analyze Context, History and Dialogue and then answer the question:
{% else -%} {% else -%}
Instruction: {{ query }} Instruction: {{ query }}
{% endif %} {% endif %}
When evaluating both story and context, story is more important. You can fill in gaps using imagination as long as it is based on the existing context. Answer queries about the current scene or world without advancing the plot.
Use the established context to inform your responses, anchoring them to line {{ final_line_number }}.
Provide information that maintains continuity with everything up to and including line {{ final_line_number }}.
Use vivid, descriptive language. Convey information through sensory details and implied thoughts.
Respond as an omniscient, all-seeing narrator with deep knowledge of the story world.
Maintain an informal, conversational tone similar to 90s adventure games.
Respond with 1-2 sentences of concise narration fitting the scene's context.
Avoid direct speech or dialogue. Focus on descriptive prose and implied experiences.
Embody the narrator's role completely, using a unique narrative voice.
Progression of the dialogue is important. The last line is the most important, the first line is the least important. Remember: You are the narrator. Answer questions confidently and decisively through your perspective, without progressing beyond line {{ final_line_number }}.
Context: This scene is set within {{ scene.context }}.
Respect the scene progression and answer in the context of line {{ final_line_number }}. Final Line Number: {{ final_line_number }}
Question(s): {{query}}
Use your imagination to fill in gaps in order to answer the question in a confident and decisive manner. Avoid uncertainty and vagueness.
You answer as the narrator.
Use an informal and colloquial register with a conversational tone. Overall, the narrative is informal, conversational, natural, and spontaneous, with a sense of immediacy.
Question: {{ query }}
Content Context: This is a specific scene from {{ scene.context }}
Your answer should be in the style of short, concise narration that fits the context of the scene. (1 to 2 sentences)
{{ extra_instructions }} {{ extra_instructions }}
{% include "rerun-context.jinja2" -%} {% include "rerun-context.jinja2" -%}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>


@ -12,10 +12,12 @@ Player Character: {{ scene.get_player_character().name }}
{% endfor %} {% endfor %}
<|CLOSE_SECTION|> <|CLOSE_SECTION|>
<|SECTION:TASK|> <|SECTION:TASK|>
Narrate the passage of time that just occurred, subtly move the story forward, and set up the next scene. Narrate the passage of time that just occurred, subtly move the story forward, and set up the next scene. Your main goal is to fill in what happened during the time passage.
{% if narrative %} {% if narrative %}
Directions for new narration: {{ narrative }} Directions for new narration: {{ narrative }}
These are directions and the events described have not happened yet, you are writing the narrative based on the directions.
{% endif %} {% endif %}
{{ extra_instructions }} {{ extra_instructions }}


@ -156,6 +156,9 @@ async def websocket_endpoint(websocket, path):
elif action_type == "edit_message": elif action_type == "edit_message":
log.info("edit_message", data=data) log.info("edit_message", data=data)
handler.edit_message(data.get("id"), data.get("text")) handler.edit_message(data.get("id"), data.get("text"))
elif action_type == "interrupt":
log.info("interrupt")
handler.scene.interrupt()
elif action_type == "request_app_config": elif action_type == "request_app_config":
log.info("request_app_config") log.info("request_app_config")
await message_queue.put( await message_queue.put(


@ -1,17 +1,93 @@
import argparse import argparse
import asyncio import asyncio
import os import os
import signal
import sys import sys
import structlog import structlog
import websockets import websockets
import re
import talemate.config import talemate.config
from talemate.server.api import websocket_endpoint from talemate.server.api import websocket_endpoint
from talemate.version import VERSION
log = structlog.get_logger("talemate.server.run") log = structlog.get_logger("talemate.server.run")
STARTUP_TEXT = f"""
v{VERSION}
"""
async def install_punkt():
import nltk
log.info("Downloading NLTK punkt tokenizer")
await asyncio.get_event_loop().run_in_executor(None, nltk.download, "punkt")
log.info("Download complete")
async def log_stream(stream, log_func):
while True:
line = await stream.readline()
if not line:
break
decoded_line = line.decode().strip()
# Check if the original line started with "INFO:" (Uvicorn startup messages)
if decoded_line.startswith("INFO:"):
# Use info level for Uvicorn startup messages
log.info("uvicorn", message=decoded_line)
else:
# Use the provided log_func for other messages
log_func("uvicron", message=decoded_line)
async def run_frontend(host: str = "localhost", port: int = 8080):
if sys.platform == "win32":
activate_cmd = ".\\talemate_env\\Scripts\\activate.bat"
frontend_cmd = f"{activate_cmd} && uvicorn --host {host} --port {port} frontend_wsgi:application"
else:
frontend_cmd = f"/bin/bash -c 'source talemate_env/bin/activate && uvicorn --host {host} --port {port} frontend_wsgi:application'"
frontend_cwd = None
process = await asyncio.create_subprocess_shell(
frontend_cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
cwd=frontend_cwd,
shell=True,
preexec_fn=os.setsid if sys.platform != "win32" else None
)
asyncio.create_task(install_punkt())
log.info(f"talemate frontend started", host=host, port=port, server="uvicorn", process=process.pid)
try:
stdout_task = asyncio.create_task(log_stream(process.stdout, log.info))
stderr_task = asyncio.create_task(log_stream(process.stderr, log.error))
await asyncio.gather(stdout_task, stderr_task)
await process.wait()
finally:
if process.returncode is None:
if sys.platform == "win32":
process.terminate()
else:
os.killpg(os.getpgid(process.pid), signal.SIGTERM)
await process.wait()
async def cancel_all_tasks(loop):
tasks = [t for t in asyncio.all_tasks(loop) if t is not asyncio.current_task()]
[task.cancel() for task in tasks]
await asyncio.gather(*tasks, return_exceptions=True)
def run_server(args): def run_server(args):
""" """
Run the talemate web server using the provided arguments. Run the talemate web server using the provided arguments.
@ -31,13 +107,33 @@ def run_server(args):
if config.game.world_state.templates.state_reinforcement: if config.game.world_state.templates.state_reinforcement:
Collection.create_from_legacy_config(config) Collection.create_from_legacy_config(config)
loop = asyncio.get_event_loop()
start_server = websockets.serve( start_server = websockets.serve(
websocket_endpoint, args.host, args.port, max_size=2**23 websocket_endpoint, args.host, args.port, max_size=2**23
) )
asyncio.get_event_loop().run_until_complete(start_server)
log.info("talemate backend started", host=args.host, port=args.port) loop.run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
if not args.backend_only:
frontend_task = loop.create_task(run_frontend(args.frontend_host, args.frontend_port))
else:
frontend_task = None
log.info("talemate backend started", host=args.host, port=args.port)
try:
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
log.info("Shutting down...")
if frontend_task:
frontend_task.cancel()
loop.run_until_complete(cancel_all_tasks(loop))
loop.run_until_complete(loop.shutdown_asyncgens())
loop.close()
log.info("Shutdown complete")
def main(): def main():
parser = argparse.ArgumentParser(description="talemate server") parser = argparse.ArgumentParser(description="talemate server")
@ -49,8 +145,21 @@ def main():
) )
runserver_parser.add_argument("--host", default="localhost", help="Hostname") runserver_parser.add_argument("--host", default="localhost", help="Hostname")
runserver_parser.add_argument("--port", type=int, default=6000, help="Port") runserver_parser.add_argument("--port", type=int, default=6000, help="Port")
runserver_parser.add_argument("--backend-only", action="store_true", help="Run the backend only")
# frontend host and port
runserver_parser.add_argument("--frontend-host", default="localhost", help="Frontend Hostname")
runserver_parser.add_argument("--frontend-port", type=int, default=8080, help="Frontend Port")
args = parser.parse_args() args = parser.parse_args()
# wipe screen if backend only mode is not enabled
# reason: backend-only mode is usually run during development, where keeping the console output is useful
if not args.backend_only:
# this needs to work on windows and linux
os.system("cls" if os.name == "nt" else "clear")
print(STARTUP_TEXT)
if args.command == "runserver": if args.command == "runserver":
run_server(args) run_server(args)
@ -60,4 +169,4 @@ def main():
if __name__ == "__main__": if __name__ == "__main__":
main() main()


@ -15,6 +15,7 @@ from talemate.emit import Emission, Receiver, abort_wait_for_input, emit
from talemate.files import list_scenes_directory from talemate.files import list_scenes_directory
from talemate.load import load_scene from talemate.load import load_scene
from talemate.scene_assets import Asset from talemate.scene_assets import Asset
from talemate.agents.memory.exceptions import MemoryAgentError
from talemate.server import ( from talemate.server import (
assistant, assistant,
character_importer, character_importer,
@ -203,9 +204,15 @@ class WebsocketHandler(Receiver):
scene.active = True scene.active = True
with ActiveScene(scene): with ActiveScene(scene):
scene = await load_scene( try:
scene, path_or_data, conversation_helper.agent.client, reset=reset scene = await load_scene(
) scene, path_or_data, conversation_helper.agent.client, reset=reset
)
except MemoryAgentError as e:
emit("status", message=str(e), status="error")
log.error("load_scene", error=str(e))
return
self.scene = scene self.scene = scene
@ -375,6 +382,7 @@ class WebsocketHandler(Receiver):
"type": emission.typ, "type": emission.typ,
"message": emission.message, "message": emission.message,
"data": emission.data, "data": emission.data,
"meta": emission.meta,
} }
) )
except Exception as e: except Exception as e:


@ -33,6 +33,7 @@ from talemate.exceptions import (
RestartSceneLoop, RestartSceneLoop,
TalemateError, TalemateError,
TalemateInterrupt, TalemateInterrupt,
GenerationCancelled,
) )
from talemate.game.state import GameState from talemate.game.state import GameState
from talemate.instance import get_agent from talemate.instance import get_agent
@ -739,6 +740,15 @@ class Scene(Emitter):
self.active_pins = [] self.active_pins = []
# Add an attribute to store the most recent AI Actor # Add an attribute to store the most recent AI Actor
self.most_recent_ai_actor = None self.most_recent_ai_actor = None
# if the user has requested to cancel the current action
# or series of agent actions this will be true
#
# A check to self.continue_actions() will be made
#
# if self.cancel_requested is True self.continue_actions() will raise
# a GenerationCancelled exception
self.cancel_requested = False
self.signals = { self.signals = {
"ai_message": signal("ai_message"), "ai_message": signal("ai_message"),
@ -1802,6 +1812,24 @@ class Scene(Emitter):
item = f"{actor.character.name}: {actor.character.greeting_text}" item = f"{actor.character.name}: {actor.character.greeting_text}"
emit("character", item, character=actor.character) emit("character", item, character=actor.character)
max_backscroll = (
self.config.get("game", {}).get("general", {}).get("max_backscroll", 512)
)
# history is not empty, so we are continuing a scene
# need to emit current messages
for item in self.history[-max_backscroll:]:
char_name = item.split(":")[0]
try:
actor = self.get_character(char_name).actor
except AttributeError:
# If the character is not an actor, then it is the narrator
emit(item.typ, item)
continue
emit("character", item, character=actor.character)
if not actor.character.is_player:
self.most_recent_ai_actor = actor
async def _run_game_loop(self, init: bool = True): async def _run_game_loop(self, init: bool = True):
await self.ensure_memory_db() await self.ensure_memory_db()
@ -1966,7 +1994,11 @@ class Scene(Emitter):
await self.save(auto=True) await self.save(auto=True)
self.emit_status() self.emit_status()
except GenerationCancelled:
signal_game_loop = False
skip_to_player = True
self.next_actor = None
self.log.warning("Generation cancelled, skipping to player")
except TalemateInterrupt: except TalemateInterrupt:
raise raise
except LLMAccuracyError as e: except LLMAccuracyError as e:
@ -2015,7 +2047,8 @@ class Scene(Emitter):
self.saved = False self.saved = False
self.emit_status() self.emit_status()
except GenerationCancelled:
continue
except TalemateInterrupt: except TalemateInterrupt:
raise raise
except LLMAccuracyError as e: except LLMAccuracyError as e:
@ -2248,3 +2281,12 @@ class Scene(Emitter):
@property @property
def json(self): def json(self):
return json.dumps(self.serialize, indent=2, cls=save.SceneEncoder) return json.dumps(self.serialize, indent=2, cls=save.SceneEncoder)
def interrupt(self):
self.cancel_requested = True
def continue_actions(self):
if self.cancel_requested:
self.cancel_requested = False
raise GenerationCancelled("action cancelled")
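A brief sketch of how a long-running agent loop would cooperate with these flags: call continue_actions() between steps so a pending interrupt() surfaces as GenerationCancelled at a safe point. The loop itself is illustrative:

async def run_steps(scene, steps):
    for step in steps:
        scene.continue_actions()  # raises GenerationCancelled if interrupt() was called
        await step()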

src/talemate/version.py Normal file

@ -0,0 +1,3 @@
__all__ = ["VERSION"]
VERSION = "0.27.0"


@ -1 +1 @@
start cmd /k "cd talemate_env\Scripts && activate && cd ../../ && python src\talemate\server\run.py runserver --host 0.0.0.0 --port 5050" start cmd /k "cd talemate_env\Scripts && activate && cd ../../ && python src\talemate\server\run.py runserver --host 0.0.0.0 --port 5050 --backend-only"


@ -1,3 +1,3 @@
#!/bin/sh #!/bin/sh
. talemate_env/bin/activate . talemate_env/bin/activate
python src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050 python src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050 --backend-only

start-developer.bat Normal file

@ -0,0 +1,2 @@
start cmd /k "cd talemate_frontend && npm run serve"
start cmd /k "cd talemate_env\Scripts && activate && cd ../../ && python src\talemate\server\run.py runserver --host 0.0.0.0 --port 5050"

start-frontend.bat Normal file

@ -1,2 +1 @@
start cmd /k "cd talemate_frontend && npm run serve"
start cmd /k "cd talemate_env\Scripts && activate && cd ../../ && python src\talemate\server\run.py runserver --host 0.0.0.0 --port 5050" start cmd /k "cd talemate_env\Scripts && activate && cd ../../ && python src\talemate\server\run.py runserver --host 0.0.0.0 --port 5050"

start.sh Executable file

@ -0,0 +1,3 @@
#!/bin/sh
. talemate_env/bin/activate
python src/talemate/server/run.py runserver --host 0.0.0.0 --port 5050


@@ -1,12 +1,12 @@
 {
   "name": "talemate_frontend",
-  "version": "0.26.0",
+  "version": "0.27.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "talemate_frontend",
-      "version": "0.26.0",
+      "version": "0.27.0",
       "dependencies": {
         "@codemirror/lang-markdown": "^6.2.5",
         "@codemirror/theme-one-dark": "^6.1.2",
@@ -2490,22 +2490,12 @@
       "version": "8.56.10",
       "resolved": "https://registry.npmjs.org/@types/eslint/-/eslint-8.56.10.tgz",
       "integrity": "sha512-Shavhk87gCtY2fhXDctcfS3e6FdxWkCx1iUZ9eEUbh7rTqlZT0/IzOkCOVt0fCjcFuZ9FPYfuezTBImfHCDBGQ==",
-      "devOptional": true,
+      "dev": true,
       "dependencies": {
         "@types/estree": "*",
         "@types/json-schema": "*"
       }
     },
-    "node_modules/@types/eslint-scope": {
-      "version": "3.7.7",
-      "resolved": "https://registry.npmjs.org/@types/eslint-scope/-/eslint-scope-3.7.7.tgz",
-      "integrity": "sha512-MzMFlSLBqNF2gcHWO0G1vP/YQyfvrxZ0bF+u7mzUdZ1/xK4A4sru+nraZz5i3iEIk1l1uyicaDVTB4QbbEkAYg==",
-      "devOptional": true,
-      "dependencies": {
-        "@types/eslint": "*",
-        "@types/estree": "*"
-      }
-    },
     "node_modules/@types/estree": {
       "version": "1.0.5",
       "resolved": "https://registry.npmjs.org/@types/estree/-/estree-1.0.5.tgz",
@@ -4046,9 +4036,9 @@
       "dev": true
     },
     "node_modules/body-parser": {
-      "version": "1.20.2",
-      "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.2.tgz",
-      "integrity": "sha512-ml9pReCu3M61kGlqoTm2umSXTlRTuGTx0bfYj+uIUKKYycG5NtSbeetV3faSU6R7ajOPw0g/J1PvK4qNy7s5bA==",
+      "version": "1.20.3",
+      "resolved": "https://registry.npmjs.org/body-parser/-/body-parser-1.20.3.tgz",
+      "integrity": "sha512-7rAxByjUMqQ3/bHJy7D6OGXvx/MMc4IqBn/X0fcM1QUcAItpZrBEYhWGem+tzXH90c+G01ypMcYJBO9Y30203g==",
       "dev": true,
       "dependencies": {
         "bytes": "3.1.2",
@@ -4059,7 +4049,7 @@
         "http-errors": "2.0.0",
         "iconv-lite": "0.4.24",
         "on-finished": "2.4.1",
-        "qs": "6.11.0",
+        "qs": "6.13.0",
         "raw-body": "2.5.2",
         "type-is": "~1.6.18",
         "unpipe": "1.0.0"
@@ -5516,9 +5506,9 @@
       }
     },
     "node_modules/encodeurl": {
-      "version": "1.0.2",
-      "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-1.0.2.tgz",
-      "integrity": "sha512-TPJXq8JqFaVYm2CWmPvnP2Iyo4ZSM7/QKcSmuMLDObfpH5fi7RUGmd/rTDf+rut/saiDiQEeVTNgAmJEdAOx0w==",
+      "version": "2.0.0",
+      "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-2.0.0.tgz",
+      "integrity": "sha512-Q0n9HRi4m6JuGIV1eFlmvJB7ZEVxu93IrMyiMsGC0lrMJMWzRgx6WGquyfQgZVb31vhGgXnfmPNNXmxnOkRBrg==",
       "dev": true,
       "engines": {
         "node": ">= 0.8"
@@ -5534,9 +5524,9 @@
       }
     },
     "node_modules/enhanced-resolve": {
-      "version": "5.17.0",
-      "resolved": "https://registry.npmjs.org/enhanced-resolve/-/enhanced-resolve-5.17.0.tgz",
-      "integrity": "sha512-dwDPwZL0dmye8Txp2gzFmA6sxALaSvdRDjPH0viLcKrtlOL3tw62nWWweVD1SdILDTJrbrL6tdWVN58Wo6U3eA==",
+      "version": "5.17.1",
+      "resolved": "https://registry.npmjs.org/enhanced-resolve/-/enhanced-resolve-5.17.1.tgz",
+      "integrity": "sha512-LMHl3dXhTcfv8gM4kEzIUeTQ+7fpdA0l2tUf34BddXPkz2A5xJ5L/Pchd5BL6rdccM9QGvu0sWZzK1Z1t4wwyg==",
       "devOptional": true,
       "dependencies": {
         "graceful-fs": "^4.2.4",
@@ -6238,37 +6228,37 @@
       }
     },
     "node_modules/express": {
-      "version": "4.19.2",
-      "resolved": "https://registry.npmjs.org/express/-/express-4.19.2.tgz",
-      "integrity": "sha512-5T6nhjsT+EOMzuck8JjBHARTHfMht0POzlA60WV2pMD3gyXw2LZnZ+ueGdNxG+0calOJcWKbpFcuzLZ91YWq9Q==",
+      "version": "4.21.0",
+      "resolved": "https://registry.npmjs.org/express/-/express-4.21.0.tgz",
+      "integrity": "sha512-VqcNGcj/Id5ZT1LZ/cfihi3ttTn+NJmkli2eZADigjq29qTlWi/hAQ43t/VLPq8+UX06FCEx3ByOYet6ZFblng==",
       "dev": true,
       "dependencies": {
         "accepts": "~1.3.8",
         "array-flatten": "1.1.1",
-        "body-parser": "1.20.2",
+        "body-parser": "1.20.3",
         "content-disposition": "0.5.4",
         "content-type": "~1.0.4",
         "cookie": "0.6.0",
         "cookie-signature": "1.0.6",
         "debug": "2.6.9",
         "depd": "2.0.0",
-        "encodeurl": "~1.0.2",
+        "encodeurl": "~2.0.0",
         "escape-html": "~1.0.3",
         "etag": "~1.8.1",
-        "finalhandler": "1.2.0",
+        "finalhandler": "1.3.1",
         "fresh": "0.5.2",
         "http-errors": "2.0.0",
-        "merge-descriptors": "1.0.1",
+        "merge-descriptors": "1.0.3",
         "methods": "~1.1.2",
         "on-finished": "2.4.1",
         "parseurl": "~1.3.3",
-        "path-to-regexp": "0.1.7",
+        "path-to-regexp": "0.1.10",
         "proxy-addr": "~2.0.7",
-        "qs": "6.11.0",
+        "qs": "6.13.0",
         "range-parser": "~1.2.1",
         "safe-buffer": "5.2.1",
-        "send": "0.18.0",
+        "send": "0.19.0",
-        "serve-static": "1.15.0",
+        "serve-static": "1.16.2",
         "setprototypeof": "1.2.0",
         "statuses": "2.0.1",
         "type-is": "~1.6.18",
@@ -6450,13 +6440,13 @@
       }
     },
     "node_modules/finalhandler": {
-      "version": "1.2.0",
-      "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-1.2.0.tgz",
-      "integrity": "sha512-5uXcUVftlQMFnWC9qu/svkWv3GTd2PfUhK/3PLkYNAe7FbqJMt3515HaxE6eRL74GdsriiwujiawdaB1BpEISg==",
+      "version": "1.3.1",
+      "resolved": "https://registry.npmjs.org/finalhandler/-/finalhandler-1.3.1.tgz",
+      "integrity": "sha512-6BN9trH7bp3qvnrRyzsBz+g3lZxTNZTbVO2EV1CS0WIcDbawYVdYvGflME/9QP0h0pYlCDBCTjYa9nZzMDpyxQ==",
       "dev": true,
       "dependencies": {
         "debug": "2.6.9",
-        "encodeurl": "~1.0.2",
+        "encodeurl": "~2.0.0",
         "escape-html": "~1.0.3",
         "on-finished": "2.4.1",
         "parseurl": "~1.3.3",
@@ -8033,10 +8023,13 @@
       }
     },
     "node_modules/merge-descriptors": {
-      "version": "1.0.1",
-      "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-1.0.1.tgz",
-      "integrity": "sha512-cCi6g3/Zr1iqQi6ySbseM1Xvooa98N0w31jzUYrXPX2xqObmFGHJ0tQ5u74H3mVh7wLouTseZyYIq39g8cNp1w==",
-      "dev": true
+      "version": "1.0.3",
+      "resolved": "https://registry.npmjs.org/merge-descriptors/-/merge-descriptors-1.0.3.tgz",
+      "integrity": "sha512-gaNvAS7TZ897/rVaZ0nMtAyxNyi/pdbjbAwUpFQpN70GqnVfOiXpeUUMKRBmzXaSQ8DdTX4/0ms62r2K+hE6mQ==",
+      "dev": true,
+      "funding": {
+        "url": "https://github.com/sponsors/sindresorhus"
+      }
     },
     "node_modules/merge-source-map": {
       "version": "1.1.0",
@@ -8072,9 +8065,9 @@
       }
     },
     "node_modules/micromatch": {
-      "version": "4.0.7",
-      "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.7.tgz",
-      "integrity": "sha512-LPP/3KorzCwBxfeUuZmaR6bG2kdeHSbe0P2tY3FLRU4vYrjYz5hI4QZwV0njUx3jeuKe67YukQ1LSPZBKDqO/Q==",
+      "version": "4.0.8",
+      "resolved": "https://registry.npmjs.org/micromatch/-/micromatch-4.0.8.tgz",
+      "integrity": "sha512-PXwfBhYu0hBCPw8Dn0E+WDYb7af3dSLVWKi3HGv84IdF4TyFoC0ysxFd0Goxw7nSv4T/PzEJQxsYsEiFCKo2BA==",
       "dev": true,
       "dependencies": {
         "braces": "^3.0.3",
@@ -8536,10 +8529,13 @@
       }
     },
     "node_modules/object-inspect": {
-      "version": "1.13.1",
-      "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.1.tgz",
-      "integrity": "sha512-5qoj1RUiKOMsCCNLV1CBiPYE10sziTsnmNxkAI/rZhiD63CF7IqdFGC/XzjWjpSgLf0LxXX3bDFIh0E18f6UhQ==",
+      "version": "1.13.2",
+      "resolved": "https://registry.npmjs.org/object-inspect/-/object-inspect-1.13.2.tgz",
+      "integrity": "sha512-IRZSRuzJiynemAXPYtPe5BoI/RESNYR7TYm50MC5Mqbd3Jmw5y790sErYw3V6SryFJD64b74qQQs9wn5Bg/k3g==",
       "dev": true,
+      "engines": {
+        "node": ">= 0.4"
+      },
       "funding": {
         "url": "https://github.com/sponsors/ljharb"
       }
@@ -8933,9 +8929,9 @@
       "dev": true
     },
     "node_modules/path-to-regexp": {
-      "version": "0.1.7",
-      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-0.1.7.tgz",
-      "integrity": "sha512-5DFkuoqlv1uYQKxy8omFBeJPQcdoE07Kv2sferDCrAq1ohOU+MSDswDIbnx3YAM60qIOnYa53wBhXW0EbMonrQ==",
+      "version": "0.1.10",
+      "resolved": "https://registry.npmjs.org/path-to-regexp/-/path-to-regexp-0.1.10.tgz",
+      "integrity": "sha512-7lf7qcQidTku0Gu3YDPc8DJ1q7OOucfa/BSsIwjuh56VU7katFvuM8hULfkwB3Fns/rsVF7PwPKVw1sl5KQS9w==",
       "dev": true
     },
     "node_modules/path-type": {
@@ -9701,12 +9697,12 @@
       }
     },
     "node_modules/qs": {
-      "version": "6.11.0",
-      "resolved": "https://registry.npmjs.org/qs/-/qs-6.11.0.tgz",
-      "integrity": "sha512-MvjoMCJwEarSbUYk5O+nmoSzSutSsTwF85zcHPQ9OrlFoZOYIjaqBAJIqIXjptyD5vThxGq52Xu/MaJzRkIk4Q==",
+      "version": "6.13.0",
+      "resolved": "https://registry.npmjs.org/qs/-/qs-6.13.0.tgz",
+      "integrity": "sha512-+38qI9SOr8tfZ4QmJNplMUxqjbe7LKvvZgWdExBOmd+egZTtjLB67Gu0HRX3u/XOq7UU2Nx6nsjvS16Z9uwfpg==",
       "dev": true,
       "dependencies": {
-        "side-channel": "^1.0.4"
+        "side-channel": "^1.0.6"
       },
       "engines": {
         "node": ">=0.6"
@@ -10209,9 +10205,9 @@
       }
     },
     "node_modules/send": {
-      "version": "0.18.0",
-      "resolved": "https://registry.npmjs.org/send/-/send-0.18.0.tgz",
-      "integrity": "sha512-qqWzuOjSFOuqPjFe4NOsMLafToQQwBSOEpS+FwEt3A2V3vKubTquT3vmLTQpFgMXp8AlFWFuP1qKaJZOtPpVXg==",
+      "version": "0.19.0",
+      "resolved": "https://registry.npmjs.org/send/-/send-0.19.0.tgz",
+      "integrity": "sha512-dW41u5VfLXu8SJh5bwRmyYUbAoSB3c9uQh6L8h/KtsFREPWpbX1lrljJo186Jc4nmci/sGUZ9a0a0J2zgfq2hw==",
       "dev": true,
       "dependencies": {
         "debug": "2.6.9",
@@ -10247,6 +10243,15 @@
       "integrity": "sha512-Tpp60P6IUJDTuOq/5Z8cdskzJujfwqfOTkrwIwj7IRISpnkJnT6SyJ4PCPnGMoFjC9ddhal5KVIYtAt97ix05A==",
       "dev": true
     },
+    "node_modules/send/node_modules/encodeurl": {
+      "version": "1.0.2",
+      "resolved": "https://registry.npmjs.org/encodeurl/-/encodeurl-1.0.2.tgz",
+      "integrity": "sha512-TPJXq8JqFaVYm2CWmPvnP2Iyo4ZSM7/QKcSmuMLDObfpH5fi7RUGmd/rTDf+rut/saiDiQEeVTNgAmJEdAOx0w==",
+      "dev": true,
+      "engines": {
+        "node": ">= 0.8"
+      }
+    },
     "node_modules/send/node_modules/ms": {
       "version": "2.1.3",
       "resolved": "https://registry.npmjs.org/ms/-/ms-2.1.3.tgz",
@@ -10341,15 +10346,15 @@
       }
     },
     "node_modules/serve-static": {
-      "version": "1.15.0",
-      "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-1.15.0.tgz",
-      "integrity": "sha512-XGuRDNjXUijsUL0vl6nSD7cwURuzEgglbOaFuZM9g3kwDXOWVTck0jLzjPzGD+TazWbboZYu52/9/XPdUgne9g==",
+      "version": "1.16.2",
+      "resolved": "https://registry.npmjs.org/serve-static/-/serve-static-1.16.2.tgz",
+      "integrity": "sha512-VqpjJZKadQB/PEbEwvFdO43Ax5dFBZ2UECszz8bQ7pi7wt//PWe1P6MN7eCnjsatYtBT6EuiClbjSWP2WrIoTw==",
       "dev": true,
       "dependencies": {
-        "encodeurl": "~1.0.2",
+        "encodeurl": "~2.0.0",
         "escape-html": "~1.0.3",
         "parseurl": "~1.3.3",
-        "send": "0.18.0"
+        "send": "0.19.0"
       },
       "engines": {
         "node": ">= 0.8.0"
@@ -11550,12 +11555,11 @@
       "dev": true
     },
     "node_modules/webpack": {
-      "version": "5.92.0",
-      "resolved": "https://registry.npmjs.org/webpack/-/webpack-5.92.0.tgz",
-      "integrity": "sha512-Bsw2X39MYIgxouNATyVpCNVWBCuUwDgWtN78g6lSdPJRLaQ/PUVm/oXcaRAyY/sMFoKFQrsPeqvTizWtq7QPCA==",
+      "version": "5.94.0",
+      "resolved": "https://registry.npmjs.org/webpack/-/webpack-5.94.0.tgz",
+      "integrity": "sha512-KcsGn50VT+06JH/iunZJedYGUJS5FGjow8wb9c0v5n1Om8O1g4L6LjtfxwlXIATopoQu+vOXXa7gYisWxCoPyg==",
       "devOptional": true,
       "dependencies": {
-        "@types/eslint-scope": "^3.7.3",
         "@types/estree": "^1.0.5",
         "@webassemblyjs/ast": "^1.12.1",
         "@webassemblyjs/wasm-edit": "^1.12.1",
@@ -11564,7 +11568,7 @@
         "acorn-import-attributes": "^1.9.5",
         "browserslist": "^4.21.10",
         "chrome-trace-event": "^1.0.2",
-        "enhanced-resolve": "^5.17.0",
+        "enhanced-resolve": "^5.17.1",
         "es-module-lexer": "^1.2.1",
         "eslint-scope": "5.1.1",
         "events": "^3.2.0",

View file

@@ -1,6 +1,6 @@
 {
   "name": "talemate_frontend",
-  "version": "0.26.0",
+  "version": "0.27.0",
   "private": true,
   "scripts": {
     "serve": "vue-cli-service serve",

View file

@ -1,92 +1,104 @@
<template> <template>
<v-list-subheader class="text-uppercase"><v-icon>mdi-network-outline</v-icon>
Clients
<v-btn @click="hideDisabled = !hideDisabled" size="x-small" v-if="numDisabledClients > 0">
<template v-slot:prepend>
<v-icon>{{ hideDisabled ? 'mdi-eye' : 'mdi-eye-off' }}</v-icon>
</template>
{{ hideDisabled ? 'Show disabled' : 'Hide disabled' }} ({{ numDisabledClients }})
</v-btn>
</v-list-subheader>
<div v-if="isConnected()"> <div v-if="isConnected()">
<v-list density="compact" v-for="(client, index) in state.clients" :key="index"> <div v-for="(client, index) in state.clients" :key="index">
<v-list-item> <v-list density="compact" v-if="client.status !== 'disabled' || !hideDisabled">
<v-list-item-title> <v-list-item>
<v-progress-circular v-if="client.status === 'busy'" indeterminate="disable-shrink" color="primary" <v-list-item-title>
size="14"></v-progress-circular> <v-progress-circular v-if="client.status === 'busy'" indeterminate="disable-shrink" color="primary"
size="14"></v-progress-circular>
<v-icon v-else-if="client.status == 'warning'" color="orange" size="14">mdi-checkbox-blank-circle</v-icon>
<v-icon v-else-if="client.status == 'error'" color="red-darken-1" size="14">mdi-checkbox-blank-circle</v-icon> <v-icon v-else-if="client.status == 'warning'" color="orange" size="14">mdi-checkbox-blank-circle</v-icon>
<v-btn v-else-if="client.status == 'disabled'" size="x-small" class="mr-1" variant="tonal" density="comfortable" rounded="sm" @click.stop="toggleClient(client)" icon="mdi-power-standby"></v-btn> <v-icon v-else-if="client.status == 'error'" color="red-darken-1" size="14">mdi-checkbox-blank-circle</v-icon>
<v-icon v-else color="green" size="14">mdi-checkbox-blank-circle</v-icon> <v-btn v-else-if="client.status == 'disabled'" size="x-small" class="mr-1" variant="tonal" density="comfortable" rounded="sm" @click.stop="toggleClient(client)" icon="mdi-power-standby"></v-btn>
<span :class="client.status == 'disabled' ? 'text-grey-darken-2 ml-1' : 'ml-1'"> {{ client.name }}</span> <v-icon v-else color="green" size="14">mdi-checkbox-blank-circle</v-icon>
</v-list-item-title> <span :class="client.status == 'disabled' ? 'text-grey-darken-2 ml-1' : 'ml-1'"> {{ client.name }}</span>
<div v-if="client.enabled"> </v-list-item-title>
<div v-if="client.enabled">
<v-list-item-subtitle class="text-caption" v-if="client.data.error_action != null">
<v-btn class="mt-1 mb-1" variant="tonal" :prepend-icon="client.data.error_action.icon" size="x-small" color="warning" @click.stop="callErrorAction(client, client.data.error_action)"> <v-list-item-subtitle class="text-caption" v-if="client.data.error_action != null">
{{ client.data.error_action.title }} <v-btn class="mt-1 mb-1" variant="tonal" :prepend-icon="client.data.error_action.icon" size="x-small" color="warning" @click.stop="callErrorAction(client, client.data.error_action)">
</v-btn> {{ client.data.error_action.title }}
</v-list-item-subtitle> </v-btn>
<v-list-item-subtitle class="text-caption"> </v-list-item-subtitle>
{{ client.model_name }} <v-list-item-subtitle class="text-caption">
</v-list-item-subtitle> {{ client.model_name }}
<v-list-item-subtitle class="text-caption"> </v-list-item-subtitle>
{{ client.type }} <v-list-item-subtitle class="text-caption">
<v-chip label size="x-small" variant="outlined" class="ml-1">ctx {{ client.max_token_length }}</v-chip> {{ client.type }}
</v-list-item-subtitle> <v-chip label size="x-small" variant="outlined" class="ml-1">ctx {{ client.max_token_length }}</v-chip>
<div density="compact"> </v-list-item-subtitle>
<v-slider <div density="compact">
hide-details <v-slider
v-model="client.max_token_length" hide-details
:min="1024" v-model="client.max_token_length"
:max="128000" :min="1024"
:step="1024" :max="128000"
@update:modelValue="saveClientDelayed(client)" :step="1024"
@click.stop @update:modelValue="saveClientDelayed(client)"
density="compact" @click.stop
></v-slider> density="compact"
></v-slider>
</div>
<v-list-item-subtitle class="text-center">
<!-- LLM prompt template warning -->
<v-tooltip text="No LLM prompt template for this model. Using default. Templates can be added in ./templates/llm-prompt" v-if="client.status === 'idle' && client.data && !client.data.has_prompt_template && client.data.meta.requires_prompt_template" max-width="200">
<template v-slot:activator="{ props }">
<v-icon x-size="14" class="mr-1" v-bind="props" color="orange">mdi-alert</v-icon>
</template>
</v-tooltip>
<!-- coercion status -->
<v-tooltip :text="'Coercion active: ' + client.double_coercion" v-if="client.double_coercion" max-width="200">
<template v-slot:activator="{ props }">
<v-icon x-size="14" class="mr-1" v-bind="props" color="primary">mdi-account-lock-open</v-icon>
</template>
</v-tooltip>
<!-- disable/enable -->
<v-tooltip :text="client.enabled ? 'Disable':'Enable'">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="toggleClient(client)" icon="mdi-power-standby"></v-btn>
</template>
</v-tooltip>
<!-- edit client button -->
<v-tooltip text="Edit client">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="editClient(index)" icon="mdi-cogs"></v-btn>
</template>
</v-tooltip>
<!-- assign to all agents button -->
<v-tooltip text="Assign to all agents">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="assignClientToAllAgents(index)" icon="mdi-transit-connection-variant"></v-btn>
</template>
</v-tooltip>
<!-- delete the client button -->
<v-tooltip text="Delete client">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="deleteClient(index)" icon="mdi-close-thick"></v-btn>
</template>
</v-tooltip>
</v-list-item-subtitle>
</div> </div>
<v-list-item-subtitle class="text-center"> </v-list-item>
</v-list>
</div>
<!-- LLM prompt template warning -->
<v-tooltip text="No LLM prompt template for this model. Using default. Templates can be added in ./templates/llm-prompt" v-if="client.status === 'idle' && client.data && !client.data.has_prompt_template && client.data.meta.requires_prompt_template" max-width="200">
<template v-slot:activator="{ props }">
<v-icon x-size="14" class="mr-1" v-bind="props" color="orange">mdi-alert</v-icon>
</template>
</v-tooltip>
<!-- coercion status -->
<v-tooltip :text="'Coercion active: ' + client.double_coercion" v-if="client.double_coercion" max-width="200">
<template v-slot:activator="{ props }">
<v-icon x-size="14" class="mr-1" v-bind="props" color="primary">mdi-account-lock-open</v-icon>
</template>
</v-tooltip>
<!-- disable/enable -->
<v-tooltip :text="client.enabled ? 'Disable':'Enable'">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="toggleClient(client)" icon="mdi-power-standby"></v-btn>
</template>
</v-tooltip>
<!-- edit client button -->
<v-tooltip text="Edit client">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="editClient(index)" icon="mdi-cogs"></v-btn>
</template>
</v-tooltip>
<!-- assign to all agents button -->
<v-tooltip text="Assign to all agents">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="assignClientToAllAgents(index)" icon="mdi-transit-connection-variant"></v-btn>
</template>
</v-tooltip>
<!-- delete the client button -->
<v-tooltip text="Delete client">
<template v-slot:activator="{ props }">
<v-btn size="x-small" class="mr-1" v-bind="props" variant="tonal" density="comfortable" rounded="sm" @click.stop="deleteClient(index)" icon="mdi-close-thick"></v-btn>
</template>
</v-tooltip>
</v-list-item-subtitle>
</div>
</v-list-item>
</v-list>
<ClientModal :dialog="state.dialog" :formTitle="state.formTitle" @save="saveClient" @error="propagateError" @update:dialog="updateDialog"></ClientModal> <ClientModal :dialog="state.dialog" :formTitle="state.formTitle" @save="saveClient" @error="propagateError" @update:dialog="updateDialog"></ClientModal>
<v-alert type="warning" variant="tonal" v-if="state.clients.length === 0">You have no LLM clients configured. Add one.</v-alert> <v-alert type="warning" variant="tonal" v-if="state.clients.length === 0">You have no LLM clients configured. Add one.</v-alert>
<v-btn @click="openModal" elevation="0" prepend-icon="mdi-plus-box">Add client</v-btn> <v-btn @click="openModal" elevation="0" prepend-icon="mdi-plus-box">Add client</v-btn>
@@ -104,6 +116,7 @@ export default {
         return {
             saveDelayTimeout: null,
             clientStatusCheck: null,
+            hideDisabled: true,
             clientImmutable: {},
             state: {
                 clients: [],
@@ -123,6 +136,14 @@
             }
         }
     },
+    computed: {
+        visibleClients: function() {
+            return this.state.clients.filter(client => !this.hideDisabled || client.status !== 'disabled');
+        },
+        numDisabledClients: function() {
+            return this.state.clients.filter(client => client.status === 'disabled').length;
+        }
+    },
     inject: [
         'getWebsocket',
         'registerMessageHandler',
@@ -138,6 +159,8 @@
         'clients-updated',
         'client-assigned',
         'open-app-config',
+        'save',
+        'error',
     ],
     methods: {
@@ -318,4 +341,9 @@ export default {
         this.registerMessageHandler(this.handleMessage);
     },
 }
 </script>
+<style scoped>
+.hidden {
+    display: none !important;
+}
+</style>
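
Note on the hide/show toggle added in this file: each client row is gated in the template with `client.status !== 'disabled' || !hideDisabled`, and the new computed properties apply the same rule. A minimal sketch of that logic, assuming `status` is the string 'disabled' for disabled clients (the helper names here are illustrative and not part of the commit):

// Sketch only: the visibility rule applied by the new hideDisabled toggle.
function isClientVisible(client, hideDisabled) {
    // A row renders when the client is enabled, or when hiding is turned off.
    return !hideDisabled || client.status !== 'disabled';
}

// Drives the toggle button's visibility and its "(N)" counter label.
function countDisabledClients(clients) {
    return clients.filter(client => client.status === 'disabled').length;
}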

View file

@@ -270,7 +270,12 @@
                 <!-- PRESETS -->
                 <v-window-item value="presets">
-                    <AppConfigPresets :immutable-config="app_config" ref="presets"></AppConfigPresets>
+                    <AppConfigPresets
+                        ref="presets"
+                        :immutable-config="app_config"
+                        :agentStatus="agentStatus"
+                        :sceneActive="sceneActive"
+                    ></AppConfigPresets>
                 </v-window-item>
                 <!-- CREATOR -->
@@ -342,6 +347,10 @@ export default {
     components: {
         AppConfigPresets,
     },
+    props: {
+        agentStatus: Object,
+        sceneActive: Boolean,
+    },
     data() {
         return {
             tab: 'game',
@@ -450,7 +459,17 @@ export default {
         // check if presets component is present
         if(this.$refs.presets) {
             // update app_config.presets from $refs.presets.config
-            this.app_config.presets = this.$refs.presets.config;
+            let inferenceConfig = this.$refs.presets.inference_config();
+            let embeddingsConfig = this.$refs.presets.embeddings_config();
+            if(inferenceConfig) {
+                this.app_config.presets.inference = inferenceConfig;
+            }
+            if(embeddingsConfig) {
+                this.app_config.presets.embeddings = embeddingsConfig;
+            }
         }
         this.sendRequest({
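
For context on the save hunk above: the presets tab now exposes its two sub-configs through `inference_config()` and `embeddings_config()`, and either method returns null when its tab has not been rendered, so only the sections that were actually edited overwrite the previously loaded values. A minimal sketch of that merge, assuming `app_config.presets` already holds the loaded config (illustrative, not code from the commit):

// Sketch only: partial merge of preset sections on save.
const inferenceConfig = presetsRef.inference_config();   // object or null if the tab never mounted
const embeddingsConfig = presetsRef.embeddings_config(); // object or null if the tab never mounted

if (inferenceConfig) app_config.presets.inference = inferenceConfig;
if (embeddingsConfig) app_config.presets.embeddings = embeddingsConfig;
// Sections left at null keep whatever the server sent originally.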

View file

@ -1,129 +1,66 @@
<template> <template>
<v-tabs color="secondary" v-model="tab" :disabled="busy">
<v-alert density="compact" type="warning" variant="text"> <v-tab v-for="t in tabs" :key="t.value" :value="t.value">
<p> <v-icon start>{{ t.icon }}</v-icon>
This interface is a work in progress and right now serves as a very basic way to edit inference parameter presets. {{ t.title }}
</p> </v-tab>
<p class="text-caption text-grey"> </v-tabs>
Not all clients support all parameters, and generally it is assumed that the client implementation <v-window v-model="tab">
handles the parameters in a sane way, especially if values are passed for all of them. <span class="text-primary">All presets are used</span> and will be selected depending on the action the agent is performing. If you don't know what these mean, it is recommended to leave them as they are. <v-window-item value="inference">
</p> <AppConfigPresetsInference ref="inference" :immutableConfig="immutableConfig" @update="() => $emit('update', config)"></AppConfigPresetsInference>
</v-alert> </v-window-item>
<v-window-item value="embeddings">
<v-row> <AppConfigPresetsEmbeddings
<v-col cols="4"> ref="embeddings"
<!-- list with all presets by key, read from `config` --> @busy="() => busy = true"
<v-list slim selectable v-model:selected="selected" color="primary"> @done="() => busy = false"
<v-list-item v-for="(preset, preset_key) in config.inference" :key="preset_key" :value="preset_key" prepend-icon="mdi-tune"> :memoryAgentStatus="agentStatus.memory || null" :immutableConfig="immutableConfig"
<v-list-item-title>{{ toLabel(preset_key) }}</v-list-item-title> :sceneActive="sceneActive"
</v-list-item> @update="() => $emit('update', config)"
</v-list> ></AppConfigPresetsEmbeddings>
</v-col> </v-window-item>
<v-col cols="8"> </v-window>
<!--
class InferenceParameters(BaseModel):
temperature: float = 1.0
temperature_last: bool = True
top_p: float | None = 1.0
top_k: int | None = 0
min_p: float | None = 0.1
presence_penalty: float | None = 0.2
frequency_penalty: float | None = 0.2
repetition_penalty: float | None= 1.1
repetition_penalty_range: int | None = 1024
Display editable form for the selected preset
Will use sliders for float and int values, and checkboxes for bool values
-->
<div v-if="selected.length === 1">
<v-form>
<v-card>
<v-card-title>
<v-row no-gutters>
<v-col cols="8">
{{ toLabel(selected[0]) }}
</v-col>
<v-col cols="4" class="text-right">
<v-btn variant="text" size="small" color="warning" prepend-icon="mdi-refresh" @click="config.inference[selected[0]] = {...immutableConfig.presets.inference_defaults[selected[0]]}">Reset</v-btn>
</v-col>
</v-row>
</v-card-title>
<v-card-text>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].temperature" min="0.1" max="2.0" step="0.05" label="Temperature" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].top_p" min="0.1" max="1.0" step="0.05" label="Top P" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].top_k" min="0" max="1024" step="1" label="Top K" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].min_p" min="0" max="1.0" step="0.01" label="Min P" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].presence_penalty" min="0" max="1.0" step="0.01" label="Presence Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].frequency_penalty" min="0" max="1.0" step="0.01" label="Frequency Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].repetition_penalty" min="1.0" max="1.20" step="0.01" label="Repetition Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].repetition_penalty_range" min="0" max="4096" step="256" label="Repetition Penalty Range" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-checkbox density="compact" v-model="config.inference[selected[0]].temperature_last" label="Sample temperature last" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
</v-card-text>
</v-card>
</v-form>
</div>
<div v-else>
<v-alert color="grey" variant="text">Select a preset to edit</v-alert>
</div>
</v-col>
</v-row>
</template> </template>
<script> <script>
import AppConfigPresetsInference from './AppConfigPresetsInference.vue';
import AppConfigPresetsEmbeddings from './AppConfigPresetsEmbeddings.vue';
export default { export default {
name: 'AppConfigPresets', name: 'AppConfigPresets',
components: { components: {
AppConfigPresetsInference,
AppConfigPresetsEmbeddings,
}, },
props: { props: {
immutableConfig: Object, immutableConfig: Object,
}, agentStatus: Object,
watch: { sceneActive: Boolean,
immutableConfig: {
handler: function(newVal) {
if(!newVal) {
this.config = {};
return;
}
this.config = {...newVal.presets};
},
immediate: true,
deep: true,
},
}, },
emits: [ emits: [
'update', 'update',
], ],
data() { data() {
return { return {
selected: [], tabs: [
config: { { title: 'Inference', icon: 'mdi-matrix', value: 'inference' },
inference: {}, { title: 'Embeddings', icon: 'mdi-cube-unfolded', value: 'embeddings' },
}, ],
tab: 'inference',
busy: false,
} }
}, },
methods: { methods: {
inference_config() {
setPresetChanged(presetName) { if(this.$refs.inference) {
// this ensures that the change gets saved return this.$refs.inference.config.inference;
this.config.inference[presetName].changed = true; }
return null;
}, },
embeddings_config() {
toLabel(key) { if(this.$refs.embeddings) {
return key.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase()); return this.$refs.embeddings.config.embeddings;
}
return null;
}, },
}, },
} }

View file

@@ -0,0 +1,268 @@
<template>
<v-row>
<v-col cols="4">
<!-- list with all presets by key, read from `config` -->
<v-list slim selectable v-model:selected="selected" color="primary" :disabled="busy">
<!-- add new -->
<v-list-item @click.stop="addNewPreset" prepend-icon="mdi-plus" :value="'$NEW'">
<v-list-item-title>Add new</v-list-item-title>
</v-list-item>
<!-- existing -->
<v-list-item v-for="(preset, preset_key) in config.embeddings" :key="preset_key" :value="preset_key" prepend-icon="mdi-tune">
<v-list-item-title>{{ preset.model }}</v-list-item-title>
<v-list-item-subtitle>{{ preset.embeddings }}</v-list-item-subtitle>
</v-list-item>
</v-list>
</v-col>
<v-col cols="8">
<!--
class EmbeddingFunctionPreset(BaseModel):
embeddings: str = "sentence-transformer"
model: str = "all-MiniLM-L6-v2"
trust_remote_code: bool = False
device: str = "cpu"
distance: float = 1.5
distance_mod: int = 1
distance_function: str = "l2"
fast: bool = True
gpu_recommendation: bool = False
local: bool = True
Display editable form for the selected preset
Will use sliders for float and int values, and checkboxes for bool values
-->
<div v-if="newPreset !== null">
<v-card class="overflow-y-auto">
<v-form ref="formNewPreset" v-model="formNewPresetValid">
<v-card-title>
Add new embeddings preset
</v-card-title>
<v-card-text>
<v-select v-model="newPreset.embeddings" :items="embeddings" label="Embeddings" :rules="[rulesNewPreset.required]"></v-select>
<v-text-field v-model="newPreset.model" label="Model" :rules="[rulesNewPreset.required, rulesNewPreset.exists]"></v-text-field>
</v-card-text>
</v-form>
<v-card-actions>
<v-btn color="primary" @click="commitNewPreset" prepend-icon="mdi-check-circle-outline">Continue</v-btn>
<v-btn color="error" @click="newPreset = null; selected=[]" prepend-icon="mdi-close">Cancel</v-btn>
</v-card-actions>
</v-card>
</div>
<div v-else-if="selected.length === 1">
<v-form class="overflow-y-auto">
<v-card class="overflow-y-auto">
<v-card-title>
<v-row no-gutters>
<v-col cols="8">
{{ toLabel(selected[0]) }}
</v-col>
<v-col cols="4" class="text-right" v-if="config.embeddings[selected[0]].custom === false">
<v-btn variant="text" size="small" color="warning" prepend-icon="mdi-refresh" @click="config.embeddings[selected[0]] = {...immutableConfig.presets.embeddings_defaults[selected[0]]}">Reset</v-btn>
</v-col>
<v-col cols="4" class="text-right" v-else>
<v-btn variant="text" size="small" color="delete" prepend-icon="mdi-close-box-outline" @click="deleteCustomPreset(selected[0])">Delete</v-btn>
</v-col>
</v-row>
</v-card-title>
<v-card-text>
<v-select disabled v-model="config.embeddings[selected[0]].embeddings" :items="embeddings" label="Embeddings" @update:model-value="setPresetChanged(selected[0])"></v-select>
<v-text-field disabled v-model="config.embeddings[selected[0]].model" label="Model" @update:model-value="setPresetChanged(selected[0])"></v-text-field>
<v-checkbox :disabled="busy" v-if="isSentenceTransformer" v-model="config.embeddings[selected[0]].trust_remote_code" hide-details label="Trust Remote Code" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
<!-- trust remote code can be dangerous, if it is enabled display a v-alert message about the implications -->
<v-alert :disabled="busy" class="mb-4" density="compact" v-if="config.embeddings[selected[0]].trust_remote_code" color="warning" icon="mdi-alert" variant="text">Trusting remote code can be dangerous, only enable if you trust the source</v-alert>
<v-select :disabled="busy" v-if="isLocal" v-model="config.embeddings[selected[0]].device" :items="devices" label="Device" @update:model-value="setPresetChanged(selected[0])"></v-select>
<v-slider :disabled="busy" thumb-label="always" density="compact" v-model="config.embeddings[selected[0]].distance" min="0.1" max="10.0" step="0.1" label="Distance" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider :disabled="busy" thumb-label="always" density="compact" v-model="config.embeddings[selected[0]].distance_mod" min="1" max="1000" step="10" label="Distance Mod" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-select :disabled="busy" v-model="config.embeddings[selected[0]].distance_function" :items="distanceFunctions" label="Distance Function" @update:model-value="setPresetChanged(selected[0])"></v-select>
<v-row>
<v-col cols="3">
<v-checkbox :disabled="busy" v-model="config.embeddings[selected[0]].fast" label="Fast" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
</v-col>
<v-col cols="6">
<v-checkbox :disabled="busy" v-if="isLocal" v-model="config.embeddings[selected[0]].gpu_recommendation" label="GPU Recommendation" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
</v-col>
<v-col cols="3">
<v-checkbox :disabled="busy" v-model="config.embeddings[selected[0]].local" label="Local" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
</v-col>
</v-row>
<v-alert v-if="isCurrentyLoaded" color="unsaved" icon="mdi-refresh" density="compact" variant="text">This embedding is currently loaded by the Memory agent and saving changes will cause the associated databse to be recreated and repopulated immediately after saving. Depending on the size of the model and scene this may take a while.</v-alert>
<p v-if="busy">
<v-progress-linear color="primary" height="2" indeterminate></v-progress-linear>
</p>
</v-card-text>
</v-card>
</v-form>
</div>
<div v-else>
<v-alert color="grey" variant="text">Select a preset to edit</v-alert>
</div>
</v-col>
</v-row>
</template>
<script>
export default {
name: 'AppConfigPresets',
components: {
},
props: {
immutableConfig: Object,
memoryAgentStatus: Object,
sceneActive: Boolean,
},
watch: {
immutableConfig: {
handler: function(newVal) {
console.log('immutableConfig changed', newVal);
if(!newVal) {
this.config = {};
return;
}
this.config = {...newVal.presets};
},
immediate: true,
deep: true,
},
busy: {
handler: function(newVal) {
if(newVal) {
this.$emit('busy');
} else {
this.$emit('done');
}
},
immediate: true,
}
},
emits: [
'update',
'busy',
'done',
],
computed: {
isLocal() {
if(this.selected.length === 0) {
return false;
}
return this.config.embeddings[this.selected[0]].local;
},
isSentenceTransformer() {
if(this.selected.length === 0) {
return false;
}
return this.config.embeddings[this.selected[0]].embeddings === 'sentence-transformer';
},
isCurrentyLoaded() {
console.log('isCurrentyLoaded', this.memoryAgentStatus, this.selected, this.sceneActive);
if(!this.memoryAgentStatus || !this.selected.length || !this.sceneActive) {
return false;
}
return this.memoryAgentStatus.details.model.value == this.config.embeddings[this.selected[0]].model;
},
busy() {
return this.memoryAgentStatus && this.memoryAgentStatus.status === 'busy';
}
},
data() {
return {
selected: [],
newPreset: null,
rulesNewPreset: {
required: value => !!value || 'Required.',
exists: value => !this.config.embeddings[value] || 'Already exists.',
},
formNewPresetValid: false,
config: {
embeddings: {},
},
embeddings: [
{title: 'SentenceTransformer', value: 'sentence-transformer'},
{title: 'Instructor', value: 'instructor'},
{title: 'OpenAI', value: 'openai'},
],
distanceFunctions: [
{title: 'Cosine similarity', value: 'cosine'},
{title: 'Inner product', value: 'ip'},
{title: 'Squared L2', value: 'l2'},
],
devices: [
{title: 'CPU', value: 'cpu'},
{title: 'CUDA', value: 'cuda'},
],
}
},
methods: {
setPresetChanged(presetName) {
// this ensures that the change gets saved
this.config.embeddings[presetName].changed = true;
},
deleteCustomPreset(presetName) {
this.selected = [];
delete this.config.embeddings[presetName];
},
addNewPreset() {
this.newPreset = {
embeddings: 'sentence-transformer',
model: '',
custom: true,
trust_remote_code: false,
device: 'cpu',
distance: 1.5,
distance_mod: 1,
distance_function: 'l2',
fast: true,
gpu_recommendation: false,
local: true,
changed: true,
}
},
commitNewPreset() {
this.$refs.formNewPreset.validate();
if(!this.formNewPresetValid) {
return;
}
this.config.embeddings[this.newPreset.model] = this.newPreset;
this.selected = [this.newPreset.model];
this.newPreset = null;
},
toLabel(key) {
return key.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase());
},
},
}
</script>
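
The comment block at the top of this component lists the fields of the backend's EmbeddingFunctionPreset model. For reference, a custom preset created through "Add new" ends up stored as a plain object of that shape; the values below are the defaults from addNewPreset plus an example model name taken from the comment, purely illustrative:

// Illustrative only: shape of a custom embeddings preset as this component stores it.
const exampleEmbeddingsPreset = {
    embeddings: 'sentence-transformer',   // one of 'sentence-transformer', 'instructor', 'openai'
    model: 'all-MiniLM-L6-v2',            // also used as the preset key in config.embeddings
    custom: true,                         // custom presets can be deleted; built-ins can only be reset
    trust_remote_code: false,             // only shown for sentence-transformer embeddings
    device: 'cpu',                        // 'cpu' or 'cuda' (only relevant when local is true)
    distance: 1.5,
    distance_mod: 1,
    distance_function: 'l2',              // 'cosine', 'ip' or 'l2'
    fast: true,
    gpu_recommendation: false,
    local: true,
    changed: true,                        // marks the preset so it is included in the next save
};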

View file

@@ -0,0 +1,131 @@
<template>
<v-alert density="compact" type="warning" variant="text">
<p>
This interface is a work in progress and right now serves as a very basic way to edit inference parameter presets.
</p>
<p class="text-caption text-grey">
Not all clients support all parameters, and generally it is assumed that the client implementation
handles the parameters in a sane way, especially if values are passed for all of them. <span class="text-primary">All presets are used</span> and will be selected depending on the action the agent is performing. If you don't know what these mean, it is recommended to leave them as they are.
</p>
</v-alert>
<v-row>
<v-col cols="4">
<!-- list with all presets by key, read from `config` -->
<v-list slim selectable v-model:selected="selected" color="primary">
<v-list-item v-for="(preset, preset_key) in config.inference" :key="preset_key" :value="preset_key" prepend-icon="mdi-tune">
<v-list-item-title>{{ toLabel(preset_key) }}</v-list-item-title>
</v-list-item>
</v-list>
</v-col>
<v-col cols="8">
<!--
class InferenceParameters(BaseModel):
temperature: float = 1.0
temperature_last: bool = True
top_p: float | None = 1.0
top_k: int | None = 0
min_p: float | None = 0.1
presence_penalty: float | None = 0.2
frequency_penalty: float | None = 0.2
repetition_penalty: float | None= 1.1
repetition_penalty_range: int | None = 1024
Display editable form for the selected preset
Will use sliders for float and int values, and checkboxes for bool values
-->
<div v-if="selected.length === 1">
<v-form>
<v-card>
<v-card-title>
<v-row no-gutters>
<v-col cols="8">
{{ toLabel(selected[0]) }}
</v-col>
<v-col cols="4" class="text-right">
<v-btn variant="text" size="small" color="warning" prepend-icon="mdi-refresh" @click="config.inference[selected[0]] = {...immutableConfig.presets.inference_defaults[selected[0]]}">Reset</v-btn>
</v-col>
</v-row>
</v-card-title>
<v-card-text>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].temperature" min="0.1" max="2.0" step="0.05" label="Temperature" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].top_p" min="0.1" max="1.0" step="0.05" label="Top P" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].top_k" min="0" max="1024" step="1" label="Top K" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].min_p" min="0" max="1.0" step="0.01" label="Min P" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].presence_penalty" min="0" max="1.0" step="0.01" label="Presence Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].frequency_penalty" min="0" max="1.0" step="0.01" label="Frequency Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].repetition_penalty" min="1.0" max="1.20" step="0.01" label="Repetition Penalty" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-slider thumb-label="always" density="compact" v-model="config.inference[selected[0]].repetition_penalty_range" min="0" max="4096" step="256" label="Repetition Penalty Range" @update:model-value="setPresetChanged(selected[0])"></v-slider>
<v-checkbox density="compact" v-model="config.inference[selected[0]].temperature_last" label="Sample temperature last" @update:model-value="setPresetChanged(selected[0])"></v-checkbox>
</v-card-text>
</v-card>
</v-form>
</div>
<div v-else>
<v-alert color="grey" variant="text">Select a preset to edit</v-alert>
</div>
</v-col>
</v-row>
</template>
<script>
export default {
name: 'AppConfigPresets',
components: {
},
props: {
immutableConfig: Object,
},
watch: {
immutableConfig: {
handler: function(newVal) {
if(!newVal) {
this.config = {};
return;
}
this.config = {...newVal.presets};
},
immediate: true,
deep: true,
},
},
emits: [
'update',
],
data() {
return {
selected: [],
config: {
inference: {},
},
}
},
methods: {
setPresetChanged(presetName) {
// this ensures that the change gets saved
this.config.inference[presetName].changed = true;
},
toLabel(key) {
return key.replace(/_/g, ' ').replace(/\b\w/g, l => l.toUpperCase());
},
},
}
</script>
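
As with the embeddings component, the comment block above mirrors the backend's InferenceParameters model; each slider or checkbox in the form edits one of its fields. Using the defaults listed in that comment, one preset entry looks roughly like this (illustrative, not code from the commit):

// Illustrative only: a single inference preset entry with the documented defaults.
const exampleInferencePreset = {
    temperature: 1.0,
    temperature_last: true,
    top_p: 1.0,
    top_k: 0,
    min_p: 0.1,
    presence_penalty: 0.2,
    frequency_penalty: 0.2,
    repetition_penalty: 1.1,
    repetition_penalty_range: 1024,
    changed: true,   // set by setPresetChanged() so the edit is picked up on save
};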

View file

@@ -0,0 +1,114 @@
<template>
<v-card class="ma-4">
<v-card-text class="text-muted text-caption">
Inspect the requests for memory retrieval.
</v-card-text>
</v-card>
<v-list-item density="compact">
<v-list-item-title>
<v-chip size="x-small" color="primary">Max. {{ max_memory_requests }}</v-chip>
<v-btn color="delete" class="ml-2" variant="text" size="small" @click="clearMemoryRequests" prepend-icon="mdi-close">Clear</v-btn>
<v-slider density="compact" v-model="max_memory_requests" min="1" hide-details max="250" step="1" color="primary"></v-slider>
</v-list-item-title>
</v-list-item>
<v-list-item v-for="(memory_request, index) in memory_requests" :key="index" @click="openMemoryRequestView(index)">
<div class="ml-2 mr-2 text-muted text-caption font-italic">
{{ memory_request.query }}
</div>
<v-list-item-subtitle>
<!-- matches or not matches ?-->
<v-chip size="x-small" class="mr-1" :color="memory_request.success ? 'success' : 'warning'" variant="text" label>{{ memory_request.accepted_results.length+" / "+memory_request.results.length+ " matches"}}</v-chip>
<!-- closest distance -->
<v-chip size="x-small" class="mr-1" color="info" variant="text" label>{{ to2Decimals(memory_request.closest_distance) }} - {{ to2Decimals(memory_request.furthest_distance) }}, {{ to2Decimals(memory_request.max_distance) }}
<v-icon size="14" class="ml-1">mdi-flag-checkered</v-icon>
</v-chip>
<!-- duration -->
<v-chip size="x-small" class="mr-1" color="grey-darken-1" variant="text" label>{{ memory_request.duration }}s<v-icon size="14" class="ml-1">mdi-clock</v-icon></v-chip>
</v-list-item-subtitle>
<v-divider class="mt-1" v-if="memory_request.new_agent_activity"></v-divider>
</v-list-item>
<DebugToolMemoryRequestView :memory_requests="memory_requests" ref="memory_requestView" />
</template>
<script>
import DebugToolMemoryRequestView from './DebugToolMemoryRequestView.vue';
export default {
name: 'DebugToolMemoryRequestLog',
data() {
return {
memory_requests: [],
total: 1,
max_memory_requests: 50,
}
},
components: {
DebugToolMemoryRequestView,
},
inject: [
'getWebsocket',
'registerMessageHandler',
'unregisterMessageHandler',
'setWaitingForInput',
],
methods: {
to2Decimals(num) {
return Math.round(num * 100) / 100;
},
clearMemoryRequests() {
this.memory_requests = [];
this.total = 0;
},
handleMessage(data) {
if(data.type === "system"&& data.id === "scene.loaded") {
this.memory_requests = [];
this.total = 0;
return;
}
if(data.type === "memory_request") {
let memoryRequest = {...data.data}
console.log({memoryRequest, meta: data.meta})
memoryRequest.success = memoryRequest.accepted_results.length > 0;
memoryRequest.agent_stack_uid = data.meta.agent_stack_uid;
// if data.meta.agent_stack_uid is different from the previous
// then set new_agent_activity to true
memoryRequest.new_agent_activity = this.memory_requests.length === 0 || this.memory_requests[0].agent_stack_uid !== data.meta.agent_stack_uid;
memoryRequest.duration = Math.round(data.meta.duration * 100) / 100;
this.memory_requests.unshift(memoryRequest)
while(this.memory_requests.length > this.max_memory_requests) {
this.memory_requests.pop();
}
}
},
openMemoryRequestView(index) {
this.$refs.memory_requestView.open(index);
}
},
mounted() {
this.registerMessageHandler(this.handleMessage);
},
unmounted() {
this.unregisterMessageHandler(this.handleMessage);
}
}
</script>
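
For reference, handleMessage above expects websocket messages shaped roughly as follows; the field names are taken from how the component reads them, while the example values are made up:

// Illustrative payload only; field names inferred from the handler above.
const exampleMemoryRequestMessage = {
    type: 'memory_request',
    data: {
        query: 'What does the character know about the old lighthouse?',
        results: [
            { doc: 'The lighthouse has been abandoned for years.', meta: { source: 'history' }, distance: 0.82 },
        ],
        accepted_results: [ /* subset of results considered matches */ ],
        closest_distance: 0.82,
        furthest_distance: 1.4,
        max_distance: 1.0,
    },
    meta: {
        agent_stack_uid: 'example-uid',   // a change of uid starts a new activity group in the log
        duration: 0.41,                   // seconds; rounded to two decimals for display
    },
};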

View file

@@ -0,0 +1,93 @@
<template>
<v-dialog v-model="show" max-width="800">
<v-card v-if="memory_request !== null">
<v-card-title>
Memory Request
<!-- matches or not matches ?-->
<v-chip size="x-small" class="mr-1" :color="memory_request.success ? 'success' : 'warning'" variant="text" label>{{ memory_request.accepted_results.length+" / "+memory_request.results.length+ " matches"}}</v-chip>
<!-- closest distance -->
<v-chip size="x-small" class="mr-1" color="info" variant="text" label>{{ to2Decimals(memory_request.closest_distance) }} - {{ to2Decimals(memory_request.furthest_distance) }}, {{ to2Decimals(memory_request.max_distance) }}
<v-icon size="14" class="ml-1">mdi-flag-checkered</v-icon>
</v-chip>
<!-- duration -->
<v-chip size="x-small" class="mr-1" color="grey-darken-1" variant="text" label>{{ memory_request.duration }}s<v-icon size="14" class="ml-1">mdi-clock</v-icon></v-chip>
<!-- toggle truncateLongText -->
<v-chip size="x-small" class="mr-1" color="primary" variant="text" @click="truncateLongText = !truncateLongText" label>
Truncate
<v-icon size="14" class="ml-1">{{ truncateLongText ? 'mdi-check-circle-outline' : 'mdi-circle-outline' }}</v-icon>
</v-chip>
</v-card-title>
<v-card-text>
<div class="font-italic text-muted">
{{ truncateText(memory_request.query) }}
</div>
<v-table>
<thead>
<tr>
<th>Doc</th>
<th class="text-right">Distance</th>
</tr>
</thead>
<tbody>
<tr v-for="(result, index) in memory_request.results" :key="index">
<td>
<div :class="result.distance <= memory_request.max_distance ? '' : 'text-grey'">
{{ truncateText(result.doc) }}
</div>
<div>
<v-chip v-for="(meta, key) in result.meta" :key="key" size="x-small" class="mr-1" color="muted" variant="text" label>{{ key }}: {{ meta }}</v-chip>
</div>
</td>
<td class="text-right"><span :class="result.distance <= memory_request.max_distance ? 'text-success': 'text-warning'">{{ to2Decimals(result.distance) }}</span></td>
</tr>
</tbody>
</v-table>
</v-card-text>
</v-card>
</v-dialog>
</template>
<script>
export default {
name: 'DebugToolMemoryRequestView',
props: {
memory_requests: Object,
},
data() {
return {
show: false,
selected: null,
memory_request: null,
truncateLongText: true,
}
},
methods: {
to2Decimals(num) {
return Math.round(num * 100) / 100;
},
open(index) {
this.select(index);
this.show = true;
},
select(index) {
this.selected = index;
this.memory_request = this.memory_requests[index];
},
truncateText(text) {
if(text.length > 255 && this.truncateLongText) {
return text.substring(0, 255) + "...";
}
return text;
}
}
}
</script>
<style scoped>
</style>

View file

@@ -1,11 +1,17 @@
 <template>
-    <v-list-subheader class="text-uppercase"><v-icon>mdi-post-outline</v-icon> Prompts
-        <v-chip size="x-small" color="primary">{{ max_prompts }}</v-chip>
-        <v-icon color="primary" class="ml-2" @click="clearPrompts">mdi-close</v-icon>
-    </v-list-subheader>
+    <v-card class="ma-4">
+        <v-card-text class="text-muted text-caption">
+            Inspect the prompts and responses generated by the AI.
+        </v-card-text>
+    </v-card>
     <v-list-item density="compact">
-        <v-slider density="compact" v-model="max_prompts" min="1" hide-details max="250" step="1" color="primary"></v-slider>
+        <v-list-item-title>
+            <v-chip size="x-small" color="primary">Max. {{ max_prompts }}</v-chip>
+            <v-btn color="delete" class="ml-2" variant="text" size="small" @click="clearPrompts" prepend-icon="mdi-close">Clear</v-btn>
+            <v-slider density="compact" v-model="max_prompts" min="1" hide-details max="250" step="1" color="primary"></v-slider>
+        </v-list-item-title>
     </v-list-item>
     <v-list-item v-for="(prompt, index) in prompts" :key="index" @click="openPromptView(prompt)">

View file

@@ -7,18 +7,36 @@
     <v-list-item>
         <v-btn @click="openGameState" prepend-icon="mdi-card-search-outline" color="primary" variant="tonal">Game State</v-btn>
     </v-list-item>
-    <DebugToolPromptLog ref="promptLog"/>
+    <v-tabs v-model="tab" color="primary">
+        <v-tab v-for="tab in tabs" :key="tab.value" :value="tab.value">
+            <template v-slot:prepend>
+                <v-icon>{{ tab.icon }}</v-icon>
+            </template>
+            {{ tab.text }}
+        </v-tab>
+    </v-tabs>
+    <v-window v-model="tab">
+        <v-window-item value="prompts">
+            <DebugToolPromptLog ref="promptLog"/>
+        </v-window-item>
+        <v-window-item value="memory_requests">
+            <DebugToolMemoryRequestLog ref="memoryRequestLog"/>
+        </v-window-item>
+    </v-window>
     <DebugToolGameState ref="gameState"/>
 </template>
 <script>
 import DebugToolPromptLog from './DebugToolPromptLog.vue';
 import DebugToolGameState from './DebugToolGameState.vue';
+import DebugToolMemoryRequestLog from './DebugToolMemoryRequestLog.vue';
 export default {
     name: 'DebugTools',
     components: {
         DebugToolPromptLog,
+        DebugToolMemoryRequestLog,
         DebugToolGameState,
     },
     data() {
@@ -26,6 +44,11 @@ export default {
         expanded: false,
         log_socket_messages: false,
         filter_socket_messages: null,
+        tab: "prompts",
+        tabs: [
+            { value: "prompts", text: "Prompts", icon: "mdi-post-outline" },
+            { value: "memory_requests", text: "Memory", icon: "mdi-memory" },
+        ]
     }
 },

View file

@@ -50,6 +50,17 @@
     <v-icon class="ml-1 mr-3" v-else-if="isWaitingForInput()">mdi-keyboard</v-icon>
     <v-icon class="ml-1 mr-3" v-else>mdi-circle-outline</v-icon>
+    <v-tooltip v-if="!isWaitingForInput()" location="top"
+        text="Interrupt the current generation(s)"
+        class="pre-wrap"
+        max-width="300px">
+        <template v-slot:activator="{ props }">
+            <v-btn class="hotkey mr-3" v-bind="props"
+                @click="interruptScene" color="primary" icon>
+                <v-icon>mdi-stop-circle-outline</v-icon>
+            </v-btn>
+        </template>
+    </v-tooltip>
     <v-divider vertical></v-divider>
@@ -688,6 +699,11 @@ export default {
         this.getWebsocket().send(JSON.stringify({ type: 'interact', text: `!acdlg:${this.messageInput}` }));
     },
+    interruptScene() {
+        this.getWebsocket().send(JSON.stringify({ type: 'interrupt' }));
+    },
     handleMessage(data) {
         if (data.type === "command_status") {
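
The interrupt button above is only rendered while a generation is running (the v-if="!isWaitingForInput()" guard), and clicking it sends a bare interrupt message over the scene websocket. A minimal sketch of the same guard outside the template, with illustrative helper names:

// Sketch only: mirrors the template's guard around the interrupt action.
function requestInterrupt(ws, waitingForInput) {
    if (waitingForInput) return;   // nothing to interrupt while waiting for user input
    ws.send(JSON.stringify({ type: 'interrupt' }));
}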

View file

@@ -26,7 +26,7 @@ export default {
         case 'busy':
             return -1;
         case 'error':
-            return 5000;
+            return 8000;
         default:
             return 3000;
     }

View file

@@ -117,16 +117,10 @@
         </v-alert>
         <v-list>
-            <v-list-subheader class="text-uppercase"><v-icon>mdi-network-outline</v-icon>
-                Clients</v-list-subheader>
-            <v-list-item>
-                <AIClient ref="aiClient" @save="saveClients" @error="uxErrorHandler" @clients-updated="saveClients" @client-assigned="saveAgents" @open-app-config="openAppConfig"></AIClient>
-            </v-list-item>
+            <AIClient ref="aiClient" @save="saveClients" @error="uxErrorHandler" @clients-updated="saveClients" @client-assigned="saveAgents" @open-app-config="openAppConfig"></AIClient>
             <v-divider></v-divider>
             <v-list-subheader class="text-uppercase"><v-icon>mdi-transit-connection-variant</v-icon> Agents</v-list-subheader>
-            <v-list-item>
-                <AIAgent ref="aiAgent" @save="saveAgents" @agents-updated="saveAgents"></AIAgent>
-            </v-list-item>
+            <AIAgent ref="aiAgent" @save="saveAgents" @agents-updated="saveAgents"></AIAgent>
             <!-- More sections can be added here -->
         </v-list>
     </v-navigation-drawer>
@@ -222,7 +216,7 @@
         </v-container>
     </v-main>
-    <AppConfig ref="appConfig" />
+    <AppConfig ref="appConfig" :agentStatus="agentStatus" :sceneActive="sceneActive" />
     <v-snackbar v-model="errorNotification" color="red-darken-1" :timeout="3000">
         {{ errorMessage }}
     </v-snackbar>
@@ -639,6 +633,7 @@ export default {
         label: data.message,
         // active - has the agent been active in the last 5 seconds?
         recentlyActive: recentlyActive,
+        details: data.client,
     }
     if(recentlyActive && !busy) {
@@ -1026,6 +1021,7 @@ export default {
     toLabel(value) {
         return value.replace(/[_-]/g, ' ').replace(/\b\w/g, l => l.toUpperCase());
     },
 }
 }
 </script>

View file

@@ -28,8 +28,8 @@
     :disabled="dialogueInstructionsBusy"
     placeholder="speak less formally, use more contractions, and be more casual."
     v-model="dialogueInstructions" label="Acting Instructions"
-    :color="dialogueInstructionsDirty ? 'primary' : null"
-    @update:model-value="queueUpdateCharacterActor"
+    :color="dialogueInstructionsDirty ? 'info' : null"
+    @update:model-value="queueUpdateCharacterActor()"
     rows="3"
     auto-grow></v-textarea>
 <v-alert icon="mdi-bullhorn" density="compact" variant="text" color="grey">
@@ -57,7 +57,7 @@
     :character="character.name"
     :rewrite-enabled="false"
     :generation-options="generationOptions"
-    @generate="content => { dialogueExamples.push(content); queueUpdateCharacterActor(); }"
+    @generate="content => { dialogueExamples.push(content); queueUpdateCharacterActor(500); }"
 />
@@ -113,6 +113,15 @@ export default {
         return `Automatically generate dialogue instructions for ${this.character.name}, based on their attributes and description`;
     }
 },
+watch: {
+    character: {
+        handler() {
+            this.dialogueInstructions = this.character.actor.dialogue_instructions;
+            this.dialogueExamples = this.character.actor.dialogue_examples;
+        },
+        deep: true
+    }
+},
 props: {
     character: Object,
     templates: Object,
@@ -124,12 +133,12 @@ export default {
 inject: ['getWebsocket', 'registerMessageHandler'],
 methods: {
-    queueUpdateCharacterActor() {
+    queueUpdateCharacterActor(delay = 1500) {
         this.dialogueInstructionsDirty = true;
         if (this.updateCharacterActorTimeout) {
             clearTimeout(this.updateCharacterActorTimeout);
         }
-        this.updateCharacterActorTimeout = setTimeout(this.updateCharacterActor, 500);
+        this.updateCharacterActorTimeout = setTimeout(this.updateCharacterActor, delay);
     },
     updateCharacterActor() {
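
This queueUpdateCharacterActor change is the same pattern applied to the editor fields in the files that follow: the debounce delay becomes a parameter that defaults to 1500 ms while typing, and callers that want a faster save (for example after a generation lands) pass 500 explicitly. A generic sketch of the pattern, with an illustrative helper name:

// Sketch only: the debounce pattern shared by the world editor fields below.
function makeQueuedUpdate(update) {
    let timeout = null;
    return function queueUpdate(delay = 1500) {
        if (timeout !== null) clearTimeout(timeout);
        timeout = setTimeout(update, delay);
    };
}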

View file

@@ -70,10 +70,11 @@
 <v-textarea ref="attribute" rows="5" auto-grow
     :label="selected"
-    :color="dirty ? 'info' : ''"
+    :color="dirty ? 'dirty' : ''"
     :disabled="busy"
     :loading="busy"
     :hint="autocompleteInfoMessage(busy)"
     @keyup.ctrl.enter.stop="sendAutocompleteRequest"
@@ -253,7 +254,7 @@ export default {
     }
 },
-queueUpdate(name) {
+queueUpdate(name, delay = 1500) {
     if (this.updateTimeout !== null) {
         clearTimeout(this.updateTimeout);
     }
@@ -262,7 +263,7 @@
     this.updateTimeout = setTimeout(() => {
         this.update(name);
-    }, 500);
+    }, delay);
 },
 update(name) {

View file

@ -12,13 +12,13 @@
/> />
<v-textarea ref="description" rows="5" auto-grow v-model="character.description" <v-textarea ref="description" rows="5" auto-grow v-model="character.description"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
:disabled="busy" :disabled="busy"
:loading="busy" :loading="busy"
@keyup.ctrl.enter.stop="sendAutocompleteRequest" @keyup.ctrl.enter.stop="sendAutocompleteRequest"
@update:model-value="queueUpdate" @update:model-value="queueUpdate()"
label="Description" label="Description"
:hint="'A short description of the character. '+autocompleteInfoMessage(busy)"> :hint="'A short description of the character. '+autocompleteInfoMessage(busy)">
</v-textarea> </v-textarea>
@ -75,7 +75,7 @@ export default {
} }
}, },
methods: { methods: {
queueUpdate() { queueUpdate(delay = 1500) {
if (this.updateTimeout !== null) { if (this.updateTimeout !== null) {
clearTimeout(this.updateTimeout); clearTimeout(this.updateTimeout);
} }
@ -84,7 +84,7 @@ export default {
this.updateTimeout = setTimeout(() => { this.updateTimeout = setTimeout(() => {
this.update(); this.update();
}, 500); }, delay);
}, },
update() { update() {
this.getWebsocket().send(JSON.stringify({ this.getWebsocket().send(JSON.stringify({

View file

@ -68,7 +68,8 @@
<v-textarea rows="5" max-rows="18" auto-grow <v-textarea rows="5" max-rows="18" auto-grow
ref="detail" ref="detail"
:color="dirty ? 'info' : ''" :label="selected"
:color="dirty ? 'dirty' : ''"
:disabled="busy" :disabled="busy"
:loading="busy" :loading="busy"
@ -77,7 +78,7 @@
@keyup.ctrl.enter.stop="sendAutocompleteRequest" @keyup.ctrl.enter.stop="sendAutocompleteRequest"
@update:modelValue="queueUpdate(selected)" @update:modelValue="queueUpdate(selected)"
:label="selected"
v-model="character.details[selected]"> v-model="character.details[selected]">
</v-textarea> </v-textarea>
@ -269,7 +270,7 @@ export default {
} }
}, },
queueUpdate(name) { queueUpdate(name, delay = 1500) {
if (this.updateTimeout !== null) { if (this.updateTimeout !== null) {
clearTimeout(this.updateTimeout); clearTimeout(this.updateTimeout);
} }
@ -278,7 +279,7 @@ export default {
this.updateTimeout = setTimeout(() => { this.updateTimeout = setTimeout(() => {
this.update(name); this.update(name);
}, 500); }, delay);
}, },
update(name) { update(name) {

View file

@ -59,7 +59,8 @@
:disabled="working" :disabled="working"
v-model="character.reinforcements[selected].answer" v-model="character.reinforcements[selected].answer"
@update:modelValue="queueUpdate(selected)" @update:modelValue="queueUpdate(selected)"
:color="dirty ? 'info' : ''"></v-textarea> :color="dirty ? 'dirty' : ''">
</v-textarea>
<v-row> <v-row>
<v-col cols="6"> <v-col cols="6">
@ -70,7 +71,7 @@
:disabled="working" :disabled="working"
class="mb-2" class="mb-2"
@update:modelValue="queueUpdate(selected)" @update:modelValue="queueUpdate(selected)"
:color="dirty ? 'info' : ''"></v-text-field> :color="dirty ? 'dirty' : ''"></v-text-field>
</v-col> </v-col>
<v-col cols="6"> <v-col cols="6">
<v-select <v-select
@ -81,7 +82,7 @@
class="mr-1 mb-1" variant="underlined" class="mr-1 mb-1" variant="underlined"
density="compact" density="compact"
@update:modelValue="queueUpdate(selected)" @update:modelValue="queueUpdate(selected)"
:color="dirty ? 'info' : ''"> :color="dirty ? 'dirty' : ''">
</v-select> </v-select>
</v-col> </v-col>
</v-row> </v-row>
@ -93,7 +94,7 @@
v-model="character.reinforcements[selected].instructions" v-model="character.reinforcements[selected].instructions"
@update:modelValue="queueUpdate(selected)" @update:modelValue="queueUpdate(selected)"
:disabled="working" :disabled="working"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
></v-textarea> ></v-textarea>
<v-row> <v-row>
@ -332,7 +333,7 @@ export default {
this.character.reinforcements[name] = {...this.newReinforcment}; this.character.reinforcements[name] = {...this.newReinforcment};
}, },
queueUpdate(name) { queueUpdate(name, delay = 1500) {
if (this.updateTimeout !== null) { if (this.updateTimeout !== null) {
clearTimeout(this.updateTimeout); clearTimeout(this.updateTimeout);
} }
@ -341,7 +342,7 @@ export default {
this.updateTimeout = setTimeout(() => { this.updateTimeout = setTimeout(() => {
this.update(name); this.update(name);
}, 500); }, delay);
}, },
update(name, updateState) { update(name, updateState) {

View file

@ -127,8 +127,10 @@
</td> </td>
<td> <td>
<v-textarea rows="1" auto-grow density="compact" hide-details <v-textarea rows="1" auto-grow density="compact" hide-details
:color="entry.dirty ? 'info' : ''" v-model="entry.text" :color="entry.dirty ? 'dirty' : ''" v-model="entry.text"
@update:model-value="queueUodate(entry)"></v-textarea> @update:model-value="queueUodate(entry)"
>
</v-textarea>
</td> </td>
<td class="text-center"> <td class="text-center">
<v-tooltip :text="entryHasPin(entry.id) ? 'Manage pin' : 'Add pin'"> <v-tooltip :text="entryHasPin(entry.id) ? 'Manage pin' : 'Add pin'">
@ -303,7 +305,7 @@ export default {
delete this.newEntry.meta[name]; delete this.newEntry.meta[name];
}, },
queueUodate(entry) { queueUodate(entry, delay = 1500) {
if (this.updateTimeout !== null) { if (this.updateTimeout !== null) {
clearTimeout(this.updateTimeout); clearTimeout(this.updateTimeout);
} }
@ -313,7 +315,7 @@ export default {
this.updateTimeout = setTimeout(() => { this.updateTimeout = setTimeout(() => {
this.update(entry); this.update(entry);
entry.dirty = false; entry.dirty = false;
}, 500); }, delay);
}, },
update(entry) { update(entry) {
@ -380,7 +382,7 @@ export default {
} }
else if (message.action === 'context_db_updated') { else if (message.action === 'context_db_updated') {
this.$emit('request-sync') this.$emit('request-sync')
this.load(message.data.id); //this.load(message.data.id);
} }
else if (message.action === 'context_db_deleted') { else if (message.action === 'context_db_deleted') {
let entry_id = message.data.id; let entry_id = message.data.id;

View file

@ -6,7 +6,7 @@
v-model="scene.data.title" v-model="scene.data.title"
label="Title" label="Title"
hint="The title of the scene. This will be displayed to the user when they play the scene." hint="The title of the scene. This will be displayed to the user when they play the scene."
:color="dirty['title'] ? 'primary' : ''" :color="dirty['title'] ? 'dirty' : ''"
:disabled="busy['title']" :disabled="busy['title']"
:loading="busy['title']" :loading="busy['title']"
@update:model-value="queueUpdate('title')" @update:model-value="queueUpdate('title')"
@ -64,7 +64,7 @@
max-rows="32" max-rows="32"
@update:model-value="queueUpdate('intro')" @update:model-value="queueUpdate('intro')"
:color="dirty['intro'] ? 'primary' : ''" :color="dirty['intro'] ? 'dirty' : ''"
:disabled="busy['intro']" :disabled="busy['intro']"
:loading="busy['intro']" :loading="busy['intro']"
@ -148,7 +148,7 @@ export default {
this.queueUpdate('intro'); this.queueUpdate('intro');
}, },
queueUpdate(name) { queueUpdate(name, delay = 1500) {
if (this.updateTimeout !== null) { if (this.updateTimeout !== null) {
clearTimeout(this.updateTimeout); clearTimeout(this.updateTimeout);
} }
@ -157,7 +157,7 @@ export default {
this.updateTimeout = setTimeout(() => { this.updateTimeout = setTimeout(() => {
this.update(); this.update();
}, 500); }, delay);
}, },
update() { update() {

View file

@ -127,22 +127,22 @@
:rules="[v => !!v || 'Query is required']" :rules="[v => !!v || 'Query is required']"
required required
hint="Available template variables: {character_name}, {player_name}" hint="Available template variables: {character_name}, {player_name}"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate"> @update:model-value="queueSaveTemplate()">
</v-text-field> </v-text-field>
<v-text-field v-model="template.description" <v-text-field v-model="template.description"
hint="A short description of what this state is for." hint="A short description of what this state is for."
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
label="Description"></v-text-field> label="Description"></v-text-field>
<v-row> <v-row>
<v-col cols="12" lg="4"> <v-col cols="12" lg="4">
<v-select v-model="template.state_type" <v-select v-model="template.state_type"
:items="stateTypes" :items="stateTypes"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="What type of character / object is this state for?" hint="What type of character / object is this state for?"
label="State type"> label="State type">
</v-select> </v-select>
@ -151,8 +151,8 @@
<v-select <v-select
v-model="template.insert" v-model="template.insert"
:items="insertionModes" :items="insertionModes"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
label="Context Attachment Method"> label="Context Attachment Method">
</v-select> </v-select>
</v-col> </v-col>
@ -166,8 +166,8 @@
v-model="template.instructions" v-model="template.instructions"
label="Additional instructions to the AI for generating this state." label="Additional instructions to the AI for generating this state."
hint="Available template variables: {character_name}, {player_name}" hint="Available template variables: {character_name}, {player_name}"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
auto-grow auto-grow
rows="3"> rows="3">
</v-textarea> </v-textarea>
@ -176,12 +176,12 @@
<v-checkbox <v-checkbox
v-model="template.auto_create" v-model="template.auto_create"
label="Automatically create" label="Automatically create"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Automatically create instances of this template for new games / characters."></v-checkbox> messages="Automatically create instances of this template for new games / characters."></v-checkbox>
<v-checkbox <v-checkbox
v-model="template.favorite" v-model="template.favorite"
label="Favorite" label="Favorite"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Favorited templates will be available for quick setup."></v-checkbox> messages="Favorited templates will be available for quick setup."></v-checkbox>
</v-col> </v-col>
@ -198,8 +198,8 @@
v-model="template.attribute" v-model="template.attribute"
label="Attribute name" label="Attribute name"
:rules="[v => !!v || 'Name is required']" :rules="[v => !!v || 'Name is required']"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
@ -207,7 +207,7 @@
v-model="template.priority" v-model="template.priority"
:items="attributePriorities" :items="attributePriorities"
label="Priority" label="Priority"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="How important is this attribute for the generation of the other attributes?" hint="How important is this attribute for the generation of the other attributes?"
messages="Higher priority attributes will be generated first."> messages="Higher priority attributes will be generated first.">
</v-select> </v-select>
@ -215,14 +215,14 @@
<v-text-field <v-text-field
v-model="template.description" v-model="template.description"
label="Template description" label="Template description"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
<v-textarea <v-textarea
v-model="template.instructions" v-model="template.instructions"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
auto-grow rows="5" auto-grow rows="5"
label="Additional instructions to the AI for generating this character attribute." label="Additional instructions to the AI for generating this character attribute."
hint="Available template variables: {character_name}, {player_name}" hint="Available template variables: {character_name}, {player_name}"
@ -232,21 +232,21 @@
<v-checkbox <v-checkbox
v-model="template.supports_spice" v-model="template.supports_spice"
label="Supports spice" label="Supports spice"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="When an attribute supports spice, there is a small chance that the AI will apply a random generation affector to push the attribute in a potentially unexpected direction." hint="When an attribute supports spice, there is a small chance that the AI will apply a random generation affector to push the attribute in a potentially unexpected direction."
messages="Randomly spice up this attribute during generation."> messages="Randomly spice up this attribute during generation.">
</v-checkbox> </v-checkbox>
<v-checkbox <v-checkbox
v-model="template.supports_style" v-model="template.supports_style"
label="Supports writing style flavoring" label="Supports writing style flavoring"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="When an attribute supports style, the AI will attempt to generate the attribute in a way that matches a selected writing style." hint="When an attribute supports style, the AI will attempt to generate the attribute in a way that matches a selected writing style."
messages="Generate this attribute in a way that matches a selected writing style."> messages="Generate this attribute in a way that matches a selected writing style.">
</v-checkbox> </v-checkbox>
<v-checkbox <v-checkbox
v-model="template.favorite" v-model="template.favorite"
label="Favorite" label="Favorite"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Favorited templates will appear on the top of the list."> messages="Favorited templates will appear on the top of the list.">
</v-checkbox> </v-checkbox>
</v-col> </v-col>
@ -260,22 +260,22 @@
v-model="template.detail" v-model="template.detail"
label="Question / Statement" label="Question / Statement"
:rules="[v => !!v || 'Name is required']" :rules="[v => !!v || 'Name is required']"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="Ideally phrased as a question, e.g. 'What is the character's favorite food?'. Available template variables: {character_name}, {player_name}" hint="Ideally phrased as a question, e.g. 'What is the character's favorite food?'. Available template variables: {character_name}, {player_name}"
required> required>
</v-text-field> </v-text-field>
<v-text-field <v-text-field
v-model="template.description" v-model="template.description"
label="Template description" label="Template description"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
<v-textarea <v-textarea
v-model="template.instructions" v-model="template.instructions"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
auto-grow rows="5" auto-grow rows="5"
label="Additional instructions to the AI for generating this character detail." label="Additional instructions to the AI for generating this character detail."
hint="Available template variables: {character_name}, {player_name}" hint="Available template variables: {character_name}, {player_name}"
@ -285,21 +285,21 @@
<v-checkbox <v-checkbox
v-model="template.supports_spice" v-model="template.supports_spice"
label="Supports spice" label="Supports spice"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="When a detail supports spice, there is a small chance that the AI will apply a random generation affector to push the detail in a potentially unexpected direction." hint="When a detail supports spice, there is a small chance that the AI will apply a random generation affector to push the detail in a potentially unexpected direction."
messages="Randomly spice up this detail during generation."> messages="Randomly spice up this detail during generation.">
</v-checkbox> </v-checkbox>
<v-checkbox <v-checkbox
v-model="template.supports_style" v-model="template.supports_style"
label="Supports writing style flavoring" label="Supports writing style flavoring"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
hint="When a detail supports style, the AI will attempt to generate the detail in a way that matches a selected writing style." hint="When a detail supports style, the AI will attempt to generate the detail in a way that matches a selected writing style."
messages="Generate this detail in a way that matches a selected writing style."> messages="Generate this detail in a way that matches a selected writing style.">
</v-checkbox> </v-checkbox>
<v-checkbox <v-checkbox
v-model="template.favorite" v-model="template.favorite"
label="Favorite" label="Favorite"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Favorited templates will appear on the top of the list."> messages="Favorited templates will appear on the top of the list.">
</v-checkbox> </v-checkbox>
</v-col> </v-col>
@ -320,15 +320,15 @@
<v-text-field <v-text-field
v-model="template.description" v-model="template.description"
label="Template description" label="Template description"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
<v-textarea <v-textarea
v-model="template.instructions" v-model="template.instructions"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
auto-grow rows="3" auto-grow rows="3"
placeholder="Make it {spice}." placeholder="Make it {spice}."
label="Additional instructions to the AI for applying the spice." label="Additional instructions to the AI for applying the spice."
@ -353,8 +353,8 @@
variant="underlined" variant="underlined"
density="compact" density="compact"
hide-details hide-details
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate"> @update:model-value="queueSaveTemplate()">
</v-text-field> </v-text-field>
</v-list-item-title> </v-list-item-title>
</v-list-item> </v-list-item>
@ -365,7 +365,7 @@
label="New spice" label="New spice"
placeholder="Make it dark and gritty." placeholder="Make it dark and gritty."
hint="An instruction or label to push the generated content into a specific direction." hint="An instruction or label to push the generated content into a specific direction."
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@keydown.enter="addSpice"> @keydown.enter="addSpice">
<template v-slot:append> <template v-slot:append>
<v-btn @click="addSpice" color="primary" icon> <v-btn @click="addSpice" color="primary" icon>
@ -405,7 +405,7 @@
<v-checkbox <v-checkbox
v-model="template.favorite" v-model="template.favorite"
label="Favorite" label="Favorite"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Favorited spice collections will appear on the top of the list."> messages="Favorited spice collections will appear on the top of the list.">
</v-checkbox> </v-checkbox>
</v-col> </v-col>
@ -427,21 +427,21 @@
v-model="template.name" v-model="template.name"
label="Writing style name" label="Writing style name"
:rules="[v => !!v || 'Name is required']" :rules="[v => !!v || 'Name is required']"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
<v-text-field <v-text-field
v-model="template.description" v-model="template.description"
label="Template description" label="Template description"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
required> required>
</v-text-field> </v-text-field>
<v-textarea <v-textarea
v-model="template.instructions" v-model="template.instructions"
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
auto-grow rows="5" auto-grow rows="5"
placeholder="Use a narrative writing style that reminds of mid 90s point and click adventure games." placeholder="Use a narrative writing style that reminds of mid 90s point and click adventure games."
label="Writing style instructions" label="Writing style instructions"
@ -452,7 +452,7 @@
<v-checkbox <v-checkbox
v-model="template.favorite" v-model="template.favorite"
label="Favorite" label="Favorite"
@update:model-value="queueSaveTemplate" @update:model-value="queueSaveTemplate()"
messages="Favorited writing styles will appear on the top of the list."> messages="Favorited writing styles will appear on the top of the list.">
</v-checkbox> </v-checkbox>
</v-col> </v-col>
@ -709,7 +709,7 @@ export default {
}, },
// queue requests // queue requests
queueSaveTemplate() { queueSaveTemplate(delay = 1500) {
if(!this.template || !this.template.uid) { if(!this.template || !this.template.uid) {
return; return;
@ -723,10 +723,10 @@ export default {
this.saveTimeout = setTimeout(() => { this.saveTimeout = setTimeout(() => {
this.saveTemplate(); this.saveTemplate();
}, 1000); }, delay);
}, },
queueSaveGroup() { queueSaveGroup(delay = 1500) {
if(!this.group || !this.group.uid) { if(!this.group || !this.group.uid) {
return; return;
@ -740,7 +740,7 @@ export default {
this.saveTimeout = setTimeout(() => { this.saveTimeout = setTimeout(() => {
this.saveTemplateGroup(); this.saveTemplateGroup();
}, 1000); }, delay);
}, },
// requests // requests

View file

@ -12,14 +12,14 @@
:original="entry.text" :original="entry.text"
:requires-instructions="true" :requires-instructions="true"
:generation-options="generationOptions" :generation-options="generationOptions"
@generate="content => { entry.text=content; queueSave(); }" @generate="content => { entry.text=content; queueSave(500); }"
/> />
<v-textarea <v-textarea
v-model="entry.text" v-model="entry.text"
label="World information" label="World information"
hint="Describe the world information here. This could be a description of a location, a historical event, or anything else that is relevant to the world." hint="Describe the world information here. This could be a description of a location, a historical event, or anything else that is relevant to the world."
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSave" @update:model-value="queueSave()"
auto-grow auto-grow
max-rows="24" max-rows="24"
rows="5"> rows="5">
@ -31,7 +31,7 @@
'This entry will only be included when pinned and never be included via automatic relevancy matching.' : 'This entry will only be included when pinned and never be included via automatic relevancy matching.' :
'This entry may be included via automatic relevancy matching.' 'This entry may be included via automatic relevancy matching.'
)" )"
@change="queueSave"></v-checkbox> @change="queueSave(500)"></v-checkbox>
</v-form> </v-form>
<v-card-actions v-if="isNewEntry"> <v-card-actions v-if="isNewEntry">
@ -126,7 +126,7 @@ export default {
}, },
}, },
methods: { methods: {
queueSave() { queueSave(delay = 1500) {
if(this.isNewEntry) { if(this.isNewEntry) {
return; return;
@ -140,7 +140,7 @@ export default {
this.timeout = setTimeout(() => { this.timeout = setTimeout(() => {
this.save(); this.save();
}, 500); }, delay);
}, },
save() { save() {

View file

@ -21,8 +21,8 @@
:label="state.question" :label="state.question"
:disabled="busy" :disabled="busy"
hint="You can leave this blank as it will be automatically generated. Or you can fill it in to start with a specific answer." hint="You can leave this blank as it will be automatically generated. Or you can fill it in to start with a specific answer."
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
@update:model-value="queueSave" @update:model-value="queueSave()"
max-rows="15" max-rows="15"
auto-grow auto-grow
rows="5"> rows="5">
@ -41,7 +41,8 @@
step="1" step="1"
class="mb-2" class="mb-2"
:disabled="busy" :disabled="busy"
@update:modelValue="queueSave" :color="dirty ? 'info' : ''"> @update:modelValue="queueSave()"
:color="dirty ? 'dirty' : ''">
</v-text-field> </v-text-field>
</v-col> </v-col>
<v-col cols="6" xl="3"> <v-col cols="6" xl="3">
@ -51,7 +52,7 @@
class="mr-1 mb-1" class="mr-1 mb-1"
:disabled="busy" :disabled="busy"
variant="underlined" variant="underlined"
density="compact" @update:modelValue="queueSave" :color="dirty ? 'info' : ''"> density="compact" @update:modelValue="save()" :color="dirty ? 'dirty' : ''">
</v-select> </v-select>
</v-col> </v-col>
</v-row> </v-row>
@ -61,9 +62,9 @@
<v-textarea <v-textarea
v-model="state.instructions" v-model="state.instructions"
label="Additional instructions to the AI for generating this state." label="Additional instructions to the AI for generating this state."
:color="dirty ? 'info' : ''" :color="dirty ? 'dirty' : ''"
:disabled="busy" :disabled="busy"
@update:model-value="queueSave" @update:model-value="queueSave()"
auto-grow auto-grow
max-rows="5" max-rows="5"
rows="3"> rows="3">
@ -163,8 +164,7 @@ export default {
}, },
}, },
methods: { methods: {
queueSave() { queueSave(delay = 1500) {
if(this.isNewState) { if(this.isNewState) {
return; return;
} }
@ -177,7 +177,7 @@ export default {
this.timeout = setTimeout(() => { this.timeout = setTimeout(() => {
this.save(); this.save();
}, 500); }, delay);
}, },
save(updateState) { save(updateState) {
@ -253,9 +253,6 @@ export default {
if (message.type !== 'world_state_manager') { if (message.type !== 'world_state_manager') {
return; return;
} }
console.log('message', message);
if (message.action == 'world_state_reinforcement_set') { if (message.action == 'world_state_reinforcement_set') {
this.dirty = false; this.dirty = false;
if(message.data.question === this.state.question) { if(message.data.question === this.state.question) {

View file

@ -27,6 +27,7 @@ export default createVuetify({
highlight3: colors.lightGreen.lighten3, highlight3: colors.lightGreen.lighten3,
highlight4: colors.red.lighten1, highlight4: colors.red.lighten1,
highlight5: colors.amber.lighten3, highlight5: colors.amber.lighten3,
dirty: colors.orange.lighten2,
// messages // messages
narrator: colors.deepPurple.lighten3, narrator: colors.deepPurple.lighten3,

View file

@ -1,5 +1,6 @@
@echo off @echo off
echo Checking git repository...
REM check if git repository is initialized and initialize if not REM check if git repository is initialized and initialize if not
if not exist .git ( if not exist .git (
git init git init
@ -13,15 +14,35 @@ REM activate the virtual environment
call talemate_env\Scripts\activate call talemate_env\Scripts\activate
REM use poetry to install dependencies REM use poetry to install dependencies
echo Updating virtual environment...
python -m poetry install python -m poetry install
echo Virtual environment updated REM we use nvcc to check for CUDA availability
REM if cuda exists: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
nvcc --version >nul 2>&1
IF ERRORLEVEL 1 (
echo CUDA not found. Keeping PyTorch installation without CUDA support...
) ELSE (
echo CUDA found. Installing PyTorch with CUDA support...
REM uninstalling existing torch, torchvision, torchaudio
python -m pip uninstall torch torchaudio -y
python -m pip install torch~=2.4.1 torchaudio~=2.4.1 --index-url https://download.pytorch.org/whl/cu121
)
echo Virtual environment updated!
REM updating npm packages REM updating npm packages
echo Updating npm packages...
cd talemate_frontend cd talemate_frontend
npm install call npm install
cd ..
echo NPM packages updated echo NPM packages updated
pause REM build the frontend
echo Building frontend...
call npm run build
cd ..
echo Update complete - You may close this window now.
pause