Make embedding batch size configurable (#742)

* Make embedding batch size configurable

* Address embedding batch size review nits
Artyom Mezin 2026-04-19 21:37:42 +03:00 committed by GitHub
parent 6aabacfca6
commit 4efe613f69
4 changed files with 36 additions and 1 deletion


@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
- `OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE` environment variable to override the embedding batch size; the default remains `50`. Helps with CPU-only local embedding setups and stricter OpenAI-compatible endpoints (#735)
## [1.8.5] - 2026-04-14
### Changed


@@ -197,6 +197,7 @@ Configure Ollama in the Settings UI:
| `SURREAL_NAMESPACE` | Database namespace | `open_notebook` |
| `SURREAL_DATABASE` | Database name | `open_notebook` |
| `API_URL` | API external URL | `http://localhost:5055` |
| `OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE` | Override embedding batch size for stricter/local providers (recommended: `8` for CPU-only local setups) | `50` |
See [Environment Reference](../5-CONFIGURATION/environment-reference.md) for complete list.
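
To make the recommendation in the table above concrete: the batch size controls how many texts go into each embedding request, so a smaller value trades more round-trips for smaller payloads that CPU-only or rate-limited providers can keep up with. A minimal sketch of the arithmetic (hypothetical helper, not part of this commit):

```python
import math

def embedding_request_count(num_chunks: int, batch_size: int) -> int:
    """Number of embedding API calls needed to process all chunks."""
    return math.ceil(num_chunks / batch_size)

# For a source that chunks into 500 pieces:
print(embedding_request_count(500, 50))  # 10 requests with the default
print(embedding_request_count(500, 8))   # 63 smaller requests for CPU-only setups
```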


@@ -61,6 +61,14 @@ Comprehensive list of all environment variables available in Open Notebook.
---
## Embeddings
| Variable | Required? | Default | Description |
|----------|-----------|---------|-------------|
| `OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE` | No | 50 | Number of texts sent per embedding batch. Lower this for CPU-only or stricter OpenAI-compatible embedding providers. |
---
## Text-to-Speech (TTS)
| Variable | Required? | Default | Description |


@@ -11,6 +11,7 @@ to ensure consistent behavior and proper handling of large content.
"""
import asyncio
import os
from typing import TYPE_CHECKING, List, Optional
import numpy as np
@@ -19,7 +20,29 @@ from loguru import logger
from .chunking import CHUNK_SIZE, ContentType, chunk_text
from .token_utils import token_count
EMBEDDING_BATCH_SIZE = 50

def _get_embedding_batch_size() -> int:
    """
    Read the embedding batch size from the environment.

    This is intentionally configurable because provider limits vary widely, and
    CPU-only local embedding endpoints often need smaller batches than cloud APIs.
    """
    raw = os.getenv("OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE", "50").strip()
    try:
        value = int(raw)
        if value < 1:
            raise ValueError
        return value
    except ValueError:
        logger.warning(
            "Invalid OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE='{}'; falling back to 50",
            raw,
        )
        return 50

EMBEDDING_BATCH_SIZE = _get_embedding_batch_size()
EMBEDDING_MAX_RETRIES = 3
EMBEDDING_RETRY_DELAY = 2 # seconds
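
For context on how these three constants fit together (the batching loop itself is outside this hunk), here is a minimal sketch of the pattern: slice the inputs into `batch_size` chunks and retry each batch with backoff. The `embed_fn` callable is a hypothetical stand-in for the provider call, and the defaults merely mirror the module constants; this illustrates the batch/retry interplay, not the module's actual implementation.

```python
import asyncio
from typing import Awaitable, Callable, List

async def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], Awaitable[List[List[float]]]],
    batch_size: int = 50,      # mirrors EMBEDDING_BATCH_SIZE
    max_retries: int = 3,      # mirrors EMBEDDING_MAX_RETRIES
    retry_delay: float = 2.0,  # mirrors EMBEDDING_RETRY_DELAY (seconds)
) -> List[List[float]]:
    """Embed texts in fixed-size batches, retrying each batch with backoff."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start : start + batch_size]
        for attempt in range(1, max_retries + 1):
            try:
                vectors.extend(await embed_fn(batch))
                break  # batch succeeded; move on to the next slice
            except Exception:
                if attempt == max_retries:
                    raise  # retries exhausted; surface the provider error
                await asyncio.sleep(retry_delay * attempt)  # linear backoff
    return vectors
```

With the new helper in place, setting `OPEN_NOTEBOOK_EMBEDDING_BATCH_SIZE=8` shrinks each `batch` slice accordingly, while a non-integer or sub-`1` value logs a warning and keeps the default of `50`.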