Commit graph

13 commits

Author SHA1 Message Date
orihatav
d0bbe4a921 fix: handle tiktoken network errors in offline environments (issue #264)
In air-gapped / offline Docker deployments, tiktoken.get_encoding() tries
to download the encoding file from openaipublic.blob.core.windows.net.
When that request fails it raises a URLError / OSError, not an ImportError,
so the previous except clause silently missed it and the crash surfaced in
the UI.

Widened `except ImportError` to `except Exception` so that all failures,
both "not installed" and "network unreachable", fall through to the
word-count fallback (words × 1.3). Added a loguru WARNING so operators can
see when the fallback is active.
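A minimal sketch of the widened fallback, assuming tiktoken's `get_encoding()` / `encode()` API and the o200k_base encoding named further down in this message. The standard `logging` module stands in for loguru so the sketch is self-contained; `WORDS_TO_TOKENS` and `ENCODING_NAME` are hypothetical names, not necessarily the project's.

```python
import logging

# Stand-in for the loguru logger mentioned in the commit (assumption).
logger = logging.getLogger(__name__)

# Hypothetical constants: ratio from the commit message, encoding from the Docker notes.
WORDS_TO_TOKENS = 1.3
ENCODING_NAME = "o200k_base"


def token_count(text: str) -> int:
    """Count tokens with tiktoken, falling back to a word-count estimate."""
    try:
        import tiktoken  # raises ImportError when tiktoken is not installed

        # May raise URLError / OSError when the encoding must be downloaded offline.
        encoding = tiktoken.get_encoding(ENCODING_NAME)
        return len(encoding.encode(text))
    except Exception as exc:  # deliberately broad: covers both failure modes
        logger.warning("tiktoken unavailable (%s); using word-count fallback", exc)
        return int(len(text.split()) * WORDS_TO_TOKENS)
```

Catching `Exception` rather than listing `ImportError` and `OSError` trades precision for robustness: any tokenizer failure degrades to an estimate instead of surfacing in the UI.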

TIKTOKEN_CACHE_DIR now reads from the environment with a blank-safe
fallback (an `or` guard prevents os.makedirs("") when the env var is set
but empty). This lets Docker images redirect the cache to a path outside
/app/data/ so user-data volume mounts cannot shadow the pre-baked encoding.
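A sketch of the blank-safe lookup, assuming the default path `./data/tiktoken-cache` from #171. The function name `resolve_tiktoken_cache_dir` is hypothetical. tiktoken itself consults the `TIKTOKEN_CACHE_DIR` environment variable when it loads an encoding, which is why the resolved value is written back.

```python
import os

# Hypothetical default, mirroring the ./data/tiktoken-cache path from #171.
DEFAULT_CACHE_DIR = os.path.join(".", "data", "tiktoken-cache")


def resolve_tiktoken_cache_dir(default: str = DEFAULT_CACHE_DIR) -> str:
    """Return the tiktoken cache directory, treating an empty env var as unset."""
    # `or` replaces both a missing variable (None) and an empty one (""),
    # so os.makedirs never receives an empty path.
    cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR") or default
    os.makedirs(cache_dir, exist_ok=True)
    # Write the resolved path back so tiktoken picks it up on load.
    os.environ["TIKTOKEN_CACHE_DIR"] = cache_dir
    return cache_dir
```

With `ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache` set in the runtime image, this returns the pre-baked path, which sits outside any volume mounted at /app/data/.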

Both images now pre-download the o200k_base encoding during the builder
stage (internet is available at build time) and copy it into the runtime
image at /app/tiktoken-cache. ENV TIKTOKEN_CACHE_DIR=/app/tiktoken-cache
is set in the runtime stage so no network call is ever needed at runtime.

Added test_token_count_network_error_fallback in tests/test_utils.py:
patches tiktoken.get_encoding with a URLError and asserts token_count()
returns a positive int instead of raising.

Fixes #264

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-10 19:45:14 -05:00
Luis Novo
aa593c60bd feat: add persistent tiktoken cache to reduce re-downloads (#171)
Configure tiktoken to cache tokenizer encodings in ./data/tiktoken-cache
instead of the system temp directory. This prevents re-downloading
encoding files on every container restart and improves startup time.

Changes:
- Add TIKTOKEN_CACHE_DIR configuration in config.py
- Set TIKTOKEN_CACHE_DIR environment variable in token_utils.py
- Bump version to 1.0.7
2025-10-19 14:50:52 -03:00
Luis Novo
d7b0fff954 Api podcast migration (#93)
Creates the API layer for Open Notebook
Creates a services API gateway for the Streamlit front-end
Migrates the SurrealDB SDK to the official one
Change all database calls to async
New podcast framework supporting multiple speaker configurations
Implement the surreal-commands library for async processing
Improve docker image and docker-compose configurations
2025-07-17 08:36:11 -03:00
LUIS NOVO
4a5d47d934 refactor transformation, add graph and admin 2024-11-18 22:01:11 -03:00
LUIS NOVO
e589c7b8aa cleanup 2024-11-08 18:30:56 -03:00
LUIS NOVO
feabfaed01 remove defaultmodel from config file 2024-11-01 19:56:27 -03:00
LUIS NOVO
b89250d3ca temporary fix to config cache 2024-11-01 17:06:10 -03:00
LUIS NOVO
2f2cdabd2d Fix issue with model defaults and bump version 2024-10-30 18:28:29 -03:00
LUIS NOVO
9390ea82f2 cleanup todos 2024-10-30 17:16:06 -03:00
LUIS NOVO
859b7f6e7e simplify the model selector 2024-10-30 14:30:29 -03:00
LUIS NOVO
8bb5db158f implement model config 2024-10-30 14:09:24 -03:00
LUIS NOVO
1002c6b4eb organize data folders 2024-10-26 18:54:54 -03:00
LUIS NOVO
6df77c5b92 add yaml config file mvp 2024-10-24 16:42:40 -03:00