open-notebook

vrr/open-notebook

mirror of https://github.com/lfnovo/open-notebook.git synced 2026-04-28 19:40:50 +00:00

Commit graph

Author

SHA1

Message

Date

Luis Novo

4f33b854dd

feat: add environment variables for chunk size configuration (#520)

Adds OPEN_NOTEBOOK_CHUNK_SIZE and OPEN_NOTEBOOK_CHUNK_OVERLAP environment
variables to allow users to configure chunking behavior for different
embedding models with varying context window limits.

Key changes:
- CHUNK_SIZE is now configurable via OPEN_NOTEBOOK_CHUNK_SIZE (default: 1200)
- CHUNK_OVERLAP is configurable via OPEN_NOTEBOOK_CHUNK_OVERLAP (default: 15%)
- Validation with warnings for invalid or out-of-range values
- Updated documentation with configuration examples

This enables users of models like mxbai-embed-large with limited context
windows to reduce chunk size accordingly.

Closes #510
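
The configuration behavior described in this commit can be sketched as follows. This is a minimal illustration, not the project's code: the function name `read_chunk_config`, the size bounds, and the choice to read the overlap as a character count (falling back to 15% of the chunk size) are all assumptions. Only the two variable names and the defaults come from the commit message.

```python
import logging
import os

logger = logging.getLogger(__name__)

DEFAULT_CHUNK_SIZE = 1200      # default stated in the commit message
DEFAULT_OVERLAP_PERCENT = 15   # "15%" default, applied to the chunk size
MIN_CHUNK_SIZE = 100           # hypothetical lower bound
MAX_CHUNK_SIZE = 8192          # hypothetical upper bound


def _read_int(env, name):
    """Return the integer value of env[name], or None if unset or invalid."""
    raw = env.get(name)
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        logger.warning("%s=%r is not an integer; using default", name, raw)
        return None


def read_chunk_config(env=None):
    """Return (chunk_size, chunk_overlap) in characters (hypothetical helper)."""
    env = os.environ if env is None else env

    chunk_size = DEFAULT_CHUNK_SIZE
    size = _read_int(env, "OPEN_NOTEBOOK_CHUNK_SIZE")
    if size is not None:
        if MIN_CHUNK_SIZE <= size <= MAX_CHUNK_SIZE:
            chunk_size = size
        else:
            logger.warning("chunk size %d out of range; using default", size)

    # Integer arithmetic avoids float rounding surprises (1200 * 15 // 100 == 180).
    chunk_overlap = chunk_size * DEFAULT_OVERLAP_PERCENT // 100
    overlap = _read_int(env, "OPEN_NOTEBOOK_CHUNK_OVERLAP")
    if overlap is not None:
        if 0 <= overlap < chunk_size:
            chunk_overlap = overlap
        else:
            logger.warning("chunk overlap %d out of range; using default", overlap)

    return chunk_size, chunk_overlap
```

Under these assumptions, a user of a small-context model could set `OPEN_NOTEBOOK_CHUNK_SIZE=512` and leave the overlap unset, which yields an overlap of 76 characters.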

2026-01-31 19:30:56 -03:00

Luis Novo

4e411e0488

feat: add cascade deletion for notebooks with delete preview (#471)

* feat: decrease chunking size for maximum ollama compatibility

* docs: improve i18n info in CLAUDE.md

* feat: add cascade deletion for notebooks with delete preview

- Add Notebook.get_delete_preview() to show counts of affected items
- Add Notebook.delete(delete_exclusive_sources) for cascade deletion
- Always delete notes when notebook is deleted
- Allow user to choose: delete or keep exclusive sources
- Shared sources are always unlinked but never deleted
- Add NotebookDeleteDialog component with radio button options
- Add delete-preview API endpoint
- Update delete endpoint with delete_exclusive_sources param
- Add i18n support for all 5 locales

Closes #77
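
The deletion rules listed above can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Store` class, its fields, and the free functions are hypothetical stand-ins for the database-backed `Notebook.get_delete_preview()` and `Notebook.delete(delete_exclusive_sources)` methods named in the commit, whose real signatures and return shapes are not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical in-memory stand-in for the database."""
    notes: dict = field(default_factory=dict)   # note_id -> notebook_id
    links: set = field(default_factory=set)     # (notebook_id, source_id) pairs
    sources: set = field(default_factory=set)   # all source ids


def _split_sources(store, notebook_id):
    """Return (exclusive, shared) source ids linked to the notebook."""
    linked = {s for nb, s in store.links if nb == notebook_id}
    shared = {s for nb, s in store.links if nb != notebook_id and s in linked}
    return linked - shared, shared


def get_delete_preview(store, notebook_id):
    """Count what a deletion would affect, without changing anything."""
    exclusive, shared = _split_sources(store, notebook_id)
    note_count = sum(1 for nb in store.notes.values() if nb == notebook_id)
    return {
        "notes": note_count,
        "exclusive_sources": len(exclusive),
        "shared_sources": len(shared),
    }


def delete_notebook(store, notebook_id, delete_exclusive_sources=False):
    """Delete notes, unlink all sources, optionally delete exclusive sources."""
    exclusive, _shared = _split_sources(store, notebook_id)
    # Notes are always deleted with the notebook.
    store.notes = {n: nb for n, nb in store.notes.items() if nb != notebook_id}
    # Every source is unlinked; shared sources survive because they are never deleted.
    store.links = {(nb, s) for nb, s in store.links if nb != notebook_id}
    if delete_exclusive_sources:
        store.sources -= exclusive
```

Computing the exclusive and shared sets before unlinking matters: once the links are removed, an exclusive source can no longer be told apart from an unrelated one.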

* docs: remove hardcoded config settings

2026-01-25 14:56:14 -03:00

Luis Novo

d8006ff5cb

feat: content-type aware chunking and unified embedding (#444)

* feat: content-type aware chunking and unified embedding

- Add chunking.py with HTML, Markdown, and plain text detection
- Add embedding.py with mean pooling for large content
- Create dedicated commands: embed_note, embed_insight, embed_source
- Use fire-and-forget pattern for embedding via submit_command()
- Refactor rebuild_embeddings_command to delegate to individual commands
- Remove legacy commands and needs_embedding() methods
- Reduce chunk size to 1500 chars for Ollama compatibility
- Update CLAUDE.md documentation for new architecture

Fixes #350, #142
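
As a rough illustration of "mean pooling for large content", the sketch below splits text into overlapping character windows and averages the per-chunk vectors into one. It is not taken from embedding.py: the function names are hypothetical, the averaging is unweighted, and pure Python lists are used so the example stays self-contained (the project itself lists numpy as a dependency).

```python
def split_with_overlap(text, chunk_size, overlap):
    """Split text into character windows that share `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]


def mean_pool(vectors):
    """Average equal-length vectors element by element (unweighted)."""
    if not vectors:
        raise ValueError("no vectors to pool")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def embed_large_text(text, embed_chunk, chunk_size=1500, overlap=0):
    """Embed each chunk with the caller-supplied `embed_chunk`, then pool."""
    chunks = split_with_overlap(text, chunk_size, overlap)
    return mean_pool([embed_chunk(chunk) for chunk in chunks])
```

Here `embed_chunk` is whatever callable turns one chunk into a vector; the default `chunk_size=1500` mirrors the figure in the commit message, while `overlap=0` is only a placeholder.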

* fix: address code review issues

- Note.save() now returns command_id for tracking embedding jobs
- Add length check after generate_embeddings() to fail fast on mismatch
- Add numpy as explicit dependency (was transitive)
- Remove hardcoded chunk sizes from docstrings
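
The fail-fast length check mentioned above might look like the following sketch. The wrapper name `embed_texts_checked` is hypothetical, and `generate_embeddings` is passed in as a plain callable because its real signature is not shown in the commit message.

```python
def embed_texts_checked(texts, generate_embeddings):
    """Call the embedder and refuse to continue if the counts disagree."""
    vectors = generate_embeddings(texts)
    if len(vectors) != len(texts):
        # Pairing chunks with the wrong vectors would silently corrupt search,
        # so raise immediately instead of storing misaligned data.
        raise ValueError(
            f"expected {len(texts)} embeddings, got {len(vectors)}"
        )
    return list(zip(texts, vectors))
```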

* docs: address code review comments

- Rename "SYNC PATH" to "DOMAIN MODEL PATH" in embedding router
- Add test_chunking.py and test_embedding.py to Testing Strategy
- Clarify auto-embedding behavior for each domain model

* fix: clean thinking tags from prompt graph output

Adds clean_thinking_content() to prompt.py to handle extended thinking
models that return <think>...</think> tags. This fixes empty titles
when saving notes from chat.
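
One way such a cleanup could work is a non-greedy regular expression over the model output. The sketch below is an assumption about the approach, not the body of `clean_thinking_content()`; the helper name `strip_think_blocks` is hypothetical.

```python
import re

# Non-greedy match so several separate <think> blocks are removed one by one;
# DOTALL lets a block span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_blocks(text):
    """Remove <think>...</think> blocks and trim surrounding whitespace."""
    return _THINK_BLOCK.sub("", text).strip()
```

With this in place, output such as `<think>plan the title</think>Weekly Notes` reduces to `Weekly Notes`. An unclosed `<think>` tag is left untouched by this sketch.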

* chore: remove local docker-compose from git

* fix(frontend): handle null parent_id in search results

Add defensive check for null parent_id in search results to prevent
"Cannot read properties of null (reading 'split')" error. This can
happen with orphaned records in the database.

* fix: cascade delete embeddings and insights when source is deleted

When deleting a Source, now also deletes associated:
- source_embedding records
- source_insight records

This prevents orphaned records that cause null parent_id errors
in vector search results.

* fix: add cleanup for orphan embedding/insight records in migration 10

Deletes source_embedding and source_insight records where the
linked source no longer exists (source.id = NONE).

* chore: bump esperanto to 2.16

Increases ctx_num for Ollama models to accommodate larger notebook
context windows. See: https://github.com/lfnovo/esperanto/pull/69

2026-01-21 23:49:08 -03:00

3 commits