mirror of
https://github.com/lfnovo/open-notebook.git
synced 2026-04-29 12:00:00 +00:00
Fix re-embedding issues and improve retry strategy (#515)
* fix: filter empty content in rebuild embeddings queries

Update collect_items_for_rebuild() to properly filter out items with empty or whitespace-only content before submitting embedding jobs.

Changes:
- Sources: add string::trim(full_text) != '' filter
- Notes: add string::trim(content) != '' filter
- Insights: add content != none AND string::trim(content) != '' filter (previously had no content filter at all)

This prevents unnecessary job submissions that would fail validation in the individual embed commands.

Ref #513

* feat: add command_id to embedding error logs

Add get_command_id() helper to extract command_id from execution context. Include command_id in error logs for all embedding commands:
- embed_note_command
- embed_insight_command
- embed_source_command
- create_insight_command

This makes it easier to trace failed embedding jobs back to specific command records in the database.

Ref #513

* fix: improve logging for embedding commands

Log improvements:
- Add command_id to all embedding error logs for traceability
- Transaction conflicts in repo_insert now log at DEBUG (not ERROR)
- Embedding API errors log at DEBUG, only ERROR when retries exhausted
- Friendlier retry messages: "This will be retried automatically"
- Include model name and command_id in generate_embeddings errors

Files changed:
- commands/embedding_commands.py: command_id in logs, friendlier messages
- open_notebook/database/repository.py: DEBUG for transaction conflicts
- open_notebook/utils/embedding.py: DEBUG logging, pass-through command_id

Ref #513

* fix: correct field names in rebuild embeddings status endpoint

The API status endpoint was looking up the wrong field names:
- sources_processed → sources_submitted
- notes_processed → notes_submitted
- insights_processed → insights_submitted
- processed_items → jobs_submitted
- failed_items → failed_submissions

The command outputs "_submitted" because embedding happens asynchronously (we count jobs submitted, not items processed).

Ref #513

* fix: update rebuild UI text to reflect async job submission

Changed terminology from "Completed/processed" to "Jobs Submitted" since the rebuild command submits embedding jobs for async processing rather than completing them synchronously.

Updated in all locales: en-US, pt-BR, zh-CN, zh-TW, ja-JP

Ref #513

* refactor: migrate retry strategy from allowlist to blocklist

- Change from `retry_on: [RuntimeError, ...]` to `stop_on: [ValueError]`
- This is more resilient: new exception types auto-retry by default
- Simplified exception handling: ValueError = permanent, else = retry
- Transient errors logged at DEBUG (surreal-commands logs final failure)
- Permanent errors (ValueError) logged at ERROR

Ref #513
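The last refactor swaps an allowlist (`retry_on`) for a blocklist (`stop_on`). Below is a minimal sketch of the blocklist idea in plain Python. It assumes nothing about how surreal-commands implements retries: `run_with_blocklist_retry`, its parameters, and the fixed delay are hypothetical names chosen for illustration. Any exception type outside `stop_on` is retried, so an exception type nobody anticipated is treated as transient by default, while `ValueError` fails immediately. There is no framework here to report the final failure, so the helper logs it at ERROR itself.

```python
import logging
import time
from typing import Callable, Tuple, Type, TypeVar

# Hypothetical logger name; the project's own logger is not assumed here.
logger = logging.getLogger("retry_sketch")

T = TypeVar("T")


def run_with_blocklist_retry(
    func: Callable[[], T],
    stop_on: Tuple[Type[BaseException], ...] = (ValueError,),
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> T:
    """Call func, retrying on any exception that is NOT listed in stop_on.

    Hypothetical helper illustrating the blocklist idea; it is not the
    surreal-commands retry API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except stop_on as e:
            # Blocklisted (permanent) error, e.g. invalid input: never retried.
            logger.error("Permanent error, not retrying: %s", e)
            raise
        except Exception as e:
            if attempt >= max_attempts:
                # Retries exhausted: report the final failure at ERROR.
                logger.error("Giving up after %d attempts: %s", attempt, e)
                raise
            # Any other exception type is assumed transient: log quietly, retry.
            logger.debug("Attempt %d failed, will retry: %s", attempt, e)
            time.sleep(delay_seconds)
```

For example, a function that raises `RuntimeError` twice and then returns succeeds on the third call with the default `max_attempts=3`, while a function that raises `ValueError` is called exactly once. The cost of the blocklist is that a genuinely permanent error nobody classified still consumes its retry attempts before failing; the benefit is that a new transient error type is retried without anyone having to list it.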
parent 301dd4e20a
commit 5b2c97cab7
11 changed files with 183 additions and 152 deletions
@@ -180,7 +180,12 @@ async def repo_insert(
     except RuntimeError as e:
         if ignore_duplicates and "already contains" in str(e):
             return []
-        logger.error(str(e))
+        # Log transaction conflicts at debug level (they are expected during concurrent operations)
+        error_str = str(e).lower()
+        if "transaction" in error_str or "conflict" in error_str:
+            logger.debug(str(e))
+        else:
+            logger.error(str(e))
         raise
     except Exception as e:
         if ignore_duplicates and "already contains" in str(e):
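The `repo_insert` hunk chooses the log level with a case-insensitive substring check on the error message and then re-raises in both branches, so only the severity of the log line changes, not the control flow. Below is a minimal sketch that factors that check into standalone helpers so the classification can be exercised on its own. `is_transaction_conflict` and `log_insert_error` are hypothetical names, and the stdlib `logging` module stands in for whatever `logger` the repository module actually imports.

```python
import logging

# Stand-in logger; the real module's logger object is not assumed here.
logger = logging.getLogger("repository_sketch")


def is_transaction_conflict(error: Exception) -> bool:
    """Mirror the hunk's check: case-insensitive substring match on the message."""
    error_str = str(error).lower()
    return "transaction" in error_str or "conflict" in error_str


def log_insert_error(error: Exception) -> int:
    """Log an insert error at the level the hunk would choose.

    Returns the level used so callers (and tests) can inspect the decision.
    The caller is still expected to re-raise, as repo_insert does.
    """
    level = logging.DEBUG if is_transaction_conflict(error) else logging.ERROR
    logger.log(level, str(error))
    return level
```

Because the match is a plain substring test, any `RuntimeError` whose message merely mentions "transaction" or "conflict" is also demoted to DEBUG. The error still propagates through `raise`, so callers continue to see it either way.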
|