fix: reduce retry log noise during concurrent chunk processing

Addresses issue #362 - users were seeing hundreds of ERROR/WARNING logs
when processing large documents due to SurrealDB v2 transaction conflicts
during concurrent chunk embedding operations.

Changes:
- Upgraded to surreal-commands v1.3.0, which includes the retry_log_level feature
- Increased retry attempts from 5 to 15 and max wait time from 30s to 120s
  to handle deep queues during concurrent processing
- Set retry_log_level to "debug" in embed_chunk and process_source commands
- Changed repository.py RuntimeError logging from ERROR to DEBUG level
- Updated command exception handlers to log retries at DEBUG level
- Updated documentation to reflect retry strategy
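
The retry settings listed above can be pictured with a minimal sketch. This is not the surreal-commands API: `retry_with_backoff`, its parameters, and the exponential-with-jitter wait schedule are assumptions made for illustration. Only the numbers (15 attempts, 120s cap) and the DEBUG retry log level come from this commit.

```python
import logging
import random
import time

logger = logging.getLogger("retry_sketch")


def retry_with_backoff(func, max_attempts=15, wait_max=120.0, wait_base=1.0,
                       retry_log_level=logging.DEBUG, sleep=time.sleep):
    """Call func(), retrying on RuntimeError with capped, jittered backoff.

    Hypothetical helper: the names and the wait schedule are illustrative
    assumptions, not the library's implementation.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RuntimeError as exc:
            if attempt == max_attempts:
                # Out of attempts: surface the failure to the caller.
                raise
            # Exponential growth capped at wait_max, then randomized so
            # concurrent workers do not retry in lockstep.
            ceiling = min(wait_max, wait_base * 2 ** (attempt - 1))
            wait = random.uniform(0, ceiling)
            # Retries are expected under contention, so log them quietly.
            logger.log(retry_log_level,
                       "attempt %d/%d failed (%s); retrying in %.1fs",
                       attempt, max_attempts, exc, wait)
            sleep(wait)
```

With `retry_log_level=logging.DEBUG`, intermediate failures stay out of default log output, while the final failure still propagates to the caller as an exception.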

This is a temporary workaround for SurrealDB v2.x transaction conflict
issues with SEARCH indexes. The retry settings can be lowered after migrating
to SurrealDB v3, which fixes the underlying concurrency issue.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
LUIS NOVO 2026-01-05 11:30:55 -03:00
parent b76af505b2
commit 48e2800211
7 changed files with 53 additions and 44 deletions

@@ -74,7 +74,7 @@ Both leverage connection context manager for lifecycle management and automatic
- **Async-first design**: All operations async via AsyncSurreal; sync wrapper provided for legacy code
- **Connection per operation**: Each repo_* function opens/closes connection (no pooling); designed for serverless/stateless API
- **Auto-timestamping**: repo_create() and repo_update() auto-set `created`/`updated` fields
-- **Error resilience**: RuntimeError for transaction conflicts (retriable); catches and re-raises other exceptions
+- **Error resilience**: RuntimeError for transaction conflicts (retriable, logged at DEBUG level); catches and re-raises other exceptions
- **RecordID polymorphism**: Functions accept string or RecordID; coerced to consistent type
- **Graceful degradation**: Migration queries catch exceptions and treat table-not-found as version 0
@@ -91,7 +91,7 @@ Both leverage connection context manager for lifecycle management and automatic
- **Record ID format inconsistency**: repo_update() accepts both `table:id` format and full RecordID; path handling can be subtle
- **ISO date parsing**: repo_update() parses `created` field from string to datetime if present; assumes ISO format
- **Timestamp overwrite risk**: repo_create() always sets new timestamps; can't preserve original created time on reimport
-- **Transaction conflict handling**: RuntimeError from transaction conflicts logged without stack trace (prevents log spam)
+- **Transaction conflict handling**: RuntimeError from transaction conflicts logged at DEBUG level without stack trace (prevents log spam during concurrent operations)
- **Graceful null returns**: get_all_versions() returns [] on table missing; allows migration system to bootstrap cleanly
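
As an illustration of the transaction-conflict handling described in the bullets above (not part of the changed file), here is a minimal sketch. `run_query` and its `execute` callable are hypothetical stand-ins rather than the real repo_* functions, and logging other exceptions at ERROR with a traceback is an assumption. The source only states that RuntimeError is logged at DEBUG without a stack trace and that other exceptions are caught and re-raised.

```python
import asyncio
import logging

logger = logging.getLogger("repository_sketch")


async def run_query(execute, query):
    """Run a query; log retriable conflicts quietly, other failures loudly.

    Hypothetical helper: `execute` is any async callable taking a query string.
    """
    try:
        return await execute(query)
    except RuntimeError as exc:
        # Transaction conflict: retriable, so log at DEBUG with no traceback
        # and let the caller's retry logic decide what to do next.
        logger.debug("Transaction conflict (retriable): %s", exc)
        raise
    except Exception:
        # Assumed behavior: unexpected errors are logged with a traceback
        # before being re-raised.
        logger.exception("Query failed: %s", query)
        raise
```

Re-raising in both branches keeps the retry decision with the caller, and the DEBUG level keeps expected conflicts out of default log output.
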
## How to Extend