open-notebook/api/routers/embedding_rebuild.py
MisonL 67dd85c928
Feat/localization tests docker (#371)
2026-01-15 13:51:05 -03:00

192 lines
6.9 KiB
Python

from fastapi import APIRouter, HTTPException
from loguru import logger
from surreal_commands import get_command_status

from api.command_service import CommandService
from api.models import (
    RebuildProgress,
    RebuildRequest,
    RebuildResponse,
    RebuildStats,
    RebuildStatusResponse,
)
from open_notebook.database.repository import repo_query

router = APIRouter()


@router.post("/rebuild", response_model=RebuildResponse)
async def start_rebuild(request: RebuildRequest):
    """
    Start a background job to rebuild embeddings.

    - **mode**: "existing" (re-embed items with embeddings) or "all" (embed everything)
    - **include_sources**: Include sources in rebuild (default: true)
    - **include_notes**: Include notes in rebuild (default: true)
    - **include_insights**: Include insights in rebuild (default: true)

    Returns command ID to track progress and estimated item count.
    """
    try:
        logger.info(f"Starting rebuild request: mode={request.mode}")

        # Import commands to ensure they're registered
        import commands.embedding_commands  # noqa: F401

        # Estimate total items (quick count query)
        # This is a rough estimate before the command runs
        total_estimate = 0

        if request.include_sources:
            if request.mode == "existing":
                # Count sources with embeddings
                result = await repo_query(
                    """
                    SELECT VALUE count(array::distinct(
                        SELECT VALUE source.id
                        FROM source_embedding
                        WHERE embedding != none AND array::len(embedding) > 0
                    )) as count FROM {}
                    """
                )
            else:
                # Count all sources with content
                result = await repo_query(
                    "SELECT VALUE count() as count FROM source WHERE full_text != none GROUP ALL"
                )

            if result and isinstance(result[0], dict):
                total_estimate += result[0].get("count", 0)
            elif result:
                total_estimate += result[0] if isinstance(result[0], int) else 0

        if request.include_notes:
            if request.mode == "existing":
                result = await repo_query(
                    "SELECT VALUE count() as count FROM note WHERE embedding != none AND array::len(embedding) > 0 GROUP ALL"
                )
            else:
                result = await repo_query(
                    "SELECT VALUE count() as count FROM note WHERE content != none GROUP ALL"
                )

            if result and isinstance(result[0], dict):
                total_estimate += result[0].get("count", 0)
            elif result:
                total_estimate += result[0] if isinstance(result[0], int) else 0

        if request.include_insights:
            if request.mode == "existing":
                result = await repo_query(
                    "SELECT VALUE count() as count FROM source_insight WHERE embedding != none AND array::len(embedding) > 0 GROUP ALL"
                )
            else:
                result = await repo_query(
                    "SELECT VALUE count() as count FROM source_insight GROUP ALL"
                )

            if result and isinstance(result[0], dict):
                total_estimate += result[0].get("count", 0)
            elif result:
                total_estimate += result[0] if isinstance(result[0], int) else 0

        logger.info(f"Estimated {total_estimate} items to process")

        # Submit command
        command_id = await CommandService.submit_command_job(
            "open_notebook",
            "rebuild_embeddings",
            {
                "mode": request.mode,
                "include_sources": request.include_sources,
                "include_notes": request.include_notes,
                "include_insights": request.include_insights,
            },
        )

        logger.info(f"Submitted rebuild command: {command_id}")

        return RebuildResponse(
            command_id=command_id,
            total_items=total_estimate,
            message=f"Rebuild operation started. Estimated {total_estimate} items to process.",
        )

    except Exception as e:
        logger.error(f"Failed to start rebuild: {e}")
        logger.exception(e)
        raise HTTPException(
            status_code=500, detail=f"Failed to start rebuild operation: {str(e)}"
        )


@router.get("/rebuild/{command_id}/status", response_model=RebuildStatusResponse)
async def get_rebuild_status(command_id: str):
    """
    Get the status of a rebuild operation.

    Returns:
    - **status**: queued, running, completed, failed
    - **progress**: processed count, total count, percentage
    - **stats**: breakdown by type (sources, notes, insights, failed)
    - **timestamps**: started_at, completed_at
    """
    try:
        # Get command status from surreal_commands
        status = await get_command_status(command_id)

        if not status:
            raise HTTPException(status_code=404, detail="Rebuild command not found")

        # Build response based on status
        response = RebuildStatusResponse(
            command_id=command_id,
            status=status.status,
        )

        # Extract metadata from command result
        if status.result and isinstance(status.result, dict):
            result = status.result

            # Build progress info
            if "total_items" in result and "processed_items" in result:
                total = result["total_items"]
                processed = result["processed_items"]
                response.progress = RebuildProgress(
                    processed=processed,
                    total=total,
                    percentage=round((processed / total * 100) if total > 0 else 0, 2),
                )

            # Build stats
            response.stats = RebuildStats(
                sources=result.get("sources_processed", 0),
                notes=result.get("notes_processed", 0),
                insights=result.get("insights_processed", 0),
                failed=result.get("failed_items", 0),
            )

        # Add timestamps
        if hasattr(status, "created") and status.created:
            response.started_at = str(status.created)
        if hasattr(status, "updated") and status.updated:
            response.completed_at = str(status.updated)

        # Add error message if failed
        if (
            status.status == "failed"
            and status.result
            and isinstance(status.result, dict)
        ):
            response.error_message = status.result.get("error_message", "Unknown error")

        return response

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Failed to get rebuild status: {e}")
        logger.exception(e)
        raise HTTPException(
            status_code=500, detail=f"Failed to get rebuild status: {str(e)}"
        )
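
The router above repeats the same "dict with a count key, or a bare integer" check three times when estimating totals, and computes the progress percentage inline. As a minimal sketch, that logic could be pulled into two pure helpers. The names `extract_count` and `progress_percentage` are hypothetical and do not exist in the repository; the result shapes are only those the router itself already checks for, not a statement about what `repo_query` is guaranteed to return.

```python
from typing import Any, Optional, Sequence


def extract_count(result: Optional[Sequence[Any]]) -> int:
    """Hypothetical helper mirroring the count handling in start_rebuild.

    Accepts the two shapes the router checks for:
    a list whose first item is a dict with a "count" key,
    or a list whose first item is a bare integer.
    Anything else contributes 0 to the estimate.
    """
    if not result:
        return 0
    first = result[0]
    if isinstance(first, dict):
        return first.get("count", 0)
    return first if isinstance(first, int) else 0


def progress_percentage(processed: int, total: int) -> float:
    """Hypothetical helper mirroring the percentage logic in get_rebuild_status.

    Returns 0.0 when total is not positive, which avoids a division by zero
    for rebuilds that have nothing to process.
    """
    if total <= 0:
        return 0.0
    return round(processed / total * 100, 2)
```

With helpers like these, each branch in `start_rebuild` would reduce to `total_estimate += extract_count(result)`, and both functions can be unit-tested without a database or a running command worker.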