open-notebook/api/sources_service.py
MisonL 67dd85c928
Feat/localization tests docker (#371)
2026-01-15 13:51:05 -03:00


"""
Sources service layer using API.
"""
from dataclasses import dataclass
from typing import Dict, List, Optional, Union
from loguru import logger
from api.client import api_client
from open_notebook.domain.notebook import Asset, Source
@dataclass
class SourceProcessingResult:
"""Result of source creation with optional async processing info."""
source: Source
is_async: bool = False
command_id: Optional[str] = None
status: Optional[str] = None
processing_info: Optional[Dict] = None


@dataclass
class SourceWithMetadata:
    """Source object with additional metadata from API."""

    source: Source
    embedded_chunks: int

    # Expose common source properties for easy access
    @property
    def id(self):
        return self.source.id

    @property
    def title(self):
        return self.source.title

    @title.setter
    def title(self, value):
        self.source.title = value

    @property
    def topics(self):
        return self.source.topics

    @property
    def asset(self):
        return self.source.asset

    @property
    def full_text(self):
        return self.source.full_text

    @property
    def created(self):
        return self.source.created

    @property
    def updated(self):
        return self.source.updated


class SourcesService:
    """Service layer for sources operations using API."""

    def __init__(self):
        logger.info("Using API for sources operations")

    def get_all_sources(
        self, notebook_id: Optional[str] = None
    ) -> List[SourceWithMetadata]:
        """Get all sources with optional notebook filtering."""
        sources_data = api_client.get_sources(notebook_id=notebook_id)

        # Convert API response to SourceWithMetadata objects
        sources = []
        for source_data in sources_data:
            source = Source(
                title=source_data["title"],
                topics=source_data["topics"],
                asset=Asset(
                    file_path=source_data["asset"]["file_path"]
                    if source_data["asset"]
                    else None,
                    url=source_data["asset"]["url"] if source_data["asset"] else None,
                )
                if source_data["asset"]
                else None,
            )
            source.id = source_data["id"]
            source.created = source_data["created"]
            source.updated = source_data["updated"]

            # Wrap in SourceWithMetadata
            source_with_metadata = SourceWithMetadata(
                source=source, embedded_chunks=source_data.get("embedded_chunks", 0)
            )
            sources.append(source_with_metadata)

        return sources

    def get_source(self, source_id: str) -> SourceWithMetadata:
        """Get a specific source."""
        response = api_client.get_source(source_id)
        source_data = response if isinstance(response, dict) else response[0]

        source = Source(
            title=source_data["title"],
            topics=source_data["topics"],
            full_text=source_data["full_text"],
            asset=Asset(
                file_path=source_data["asset"]["file_path"]
                if source_data["asset"]
                else None,
                url=source_data["asset"]["url"] if source_data["asset"] else None,
            )
            if source_data["asset"]
            else None,
        )
        source.id = source_data["id"]
        source.created = source_data["created"]
        source.updated = source_data["updated"]

        return SourceWithMetadata(
            source=source, embedded_chunks=source_data.get("embedded_chunks", 0)
        )

    def create_source(
        self,
        notebook_id: Optional[str] = None,
        source_type: str = "text",
        url: Optional[str] = None,
        file_path: Optional[str] = None,
        content: Optional[str] = None,
        title: Optional[str] = None,
        transformations: Optional[List[str]] = None,
        embed: bool = False,
        delete_source: bool = False,
        notebooks: Optional[List[str]] = None,
        async_processing: bool = False,
    ) -> Union[Source, SourceProcessingResult]:
        """
        Create a new source with support for async processing.

        Args:
            notebook_id: Single notebook ID (deprecated, use notebooks parameter)
            source_type: Type of source (link, upload, text)
            url: URL for link sources
            file_path: File path for upload sources
            content: Text content for text sources
            title: Optional source title
            transformations: List of transformation IDs to apply
            embed: Whether to embed content for vector search
            delete_source: Whether to delete uploaded file after processing
            notebooks: List of notebook IDs to add source to (preferred over notebook_id)
            async_processing: Whether to process source asynchronously

        Returns:
            Source object for sync processing (backward compatibility)
            SourceProcessingResult for async processing (contains additional metadata)
        """
        source_data = api_client.create_source(
            notebook_id=notebook_id,
            notebooks=notebooks,
            source_type=source_type,
            url=url,
            file_path=file_path,
            content=content,
            title=title,
            transformations=transformations,
            embed=embed,
            delete_source=delete_source,
            async_processing=async_processing,
        )

        # Create Source object from response
        response_data = source_data if isinstance(source_data, dict) else source_data[0]
        source = Source(
            title=response_data["title"],
            topics=response_data.get("topics") or [],
            full_text=response_data.get("full_text"),
            asset=Asset(
                file_path=response_data["asset"]["file_path"]
                if response_data.get("asset")
                else None,
                url=response_data["asset"]["url"]
                if response_data.get("asset")
                else None,
            )
            if response_data.get("asset")
            else None,
        )
        source.id = response_data["id"]
        source.created = response_data["created"]
        source.updated = response_data["updated"]

        # Check if this is an async processing response
        if (
            response_data.get("command_id")
            or response_data.get("status")
            or response_data.get("processing_info")
        ):
            # Ensure source_data is a dict for accessing attributes
            source_data_dict = (
                source_data if isinstance(source_data, dict) else source_data[0]
            )
            # Return enhanced result for async processing
            return SourceProcessingResult(
                source=source,
                is_async=True,
                command_id=source_data_dict.get("command_id"),
                status=source_data_dict.get("status"),
                processing_info=source_data_dict.get("processing_info"),
            )
        else:
            # Return simple Source for backward compatibility
            return source

    def get_source_status(self, source_id: str) -> Dict:
        """Get processing status for a source."""
        response = api_client.get_source_status(source_id)
        return response if isinstance(response, dict) else response[0]

    def create_source_async(
        self,
        notebook_id: Optional[str] = None,
        source_type: str = "text",
        url: Optional[str] = None,
        file_path: Optional[str] = None,
        content: Optional[str] = None,
        title: Optional[str] = None,
        transformations: Optional[List[str]] = None,
        embed: bool = False,
        delete_source: bool = False,
        notebooks: Optional[List[str]] = None,
    ) -> SourceProcessingResult:
        """
        Create a new source with async processing enabled.

        This is a convenience method that always uses async processing.
        Returns a SourceProcessingResult with processing status information.
        """
        result = self.create_source(
            notebook_id=notebook_id,
            notebooks=notebooks,
            source_type=source_type,
            url=url,
            file_path=file_path,
            content=content,
            title=title,
            transformations=transformations,
            embed=embed,
            delete_source=delete_source,
            async_processing=True,
        )

        # Since we forced async_processing=True, this should always be a SourceProcessingResult
        if isinstance(result, SourceProcessingResult):
            return result
        else:
            # Fallback: wrap Source in SourceProcessingResult
            return SourceProcessingResult(
                source=result,
                is_async=False,  # This shouldn't happen, but handle it gracefully
            )

    def is_source_processing_complete(self, source_id: str) -> bool:
        """
        Check if a source's async processing is complete.

        Returns True if processing is complete (success or failure),
        False if still processing or queued.
        """
        try:
            status_data = self.get_source_status(source_id)
            status = status_data.get("status")
            return status in [
                "completed",
                "failed",
                None,
            ]  # None indicates legacy/sync source
        except Exception as e:
            logger.error(f"Error checking source processing status: {e}")
            return True  # Assume complete on error
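The completion check above lends itself to a simple polling loop on the caller's side. A minimal, self-contained sketch with the status source stubbed out (the helper name, interval, and attempt count are illustrative assumptions, not values this service prescribes):

```python
import time
from typing import Callable, Iterator


def poll_until_complete(
    is_complete: Callable[[], bool],
    max_attempts: int = 5,
    interval: float = 0.0,
) -> bool:
    """Return True once is_complete() reports done, False if attempts run out."""
    for _ in range(max_attempts):
        if is_complete():
            return True
        time.sleep(interval)
    return False


# Stubbed status feed: two "running" polls, then "completed".
statuses: Iterator[str] = iter(["running", "running", "completed"])
done = poll_until_complete(lambda: next(statuses) == "completed")
print(done)  # True
```

In real use, `is_complete` would wrap `sources_service.is_source_processing_complete(source_id)` and `interval` would be a non-zero backoff.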

    def update_source(self, source: Source) -> Source:
        """Update a source."""
        if not source.id:
            raise ValueError("Source ID is required for update")

        updates = {
            "title": source.title,
            "topics": source.topics,
        }
        source_data = api_client.update_source(source.id, **updates)

        # Ensure source_data is a dict
        source_data_dict = (
            source_data if isinstance(source_data, dict) else source_data[0]
        )

        # Update the source object with the response
        source.title = source_data_dict["title"]
        source.topics = source_data_dict["topics"]
        source.updated = source_data_dict["updated"]

        return source

    def delete_source(self, source_id: str) -> bool:
        """Delete a source."""
        api_client.delete_source(source_id)
        return True


# Global service instance
sources_service = SourcesService()

# Export important classes for easy importing
__all__ = [
    "SourcesService",
    "SourceWithMetadata",
    "SourceProcessingResult",
    "sources_service",
]
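Because `create_source` returns a `Union[Source, SourceProcessingResult]`, callers must branch on the result type. A sketch of that pattern using standalone stand-ins for the classes above (the stub fields and `handle_create_result` helper are illustrative assumptions, not part of this module's API):

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Source:
    id: str
    title: str


@dataclass
class SourceProcessingResult:
    source: Source
    is_async: bool = False
    command_id: Optional[str] = None


def handle_create_result(result: Union[Source, SourceProcessingResult]) -> str:
    # Async responses carry a command_id that can be polled for status;
    # sync responses are plain Source objects, ready to use immediately.
    if isinstance(result, SourceProcessingResult):
        return f"queued: {result.command_id}"
    return f"ready: {result.id}"


sync_result = Source(id="source:1", title="Notes")
async_result = SourceProcessingResult(
    source=Source(id="source:2", title="Paper"), is_async=True, command_id="cmd:42"
)
print(handle_create_result(sync_result))   # ready: source:1
print(handle_create_result(async_result))  # queued: cmd:42
```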