open-notebook/open_notebook/graphs/source.py
MisonL 67dd85c928
Feat/localization tests docker (#371)
* feat(i18n): complete 100% internationalization and fix Next.js 15 compatibility

* feat(i18n): complete 100% internationalization coverage

* chore(test): finalize component tests and project cleanup

* test(logic): add unit tests for useModalManager hook

* fix(test): resolve timeout in AppSidebar tests by mocking TooltipProvider

* feat(i18n): comprehensive i18n audit, fixes for hardcoded strings, and complete zh-TW support

* fix(i18n): resolve TypeScript warnings and improve translation hook stability

- Remove unused useTranslation import from ConnectionGuard
- Add ref-based checking state to prevent dependency cycles
- Fix useTranslation hook to return empty string for undefined translations
- Add comment for backward compatibility on ExtractedReference interface
- Ensure .replace() string methods work safely with nested translation keys

* feat(i18n): complete internationalization implementation with Docker deployment

- Add LanguageLoadingOverlay component for smooth language transitions
- Update all translation files (en-US, zh-CN, zh-TW) with improved terminology
- Optimize Docker configuration for better performance
- Update version check and config handling for i18n support
- Fix route handling for language-specific content
- Add comprehensive task documentation

* fix(i18n): resolve localization errors, duplicates, and type issues

* chore(i18n): finalize 100% internationalization coverage

* chore(test): supplement i18n test cases and cleanup redundant files

* fix(test): resolve lint type errors and finalize delivery documents

* feat(i18n): finalize full internationalization and zh-TW localization

* fix(frontend): add missing devDependency and fix build tsconfig

* feat(ui): enhance sidebar hover effects with better visual feedback

* fix(frontend): resolve accessibility, i18n, and lint issues

- fix: add missing id, name, autocomplete attributes to dialog inputs
- fix: add aria labels and DialogDescription for accessibility
- fix: resolve uncontrolled component warning in SettingsForm
- fix: correct duplicate 'Traditional Chinese' label in zh-TW locale
- feat: add i18n support for podcast template names
- chore: fix lint errors in Dialogs

* fix: address all 21 PR feedback items from cubic-dev-ai bot

Configuration:
- Remove ignoreDuringBuilds flags from next.config.ts

Testing:
- Fix AppSidebar.test.tsx regex pattern and add missing assertion

Logic:
- Fix ConnectionGuard.tsx re-entry prevention logic

Internationalization (I18n) - Translations:
- Add missing keys: notebooks.archived, common.note/insight, accessibility keys
- Add specific keys: sources.allSourcesDescShort, transformations.selectModel
- Add singular/plural keys: podcasts.usedByCount_one/other, common.note/notes
- Add common.created/updated with {time} placeholder

Internationalization (I18n) - Usage:
- SourcesPage: use allSourcesDescShort instead of string splitting
- TransformationPlayground: use navigation.transformation and selectModel
- CommandPalette: use dedicated keys instead of string concatenation
- GeneratePodcastDialog: fix zh-TW date locale handling
- NotebookHeader: correctly interpolate {time} placeholder
- TransformationCard: use common.description instead of undefined key
- ChatPanel/SpeakerProfilesPanel: implement proper pluralization
- SystemInfo: correctly interpolate {version} placeholder
- LanguageLoadingOverlay: use t.common.loading instead of hardcoded string
- MessageActions: use specific error key cannotSaveNoteNoNotebook

Other:
- Fix SessionManager.tsx exhaustive-deps warning

* fix: remove duplicate locale keys and add missing zh-CN translations

- en-US: remove duplicate loading key (line 59) and addNew key (sources)
- zh-CN: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast)
- zh-CN: remove duplicate accessibility.searchNotebooks key
- zh-CN: remove duplicate sources.addNew key
- zh-CN: remove duplicate navigation.transformation key
- zh-CN: add missing usedByCount_one and usedByCount_other keys in podcasts
- zh-TW: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast)
- zh-TW: remove duplicate accessibility.searchNotebooks key
- zh-TW: remove duplicate sources.addNew key

* docs: remove info.md

* fix: remove duplicate notebook keys and unused ts-expect-error

- zh-CN: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc)
- zh-TW: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc)
- GeneratePodcastDialog: remove unused @ts-expect-error directive

* fix(a11y): fix unassociated labels in search page

- Replace <Label> with role='group' + aria-labelledby for search type section
- Replace <Label> with role='group' + aria-labelledby for search in section
- Follows WAI-ARIA best practices for labeling form field groups

* fix(a11y): fix unassociated labels across multiple components

- search/page.tsx: use role='group' + aria-labelledby for search type and search in sections
- RebuildEmbeddings.tsx: use role='group' + aria-labelledby for include checkboxes
- TransformationPlayground.tsx: replace Label with span for non-form output label

* chore: revert to npm stack and ensure i18n compatibility

* chore: polish zh-TW translations for better idiomatic usage

* fix: resolve linter errors (ruff import sort, mypy config duplicate)

* style: apply ruff formatting

* fix: finalize upstream compliance (Dockerfile.single, i18n hooks, docker-compose)

* style: polish strings, fix timeout cleanup, and improve test mocks

* fix: use relative imports in test setup to resolve IDE path errors

* perf(docker): optimize build speed by removing apt-get upgrade and build tools

- Remove apt-get upgrade from both builder and runtime stages (saves 10-15 min each)
- Remove gcc/g++/make/git from builder (uv downloads pre-built wheels)
- Add --no-install-recommends to minimize package footprint
- Keep npm mirror (npmmirror.com) for faster frontend deps
- Add npm registry config for reliable China network access

Also includes:
- fix(a11y): add missing labels and aria attributes to form fields
- fix(i18n): add 2s safety timeout to LanguageLoadingOverlay
- fix(i18n): add robustness checks to use-translation proxy

Build time reduced from 2+ hours to ~34 minutes (~70% improvement)

* fix(a11y): resolve 16 form field accessibility warnings in notebook and podcast pages

* fix(a11y): resolve 4 button and 1 select field accessibility warnings in models page

* fix(a11y): resolve redundant attributes and residual warnings in transformations and podcast forms

* fix(i18n): deep fix for language switch hang using proxy protection and safer access

* fix(a11y): add name attributes to ModelSelector, TransformationPlayground, and SourceDetailContent

* fix: add missing Label import to SourceDetailContent

* fix(i18n): use native react-i18next in LanguageLoadingOverlay to prevent hang during language switch

* fix(i18n): rewrite use-translation Proxy with strict depth limit and expanded blocked props to prevent language switch hang

* fix: add type assertion to fix TypeScript comparison error

* fix(i18n): disable useSuspense to prevent thread hang during language resource loading

* fix(i18n): add infinite loop detection circuit breaker to useTranslation hook

* fix(i18n): update traditional chinese label to native script in en-US

* feat: add new localization strings for notebook and note management.

* fix: resolve config priority, docker build deps, and ui glitches

* refactor: improve ui details and test coverage based on feedback

* refactor: improve ui details (version check/lang toggle) and test coverage

* fix: polish language matching and test cleanup

* fix(test): update mocks to resolve timeouts and proxy errors

* fix(frontend): restore tsconfig.json structure and enable IDE support for tests

* fix: address PR review findings and resolve CI OIDC failure

* fix: merge exception headers in custom handler

* fix: comprehensive PR review remediations and async performance fixes

* refactor: address all PR #371 review feedback

- Docker: consolidate SURREAL_URL to docker.env, add single-container override
- Security: restore apt-get upgrade in Dockerfile and Dockerfile.single
- Create centralized getDateLocale helper (lib/utils/date-locale.ts)
- Refactor 7 files to use getDateLocale helper
- Revert config/route.ts to origin/main version
- Move test files to co-located pattern (3 files)
- Remove local useTranslation mock from ConfirmDialog.test.tsx
- Simplify use-version-check to single useEffect pattern
- Fix test import paths after moving to co-located pattern

* fix: add jest-dom types for test files

* fix: address remaining review issues

- Add apt-get upgrade -y to Dockerfile.single backend-builder stage
- Refactor ChatColumn.test.tsx: use 'as unknown as ReturnType<typeof hook>' instead of 'as any'
- Use toBeInTheDocument() assertions instead of toBeDefined()
2026-01-15 13:51:05 -03:00

167 lines
5.3 KiB
Python

import operator
from typing import Any, Dict, List, Optional
from content_core import extract_content
from content_core.common import ProcessSourceState
from langchain_core.runnables import RunnableConfig
from langgraph.graph import END, START, StateGraph
from langgraph.types import Send
from loguru import logger
from typing_extensions import Annotated, TypedDict
from open_notebook.ai.models import Model, ModelManager
from open_notebook.domain.content_settings import ContentSettings
from open_notebook.domain.notebook import Asset, Source
from open_notebook.domain.transformation import Transformation
from open_notebook.graphs.transformation import graph as transform_graph
class SourceState(TypedDict):
content_state: ProcessSourceState
apply_transformations: List[Transformation]
source_id: str
notebook_ids: List[str]
source: Source
transformation: Annotated[list, operator.add]
embed: bool
class TransformationState(TypedDict):
source: Source
transformation: Transformation
async def content_process(state: SourceState) -> dict:
content_settings = ContentSettings(
default_content_processing_engine_doc="auto",
default_content_processing_engine_url="auto",
default_embedding_option="ask",
auto_delete_files="yes",
youtube_preferred_languages=[
"en",
"pt",
"es",
"de",
"nl",
"en-GB",
"fr",
"hi",
"ja",
],
)
content_state: Dict[str, Any] = state["content_state"] # type: ignore[assignment]
content_state["url_engine"] = (
content_settings.default_content_processing_engine_url or "auto"
)
content_state["document_engine"] = (
content_settings.default_content_processing_engine_doc or "auto"
)
content_state["output_format"] = "markdown"
# Add speech-to-text model configuration from Default Models
try:
model_manager = ModelManager()
defaults = await model_manager.get_defaults()
if defaults.default_speech_to_text_model:
stt_model = await Model.get(defaults.default_speech_to_text_model)
if stt_model:
content_state["audio_provider"] = stt_model.provider
content_state["audio_model"] = stt_model.name
logger.debug(
f"Using speech-to-text model: {stt_model.provider}/{stt_model.name}"
)
except Exception as e:
logger.warning(f"Failed to retrieve speech-to-text model configuration: {e}")
# Continue without custom audio model (content-core will use its default)
processed_state = await extract_content(content_state)
return {"content_state": processed_state}
async def save_source(state: SourceState) -> dict:
content_state = state["content_state"]
# Get existing source using the provided source_id
source = await Source.get(state["source_id"])
if not source:
raise ValueError(f"Source with ID {state['source_id']} not found")
# Update the source with processed content
source.asset = Asset(url=content_state.url, file_path=content_state.file_path)
source.full_text = content_state.content
# Preserve existing title if none provided in processed content
if content_state.title:
source.title = content_state.title
await source.save()
# NOTE: Notebook associations are created by the API immediately for UI responsiveness
# No need to create them here to avoid duplicate edges
if state["embed"]:
logger.debug("Embedding content for vector search")
await source.vectorize()
return {"source": source}
def trigger_transformations(state: SourceState, config: RunnableConfig) -> List[Send]:
if len(state["apply_transformations"]) == 0:
return []
to_apply = state["apply_transformations"]
logger.debug(f"Applying transformations {to_apply}")
return [
Send(
"transform_content",
{
"source": state["source"],
"transformation": t,
},
)
for t in to_apply
]
async def transform_content(state: TransformationState) -> Optional[dict]:
source = state["source"]
content = source.full_text
if not content:
return None
transformation: Transformation = state["transformation"]
logger.debug(f"Applying transformation {transformation.name}")
result = await transform_graph.ainvoke(
dict(input_text=content, transformation=transformation) # type: ignore[arg-type]
)
await source.add_insight(transformation.title, result["output"])
return {
"transformation": [
{
"output": result["output"],
"transformation_name": transformation.name,
}
]
}
# Create and compile the workflow
workflow = StateGraph(SourceState)
# Add nodes
workflow.add_node("content_process", content_process)
workflow.add_node("save_source", save_source)
workflow.add_node("transform_content", transform_content)
# Define the graph edges
workflow.add_edge(START, "content_process")
workflow.add_edge("content_process", "save_source")
workflow.add_conditional_edges(
"save_source", trigger_transformations, ["transform_content"]
)
workflow.add_edge("transform_content", END)
# Compile the graph
source_graph = workflow.compile()