Luis Novo
d8006ff5cb
feat: content-type aware chunking and unified embedding ( #444 )
...
* feat: content-type aware chunking and unified embedding
- Add chunking.py with HTML, Markdown, and plain text detection
- Add embedding.py with mean pooling for large content
- Create dedicated commands: embed_note, embed_insight, embed_source
- Use fire-and-forget pattern for embedding via submit_command()
- Refactor rebuild_embeddings_command to delegate to individual commands
- Remove legacy commands and needs_embedding() methods
- Reduce chunk size to 1500 chars for Ollama compatibility
- Update CLAUDE.md documentation for new architecture
Fixes #350 , #142
* fix: address code review issues
- Note.save() now returns command_id for tracking embedding jobs
- Add length check after generate_embeddings() to fail fast on mismatch
- Add numpy as explicit dependency (was transitive)
- Remove hardcoded chunk sizes from docstrings
* docs: address code review comments
- Rename "SYNC PATH" to "DOMAIN MODEL PATH" in embedding router
- Add test_chunking.py and test_embedding.py to Testing Strategy
- Clarify auto-embedding behavior for each domain model
* fix: clean thinking tags from prompt graph output
Adds clean_thinking_content() to prompt.py to handle extended thinking
models that return <think>...</think> tags. This fixes empty titles
when saving notes from chat.
* chore: remove local docker-compose from git
* fix(frontend): handle null parent_id in search results
Add defensive check for null parent_id in search results to prevent
"Cannot read properties of null (reading 'split')" error. This can
happen with orphaned records in the database.
* fix: cascade delete embeddings and insights when source is deleted
When deleting a Source, now also deletes associated:
- source_embedding records
- source_insight records
This prevents orphaned records that cause null parent_id errors
in vector search results.
* fix: add cleanup for orphan embedding/insight records in migration 10
Deletes source_embedding and source_insight records where the
linked source no longer exists (source.id = NONE).
* chore: bump esperanto to 2.16
Increases ctx_num for Ollama models to accommodate larger notebook
context windows. See: https://github.com/lfnovo/esperanto/pull/69
2026-01-21 23:49:08 -03:00
MisonL
67dd85c928
Feat/localization tests docker ( #371 )
...
* feat(i18n): complete 100% internationalization and fix Next.js 15 compatibility
* feat(i18n): complete 100% internationalization coverage
* chore(test): finalize component tests and project cleanup
* test(logic): add unit tests for useModalManager hook
* fix(test): resolve timeout in AppSidebar tests by mocking TooltipProvider
* feat(i18n): comprehensive i18n audit, fixes for hardcoded strings, and complete zh-TW support
* fix(i18n): resolve TypeScript warnings and improve translation hook stability
- Remove unused useTranslation import from ConnectionGuard
- Add ref-based checking state to prevent dependency cycles
- Fix useTranslation hook to return empty string for undefined translations
- Add comment for backward compatibility on ExtractedReference interface
- Ensure .replace() string methods work safely with nested translation keys
* feat(i18n): complete internationalization implementation with Docker deployment
- Add LanguageLoadingOverlay component for smooth language transitions
- Update all translation files (en-US, zh-CN, zh-TW) with improved terminology
- Optimize Docker configuration for better performance
- Update version check and config handling for i18n support
- Fix route handling for language-specific content
- Add comprehensive task documentation
* fix(i18n): resolve localization errors, duplicates, and type issues
* chore(i18n): finalize 100% internationalization coverage
* chore(test): supplement i18n test cases and cleanup redundant files
* fix(test): resolve lint type errors and finalize delivery documents
* feat(i18n): finalize full internationalization and zh-TW localization
* fix(frontend): add missing devDependency and fix build tsconfig
* feat(ui): enhance sidebar hover effects with better visual feedback
* fix(frontend): resolve accessibility, i18n, and lint issues
- fix: add missing id, name, autocomplete attributes to dialog inputs
- fix: add aria labels and DialogDescription for accessibility
- fix: resolve uncontrolled component warning in SettingsForm
- fix: correct duplicate 'Traditional Chinese' label in zh-TW locale
- feat: add i18n support for podcast template names
- chore: fix lint errors in Dialogs
* fix: address all 21 PR feedback items from cubic-dev-ai bot
Configuration:
- Remove ignoreDuringBuilds flags from next.config.ts
Testing:
- Fix AppSidebar.test.tsx regex pattern and add missing assertion
Logic:
- Fix ConnectionGuard.tsx re-entry prevention logic
Internationalization (I18n) - Translations:
- Add missing keys: notebooks.archived, common.note/insight, accessibility keys
- Add specific keys: sources.allSourcesDescShort, transformations.selectModel
- Add singular/plural keys: podcasts.usedByCount_one/other, common.note/notes
- Add common.created/updated with {time} placeholder
Internationalization (I18n) - Usage:
- SourcesPage: use allSourcesDescShort instead of string splitting
- TransformationPlayground: use navigation.transformation and selectModel
- CommandPalette: use dedicated keys instead of string concatenation
- GeneratePodcastDialog: fix zh-TW date locale handling
- NotebookHeader: correctly interpolate {time} placeholder
- TransformationCard: use common.description instead of undefined key
- ChatPanel/SpeakerProfilesPanel: implement proper pluralization
- SystemInfo: correctly interpolate {version} placeholder
- LanguageLoadingOverlay: use t.common.loading instead of hardcoded string
- MessageActions: use specific error key cannotSaveNoteNoNotebook
Other:
- Fix SessionManager.tsx exhaustive-deps warning
* fix: remove duplicate locale keys and add missing zh-CN translations
- en-US: remove duplicate loading key (line 59) and addNew key (sources)
- zh-CN: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast)
- zh-CN: remove duplicate accessibility.searchNotebooks key
- zh-CN: remove duplicate sources.addNew key
- zh-CN: remove duplicate navigation.transformation key
- zh-CN: add missing usedByCount_one and usedByCount_other keys in podcasts
- zh-TW: remove duplicate common keys (loading, note, insight, newSource, newNotebook, newPodcast)
- zh-TW: remove duplicate accessibility.searchNotebooks key
- zh-TW: remove duplicate sources.addNew key
* docs: remove info.md
* fix: remove duplicate notebook keys and unused ts-expect-error
- zh-CN: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc)
- zh-TW: remove duplicate notebooks keys (archived, archive, unarchive, deleteNotebook, deleteNotebookDesc)
- GeneratePodcastDialog: remove unused @ts-expect-error directive
* fix(a11y): fix unassociated labels in search page
- Replace <Label> with role='group' + aria-labelledby for search type section
- Replace <Label> with role='group' + aria-labelledby for search in section
- Follows WAI-ARIA best practices for labeling form field groups
* fix(a11y): fix unassociated labels across multiple components
- search/page.tsx: use role='group' + aria-labelledby for search type and search in sections
- RebuildEmbeddings.tsx: use role='group' + aria-labelledby for include checkboxes
- TransformationPlayground.tsx: replace Label with span for non-form output label
* chore: revert to npm stack and ensure i18n compatibility
* chore: polish zh-TW translations for better idiomatic usage
* fix: resolve linter errors (ruff import sort, mypy config duplicate)
* style: apply ruff formatting
* fix: finalize upstream compliance (Dockerfile.single, i18n hooks, docker-compose)
* style: polish strings, fix timeout cleanup, and improve test mocks
* fix: use relative imports in test setup to resolve IDE path errors
* perf(docker): optimize build speed by removing apt-get upgrade and build tools
- Remove apt-get upgrade from both builder and runtime stages (saves 10-15 min each)
- Remove gcc/g++/make/git from builder (uv downloads pre-built wheels)
- Add --no-install-recommends to minimize package footprint
- Keep npm mirror (npmmirror.com) for faster frontend deps
- Add npm registry config for reliable China network access
Also includes:
- fix(a11y): add missing labels and aria attributes to form fields
- fix(i18n): add 2s safety timeout to LanguageLoadingOverlay
- fix(i18n): add robustness checks to use-translation proxy
Build time reduced from 2+ hours to ~34 minutes (~70% improvement)
* fix(a11y): resolve 16 form field accessibility warnings in notebook and podcast pages
* fix(a11y): resolve 4 button and 1 select field accessibility warnings in models page
* fix(a11y): resolve redundant attributes and residual warnings in transformations and podcast forms
* fix(i18n): deep fix for language switch hang using proxy protection and safer access
* fix(a11y): add name attributes to ModelSelector, TransformationPlayground, and SourceDetailContent
* fix: add missing Label import to SourceDetailContent
* fix(i18n): use native react-i18next in LanguageLoadingOverlay to prevent hang during language switch
* fix(i18n): rewrite use-translation Proxy with strict depth limit and expanded blocked props to prevent language switch hang
* fix: add type assertion to fix TypeScript comparison error
* fix(i18n): disable useSuspense to prevent thread hang during language resource loading
* fix(i18n): add infinite loop detection circuit breaker to useTranslation hook
* fix(i18n): update traditional chinese label to native script in en-US
* feat: add new localization strings for notebook and note management.
* fix: resolve config priority, docker build deps, and ui glitches
* refactor: improve ui details and test coverage based on feedback
* refactor: improve ui details (version check/lang toggle) and test coverage
* fix: polish language matching and test cleanup
* fix(test): update mocks to resolve timeouts and proxy errors
* fix(frontend): restore tsconfig.json structure and enable IDE support for tests
* fix: address PR review findings and resolve CI OIDC failure
* fix: merge exception headers in custom handler
* fix: comprehensive PR review remediations and async performance fixes
* refactor: address all PR #371 review feedback
- Docker: consolidate SURREAL_URL to docker.env, add single-container override
- Security: restore apt-get upgrade in Dockerfile and Dockerfile.single
- Create centralized getDateLocale helper (lib/utils/date-locale.ts)
- Refactor 7 files to use getDateLocale helper
- Revert config/route.ts to origin/main version
- Move test files to co-located pattern (3 files)
- Remove local useTranslation mock from ConfirmDialog.test.tsx
- Simplify use-version-check to single useEffect pattern
- Fix test import paths after moving to co-located pattern
* fix: add jest-dom types for test files
* fix: address remaining review issues
- Add apt-get upgrade -y to Dockerfile.single backend-builder stage
- Refactor ChatColumn.test.tsx: use 'as unknown as ReturnType<typeof hook>' instead of 'as any'
- Use toBeInTheDocument() assertions instead of toBeDefined()
2026-01-15 13:51:05 -03:00
LUIS NOVO
71b8d13b24
docs: generate comprehensive CLAUDE.md reference documentation across codebase
...
Create a hierarchical CLAUDE.md documentation system for the entire Open Notebook
codebase with focus on concise, pattern-driven reference cards rather than
comprehensive tutorials.
## Changes
### Core Documentation System
- Updated `.claude/commands/build-claude-md.md` to distinguish between leaf and
parent modules, with special handling for prompt/template modules
- Established clear patterns:
* Leaf modules (40-70 lines): Components, hooks, API clients
* Parent modules (50-150 lines): Architecture, cross-layer patterns, data flows
* Template modules: Pattern focus, not catalog listings
### Generated Documentation
Created 15 CLAUDE.md reference files across the project:
**Frontend (React/Next.js)**
- frontend/src/CLAUDE.md: Architecture overview, data flow, three-tier design
- frontend/src/lib/hooks/CLAUDE.md: React Query patterns, state management
- frontend/src/lib/api/CLAUDE.md: Axios client, FormData handling, interceptors
- frontend/src/lib/stores/CLAUDE.md: Zustand state persistence, auth patterns
- frontend/src/components/ui/CLAUDE.md: Radix UI primitives, CVA styling
**Backend (Python/FastAPI)**
- open_notebook/CLAUDE.md: System architecture, layer interactions
- open_notebook/ai/CLAUDE.md: Model provisioning, Esperanto integration
- open_notebook/domain/CLAUDE.md: Data models, ObjectModel/RecordModel patterns
- open_notebook/database/CLAUDE.md: Repository pattern, async migrations
- open_notebook/graphs/CLAUDE.md: LangGraph workflows, async orchestration
- open_notebook/utils/CLAUDE.md: Cross-cutting utilities, context building
- open_notebook/podcasts/CLAUDE.md: Episode/speaker profiles, job tracking
**API & Other**
- api/CLAUDE.md: REST layer, service architecture
- commands/CLAUDE.md: Async command handlers, job queue patterns
- prompts/CLAUDE.md: Jinja2 templates, prompt engineering patterns (refactored)
**Project Root**
- CLAUDE.md: Project overview, three-tier architecture, tech stack, getting started
### Key Features
- Zero duplication: Parent modules reference child CLAUDE.md files, don't repeat them
- Pattern-focused: Emphasizes how components work together, not component catalogs
- Scannable: Short bullets, code examples only when necessary (1-2 per file)
- Practical: "How to extend" guides, quirks/gotchas for each module
- Navigation: Root CLAUDE.md acts as hub pointing to specialized documentation
### Cleanup
- Removed unused `batch_fix_services.py`
- Removed deprecated `open_notebook/plugins/podcasts.py`
- Updated .gitignore for documentation consistency
## Impact
New contributors can now:
1. Read root CLAUDE.md for system architecture (5 min)
2. Jump to specific layer documentation (frontend, api, open_notebook)
3. Dive into module-specific patterns in child CLAUDE.md files (1 min per module)
All documentation is lean, reference-focused, and avoids duplication.
2026-01-03 16:27:52 -03:00
LUIS NOVO
ab5560c9a2
refactor: reorganize folder structure for better maintainability
...
Changes:
- Move migrations/ under open_notebook/database/migrations/
- Extract AI models to open_notebook/ai/ (Model, ModelManager, provision)
- Extract podcasts to open_notebook/podcasts/ (EpisodeProfile, SpeakerProfile, PodcastEpisode)
- Reorganize prompts to mirror graphs structure (chat/, source_chat/)
This improves code organization by:
- Consolidating database concerns (migrations now with database code)
- Separating AI infrastructure from domain entities
- Isolating podcast feature into its own module
- Creating consistent prompt/graph naming conventions
All 52 tests pass.
2026-01-03 14:04:27 -03:00
Justin Florentine
855e730577
fix: preserve AIMessage metadata when cleaning thinking content
...
Use model_copy() instead of creating new AIMessage to preserve
response_metadata, id, usage_metadata, etc. Also adds test coverage
for malformed thinking tags pattern.
Addresses PR #333 feedback from lfnovo and cubic-dev-ai.
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 20:08:12 -05:00
Justin Florentine
869664a10b
fix: strip <think> tags from chat responses
...
Add thinking content cleaning to notebook and source chat graphs.
Previously, models that output <think>...</think> tags (like DeepSeek)
or malformed variants without opening tags (like Nemotron) would leak
reasoning content into user-visible responses.
Changes:
- chat.py: Clean AI response content before returning messages
- source_chat.py: Same fix for source-specific chat
- text_utils.py: Handle malformed output where opening <think> tag
is missing but </think> is present
🤖 Generated with [Claude Code](https://claude.com/claude-code )
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-18 16:31:23 -05:00
Luis Novo
f79a9040ae
Release 1.2 ( #242 )
...
* chore: improve podcast transcripts
* fix: remove date from insight - fixes #241
* fix: improve scrolling on source and insights - fixes #237
* chore: update esperanto to fix : #234
* chore: update esperanto to fix #226
* fix: process vectorization as subcommands to handle larger documents more gracefully - fix : #229
* feat: enable background job retry capabilities
* feat: reenable content types that were disabled during alpha version
* fix: remove unnecessary model caching causing many issues.
* feat: support multiple azure endpoints and keys just like openai compatible. Fixes #215
* docs: update azure variables
* chore: bump and update dependencies
2025-11-01 14:40:00 -03:00
LUIS NOVO
a51bb9d792
fix: missing parenthesis
2025-10-18 13:22:39 -03:00
LUIS NOVO
8b5daa86bc
fix: max tokens max is 8192 now
2025-10-18 13:21:53 -03:00
Luis Novo
b7e656a319
Version 1 ( #160 )
...
New front-end
Launch Chat API
Manage Sources
Enable re-embedding of all contents
Sources can be added without a notebook now
Improved settings
Enable model selector on all chats
Background processing for better experience
Dark mode
Improved Notes
Improved Docs:
- Remove all Streamlit references from documentation
- Update deployment guides with React frontend setup
- Fix Docker environment variables format (SURREAL_URL, SURREAL_PASSWORD)
- Update docker image tag from :latest to :v1-latest
- Change navigation references (Settings → Models to just Models)
- Update development setup to include frontend npm commands
- Add MIGRATION.md guide for users upgrading from Streamlit
- Update quick-start guide with correct environment variables
- Add port 5055 documentation for API access
- Update project structure to reflect frontend/ directory
- Remove outdated source-chat documentation files
2025-10-18 12:46:22 -03:00
Luis Novo
d7b0fff954
Api podcast migration ( #93 )
...
Creates the API layer for Open Notebook
Creates a services API gateway for the Streamlit front-end
Migrates the SurrealDB SDK to the official one
Change all database calls to async
New podcast framework supporting multiple speaker configurations
Implement the surreal-commands library for async processing
Improve docker image and docker-compose configurations
2025-07-17 08:36:11 -03:00
LUIS NOVO
7eee271232
feat: extract think tags from reasoning models
2025-06-26 11:41:15 -03:00
LUIS NOVO
bea43f3ce7
feat: implement the new model management based on esperanto framework
2025-06-08 19:38:43 -03:00
LUIS NOVO
2afbd36cb4
refactor: implement ai_prompter library
2025-06-01 08:09:33 -03:00
LUIS NOVO
1afb5d81e8
feat: implement new content settings page and remove options from the source panel
2025-05-30 15:25:39 -03:00
LUIS NOVO
36e928eb75
feat: replace content processing engine with content-core
2025-05-30 13:35:46 -03:00
LUIS NOVO
c297dcb809
refactor objectmodel
2024-11-19 19:03:32 -03:00
LUIS NOVO
4a5d47d934
refactor transformation, add graph and admin
2024-11-18 22:01:11 -03:00
LUIS NOVO
066c7a06e2
improve search functions
2024-11-13 15:52:44 -03:00
LUIS NOVO
80353a97c9
make model rag work with vector only
2024-11-13 12:18:26 -03:00
LUIS NOVO
e4b8fa8cc7
cleanup logging
2024-11-13 12:17:57 -03:00
LUIS NOVO
281abdf01b
improve the accuracy of ids in the citations
2024-11-13 11:55:38 -03:00
LUIS NOVO
a33228de5a
split system and user message in patterns
2024-11-12 12:56:03 -03:00
LUIS NOVO
8cb6d835fe
add ui improvements to embed and transformation dialogs
2024-11-11 18:17:08 -03:00
LUIS NOVO
817b1bc7f9
add initial embedding to the content graph
2024-11-11 17:47:50 -03:00
LUIS NOVO
01cf15e7d1
add check
2024-11-11 17:33:28 -03:00
LUIS NOVO
00f070a644
add async content processing
2024-11-11 17:32:35 -03:00
LUIS NOVO
2e2a4947b3
separate source and content graph
2024-11-10 13:30:03 -03:00
LUIS NOVO
d5be2b0d5b
make rag async
2024-11-09 16:03:41 -03:00
LUIS NOVO
183149014e
change model provisioning parameters
2024-11-08 16:08:54 -03:00
LUIS NOVO
99b8ada280
new ask model strategy
2024-11-08 16:08:13 -03:00
LUIS NOVO
418c67f69f
add search and rag functions in beta
2024-11-04 09:53:49 -03:00
LUIS NOVO
b4ba3ef4c8
change model provisioning strategy
2024-11-04 09:49:11 -03:00
LUIS NOVO
d9c0c93deb
improved typing
2024-11-01 22:50:27 -03:00
LUIS NOVO
7dc37a3ac7
model fixes
2024-11-01 22:43:33 -03:00
LUIS NOVO
223f1bdaf5
improve default_models
2024-11-01 22:38:21 -03:00
LUIS NOVO
212d3a33b0
improve object typing
2024-11-01 22:29:59 -03:00
LUIS NOVO
15048b0839
simplify model provisioning
2024-11-01 21:32:40 -03:00
LUIS NOVO
3b262a63f4
better model mgmt
2024-11-01 21:11:23 -03:00
LUIS NOVO
a9ac4a6dc8
model manager
2024-11-01 20:37:23 -03:00
LUIS NOVO
feabfaed01
remove defaultmodel from config file
2024-11-01 19:56:27 -03:00
LUIS NOVO
edf839cd1b
unused graphs
2024-11-01 19:08:33 -03:00
LUIS NOVO
0d4d9473b2
add transformation playground with model selection
2024-11-01 17:17:19 -03:00
LUIS NOVO
b89250d3ca
temporary fix to config cache
2024-11-01 17:06:10 -03:00
LUIS NOVO
796012f716
rename to patterns
2024-11-01 13:08:18 -03:00
LUIS NOVO
0876e94658
transformation folder change
2024-11-01 12:36:59 -03:00
LUIS NOVO
c65bf8ba12
rename model_name to model_id
2024-11-01 12:07:00 -03:00
LUIS NOVO
2aa453f73f
enable support for large context fallback
2024-10-30 15:39:45 -03:00
LUIS NOVO
859b7f6e7e
simplify the model selector
2024-10-30 14:30:29 -03:00
LUIS NOVO
8bb5db158f
implement model config
2024-10-30 14:09:24 -03:00