mirror of
https://github.com/lfnovo/open-notebook.git
synced 2026-04-29 03:50:04 +00:00
# AI Module

Model configuration, provisioning, and management for multi-provider AI integration via Esperanto.

## Purpose

Centralizes AI model lifecycle: database models for model metadata (provider, type), default model configuration, and factory for instantiating LLM/embedding/speech models at runtime with fallback logic.

## Architecture Overview

**Two-tier system**:
1. **Database models** (`Model`, `DefaultModels`): Metadata storage and default configuration
2. **ModelManager**: Factory for provisioning models with intelligent fallback (large context detection, config override)

All models use Esperanto library as provider abstraction (OpenAI, Anthropic, Google, Groq, Ollama, Mistral, DeepSeek, xAI, OpenRouter).

## Component Catalog

### models.py

#### Model (ObjectModel)
- Database record: name, provider, type (language/embedding/speech_to_text/text_to_speech), credential (optional link to Credential record)
- `get_models_by_type()`: Async query to fetch all models of a specific type
- `get_credential_obj()`: Fetches linked Credential object (if credential field set)
- `get_by_credential(credential_id)`: Class method to find all models linked to a credential
- Stores provider-model pairs for AI factory instantiation

#### DefaultModels (RecordModel)
- Singleton configuration record (record_id: `open_notebook:default_models`)
- Fields: default_chat_model, default_transformation_model, large_context_model, default_text_to_speech_model, default_speech_to_text_model, default_embedding_model, default_tools_model
- `get_instance()`: Always fetches fresh from database (overrides parent caching for real-time updates)
- Returns fresh instance on each call (no singleton cache)

#### ModelManager
- Stateless factory for instantiating AI models
- `get_model(model_id)`: Retrieves Model by ID; if model has linked credential, uses `credential.to_esperanto_config()` for provider config; otherwise falls back to env var provisioning via `key_provider`
- `get_defaults()`: Fetches DefaultModels configuration
- `get_default_model(model_type)`: Smart lookup (e.g., "chat" → default_chat_model, "transformation" → default_transformation_model with fallback to chat)
- `get_speech_to_text()`, `get_text_to_speech()`, `get_embedding_model()`: Type-specific convenience methods with assertions
- **Global instance**: `model_manager` singleton exported for use throughout app

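The type-to-default lookup described for `get_default_model()` can be sketched roughly as follows. This is a minimal illustration, not the module's actual code: `DefaultsSketch` and `resolve_default_model_id` are hypothetical names, and only three of the `DefaultModels` fields are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DefaultsSketch:
    """Hypothetical stand-in for the DefaultModels record (subset of fields)."""

    default_chat_model: Optional[str] = None
    default_transformation_model: Optional[str] = None
    default_embedding_model: Optional[str] = None


def resolve_default_model_id(defaults: DefaultsSketch, model_type: str) -> Optional[str]:
    """Map a model type string to the configured default model ID."""
    if model_type == "chat":
        return defaults.default_chat_model
    if model_type == "transformation":
        # Convention described above: use the chat default when no
        # transformation model has been configured.
        return defaults.default_transformation_model or defaults.default_chat_model
    if model_type == "embedding":
        return defaults.default_embedding_model
    # Unknown type: nothing configured in this sketch.
    return None
```

With only a chat default set, a `"transformation"` lookup resolves to the chat model ID, which mirrors the fallback convention noted in the catalog.
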
### provision.py

#### provision_langchain_model()
- Factory for LangGraph nodes needing LLM provisioning
- **Smart fallback logic**:
  - If tokens > 105,000: Use `large_context_model`
  - Elif `model_id` specified: Use specific model
  - Else: Use default model for type (e.g., "chat", "transformation")
- Returns LangChain-compatible model via `.to_langchain()`
- Logs model selection decision

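The selection order above can be expressed as a small pure function. This is a sketch under stated assumptions: `select_model_source` and the returned labels are hypothetical, and the token count is passed in rather than computed with `token_count()`.

```python
from typing import Optional, Tuple

# Threshold taken from the description above (hard-coded in the module).
LARGE_CONTEXT_THRESHOLD = 105_000


def select_model_source(
    tokens: int, model_id: Optional[str], default_type: str
) -> Tuple[str, Optional[str]]:
    """Return which source wins, plus the model ID or type it applies to."""
    if tokens > LARGE_CONTEXT_THRESHOLD:
        # Large inputs take priority over an explicit model_id.
        return ("large_context_model", None)
    if model_id:
        return ("model_id", model_id)
    # Nothing explicit: use the default configured for this type.
    return ("default", default_type)
```

Note that the large-context check comes first, so an explicitly requested `model_id` is overridden when the content exceeds the threshold.
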
### key_provider.py

#### API Key Provider (Credential→Env Fallback)
- **Purpose**: Provides API keys from database first, falls back to environment variables
- **Pattern**: Before Esperanto creates a model, keys are loaded from `Credential` records and set as environment variables
- **Integration point**: Called by `ModelManager.get_model()` as fallback when model has no linked credential

#### Key Functions
- `get_api_key(provider)`: Get single API key (DB first, then env var)
- `provision_provider_keys(provider)`: Set env vars from DB config for a provider
- `provision_all_keys()`: Load all provider keys from DB into env vars (useful at startup)

#### Provider Configuration Maps
- `PROVIDER_CONFIG`: Simple providers (openai, anthropic, google, groq, etc.)
- `VERTEX_CONFIG`: Google Vertex AI (project, location, credentials)
- `AZURE_CONFIG`: Azure OpenAI (api_key, endpoint, api_version, mode-specific endpoints)
- `OPENAI_COMPATIBLE_CONFIG`: Generic OpenAI-compatible (generic + mode-specific for LLM/EMBEDDING/STT/TTS)

## Common Patterns

- **Type dispatch**: Model.type field drives factory logic (4 model types)
- **Provider abstraction**: Esperanto handles provider differences; ModelManager unaware of provider specifics
- **Fresh defaults**: DefaultModels.get_instance() always fetches from database (not cached) for live config updates
- **Config override**: provision_langchain_model() accepts kwargs passed to AIFactory.create_* methods
- **Token-based selection**: provision_langchain_model() detects large contexts and upgrades model automatically
- **Type assertions**: get_speech_to_text(), get_embedding_model() assert returned type (safety check)
- **Credential→Env fallback**: If model has linked credential, config from `credential.to_esperanto_config()` is used directly; otherwise keys checked in database via key_provider, then environment variables; enables UI-based key management while maintaining backward compatibility

## Key Dependencies

- `esperanto`: AIFactory.create_language(), create_embedding(), create_speech_to_text(), create_text_to_speech()
- `open_notebook.database.repository`: repo_query, ensure_record_id
- `open_notebook.domain.base`: ObjectModel, RecordModel base classes
- `open_notebook.domain.credential`: Credential for database-stored API keys
- `open_notebook.utils`: token_count() for context size detection
- `loguru`: Logging for model selection decisions

## Important Quirks & Gotchas

- **Token counting rough estimate**: provision_langchain_model() uses token_count() which estimates via cl100k_base encoding (may differ 5-10% from actual model)
- **Large context threshold hard-coded**: 105,000 token threshold for large_context_model upgrade (not configurable)
- **DefaultModels.get_instance() fresh fetch**: Intentionally bypasses parent singleton cache to pick up live config changes; creates new instance each call
- **Type-specific getters use assertions**: get_speech_to_text() asserts isinstance (catches misconfiguration early)
- **ConfigurationError on missing model**: ModelManager.get_model() and provision_langchain_model() raise `ConfigurationError` (not ValueError) when a model is not found or not configured, so the global exception handler returns HTTP 422 with a descriptive message
- **Esperanto caching**: Actual model instances cached by Esperanto (not by ModelManager); ModelManager stateless
- **Fallback chain specificity**: "transformation" type falls back to default_chat_model if not explicitly set (convention-based)
- **kwargs passed through**: provision_langchain_model() passes kwargs to AIFactory but doesn't validate what's accepted
- **Key provider sets env vars**: `provision_provider_keys()` modifies `os.environ` to inject DB-stored keys (from `Credential` records); Esperanto reads from env vars (only used as fallback when model has no linked credential)

## How to Extend

1. **Add new model type**: Add type string to Model.type enum, add create_* method in AIFactory, handle in ModelManager.get_model()
2. **Add new default configuration**: Extend DefaultModels with new field (e.g., default_vision_model), add getter in ModelManager
3. **Change fallback logic**: Modify provision_langchain_model() token threshold or fallback chain
4. **Add model filtering**: Extend Model.get_models_by_type() with additional filters (e.g., by provider)
5. **Implement model caching**: Wrap ModelManager methods with functools.lru_cache (be aware of kwargs mutability)

## Usage Example

```python
from open_notebook.ai.models import model_manager
from open_notebook.ai.provision import provision_langchain_model

# All calls below are coroutines; run them inside an async function.

# Get default chat model
chat_model = await model_manager.get_default_model("chat")

# Get specific model by ID
embedding_model = await model_manager.get_model("model:openai_embedding")

# Get embedding model with config override
embedding_model = await model_manager.get_embedding_model(temperature=0.1)

# Provision model for LangGraph (auto-detects large context)
langchain_model = await provision_langchain_model(
    content=long_text,
    model_id=None,  # Use default
    default_type="chat",
    temperature=0.7,
)
```

---

## Connection Testing (connection_tester.py)

### Purpose

Provides functionality to test if a provider's API key is valid by making minimal API calls. Used by the API Configuration UI to validate user-entered credentials before saving.

### test_provider_connection()

Main entry point for testing provider connectivity.

```python
async def test_provider_connection(
    provider: str, model_type: str = "language",
    config_id: Optional[str] = None
) -> Tuple[bool, str]
```

**Returns**: `(success: bool, message: str)` - Success status and human-readable message.

**Flow**:
1. If `config_id` provided: Loads credential via `Credential.get(config_id)`, uses `credential.to_esperanto_config()` for provider config
2. Looks up test model from `TEST_MODELS` dict
3. For URL-based providers (ollama, openai_compatible): Tests server connectivity
4. For Azure: Tests `/openai/models` endpoint with api_version
5. For API-based providers: Creates minimal model via Esperanto and makes test call
6. Returns user-friendly error messages for common failures

### test_individual_model()

Tests a specific Model instance by loading its linked credential (if any) and making a minimal API call.

### TEST_MODELS Configuration

Maps each provider to `(model_name, model_type)` for testing:

```python
TEST_MODELS = {
    "openai": ("gpt-3.5-turbo", "language"),
    "anthropic": ("claude-3-haiku-20240307", "language"),
    "google": ("gemini-1.5-flash", "language"),
    "groq": ("llama-3.1-8b-instant", "language"),
    "voyage": ("voyage-3-lite", "embedding"),
    "elevenlabs": ("eleven_multilingual_v2", "text_to_speech"),
    "ollama": (None, "language"),  # Dynamic
    # ... more providers
}
```

### Special Provider Handlers

- **`_test_ollama_connection(base_url)`**: Tests Ollama server via `/api/tags` endpoint, returns model count
- **`_test_openai_compatible_connection(base_url, api_key)`**: Tests OpenAI-compatible servers via `/models` endpoint
- **`_get_ollama_models(base_url)`**: Fetches available models from Ollama server

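As a rough illustration of the Ollama check, the sketch below turns an already-decoded `/api/tags` response into a `(success, message)` pair. It assumes the response is a JSON object with a `models` list, performs no network call, and uses a hypothetical function name and message wording.

```python
from typing import Any, Mapping, Tuple


def summarize_ollama_tags(payload: Mapping[str, Any]) -> Tuple[bool, str]:
    """Build a (success, message) pair from a decoded /api/tags response."""
    # A reachable server with no pulled models still counts as connected.
    models = payload.get("models") or []
    count = len(models)
    # Message wording is illustrative, not the tester's actual text.
    return True, f"Connected to Ollama ({count} models available)"
```

Keeping the parsing separate from the HTTP request makes the success message easy to test without a running server.
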
### Error Message Normalization

The tester normalizes error messages for user-friendly display:
- `401/unauthorized` -> "Invalid API key"
- `403/forbidden` -> "API key lacks required permissions"
- `rate limit` -> "Rate limited - but connection works" (success)
- `model not found` -> "API key valid (test model not available)" (success)
- Connection/timeout errors -> Helpful troubleshooting messages

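The mapping above could be implemented with simple substring checks on the raw error text. The sketch below assumes that approach; `normalize_error` and the final catch-all message are hypothetical, while the four quoted messages come from the list above.

```python
from typing import Tuple


def normalize_error(error_text: str) -> Tuple[bool, str]:
    """Map a raw provider error to (success, user-facing message)."""
    text = error_text.lower()
    if "401" in text or "unauthorized" in text:
        return False, "Invalid API key"
    if "403" in text or "forbidden" in text:
        return False, "API key lacks required permissions"
    if "rate limit" in text:
        # The server answered, so the credentials were accepted.
        return True, "Rate limited - but connection works"
    if "model not found" in text:
        # The key works; only the test model is unavailable.
        return True, "API key valid (test model not available)"
    # Hypothetical catch-all; the real tester returns troubleshooting hints.
    return False, f"Connection failed: {error_text}"
```

Rate-limit and model-not-found responses count as success because both imply the provider authenticated the request.
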
---

## Key Provider (key_provider.py)

### Purpose

Unified interface for retrieving API keys with database-first, environment-fallback strategy. Enables UI-based key management while maintaining backward compatibility with `.env` files. Used as fallback when models don't have a directly linked credential.

### Core Functions

#### get_api_key(provider)

```python
async def get_api_key(provider: str) -> Optional[str]
```

Gets API key for a provider. Checks database (`Credential` records) first, then environment variable.

**Fallback Chain**:
1. Query `Credential` records from database for the given provider
2. Get api_key from default credential
3. Handle `SecretStr` (call `.get_secret_value()`) vs regular strings
4. If DB value exists and is non-empty, return it
5. Otherwise, return `os.environ.get(env_var)`

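Steps 3 to 5 of the chain can be sketched without a database. In this illustration the DB lookup is replaced by a `db_value` argument, `unwrap_secret` and `resolve_api_key` are hypothetical helper names, and `SecretStr`-like values are detected by duck typing rather than by importing pydantic.

```python
import os
from typing import Optional


def unwrap_secret(value: object) -> Optional[str]:
    """Return a plain string from a SecretStr-like object or a regular string."""
    if value is None:
        return None
    getter = getattr(value, "get_secret_value", None)
    if callable(getter):
        # SecretStr-style wrapper: unwrap explicitly.
        return getter()
    return str(value)


def resolve_api_key(db_value: object, env_var: str) -> Optional[str]:
    """Prefer a non-empty database value, otherwise read the environment."""
    key = unwrap_secret(db_value)
    if key:
        return key
    return os.environ.get(env_var)
```

An empty string stored in the database is treated the same as a missing value, so the environment variable still applies in that case.
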
#### provision_provider_keys(provider)

```python
async def provision_provider_keys(provider: str) -> bool
```

Main entry point for DB->Env fallback. Sets environment variables from database config for a provider. Called before model provisioning to ensure Esperanto can read keys from env vars.

**Returns**: `True` if any keys were set from database.

**Usage**:
```python
from esperanto import AIFactory

from open_notebook.ai.key_provider import provision_provider_keys

# Before creating a model, ensure DB keys are in env vars
await provision_provider_keys("openai")
model = AIFactory.create_language(model_name="gpt-4", provider="openai")
```

#### provision_all_keys()

```python
async def provision_all_keys() -> dict[str, bool]
```

Provisions all providers at once. Useful at application startup.

### Provider Configuration Maps

#### PROVIDER_CONFIG (Simple Providers)

Single-field providers, each mapped to one environment variable (an API key, or the base URL in the case of Ollama):

```python
PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY", "config_field": "openai_api_key"},
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "config_field": "anthropic_api_key"},
    "google": {"env_var": "GOOGLE_API_KEY", "config_field": "google_api_key"},
    "groq": {"env_var": "GROQ_API_KEY", "config_field": "groq_api_key"},
    "mistral": {"env_var": "MISTRAL_API_KEY", "config_field": "mistral_api_key"},
    "deepseek": {"env_var": "DEEPSEEK_API_KEY", "config_field": "deepseek_api_key"},
    "xai": {"env_var": "XAI_API_KEY", "config_field": "xai_api_key"},
    "openrouter": {"env_var": "OPENROUTER_API_KEY", "config_field": "openrouter_api_key"},
    "voyage": {"env_var": "VOYAGE_API_KEY", "config_field": "voyage_api_key"},
    "elevenlabs": {"env_var": "ELEVENLABS_API_KEY", "config_field": "elevenlabs_api_key"},
    "ollama": {"env_var": "OLLAMA_API_BASE", "config_field": "ollama_api_base"},
}
```

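A map like this lends itself to a small helper that copies a stored value into the matching environment variable. The sketch below is an assumption about how that might look, not the module's `_provision_simple_provider()`: it is synchronous, takes the stored values as a plain dict instead of querying `Credential` records, and uses a trimmed, hypothetical copy of the map.

```python
import os
from typing import Mapping, Optional

# Trimmed copy of the provider map, for illustration only.
SKETCH_PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY"},
    "ollama": {"env_var": "OLLAMA_API_BASE"},
}


def provision_simple_provider(
    provider: str, stored_values: Mapping[str, Optional[str]]
) -> bool:
    """Set the provider's env var from a stored value; return True if set."""
    config = SKETCH_PROVIDER_CONFIG.get(provider)
    value = stored_values.get(provider)
    if config is None or not value:
        # Unknown provider or nothing stored: leave the environment untouched.
        return False
    os.environ[config["env_var"]] = value
    return True
```

Returning `False` without touching `os.environ` keeps any value already supplied through a `.env` file in effect.
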
#### VERTEX_CONFIG (Google Vertex AI)

Multi-field configuration for Vertex AI:

```python
VERTEX_CONFIG = {
    "project": {"env_var": "VERTEX_PROJECT", "config_field": "vertex_project"},
    "location": {"env_var": "VERTEX_LOCATION", "config_field": "vertex_location"},
    "credentials": {"env_var": "GOOGLE_APPLICATION_CREDENTIALS", "config_field": "google_application_credentials"},
}
```

#### AZURE_CONFIG (Azure OpenAI)

Generic and mode-specific endpoints for Azure:

```python
AZURE_CONFIG = {
    "api_key": {"env_var": "AZURE_OPENAI_API_KEY", "config_field": "azure_openai_api_key"},
    "api_version": {"env_var": "AZURE_OPENAI_API_VERSION", "config_field": "azure_openai_api_version"},
    "endpoint": {"env_var": "AZURE_OPENAI_ENDPOINT", "config_field": "azure_openai_endpoint"},
    # Mode-specific endpoints
    "endpoint_llm": {"env_var": "AZURE_OPENAI_ENDPOINT_LLM", "config_field": "azure_openai_endpoint_llm"},
    "endpoint_embedding": {"env_var": "AZURE_OPENAI_ENDPOINT_EMBEDDING", "config_field": "azure_openai_endpoint_embedding"},
    "endpoint_stt": {"env_var": "AZURE_OPENAI_ENDPOINT_STT", "config_field": "azure_openai_endpoint_stt"},
    "endpoint_tts": {"env_var": "AZURE_OPENAI_ENDPOINT_TTS", "config_field": "azure_openai_endpoint_tts"},
}
```

#### OPENAI_COMPATIBLE_CONFIG

Generic and mode-specific configuration for OpenAI-compatible providers:

```python
OPENAI_COMPATIBLE_CONFIG = {
    # Generic
    "api_key": {"env_var": "OPENAI_COMPATIBLE_API_KEY", "config_field": "openai_compatible_api_key"},
    "base_url": {"env_var": "OPENAI_COMPATIBLE_BASE_URL", "config_field": "openai_compatible_base_url"},
    # Mode-specific: LLM, Embedding, STT, TTS
    "api_key_llm": {"env_var": "OPENAI_COMPATIBLE_API_KEY_LLM", "config_field": "openai_compatible_api_key_llm"},
    "base_url_llm": {"env_var": "OPENAI_COMPATIBLE_BASE_URL_LLM", "config_field": "openai_compatible_base_url_llm"},
    # ... similar for embedding, stt, tts
}
```

### Internal Helper Functions

- **`_provision_simple_provider(provider)`**: Sets single env var for simple providers
- **`_provision_vertex()`**: Sets all Vertex AI env vars
- **`_provision_azure()`**: Sets all Azure OpenAI env vars (handles SecretStr)
- **`_provision_openai_compatible()`**: Sets all OpenAI-compatible env vars

### Integration with ModelManager

The credential system integrates with model provisioning in two ways:

1. **Credential-linked models** (preferred): Model has `credential` field pointing to a Credential record. `ModelManager.get_model()` calls `credential.to_esperanto_config()` and passes config directly to Esperanto's `AIFactory.create_*` methods
2. **Env var fallback**: If model has no linked credential, `provision_provider_keys(provider)` sets env vars from DB credentials; Esperanto reads from env vars

Separately, the connection tester (`connection_tester.py`) loads the Credential directly via `Credential.get(config_id)` when testing.

The credential-linked approach is preferred as it allows multiple credentials per provider and avoids env var mutation.

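The two provisioning paths can be sketched as a single decision. Everything here is illustrative: `CredentialSketch`, `resolve_provider_config`, and the `api_key`/`base_url` config keys are assumptions standing in for the real `Credential` record and whatever `to_esperanto_config()` actually returns.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class CredentialSketch:
    """Hypothetical stand-in for a Credential record."""

    api_key: str
    base_url: Optional[str] = None

    def to_esperanto_config(self) -> Dict[str, Any]:
        # Key names are assumed for illustration.
        config: Dict[str, Any] = {"api_key": self.api_key}
        if self.base_url:
            config["base_url"] = self.base_url
        return config


def resolve_provider_config(
    credential: Optional[CredentialSketch],
) -> Tuple[str, Dict[str, Any]]:
    """Pick the provisioning path and the config handed to the model factory."""
    if credential is not None:
        # Preferred path: explicit config, no environment mutation.
        return "credential", credential.to_esperanto_config()
    # Fallback path: keys are expected to already be in env vars.
    return "env_fallback", {}
```

Because the credential path builds its config per call, two models for the same provider can carry different keys, which the shared environment variables of the fallback path cannot express.
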