open-notebook/open_notebook/ai/CLAUDE.md
Luis Novo 924cd88494
docs: update documentation for error handling and podcast retry (#599)
* docs: update CLAUDE.md and user docs for error handling and podcast retry

Add missing documentation for features introduced in v1.7.2 (#590) and
v1.7.3 (#595): error classification system, global exception handlers,
ConfigurationError, podcast failure recovery, and retry endpoint.

* chore: update uv.lock
2026-02-18 09:56:04 -03:00


# AI Module
Model configuration, provisioning, and management for multi-provider AI integration via Esperanto.
## Purpose
Centralizes AI model lifecycle: database models for model metadata (provider, type), default model configuration, and factory for instantiating LLM/embedding/speech models at runtime with fallback logic.
## Architecture Overview
**Two-tier system**:
1. **Database models** (`Model`, `DefaultModels`): Metadata storage and default configuration
2. **ModelManager**: Factory for provisioning models with intelligent fallback (large context detection, config override)

All models use the Esperanto library as the provider abstraction (e.g., OpenAI, Anthropic, Google, Vertex AI, Azure OpenAI, Groq, Ollama, Mistral, DeepSeek, xAI, OpenRouter, Voyage, ElevenLabs, and OpenAI-compatible servers).
## Component Catalog
### models.py
#### Model (ObjectModel)
- Database record: name, provider, type (language/embedding/speech_to_text/text_to_speech), credential (optional link to Credential record)
- `get_models_by_type()`: Async query to fetch all models of a specific type
- `get_credential_obj()`: Fetches linked Credential object (if credential field set)
- `get_by_credential(credential_id)`: Class method to find all models linked to a credential
- Stores provider-model pairs for AI factory instantiation
#### DefaultModels (RecordModel)
- Singleton configuration record (record_id: `open_notebook:default_models`)
- Fields: default_chat_model, default_transformation_model, large_context_model, default_text_to_speech_model, default_speech_to_text_model, default_embedding_model, default_tools_model
- `get_instance()`: Always fetches fresh from the database and returns a new instance on each call (overrides the parent's singleton caching so configuration changes take effect immediately)
#### ModelManager
- Stateless factory for instantiating AI models
- `get_model(model_id)`: Retrieves Model by ID; if model has linked credential, uses `credential.to_esperanto_config()` for provider config; otherwise falls back to env var provisioning via `key_provider`
- `get_defaults()`: Fetches DefaultModels configuration
- `get_default_model(model_type)`: Smart lookup (e.g., "chat" → default_chat_model, "transformation" → default_transformation_model with fallback to chat)
- `get_speech_to_text()`, `get_text_to_speech()`, `get_embedding_model()`: Type-specific convenience methods with assertions
- **Global instance**: `model_manager` singleton exported for use throughout app
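The `get_default_model()` lookup can be pictured as a table from model type to `DefaultModels` field, plus the transformation-to-chat fallback. This is a minimal sketch, not the module's code: `DefaultModelsStub`, `DEFAULT_FIELD_BY_TYPE`, `resolve_default_model_id()`, and every type key except "chat" and "transformation" are hypothetical; only the field names come from `DefaultModels`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DefaultModelsStub:
    """Hypothetical stand-in for the DefaultModels record (field names from this doc)."""
    default_chat_model: Optional[str] = None
    default_transformation_model: Optional[str] = None
    large_context_model: Optional[str] = None
    default_text_to_speech_model: Optional[str] = None
    default_speech_to_text_model: Optional[str] = None
    default_embedding_model: Optional[str] = None
    default_tools_model: Optional[str] = None


# Hypothetical mapping from a model_type string to a DefaultModels field name.
DEFAULT_FIELD_BY_TYPE = {
    "chat": "default_chat_model",
    "transformation": "default_transformation_model",
    "tools": "default_tools_model",
    "embedding": "default_embedding_model",
    "text_to_speech": "default_text_to_speech_model",
    "speech_to_text": "default_speech_to_text_model",
    "large_context": "large_context_model",
}


def resolve_default_model_id(defaults: DefaultModelsStub, model_type: str) -> Optional[str]:
    field = DEFAULT_FIELD_BY_TYPE.get(model_type)
    if field is None:
        return None  # unknown type: nothing to look up
    model_id = getattr(defaults, field)
    # Documented convention: an unset transformation model falls back to chat.
    if model_id is None and model_type == "transformation":
        model_id = defaults.default_chat_model
    return model_id


defaults = DefaultModelsStub(default_chat_model="model:chat")
print(resolve_default_model_id(defaults, "transformation"))  # model:chat
```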
### provision.py
#### provision_langchain_model()
- Factory for LangGraph nodes needing LLM provisioning
- **Smart fallback logic**:
  - If tokens > 105,000: Use `large_context_model`
  - Elif `model_id` specified: Use specific model
  - Else: Use default model for type (e.g., "chat", "transformation")
- Returns LangChain-compatible model via `.to_langchain()`
- Logs model selection decision
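The three-way selection reduces to a small decision function. A sketch, assuming the token count has already been computed (the real function calls `token_count()` on the content); `choose_model_source()` is hypothetical, and the fall-through when no `large_context_model` is configured is an assumption rather than documented behaviour.

```python
from typing import Optional, Tuple

LARGE_CONTEXT_THRESHOLD = 105_000  # hard-coded threshold described in this doc


def choose_model_source(
    tokens: int,
    model_id: Optional[str],
    default_type: str,
    large_context_model: Optional[str],
) -> Tuple[str, str]:
    """Return (reason, value) following the documented precedence.

    For "large_context" and "explicit" the value is a model ID; for "default"
    it is the model type to look up. Assumption: a missing large_context_model
    falls through to the next rule instead of failing.
    """
    # 1. Large inputs upgrade to the large-context model.
    if tokens > LARGE_CONTEXT_THRESHOLD and large_context_model:
        return ("large_context", large_context_model)
    # 2. An explicit model_id wins over defaults.
    if model_id:
        return ("explicit", model_id)
    # 3. Otherwise use the default model for the requested type.
    return ("default", default_type)
```

Note the ordering: a large input overrides even an explicit `model_id`, matching the `if`/`elif` chain above.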
### key_provider.py
#### API Key Provider (Credential→Env Fallback)
- **Purpose**: Provides API keys from database first, falls back to environment variables
- **Pattern**: Before Esperanto creates a model, keys are loaded from `Credential` records and set as environment variables
- **Integration point**: Called by `ModelManager.get_model()` as fallback when model has no linked credential
#### Key Functions
- `get_api_key(provider)`: Get single API key (DB first, then env var)
- `provision_provider_keys(provider)`: Set env vars from DB config for a provider
- `provision_all_keys()`: Load all provider keys from DB into env vars (useful at startup)
#### Provider Configuration Maps
- `PROVIDER_CONFIG`: Simple providers (openai, anthropic, google, groq, etc.)
- `VERTEX_CONFIG`: Google Vertex AI (project, location, credentials)
- `AZURE_CONFIG`: Azure OpenAI (api_key, endpoint, api_version, mode-specific endpoints)
- `OPENAI_COMPATIBLE_CONFIG`: Generic OpenAI-compatible (generic + mode-specific for LLM/EMBEDDING/STT/TTS)
## Common Patterns
- **Type dispatch**: Model.type field drives factory logic (4 model types)
- **Provider abstraction**: Esperanto handles provider differences; ModelManager unaware of provider specifics
- **Fresh defaults**: DefaultModels.get_instance() always fetches from database (not cached) for live config updates
- **Config override**: provision_langchain_model() accepts kwargs passed to AIFactory.create_* methods
- **Token-based selection**: provision_langchain_model() detects large contexts and upgrades model automatically
- **Type assertions**: get_speech_to_text(), get_embedding_model() assert returned type (safety check)
- **Credential→Env fallback**: If a model has a linked credential, the config from `credential.to_esperanto_config()` is used directly; otherwise keys are looked up in the database via key_provider, then in environment variables. This enables UI-based key management while keeping `.env`-based setups working
## Key Dependencies
- `esperanto`: AIFactory.create_language(), create_embedding(), create_speech_to_text(), create_text_to_speech()
- `open_notebook.database.repository`: repo_query, ensure_record_id
- `open_notebook.domain.base`: ObjectModel, RecordModel base classes
- `open_notebook.domain.credential`: Credential for database-stored API keys
- `open_notebook.utils`: token_count() for context size detection
- `loguru`: Logging for model selection decisions
## Important Quirks & Gotchas
- **Token counting rough estimate**: provision_langchain_model() uses token_count() which estimates via cl100k_base encoding (may differ 5-10% from actual model)
- **Large context threshold hard-coded**: 105,000 token threshold for large_context_model upgrade (not configurable)
- **DefaultModels.get_instance() fresh fetch**: Intentionally bypasses parent singleton cache to pick up live config changes; creates new instance each call
- **Type-specific getters use assertions**: get_speech_to_text() asserts isinstance (catches misconfiguration early)
- **ConfigurationError on missing model**: ModelManager.get_model() and provision_langchain_model() raise `ConfigurationError` (not ValueError) when a model is not found or not configured, so the global exception handler returns HTTP 422 with a descriptive message
- **Esperanto caching**: Actual model instances cached by Esperanto (not by ModelManager); ModelManager stateless
- **Fallback chain specificity**: "transformation" type falls back to default_chat_model if not explicitly set (convention-based)
- **kwargs passed through**: provision_langchain_model() passes kwargs to AIFactory but doesn't validate what's accepted
- **Key provider sets env vars**: `provision_provider_keys()` modifies `os.environ` to inject DB-stored keys (from `Credential` records); Esperanto reads from env vars (only used as fallback when model has no linked credential)
## How to Extend
1. **Add new model type**: Add type string to Model.type enum, add create_* method in AIFactory, handle in ModelManager.get_model()
2. **Add new default configuration**: Extend DefaultModels with new field (e.g., default_vision_model), add getter in ModelManager
3. **Change fallback logic**: Modify provision_langchain_model() token threshold or fallback chain
4. **Add model filtering**: Extend Model.get_models_by_type() with additional filters (e.g., by provider)
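5. **Implement model caching**: ModelManager methods are async, so wrapping them directly with `functools.lru_cache` would cache single-use coroutine objects rather than models; use an async-aware cache keyed on hashable arguments instead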
## Usage Example
```python
import asyncio

from open_notebook.ai.models import model_manager
from open_notebook.ai.provision import provision_langchain_model


async def main(long_text: str) -> None:
    # Get default chat model
    chat_model = await model_manager.get_default_model("chat")

    # Get specific model by ID
    embedding_model = await model_manager.get_model("model:openai_embedding")

    # Get the default embedding model (type-checked via assertion)
    embedding_model = await model_manager.get_embedding_model()

    # Provision model for LangGraph (auto-detects large context)
    langchain_model = await provision_langchain_model(
        content=long_text,
        model_id=None,  # Use default
        default_type="chat",
        temperature=0.7,  # Extra kwargs are passed through to AIFactory
    )


asyncio.run(main(long_text="Some long document text"))
```
---
## Connection Testing (connection_tester.py)
### Purpose
Provides functionality to test if a provider's API key is valid by making minimal API calls. Used by the API Configuration UI to validate user-entered credentials before saving.
### test_provider_connection()
Main entry point for testing provider connectivity.
```python
async def test_provider_connection(
provider: str, model_type: str = "language",
config_id: Optional[str] = None
) -> Tuple[bool, str]
```
**Returns**: `(success: bool, message: str)` - Success status and human-readable message.

**Flow**:
1. If `config_id` provided: Loads credential via `Credential.get(config_id)`, uses `credential.to_esperanto_config()` for provider config
2. Looks up test model from `TEST_MODELS` dict
3. For URL-based providers (ollama, openai_compatible): Tests server connectivity
4. For Azure: Tests `/openai/models` endpoint with api_version
5. For API-based providers: Creates minimal model via Esperanto and makes test call
6. Returns user-friendly error messages for common failures
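The branching in steps 3 to 5 amounts to a dispatch on the provider name. A sketch, assuming the provider strings "ollama", "openai_compatible", and "azure"; `pick_test_strategy()`, its return labels, and the "unsupported" case are hypothetical.

```python
from typing import Optional

URL_BASED_PROVIDERS = {"ollama", "openai_compatible"}


def pick_test_strategy(provider: str, test_model: Optional[str]) -> str:
    """Map a provider to the kind of connectivity check described in the flow."""
    if provider in URL_BASED_PROVIDERS:
        return "server_ping"  # e.g. /api/tags or /models
    if provider == "azure":
        return "azure_models_endpoint"  # /openai/models with api_version
    if test_model is None:
        return "unsupported"  # hypothetical: no TEST_MODELS entry to build from
    return "minimal_api_call"  # create a tiny model via Esperanto and call it
```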
### test_individual_model()
Tests a specific Model instance by loading its linked credential (if any) and making a minimal API call.
### TEST_MODELS Configuration
Maps each provider to `(model_name, model_type)` for testing:
```python
TEST_MODELS = {
    "openai": ("gpt-3.5-turbo", "language"),
    "anthropic": ("claude-3-haiku-20240307", "language"),
    "google": ("gemini-1.5-flash", "language"),
    "groq": ("llama-3.1-8b-instant", "language"),
    "voyage": ("voyage-3-lite", "embedding"),
    "elevenlabs": ("eleven_multilingual_v2", "text_to_speech"),
    "ollama": (None, "language"),  # Dynamic
    # ... more providers
}
```
### Special Provider Handlers
- **`_test_ollama_connection(base_url)`**: Tests Ollama server via `/api/tags` endpoint, returns model count
- **`_test_openai_compatible_connection(base_url, api_key)`**: Tests OpenAI-compatible servers via `/models` endpoint
- **`_get_ollama_models(base_url)`**: Fetches available models from Ollama server
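A standard-library sketch of the Ollama check, assuming Ollama's documented `/api/tags` response shape (`{"models": [{"name": ...}, ...]}`). The helper names are hypothetical, and the real handlers may use a different HTTP client and messages.

```python
import json
import urllib.request
from typing import List, Tuple


def parse_ollama_tags(payload: str) -> List[str]:
    """Extract model names from an Ollama /api/tags JSON body."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", []) if "name" in m]


def check_ollama(base_url: str, timeout: float = 5.0) -> Tuple[bool, str]:
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            names = parse_ollama_tags(resp.read().decode("utf-8"))
    except (OSError, ValueError) as exc:  # URLError/timeouts are OSError; bad JSON is ValueError
        return False, f"Cannot reach Ollama at {base_url}: {exc}"
    return True, f"Connected, {len(names)} model(s) available"
```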
### Error Message Normalization
The tester normalizes error messages for user-friendly display:
- `401/unauthorized` -> "Invalid API key"
- `403/forbidden` -> "API key lacks required permissions"
- `rate limit` -> "Rate limited - but connection works" (success)
- `model not found` -> "API key valid (test model not available)" (success)
- Connection/timeout errors -> Helpful troubleshooting messages
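The mapping above can be sketched as an ordered list of substring rules. Only the four input/output pairs come from the list; the exact patterns, their order, and the fallback message are hypothetical.

```python
from typing import Tuple

# (substrings to look for, success flag, user-facing message), checked in order.
RULES = [
    (("401", "unauthorized"), False, "Invalid API key"),
    (("403", "forbidden"), False, "API key lacks required permissions"),
    (("rate limit",), True, "Rate limited - but connection works"),
    (("model not found",), True, "API key valid (test model not available)"),
]


def normalize_error(raw: str) -> Tuple[bool, str]:
    text = raw.lower()
    for needles, success, message in RULES:
        if any(needle in text for needle in needles):
            return success, message
    # Hypothetical fallback; the real tester adds troubleshooting hints here.
    return False, f"Connection failed: {raw}"
```

Treating rate limits and a missing test model as success is presumably deliberate: both responses suggest the server accepted the key.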
---
## Key Provider (key_provider.py)
### Purpose
Unified interface for retrieving API keys with database-first, environment-fallback strategy. Enables UI-based key management while maintaining backward compatibility with `.env` files. Used as fallback when models don't have a directly linked credential.
### Core Functions
#### get_api_key(provider)
```python
async def get_api_key(provider: str) -> Optional[str]
```
Gets API key for a provider. Checks database (`Credential` records) first, then environment variable.

**Fallback Chain**:
1. Query `Credential` records from database for the given provider
2. Get api_key from default credential
3. Handle `SecretStr` (call `.get_secret_value()`) vs regular strings
4. If DB value exists and is non-empty, return it
5. Otherwise, return `os.environ.get(env_var)`
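The fallback chain can be sketched synchronously with the database lookup injected as a callable. Assumptions: `lookup_db_key` stands in for the `Credential` query (steps 1 and 2), any object exposing `get_secret_value()` (such as pydantic's `SecretStr`) is treated as a secret, and `resolve_api_key()` is hypothetical.

```python
import os
from typing import Any, Callable, Optional

# Subset of the env-var names from PROVIDER_CONFIG.
ENV_VARS = {"openai": "OPENAI_API_KEY", "anthropic": "ANTHROPIC_API_KEY"}


def resolve_api_key(provider: str, lookup_db_key: Callable[[str], Any]) -> Optional[str]:
    value = lookup_db_key(provider)  # steps 1-2: DB credential value (may be None)
    if hasattr(value, "get_secret_value"):  # step 3: unwrap SecretStr-like values
        value = value.get_secret_value()
    if value:  # step 4: a non-empty DB value wins
        return value
    env_var = ENV_VARS.get(provider)
    return os.environ.get(env_var) if env_var else None  # step 5: env fallback
```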
#### provision_provider_keys(provider)
```python
async def provision_provider_keys(provider: str) -> bool
```
Main entry point for DB->Env fallback. Sets environment variables from database config for a provider. Called before model provisioning to ensure Esperanto can read keys from env vars.
**Returns**: `True` if any keys were set from database.

**Usage**:
```python
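from esperanto import AIFactory
from open_notebook.ai.key_provider import provision_provider_keys

# Inside an async function: copy DB keys into env vars before creating a model
await provision_provider_keys("openai")
model = AIFactory.create_language(model_name="gpt-4", provider="openai")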
```
#### provision_all_keys()
```python
async def provision_all_keys() -> dict[str, bool]
```
Provisions all providers at once. Useful at application startup.
### Provider Configuration Maps
#### PROVIDER_CONFIG (Simple Providers)
Single-field providers (an API key, or for Ollama the server base URL):
```python
PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY", "config_field": "openai_api_key"},
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "config_field": "anthropic_api_key"},
    "google": {"env_var": "GOOGLE_API_KEY", "config_field": "google_api_key"},
    "groq": {"env_var": "GROQ_API_KEY", "config_field": "groq_api_key"},
    "mistral": {"env_var": "MISTRAL_API_KEY", "config_field": "mistral_api_key"},
    "deepseek": {"env_var": "DEEPSEEK_API_KEY", "config_field": "deepseek_api_key"},
    "xai": {"env_var": "XAI_API_KEY", "config_field": "xai_api_key"},
    "openrouter": {"env_var": "OPENROUTER_API_KEY", "config_field": "openrouter_api_key"},
    "voyage": {"env_var": "VOYAGE_API_KEY", "config_field": "voyage_api_key"},
    "elevenlabs": {"env_var": "ELEVENLABS_API_KEY", "config_field": "elevenlabs_api_key"},
    "ollama": {"env_var": "OLLAMA_API_BASE", "config_field": "ollama_api_base"},
}
```
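Provisioning a simple provider comes down to copying one DB value into the env var named in its `PROVIDER_CONFIG` entry. A sketch, assuming DB values are already loaded into a plain dict keyed by `config_field`; `provision_simple()` and `provision_all()` are hypothetical stand-ins for `_provision_simple_provider()` and `provision_all_keys()`, and overwriting an existing env var is an assumption this doc does not settle.

```python
import os
from typing import Dict, Mapping, Optional

# Subset of the PROVIDER_CONFIG table above.
PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY", "config_field": "openai_api_key"},
    "ollama": {"env_var": "OLLAMA_API_BASE", "config_field": "ollama_api_base"},
}


def provision_simple(provider: str, db_values: Mapping[str, Optional[str]]) -> bool:
    """Copy one DB value into os.environ; return True if a variable was set."""
    entry = PROVIDER_CONFIG.get(provider)
    if entry is None:
        return False
    value = db_values.get(entry["config_field"])
    if not value:
        return False  # nothing in the DB: leave any .env value untouched
    os.environ[entry["env_var"]] = value
    return True


def provision_all(db_values: Mapping[str, Optional[str]]) -> Dict[str, bool]:
    # Mirrors the dict[str, bool] return shape of provision_all_keys().
    return {name: provision_simple(name, db_values) for name in PROVIDER_CONFIG}
```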
#### VERTEX_CONFIG (Google Vertex AI)
Multi-field configuration for Vertex AI:
```python
VERTEX_CONFIG = {
    "project": {"env_var": "VERTEX_PROJECT", "config_field": "vertex_project"},
    "location": {"env_var": "VERTEX_LOCATION", "config_field": "vertex_location"},
    "credentials": {"env_var": "GOOGLE_APPLICATION_CREDENTIALS", "config_field": "google_application_credentials"},
}
```
#### AZURE_CONFIG (Azure OpenAI)
Generic and mode-specific endpoints for Azure:
```python
AZURE_CONFIG = {
    "api_key": {"env_var": "AZURE_OPENAI_API_KEY", "config_field": "azure_openai_api_key"},
    "api_version": {"env_var": "AZURE_OPENAI_API_VERSION", "config_field": "azure_openai_api_version"},
    "endpoint": {"env_var": "AZURE_OPENAI_ENDPOINT", "config_field": "azure_openai_endpoint"},
    # Mode-specific endpoints
    "endpoint_llm": {"env_var": "AZURE_OPENAI_ENDPOINT_LLM", "config_field": "azure_openai_endpoint_llm"},
    "endpoint_embedding": {"env_var": "AZURE_OPENAI_ENDPOINT_EMBEDDING", "config_field": "azure_openai_endpoint_embedding"},
    "endpoint_stt": {"env_var": "AZURE_OPENAI_ENDPOINT_STT", "config_field": "azure_openai_endpoint_stt"},
    "endpoint_tts": {"env_var": "AZURE_OPENAI_ENDPOINT_TTS", "config_field": "azure_openai_endpoint_tts"},
}
```
#### OPENAI_COMPATIBLE_CONFIG
Generic and mode-specific configuration for OpenAI-compatible providers:
```python
OPENAI_COMPATIBLE_CONFIG = {
    # Generic
    "api_key": {"env_var": "OPENAI_COMPATIBLE_API_KEY", "config_field": "openai_compatible_api_key"},
    "base_url": {"env_var": "OPENAI_COMPATIBLE_BASE_URL", "config_field": "openai_compatible_base_url"},
    # Mode-specific: LLM, Embedding, STT, TTS
    "api_key_llm": {"env_var": "OPENAI_COMPATIBLE_API_KEY_LLM", "config_field": "openai_compatible_api_key_llm"},
    "base_url_llm": {"env_var": "OPENAI_COMPATIBLE_BASE_URL_LLM", "config_field": "openai_compatible_base_url_llm"},
    # ... similar for embedding, stt, tts
}
```
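Both the Azure and OpenAI-compatible configs pair a generic setting with optional per-mode overrides. A plausible resolution order, assumed here rather than confirmed by this doc, is mode-specific first and generic second; `resolve_setting()` is hypothetical and only reads the env var names listed above.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    generic_var: str, mode: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Prefer e.g. OPENAI_COMPATIBLE_BASE_URL_LLM over OPENAI_COMPATIBLE_BASE_URL.

    mode is one of "llm", "embedding", "stt", "tts" (upper-cased as a suffix).
    """
    specific = env.get(f"{generic_var}_{mode.upper()}")
    return specific or env.get(generic_var)
```

For example, `resolve_setting("AZURE_OPENAI_ENDPOINT", "stt")` would return `AZURE_OPENAI_ENDPOINT_STT` when set and the generic endpoint otherwise.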
### Internal Helper Functions
- **`_provision_simple_provider(provider)`**: Sets single env var for simple providers
- **`_provision_vertex()`**: Sets all Vertex AI env vars
- **`_provision_azure()`**: Sets all Azure OpenAI env vars (handles SecretStr)
- **`_provision_openai_compatible()`**: Sets all OpenAI-compatible env vars
### Integration with ModelManager
The credential system integrates with model provisioning in two ways:
1. **Credential-linked models** (preferred): Model has `credential` field pointing to a Credential record. `ModelManager.get_model()` calls `credential.to_esperanto_config()` and passes config directly to Esperanto's `AIFactory.create_*` methods
2. **Env var fallback**: If model has no linked credential, `provision_provider_keys(provider)` sets env vars from DB credentials; Esperanto reads from env vars

Separately, **ConnectionTester** loads the Credential directly via `Credential.get(config_id)` for testing.

The credential-linked approach is preferred as it allows multiple credentials per provider and avoids env var mutation.
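The two provisioning paths in `ModelManager.get_model()` can be sketched as follows. This is a hypothetical outline: `ModelRecord`, `create_model()`, and the injected `to_config`/`provision_env`/`factory` callables stand in for the real `Model`, `credential.to_esperanto_config()`, `provision_provider_keys()`, and `AIFactory.create_*`; the `config=` keyword is an assumption about how the factory receives credential config.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict, Optional


@dataclass
class ModelRecord:
    """Hypothetical stand-in for the Model database record."""
    name: str
    provider: str
    credential: Optional[str] = None  # ID of a linked Credential, if any


async def create_model(
    model: ModelRecord,
    to_config: Callable[[str], Awaitable[Dict[str, Any]]],
    provision_env: Callable[[str], Awaitable[bool]],
    factory: Callable[..., Any],
) -> Any:
    if model.credential:
        # Preferred path: pass the credential's config straight to the factory.
        config = await to_config(model.credential)
        return factory(provider=model.provider, model_name=model.name, config=config)
    # Fallback path: push DB keys into os.environ, then let the factory read them.
    await provision_env(model.provider)
    return factory(provider=model.provider, model_name=model.name)


# Tiny demo with fake dependencies that record what happened.
calls: list = []


async def fake_config(credential_id: str) -> Dict[str, Any]:
    return {"api_key": f"key-from-{credential_id}"}


async def fake_provision(provider: str) -> bool:
    calls.append(("provision_env", provider))
    return True


def fake_factory(**kwargs: Any) -> Dict[str, Any]:
    return kwargs  # echo the arguments so the chosen path is visible


linked = asyncio.run(
    create_model(ModelRecord("gpt-4o", "openai", "credential:abc"), fake_config, fake_provision, fake_factory)
)
```

The linked path never touches `os.environ`, which is why it can support several credentials for the same provider at once.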