open-notebook/open_notebook/ai/CLAUDE.md

# AI Module

Model configuration, provisioning, and management for multi-provider AI integration via Esperanto.

## Purpose

Centralizes AI model lifecycle: database models for model metadata (provider, type), default model configuration, and factory for instantiating LLM/embedding/speech models at runtime with fallback logic.

## Architecture Overview

**Two-tier system**:
1. **Database models** (`Model`, `DefaultModels`): Metadata storage and default configuration
2. **ModelManager**: Factory for provisioning models with intelligent fallback (large context detection, config override)

All models use Esperanto library as provider abstraction (OpenAI, Anthropic, Google, Groq, Ollama, Mistral, DeepSeek, xAI, OpenRouter).

## Component Catalog

### models.py

#### Model (ObjectModel)
- Database record: name, provider, type (language/embedding/speech_to_text/text_to_speech), credential (optional link to Credential record)
- `get_models_by_type()`: Async query to fetch all models of a specific type
- `get_credential_obj()`: Fetches linked Credential object (if credential field set)
- `get_by_credential(credential_id)`: Class method to find all models linked to a credential
- Stores provider-model pairs for AI factory instantiation

#### DefaultModels (RecordModel)
- Singleton configuration record (record_id: `open_notebook:default_models`)
- Fields: default_chat_model, default_transformation_model, large_context_model, default_text_to_speech_model, default_speech_to_text_model, default_embedding_model, default_tools_model
- `get_instance()`: Always fetches fresh from database (overrides parent caching for real-time updates)
- Returns fresh instance on each call (no singleton cache)

#### ModelManager
- Stateless factory for instantiating AI models
- `get_model(model_id)`: Retrieves Model by ID; if model has linked credential, uses `credential.to_esperanto_config()` for provider config; otherwise falls back to env var provisioning via `key_provider`
- `get_defaults()`: Fetches DefaultModels configuration
- `get_default_model(model_type)`: Smart lookup (e.g., "chat" → default_chat_model, "transformation" → default_transformation_model with fallback to chat)
- `get_speech_to_text()`, `get_text_to_speech()`, `get_embedding_model()`: Type-specific convenience methods with assertions
- **Global instance**: `model_manager` singleton exported for use throughout app

### provision.py

#### provision_langchain_model()
- Factory for LangGraph nodes needing LLM provisioning
- **Smart fallback logic**:
  - If tokens > 105,000: Use `large_context_model`
  - Elif `model_id` specified: Use specific model
  - Else: Use default model for type (e.g., "chat", "transformation")
- Returns LangChain-compatible model via `.to_langchain()`
- Logs model selection decision

### key_provider.py

#### API Key Provider (Credential→Env Fallback)
- **Purpose**: Provides API keys from database first, falls back to environment variables
- **Pattern**: Before Esperanto creates a model, keys are loaded from `Credential` records and set as environment variables
- **Integration point**: Called by `ModelManager.get_model()` as fallback when model has no linked credential

#### Key Functions
- `get_api_key(provider)`: Get single API key (DB first, then env var)
- `provision_provider_keys(provider)`: Set env vars from DB config for a provider
- `provision_all_keys()`: Load all provider keys from DB into env vars (useful at startup)

#### Provider Configuration Maps
- `PROVIDER_CONFIG`: Simple providers (openai, anthropic, google, groq, etc.)
- `VERTEX_CONFIG`: Google Vertex AI (project, location, credentials)
- `AZURE_CONFIG`: Azure OpenAI (api_key, endpoint, api_version, mode-specific endpoints)
- `OPENAI_COMPATIBLE_CONFIG`: Generic OpenAI-compatible (generic + mode-specific for LLM/EMBEDDING/STT/TTS)

## Common Patterns

- **Type dispatch**: Model.type field drives factory logic (4 model types)
- **Provider abstraction**: Esperanto handles provider differences; ModelManager unaware of provider specifics
- **Fresh defaults**: DefaultModels.get_instance() always fetches from database (not cached) for live config updates
- **Config override**: provision_langchain_model() accepts kwargs passed to AIFactory.create_* methods
- **Token-based selection**: provision_langchain_model() detects large contexts and upgrades model automatically
- **Type assertions**: get_speech_to_text(), get_embedding_model() assert returned type (safety check)
- **Credential→Env fallback**: If model has linked credential, config from `credential.to_esperanto_config()` is used directly; otherwise keys checked in database via key_provider, then environment variables; enables UI-based key management while maintaining backward compatibility

## Key Dependencies

- `esperanto`: AIFactory.create_language(), create_embedding(), create_speech_to_text(), create_text_to_speech()
- `open_notebook.database.repository`: repo_query, ensure_record_id
- `open_notebook.domain.base`: ObjectModel, RecordModel base classes
- `open_notebook.domain.credential`: Credential for database-stored API keys
- `open_notebook.utils`: token_count() for context size detection
- `loguru`: Logging for model selection decisions

## Important Quirks & Gotchas

- **Token counting rough estimate**: provision_langchain_model() uses token_count() which estimates via cl100k_base encoding (may differ 5-10% from actual model)
- **Large context threshold hard-coded**: 105,000 token threshold for large_context_model upgrade (not configurable)
- **DefaultModels.get_instance() fresh fetch**: Intentionally bypasses parent singleton cache to pick up live config changes; creates new instance each call
- **Type-specific getters use assertions**: get_speech_to_text() asserts isinstance (catches misconfiguration early)
- **ConfigurationError on missing model**: ModelManager.get_model() and provision_langchain_model() raise `ConfigurationError` (not ValueError) when a model is not found or not configured, so the global exception handler returns HTTP 422 with a descriptive message
- **Esperanto caching**: Actual model instances cached by Esperanto (not by ModelManager); ModelManager stateless
- **Fallback chain specificity**: "transformation" type falls back to default_chat_model if not explicitly set (convention-based)
- **kwargs passed through**: provision_langchain_model() passes kwargs to AIFactory but doesn't validate what's accepted
- **Key provider sets env vars**: `provision_provider_keys()` modifies `os.environ` to inject DB-stored keys (from `Credential` records); Esperanto reads from env vars (only used as fallback when model has no linked credential)

## How to Extend

1. **Add new model type**: Add type string to Model.type enum, add create_* method in AIFactory, handle in ModelManager.get_model()
2. **Add new default configuration**: Extend DefaultModels with new field (e.g., default_vision_model), add getter in ModelManager
3. **Change fallback logic**: Modify provision_langchain_model() token threshold or fallback chain
4. **Add model filtering**: Extend Model.get_models_by_type() with additional filters (e.g., by provider)
5. **Implement model caching**: Wrap ModelManager methods with functools.lru_cache (be aware of kwargs mutability)

## Usage Example

```python
from open_notebook.ai.models import model_manager

# Get default chat model
chat_model = await model_manager.get_default_model("chat")

# Get specific model by ID
embedding_model = await model_manager.get_model("model:openai_embedding")

# Get embedding model with config override
embedding_model = await model_manager.get_embedding_model(temperature=0.1)

# Provision model for LangGraph (auto-detects large context)
from open_notebook.ai.provision import provision_langchain_model
langchain_model = await provision_langchain_model(
    content=long_text,
    model_id=None,  # Use default
    default_type="chat",
    temperature=0.7
)
```

---

## Connection Testing (connection_tester.py)

### Purpose

Provides functionality to test if a provider's API key is valid by making minimal API calls. Used by the API Configuration UI to validate user-entered credentials before saving.

### test_provider_connection()

Main entry point for testing provider connectivity.

```python
async def test_provider_connection(
    provider: str, model_type: str = "language",
    config_id: Optional[str] = None
) -> Tuple[bool, str]
```

**Returns**: `(success: bool, message: str)` - Success status and human-readable message.

**Flow**:
1. If `config_id` provided: Loads credential via `Credential.get(config_id)`, uses `credential.to_esperanto_config()` for provider config
2. Looks up test model from `TEST_MODELS` dict
3. For URL-based providers (ollama, openai_compatible): Tests server connectivity
4. For Azure: Tests `/openai/models` endpoint with api_version
5. For API-based providers: Creates minimal model via Esperanto and makes test call
6. Returns user-friendly error messages for common failures

### test_individual_model()

Tests a specific Model instance by loading its linked credential (if any) and making a minimal API call.

### TEST_MODELS Configuration

Maps each provider to `(model_name, model_type)` for testing:

```python
TEST_MODELS = {
    "openai": ("gpt-3.5-turbo", "language"),
    "anthropic": ("claude-3-haiku-20240307", "language"),
    "google": ("gemini-1.5-flash", "language"),
    "groq": ("llama-3.1-8b-instant", "language"),
    "voyage": ("voyage-3-lite", "embedding"),
    "elevenlabs": ("eleven_multilingual_v2", "text_to_speech"),
    "ollama": (None, "language"),  # Dynamic
    # ... more providers
}
```

### Special Provider Handlers

- **`_test_ollama_connection(base_url)`**: Tests Ollama server via `/api/tags` endpoint, returns model count
- **`_test_openai_compatible_connection(base_url, api_key)`**: Tests OpenAI-compatible servers via `/models` endpoint
- **`_get_ollama_models(base_url)`**: Fetches available models from Ollama server

### Error Message Normalization

The tester normalizes error messages for user-friendly display:
- `401/unauthorized` -> "Invalid API key"
- `403/forbidden` -> "API key lacks required permissions"
- `rate limit` -> "Rate limited - but connection works" (success)
- `model not found` -> "API key valid (test model not available)" (success)
- Connection/timeout errors -> Helpful troubleshooting messages

---

## Key Provider (key_provider.py)

### Purpose

Unified interface for retrieving API keys with database-first, environment-fallback strategy. Enables UI-based key management while maintaining backward compatibility with `.env` files. Used as fallback when models don't have a directly linked credential.

### Core Functions

#### get_api_key(provider)

```python
async def get_api_key(provider: str) -> Optional[str]
```

Gets API key for a provider. Checks database (`Credential` records) first, then environment variable.

**Fallback Chain**:
1. Query `Credential` records from database for the given provider
2. Get api_key from default credential
3. Handle `SecretStr` (call `.get_secret_value()`) vs regular strings
4. If DB value exists and is non-empty, return it
5. Otherwise, return `os.environ.get(env_var)`

#### provision_provider_keys(provider)

```python
async def provision_provider_keys(provider: str) -> bool
```

Main entry point for DB->Env fallback. Sets environment variables from database config for a provider. Called before model provisioning to ensure Esperanto can read keys from env vars.

**Returns**: `True` if any keys were set from database.

**Usage**:
```python
# Before creating a model, ensure DB keys are in env vars
await provision_provider_keys("openai")
model = AIFactory.create_language(model_name="gpt-4", provider="openai")
```

#### provision_all_keys()

```python
async def provision_all_keys() -> dict[str, bool]
```

Provisions all providers at once. Useful at application startup.

### Provider Configuration Maps

#### PROVIDER_CONFIG (Simple Providers)

Single-field providers with API key only:

```python
PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY", "config_field": "openai_api_key"},
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "config_field": "anthropic_api_key"},
    "google": {"env_var": "GOOGLE_API_KEY", "config_field": "google_api_key"},
    "groq": {"env_var": "GROQ_API_KEY", "config_field": "groq_api_key"},
    "mistral": {"env_var": "MISTRAL_API_KEY", "config_field": "mistral_api_key"},
    "deepseek": {"env_var": "DEEPSEEK_API_KEY", "config_field": "deepseek_api_key"},
    "xai": {"env_var": "XAI_API_KEY", "config_field": "xai_api_key"},
    "openrouter": {"env_var": "OPENROUTER_API_KEY", "config_field": "openrouter_api_key"},
    "voyage": {"env_var": "VOYAGE_API_KEY", "config_field": "voyage_api_key"},
    "elevenlabs": {"env_var": "ELEVENLABS_API_KEY", "config_field": "elevenlabs_api_key"},
    "ollama": {"env_var": "OLLAMA_API_BASE", "config_field": "ollama_api_base"},
}
```

#### VERTEX_CONFIG (Google Vertex AI)

Multi-field configuration for Vertex AI:

```python
VERTEX_CONFIG = {
    "project": {"env_var": "VERTEX_PROJECT", "config_field": "vertex_project"},
    "location": {"env_var": "VERTEX_LOCATION", "config_field": "vertex_location"},
    "credentials": {"env_var": "GOOGLE_APPLICATION_CREDENTIALS", "config_field": "google_application_credentials"},
}
```

#### AZURE_CONFIG (Azure OpenAI)

Generic and mode-specific endpoints for Azure:

```python
AZURE_CONFIG = {
    "api_key": {"env_var": "AZURE_OPENAI_API_KEY", "config_field": "azure_openai_api_key"},
    "api_version": {"env_var": "AZURE_OPENAI_API_VERSION", "config_field": "azure_openai_api_version"},
    "endpoint": {"env_var": "AZURE_OPENAI_ENDPOINT", "config_field": "azure_openai_endpoint"},
    # Mode-specific endpoints
    "endpoint_llm": {"env_var": "AZURE_OPENAI_ENDPOINT_LLM", "config_field": "azure_openai_endpoint_llm"},
    "endpoint_embedding": {"env_var": "AZURE_OPENAI_ENDPOINT_EMBEDDING", "config_field": "azure_openai_endpoint_embedding"},
    "endpoint_stt": {"env_var": "AZURE_OPENAI_ENDPOINT_STT", "config_field": "azure_openai_endpoint_stt"},
    "endpoint_tts": {"env_var": "AZURE_OPENAI_ENDPOINT_TTS", "config_field": "azure_openai_endpoint_tts"},
}
```

#### OPENAI_COMPATIBLE_CONFIG

Generic and mode-specific configuration for OpenAI-compatible providers:

```python
OPENAI_COMPATIBLE_CONFIG = {
    # Generic
    "api_key": {"env_var": "OPENAI_COMPATIBLE_API_KEY", "config_field": "openai_compatible_api_key"},
    "base_url": {"env_var": "OPENAI_COMPATIBLE_BASE_URL", "config_field": "openai_compatible_base_url"},
    # Mode-specific: LLM, Embedding, STT, TTS
    "api_key_llm": {"env_var": "OPENAI_COMPATIBLE_API_KEY_LLM", "config_field": "openai_compatible_api_key_llm"},
    "base_url_llm": {"env_var": "OPENAI_COMPATIBLE_BASE_URL_LLM", "config_field": "openai_compatible_base_url_llm"},
    # ... similar for embedding, stt, tts
}
```

### Internal Helper Functions

- **`_provision_simple_provider(provider)`**: Sets single env var for simple providers
- **`_provision_vertex()`**: Sets all Vertex AI env vars
- **`_provision_azure()`**: Sets all Azure OpenAI env vars (handles SecretStr)
- **`_provision_openai_compatible()`**: Sets all OpenAI-compatible env vars

### Integration with ModelManager

The credential system integrates with model provisioning in two ways:

1. **Credential-linked models** (preferred): Model has `credential` field pointing to a Credential record. `ModelManager.get_model()` calls `credential.to_esperanto_config()` and passes config directly to Esperanto's `AIFactory.create_*` methods
2. **Env var fallback**: If model has no linked credential, `provision_provider_keys(provider)` sets env vars from DB credentials; Esperanto reads from env vars
3. **ConnectionTester** loads Credential directly via `Credential.get(config_id)` for testing

The credential-linked approach is preferred as it allows multiple credentials per provider and avoids env var mutation.