open-notebook/open_notebook/ai/CLAUDE.md

AI Module

Model configuration, provisioning, and management for multi-provider AI integration via Esperanto.

Purpose

Centralizes the AI model lifecycle: database models for model metadata (provider, type), default model configuration, and a factory for instantiating LLM, embedding, and speech models at runtime with fallback logic.

Architecture Overview

Two-tier system:

  1. Database models (Model, DefaultModels): Metadata storage and default configuration
  2. ModelManager: Factory for provisioning models with intelligent fallback (large context detection, config override)

All models use the Esperanto library as a provider abstraction (OpenAI, Anthropic, Google, Groq, Ollama, Mistral, DeepSeek, xAI, OpenRouter).

Component Catalog

models.py

Model (ObjectModel)

  • Database record: name, provider, type (language/embedding/speech_to_text/text_to_speech), credential (optional link to Credential record)
  • get_models_by_type(): Async query to fetch all models of a specific type
  • get_credential_obj(): Fetches linked Credential object (if credential field set)
  • get_by_credential(credential_id): Class method to find all models linked to a credential
  • Stores provider-model pairs for AI factory instantiation
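As a minimal illustration of the query shape, here is an in-memory stand-in for get_by_credential(); the real class method runs an async database query, so names and types here are illustrative only:

```python
# In-memory sketch of get_by_credential(); the real method queries the database.
def get_by_credential(models: list[dict], credential_id: str) -> list[dict]:
    """Return all model records linked to the given credential."""
    return [m for m in models if m.get("credential") == credential_id]
```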

DefaultModels (RecordModel)

  • Singleton configuration record (record_id: open_notebook:default_models)
  • Fields: default_chat_model, default_transformation_model, large_context_model, default_text_to_speech_model, default_speech_to_text_model, default_embedding_model, default_tools_model
  • get_instance(): Always fetches a fresh instance from the database on every call, bypassing the parent class's caching so live configuration changes are picked up immediately

ModelManager

  • Stateless factory for instantiating AI models
  • get_model(model_id): Retrieves Model by ID; if model has linked credential, uses credential.to_esperanto_config() for provider config; otherwise falls back to env var provisioning via key_provider
  • get_defaults(): Fetches DefaultModels configuration
  • get_default_model(model_type): Smart lookup (e.g., "chat" → default_chat_model, "transformation" → default_transformation_model with fallback to chat)
  • get_speech_to_text(), get_text_to_speech(), get_embedding_model(): Type-specific convenience methods with assertions
  • Global instance: model_manager singleton exported for use throughout app

provision.py

provision_langchain_model()

  • Factory for LangGraph nodes needing LLM provisioning
  • Smart fallback logic:
    • If tokens > 105,000: Use large_context_model
    • Elif model_id specified: Use specific model
    • Else: Use default model for type (e.g., "chat", "transformation")
  • Returns LangChain-compatible model via .to_langchain()
  • Logs model selection decision

key_provider.py

API Key Provider (Credential→Env Fallback)

  • Purpose: Provides API keys from database first, falls back to environment variables
  • Pattern: Before Esperanto creates a model, keys are loaded from Credential records and set as environment variables
  • Integration point: Called by ModelManager.get_model() as fallback when model has no linked credential

Key Functions

  • get_api_key(provider): Get single API key (DB first, then env var)
  • provision_provider_keys(provider): Set env vars from DB config for a provider
  • provision_all_keys(): Load all provider keys from DB into env vars (useful at startup)

Provider Configuration Maps

  • PROVIDER_CONFIG: Simple providers (openai, anthropic, google, groq, etc.)
  • VERTEX_CONFIG: Google Vertex AI (project, location, credentials)
  • AZURE_CONFIG: Azure OpenAI (api_key, endpoint, api_version, mode-specific endpoints)
  • OPENAI_COMPATIBLE_CONFIG: Generic OpenAI-compatible (generic + mode-specific for LLM/EMBEDDING/STT/TTS)

Common Patterns

  • Type dispatch: Model.type field drives factory logic (4 model types)
  • Provider abstraction: Esperanto handles provider differences; ModelManager unaware of provider specifics
  • Fresh defaults: DefaultModels.get_instance() always fetches from database (not cached) for live config updates
  • Config override: provision_langchain_model() accepts kwargs passed to AIFactory.create_* methods
  • Token-based selection: provision_langchain_model() detects large contexts and upgrades model automatically
  • Type assertions: get_speech_to_text(), get_embedding_model() assert returned type (safety check)
  • Credential→Env fallback: If model has linked credential, config from credential.to_esperanto_config() is used directly; otherwise keys checked in database via key_provider, then environment variables; enables UI-based key management while maintaining backward compatibility

Key Dependencies

  • esperanto: AIFactory.create_language(), create_embedding(), create_speech_to_text(), create_text_to_speech()
  • open_notebook.database.repository: repo_query, ensure_record_id
  • open_notebook.domain.base: ObjectModel, RecordModel base classes
  • open_notebook.domain.credential: Credential for database-stored API keys
  • open_notebook.utils: token_count() for context size detection
  • loguru: Logging for model selection decisions

Important Quirks & Gotchas

  • Token counting rough estimate: provision_langchain_model() uses token_count(), which estimates via cl100k_base encoding (may differ 5-10% from the model's own tokenizer)
  • Large context threshold hard-coded: 105,000 token threshold for large_context_model upgrade (not configurable)
  • DefaultModels.get_instance() fresh fetch: Intentionally bypasses parent singleton cache to pick up live config changes; creates new instance each call
  • Type-specific getters use assertions: get_speech_to_text() asserts isinstance (catches misconfiguration early)
  • ConfigurationError on missing model: ModelManager.get_model() and provision_langchain_model() raise ConfigurationError (not ValueError) when a model is not found or not configured, so the global exception handler returns HTTP 422 with a descriptive message
  • Esperanto caching: Actual model instances cached by Esperanto (not by ModelManager); ModelManager stateless
  • Fallback chain specificity: "transformation" type falls back to default_chat_model if not explicitly set (convention-based)
  • kwargs passed through: provision_langchain_model() passes kwargs to AIFactory but doesn't validate what's accepted
  • Key provider sets env vars: provision_provider_keys() modifies os.environ to inject DB-stored keys (from Credential records); Esperanto reads from env vars (only used as fallback when model has no linked credential)

How to Extend

  1. Add new model type: Add type string to Model.type enum, add create_* method in AIFactory, handle in ModelManager.get_model()
  2. Add new default configuration: Extend DefaultModels with new field (e.g., default_vision_model), add getter in ModelManager
  3. Change fallback logic: Modify provision_langchain_model() token threshold or fallback chain
  4. Add model filtering: Extend Model.get_models_by_type() with additional filters (e.g., by provider)
  5. Implement model caching: Wrap ModelManager methods with functools.lru_cache (be aware of kwargs mutability)

Usage Example

```python
from open_notebook.ai.models import model_manager

# Get default chat model
chat_model = await model_manager.get_default_model("chat")

# Get specific model by ID
embedding_model = await model_manager.get_model("model:openai_embedding")

# Get embedding model with config override
embedding_model = await model_manager.get_embedding_model(temperature=0.1)

# Provision model for LangGraph (auto-detects large context)
from open_notebook.ai.provision import provision_langchain_model
langchain_model = await provision_langchain_model(
    content=long_text,
    model_id=None,  # Use default
    default_type="chat",
    temperature=0.7
)
```

Connection Testing (connection_tester.py)

Purpose

Provides functionality to test whether a provider's API key is valid by making minimal API calls. Used by the API Configuration UI to validate user-entered credentials before saving.

test_provider_connection()

Main entry point for testing provider connectivity.

```python
async def test_provider_connection(
    provider: str,
    model_type: str = "language",
    config_id: Optional[str] = None,
) -> Tuple[bool, str]
```

Returns: (success: bool, message: str) - Success status and human-readable message.

Flow:

  1. If config_id provided: Loads credential via Credential.get(config_id), uses credential.to_esperanto_config() for provider config
  2. Looks up test model from TEST_MODELS dict
  3. For URL-based providers (ollama, openai_compatible): Tests server connectivity
  4. For Azure: Tests /openai/models endpoint with api_version
  5. For API-based providers: Creates minimal model via Esperanto and makes test call
  6. Returns user-friendly error messages for common failures
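The provider dispatch in steps 3-5 can be sketched as a small helper (hypothetical; the real function inlines this logic rather than naming strategies):

```python
# Illustrative dispatch for the flow above; strategy names are made up.
def choose_test_strategy(provider: str) -> str:
    if provider in ("ollama", "openai_compatible"):
        return "url_probe"               # step 3: test server connectivity
    if provider == "azure":
        return "azure_models_endpoint"   # step 4: /openai/models with api_version
    return "esperanto_test_call"         # step 5: minimal model call
```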

test_individual_model()

Tests a specific Model instance by loading its linked credential (if any) and making a minimal API call.

TEST_MODELS Configuration

Maps each provider to (model_name, model_type) for testing:

```python
TEST_MODELS = {
    "openai": ("gpt-3.5-turbo", "language"),
    "anthropic": ("claude-3-haiku-20240307", "language"),
    "google": ("gemini-1.5-flash", "language"),
    "groq": ("llama-3.1-8b-instant", "language"),
    "voyage": ("voyage-3-lite", "embedding"),
    "elevenlabs": ("eleven_multilingual_v2", "text_to_speech"),
    "ollama": (None, "language"),  # Dynamic
    # ... more providers
}
```

Special Provider Handlers

  • _test_ollama_connection(base_url): Tests Ollama server via /api/tags endpoint, returns model count
  • _test_openai_compatible_connection(base_url, api_key): Tests OpenAI-compatible servers via /models endpoint
  • _get_ollama_models(base_url): Fetches available models from Ollama server
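A self-contained sketch of the Ollama probe, assuming only that the server exposes /api/tags returning a JSON object with a models list; function names here are illustrative, not the module's private helpers:

```python
import json
import urllib.request

def ollama_tags_url(base_url: str) -> str:
    """Build the /api/tags endpoint used for the connectivity probe."""
    return base_url.rstrip("/") + "/api/tags"

def probe_ollama_server(base_url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Probe an Ollama server; report a model count on success."""
    try:
        with urllib.request.urlopen(ollama_tags_url(base_url), timeout=timeout) as resp:
            models = json.load(resp).get("models", [])
        return True, f"Connected ({len(models)} models available)"
    except Exception as exc:
        return False, f"Cannot reach Ollama server: {exc}"
```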

Error Message Normalization

The tester normalizes error messages for user-friendly display:

  • 401/unauthorized -> "Invalid API key"
  • 403/forbidden -> "API key lacks required permissions"
  • rate limit -> "Rate limited - but connection works" (success)
  • model not found -> "API key valid (test model not available)" (success)
  • Connection/timeout errors -> Helpful troubleshooting messages
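This normalization can be sketched as a lookup function; the matching rules below are illustrative, and the real tester's patterns may differ:

```python
# Sketch of error-message normalization; returns (success, user-facing message).
def normalize_test_error(raw: str) -> tuple[bool, str]:
    msg = raw.lower()
    if "401" in msg or "unauthorized" in msg:
        return False, "Invalid API key"
    if "403" in msg or "forbidden" in msg:
        return False, "API key lacks required permissions"
    if "rate limit" in msg:
        # Rate limiting proves the key works, so it counts as success
        return True, "Rate limited - but connection works"
    if "model" in msg and "not found" in msg:
        return True, "API key valid (test model not available)"
    return False, raw
```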

Key Provider (key_provider.py)

Purpose

Unified interface for retrieving API keys with database-first, environment-fallback strategy. Enables UI-based key management while maintaining backward compatibility with .env files. Used as fallback when models don't have a directly linked credential.

Core Functions

get_api_key(provider)

```python
async def get_api_key(provider: str) -> Optional[str]
```

Gets API key for a provider. Checks database (Credential records) first, then environment variable.

Fallback Chain:

  1. Query Credential records from database for the given provider
  2. Get api_key from default credential
  3. Handle SecretStr (call .get_secret_value()) vs regular strings
  4. If DB value exists and is non-empty, return it
  5. Otherwise, return os.environ.get(env_var)
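The chain above as a self-contained sketch, with fetch_default_credential standing in for the real Credential query (everything here is illustrative, not the actual implementation):

```python
import os

def get_api_key(provider, fetch_default_credential, env_var):
    """DB-first, env-fallback key lookup (sketch of the documented chain)."""
    cred = fetch_default_credential(provider)
    if cred:
        value = cred.get("api_key")
        # SecretStr-style wrappers expose get_secret_value()
        if hasattr(value, "get_secret_value"):
            value = value.get_secret_value()
        if value:  # non-empty DB value wins
            return value
    return os.environ.get(env_var)
```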

provision_provider_keys(provider)

```python
async def provision_provider_keys(provider: str) -> bool
```

Main entry point for DB->Env fallback. Sets environment variables from database config for a provider. Called before model provisioning to ensure Esperanto can read keys from env vars.

Returns: True if any keys were set from database.

Usage:

```python
# Before creating a model, ensure DB keys are in env vars
await provision_provider_keys("openai")
model = AIFactory.create_language(model_name="gpt-4", provider="openai")
```

provision_all_keys()

```python
async def provision_all_keys() -> dict[str, bool]
```

Provisions all providers at once. Useful at application startup.

Provider Configuration Maps

PROVIDER_CONFIG (Simple Providers)

Single-field providers with API key only:

```python
PROVIDER_CONFIG = {
    "openai": {"env_var": "OPENAI_API_KEY", "config_field": "openai_api_key"},
    "anthropic": {"env_var": "ANTHROPIC_API_KEY", "config_field": "anthropic_api_key"},
    "google": {"env_var": "GOOGLE_API_KEY", "config_field": "google_api_key"},
    "groq": {"env_var": "GROQ_API_KEY", "config_field": "groq_api_key"},
    "mistral": {"env_var": "MISTRAL_API_KEY", "config_field": "mistral_api_key"},
    "deepseek": {"env_var": "DEEPSEEK_API_KEY", "config_field": "deepseek_api_key"},
    "xai": {"env_var": "XAI_API_KEY", "config_field": "xai_api_key"},
    "openrouter": {"env_var": "OPENROUTER_API_KEY", "config_field": "openrouter_api_key"},
    "voyage": {"env_var": "VOYAGE_API_KEY", "config_field": "voyage_api_key"},
    "elevenlabs": {"env_var": "ELEVENLABS_API_KEY", "config_field": "elevenlabs_api_key"},
    "ollama": {"env_var": "OLLAMA_API_BASE", "config_field": "ollama_api_base"},
}
```

VERTEX_CONFIG (Google Vertex AI)

Multi-field configuration for Vertex AI:

```python
VERTEX_CONFIG = {
    "project": {"env_var": "VERTEX_PROJECT", "config_field": "vertex_project"},
    "location": {"env_var": "VERTEX_LOCATION", "config_field": "vertex_location"},
    "credentials": {"env_var": "GOOGLE_APPLICATION_CREDENTIALS", "config_field": "google_application_credentials"},
}
```

AZURE_CONFIG (Azure OpenAI)

Generic and mode-specific endpoints for Azure:

```python
AZURE_CONFIG = {
    "api_key": {"env_var": "AZURE_OPENAI_API_KEY", "config_field": "azure_openai_api_key"},
    "api_version": {"env_var": "AZURE_OPENAI_API_VERSION", "config_field": "azure_openai_api_version"},
    "endpoint": {"env_var": "AZURE_OPENAI_ENDPOINT", "config_field": "azure_openai_endpoint"},
    # Mode-specific endpoints
    "endpoint_llm": {"env_var": "AZURE_OPENAI_ENDPOINT_LLM", "config_field": "azure_openai_endpoint_llm"},
    "endpoint_embedding": {"env_var": "AZURE_OPENAI_ENDPOINT_EMBEDDING", "config_field": "azure_openai_endpoint_embedding"},
    "endpoint_stt": {"env_var": "AZURE_OPENAI_ENDPOINT_STT", "config_field": "azure_openai_endpoint_stt"},
    "endpoint_tts": {"env_var": "AZURE_OPENAI_ENDPOINT_TTS", "config_field": "azure_openai_endpoint_tts"},
}
```

OPENAI_COMPATIBLE_CONFIG

Generic and mode-specific configuration for OpenAI-compatible providers:

```python
OPENAI_COMPATIBLE_CONFIG = {
    # Generic
    "api_key": {"env_var": "OPENAI_COMPATIBLE_API_KEY", "config_field": "openai_compatible_api_key"},
    "base_url": {"env_var": "OPENAI_COMPATIBLE_BASE_URL", "config_field": "openai_compatible_base_url"},
    # Mode-specific: LLM, Embedding, STT, TTS
    "api_key_llm": {"env_var": "OPENAI_COMPATIBLE_API_KEY_LLM", "config_field": "openai_compatible_api_key_llm"},
    "base_url_llm": {"env_var": "OPENAI_COMPATIBLE_BASE_URL_LLM", "config_field": "openai_compatible_base_url_llm"},
    # ... similar for embedding, stt, tts
}
```

Internal Helper Functions

  • _provision_simple_provider(provider): Sets single env var for simple providers
  • _provision_vertex(): Sets all Vertex AI env vars
  • _provision_azure(): Sets all Azure OpenAI env vars (handles SecretStr)
  • _provision_openai_compatible(): Sets all OpenAI-compatible env vars
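A hypothetical sketch of the simple-provider path, based on the PROVIDER_CONFIG entry shape shown above (not the module's actual private helper):

```python
import os

def provision_simple_provider(provider: str, config_map: dict,
                              db_values: dict) -> bool:
    """Copy a provider's DB-stored value into its env var; True if set."""
    entry = config_map.get(provider)
    if entry is None:
        return False
    value = db_values.get(entry["config_field"])
    if not value:
        return False
    os.environ[entry["env_var"]] = value
    return True
```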

Integration with ModelManager

The credential system integrates with model provisioning in two ways, plus a direct path used by the connection tester:

  1. Credential-linked models (preferred): Model has credential field pointing to a Credential record. ModelManager.get_model() calls credential.to_esperanto_config() and passes config directly to Esperanto's AIFactory.create_* methods
  2. Env var fallback: If model has no linked credential, provision_provider_keys(provider) sets env vars from DB credentials; Esperanto reads from env vars
  3. ConnectionTester loads Credential directly via Credential.get(config_id) for testing

The credential-linked approach is preferred as it allows multiple credentials per provider and avoids env var mutation.
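The two provisioning paths can be condensed into a single dispatch sketch (illustrative names only; the real logic lives inside ModelManager.get_model()):

```python
# Sketch of the credential-linked vs. env-var provisioning paths.
def esperanto_config_for(model: dict, credentials: dict, provision_env) -> dict:
    cred_id = model.get("credential")
    if cred_id:
        # Path 1: credential-linked config passed directly to AIFactory
        return credentials[cred_id]
    # Path 2: push DB-stored keys into env vars; Esperanto reads os.environ
    provision_env(model["provider"])
    return {}
```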