# OpenAI-Compatible Providers Setup Guide

Open Notebook supports OpenAI-compatible API endpoints across all AI modalities (language models, embeddings, speech-to-text, and text-to-speech), giving you the flexibility to use popular tools like LM Studio, Text Generation WebUI, vLLM, and custom inference servers.

## Why Choose OpenAI-Compatible Providers?

- **🆓 Cost Flexibility**: Use free local inference or choose cost-effective cloud providers
- **🔒 Privacy Control**: Run models locally or choose privacy-focused hosted services
- **🎯 Model Selection**: Access to thousands of open-source models
- **⚡ Performance Tuning**: Optimize inference for your specific hardware
- **🔧 Full Control**: Deploy on your infrastructure with your configurations
- **🌐 Universal Standard**: Works with any service that implements the OpenAI API specification

## Quick Start

### Basic Setup (All Modalities)

**For LM Studio** (simplest):
```bash
# Start LM Studio and enable server mode on port 1234
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1

# Most LM Studio endpoints don't require an API key
# export OPENAI_COMPATIBLE_API_KEY=not_needed
```

**For Text Generation WebUI**:
```bash
# Start with --api flag
# python server.py --api --listen

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:5000/v1
```

**For vLLM**:
```bash
# Start vLLM server
# vllm serve MODEL_NAME --port 8000

export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000/v1
```

### Advanced Setup (Mode-Specific Endpoints)

Use different endpoints for different capabilities:

```bash
# Language models on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

# Embeddings on a dedicated embedding server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1

# Speech services on a different server
export OPENAI_COMPATIBLE_BASE_URL_STT=http://localhost:9000/v1
export OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
```

> **🎙️ Want free, local text-to-speech?** Check our [Local TTS Setup Guide](local_tts.md) for completely private, zero-cost podcast generation!

## Environment Variable Reference

### Generic Configuration

Use these when you want the same endpoint for all modalities:

| Variable | Purpose | Required |
|----------|---------|----------|
| `OPENAI_COMPATIBLE_BASE_URL` | Base URL for all AI services | Yes (unless using mode-specific) |
| `OPENAI_COMPATIBLE_API_KEY` | API key if endpoint requires auth | Optional |

**Example:**
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
export OPENAI_COMPATIBLE_API_KEY=your_key_here  # If needed
```

### Mode-Specific Configuration

Use these when you want different endpoints for different capabilities:

| Variable | Purpose | Modality |
|----------|---------|----------|
| `OPENAI_COMPATIBLE_BASE_URL_LLM` | Language model endpoint | Language models |
| `OPENAI_COMPATIBLE_API_KEY_LLM` | API key for LLM endpoint | Language models |
| `OPENAI_COMPATIBLE_BASE_URL_EMBEDDING` | Embedding model endpoint | Embeddings |
| `OPENAI_COMPATIBLE_API_KEY_EMBEDDING` | API key for embedding endpoint | Embeddings |
| `OPENAI_COMPATIBLE_BASE_URL_STT` | Speech-to-text endpoint | Speech-to-Text |
| `OPENAI_COMPATIBLE_API_KEY_STT` | API key for STT endpoint | Speech-to-Text |
| `OPENAI_COMPATIBLE_BASE_URL_TTS` | Text-to-speech endpoint | Text-to-Speech |
| `OPENAI_COMPATIBLE_API_KEY_TTS` | API key for TTS endpoint | Text-to-Speech |

**Precedence**: A mode-specific variable overrides the generic `OPENAI_COMPATIBLE_BASE_URL` for its modality. Modalities without a mode-specific variable use the generic value.

**Example:**
```bash
# LLM on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

# Embeddings on dedicated server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=secret_key_here
```
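
The precedence rule can be sketched in shell. This is a minimal illustration, not Open Notebook's actual resolution code: `resolve_base_url` is a hypothetical helper, and the fallback to the generic variable is assumed from the rule stated above.

```bash
# Hypothetical helper (not part of Open Notebook): print the base URL that
# would apply to one modality, preferring the mode-specific variable.
# Usage: resolve_base_url LLM|EMBEDDING|STT|TTS
resolve_base_url() {
  local mode="$1" specific
  case "$mode" in
    LLM|EMBEDDING|STT|TTS) ;;
    *) echo "unknown modality: $mode" >&2; return 2 ;;
  esac
  # Read OPENAI_COMPATIBLE_BASE_URL_<MODE> indirectly (works in POSIX sh).
  eval "specific=\${OPENAI_COMPATIBLE_BASE_URL_${mode}:-}"
  if [ -n "$specific" ]; then
    printf '%s\n' "$specific"                      # mode-specific wins
  elif [ -n "${OPENAI_COMPATIBLE_BASE_URL:-}" ]; then
    printf '%s\n' "$OPENAI_COMPATIBLE_BASE_URL"    # generic fallback
  else
    return 1                                       # nothing configured
  fi
}
```

Running `resolve_base_url EMBEDDING` in the shell where you exported the variables shows which endpoint the embedding modality would be expected to use.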

## Common Use Cases

### LM Studio

**What is LM Studio?**
LM Studio is a desktop application for running large language models locally with a user-friendly interface.

**Setup Steps:**
1. **Download and install** LM Studio from [lmstudio.ai](https://lmstudio.ai/)
2. **Download a model** (e.g., Llama 3, Qwen, Mistral)
3. **Start the local server**:
   - Go to the "Local Server" tab
   - Click "Start Server"
   - Note the port (default: 1234)

4. **Configure Open Notebook**:
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
```

**What works:**
- ✅ Language models (chat, completions)
- ✅ Embeddings (with embedding models)
- ❌ Speech-to-text (not supported)
- ❌ Text-to-speech (not supported)

**Tips:**
- LM Studio doesn't require an API key
- Choose quantized models (Q4, Q5) to reduce memory use and speed up inference, at some cost in output quality
- Monitor RAM usage - larger models need more memory

---

### Text Generation WebUI (Oobabooga)

**What is Text Generation WebUI?**
A powerful Gradio-based web interface for running Large Language Models.

**Setup Steps:**
1. **Install** following [official instructions](https://github.com/oobabooga/text-generation-webui)
2. **Download a model** using the UI or manually
3. **Start with API mode**:
```bash
python server.py --api --listen
```

4. **Configure Open Notebook**:
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:5000/v1
```

**What works:**
- ✅ Language models (excellent support)
- ✅ Embeddings (with compatible models)
- ❌ Speech services (not supported)

**Tips:**
- Use `--listen` to accept connections from Docker
- Supports more model formats than LM Studio
- Great for fine-tuned models

---

### vLLM

**What is vLLM?**
High-performance inference server optimized for serving large language models at scale.

**Setup Steps:**
1. **Install vLLM**:
```bash
pip install vllm
```

2. **Start the server**:
```bash
vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000
```

3. **Configure Open Notebook**:
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000/v1
```

**What works:**
- ✅ Language models (optimized inference)
- ✅ Embeddings (with embedding models)
- ❌ Speech services (not supported)

**Tips:**
- Well suited to production deployments that need high throughput
- Supports tensor parallelism across multiple GPUs (`--tensor-parallel-size`) for large models

---

### Custom OpenAI-Compatible Services

Many services implement the OpenAI API specification:

**Examples:**
- **Together AI**: Cloud-hosted models
- **Anyscale Endpoints**: Ray-based inference
- **Replicate**: Cloud model hosting
- **LocalAI**: Self-hosted alternative to OpenAI
- **FastChat**: Multi-model serving

**Configuration:**
```bash
# Generic setup
export OPENAI_COMPATIBLE_BASE_URL=https://api.your-service.com/v1
export OPENAI_COMPATIBLE_API_KEY=your_api_key_here
```

## Configuration Scenarios

### Scenario 1: Single Local Endpoint (Simplest)

**Use Case**: One local server, such as LM Studio, serves every modality it supports

```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
```

**Result**:
- ✅ Language models available
- ✅ Embeddings available (if the server has an embedding model loaded)
- ⚠️ Speech requests go to the same endpoint and work only if the server implements them (LM Studio does not)
- All modalities use the same endpoint

---

### Scenario 2: Separate Endpoints per Modality

**Use Case**: Language models on LM Studio, embeddings on dedicated server

```bash
# Language models on LM Studio
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

# Embeddings on specialized server
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:8080/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=embedding_key_here
```

**Result**:
- ✅ Language models use LM Studio (port 1234)
- ✅ Embeddings use specialized server (port 8080)
- ❌ Speech services not available (not configured)

---

### Scenario 3: Mixed Local and Cloud

**Use Case**: Local models for privacy, cloud for specialized tasks

```bash
# Local LLM (privacy-sensitive work)
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

# Cloud embeddings (better quality)
export OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=https://api.cloud-provider.com/v1
export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=cloud_key_here

# Cloud text-to-speech
export OPENAI_COMPATIBLE_BASE_URL_TTS=https://api.cloud-provider.com/v1
export OPENAI_COMPATIBLE_API_KEY_TTS=cloud_key_here
```

**Result**:
- ✅ Sensitive chat stays local
- ✅ High-quality embeddings from cloud
- ✅ Professional TTS from cloud
- 🔒 Privacy for conversations, cloud for non-sensitive features

---

### Scenario 4: Docker Deployment

**Use Case**: Open Notebook in Docker, LM Studio on host machine

**On macOS/Windows**:
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1
```

**On Linux**:
```bash
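# Use the docker0 bridge address (172.17.0.1 by default; confirm with `ip addr show docker0`)
export OPENAI_COMPATIBLE_BASE_URL=http://172.17.0.1:1234/v1
# Alternatively, run the container with --network host and keep http://localhost:1234/v1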
```

**Important**:
- LM Studio must be set to listen on `0.0.0.0`, not just `localhost`
- In LM Studio settings, enable "Allow network connections"
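
The platform split above can be captured in a small shell helper. This is a sketch under assumptions: `docker_host_for` is a hypothetical name, it expects the output of `uname -s` run on the host (not inside the container), and `172.17.0.1` is only Docker's default bridge address, so confirm it on your machine.

```bash
# Hypothetical helper: choose the hostname a container should use to reach
# a server running on the Docker host. Pass the host's `uname -s` output.
docker_host_for() {
  case "$1" in
    Linux) printf '%s\n' "172.17.0.1" ;;            # default docker0 bridge address
    *)     printf '%s\n' "host.docker.internal" ;;  # Docker Desktop (macOS/Windows)
  esac
}

# Example, run on the host before starting the container:
#   export OPENAI_COMPATIBLE_BASE_URL="http://$(docker_host_for "$(uname -s)"):1234/v1"
```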

## Network Configuration

### Docker Networking

**Problem**: Inside a container, `localhost` refers to the container itself, so it cannot reach a server running on the host

**Solutions:**

**Option 1: Use `host.docker.internal` (Mac/Windows)**
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1
```

**Option 2: Use host IP address (Linux)**
```bash
# Find host IP
ip addr show docker0 | grep inet

# Use in environment
export OPENAI_COMPATIBLE_BASE_URL=http://172.17.0.1:1234/v1
```
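
To avoid copying the address by hand, the IPv4 address can be pulled out of the `ip` output with `awk`. This is a minimal sketch; `extract_inet_ip` is a hypothetical helper name, and it assumes the usual `inet <address>/<prefix>` line format.

```bash
# Print the first IPv4 address found in `ip addr show` output read from stdin.
extract_inet_ip() {
  # Match lines whose first field is exactly "inet" (this skips "inet6"),
  # then drop the "/16"-style prefix length from the second field.
  awk '$1 == "inet" { split($2, parts, "/"); print parts[1]; exit }'
}

# Example, run on the Docker host:
#   host_ip="$(ip -4 addr show docker0 | extract_inet_ip)"
#   export OPENAI_COMPATIBLE_BASE_URL="http://${host_ip}:1234/v1"
```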

**Option 3: Host networking (Linux only)**
```bash
docker run --network host \
  -v ./notebook_data:/app/data \
  -e OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1 \
  lfnovo/open_notebook:v1-latest-single
```

### Remote Servers

**Use Case**: OpenAI-compatible service on a different machine

```bash
# Replace with your server's IP or hostname
export OPENAI_COMPATIBLE_BASE_URL=http://192.168.1.100:1234/v1
```

**Security Notes:**
- ⚠️ Only use on trusted networks
- Consider using HTTPS for production
- Implement API key authentication if possible
- Use firewall rules to restrict access
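
As a quick guard for the HTTPS note, a shell check can flag base URLs that send traffic over plain HTTP to a non-local host. This is an illustrative sketch: `is_insecure_remote` is a hypothetical helper, the list of hosts treated as local is an assumption, and IPv6 literals are not handled.

```bash
# Hypothetical helper: succeed (exit 0) when the URL uses plain HTTP
# and points at a host other than the local machine.
is_insecure_remote() {
  local url="$1" host
  case "$url" in
    http://*) ;;             # plain HTTP: keep checking the host
    *) return 1 ;;           # https:// or anything else is not flagged
  esac
  host="${url#http://}"      # strip the scheme
  host="${host%%/*}"         # strip the path
  host="${host%%:*}"         # strip the port
  case "$host" in
    localhost|127.0.0.1|host.docker.internal) return 1 ;;
    *) return 0 ;;
  esac
}

# Example:
#   if is_insecure_remote "$OPENAI_COMPATIBLE_BASE_URL"; then
#     echo "Warning: unencrypted connection to a remote endpoint" >&2
#   fi
```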

### Port Conflicts

**Problem**: Default port (1234) is already in use

**Solution**: Change the port in your inference server

**LM Studio:**
- Settings → Local Server → Port, then enter an unused port (for example, 8888)

**Then update environment:**
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:8888/v1
```

## Troubleshooting

### Connection Refused

**Symptom**: "Connection refused" or "Could not connect to endpoint"

**Solutions:**
1. **Verify server is running**:
   ```bash
   curl http://localhost:1234/v1/models
   ```

2. **Check firewall settings**: Ensure the port is not blocked

3. **For Docker**: Use `host.docker.internal` (macOS/Windows) or the host IP (Linux) instead of `localhost`

4. **Check server binding**: When connecting from Docker or another machine, the server must listen on `0.0.0.0`, not just `127.0.0.1`
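
The first check can be wrapped in a reusable shell function. This is a minimal sketch with hypothetical helper names (`models_url`, `check_endpoint`); it assumes `curl` is installed and that the server exposes the standard `/models` route under its base URL.

```bash
# Build the model-listing URL from a base URL such as http://localhost:1234/v1
models_url() {
  # Strip one trailing slash so ".../v1/" and ".../v1" both work.
  printf '%s/models\n' "${1%/}"
}

# Report whether the endpoint answers within 5 seconds.
check_endpoint() {
  # -f: fail on HTTP errors, -sS: quiet but still print errors.
  if curl -fsS --max-time 5 "$(models_url "$1")" >/dev/null; then
    echo "OK: $1"
  else
    echo "UNREACHABLE: $1" >&2
    return 1
  fi
}

# Example:
#   check_endpoint "$OPENAI_COMPATIBLE_BASE_URL"
```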

---

### Models Not Found

**Symptom**: "Model not found" or "No models available"

**Solutions:**
1. **Verify model is loaded** in your inference server
2. **Check model name**: The name configured in Open Notebook must match an `id` returned by the server's `/v1/models` endpoint
3. **For LM Studio**: Ensure model is loaded in the local server tab
4. **Test endpoint**:
   ```bash
   curl http://localhost:1234/v1/models
   ```
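
To compare names quickly, the model IDs can be extracted from the `/v1/models` response. The sketch below uses only `grep` and `sed` so it runs without extra tools; `list_model_ids` is a hypothetical helper, and it assumes the standard OpenAI list format in which each model object carries an `"id"` string. A JSON-aware tool such as `jq` is more robust if you have it installed.

```bash
# Print one model ID per line from a /v1/models JSON response on stdin.
list_model_ids() {
  # Pull out each "id": "..." pair, then keep only the quoted value.
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Example:
#   curl -s http://localhost:1234/v1/models | list_model_ids
```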

---

### Slow Performance

**Symptom**: Responses take a long time

**Solutions:**
1. **Use quantized models** (Q4, Q5 instead of full precision)
2. **Check RAM usage**: Model might be swapping to disk
3. **Reduce context length**: Smaller context = faster inference
4. **Enable GPU acceleration**: If available
5. **For vLLM**: Enable tensor parallelism for large models

---

### Authentication Errors

**Symptom**: "Unauthorized" or "Invalid API key"

**Solutions:**
1. **Set API key** if your endpoint requires it:
   ```bash
   export OPENAI_COMPATIBLE_API_KEY=your_key_here
   ```

2. **Check key validity**: Test with curl:
   ```bash
   curl -H "Authorization: Bearer YOUR_KEY" \
     http://localhost:1234/v1/models
   ```

3. **For mode-specific**: Use the correct key variable:
   ```bash
   export OPENAI_COMPATIBLE_API_KEY_LLM=llm_key
   export OPENAI_COMPATIBLE_API_KEY_EMBEDDING=embedding_key
   ```
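
When the same test has to work for endpoints with and without authentication, the header can be built conditionally. This is a small sketch; `auth_header` is a hypothetical helper, and the usage example simply reuses the variable names from this guide.

```bash
# Print a Bearer authorization header for the given key,
# or nothing at all when the key is empty or missing.
auth_header() {
  if [ -n "${1:-}" ]; then
    printf 'Authorization: Bearer %s' "$1"
  fi
}

# Example: pass -H only when a key is set.
#   header="$(auth_header "${OPENAI_COMPATIBLE_API_KEY_LLM:-}")"
#   curl ${header:+-H "$header"} http://localhost:1234/v1/models
```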

---

### Docker Can't Reach Host

**Symptom**: Connection works locally but not from Docker

**Solutions:**
1. **Use `host.docker.internal`** (Mac/Windows):
   ```bash
   export OPENAI_COMPATIBLE_BASE_URL=http://host.docker.internal:1234/v1
   ```

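2. **On Linux**: Use the host IP or `--network host`, or start the container with `--add-host=host.docker.internal:host-gateway` (Docker 20.10+) so that `host.docker.internal` resolves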

3. **Check server listening**: Must listen on `0.0.0.0:1234`, not `127.0.0.1:1234`

4. **Test from inside container**:
   ```bash
   docker exec -it open-notebook curl http://host.docker.internal:1234/v1/models
   ```

---

### Embeddings Not Working

**Symptom**: Search or embeddings fail

**Solutions:**
1. **Verify embedding model is loaded**: Many inference servers need explicit embedding model setup
2. **Use dedicated embedding endpoint**: If available
3. **Check model compatibility**: Not all models support embeddings
4. **For LM Studio**: Load an embedding model separately

---

### Mixed Results (Some Modes Work, Others Don't)

**Symptom**: Language models work, but embeddings or speech don't

**Solution**: Use mode-specific configuration:
```bash
# What works
export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

# For embeddings, use a different provider
export OPENAI_API_KEY=your_openai_key  # Fallback to OpenAI for embeddings
```

## Best Practices

### Security

1. **API Keys**:
   - Use environment variables, never hardcode
   - Rotate keys regularly for cloud services
   - Use different keys for different services

2. **Network**:
   - Only expose on trusted networks
   - Use HTTPS in production
   - Implement firewall rules

3. **Data Privacy**:
   - Use local models for sensitive data
   - Check service privacy policies
   - Understand data retention policies

### Performance

1. **Model Selection**:
   - Quantized models (Q4, Q5) for better speed/memory trade-off
   - Smaller models for simple tasks
   - Larger models only when needed

2. **Resource Management**:
   - Monitor RAM and GPU usage
   - Use appropriate batch sizes
   - Consider model caching strategies

3. **Network**:
   - Use local endpoints when possible for lower latency
   - For cloud: Choose geographically close servers

### Reliability

1. **Fallback Strategy**:
   ```bash
   # Primary: Local LLM
   export OPENAI_COMPATIBLE_BASE_URL_LLM=http://localhost:1234/v1

   # Fallback: Use OpenAI if local is unavailable
   export OPENAI_API_KEY=your_backup_key
   ```

2. **Health Checks**:
   - Periodically test endpoints
   - Monitor server status
   - Set up alerts for downtime

3. **Testing**:
   - Test configuration before production
   - Validate all required modalities work
   - Check error handling
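
A periodic health check usually needs to tolerate a server that is still loading a model. One way to sketch this in shell is a generic retry wrapper; `retry` is a hypothetical helper, and the attempt count and delay shown in the example are arbitrary.

```bash
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0               # command succeeded
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"         # wait before the next attempt
    fi
    i=$((i + 1))
  done
  return 1                   # every attempt failed
}

# Example: wait up to about 10 seconds for the endpoint to come up.
#   retry 5 2 curl -fsS --max-time 5 http://localhost:1234/v1/models >/dev/null
```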

## Related Guides

**OpenAI-Compatible Setups:**
- **[Local TTS Setup](local_tts.md)** - Free, private text-to-speech for podcasts
- **[Ollama Setup](ollama.md)** - Local language models and embeddings
- **[AI Models Guide](ai-models.md)** - Complete model configuration overview

## Getting Help

**Community Resources:**
- [Open Notebook Discord](https://discord.gg/37XJPXfz2w) - Get help with Open Notebook integration
- [LM Studio Discord](https://discord.gg/lmstudio) - LM Studio-specific support
- [Text Generation WebUI GitHub](https://github.com/oobabooga/text-generation-webui) - Issues and discussions

**Debugging Steps:**
1. **Test endpoint directly** with curl before configuring Open Notebook
2. **Check Open Notebook logs** for detailed error messages
3. **Verify environment variables** are set correctly
4. **Test with simple requests** first (list models, simple completion)

**Common curl tests:**
```bash
# List models
curl http://localhost:1234/v1/models

# Test completion
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Test embeddings
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embedding-model",
    "input": "Test text"
  }'
```

For general AI model configuration, see the [AI Models Guide](ai-models.md).