# OpenAI-Compatible Providers
Use Open Notebook with any server that implements the OpenAI API format. This includes LM Studio, Text Generation WebUI, vLLM, and many others.
---
## What is OpenAI-Compatible?
Many AI tools implement the same API format as OpenAI:
```
POST /v1/chat/completions
POST /v1/embeddings
POST /v1/audio/speech
```
Open Notebook can connect to any server using this format.
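A quick way to check whether a server speaks this format is the standard model-listing endpoint; a compatible server answers with a JSON object containing a `data` array (the port below is LM Studio's default, so substitute your own):
```bash
# A compatible server answers GET /v1/models with {"object": "list", "data": [...]}
curl http://localhost:1234/v1/models
```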
---
## Common Compatible Servers
| Server | Use Case | URL |
|--------|----------|-----|
| **LM Studio** | Desktop GUI for local models | https://lmstudio.ai |
| **Text Generation WebUI** | Full-featured local inference | https://github.com/oobabooga/text-generation-webui |
| **vLLM** | High-performance serving | https://github.com/vllm-project/vllm |
| **Ollama** | Simple local models | (Use native Ollama provider instead) |
| **LocalAI** | Local AI inference | https://github.com/mudler/LocalAI |
| **llama.cpp server** | Lightweight inference | https://github.com/ggerganov/llama.cpp |
---
## Quick Setup: LM Studio
### Step 1: Install and Start LM Studio
1. Download from https://lmstudio.ai
2. Install and launch
3. Download a model (e.g., Llama 3)
4. Start the local server (default: port 1234)
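If you prefer the terminal, recent LM Studio builds also ship an `lms` CLI that can start the server headlessly (a minimal sketch, assuming the CLI is installed):
```bash
# Start LM Studio's local server from the command line
lms server start
# Confirm it is up and list the available models
curl http://localhost:1234/v1/models
```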
### Step 2: Configure in Settings UI (Recommended)
1. Go to **Settings** → **API Keys**
2. Click **Add Credential** → Select **OpenAI-Compatible**
3. Enter base URL: `http://host.docker.internal:1234/v1` (Docker) or `http://localhost:1234/v1` (local)
4. API key: `lm-studio` (placeholder, LM Studio doesn't require one)
5. Click **Save**, then **Test Connection**
**Legacy (Deprecated) — Environment variables:**
```bash
export OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
export OPENAI_COMPATIBLE_API_KEY=not-needed
```
### Step 3: Discover and Register Models
1. Go to **Settings** → **API Keys**
2. On your **OpenAI-Compatible** credential, click **Discover Models**
3. Select the model you loaded in LM Studio (e.g., Llama 3), or add it as a custom model using the exact name the server reports
4. Register the model with a display name such as `LM Studio - Llama 3`
---
## Configuration via Settings UI
The recommended way to configure OpenAI-compatible providers is through the Settings UI:
1. Go to **Settings** → **API Keys**
2. Click **Add Credential** → Select **OpenAI-Compatible**
3. Enter your base URL and API key (if needed)
4. Optionally configure per-service URLs for LLM, Embedding, TTS, and STT
5. Click **Save**, then **Test Connection**
## Legacy: Environment Variables (Deprecated)
> **Deprecated**: These environment variables remain only for backward compatibility. Use the Settings UI instead.
### Language Models (Chat)
```bash
OPENAI_COMPATIBLE_BASE_URL=http://localhost:1234/v1
OPENAI_COMPATIBLE_API_KEY=optional-api-key
```
### Embeddings
```bash
OPENAI_COMPATIBLE_BASE_URL_EMBEDDING=http://localhost:1234/v1
OPENAI_COMPATIBLE_API_KEY_EMBEDDING=optional-api-key
```
### Text-to-Speech
```bash
OPENAI_COMPATIBLE_BASE_URL_TTS=http://localhost:8969/v1
OPENAI_COMPATIBLE_API_KEY_TTS=optional-api-key
```
### Speech-to-Text
```bash
OPENAI_COMPATIBLE_BASE_URL_STT=http://localhost:9000/v1
OPENAI_COMPATIBLE_API_KEY_STT=optional-api-key
```
---
## Docker Networking
When Open Notebook runs in Docker and your compatible server runs on the host, use the appropriate base URL when adding your credential in **Settings → API Keys**:
### macOS / Windows
**Base URL:** `http://host.docker.internal:1234/v1`
### Linux
**Option 1 (Docker bridge IP):** use base URL `http://172.17.0.1:1234/v1`
**Option 2 (host networking):** start the container with `docker run --network host ...`, then use base URL `http://localhost:1234/v1`
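Note that on Linux the server must listen on `0.0.0.0` (not just `127.0.0.1`) for the bridge IP to be reachable. You can verify from inside the container:
```bash
# Check the host-side server is reachable via the Docker bridge IP
docker exec -it open-notebook curl http://172.17.0.1:1234/v1/models
```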
### Same Docker Network
```yaml
# docker-compose.yml
services:
  open-notebook:
    # ...
  lm-studio:
    # your LM Studio container
    ports:
      - "1234:1234"
```
**Base URL in Settings → API Keys:** `http://lm-studio:1234/v1`
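To confirm service-name DNS resolution before saving the credential, curl from the Open Notebook container (a sketch, assuming the service names in the compose file above):
```bash
# Containers on the same compose network resolve each other by service name
docker compose exec open-notebook curl http://lm-studio:1234/v1/models
```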
---
## Text Generation WebUI Setup
### Start with API Enabled
```bash
python server.py --api --listen
```
### Configure Open Notebook
In **Settings → API Keys**, add an **OpenAI-Compatible** credential with base URL: `http://localhost:5000/v1`
### Docker Compose Example
```yaml
services:
  text-gen:
    image: atinoda/text-generation-webui:default
    ports:
      - "5000:5000"
      - "7860:7860"
    volumes:
      - ./models:/app/models
    command: --api --listen
  open-notebook:
    image: lfnovo/open_notebook:v1-latest-single
    pull_policy: always
    depends_on:
      - text-gen
```
Then in **Settings → API Keys**, add an **OpenAI-Compatible** credential with base URL: `http://text-gen:5000/v1`
---
## vLLM Setup
### Start vLLM Server
```bash
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --port 8000
```
### Configure Open Notebook
In **Settings → API Keys**, add an **OpenAI-Compatible** credential with base URL: `http://localhost:8000/v1`
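vLLM serves the model under its HuggingFace path, so a quick smoke test looks like this (the model name must match the `--model` flag used at startup):
```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```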
### Docker Compose with GPU
```yaml
services:
  vllm:
    image: vllm/vllm-openai:latest
    command: --model meta-llama/Llama-3.1-8B-Instruct
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
  open-notebook:
    image: lfnovo/open_notebook:v1-latest-single
    pull_policy: always
    depends_on:
      - vllm
```
Then in **Settings → API Keys**, add an **OpenAI-Compatible** credential with base URL: `http://vllm:8000/v1`
---
## Adding Models in Open Notebook
### Via Settings UI
1. Go to **Settings** → **API Keys**
2. Open your **OpenAI-Compatible** credential and click **Discover Models**
3. Select a discovered model, or add a **custom model** named exactly as the server expects (see the snippet below)
4. Enter a **Display Name**: your preferred name
5. Register the model
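To find the exact identifier a server expects, list its models directly (a sketch, assuming `jq` is installed; the port shown is LM Studio's default):
```bash
# Print just the model IDs the server exposes
curl -s http://localhost:1234/v1/models | jq -r '.data[].id'
```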
### Model Name Format
The model name must match what your server expects:
| Server | Model Name Format |
|--------|-------------------|
| LM Studio | As shown in LM Studio UI |
| vLLM | HuggingFace model path |
| Text Gen WebUI | As loaded in UI |
| llama.cpp | Model file name |
---
## Testing Connection
### Test API Endpoint
```bash
# Test chat completions
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
### Test from Inside Docker
```bash
docker exec -it open-notebook curl http://host.docker.internal:1234/v1/models
```
---
## Troubleshooting
### Connection Refused
```
Problem: Cannot connect to server
Solutions:
1. Verify server is running
2. Check port is correct
3. Test with curl directly
4. Check Docker networking (use host.docker.internal)
5. Verify firewall allows connection
```
### Model Not Found
```
Problem: Server returns "model not found"
Solutions:
1. Check model is loaded in server
2. Verify exact model name spelling
3. List available models: curl http://localhost:1234/v1/models
4. Update model name in Open Notebook
```
### Slow Responses
```
Problem: Requests take very long
Solutions:
1. Check server resources (RAM, GPU)
2. Use smaller/quantized model
3. Reduce context length
4. Enable GPU acceleration if available
```
### Authentication Errors
```
Problem: 401 or authentication failed
Solutions:
1. Check if server requires API key
2. Set the API key in your credential (Settings → API Keys)
3. Some servers need any non-empty key (use a placeholder like "not-needed")
```
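OpenAI-style servers read the key from the `Authorization: Bearer` header, so you can test directly whether a placeholder is accepted:
```bash
# If this succeeds, the server does not validate keys and any placeholder works
curl http://localhost:1234/v1/models -H "Authorization: Bearer not-needed"
```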
### Timeout Errors
```
Problem: Request times out
Solutions:
1. Model may be loading (first request slow)
2. Increase timeout settings
3. Check server logs for errors
4. Reduce request size
```
---
## Multiple Compatible Endpoints
You can use different compatible servers for different purposes. When adding an **OpenAI-Compatible** credential in **Settings → API Keys**, you can configure per-service URLs:
- **LLM URL**: e.g., `http://localhost:1234/v1` (LM Studio)
- **Embedding URL**: e.g., `http://localhost:8080/v1` (different server)
- **TTS URL**: e.g., `http://localhost:8969/v1` (Speaches)
- **STT URL**: e.g., `http://localhost:9000/v1` (Speaches)
Alternatively, add each as a separate credential with its own base URL.
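Before saving, it is worth confirming each endpoint responds. Most compatible servers, including Speaches, expose `/v1/models`, so a quick check using the example ports above:
```bash
curl http://localhost:1234/v1/models   # LLM (LM Studio)
curl http://localhost:8080/v1/models   # Embedding server
curl http://localhost:8969/v1/models   # TTS (Speaches)
curl http://localhost:9000/v1/models   # STT (Speaches)
```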
---
## Performance Tips
### Model Selection
| Model Size | RAM Needed | Speed |
|------------|------------|-------|
| 7B | 8GB | Fast |
| 13B | 16GB | Medium |
| 70B | 64GB+ | Slow |
### Quantization
Use quantized models (Q4, Q5) for faster inference with less RAM:
```
llama-3-8b-q4_k_m.gguf → ~4GB RAM, fast
llama-3-8b-f16.gguf → ~16GB RAM, slower
```
### GPU Acceleration
Enable GPU in your server for much faster inference:
- LM Studio: Settings → GPU layers
- vLLM: Automatic with CUDA
- llama.cpp: `--n-gpu-layers 35`
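For example, llama.cpp's bundled OpenAI-compatible server can offload layers at startup (binary and flag names per recent llama.cpp builds; adjust for your version):
```bash
# Serve a quantized model with 35 layers offloaded to the GPU
./llama-server -m llama-3-8b-q4_k_m.gguf --n-gpu-layers 35 --port 8080
```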
---
## Comparison: Native vs Compatible
| Aspect | Native Provider | OpenAI Compatible |
|--------|-----------------|-------------------|
| **Setup** | API key only | Server + configuration |
| **Models** | Provider's models | Any compatible model |
| **Cost** | Pay per token | Free (local) |
| **Speed** | Usually fast | Depends on hardware |
| **Features** | Full support | Basic features |
Use an OpenAI-compatible provider when:
- You're running local models
- You're using custom or fine-tuned models
- You have privacy requirements
- You need to control costs
---
## Related
- **[Local TTS Setup](local-tts.md)** - Text-to-speech with Speaches
- **[Local STT Setup](local-stt.md)** - Speech-to-text with Speaches
- **[AI Providers](ai-providers.md)** - All provider options
- **[Ollama Setup](ollama.md)** - Native Ollama integration