# Common Issues and Solutions
This document covers the most frequently encountered issues when installing, configuring, and using Open Notebook, along with their solutions.
## Installation Problems
### Port Already in Use
**Problem**: Error message "Port 8502 is already in use" or similar port conflicts.
**Symptoms**:
- Cannot start React frontend
- Error messages about address already in use
- Services failing to bind to ports
**Solutions**:
1. **Find and stop conflicting process**:
```bash
# Check what's using port 8502
lsof -i :8502
# Kill the process (replace PID with actual process ID)
kill -9 <PID>
```
2. **Use different ports**:
```bash
# For the React frontend (Next.js dev server)
cd frontend && npm run dev -- --port 8503
```
For Docker deployments, change the host port mapping in docker-compose.yml:
```yaml
ports:
  - "8503:8502" # host:container
```
3. **Common port conflicts**:
- Port 8502 (Next.js): Often used by other Next.js apps
- Port 5055 (API): May conflict with other web services
- Port 8000 (SurrealDB): May conflict with other databases
### Permission Denied (Docker)
**Problem**: Docker commands fail with permission denied errors.
**Symptoms**:
- "permission denied while trying to connect to the Docker daemon socket"
- Docker commands require sudo
**Solutions**:
1. **Add user to docker group (Linux)**:
```bash
sudo usermod -aG docker $USER
# Log out and log back in, or run:
newgrp docker
```
2. **Start Docker service (Linux)**:
```bash
sudo systemctl start docker
sudo systemctl enable docker
```
3. **Restart Docker Desktop (Windows/Mac)**:
- Close Docker Desktop completely
- Restart Docker Desktop
- Wait for it to fully start
### Python/uv Installation Issues
**Problem**: `uv` command not found or Python version conflicts.
**Symptoms**:
- "uv: command not found"
- Python version mismatch errors
- Virtual environment issues
**Solutions**:
1. **Install uv package manager**:
```bash
# macOS
brew install uv
# Linux/WSL
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc
# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
2. **Fix Python version issues**:
```bash
# Install specific Python version
uv python install 3.11
# Pin Python version for project
uv python pin 3.11
# Recreate virtual environment
uv sync --reinstall
```
3. **Clear uv cache**:
```bash
uv cache clean
```
### SurrealDB Connection Issues
**Problem**: Cannot connect to SurrealDB database.
**Symptoms**:
- "Connection refused" errors
- Database queries failing
- Timeout errors
**Solutions**:
1. **Check SurrealDB is running**:
```bash
# For Docker
docker compose ps surrealdb
# Check logs
docker compose logs surrealdb
```
2. **Verify connection settings**:
```bash
# Check environment variables
echo $SURREAL_URL
echo $SURREAL_USER
# Test connection
curl http://localhost:8000/health
```
3. **Restart SurrealDB**:
```bash
docker compose restart surrealdb
# Wait 10 seconds for startup
sleep 10
```
4. **Check file permissions**:
```bash
# Ensure data directory is writable
ls -la surreal_data/
# Fix permissions if needed
sudo chown -R $USER:$USER surreal_data/
```
## Runtime Errors
### AI Provider API Errors
**Problem**: Errors when using AI models (OpenAI, Anthropic, etc.).
**Symptoms**:
- "Invalid API key" errors
- "Rate limit exceeded" messages
- Model not found errors
**Solutions**:
1. **Verify API keys**:
```bash
# Check key format (don't expose full key)
echo $OPENAI_API_KEY | cut -c1-10
# Test OpenAI key
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
https://api.openai.com/v1/models
```
2. **Check billing and usage**:
- OpenAI: Visit https://platform.openai.com/account/billing
- Anthropic: Visit https://console.anthropic.com/account/billing
- Ensure you have sufficient credits
3. **Verify model availability**:
```bash
# Check model names in settings
# Use gpt-5-mini instead of gpt-4-mini
# Use claude-3-haiku-20240307 instead of claude-3-haiku
```
4. **Handle rate limits**:
- Wait before retrying
- Use lower-tier models for testing
- Check provider rate limits
### API Timeout Errors During Transformations
**Problem**: Timeout errors when running transformations or generating insights, even though the operation completes successfully.
**Symptoms**:
- "timeout of 30000ms exceeded" in React frontend
- "Failed to connect to API: timed out" in Streamlit UI
- Transformation completes after a few minutes, but error appears after 30-60 seconds
- Common with local models (Ollama), remote LM Studio, or slow hardware
**Solutions**:
1. **Increase API client timeout** (recommended):
```bash
# Add to your .env file
API_CLIENT_TIMEOUT=600 # 10 minutes (600 seconds)
```
This controls how long the frontend/UI waits for API responses. Default is 300 seconds (5 minutes).
2. **Adjust timeout based on your setup**:
```bash
# Fast cloud APIs (OpenAI, Anthropic, Groq)
API_CLIENT_TIMEOUT=300 # 5 minutes (default)
# Local Ollama on GPU
API_CLIENT_TIMEOUT=600 # 10 minutes
# Local Ollama on CPU or slow hardware
API_CLIENT_TIMEOUT=1200 # 20 minutes
# Remote LM Studio over slow network
API_CLIENT_TIMEOUT=900 # 15 minutes
```
3. **Increase LLM provider timeout if needed**:
```bash
# Add to your .env file if the model itself is timing out
ESPERANTO_LLM_TIMEOUT=180 # 3 minutes (default is 60s)
```
Only increase this if you see errors during actual model inference, not just HTTP timeouts.
4. **Use faster models for testing**:
- Test with cloud APIs first to verify setup
- Try smaller local models (e.g., `gemma2:2b` instead of `llama3:70b`)
- Preload models before running transformations: `ollama run model-name`
5. **Restart services after configuration changes**:
```bash
# For Docker
docker compose down
docker compose up -d
# For source installation
make stop-all
make start-all
```
**Important Notes**:
- `API_CLIENT_TIMEOUT` should be HIGHER than `ESPERANTO_LLM_TIMEOUT` for proper error handling
- If transformations complete successfully after refresh, you only need to increase `API_CLIENT_TIMEOUT`
- First time running a model may be slower due to model loading
**Related GitHub Issue**: [#131](https://github.com/lfnovo/open-notebook/issues/131)
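The first note above can be sanity-checked with a short shell snippet (the values shown are examples, not defaults):

```shell
# Example values; substitute the ones from your .env
API_CLIENT_TIMEOUT=600
ESPERANTO_LLM_TIMEOUT=180

# The client timeout should exceed the LLM timeout so the HTTP layer
# does not give up before the model itself has a chance to fail cleanly
if [ "$API_CLIENT_TIMEOUT" -gt "$ESPERANTO_LLM_TIMEOUT" ]; then
  echo "OK: API_CLIENT_TIMEOUT (${API_CLIENT_TIMEOUT}s) > ESPERANTO_LLM_TIMEOUT (${ESPERANTO_LLM_TIMEOUT}s)"
else
  echo "WARNING: increase API_CLIENT_TIMEOUT above ESPERANTO_LLM_TIMEOUT"
fi
```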
### Memory and Performance Issues
**Problem**: Application running slowly or crashing due to memory issues.
**Symptoms**:
- Slow response times
- Out of memory errors
- Application crashes
- High CPU usage
**Solutions**:
1. **Increase Docker memory**:
```bash
# Docker Desktop → Settings → Resources → Memory
# Increase to 4GB or more
```
2. **Monitor resource usage**:
```bash
# Check Docker stats
docker stats
# Check system resources
htop
top
```
3. **Optimize model usage**:
- Use smaller models (gpt-5-mini vs gpt-5)
- Reduce context window size
- Process fewer documents at once
4. **Clear application cache**:
```bash
# Clear Python cache
find . -name "__pycache__" -type d -exec rm -rf {} +
# Clear Streamlit cache
rm -rf ~/.streamlit/cache/
# Clear Next.js build cache
rm -rf frontend/.next
```
### Background Job Failures
**Problem**: Background tasks (podcast generation, transformations) failing.
**Symptoms**:
- Jobs stuck in "processing" state
- No podcast audio generated
- Transformations not completing
**Solutions**:
1. **Check worker status**:
```bash
# Check if worker is running
pgrep -f "surreal-commands-worker"
# Restart worker
make worker-restart
```
2. **Check job logs**:
```bash
# View worker logs
docker compose logs worker
# Check command status in database
# (Access through UI or API)
```
3. **Verify AI provider configuration**:
- Ensure TTS/STT models are configured
- Check API keys for required providers
- Test models individually
4. **Clear stuck jobs**:
```bash
# Restart all services
make stop-all
make start-all
```
### File Upload Issues
**Problem**: Cannot upload files or file processing fails.
**Symptoms**:
- Upload button not working
- File processing errors
- Unsupported file type messages
**Solutions**:
1. **Check file size limits**:
```bash
# Default upload limit is 200MB
# Large files may time out during processing
```
2. **Verify file types**:
- PDF: Standard PDF files (not password protected)
- Images: PNG, JPG, GIF, WebP
- Audio: MP3, WAV, M4A
- Video: MP4, AVI, MOV (for transcript extraction)
- Documents: TXT, DOC, DOCX
3. **Check file permissions**:
```bash
# Ensure files are readable
ls -la /path/to/file
# Fix permissions
chmod 644 /path/to/file
```
4. **Test with smaller files**:
- Try with a simple text file first
- Gradually increase complexity
## Performance Issues
### Slow Search and Chat
**Problem**: Search and chat responses are very slow.
**Symptoms**:
- Long wait times for responses
- Timeout errors
- Poor user experience
**Solutions**:
1. **Optimize embedding model**:
- Use faster embedding models
- Reduce embedding dimensions
- Process fewer documents at once
2. **Database optimization**:
```bash
# Check database performance
docker compose logs surrealdb
# Consider using RocksDB for better performance
# (Already configured in docker-compose.yml)
```
3. **Reduce context size**:
- Limit search results
- Use shorter prompts
- Reduce notebook size
4. **Use faster models**:
- gpt-5-mini instead of gpt-5
- claude-3-haiku instead of claude-3-opus
- Local models for simple tasks
### High Resource Usage
**Problem**: Application consuming too much CPU or memory.
**Symptoms**:
- High CPU usage in task manager
- System becoming unresponsive
- Docker containers using excessive resources
**Solutions**:
1. **Set resource limits**:
```yaml
# In docker-compose.yml
services:
  open_notebook:
    deploy:
      resources:
        limits:
          memory: 2G
          cpus: "1.0"
```
2. **Monitor and identify bottlenecks**:
```bash
# Check which service is consuming resources
docker stats
# Check system processes
htop
```
3. **Optimize processing**:
- Process documents in batches
- Use background jobs for heavy tasks
- Limit concurrent operations
## Configuration Problems
### Environment Variables Not Loading
**Problem**: Environment variables are not being read correctly.
**Symptoms**:
- Default values being used instead of configured values
- API keys not recognized
- Connection errors to external services
**Solutions**:
1. **Check file names**:
```bash
# For source installation
ls -la .env
# For Docker
ls -la docker.env
```
2. **Verify file format**:
```bash
# Check for invisible characters
cat -A .env
# Ensure no spaces around equals
OPENAI_API_KEY=value # Correct
OPENAI_API_KEY = value # Incorrect
```
3. **Check environment loading**:
```bash
# Test environment variable
echo $OPENAI_API_KEY
# For Docker
docker compose config
```
4. **Restart services after changes**:
```bash
# For Docker
docker compose down
docker compose up -d
# For source installation
make stop-all
make start-all
```
### Model Configuration Issues
**Problem**: AI models not working or configured incorrectly.
**Symptoms**:
- Model not found errors
- Incorrect responses
- Configuration not saving
**Solutions**:
1. **Check model names**:
```bash
# Use exact model names from provider documentation
# OpenAI: gpt-5-mini, gpt-5, text-embedding-3-small
# Anthropic: claude-3-haiku-20240307, claude-3-sonnet-20240229
```
2. **Verify provider configuration**:
- Check API keys are valid
- Ensure models are available for your account
- Test with simple requests first
3. **Reset model configuration**:
- Go to Models
- Clear all configurations
- Reconfigure with known working models
4. **Check provider status**:
- Visit provider status pages
- Check for service outages
- Try alternative providers
### Database Schema Issues
**Problem**: Database schema conflicts or migration issues.
**Symptoms**:
- Field validation errors
- Query failures
- Data not saving correctly
**Solutions**:
1. **Check database logs**:
```bash
docker compose logs surrealdb
```
2. **Reset database (WARNING: This deletes all data)**:
```bash
# Stop services
make stop-all
# Remove database files
rm -rf surreal_data/
# Restart services (will recreate database)
make start-all
```
3. **Manual schema update**:
```bash
# Run migrations
uv run python -m open_notebook.database.async_migrate
```
4. **Check SurrealDB version**:
```bash
# Ensure using compatible version
docker compose pull surrealdb
docker compose up -d
```
## Getting Help
If you've tried the solutions above and are still experiencing issues:
1. **Collect diagnostic information**:
```bash
# System information
uname -a
docker version
docker compose version
# Service status
make status
# Recent logs
docker compose logs --tail=100 > logs.txt
```
2. **Create a minimal reproduction**:
- Start with a fresh installation
- Use minimal configuration
- Document exact steps to reproduce
3. **Ask for help**:
- Discord: https://discord.gg/37XJPXfz2w
- GitHub Issues: https://github.com/lfnovo/open-notebook/issues
- Include all diagnostic information
Remember to remove API keys and sensitive information before sharing logs or configuration files.
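Before attaching logs, a single `sed` pass can strip anything that looks like a credential. The pattern below is a rough assumption (it matches variable names containing KEY, TOKEN, or SECRET); extend it for your providers, and the demo file exists only for this example:

```shell
# Demo log file (written just for this example)
printf 'OPENAI_API_KEY=sk-abc123\nINFO: worker started\n' > /tmp/logs.txt

# Redact values of variables whose names mention KEY, TOKEN, or SECRET
sed -E 's/([A-Za-z_]*(KEY|TOKEN|SECRET)[A-Za-z_]*=).*/\1REDACTED/' /tmp/logs.txt
```

Review the redacted output yourself before sharing; automated patterns can miss keys embedded in URLs or JSON payloads.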