Major new AI capabilities for infrastructure monitoring:
Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows
Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection (a sketch follows this list)
Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording
New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions
Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
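For illustration, the circuit breaker called out above could look roughly like the Go sketch below; the type name, thresholds, and Allow/RecordResult API are assumptions, not the shipped code.

```go
package ai

import (
	"sync"
	"time"
)

// aiCircuitBreaker is a hypothetical sketch of a breaker protecting AI
// operations: after too many consecutive failures it stops issuing AI calls
// until a cooldown has elapsed.
type aiCircuitBreaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int           // trip after this many consecutive failures
	cooldown    time.Duration // how long to stay open once tripped
	openedAt    time.Time
}

// Allow reports whether an AI operation may proceed.
func (b *aiCircuitBreaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.failures < b.maxFailures {
		return true
	}
	// Breaker is open; allow a single probe call once the cooldown elapses.
	if time.Since(b.openedAt) > b.cooldown {
		b.failures = b.maxFailures - 1
		return true
	}
	return false
}

// RecordResult updates the breaker after an AI call completes.
func (b *aiCircuitBreaker) RecordResult(err error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if err == nil {
		b.failures = 0
		return
	}
	b.failures++
	if b.failures == b.maxFailures {
		b.openedAt = time.Now()
	}
}
```

The same Allow/RecordResult pattern would sit naturally in front of the investigation orchestrator's LLM calls.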
Implements Phases 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.
Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation (sketched below)
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components
All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
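As a rough illustration of the middleware and context flow described above (the context key and OrgIDFromContext helper are assumed names, not the real code):

```go
package tenant

import (
	"context"
	"net/http"
)

type ctxKey struct{}

// Middleware extracts the X-Pulse-Org-ID header and stores the org ID in the
// request context, defaulting to "default" for single-tenant deployments.
func Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		orgID := r.Header.Get("X-Pulse-Org-ID")
		if orgID == "" {
			orgID = "default" // legacy single-tenant fallback
		}
		ctx := context.WithValue(r.Context(), ctxKey{}, orgID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// OrgIDFromContext is the kind of lookup the getConfig(ctx)/getMonitor(ctx)
// helpers would use to select the per-tenant config or monitor instance.
func OrgIDFromContext(ctx context.Context) string {
	if id, ok := ctx.Value(ctxKey{}).(string); ok && id != "" {
		return id
	}
	return "default"
}
```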
Previous LLM sessions incorrectly inserted fake URLs (pulse.sh/pro and
yourpulse.io/pro) for the Pro upgrade links. Neither domain exists.
Replaced all 34 instances with the correct URL: https://pulserelay.pro/
Fixes #1077
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".
- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions (see the sketch after this list)
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
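A minimal sketch of that normalization, assuming the client keeps a single endpoint string (function names are illustrative, not the actual ones):

```go
package ai

import "strings"

// normalizeBaseURL ensures a user-supplied base URL such as
// "https://openrouter.ai/api/v1" ends with the chat completions path,
// while leaving already-complete endpoints untouched.
func normalizeBaseURL(baseURL string) string {
	u := strings.TrimRight(baseURL, "/")
	if strings.HasSuffix(u, "/chat/completions") {
		return u
	}
	return u + "/chat/completions"
}

// modelsEndpoint derives the /models URL from the normalized endpoint,
// e.g. ".../api/v1/chat/completions" -> ".../api/v1/models".
func modelsEndpoint(normalized string) string {
	return strings.TrimSuffix(normalized, "/chat/completions") + "/models"
}
```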
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add IsAIEnabled() method to AISettingsHandler for consistent checks
- Gate baseline learning, pattern detector, and correlation detector initialization
in StartPatrol() on AI being enabled
- Add AI enabled checks to all /api/ai/intelligence/* endpoints as defense-in-depth
- Return empty results with "AI is not enabled" message when AI is disabled
This ensures no AI-related data is collected, persisted, or returned when AI is disabled,
preventing the "undismissable alerts" issue where old AI findings would appear.
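A rough sketch of the defense-in-depth gate on one of the intelligence endpoints; the handler type and field names are assumptions:

```go
package api

import (
	"encoding/json"
	"net/http"
)

// IntelligenceHandlers is hypothetical; only IsAIEnabled() comes from the
// change described above.
type IntelligenceHandlers struct {
	aiSettings interface{ IsAIEnabled() bool }
}

// HandleGetFindings returns an empty result with an explanatory message when
// AI is disabled, instead of serving stored findings.
func (h *IntelligenceHandlers) HandleGetFindings(w http.ResponseWriter, r *http.Request) {
	if !h.aiSettings.IsAIEnabled() {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]any{
			"findings": []any{},
			"message":  "AI is not enabled",
		})
		return
	}
	// ... normal path: read findings from the unified store and encode them ...
}
```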
Addresses #866 - agents were logging 'WebSocket connection failed' warnings
even during normal reconnection scenarios (server restart, network blip, etc.).
Changes:
- Normal close errors (1000, 1001, connection reset) now log at Debug level
- Only log Warning after 3+ consecutive failures
- 'Connecting to Pulse' now logs at Debug instead of Info to reduce noise
- Successful connections still log at Info level
The WebSocket is only used for AI command execution, not metrics, so
transient disconnections don't affect monitoring functionality.
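A sketch of the reconnect logging policy, assuming the agent dials with gorilla/websocket and logs via slog (function and variable names are illustrative):

```go
package agent

import (
	"errors"
	"log/slog"
	"syscall"

	"github.com/gorilla/websocket"
)

// logDialFailure downgrades expected disconnects to Debug and only escalates
// to Warn after several consecutive failures.
func logDialFailure(err error, consecutiveFailures int) {
	normalClose := websocket.IsCloseError(err,
		websocket.CloseNormalClosure, // 1000
		websocket.CloseGoingAway,     // 1001
	) || errors.Is(err, syscall.ECONNRESET)

	if normalClose || consecutiveFailures < 3 {
		slog.Debug("WebSocket connection failed", "err", err, "failures", consecutiveFailures)
		return
	}
	slog.Warn("WebSocket connection failed", "err", err, "failures", consecutiveFailures)
}
```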
Critical fixes to show only actionable insights:
1. Skip stopped VMs/containers from anomaly detection
- '0.0x baseline' for stopped resources is expected, not an anomaly
- Only check anomalies for status='running'
2. Filter correlations by confidence (>=70%)
- Low confidence correlations are likely coincidental
- Only show high-confidence, actionable dependencies
This reduces noise and surfaces genuinely useful intelligence.
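The two filters boil down to checks along these lines (thresholds and field names are illustrative):

```go
package ai

// minCorrelationConfidence is the illustrative cut-off for showing a
// correlation; the change above uses 70%.
const minCorrelationConfidence = 0.70

type Resource struct {
	Status string // "running", "stopped", ...
}

type Correlation struct {
	Confidence float64 // 0.0 - 1.0
}

// shouldCheckAnomalies skips stopped VMs/containers, whose "0.0x baseline"
// readings are expected rather than anomalous.
func shouldCheckAnomalies(r Resource) bool {
	return r.Status == "running"
}

// keepCorrelation drops low-confidence correlations that are likely
// coincidental, keeping only actionable dependencies.
func keepCorrelation(c Correlation) bool {
	return c.Confidence >= minCorrelationConfidence
}
```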
Free Features (no license required):
- Anomaly detection: license gating removed (it is purely statistical analysis)
- Learning status endpoint: GET /api/ai/intelligence/learning
Learning Status Response:
- resources_baselined: count of resources with learned baselines
- total_metrics: total metric baselines (cpu + memory + disk)
- metric_breakdown: {cpu: X, memory: Y, disk: Z}
- status: 'waiting' | 'learning' | 'active'
- message: human-readable description
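Modeled as a Go struct, the response could look like this (field names are inferred from the JSON keys above and may not match the code exactly):

```go
package ai

// LearningStatusResponse mirrors the documented keys returned by
// GET /api/ai/intelligence/learning.
type LearningStatusResponse struct {
	ResourcesBaselined int            `json:"resources_baselined"`
	TotalMetrics       int            `json:"total_metrics"`
	MetricBreakdown    map[string]int `json:"metric_breakdown"` // cpu / memory / disk
	Status             string         `json:"status"`           // "waiting" | "learning" | "active"
	Message            string         `json:"message"`
}
```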
This makes the AI intelligence features visible to all users,
encouraging upgrades for the full LLM-powered patrol experience.
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).
Backend:
- Add AnomalyReport struct with severity classification (sketched below)
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service
Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type
This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
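As a rough picture of the deterministic check, the report shape and severity classification could work along these lines; field names, thresholds, and labels are assumptions based on the summary above:

```go
package ai

// AnomalyReport is sketched from the commit summary; real fields may differ.
type AnomalyReport struct {
	ResourceID string  `json:"resource_id"`
	Metric     string  `json:"metric"`    // cpu, memory, disk
	Current    float64 `json:"current"`   // live value
	Baseline   float64 `json:"baseline"`  // learned mean
	Deviation  float64 `json:"deviation"` // current / baseline
	Severity   string  `json:"severity"`  // "info" | "warning" | "critical"
}

// classifySeverity is an illustrative deterministic rule: the further the
// live value drifts from its learned baseline, the higher the severity.
func classifySeverity(deviation float64) string {
	switch {
	case deviation >= 3.0:
		return "critical"
	case deviation >= 2.0:
		return "warning"
	default:
		return "info"
	}
}
```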