Commit graph

14 commits

Author SHA1 Message Date
rcourtman
9279358cec imp(api): remove remaining license gates for intelligence features 2026-02-01 16:28:49 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
27f1a11acb feat: add AI Intelligence system with investigation and forecasting
Major new AI capabilities for infrastructure monitoring:

Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows

Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection

Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording

New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions

Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
2026-01-24 22:41:43 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
55f5f071ed fix: replace hallucinated upgrade URLs with correct pulserelay.pro
Previous LLM sessions incorrectly inserted fake URLs (pulse.sh/pro and
yourpulse.io/pro) for the Pro upgrade links. Neither domain exists.

Replaced all 34 instances with the correct URL: https://pulserelay.pro/

Fixes #1077
2026-01-10 22:45:40 +00:00
rcourtman
6019e3e77e fix: normalize custom OpenAI-compatible API URLs (#1067)
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".

- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
2026-01-09 09:13:36 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
5c2db10780 fix: Gate AI subsystems and intelligence endpoints on AI enabled state. Related to #885
- Add IsAIEnabled() method to AISettingsHandler for consistent checks
- Gate baseline learning, pattern detector, and correlation detector initialization
  in StartPatrol() on AI being enabled
- Add AI enabled checks to all /api/ai/intelligence/* endpoints as defense-in-depth
- Return empty results with "AI is not enabled" message when AI is disabled

This ensures no AI-related data is collected, persisted, or returned when AI is disabled,
preventing the "undismissable alerts" issue where old AI findings would appear.
2025-12-30 22:06:07 +00:00
rcourtman
28ac86c8ab fix: reduce WebSocket reconnection log noise in host agent
Addresses #866 - agents were logging 'WebSocket connection failed' warnings
even during normal reconnection scenarios (server restart, network blip, etc).

Changes:
- Normal close errors (1000, 1001, connection reset) now log at Debug level
- Only log Warning after 3+ consecutive failures
- Changed 'Connecting to Pulse' from Info to Debug to reduce noise
- Successful connections still log at Info level

The WebSocket is only used for AI command execution, not metrics, so
transient disconnections don't affect monitoring functionality.
2025-12-22 14:11:23 +00:00
rcourtman
fdacb60969 fix(ai): filter out noise from AI intelligence display
Critical fixes to show only actionable insights:

1. Skip stopped VMs/containers from anomaly detection
   - '0.0x baseline' for stopped resources is expected, not an anomaly
   - Only check anomalies for status='running'

2. Filter correlations by confidence (>=70%)
   - Low confidence correlations are likely coincidental
   - Only show high-confidence, actionable dependencies

This reduces noise and surfaces genuinely useful intelligence.
2025-12-21 12:41:27 +00:00
rcourtman
9aa266a615 feat(ai): make anomaly detection FREE and add learning status endpoint
Free Features (no license required):
- Anomaly detection - removed license gating, purely statistical analysis
- Learning status endpoint - GET /api/ai/intelligence/learning

Learning Status Response:
- resources_baselined: count of resources with learned baselines
- total_metrics: total metric baselines (cpu + memory + disk)
- metric_breakdown: {cpu: X, memory: Y, disk: Z}
- status: 'waiting' | 'learning' | 'active'
- message: human-readable description

This makes the AI intelligence features visible to all users,
encouraging upgrades for the full LLM-powered patrol experience.
2025-12-21 11:36:54 +00:00
rcourtman
d9f1f7accd feat(ai): add real-time anomaly detection endpoint
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).

Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service

Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type

This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
2025-12-21 10:52:54 +00:00
rcourtman
db5e79bb37 fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864 2025-12-20 20:25:20 +00:00
rcourtman
6f0379f879 feat(api): Add AI intelligence API endpoints
Expose learned AI intelligence data via REST API:

New endpoints:
- GET /api/ai/intelligence/patterns - Detected failure patterns
- GET /api/ai/intelligence/predictions - Failure predictions
- GET /api/ai/intelligence/correlations - Resource correlations
- GET /api/ai/intelligence/changes - Recent infrastructure changes
- GET /api/ai/intelligence/baselines - Learned baselines

All endpoints support ?resource_id filter for per-resource queries.
Changes endpoint supports ?hours filter (default: 24).

Backend additions:
- ai_intelligence_handlers.go - Handler implementations
- baseline.Store.GetAllBaselines() - Flat baseline export
- patrol.GetChangeDetector() - Access change detector

This enables frontend to display:
- 'OOM expected in 3 days based on pattern'
- 'When storage-1 is full, database VM restarts'
- 'VM memory baseline: 60-75%'

All tests passing.
2025-12-12 14:49:46 +00:00