Major new AI capabilities for infrastructure monitoring:
Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows
Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection
Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording
New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions
Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files
- Add critical release section at top of file
- Make VERSION file requirement impossible to miss
- Explain that version_guard job will fail if VERSION doesn't match
Smarter anomaly detection to reduce false positives:
**Learning Window:** 7 days → 14 days
- Captures weekly patterns (weekday vs weekend)
**Metric-Specific Thresholds:**
CPU:
- Only report if usage >70% AND >2x baseline
- Small absolute swings (e.g. 5% usage vs a 10% baseline) are 2x deviations but not actionable
Memory:
- Report if >80% OR (>1.5x baseline AND >60%)
- Memory usage is more stable, so a lower ratio threshold makes sense
Disk:
- Report if >85% usage OR +15 percentage points growth
- Disk problems are critical, use absolute thresholds
Other metrics:
- Use 2x threshold as default
This dramatically reduces 'noise' anomalies while catching
actual problems that need operator attention.
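The metric-specific rules above can be condensed into one predicate. A sketch, assuming a positive learned baseline (the real check lives in Pulse's baseline package and may differ in names and edge-case handling):

```go
package main

import "fmt"

// shouldReport applies the per-metric threshold rules:
//   cpu:    usage >70% AND >2x baseline
//   memory: >80% OR (>1.5x baseline AND >60%)
//   disk:   >85% usage OR +15 percentage points growth
//   other:  >2x baseline
// usage and baseline are percentages; growthPP is the change in
// percentage points over the learning window. baseline assumed > 0.
func shouldReport(metric string, usage, baseline, growthPP float64) bool {
	ratio := usage / baseline
	switch metric {
	case "cpu":
		return usage > 70 && ratio > 2
	case "memory":
		return usage > 80 || (ratio > 1.5 && usage > 60)
	case "disk":
		// Disk problems are critical, so absolute thresholds apply.
		return usage > 85 || growthPP >= 15
	default:
		return ratio > 2
	}
}

func main() {
	fmt.Println(shouldReport("cpu", 75, 30, 0))    // true: 75% and 2.5x baseline
	fmt.Println(shouldReport("cpu", 90, 60, 0))    // false: only 1.5x baseline
	fmt.Println(shouldReport("memory", 82, 80, 0)) // true: above the 80% ceiling
	fmt.Println(shouldReport("disk", 60, 55, 16))  // true: grew 16 points
}
```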
More aggressive noise filtering:
1. Anomaly threshold raised from 1.5x to 2x
- 1.5x is too borderline to be actionable
- Now requires genuinely significant deviation
2. Filter out 'Ran diagnostic' and 'Executed command' fallback items
- These are generic summaries that provide no value
- Only show remediations with specific, meaningful descriptions
Goal: If something shows in AI Intelligence, it should demand attention.
Critical changes to surface only actionable insights:
1. Anomalies now require at least 50% deviation from baseline
- '1.0x baseline' values filtered out (statistically significant but not actionable)
- Must be >1.5x above OR <0.5x below baseline to report
2. Status changes filter out startup noise
- 'unknown → running' is just system starting, not a real state change
- Backups removed from main list (they have dedicated section)
3. Only show genuinely interesting changes:
- Config changes, migrations, restarts, deletions
- Things that require operator attention
This massively reduces noise while keeping high-signal alerts.
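The 50%-deviation rule from point 1 reduces to a small predicate. A sketch with an illustrative function name, not the actual Pulse code:

```go
package main

import "fmt"

// actionableDeviation implements the 50% rule: report only when the
// current value is >1.5x above or <0.5x below the learned baseline,
// which filters out "1.0x baseline" reports.
func actionableDeviation(current, baseline float64) bool {
	if baseline <= 0 {
		return false // no learned baseline yet, nothing to compare against
	}
	ratio := current / baseline
	return ratio > 1.5 || ratio < 0.5
}

func main() {
	fmt.Println(actionableDeviation(42, 40)) // false: ~1.05x, not actionable
	fmt.Println(actionableDeviation(70, 40)) // true: 1.75x above baseline
	fmt.Println(actionableDeviation(15, 40)) // true: well below normal
}
```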
AIStatusIndicator:
- Now shows BOTH patrol findings AND baseline anomalies
- Displays even when only anomaly detection is active (no patrol)
- Badge count includes both findings + anomalies
- Tooltip provides detailed breakdown by severity
Trend Prediction (backend):
- Add TrendPrediction struct for resource exhaustion forecasting
- CalculateTrend() uses linear regression on sample history
- Predicts days until resource is full (or if declining/stable)
- Severity: critical (<7 days), warning (<30 days), info (>30 days)
- Human-readable descriptions like 'full in ~2 weeks (+0.5% per day)'
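The forecast described above amounts to a least-squares fit plus an extrapolation to 100%. A minimal sketch, assuming one sample per day (function names hypothetical; `CalculateTrend` may differ):

```go
package main

import "fmt"

// daysUntilFull fits a least-squares line through daily usage samples
// (percent) and extrapolates to 100%. Returns the slope in %/day and
// days remaining; days is -1 when usage is declining or stable.
func daysUntilFull(samples []float64) (slopePerDay, days float64) {
	n := float64(len(samples))
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range samples {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	slopePerDay = (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	last := samples[len(samples)-1]
	if slopePerDay <= 0 {
		return slopePerDay, -1 // declining or stable: never full
	}
	return slopePerDay, (100 - last) / slopePerDay
}

// severity maps days-until-full to the bands described above.
func severity(days float64) string {
	switch {
	case days < 0:
		return "none"
	case days < 7:
		return "critical"
	case days < 30:
		return "warning"
	default:
		return "info"
	}
}

func main() {
	// Disk at 95% growing 0.5% per day: 10 days until full.
	usage := []float64{93, 93.5, 94, 94.5, 95}
	slope, days := daysUntilFull(usage)
	fmt.Printf("full in ~%.0f days (+%.1f%% per day), severity %s\n",
		days, slope, severity(days))
}
```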
This creates a more cohesive intelligence experience where anomaly
detection works independently of the pro/patrol features, making
value visible immediately to all users.
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).
Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service
Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type
This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
- Add integration tests for Ollama provider (17 tests against real API)
- Add unit tests for baseline, correlation, patterns, memory, knowledge, cost packages
- Add context formatter and builder tests
- Add factory tests for provider initialization
- Add Makefile targets: test-integration, test-all
- Clean up test theatre (removed struct field tests)
Integration tests require Ollama at OLLAMA_URL (default: 192.168.0.124:11434)
Run with: make test-integration
Phase 2 of Pulse AI differentiation:
- Create internal/ai/baseline package for learned baselines
- Implement statistical baseline learning with mean, stddev, percentiles
- Add z-score based anomaly detection with severity classification
(low, medium, high, critical based on standard deviations)
- Integrate baseline provider into context builder
- Wire baseline store into patrol service with adapters
- Add anomaly enrichment to resource contexts
Key features:
- Learn computes baseline from historical metric data points
- IsAnomaly and CheckAnomaly detect deviations from normal
- Persists baselines to disk as JSON for durability
- Formatted anomaly descriptions for AI consumption
Example: 'Memory is high above normal (85.2% vs typical 42.1% ± 8.3%)'
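The z-score check at the heart of this can be sketched as follows. The severity cutoffs below are assumptions for illustration; the source only says low/medium/high/critical map to standard deviations:

```go
package main

import (
	"fmt"
	"math"
)

// Baseline holds the learned statistics for one metric.
type Baseline struct {
	Mean, StdDev float64
}

// CheckAnomaly returns the z-score (standard deviations from the mean)
// and a severity label; an empty label means "not anomalous" or "no
// spread learned yet". Cutoffs are illustrative.
func (b Baseline) CheckAnomaly(value float64) (z float64, severity string) {
	if b.StdDev == 0 {
		return 0, ""
	}
	z = (value - b.Mean) / b.StdDev
	switch abs := math.Abs(z); {
	case abs >= 4:
		severity = "critical"
	case abs >= 3:
		severity = "high"
	case abs >= 2:
		severity = "medium"
	case abs >= 1.5:
		severity = "low"
	}
	return z, severity
}

func main() {
	b := Baseline{Mean: 42.1, StdDev: 8.3}
	z, sev := b.CheckAnomaly(85.2)
	fmt.Printf("z=%.2f severity=%s\n", z, sev)
}
```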
The baseline store needs to be initialized and triggered to learn
from metrics history. Next step is adding the learning loop.
All tests passing.