feat(ai): Add enriched context with historical trends and predictions

Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
Author: rcourtman
Date: 2025-12-12 09:45:57 +00:00
Commit: 88d419dd5b (parent: cbb89c4b6a)
24 changed files with 4269 additions and 295 deletions


@@ -0,0 +1,407 @@
# Pulse AI Architecture: Long-Term Vision
## The Core Problem
Pulse AI currently provides "AI that can talk to your infrastructure." But this is becoming a commodity. Any user can:
1. Install Claude Code / Cursor / Windsurf
2. Give it SSH access to their Proxmox nodes
3. Ask "What's wrong with my infrastructure?"
**We need to provide value that a stateless AI session cannot.**
---
## The Fundamental Insight
A stateless AI with SSH access can answer: **"What is the current state?"**
Pulse, with its continuous monitoring, can answer:
- **"How has this changed over time?"**
- **"What does 'normal' look like for YOUR infrastructure?"**
- **"What's about to go wrong?"**
- **"Have we seen this pattern before?"**
- **"What did you do last time this happened?"**
These require **persistent context** that accumulates over time. This is our moat.
---
## Architecture Principles
### 1. Context is King
The AI is only as useful as the context we provide. We should think of Pulse as a **context accumulation engine** that happens to have an AI interface.
Every piece of data Pulse collects should be available to the AI in a digestible form:
- Real-time metrics
- Historical trends
- User annotations
- Alert history
- Previous AI findings
- Configuration changes
- Remediation history
### 2. Time-Aware Intelligence
The AI should always know:
- What's happening **now**
- What happened **before** (trends, history)
- What will likely happen **next** (forecasts)
- What's **different** from normal (anomalies)
### 3. Learning From Operations
Every interaction with Pulse teaches it about the user's infrastructure:
- Dismissed findings → "This is expected behavior"
- User notes → "This VM runs the critical database"
- Alert patterns → "This resource is flaky on Tuesdays"
- Remediation actions → "Last time this happened, we restarted the service"
### 4. Proactive, Not Just Reactive
The goal isn't just to answer questions. It's to:
- Surface problems before users ask
- Predict capacity issues weeks in advance
- Notice patterns humans would miss
- Remember what humans would forget
---
## Data Architecture
### Layer 1: Real-Time State (Already Have)
```
StateSnapshot
├── Nodes[]
├── VMs[]
├── Containers[]
├── Storage[]
├── DockerHosts[]
├── PBSInstances[]
├── Hosts[]
└── PMGInstances[]
```
This is what we send to the AI today. Point-in-time. Commodity.
### Layer 2: Historical Metrics (Partially Have)
```
MetricsHistory
├── NodeMetrics[nodeID] → {CPU[], Memory[], Disk[]} over time
├── GuestMetrics[guestID] → {CPU[], Memory[], Network[]} over time
└── StorageMetrics[storageID] → {Usage[], Used[], Total[]} over time
```
We collect this for the frontend trendlines, but **don't expose it to the AI**.
### Layer 3: Computed Insights (Need to Build)
```
InsightsStore
├── Trends[resourceID] → {direction, rate_of_change, forecast}
├── Baselines[resourceID] → {normal_cpu_range, normal_memory_range, typical_patterns}
├── Anomalies[resourceID] → {current_deviations, severity}
├── Correlations[] → {resource_a, resource_b, relationship}
└── Predictions[] → {resource, metric, predicted_event, eta}
```
This is computed from historical data and provides **derived intelligence**.
### Layer 4: Operational Memory (Partially Have)
```
OperationalMemory
├── Findings[findingID] → {status, user_response, resolution}
├── Knowledge[guestID] → {user_notes, learned_facts}
├── AlertHistory[] → {alert, duration, resolution, user_action}
├── RemediationLog[] → {problem, action_taken, outcome, timestamp}
└── ChangeLog[] → {resource, what_changed, when, detected_impact}
```
This captures **what happened and how it was handled**.
---
## The AI Context Pipeline
When the AI needs context (for chat, patrol, or alert analysis), we build it in layers:
```
┌─────────────────────────────────────────────────────────────┐
│ CONTEXT ASSEMBLY │
├─────────────────────────────────────────────────────────────┤
│ │
│ 1. CURRENT STATE (required) │
│ - Real-time metrics for relevant resources │
│ - Current alerts and their status │
│ │
│ 2. HISTORICAL CONTEXT (high value) │
│ - Trends: "Memory has been growing 3%/day for 5 days" │
│ - Baselines: "Normal CPU for this VM is 5-15%" │
│ - Anomalies: "Current 45% is 3σ above normal" │
│ │
│ 3. OPERATIONAL CONTEXT (essential for continuity) │
│ - Previous findings for this resource │
│ - User notes: "This is the production database" │
│ - Past remediations: "We increased RAM last month" │
│ │
│ 4. PREDICTIVE CONTEXT (proactive value) │
│ - Forecasts: "At current rate, disk full in 12 days" │
│ - Pattern alerts: "This usually fails after X" │
│ - Correlations: "When A spikes, B usually follows" │
│ │
│ 5. USER CONTEXT (personalization) │
│ - Infrastructure notes: "This is a homelab" │
│ - Preferences: "I prefer conservative recommendations" │
│ - Expertise level: "User is comfortable with CLI" │
│ │
└─────────────────────────────────────────────────────────────┘
```
---
## Implementation Roadmap
### Phase 1: Historical Context Integration
**Goal**: Make the AI aware of trends and history, not just current state.
1. **Create `internal/ai/context/` package**
- `historical.go` - Pull data from MetricsHistory
- `trends.go` - Compute trend direction, rate of change
- `formatter.go` - Format for AI consumption
2. **Trend Computation**
- Simple linear regression for direction
- Rate of change calculation
- Stability classification (stable/growing/declining/volatile)
3. **Integrate into Patrol and Chat**
- `buildEnrichedContext()` replaces `buildInfrastructureSummary()`
- Include "Last 24h" and "Last 7d" summaries
**Example output:**
```markdown
## VM: webserver (node: minipc)
Current: CPU=12%, Memory=67%, Disk=45%
24h Trend: CPU stable (8-15%), Memory growing +1.2%/hr, Disk stable
7d Trend: Memory +15% total (was 52% a week ago)
Baseline: CPU normal=5-20%, Memory normal=45-60% (currently elevated)
```
### Phase 2: Anomaly Detection
**Goal**: Automatically detect when something is "unusual" for this specific infrastructure.
1. **Baseline Learning**
- Track rolling statistics per resource (mean, std dev, percentiles)
- Time-of-day / day-of-week patterns
- Persist baselines across restarts
2. **Anomaly Scoring**
- Statistical deviation from baseline
- Pattern breaks (e.g., usually low at night, now high)
- Sudden changes vs. gradual drift
3. **Anomaly Context for AI**
- "This is unusual" annotations
- Confidence levels
- Similar past anomalies and outcomes
**Example output:**
```markdown
⚠️ ANOMALY: VM 'database' memory at 89%
- Baseline for this time: 45-55%
- Current value is 4.2σ above normal
- Similar anomaly 2 weeks ago led to OOM (resolved by restart)
```
### Phase 3: Operational Memory
**Goal**: The AI remembers what happened and what worked.
1. **Remediation Logging**
- When AI suggests/executes a fix, log it
- Track outcome (did it work? for how long?)
- Link to findings
2. **Change Detection**
- Detect configuration changes (new VMs, resource changes)
- Correlate changes with subsequent issues
- "This problem started 2 days after you added GPU passthrough"
3. **Solution Database**
- Index past problems and solutions
- "We've seen this before: [link to past finding]"
- "Last time, restarting the service fixed it"
**Example output:**
```markdown
## Historical Context for VM 'webserver'
- Created: 6 months ago
- Last modified: 2 weeks ago (RAM increased 4GB→8GB)
- Past issues:
- 2 weeks ago: High memory (resolved by RAM increase)
- 1 month ago: Disk full (resolved by log rotation)
- User note: "Runs production web app, critical 9-5"
```
### Phase 4: Predictive Intelligence
**Goal**: Warn users before problems occur.
1. **Capacity Forecasting**
- Extrapolate growth trends
- "Storage will be full in X days at current rate"
- Account for patterns (e.g., weekly backup spikes)
2. **Failure Prediction**
- Resources that fail periodically (e.g., OOM every 2 weeks)
- Predict next occurrence
- "This container typically OOMs every ~10 days, last was 8 days ago"
3. **Correlation-Based Alerts**
- "When VM A memory exceeds 80%, VM B usually crashes within 2 hours"
- Learn these from historical data
**Example output:**
```markdown
## Predictions
⏰ Storage 'local-zfs': Full in ~18 days at current growth rate
⏰ Container 'logstash': Historically OOMs every 10-14 days (last: 9 days ago)
⏰ Backup jobs: Growing 5% per week, will exceed window in ~6 weeks
```
### Phase 5: Multi-Resource Correlation
**Goal**: Understand relationships between resources.
1. **Automatic Correlation Detection**
- When A spikes, does B spike?
- When A restarts, does B show errors?
- Statistical correlation over time
2. **Dependency Mapping**
- User-provided: "This VM depends on that NFS storage"
- Inferred: "These 3 containers always restart together"
3. **Cascade Analysis**
- "If node X goes down, these 5 critical VMs are affected"
- "Storage Y failing would impact 12 backup jobs"
---
## AI Prompt Structure
With this architecture, a typical AI prompt would look like:
```markdown
# Infrastructure Analysis Request
## Target Resource
VM 'database' (ID: 102, Node: pve-main)
## Current State
- Status: running
- CPU: 78% (normal: 15-30%)
- Memory: 92% (normal: 60-75%)
- Disk: 67% (stable)
- Uptime: 45 days
## Historical Context (7 days)
- Memory: Growing +2.1%/day (was 77% 7 days ago)
- CPU: Elevated since 3 days ago (was 20%)
- Pattern: No daily cycles detected, continuous growth
## Anomaly Score: HIGH
- Memory 2.8σ above baseline
- CPU 3.1σ above baseline
- Combined anomaly score: 87/100
## Operational History
- Last issue: 3 months ago, high memory (user added swap, resolved)
- User notes: "Production PostgreSQL, critical, no downtime allowed"
- Related resources: Depends on storage 'ceph-ssd', accessed by VMs 105, 107, 112
## Recent Changes
- 4 days ago: VM 105 ('app-server') was updated
- 3 days ago: This VM's CPU started increasing
## Predictions
- At current rate, memory will hit 100% in ~4 days
- Similar pattern to last incident (high memory leading to OOM)
## User Question
"Why is my database server slow?"
```
**This context is impossible to replicate with a stateless SSH session.**
---
## Success Metrics
How do we know Pulse AI is providing value?
1. **Predictive Accuracy**
- Did our capacity forecasts come true?
- Did predicted failures occur?
2. **Time to Resolution**
- How long from problem detection to resolution?
- Compare AI-assisted vs. manual
3. **Proactive Catches**
- Problems found by patrol before user noticed
- Predictions that led to preventive action
4. **User Engagement**
- Are users adding notes? (means they trust the system)
- Are they dismissing findings with reasons? (feedback loop)
- Repeat usage of chat feature
5. **Context Utilization**
- Is the AI using historical context in responses?
- Are predictions being cited in findings?
---
## Technical Considerations
### Data Retention
- Short-term (24h): High-resolution metrics for immediate analysis
- Medium-term (7-30d): Hourly aggregates for trend analysis
- Long-term (90d+): Daily summaries for baseline/pattern learning
### Performance
- Context building must be fast (<100ms)
- Precompute expensive analytics (trends, baselines) on schedule
- Cache formatted context, invalidate on significant changes
### Storage
- Baselines and insights are small, store in SQLite or JSON
- Historical metrics can grow; implement rollup/aggregation
- Consider time-series database for scale (InfluxDB, TimescaleDB)
### Privacy
- All data stays local (no cloud sync of infrastructure data)
- AI context is built locally, only prompts go to API
- User controls what context is included
---
## Summary
The path to differentiating Pulse AI:
| Today | Tomorrow |
|-------|----------|
| "Here's your current state" | "Here's what's changed and why it matters" |
| "This metric is high" | "This is unusual for YOUR infrastructure" |
| "You should check X" | "Last time this happened, you did Y and it worked" |
| "Something might be wrong" | "X will fail in 5 days if this continues" |
| Stateless queries | Accumulated operational intelligence |
**The AI becomes more valuable the longer Pulse runs.** This is the moat.


@@ -0,0 +1,709 @@
# Pulse AI Implementation Plan
This document outlines the concrete implementation steps to realize the Pulse AI vision.
---
## Current State Audit
### What We Have
| Component | Location | Status |
|-----------|----------|--------|
| Real-time state | `models.StateSnapshot` | ✅ Complete |
| Metrics collection | `monitoring.MetricsHistory` | ✅ Collecting, exposed to AI |
| Finding persistence | `ai.FindingsStore` | ✅ Works |
| Knowledge store | `ai/knowledge.Store` | ✅ Per-guest notes |
| Alert context | `ai.buildAlertContext()` | ✅ Current alerts only |
| User annotations | `buildUserAnnotationsContext()` | ✅ Basic |
| Base patrol | `patrol.go` | ✅ Heuristics + optional AI |
| **AI Context package** | `ai/context/` | ✅ **NEW - Phase 1** |
| **Trend computation** | `ai/context/trends.go` | ✅ **NEW - Linear regression** |
| **Context builder** | `ai/context/builder.go` | ✅ **NEW - Orchestration** |
| **Metrics adapter** | `ai/metrics_history_adapter.go` | ✅ **NEW - Wiring** |
### What's Missing
| Component | Impact | Priority | Status |
|-----------|--------|----------|--------|
| Historical context for AI | Core differentiator | P0 | ✅ Done |
| Trend computation | Predictive capability | P0 | ✅ Done |
| Baseline learning | Anomaly detection | P1 | 🔲 Next |
| Change detection | Root cause analysis | P1 | 🔲 Planned |
| Remediation logging | Operational memory | P2 | 🔲 Planned |
| Correlation engine | Advanced insights | P2 | 🔲 Future |
| Capacity forecasting | Proactive alerts | P1 | ⚡ Partial (storage predictions) |
---
## Phase 1: Foundation - AI Context Package
**Goal**: Create a clean abstraction for building AI context with historical data.
### 1.1 New Package Structure
```
internal/ai/context/
├── builder.go # Main context builder orchestrator
├── current.go # Current state formatting (refactor from patrol)
├── historical.go # Historical metrics integration
├── trends.go # Trend computation
├── insights.go # Combined insights (anomalies, predictions)
├── formatter.go # AI-friendly text formatting
└── types.go # Shared types
```
### 1.2 Core Types
```go
// types.go
// ResourceContext contains all context for a single resource
type ResourceContext struct {
ResourceID string
ResourceType string // "node", "vm", "container", "storage", "docker_host"
ResourceName string
// Current state
Current CurrentState
// Historical analysis
Trends map[string]Trend // metric -> trend
Baselines map[string]Baseline // metric -> baseline
Anomalies []Anomaly
// Operational memory
PastFindings []FindingSummary
UserNotes []string
RecentChanges []Change
LastRemediation *RemediationRecord
}
// Trend represents the direction and rate of change for a metric
type Trend struct {
Metric string
Direction TrendDirection // stable, growing, declining, volatile
RatePerHour float64 // rate of change per hour
RatePerDay float64 // rate of change per day
Current float64
Average24h float64
Average7d float64
Min24h float64
Max24h float64
DataPoints int // how much history we have
Confidence float64 // 0-1, based on data quality
}
type TrendDirection string
const (
TrendStable TrendDirection = "stable"
TrendGrowing TrendDirection = "growing"
TrendDeclining TrendDirection = "declining"
TrendVolatile TrendDirection = "volatile"
)
// Baseline represents learned "normal" for a metric
type Baseline struct {
Metric string
Mean float64
StdDev float64
P5 float64 // 5th percentile
P95 float64 // 95th percentile
SampleSize int
LearnedAt time.Time
}
// Anomaly represents a detected deviation from normal
type Anomaly struct {
Metric string
Current float64
Expected float64 // baseline mean
Deviation float64 // standard deviations from mean
Severity string // "low", "medium", "high", "critical"
Since time.Time
Description string
}
// Prediction represents a forecasted event
type Prediction struct {
ResourceID string
Metric string
Event string // "capacity_full", "oom", "pattern_repeat"
ETA time.Time
Confidence float64
Basis string // explanation of prediction
}
```
### 1.3 Context Builder
```go
// builder.go
type ContextBuilder struct {
stateProvider StateProvider
metricsHistory *monitoring.MetricsHistory
findingsStore *FindingsStore
knowledgeStore *knowledge.Store
baselineStore *BaselineStore
// Configuration
includeTrends bool
includeBaselines bool
includeHistory bool
historicalWindow time.Duration
}
// BuildForResource creates comprehensive context for a single resource
func (b *ContextBuilder) BuildForResource(resourceID string) (*ResourceContext, error)
// BuildForInfrastructure creates summarized context for all infrastructure
func (b *ContextBuilder) BuildForInfrastructure() (*InfrastructureContext, error)
// FormatForAI converts context to AI-consumable markdown
func (b *ContextBuilder) FormatForAI(ctx *ResourceContext) string
// FormatInfrastructureForAI converts full infrastructure context
func (b *ContextBuilder) FormatInfrastructureForAI(ctx *InfrastructureContext) string
```
### 1.4 Trend Computation
```go
// trends.go
// ComputeTrend calculates trend from historical data points
func ComputeTrend(points []monitoring.MetricPoint, window time.Duration) Trend {
if len(points) < 2 {
return Trend{Confidence: 0}
}
// Calculate basic statistics
avg, min, max, stddev := computeStats(points)
// Linear regression for direction and rate
slope, r2 := linearRegression(points)
// Classify direction
direction := classifyTrend(slope, stddev, avg)
// Rate per hour/day
ratePerHour := slope * 3600 // slope is per second
ratePerDay := ratePerHour * 24
return Trend{
Direction: direction,
RatePerHour: ratePerHour,
RatePerDay: ratePerDay,
Current: points[len(points)-1].Value,
Average24h: avg,
Min24h: min,
Max24h: max,
DataPoints: len(points),
Confidence: r2,
}
}
func classifyTrend(slope, stddev, avg float64) TrendDirection {
// Normalize slope relative to value magnitude
if avg == 0 {
avg = 1 // avoid division by zero
}
normalizedSlope := (slope * 3600) / avg // hourly change as fraction of avg
// Threshold based on volatility
threshold := 0.01 // 1% per hour is significant
if stddev/avg > 0.2 {
return TrendVolatile
}
if normalizedSlope > threshold {
return TrendGrowing
}
if normalizedSlope < -threshold {
return TrendDeclining
}
return TrendStable
}
```
### 1.5 Integration with Existing Code
```go
// In patrol.go, replace buildInfrastructureSummary:
func (p *PatrolService) buildEnrichedContext(state models.StateSnapshot) string {
builder := context.NewBuilder(
p.stateProvider,
p.metricsHistory,
p.findings,
p.knowledgeStore,
p.baselineStore,
)
infraCtx, err := builder.BuildForInfrastructure()
if err != nil {
log.Warn().Err(err).Msg("Failed to build enriched context, falling back")
return p.buildBasicSummary(state)
}
return builder.FormatInfrastructureForAI(infraCtx)
}
```
---
## Phase 2: Baseline Learning
**Goal**: Learn what "normal" looks like for each resource so we can detect anomalies.
### 2.1 Baseline Store
```go
// internal/ai/baseline/store.go
type Store struct {
mu          sync.RWMutex
baselines   map[string]*ResourceBaseline // resourceID -> baselines
persistence Persistence
metricsHistory *monitoring.MetricsHistory // sample source for Learn / updateAllBaselines
// Configuration
learningWindow time.Duration // how far back to learn from (default: 7 days)
minSamples int // minimum samples needed (default: 100)
updateInterval time.Duration // how often to recompute (default: 1 hour)
}
type ResourceBaseline struct {
ResourceID string
LastUpdated time.Time
Metrics map[string]*MetricBaseline // metric name -> baseline
}
type MetricBaseline struct {
Mean float64
StdDev float64
Percentiles map[int]float64 // 5, 25, 50, 75, 95
SampleCount int
// Time-of-day patterns (optional, phase 2+)
HourlyMeans [24]float64
}
// Learn computes baselines from historical data
func (s *Store) Learn(resourceID string, history *monitoring.MetricsHistory) error
// GetBaseline returns the baseline for a resource/metric
func (s *Store) GetBaseline(resourceID, metric string) (*MetricBaseline, bool)
// IsAnomaly checks if a value is anomalous given the baseline
func (s *Store) IsAnomaly(resourceID, metric string, value float64) (bool, float64)
```
### 2.2 Background Learning Loop
```go
// Run as part of patrol service or separate goroutine
func (s *Store) StartLearningLoop(ctx context.Context, interval time.Duration) {
ticker := time.NewTicker(interval)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
s.updateAllBaselines()
}
}
}
func (s *Store) updateAllBaselines() {
// Get list of all resources with metrics
resources := s.metricsHistory.GetResourceIDs()
for _, resourceID := range resources {
if err := s.Learn(resourceID, s.metricsHistory); err != nil {
log.Warn().Err(err).Str("resource", resourceID).Msg("Failed to update baseline")
}
}
// Persist updated baselines
s.save()
}
```
### 2.3 Anomaly Detection
```go
// internal/ai/anomaly/detector.go
type Detector struct {
baselineStore *baseline.Store
// Thresholds
warningThreshold float64 // default: 2.0 std devs
criticalThreshold float64 // default: 3.0 std devs
}
type Detection struct {
ResourceID string
Metric string
CurrentValue float64
ExpectedMean float64
StdDev float64
ZScore float64
Severity AnomalySeverity
DetectedAt time.Time
}
func (d *Detector) Check(resourceID, metric string, value float64) *Detection {
	bl, ok := d.baselineStore.GetBaseline(resourceID, metric)
	if !ok || bl.SampleCount < 50 || bl.StdDev == 0 {
		return nil // not enough data (or no variance) to judge yet
	}
	zScore := (value - bl.Mean) / bl.StdDev
	absZ := math.Abs(zScore)
	if absZ < d.warningThreshold {
		return nil // within normal range
	}
	severity := AnomalyWarning
	if absZ >= d.criticalThreshold {
		severity = AnomalyCritical
	}
	return &Detection{
		ResourceID:   resourceID,
		Metric:       metric,
		CurrentValue: value,
		ExpectedMean: bl.Mean,
		StdDev:       bl.StdDev,
		ZScore:       zScore,
		Severity:     severity,
		DetectedAt:   time.Now(),
	}
}
```
---
## Phase 3: Operational Memory
**Goal**: Remember what happened, what users said, and what worked.
### 3.1 Change Detection
```go
// internal/ai/memory/changes.go
type ChangeDetector struct {
previousState map[string]ResourceSnapshot
mu sync.RWMutex
changes []Change
maxChanges int
persistence Persistence
}
type Change struct {
ID string
ResourceID string
ChangeType ChangeType
Before interface{}
After interface{}
DetectedAt time.Time
Description string
}
type ChangeType string
const (
ChangeCreated ChangeType = "created"
ChangeDeleted ChangeType = "deleted"
ChangeConfig ChangeType = "config" // RAM, CPU allocation changed
ChangeStatus ChangeType = "status" // started, stopped
ChangeMigrated ChangeType = "migrated" // moved to different node
)
func (d *ChangeDetector) Detect(current models.StateSnapshot) []Change {
	// Compare current state to the previous snapshot:
	// detect new resources, deleted resources, and config changes,
	// then store the changes and return the newly observed ones.
	// (Body intentionally elided in this plan.)
	return nil
}
```
### 3.2 Remediation Logging
```go
// internal/ai/memory/remediation.go
type RemediationLog struct {
mu sync.RWMutex
records []RemediationRecord
persistence Persistence
}
type RemediationRecord struct {
ID string
Timestamp time.Time
ResourceID string
FindingID string // linked AI finding if any
Problem string // what was wrong
Action string // what was done
Outcome Outcome // did it work?
Duration time.Duration // how long until resolved
Note string // optional user/AI note
}
type Outcome string
const (
OutcomeResolved Outcome = "resolved"
OutcomePartial Outcome = "partial"
OutcomeFailed Outcome = "failed"
OutcomeUnknown Outcome = "unknown"
)
// Log records a remediation action
func (r *RemediationLog) Log(record RemediationRecord) error
// GetForResource returns remediation history for a resource
func (r *RemediationLog) GetForResource(resourceID string, limit int) []RemediationRecord
// GetSimilar finds similar past remediations
func (r *RemediationLog) GetSimilar(problem string, limit int) []RemediationRecord
```
### 3.3 Integration Points
When the AI executes a command:
```go
func (s *Service) onToolComplete(toolID, command, output string, success bool) {
// Log the remediation attempt
s.remediationLog.Log(RemediationRecord{
ID: uuid.New().String(),
Timestamp: time.Now(),
ResourceID: s.currentContext.TargetID,
FindingID: s.currentContext.FindingID,
Problem: s.currentContext.Problem,
Action: command,
Outcome: outcomeFromSuccess(success),
})
}
```
When a finding is resolved:
```go
func (s *FindingsStore) Resolve(findingID string, auto bool) bool {
// Link to any remediation actions
// Record what was done
}
```
---
## Phase 4: Capacity Forecasting
**Goal**: Predict when resources will run out.
### 4.1 Forecaster
```go
// internal/ai/forecast/capacity.go
type CapacityForecaster struct {
metricsHistory *monitoring.MetricsHistory
minDataPoints int // minimum points needed for forecast
}
type CapacityForecast struct {
ResourceID string
Metric string
CurrentUsage float64
Limit float64
GrowthRate float64 // per day
ETA time.Time // when it hits limit
DaysLeft float64
Confidence float64 // 0-1
// Projection points for visualization
Projection []ProjectionPoint
}
func (f *CapacityForecaster) Forecast(resourceID, metric string, limit float64) (*CapacityForecast, error) {
points := f.metricsHistory.GetMetrics(resourceID, metric, 7*24*time.Hour)
if len(points) < f.minDataPoints {
return nil, ErrInsufficientData
}
// Linear regression for growth rate
slope, r2 := linearRegression(points)
if slope <= 0 {
return nil, nil // not growing
}
current := points[len(points)-1].Value
remaining := limit - current
hoursUntilFull := remaining / (slope * 3600)
if hoursUntilFull <= 0 {
return nil, nil // already at limit
}
eta := time.Now().Add(time.Duration(hoursUntilFull) * time.Hour)
return &CapacityForecast{
ResourceID: resourceID,
Metric: metric,
CurrentUsage: current,
Limit: limit,
GrowthRate: slope * 86400, // per day
ETA: eta,
DaysLeft: hoursUntilFull / 24,
Confidence: r2,
}, nil
}
```
### 4.2 Integration with Patrol
```go
func (p *PatrolService) generateForecasts(state models.StateSnapshot) []Prediction {
var predictions []Prediction
// Forecast storage capacity
for _, storage := range state.Storage {
if storage.Total == 0 {
continue
}
forecast, err := p.forecaster.Forecast(storage.ID, "used", float64(storage.Total))
if err != nil || forecast == nil {
continue
}
if forecast.DaysLeft < 30 && forecast.Confidence > 0.5 {
predictions = append(predictions, Prediction{
ResourceID: storage.ID,
Metric: "storage_capacity",
Event: "capacity_full",
ETA: forecast.ETA,
Confidence: forecast.Confidence,
Basis: fmt.Sprintf("Growing %.1f GB/day", forecast.GrowthRate/1e9),
})
}
}
// Forecast VM memory (could predict OOM)
// Forecast backup storage growth
// etc.
return predictions
}
```
---
## File System Layout (Final)
```
internal/ai/
├── context/
│ ├── builder.go # Main orchestrator
│ ├── current.go # Current state extraction
│ ├── historical.go # Historical data integration
│ ├── trends.go # Trend computation
│ ├── formatter.go # AI-friendly formatting
│ └── types.go # Shared types
├── baseline/
│ ├── store.go # Baseline storage and learning
│ ├── persistence.go # Disk persistence
│ └── learning.go # Statistical learning
├── anomaly/
│ ├── detector.go # Anomaly detection
│ └── types.go
├── forecast/
│ ├── capacity.go # Capacity forecasting
│ └── patterns.go # Pattern-based prediction
├── memory/
│ ├── changes.go # Change detection
│ ├── remediation.go # Remediation logging
│ └── persistence.go
├── knowledge/ # (existing)
│ ├── store.go
│ └── store_test.go
├── providers/ # (existing)
├── findings.go # (existing)
├── patrol.go # (existing, will use new context/)
├── service.go # (existing, will use new context/)
└── routing.go # (existing)
```
---
## Migration Strategy
### Step 1: Add without changing
Create new packages (`context/`, `baseline/`, etc.) that work alongside existing code. Don't break anything.
### Step 2: Wire up to MetricsHistory
Pass `*monitoring.MetricsHistory` to the AI service at startup. Required for historical context.
### Step 3: Switch patrol to enriched context
Replace `buildInfrastructureSummary` with `buildEnrichedContext` behind a feature flag.
### Step 4: Add baseline learning
Start computing baselines in background. Initially just store, don't act.
### Step 5: Enable anomaly annotations
Add anomaly context to AI prompts. Let AI mention anomalies in findings.
### Step 6: Add forecasts
Enable capacity forecasting. Create new finding types for predicted issues.
### Step 7: Phase out old code
Remove deprecated methods once new system is stable.
---
## Testing Strategy
1. **Unit tests** for trend computation, baseline learning, anomaly detection
2. **Integration tests** with mock metrics history
3. **Golden file tests** for AI context formatting (ensure consistent output)
4. **Baseline learning tests** with synthetic time-series data
5. **Forecast accuracy tests** with historical data validation
---
## Success Criteria
Phase 1 complete when:
- AI prompts include historical trends for all resources
- "24h trend" visible in patrol output
Phase 2 complete when:
- Baselines computed automatically
- Anomalies flagged in AI context
- "X is unusual" appearing in findings
Phase 3 complete when:
- Changes detected and logged
- Remediation history queryable
- "Last time this happened..." in AI responses
Phase 4 complete when:
- Capacity forecasts generated
- "Full in X days" predictions accurate
- Predictive findings created before issues occur


@@ -0,0 +1,410 @@
package context
import (
"strings"
"time"
"github.com/rcourtman/pulse-go-rewrite/internal/models"
"github.com/rs/zerolog/log"
)
// MetricsHistoryProvider is the interface for accessing historical metrics
// This avoids importing the monitoring package directly
type MetricsHistoryProvider interface {
GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint
GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint
GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint
GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint
}
// KnowledgeProvider provides user annotations and notes
type KnowledgeProvider interface {
GetNotes(guestID string) []string
FormatAllForContext() string
}
// FindingsProvider provides past findings for operational memory
type FindingsProvider interface {
GetDismissedForContext() string
GetPastFindingsForResource(resourceID string) []string
}
// Builder constructs enriched AI context from multiple data sources
type Builder struct {
// Data sources
metricsHistory MetricsHistoryProvider
knowledge KnowledgeProvider
findings FindingsProvider
// Configuration
trendWindow24h time.Duration
trendWindow7d time.Duration
includeHistory bool
includeTrends bool
includeBaseline bool
}
// NewBuilder creates a new context builder
func NewBuilder() *Builder {
return &Builder{
trendWindow24h: 24 * time.Hour,
trendWindow7d: 7 * 24 * time.Hour,
includeHistory: true,
includeTrends: true,
includeBaseline: false, // Disabled until baseline store is implemented
}
}
// WithMetricsHistory sets the metrics history provider
func (b *Builder) WithMetricsHistory(mh MetricsHistoryProvider) *Builder {
b.metricsHistory = mh
return b
}
// WithKnowledge sets the knowledge provider for user notes
func (b *Builder) WithKnowledge(k KnowledgeProvider) *Builder {
b.knowledge = k
return b
}
// WithFindings sets the findings provider for operational memory
func (b *Builder) WithFindings(f FindingsProvider) *Builder {
b.findings = f
return b
}
// BuildForInfrastructure creates comprehensive context for the entire infrastructure
func (b *Builder) BuildForInfrastructure(state models.StateSnapshot) *InfrastructureContext {
ctx := &InfrastructureContext{
GeneratedAt: time.Now(),
}
// Process nodes
for _, node := range state.Nodes {
trends := b.computeNodeTrends(node.ID)
resourceCtx := FormatNodeForContext(node, trends)
b.enrichWithNotes(&resourceCtx)
ctx.Nodes = append(ctx.Nodes, resourceCtx)
}
// Process VMs
for _, vm := range state.VMs {
if vm.Template {
continue
}
trends := b.computeGuestTrends(vm.ID)
resourceCtx := FormatGuestForContext(
vm.ID, vm.Name, vm.Node, "vm", vm.Status,
vm.CPU, vm.Memory.Usage, vm.Disk.Usage,
vm.Uptime, vm.LastBackup, trends,
)
b.enrichWithNotes(&resourceCtx)
ctx.VMs = append(ctx.VMs, resourceCtx)
}
// Process containers
for _, ct := range state.Containers {
if ct.Template {
continue
}
trends := b.computeGuestTrends(ct.ID)
resourceCtx := FormatGuestForContext(
ct.ID, ct.Name, ct.Node, "container", ct.Status,
ct.CPU, ct.Memory.Usage, ct.Disk.Usage,
ct.Uptime, ct.LastBackup, trends,
)
b.enrichWithNotes(&resourceCtx)
ctx.Containers = append(ctx.Containers, resourceCtx)
}
// Process storage
for _, storage := range state.Storage {
trends := b.computeStorageTrends(storage.ID)
resourceCtx := FormatStorageForContext(storage, trends)
// Add capacity predictions for storage
if predictions := b.computeStoragePredictions(storage, trends); len(predictions) > 0 {
resourceCtx.Predictions = predictions
ctx.Predictions = append(ctx.Predictions, predictions...)
}
ctx.Storage = append(ctx.Storage, resourceCtx)
}
// Process Docker hosts
for _, dh := range state.DockerHosts {
resourceCtx := b.buildDockerHostContext(dh)
ctx.DockerHosts = append(ctx.DockerHosts, resourceCtx)
}
// Process agent hosts
for _, host := range state.Hosts {
resourceCtx := b.buildHostContext(host)
ctx.Hosts = append(ctx.Hosts, resourceCtx)
}
// Calculate totals
ctx.TotalResources = len(ctx.Nodes) + len(ctx.VMs) + len(ctx.Containers) +
len(ctx.Storage) + len(ctx.DockerHosts) + len(ctx.Hosts)
log.Debug().
Int("nodes", len(ctx.Nodes)).
Int("vms", len(ctx.VMs)).
Int("containers", len(ctx.Containers)).
Int("storage", len(ctx.Storage)).
Int("predictions", len(ctx.Predictions)).
Msg("Built enriched infrastructure context")
return ctx
}
// computeNodeTrends computes trends for a node's metrics
func (b *Builder) computeNodeTrends(nodeID string) map[string]Trend {
trends := make(map[string]Trend)
if b.metricsHistory == nil || !b.includeTrends {
return trends
}
// Compute 24h trends for key metrics
for _, metric := range []string{"cpu", "memory"} {
points := b.metricsHistory.GetNodeMetrics(nodeID, metric, b.trendWindow24h)
if len(points) >= 3 {
trend := ComputeTrend(points, metric, b.trendWindow24h)
trends[metric+"_24h"] = trend
}
}
// Also compute 7d trends for capacity planning
for _, metric := range []string{"cpu", "memory"} {
points := b.metricsHistory.GetNodeMetrics(nodeID, metric, b.trendWindow7d)
if len(points) >= 10 {
trend := ComputeTrend(points, metric, b.trendWindow7d)
trends[metric+"_7d"] = trend
}
}
return trends
}
// computeGuestTrends computes trends for a guest's metrics
func (b *Builder) computeGuestTrends(guestID string) map[string]Trend {
trends := make(map[string]Trend)
if b.metricsHistory == nil || !b.includeTrends {
return trends
}
// Get all metrics at once for efficiency
allMetrics := b.metricsHistory.GetAllGuestMetrics(guestID, b.trendWindow7d)
for metric, points := range allMetrics {
if len(points) < 3 {
continue
}
// Compute 24h trend
recent := filterRecentPoints(points, b.trendWindow24h)
if len(recent) >= 3 {
trend := ComputeTrend(recent, metric, b.trendWindow24h)
trends[metric+"_24h"] = trend
}
// Compute 7d trend if enough data
if len(points) >= 10 {
trend := ComputeTrend(points, metric, b.trendWindow7d)
trends[metric+"_7d"] = trend
}
}
return trends
}
// computeStorageTrends computes trends for storage
func (b *Builder) computeStorageTrends(storageID string) map[string]Trend {
trends := make(map[string]Trend)
if b.metricsHistory == nil || !b.includeTrends {
return trends
}
allMetrics := b.metricsHistory.GetAllStorageMetrics(storageID, b.trendWindow7d)
// Focus on usage metric for storage
if points, ok := allMetrics["usage"]; ok && len(points) >= 3 {
recent := filterRecentPoints(points, b.trendWindow24h)
if len(recent) >= 3 {
trends["usage_24h"] = ComputeTrend(recent, "usage", b.trendWindow24h)
}
if len(points) >= 10 {
trends["usage_7d"] = ComputeTrend(points, "usage", b.trendWindow7d)
}
}
return trends
}
// computeStoragePredictions generates capacity predictions for storage
func (b *Builder) computeStoragePredictions(storage models.Storage, trends map[string]Trend) []Prediction {
var predictions []Prediction
// Use 7d trend for more stable prediction
trend, ok := trends["usage_7d"]
if !ok || trend.DataPoints < 10 {
return predictions
}
// Only predict if growing
if trend.Direction != TrendGrowing || trend.RatePerDay <= 0 {
return predictions
}
// Current usage
currentPct := storage.Usage
if currentPct == 0 && storage.Total > 0 {
currentPct = float64(storage.Used) / float64(storage.Total) * 100
}
// Calculate days until 90% (warning) and 100% (critical)
for _, threshold := range []struct {
pct float64
event string
}{
{90, "storage_warning_90pct"},
{100, "storage_full"},
} {
if currentPct >= threshold.pct {
continue // Already past this threshold
}
remaining := threshold.pct - currentPct
daysUntil := remaining / trend.RatePerDay
if daysUntil > 0 && daysUntil <= 30 { // Only predict within 30 days
predictions = append(predictions, Prediction{
ResourceID: storage.ID,
Metric: "usage",
Event: threshold.event,
ETA: time.Now().Add(time.Duration(daysUntil*24) * time.Hour),
DaysUntil: daysUntil,
Confidence: trend.Confidence,
Basis: formatPredictionBasis(trend),
GrowthRate: trend.RatePerDay,
CurrentPct: currentPct,
})
}
}
return predictions
}
// formatPredictionBasis creates explanation for a prediction
func formatPredictionBasis(trend Trend) string {
return "Growing " + formatRate(trend.RatePerDay) + " based on " +
formatDuration(trend.Period) + " of data"
}
// buildDockerHostContext creates context for a Docker host
func (b *Builder) buildDockerHostContext(host models.DockerHost) ResourceContext {
displayName := host.Hostname
if host.DisplayName != "" {
displayName = host.DisplayName
}
ctx := ResourceContext{
ResourceID: host.ID,
ResourceType: "docker_host",
ResourceName: displayName,
Status: host.Status,
Uptime: time.Duration(host.UptimeSeconds) * time.Second,
}
// Note: Docker hosts don't have the same trend data as Proxmox resources
// We could add container-level trends in the future
return ctx
}
// buildHostContext creates context for an agent host
func (b *Builder) buildHostContext(host models.Host) ResourceContext {
displayName := host.Hostname
if host.DisplayName != "" {
displayName = host.DisplayName
}
// Calculate CPU and memory from host data
cpuPct := 0.0
if len(host.LoadAverage) > 0 && host.CPUCount > 0 {
cpuPct = host.LoadAverage[0] / float64(host.CPUCount) * 100
}
memPct := 0.0
if host.Memory.Total > 0 {
memPct = float64(host.Memory.Used) / float64(host.Memory.Total) * 100
}
ctx := ResourceContext{
ResourceID: host.ID,
ResourceType: "host",
ResourceName: displayName,
CurrentCPU: cpuPct,
CurrentMemory: memPct,
Status: host.Status,
Uptime: time.Duration(host.UptimeSeconds) * time.Second,
}
return ctx
}
// enrichWithNotes adds user annotations to context
func (b *Builder) enrichWithNotes(ctx *ResourceContext) {
if b.knowledge == nil {
return
}
notes := b.knowledge.GetNotes(ctx.ResourceID)
if len(notes) > 0 {
ctx.UserNotes = notes
}
}
// filterRecentPoints filters points to only include those within duration
func filterRecentPoints(points []MetricPoint, duration time.Duration) []MetricPoint {
cutoff := time.Now().Add(-duration)
result := make([]MetricPoint, 0, len(points))
for _, p := range points {
if p.Timestamp.After(cutoff) {
result = append(result, p)
}
}
return result
}
// MergeContexts combines context for targeted analysis with relevant infrastructure context
func (b *Builder) MergeContexts(target *ResourceContext, infrastructure *InfrastructureContext) string {
// For targeted requests, highlight the target first, then add relevant related context
var result strings.Builder
result.WriteString("# Target Resource\n")
result.WriteString(FormatResourceContext(*target))
result.WriteString("\n")
// Add related resources (same node, dependencies, etc.)
// This could be expanded with dependency mapping in the future
if target.Node != "" {
result.WriteString("\n## Related Resources\n")
// Find other resources on the same node
for _, vm := range infrastructure.VMs {
if vm.Node == target.Node && vm.ResourceID != target.ResourceID {
result.WriteString(FormatResourceContext(vm))
}
}
for _, ct := range infrastructure.Containers {
if ct.Node == target.Node && ct.ResourceID != target.ResourceID {
result.WriteString(FormatResourceContext(ct))
}
}
}
return result.String()
}


@ -0,0 +1,429 @@
package context
import (
"fmt"
"strings"
"time"
"github.com/rcourtman/pulse-go-rewrite/internal/models"
)
// FormatResourceContext formats a single resource's context for AI consumption
func FormatResourceContext(ctx ResourceContext) string {
var sb strings.Builder
// Header with resource identity
typeLabel := formatResourceType(ctx.ResourceType)
sb.WriteString(fmt.Sprintf("### %s: %s", typeLabel, ctx.ResourceName))
if ctx.Node != "" && ctx.ResourceType != "node" {
sb.WriteString(fmt.Sprintf(" (on %s)", ctx.Node))
}
sb.WriteString("\n")
// Current state
sb.WriteString(fmt.Sprintf("**Status**: %s", ctx.Status))
if ctx.Uptime > 0 {
sb.WriteString(fmt.Sprintf(" | **Uptime**: %s", formatDuration(ctx.Uptime)))
}
sb.WriteString("\n")
// Current metrics
var metrics []string
if ctx.CurrentCPU >= 0 {
metrics = append(metrics, fmt.Sprintf("CPU: %.1f%%", ctx.CurrentCPU))
}
if ctx.CurrentMemory >= 0 {
metrics = append(metrics, fmt.Sprintf("Memory: %.1f%%", ctx.CurrentMemory))
}
if ctx.CurrentDisk >= 0 {
metrics = append(metrics, fmt.Sprintf("Disk: %.1f%%", ctx.CurrentDisk))
}
if len(metrics) > 0 {
sb.WriteString("**Current**: " + strings.Join(metrics, " | ") + "\n")
}
// Trends section (the differentiating context)
if len(ctx.Trends) > 0 {
var trendLines []string
for metric, trend := range ctx.Trends {
if trend.DataPoints < 3 {
continue // Skip if not enough data
}
line := formatTrendLine(metric, trend)
if line != "" {
trendLines = append(trendLines, line)
}
}
if len(trendLines) > 0 {
sb.WriteString("**Trends**: ")
sb.WriteString(strings.Join(trendLines, " | "))
sb.WriteString("\n")
}
}
// Anomalies (high value - what's unusual)
if len(ctx.Anomalies) > 0 {
sb.WriteString("**⚠️ Anomalies**: ")
var anomalyDescs []string
for _, a := range ctx.Anomalies {
anomalyDescs = append(anomalyDescs, a.Description)
}
sb.WriteString(strings.Join(anomalyDescs, "; "))
sb.WriteString("\n")
}
// Predictions (proactive value)
if len(ctx.Predictions) > 0 {
sb.WriteString("**⏰ Predictions**: ")
var predDescs []string
for _, p := range ctx.Predictions {
predDescs = append(predDescs, fmt.Sprintf("%s in ~%.0f days", p.Event, p.DaysUntil))
}
sb.WriteString(strings.Join(predDescs, "; "))
sb.WriteString("\n")
}
// User notes (context that only Pulse knows)
if len(ctx.UserNotes) > 0 {
sb.WriteString("**User Notes**: ")
sb.WriteString(strings.Join(ctx.UserNotes, "; "))
sb.WriteString("\n")
}
// Past issues (operational memory)
if len(ctx.PastIssues) > 0 || ctx.LastRemediation != "" {
sb.WriteString("**History**: ")
if ctx.LastRemediation != "" {
sb.WriteString(ctx.LastRemediation)
}
if len(ctx.PastIssues) > 0 {
sb.WriteString(" Past issues: " + strings.Join(ctx.PastIssues, "; "))
}
sb.WriteString("\n")
}
return sb.String()
}
// formatTrendLine creates a compact trend description
func formatTrendLine(metric string, trend Trend) string {
if trend.DataPoints < 3 {
return ""
}
	// Capitalize the first letter; metric names are ASCII keys like "cpu_24h",
	// so we avoid the deprecated strings.Title.
	metricLabel := metric
	if metricLabel != "" {
		metricLabel = strings.ToUpper(metricLabel[:1]) + metricLabel[1:]
	}
// Direction with rate
var directionStr string
switch trend.Direction {
case TrendGrowing:
rate := formatRate(trend.RatePerDay)
directionStr = fmt.Sprintf("↑ %s", rate)
case TrendDeclining:
rate := formatRate(-trend.RatePerDay) // Make positive for display
directionStr = fmt.Sprintf("↓ %s", rate)
case TrendVolatile:
directionStr = "⚡ volatile"
case TrendStable:
directionStr = "→ stable"
default:
return ""
}
// Include range if interesting
rangeStr := ""
if trend.Max-trend.Min > 5 { // Only show range if variation is significant
rangeStr = fmt.Sprintf(" (%.0f-%.0f%%)", trend.Min, trend.Max)
}
return fmt.Sprintf("%s: %s%s", metricLabel, directionStr, rangeStr)
}
// formatRate formats a rate value appropriately
func formatRate(ratePerDay float64) string {
absRate := ratePerDay
if absRate < 0 {
absRate = -absRate
}
if absRate >= 1 {
return fmt.Sprintf("%.1f/day", absRate)
}
// Convert to per hour if < 1/day
ratePerHour := absRate / 24
if ratePerHour >= 0.1 {
return fmt.Sprintf("%.1f/hr", ratePerHour)
}
return "slow"
}
// FormatInfrastructureContext formats full infrastructure context for AI
func FormatInfrastructureContext(ctx *InfrastructureContext) string {
var sb strings.Builder
sb.WriteString("# Infrastructure State with Historical Context\n\n")
sb.WriteString(fmt.Sprintf("*Generated at %s | Monitoring %d resources*\n\n",
ctx.GeneratedAt.Format("2006-01-02 15:04"),
ctx.TotalResources))
// Global anomalies first (high priority)
if len(ctx.Anomalies) > 0 {
sb.WriteString("## ⚠️ Current Anomalies\n")
for _, a := range ctx.Anomalies {
sb.WriteString(fmt.Sprintf("- **%s**: %s\n", a.Metric, a.Description))
}
sb.WriteString("\n")
}
// Predictions (proactive insights)
if len(ctx.Predictions) > 0 {
sb.WriteString("## ⏰ Predictions\n")
for _, p := range ctx.Predictions {
sb.WriteString(fmt.Sprintf("- **%s** on %s: %s (%.0f days, %.0f%% confidence)\n",
p.Event, p.ResourceID, p.Basis, p.DaysUntil, p.Confidence*100))
}
sb.WriteString("\n")
}
// Recent changes (what's different)
if len(ctx.Changes) > 0 {
sb.WriteString("## 🔄 Recent Changes\n")
for _, c := range ctx.Changes {
sb.WriteString(fmt.Sprintf("- %s: %s\n", c.ResourceName, c.Description))
}
sb.WriteString("\n")
}
// Resources by type
if len(ctx.Nodes) > 0 {
sb.WriteString("## Proxmox Nodes\n")
for _, r := range ctx.Nodes {
sb.WriteString(FormatResourceContext(r))
sb.WriteString("\n")
}
}
if len(ctx.VMs) > 0 {
sb.WriteString("## Virtual Machines\n")
for _, r := range ctx.VMs {
sb.WriteString(FormatResourceContext(r))
}
sb.WriteString("\n")
}
if len(ctx.Containers) > 0 {
sb.WriteString("## LXC Containers\n")
for _, r := range ctx.Containers {
sb.WriteString(FormatResourceContext(r))
}
sb.WriteString("\n")
}
if len(ctx.Storage) > 0 {
sb.WriteString("## Storage\n")
for _, r := range ctx.Storage {
sb.WriteString(FormatResourceContext(r))
}
sb.WriteString("\n")
}
if len(ctx.DockerHosts) > 0 {
sb.WriteString("## Docker Hosts\n")
for _, r := range ctx.DockerHosts {
sb.WriteString(FormatResourceContext(r))
}
sb.WriteString("\n")
}
if len(ctx.Hosts) > 0 {
sb.WriteString("## Agent Hosts\n")
for _, r := range ctx.Hosts {
sb.WriteString(FormatResourceContext(r))
}
sb.WriteString("\n")
}
return sb.String()
}
// FormatCompactSummary creates a brief overview suitable for context-limited prompts
func FormatCompactSummary(ctx *InfrastructureContext) string {
var sb strings.Builder
sb.WriteString(fmt.Sprintf("Infrastructure: %d resources\n", ctx.TotalResources))
// Count by status
var healthy, warning, critical int
countResource := func(resources []ResourceContext) {
for _, r := range resources {
switch {
case len(r.Anomalies) > 0:
critical++
case hasGrowingTrend(r):
warning++
default:
healthy++
}
}
}
countResource(ctx.Nodes)
countResource(ctx.VMs)
countResource(ctx.Containers)
countResource(ctx.Storage)
countResource(ctx.DockerHosts)
countResource(ctx.Hosts)
sb.WriteString(fmt.Sprintf("Health: %d healthy, %d warning, %d critical\n", healthy, warning, critical))
if len(ctx.Anomalies) > 0 {
sb.WriteString(fmt.Sprintf("Anomalies: %d active\n", len(ctx.Anomalies)))
}
if len(ctx.Predictions) > 0 {
// Show most urgent prediction
earliest := ctx.Predictions[0]
for _, p := range ctx.Predictions[1:] {
if p.DaysUntil < earliest.DaysUntil {
earliest = p
}
}
sb.WriteString(fmt.Sprintf("⏰ Nearest: %s in %.0f days\n", earliest.Event, earliest.DaysUntil))
}
return sb.String()
}
// hasGrowingTrend checks if any metric trend is concerning
func hasGrowingTrend(r ResourceContext) bool {
for _, t := range r.Trends {
if t.Direction == TrendGrowing && t.RatePerDay > 1 {
return true
}
}
return false
}
// formatResourceType converts internal type to display label
func formatResourceType(t string) string {
switch t {
case "node":
return "Node"
case "vm":
return "VM"
case "container":
return "Container"
case "storage":
return "Storage"
case "docker_host":
return "Docker Host"
case "docker_container":
return "Docker Container"
case "host":
return "Host"
default:
return strings.Title(t)
}
}
// formatDuration formats a duration in human-readable form
func formatDuration(d time.Duration) string {
if d < time.Minute {
return fmt.Sprintf("%ds", int(d.Seconds()))
}
if d < time.Hour {
return fmt.Sprintf("%dm", int(d.Minutes()))
}
if d < 24*time.Hour {
hours := int(d.Hours())
mins := int(d.Minutes()) % 60
if mins > 0 {
return fmt.Sprintf("%dh%dm", hours, mins)
}
return fmt.Sprintf("%dh", hours)
}
days := int(d.Hours() / 24)
hours := int(d.Hours()) % 24
if hours > 0 {
return fmt.Sprintf("%dd%dh", days, hours)
}
return fmt.Sprintf("%dd", days)
}
// FormatBackupStatus creates a human-readable backup status
func FormatBackupStatus(lastBackup time.Time) string {
if lastBackup.IsZero() {
return "never"
}
age := time.Since(lastBackup)
if age < 24*time.Hour {
return fmt.Sprintf("%.0fh ago", age.Hours())
}
days := age.Hours() / 24
return fmt.Sprintf("%.0fd ago", days)
}
// FormatNodeForContext creates context for a Proxmox node
func FormatNodeForContext(node models.Node, trends map[string]Trend) ResourceContext {
// Calculate memory percentage
memPct := 0.0
if node.Memory.Total > 0 {
memPct = float64(node.Memory.Used) / float64(node.Memory.Total) * 100
}
ctx := ResourceContext{
ResourceID: node.ID,
ResourceType: "node",
ResourceName: node.Name,
CurrentCPU: node.CPU * 100, // Convert from 0-1 to percentage
CurrentMemory: memPct,
Status: node.Status,
Uptime: time.Duration(node.Uptime) * time.Second,
Trends: trends,
}
return ctx
}
// FormatGuestForContext creates context for a VM or container
func FormatGuestForContext(
id, name, node, guestType, status string,
cpu, memUsage, diskUsage float64,
uptime int64,
lastBackup time.Time,
trends map[string]Trend,
) ResourceContext {
ctx := ResourceContext{
ResourceID: id,
ResourceType: guestType,
ResourceName: name,
Node: node,
CurrentCPU: cpu * 100, // Convert from 0-1 to percentage
CurrentMemory: memUsage * 100,
CurrentDisk: diskUsage * 100,
Status: status,
Uptime: time.Duration(uptime) * time.Second,
Trends: trends,
}
return ctx
}
// FormatStorageForContext creates context for storage
func FormatStorageForContext(storage models.Storage, trends map[string]Trend) ResourceContext {
usagePct := storage.Usage
if usagePct == 0 && storage.Total > 0 {
usagePct = float64(storage.Used) / float64(storage.Total) * 100
}
ctx := ResourceContext{
ResourceID: storage.ID,
ResourceType: "storage",
ResourceName: storage.Name,
Node: storage.Node,
CurrentDisk: usagePct,
Status: storage.Status,
Trends: trends,
}
return ctx
}


@ -0,0 +1,327 @@
package context
import (
"math"
"sort"
"time"
)
// ComputeTrend calculates trend from historical data points.
// This is the core function that transforms raw metrics into meaningful insights.
func ComputeTrend(points []MetricPoint, metricName string, period time.Duration) Trend {
trend := Trend{
Metric: metricName,
Direction: TrendStable,
Period: period,
DataPoints: len(points),
}
if len(points) < 2 {
trend.Confidence = 0
return trend
}
// Sort by timestamp to ensure correct order
sorted := make([]MetricPoint, len(points))
copy(sorted, points)
sort.Slice(sorted, func(i, j int) bool {
return sorted[i].Timestamp.Before(sorted[j].Timestamp)
})
// Calculate basic statistics
stats := computeStats(sorted)
trend.Average = stats.Mean
trend.Min = stats.Min
trend.Max = stats.Max
trend.StdDev = stats.StdDev
trend.Current = sorted[len(sorted)-1].Value
// Perform linear regression to get slope and fit quality
regression := linearRegression(sorted)
trend.Confidence = regression.R2
// Convert slope from "per second" to "per hour" and "per day"
// Slope is in units/second
trend.RatePerHour = regression.Slope * 3600
trend.RatePerDay = regression.Slope * 86400
// Classify the trend direction
trend.Direction = classifyTrend(regression.Slope, stats.Mean, stats.StdDev)
return trend
}
// computeStats calculates basic statistics for a set of metric points
func computeStats(points []MetricPoint) Stats {
if len(points) == 0 {
return Stats{}
}
stats := Stats{
Count: len(points),
Min: points[0].Value,
Max: points[0].Value,
}
for _, p := range points {
stats.Sum += p.Value
if p.Value < stats.Min {
stats.Min = p.Value
}
if p.Value > stats.Max {
stats.Max = p.Value
}
}
stats.Mean = stats.Sum / float64(stats.Count)
// Calculate standard deviation
var sumSquares float64
for _, p := range points {
diff := p.Value - stats.Mean
sumSquares += diff * diff
}
stats.StdDev = math.Sqrt(sumSquares / float64(stats.Count))
return stats
}
// linearRegression performs simple linear regression on time-series data.
// Returns slope (change per second), intercept, and R² (goodness of fit).
func linearRegression(points []MetricPoint) LinearRegressionResult {
if len(points) < 2 {
return LinearRegressionResult{}
}
n := float64(len(points))
// Use time relative to first point for numerical stability
baseTime := points[0].Timestamp
var sumX, sumY, sumXY, sumX2, sumY2 float64
for _, p := range points {
x := p.Timestamp.Sub(baseTime).Seconds() // seconds since start
y := p.Value
sumX += x
sumY += y
sumXY += x * y
sumX2 += x * x
sumY2 += y * y
}
// Calculate slope and intercept using least squares
denominator := n*sumX2 - sumX*sumX
if math.Abs(denominator) < 1e-10 {
// All x values are the same (no time span)
return LinearRegressionResult{R2: 0}
}
slope := (n*sumXY - sumX*sumY) / denominator
intercept := (sumY - slope*sumX) / n
// Calculate R² (coefficient of determination)
meanY := sumY / n
var ssRes, ssTot float64 // Sum of squares residual and total
for _, p := range points {
x := p.Timestamp.Sub(baseTime).Seconds()
yPred := slope*x + intercept
ssRes += (p.Value - yPred) * (p.Value - yPred)
ssTot += (p.Value - meanY) * (p.Value - meanY)
}
r2 := 0.0
if ssTot > 0 {
r2 = 1 - (ssRes / ssTot)
}
// Clamp R² to [0, 1] (can be negative for very bad fits)
if r2 < 0 {
r2 = 0
}
return LinearRegressionResult{
Slope: slope,
Intercept: intercept,
R2: r2,
}
}
// classifyTrend determines the trend direction based on slope and statistics.
// We normalize the slope relative to the metric's magnitude to avoid
// false positives on high-value metrics.
func classifyTrend(slopePerSecond, mean, stdDev float64) TrendDirection {
// If there's no significant variation, it's stable
if stdDev < 0.01 && math.Abs(slopePerSecond) < 1e-10 {
return TrendStable
}
// If standard deviation is high relative to mean, it's volatile
if mean > 0 && stdDev/mean > 0.3 {
return TrendVolatile
}
// Convert slope to hourly rate for easier reasoning
hourlyRate := slopePerSecond * 3600
// Determine significance threshold based on the metric's scale
// For percentage metrics (0-100), we care about ~0.1% per hour (~2.4% per day)
// This catches slow-growing issues before they become critical
// For absolute metrics, we care about ~0.5% of mean per hour
threshold := 0.1 // Default threshold for percentage metrics
if mean > 100 {
// For larger absolute values, use relative threshold
threshold = mean * 0.005
}
// Check if the hourly change is significant
if hourlyRate > threshold {
return TrendGrowing
}
if hourlyRate < -threshold {
return TrendDeclining
}
return TrendStable
}
// ComputePercentiles calculates percentile values from a sorted slice of points
func ComputePercentiles(points []MetricPoint, percentiles ...int) map[int]float64 {
result := make(map[int]float64)
if len(points) == 0 {
return result
}
// Extract values and sort
values := make([]float64, len(points))
for i, p := range points {
values[i] = p.Value
}
sort.Float64s(values)
for _, p := range percentiles {
if p < 0 || p > 100 {
continue
}
// Calculate index for percentile
idx := float64(p) / 100.0 * float64(len(values)-1)
lower := int(math.Floor(idx))
upper := int(math.Ceil(idx))
if lower >= len(values) {
lower = len(values) - 1
}
if upper >= len(values) {
upper = len(values) - 1
}
if lower == upper {
result[p] = values[lower]
} else {
// Linear interpolation between adjacent values
frac := idx - float64(lower)
result[p] = values[lower]*(1-frac) + values[upper]*frac
}
}
return result
}
// TrendSummary generates a human-readable summary of a trend
func TrendSummary(t Trend) string {
if t.DataPoints < 2 {
return "insufficient data"
}
directionStr := ""
switch t.Direction {
case TrendGrowing:
directionStr = "growing"
case TrendDeclining:
directionStr = "declining"
case TrendVolatile:
directionStr = "volatile"
case TrendStable:
directionStr = "stable"
}
// Format rate based on magnitude
rateStr := ""
if t.Direction == TrendGrowing || t.Direction == TrendDeclining {
absRate := math.Abs(t.RatePerDay)
if absRate > 1 {
rateStr = formatFloat(absRate, 1) + "/day"
} else {
rateStr = formatFloat(math.Abs(t.RatePerHour), 2) + "/hr"
}
}
if rateStr != "" {
return directionStr + " " + rateStr
}
return directionStr
}
// formatFloat formats a float with the given precision, trimming trailing zeros
func formatFloat(v float64, precision int) string {
return trimTrailingZeros(floatToString(v, precision))
}
func floatToString(v float64, precision int) string {
	// Handle sign and rounding carry explicitly: scaling first avoids the
	// "0.96 at precision 1 -> 0.10" bug that per-digit rounding produces.
	neg := v < 0
	if neg {
		v = -v
	}
	mult := math.Pow(10, float64(precision))
	scaled := int(math.Round(v * mult))
	intPart := scaled / int(mult)
	fracPart := scaled % int(mult)
	s := intToString(intPart)
	if precision > 0 {
		s += "." + padLeft(intToString(fracPart), precision, '0')
	}
	if neg && scaled != 0 {
		s = "-" + s
	}
	return s
}
func intToString(i int) string {
if i < 0 {
return "-" + intToString(-i)
}
if i < 10 {
return string(rune('0' + i))
}
return intToString(i/10) + string(rune('0'+i%10))
}
func padLeft(s string, length int, pad rune) string {
for len(s) < length {
s = string(pad) + s
}
return s
}
func trimTrailingZeros(s string) string {
if s == "" {
return s
}
// Find decimal point
dotIdx := -1
for i, c := range s {
if c == '.' {
dotIdx = i
break
}
}
if dotIdx == -1 {
return s // No decimal point
}
// Trim trailing zeros after decimal
end := len(s)
for end > dotIdx+1 && s[end-1] == '0' {
end--
}
// Also trim decimal if nothing after it
if end == dotIdx+1 {
end = dotIdx
}
return s[:end]
}


@ -0,0 +1,250 @@
package context
import (
"testing"
"time"
)
func TestComputeTrend_Growing(t *testing.T) {
// Create growing data (10% per day)
now := time.Now()
points := make([]MetricPoint, 24)
for i := 0; i < 24; i++ {
// 10% per day = ~0.417% per hour
points[i] = MetricPoint{
Value: 50 + float64(i)*0.417,
Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
}
}
trend := ComputeTrend(points, "memory", 24*time.Hour)
if trend.Direction != TrendGrowing {
t.Errorf("Expected TrendGrowing, got %s", trend.Direction)
}
// Rate should be ~10% per day
if trend.RatePerDay < 8 || trend.RatePerDay > 12 {
t.Errorf("Expected rate ~10/day, got %.2f", trend.RatePerDay)
}
if trend.DataPoints != 24 {
t.Errorf("Expected 24 data points, got %d", trend.DataPoints)
}
}
func TestComputeTrend_Stable(t *testing.T) {
// Create stable data with small fluctuations
now := time.Now()
points := make([]MetricPoint, 24)
for i := 0; i < 24; i++ {
// Small random-looking variation around 50%, but no trend
offset := float64(i%3 - 1) * 0.2
points[i] = MetricPoint{
Value: 50 + offset,
Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
}
}
trend := ComputeTrend(points, "cpu", 24*time.Hour)
if trend.Direction != TrendStable {
t.Errorf("Expected TrendStable, got %s (rate: %.4f/hr)", trend.Direction, trend.RatePerHour)
}
}
func TestComputeTrend_Declining(t *testing.T) {
// Create declining data
now := time.Now()
points := make([]MetricPoint, 24)
for i := 0; i < 24; i++ {
points[i] = MetricPoint{
Value: 80 - float64(i)*0.5, // -12% per day
Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
}
}
trend := ComputeTrend(points, "disk", 24*time.Hour)
if trend.Direction != TrendDeclining {
t.Errorf("Expected TrendDeclining, got %s", trend.Direction)
}
}
func TestComputeTrend_Volatile(t *testing.T) {
// Create volatile data with high variance
now := time.Now()
points := make([]MetricPoint, 24)
for i := 0; i < 24; i++ {
// Alternating high/low values
value := 50.0
if i%2 == 0 {
value = 80.0
} else {
value = 20.0
}
points[i] = MetricPoint{
Value: value,
Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
}
}
trend := ComputeTrend(points, "cpu", 24*time.Hour)
if trend.Direction != TrendVolatile {
t.Errorf("Expected TrendVolatile, got %s (stddev: %.2f, mean: %.2f)",
trend.Direction, trend.StdDev, trend.Average)
}
}
func TestComputeTrend_InsufficientData(t *testing.T) {
// Only one data point
points := []MetricPoint{
{Value: 50, Timestamp: time.Now()},
}
trend := ComputeTrend(points, "memory", 24*time.Hour)
if trend.Confidence != 0 {
t.Errorf("Expected 0 confidence with insufficient data, got %.2f", trend.Confidence)
}
}
func TestLinearRegression_Perfect(t *testing.T) {
// Perfect linear data: y = 2x + 10
now := time.Now()
points := make([]MetricPoint, 10)
for i := 0; i < 10; i++ {
points[i] = MetricPoint{
Value: 10 + float64(i)*2,
Timestamp: now.Add(time.Duration(i) * time.Second),
}
}
result := linearRegression(points)
// Slope should be 2 per second
if result.Slope < 1.9 || result.Slope > 2.1 {
t.Errorf("Expected slope ~2, got %.4f", result.Slope)
}
// R² should be 1 (perfect fit)
if result.R2 < 0.99 {
t.Errorf("Expected R² ~1, got %.4f", result.R2)
}
}
func TestComputePercentiles(t *testing.T) {
now := time.Now()
// Create 100 points with values 1-100
points := make([]MetricPoint, 100)
for i := 0; i < 100; i++ {
points[i] = MetricPoint{
Value: float64(i + 1),
Timestamp: now.Add(time.Duration(i) * time.Second),
}
}
percentiles := ComputePercentiles(points, 5, 50, 95)
// P5 should be ~5
if percentiles[5] < 4 || percentiles[5] > 6 {
t.Errorf("Expected P5 ~5, got %.2f", percentiles[5])
}
// P50 should be ~50
if percentiles[50] < 49 || percentiles[50] > 51 {
t.Errorf("Expected P50 ~50, got %.2f", percentiles[50])
}
// P95 should be ~95
if percentiles[95] < 94 || percentiles[95] > 96 {
t.Errorf("Expected P95 ~95, got %.2f", percentiles[95])
}
}
func TestTrendSummary(t *testing.T) {
tests := []struct {
name string
trend Trend
expected string
}{
{
name: "growing fast",
trend: Trend{
Direction: TrendGrowing,
RatePerDay: 5.5,
RatePerHour: 0.23,
DataPoints: 24,
},
expected: "growing 5.5/day",
},
{
name: "growing slow",
trend: Trend{
Direction: TrendGrowing,
RatePerDay: 0.5,
RatePerHour: 0.02,
DataPoints: 24,
},
expected: "growing 0.02/hr",
},
{
name: "stable",
trend: Trend{
Direction: TrendStable,
DataPoints: 24,
},
expected: "stable",
},
{
name: "volatile",
trend: Trend{
Direction: TrendVolatile,
DataPoints: 24,
},
expected: "volatile",
},
{
name: "insufficient data",
trend: Trend{
DataPoints: 1,
},
expected: "insufficient data",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
result := TrendSummary(tt.trend)
if result != tt.expected {
t.Errorf("Expected %q, got %q", tt.expected, result)
}
})
}
}
func TestComputeStats(t *testing.T) {
points := []MetricPoint{
{Value: 10},
{Value: 20},
{Value: 30},
{Value: 40},
{Value: 50},
}
stats := computeStats(points)
if stats.Count != 5 {
t.Errorf("Expected count 5, got %d", stats.Count)
}
if stats.Min != 10 {
t.Errorf("Expected min 10, got %.2f", stats.Min)
}
if stats.Max != 50 {
t.Errorf("Expected max 50, got %.2f", stats.Max)
}
if stats.Mean != 30 {
t.Errorf("Expected mean 30, got %.2f", stats.Mean)
}
}
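A minimal sketch of the summary statistics `computeStats` is expected to produce for the `Stats` type. The test above does not assert `StdDev`, so the population-variance choice here is an assumption — the real implementation may use sample (n-1) variance:

```go
package main

import (
	"fmt"
	"math"
)

// stats computes count, min, max, mean, and population standard deviation
// in a single pass over the values plus one pass for the variance.
func stats(values []float64) (count int, min, max, mean, stddev float64) {
	count = len(values)
	if count == 0 {
		return
	}
	min, max = values[0], values[0]
	var sum float64
	for _, v := range values {
		if v < min {
			min = v
		}
		if v > max {
			max = v
		}
		sum += v
	}
	mean = sum / float64(count)
	var ss float64
	for _, v := range values {
		ss += (v - mean) * (v - mean)
	}
	stddev = math.Sqrt(ss / float64(count))
	return
}

func main() {
	c, mn, mx, mean, sd := stats([]float64{10, 20, 30, 40, 50})
	fmt.Println(c, mn, mx, mean, sd)
}
```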


@@ -0,0 +1,180 @@
// Package context provides AI context building with historical data integration.
// This package transforms raw metrics and state into meaningful, time-aware context
// that differentiates Pulse AI from stateless AI assistants.
package context
import (
"time"
"github.com/rcourtman/pulse-go-rewrite/internal/types"
)
// MetricPoint is an alias for the shared type
type MetricPoint = types.MetricPoint
// TrendDirection indicates whether a metric is growing, stable, or declining
type TrendDirection string
const (
TrendStable TrendDirection = "stable" // No significant change
TrendGrowing TrendDirection = "growing" // Increasing over time
TrendDeclining TrendDirection = "declining" // Decreasing over time
TrendVolatile TrendDirection = "volatile" // Fluctuating significantly
)
// Trend represents the direction and rate of change for a metric
type Trend struct {
Metric string // Name of the metric (cpu, memory, disk)
Direction TrendDirection // Overall direction
RatePerHour float64 // Change per hour (in metric units, e.g., percentage points)
RatePerDay float64 // Change per day
Current float64 // Most recent value
Average float64 // Average over the period
Min float64 // Minimum value
Max float64 // Maximum value
StdDev float64 // Standard deviation
DataPoints int // Number of data points used
Period time.Duration // Time period analyzed
Confidence float64 // 0-1 confidence based on data quality (R² for linear fit)
}
// Baseline represents learned "normal" behavior for a metric
type Baseline struct {
Metric string // Name of the metric
Mean float64 // Average value
StdDev float64 // Standard deviation
P5 float64 // 5th percentile (low boundary)
P50 float64 // Median
P95 float64 // 95th percentile (high boundary)
Min float64 // Observed minimum
Max float64 // Observed maximum
SampleCount int // Number of samples used
LearnedAt time.Time // When baseline was computed
}
// Anomaly represents a detected deviation from normal behavior
type Anomaly struct {
Metric string // Which metric is anomalous
Current float64 // Current value
Expected float64 // Expected value (baseline mean)
Deviation float64 // Number of standard deviations from mean
Severity string // "low", "medium", "high", "critical"
Since time.Time // When the anomaly started (if known)
Description string // Human-readable description
}
// Prediction represents a forecasted future event
type Prediction struct {
ResourceID string // Which resource this prediction is for
Metric string // Which metric
Event string // Type of predicted event (capacity_full, oom, etc.)
ETA time.Time // When the event is predicted to occur
DaysUntil float64 // Days until event
Confidence float64 // 0-1 confidence level
Basis string // Explanation of how prediction was made
GrowthRate float64 // Rate of change used for projection
CurrentPct float64 // Current usage percentage
}
// Change represents a detected configuration or state change
type Change struct {
ResourceID string // Which resource changed
ResourceName string // Display name
ChangeType ChangeType // Type of change
Before interface{} // Previous value (nil for creation)
After interface{} // New value (nil for deletion)
DetectedAt time.Time // When change was detected
Description string // Human-readable description
}
// ChangeType categorizes types of changes
type ChangeType string
const (
ChangeCreated ChangeType = "created" // New resource appeared
ChangeDeleted ChangeType = "deleted" // Resource disappeared
ChangeConfig ChangeType = "config" // Configuration change (RAM, CPU)
ChangeStatus ChangeType = "status" // Status change (started, stopped)
ChangeMigrated ChangeType = "migrated" // Moved to different node
ChangePerformance ChangeType = "performance" // Significant performance shift
)
// ResourceTrends contains all trend data for a single resource
type ResourceTrends struct {
ResourceID string // Unique identifier
ResourceType string // node, vm, container, storage, docker_host
ResourceName string // Display name
Trends map[string]Trend // Metric name -> trend data
DataAvailable bool // Whether we have historical data for this resource
OldestData time.Time // Timestamp of oldest data point
NewestData time.Time // Timestamp of newest data point
}
// ResourceContext contains all context for a single resource
type ResourceContext struct {
ResourceID string
ResourceType string // "node", "vm", "container", "storage", "docker_host"
ResourceName string
Node string // Parent node (for guests)
// Current state (point-in-time)
CurrentCPU float64
CurrentMemory float64
CurrentDisk float64
Status string
Uptime time.Duration
// Historical analysis
Trends map[string]Trend // metric -> trend (24h and 7d)
Baselines map[string]Baseline // metric -> baseline
Anomalies []Anomaly // Current anomalies
// Predictions
Predictions []Prediction
// Operational memory
UserNotes []string // User-provided annotations
PastIssues []string // Summary of past findings
LastRemediation string // What was done last time
RecentChanges []Change // Recent configuration changes
}
// InfrastructureContext contains summarized context for the entire infrastructure
type InfrastructureContext struct {
// Timestamp of this context snapshot
GeneratedAt time.Time
// Summary statistics
TotalResources int
ResourcesWithData int // Resources with historical data available
// Categorized resources with their context
Nodes []ResourceContext
VMs []ResourceContext
Containers []ResourceContext
Storage []ResourceContext
DockerHosts []ResourceContext
Hosts []ResourceContext
// Global insights
Anomalies []Anomaly // Cross-infrastructure anomalies
Predictions []Prediction // Capacity and failure predictions
Changes []Change // Recent changes across infrastructure
}
// Stats contains summary statistics for a metric
type Stats struct {
Count int
Min float64
Max float64
Sum float64
Mean float64
StdDev float64
}
// LinearRegressionResult contains the results of linear regression
type LinearRegressionResult struct {
Slope float64 // Rate of change per second
Intercept float64 // Y-intercept
R2 float64 // Coefficient of determination (0-1)
}

internal/ai/cost/store.go

@@ -0,0 +1,289 @@
package cost
import (
"sort"
"strings"
"sync"
"time"
"github.com/rs/zerolog/log"
)
// UsageEvent represents a single AI provider call for cost/token tracking.
// It intentionally excludes prompt/response content for privacy.
type UsageEvent struct {
Timestamp time.Time `json:"timestamp"`
Provider string `json:"provider"`
RequestModel string `json:"request_model"`
ResponseModel string `json:"response_model,omitempty"`
UseCase string `json:"use_case,omitempty"` // "chat" or "patrol"
InputTokens int `json:"input_tokens,omitempty"`
OutputTokens int `json:"output_tokens,omitempty"`
TargetType string `json:"target_type,omitempty"`
TargetID string `json:"target_id,omitempty"`
FindingID string `json:"finding_id,omitempty"`
}
// Persistence defines the storage contract for usage history.
type Persistence interface {
SaveUsageHistory(events []UsageEvent) error
LoadUsageHistory() ([]UsageEvent, error)
}
// DefaultMaxDays is the default retention window for raw usage events.
const DefaultMaxDays = 90
// Store provides thread-safe usage tracking with optional persistence.
type Store struct {
mu sync.RWMutex
events []UsageEvent
maxDays int
persistence Persistence
// Debounced persistence to avoid frequent disk writes.
saveTimer *time.Timer
savePending bool
saveDebounce time.Duration
}
// NewStore creates a new usage store.
func NewStore(maxDays int) *Store {
if maxDays <= 0 {
maxDays = DefaultMaxDays
}
return &Store{
events: make([]UsageEvent, 0),
maxDays: maxDays,
saveDebounce: 5 * time.Second,
}
}
// SetPersistence sets persistence and loads any existing history.
func (s *Store) SetPersistence(p Persistence) error {
s.mu.Lock()
s.persistence = p
s.mu.Unlock()
if p == nil {
return nil
}
events, err := p.LoadUsageHistory()
if err != nil {
return err
}
s.mu.Lock()
s.events = events
s.trimLocked(time.Now())
s.mu.Unlock()
return nil
}
// Record appends a usage event and schedules persistence.
func (s *Store) Record(event UsageEvent) {
if event.Timestamp.IsZero() {
event.Timestamp = time.Now()
}
s.mu.Lock()
s.events = append(s.events, event)
s.trimLocked(time.Now())
s.scheduleSaveLocked()
s.mu.Unlock()
}
// GetSummary returns a rollup of usage over the last N days.
func (s *Store) GetSummary(days int) Summary {
if days <= 0 {
days = 30
}
now := time.Now()
cutoff := now.AddDate(0, 0, -days)
s.mu.RLock()
events := make([]UsageEvent, 0, len(s.events))
for _, e := range s.events {
if !e.Timestamp.Before(cutoff) {
events = append(events, e)
}
}
s.mu.RUnlock()
type pmKey struct {
provider string
model string
}
pmTotals := make(map[pmKey]*ProviderModelSummary)
dailyTotals := make(map[string]*DailySummary)
var totalInput, totalOutput int64
for _, e := range events {
provider := e.Provider
model := normalizeModel(provider, e.RequestModel, e.ResponseModel)
k := pmKey{provider: provider, model: model}
pm := pmTotals[k]
if pm == nil {
pm = &ProviderModelSummary{Provider: provider, Model: model}
pmTotals[k] = pm
}
pm.InputTokens += int64(e.InputTokens)
pm.OutputTokens += int64(e.OutputTokens)
totalInput += int64(e.InputTokens)
totalOutput += int64(e.OutputTokens)
date := e.Timestamp.Format("2006-01-02")
ds := dailyTotals[date]
if ds == nil {
ds = &DailySummary{Date: date}
dailyTotals[date] = ds
}
ds.InputTokens += int64(e.InputTokens)
ds.OutputTokens += int64(e.OutputTokens)
}
providerModels := make([]ProviderModelSummary, 0, len(pmTotals))
for _, pm := range pmTotals {
pm.TotalTokens = pm.InputTokens + pm.OutputTokens
providerModels = append(providerModels, *pm)
}
sort.Slice(providerModels, func(i, j int) bool {
if providerModels[i].Provider == providerModels[j].Provider {
return providerModels[i].Model < providerModels[j].Model
}
return providerModels[i].Provider < providerModels[j].Provider
})
daily := make([]DailySummary, 0, len(dailyTotals))
for _, ds := range dailyTotals {
ds.TotalTokens = ds.InputTokens + ds.OutputTokens
daily = append(daily, *ds)
}
sort.Slice(daily, func(i, j int) bool {
return daily[i].Date < daily[j].Date
})
totals := ProviderModelSummary{
Provider: "all",
InputTokens: totalInput,
OutputTokens: totalOutput,
TotalTokens: totalInput + totalOutput,
}
return Summary{
Days: days,
ProviderModels: providerModels,
DailyTotals: daily,
Totals: totals,
}
}
// Flush immediately writes any pending changes to persistence.
func (s *Store) Flush() error {
s.mu.Lock()
if s.saveTimer != nil {
s.saveTimer.Stop()
}
s.savePending = false
events := make([]UsageEvent, len(s.events))
copy(events, s.events)
p := s.persistence
s.mu.Unlock()
if p != nil {
return p.SaveUsageHistory(events)
}
return nil
}
func (s *Store) trimLocked(now time.Time) {
if s.maxDays <= 0 {
return
}
cutoff := now.AddDate(0, 0, -s.maxDays)
filtered := s.events[:0]
for _, e := range s.events {
if !e.Timestamp.Before(cutoff) {
filtered = append(filtered, e)
}
}
s.events = filtered
}
func (s *Store) scheduleSaveLocked() {
if s.persistence == nil {
return
}
if s.saveTimer != nil {
s.saveTimer.Stop()
}
s.savePending = true
s.saveTimer = time.AfterFunc(s.saveDebounce, func() {
s.mu.Lock()
if !s.savePending {
s.mu.Unlock()
return
}
s.savePending = false
events := make([]UsageEvent, len(s.events))
copy(events, s.events)
p := s.persistence
s.mu.Unlock()
if p != nil {
if err := p.SaveUsageHistory(events); err != nil {
log.Error().Err(err).Msg("Failed to save AI usage history")
}
}
})
}
func normalizeModel(provider, requestModel, responseModel string) string {
if requestModel != "" {
parts := strings.SplitN(requestModel, ":", 2)
if len(parts) == 2 && parts[0] == provider {
return parts[1]
}
return requestModel
}
if responseModel != "" {
parts := strings.SplitN(responseModel, ":", 2)
if len(parts) == 2 && parts[0] == provider {
return parts[1]
}
return responseModel
}
return ""
}
// ProviderModelSummary is a rollup for a provider/model pair.
type ProviderModelSummary struct {
Provider string `json:"provider"`
Model string `json:"model"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
TotalTokens int64 `json:"total_tokens"`
}
// DailySummary is a rollup for a single day across all providers.
type DailySummary struct {
Date string `json:"date"`
InputTokens int64 `json:"input_tokens"`
OutputTokens int64 `json:"output_tokens"`
TotalTokens int64 `json:"total_tokens"`
}
// Summary is returned by the cost summary API.
type Summary struct {
Days int `json:"days"`
ProviderModels []ProviderModelSummary `json:"provider_models"`
DailyTotals []DailySummary `json:"daily_totals"`
Totals ProviderModelSummary `json:"totals"`
}
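The store deliberately tracks tokens only; turning a `Summary` into dollars is left to the caller. A hedged sketch of how a consumer might do that — the `rate` values below are made-up placeholders, not real provider prices:

```go
package main

import "fmt"

// rate holds hypothetical per-million-token prices (placeholders only).
type rate struct{ inPerM, outPerM float64 }

// costUSD applies per-million-token rates to a token rollup, e.g. the
// InputTokens/OutputTokens fields of a ProviderModelSummary.
func costUSD(inputTokens, outputTokens int64, r rate) float64 {
	return float64(inputTokens)/1e6*r.inPerM + float64(outputTokens)/1e6*r.outPerM
}

func main() {
	r := rate{inPerM: 3.0, outPerM: 15.0} // placeholder rates
	fmt.Printf("$%.4f\n", costUSD(110, 55, r))
}
```

Keeping pricing out of the store means stale price tables never corrupt the persisted history.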


@@ -0,0 +1,113 @@
package cost
import (
"testing"
"time"
)
func TestSummaryGroupsByProviderModelAndDailyTotals(t *testing.T) {
store := NewStore(90)
now := time.Now()
day1 := now.Add(-24 * time.Hour)
day2 := now.Add(-48 * time.Hour)
store.Record(UsageEvent{
Timestamp: day1,
Provider: "openai",
RequestModel: "openai:gpt-4o",
InputTokens: 100,
OutputTokens: 50,
UseCase: "chat",
})
store.Record(UsageEvent{
Timestamp: day1,
Provider: "openai",
RequestModel: "openai:gpt-4o",
InputTokens: 10,
OutputTokens: 5,
UseCase: "chat",
})
store.Record(UsageEvent{
Timestamp: day2,
Provider: "openai",
RequestModel: "openai:gpt-4o-mini",
InputTokens: 20,
OutputTokens: 10,
UseCase: "patrol",
})
store.Record(UsageEvent{
Timestamp: now,
Provider: "anthropic",
RequestModel: "anthropic:claude-opus-4-5-20251101",
InputTokens: 200,
OutputTokens: 100,
UseCase: "chat",
})
summary := store.GetSummary(3)
if len(summary.ProviderModels) != 3 {
t.Fatalf("expected 3 provider models, got %d", len(summary.ProviderModels))
}
type key struct{ provider, model string }
got := make(map[key]ProviderModelSummary)
for _, pm := range summary.ProviderModels {
got[key{pm.Provider, pm.Model}] = pm
}
openaiGpt4o := got[key{"openai", "gpt-4o"}]
if openaiGpt4o.InputTokens != 110 || openaiGpt4o.OutputTokens != 55 {
t.Fatalf("openai gpt-4o tokens wrong: %+v", openaiGpt4o)
}
openaiMini := got[key{"openai", "gpt-4o-mini"}]
if openaiMini.InputTokens != 20 || openaiMini.OutputTokens != 10 {
t.Fatalf("openai gpt-4o-mini tokens wrong: %+v", openaiMini)
}
anthropicOpus := got[key{"anthropic", "claude-opus-4-5-20251101"}]
if anthropicOpus.InputTokens != 200 || anthropicOpus.OutputTokens != 100 {
t.Fatalf("anthropic opus tokens wrong: %+v", anthropicOpus)
}
// Daily totals across all providers.
dailyGot := make(map[string]DailySummary)
for _, d := range summary.DailyTotals {
dailyGot[d.Date] = d
}
d1Key := day1.Format("2006-01-02")
if dailyGot[d1Key].InputTokens != 110 || dailyGot[d1Key].OutputTokens != 55 {
t.Fatalf("daily totals for %s wrong: %+v", d1Key, dailyGot[d1Key])
}
d2Key := day2.Format("2006-01-02")
if dailyGot[d2Key].InputTokens != 20 || dailyGot[d2Key].OutputTokens != 10 {
t.Fatalf("daily totals for %s wrong: %+v", d2Key, dailyGot[d2Key])
}
todayKey := now.Format("2006-01-02")
if dailyGot[todayKey].InputTokens != 200 || dailyGot[todayKey].OutputTokens != 100 {
t.Fatalf("daily totals for %s wrong: %+v", todayKey, dailyGot[todayKey])
}
}
func TestRetentionTrimsOldEvents(t *testing.T) {
store := NewStore(1)
old := time.Now().Add(-48 * time.Hour)
store.Record(UsageEvent{
Timestamp: old,
Provider: "openai",
RequestModel: "openai:gpt-4o",
InputTokens: 10,
OutputTokens: 10,
})
summary := store.GetSummary(7)
if len(summary.ProviderModels) != 0 {
t.Fatalf("expected old event to be trimmed, got %+v", summary.ProviderModels)
}
}


@@ -0,0 +1,61 @@
package ai
import (
"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
"github.com/rcourtman/pulse-go-rewrite/internal/config"
)
// CostPersistenceAdapter bridges ConfigPersistence to cost.Persistence.
type CostPersistenceAdapter struct {
config *config.ConfigPersistence
}
// NewCostPersistenceAdapter creates a new adapter.
func NewCostPersistenceAdapter(cfg *config.ConfigPersistence) *CostPersistenceAdapter {
return &CostPersistenceAdapter{config: cfg}
}
// SaveUsageHistory saves usage events to disk via ConfigPersistence.
func (a *CostPersistenceAdapter) SaveUsageHistory(events []cost.UsageEvent) error {
records := make([]config.AIUsageEventRecord, len(events))
for i, e := range events {
records[i] = config.AIUsageEventRecord{
Timestamp: e.Timestamp,
Provider: e.Provider,
RequestModel: e.RequestModel,
ResponseModel: e.ResponseModel,
UseCase: e.UseCase,
InputTokens: e.InputTokens,
OutputTokens: e.OutputTokens,
TargetType: e.TargetType,
TargetID: e.TargetID,
FindingID: e.FindingID,
}
}
return a.config.SaveAIUsageHistory(records)
}
// LoadUsageHistory loads usage events from disk via ConfigPersistence.
func (a *CostPersistenceAdapter) LoadUsageHistory() ([]cost.UsageEvent, error) {
data, err := a.config.LoadAIUsageHistory()
if err != nil {
return nil, err
}
events := make([]cost.UsageEvent, len(data.Events))
for i, r := range data.Events {
events[i] = cost.UsageEvent{
Timestamp: r.Timestamp,
Provider: r.Provider,
RequestModel: r.RequestModel,
ResponseModel: r.ResponseModel,
UseCase: r.UseCase,
InputTokens: r.InputTokens,
OutputTokens: r.OutputTokens,
TargetType: r.TargetType,
TargetID: r.TargetID,
FindingID: r.FindingID,
}
}
return events, nil
}


@@ -0,0 +1,85 @@
package ai
import (
"time"
"github.com/rcourtman/pulse-go-rewrite/internal/monitoring"
)
// MetricsHistoryAdapter adapts monitoring.MetricsHistory to the MetricsHistoryProvider interface
// This allows the patrol service to use the monitoring package's metrics history
// without creating a direct package dependency
type MetricsHistoryAdapter struct {
history *monitoring.MetricsHistory
}
// NewMetricsHistoryAdapter creates an adapter for the monitoring.MetricsHistory
func NewMetricsHistoryAdapter(history *monitoring.MetricsHistory) *MetricsHistoryAdapter {
if history == nil {
return nil
}
return &MetricsHistoryAdapter{history: history}
}
// GetNodeMetrics returns historical metrics for a node
func (a *MetricsHistoryAdapter) GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint {
if a.history == nil {
return nil
}
points := a.history.GetNodeMetrics(nodeID, metricType, duration)
return convertMetricPoints(points)
}
// GetGuestMetrics returns historical metrics for a guest
func (a *MetricsHistoryAdapter) GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint {
if a.history == nil {
return nil
}
points := a.history.GetGuestMetrics(guestID, metricType, duration)
return convertMetricPoints(points)
}
// GetAllGuestMetrics returns all metrics for a guest
func (a *MetricsHistoryAdapter) GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint {
if a.history == nil {
return nil
}
metricsMap := a.history.GetAllGuestMetrics(guestID, duration)
return convertMetricsMap(metricsMap)
}
// GetAllStorageMetrics returns all metrics for storage
func (a *MetricsHistoryAdapter) GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint {
if a.history == nil {
return nil
}
metricsMap := a.history.GetAllStorageMetrics(storageID, duration)
return convertMetricsMap(metricsMap)
}
// convertMetricPoints converts from monitoring.MetricPoint to ai.MetricPoint
func convertMetricPoints(points []monitoring.MetricPoint) []MetricPoint {
if points == nil {
return nil
}
result := make([]MetricPoint, len(points))
for i, p := range points {
result[i] = MetricPoint{
Value: p.Value,
Timestamp: p.Timestamp,
}
}
return result
}
// convertMetricsMap converts a map of metric types to their points
func convertMetricsMap(metricsMap map[string][]monitoring.MetricPoint) map[string][]MetricPoint {
if metricsMap == nil {
return nil
}
result := make(map[string][]MetricPoint, len(metricsMap))
for key, points := range metricsMap {
result[key] = convertMetricPoints(points)
}
return result
}


@@ -10,6 +10,7 @@ import (
"sync"
"time"
aicontext "github.com/rcourtman/pulse-go-rewrite/internal/ai/context"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/knowledge"
"github.com/rcourtman/pulse-go-rewrite/internal/models"
"github.com/rs/zerolog/log"
@@ -208,6 +209,7 @@ type PatrolService struct {
config PatrolConfig
findings *FindingsStore
knowledgeStore *knowledge.Store // For per-resource notes in patrol context
metricsHistory MetricsHistoryProvider // For trend analysis and predictions
// Cached thresholds (recalculated when thresholdProvider changes)
thresholds PatrolThresholds
@@ -329,6 +331,15 @@ func (p *PatrolService) SetKnowledgeStore(store *knowledge.Store) {
p.knowledgeStore = store
}
// SetMetricsHistoryProvider sets the metrics history provider for enriched context
// This enables the patrol service to compute trends and predictions based on historical data
func (p *PatrolService) SetMetricsHistoryProvider(provider MetricsHistoryProvider) {
p.mu.Lock()
defer p.mu.Unlock()
p.metricsHistory = provider
log.Info().Msg("AI Patrol: Metrics history provider set for enriched context")
}
// GetConfig returns the current patrol configuration
func (p *PatrolService) GetConfig() PatrolConfig {
p.mu.RLock()
@@ -1441,8 +1452,9 @@ func (p *PatrolService) runAIAnalysis(ctx context.Context, state models.StateSna
return nil, fmt.Errorf("AI service not available")
}
// Build infrastructure summary for the AI
summary := p.buildInfrastructureSummary(state)
// Build enriched infrastructure context with trends and predictions
// Falls back to basic summary if metrics history is not available
summary := p.buildEnrichedContext(state)
if summary == "" {
return nil, nil // Nothing to analyze
}
@@ -1656,7 +1668,7 @@ func (p *PatrolService) buildInfrastructureSummary(state models.StateSnapshot) s
dh.Hostname, dh.Status, len(dh.Containers)))
for _, c := range dh.Containers {
sb.WriteString(fmt.Sprintf(" - %s: State=%s, CPU=%.1f%%, Memory=%.1f%%, Restarts=%d\n",
c.Name, c.State, c.CPUPercent, c.MemoryPercent, c.RestartCount))
c.Name, c.State, c.CPUPercent, c.MemoryPercent, c.RestartCount))
}
}
sb.WriteString("\n")
@@ -1665,6 +1677,141 @@
return sb.String()
}
// buildEnrichedContext creates context with historical trends and predictions
// Falls back to basic summary if metrics history is not available
func (p *PatrolService) buildEnrichedContext(state models.StateSnapshot) string {
p.mu.RLock()
metricsHistory := p.metricsHistory
knowledgeStore := p.knowledgeStore
p.mu.RUnlock()
// If no metrics history, fall back to basic summary
if metricsHistory == nil {
log.Debug().Msg("AI Patrol: No metrics history available, using basic summary")
return p.buildInfrastructureSummary(state)
}
// Build enriched context using the context package
builder := aicontext.NewBuilder().
WithMetricsHistory(&metricsHistoryShim{provider: metricsHistory})
// Add knowledge store if available
if knowledgeStore != nil {
builder = builder.WithKnowledge(&knowledgeShim{store: knowledgeStore})
}
// Build full infrastructure context with trends
infraCtx := builder.BuildForInfrastructure(state)
if infraCtx == nil {
log.Warn().Msg("AI Patrol: Failed to build enriched context, falling back")
return p.buildInfrastructureSummary(state)
}
// Format for AI consumption
formatted := aicontext.FormatInfrastructureContext(infraCtx)
log.Debug().
Int("resources", infraCtx.TotalResources).
Int("predictions", len(infraCtx.Predictions)).
Msg("AI Patrol: Built enriched context with trends")
return formatted
}
// metricsHistoryShim adapts ai.MetricsHistoryProvider to aicontext.MetricsHistoryProvider
type metricsHistoryShim struct {
provider MetricsHistoryProvider
}
func (s *metricsHistoryShim) GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []aicontext.MetricPoint {
if s.provider == nil {
return nil
}
points := s.provider.GetNodeMetrics(nodeID, metricType, duration)
return convertToContextPoints(points)
}
func (s *metricsHistoryShim) GetGuestMetrics(guestID string, metricType string, duration time.Duration) []aicontext.MetricPoint {
if s.provider == nil {
return nil
}
points := s.provider.GetGuestMetrics(guestID, metricType, duration)
return convertToContextPoints(points)
}
func (s *metricsHistoryShim) GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]aicontext.MetricPoint {
if s.provider == nil {
return nil
}
metricsMap := s.provider.GetAllGuestMetrics(guestID, duration)
return convertToContextMetricsMap(metricsMap)
}
func (s *metricsHistoryShim) GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]aicontext.MetricPoint {
if s.provider == nil {
return nil
}
metricsMap := s.provider.GetAllStorageMetrics(storageID, duration)
return convertToContextMetricsMap(metricsMap)
}
// knowledgeShim adapts knowledge.Store to aicontext.KnowledgeProvider
type knowledgeShim struct {
store *knowledge.Store
}
func (k *knowledgeShim) GetNotes(guestID string) []string {
if k.store == nil {
return nil
}
knowledge, err := k.store.GetKnowledge(guestID)
if err != nil || knowledge == nil {
return nil
}
// Extract note contents
var notes []string
for _, note := range knowledge.Notes {
notes = append(notes, note.Content)
}
return notes
}
func (k *knowledgeShim) FormatAllForContext() string {
if k.store == nil {
return ""
}
return k.store.FormatAllForContext()
}
// convertToContextPoints converts ai.MetricPoint to aicontext.MetricPoint.
// Since both are aliases for types.MetricPoint, this is a straightforward element copy.
func convertToContextPoints(points []MetricPoint) []aicontext.MetricPoint {
if points == nil {
return nil
}
// Both types are aliases for types.MetricPoint, so they're compatible
result := make([]aicontext.MetricPoint, len(points))
for i, p := range points {
result[i] = aicontext.MetricPoint{
Value: p.Value,
Timestamp: p.Timestamp,
}
}
return result
}
// convertToContextMetricsMap converts a map of metric points
func convertToContextMetricsMap(metricsMap map[string][]MetricPoint) map[string][]aicontext.MetricPoint {
if metricsMap == nil {
return nil
}
result := make(map[string][]aicontext.MetricPoint, len(metricsMap))
for key, points := range metricsMap {
result[key] = convertToContextPoints(points)
}
return result
}
// buildPatrolPrompt creates the prompt for AI analysis
// Includes user feedback context to prevent re-raising dismissed findings
func (p *PatrolService) buildPatrolPrompt(summary string) string {
@@ -1685,13 +1832,20 @@ func (p *PatrolService) buildPatrolPrompt(summary string) string {
%s
Analyze the above and report any findings using the structured format. Focus on:
- Resources showing high utilization
- Patterns that might indicate problems
- Resources showing high utilization or concerning trends (look for growing indicators)
- Predictions showing resources approaching capacity (look for predictions)
- Anomalies flagged as unusual (look for anomalies)
- Patterns that might indicate problems over time (compare 24h vs 7d trends)
- Missing backups or stale backup schedules
- Unbalanced resource distribution
- Any anomalies or concerns
If everything looks healthy, say so briefly.`, summary)
IMPORTANT: The context includes historical trends (24h and 7d) where available. Use this to provide actionable insights:
- A resource that's "growing 5%%/day" needs proactive attention
- A resource that's "stable" with high usage may just need monitoring
- A "volatile" resource may indicate workload issues
If predictions show a resource will be full within 7 days, flag it as high priority.
If everything looks healthy with stable trends, say so briefly.`, summary)
var contextAdditions strings.Builder


@@ -3,6 +3,7 @@ package ai
import (
"context"
"encoding/base64"
"encoding/json"
"fmt"
"io"
"net/http"
@@ -15,10 +16,12 @@ import (
"github.com/google/uuid"
"github.com/rcourtman/pulse-go-rewrite/internal/agentexec"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/knowledge"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/providers"
"github.com/rcourtman/pulse-go-rewrite/internal/config"
"github.com/rcourtman/pulse-go-rewrite/internal/models"
"github.com/rcourtman/pulse-go-rewrite/internal/types"
"github.com/rs/zerolog/log"
)
@@ -38,6 +41,7 @@ type Service struct {
stateProvider StateProvider
alertProvider AlertProvider
knowledgeStore *knowledge.Store
costStore *cost.Store
resourceProvider ResourceProvider // Unified resource model provider (Phase 2)
patrolService *PatrolService // Background AI monitoring service
metadataProvider MetadataProvider // Enables AI to update resource URLs
@@ -50,12 +54,16 @@ type Service struct {
func NewService(persistence *config.ConfigPersistence, agentServer *agentexec.Server) *Service {
// Initialize knowledge store
var knowledgeStore *knowledge.Store
costStore := cost.NewStore(cost.DefaultMaxDays)
if persistence != nil {
var err error
knowledgeStore, err = knowledge.NewStore(persistence.DataDir())
if err != nil {
log.Warn().Err(err).Msg("Failed to initialize knowledge store")
}
if err := costStore.SetPersistence(NewCostPersistenceAdapter(persistence)); err != nil {
log.Warn().Err(err).Msg("Failed to initialize AI usage cost store")
}
}
return &Service{
@@ -63,6 +71,7 @@ func NewService(persistence *config.ConfigPersistence, agentServer *agentexec.Se
agentServer: agentServer,
policy: agentexec.DefaultPolicy(),
knowledgeStore: knowledgeStore,
costStore: costStore,
}
}
@@ -108,6 +117,28 @@ func (s *Service) GetAIConfig() *config.AIConfig {
return s.cfg
}
// GetCostSummary returns usage rollups for the last N days.
func (s *Service) GetCostSummary(days int) cost.Summary {
s.mu.RLock()
store := s.costStore
s.mu.RUnlock()
if store == nil {
if days <= 0 {
days = 30
}
return cost.Summary{
Days: days,
ProviderModels: []cost.ProviderModelSummary{},
DailyTotals: []cost.DailySummary{},
Totals: cost.ProviderModelSummary{
Provider: "all",
},
}
}
return store.GetSummary(days)
}
// SetPatrolThresholdProvider sets the threshold provider for patrol
// This should be called with an AlertThresholdAdapter to connect patrol to user-configured thresholds
func (s *Service) SetPatrolThresholdProvider(provider ThresholdProvider) {
@@ -120,6 +151,30 @@ func (s *Service) SetPatrolThresholdProvider(provider ThresholdProvider) {
}
}
// MetricsHistoryProvider provides access to historical metrics for trend analysis
// This interface matches the monitoring.MetricsHistory methods we need
type MetricsHistoryProvider interface {
GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint
GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint
GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint
GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint
}
// MetricPoint is an alias for the shared metric point type
type MetricPoint = types.MetricPoint
// SetMetricsHistoryProvider sets the metrics history provider for enriched AI context
// This enables the AI to see trends, anomalies, and predictions based on historical data
func (s *Service) SetMetricsHistoryProvider(provider MetricsHistoryProvider) {
s.mu.RLock()
patrol := s.patrolService
s.mu.RUnlock()
if patrol != nil {
patrol.SetMetricsHistoryProvider(provider)
}
}
// StartPatrol starts the background patrol service
func (s *Service) StartPatrol(ctx context.Context) {
s.mu.RLock()
@ -325,6 +380,40 @@ func extractVMIDFromCommand(command string) (vmid int, requiresOwnerNode bool, f
return 0, false, false
}
// formatApprovalNeededToolResult returns a structured tool result for commands that require approval.
// It is encoded as a marker + JSON so the LLM can reliably detect it.
func formatApprovalNeededToolResult(command, toolID, reason string) string {
payload := map[string]interface{}{
"type": "approval_required",
"command": command,
"tool_id": toolID,
"reason": reason,
"how_to_approve": "Ask the user to click the approval button shown in the UI.",
"do_not_retry": true,
}
b, err := json.Marshal(payload)
if err != nil {
// Fallback to a safe plain-text marker.
return fmt.Sprintf("APPROVAL_REQUIRED: %s", command)
}
return "APPROVAL_REQUIRED: " + string(b)
}
// formatPolicyBlockedToolResult returns a structured tool result for commands blocked by policy.
func formatPolicyBlockedToolResult(command, reason string) string {
payload := map[string]interface{}{
"type": "policy_blocked",
"command": command,
"reason": reason,
"do_not_retry": true,
}
b, err := json.Marshal(payload)
if err != nil {
return fmt.Sprintf("POLICY_BLOCKED: %s", reason)
}
return "POLICY_BLOCKED: " + string(b)
}
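Both markers follow the same `PREFIX: {json}` shape, so a consumer can round-trip them reliably. A simplified stand-alone sketch (the payload struct and helper names here are illustrative, not the production types):

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const approvalPrefix = "APPROVAL_REQUIRED: "

// approvalPayload mirrors the fields emitted by the formatter above.
type approvalPayload struct {
	Type    string `json:"type"`
	Command string `json:"command"`
	ToolID  string `json:"tool_id"`
	Reason  string `json:"reason"`
}

// format builds the marker + JSON string, falling back to plain text
// if marshaling fails.
func format(command, toolID, reason string) string {
	b, err := json.Marshal(approvalPayload{
		Type: "approval_required", Command: command, ToolID: toolID, Reason: reason,
	})
	if err != nil {
		return approvalPrefix + command
	}
	return approvalPrefix + string(b)
}

// parse detects the marker and decodes the JSON payload; ok is false
// for ordinary tool output or the plain-text fallback.
func parse(result string) (approvalPayload, bool) {
	if !strings.HasPrefix(result, approvalPrefix) {
		return approvalPayload{}, false
	}
	var p approvalPayload
	if err := json.Unmarshal([]byte(strings.TrimPrefix(result, approvalPrefix)), &p); err != nil {
		return approvalPayload{}, false
	}
	return p, true
}

func main() {
	msg := format("systemctl restart pveproxy", "tool-1", "requires approval")
	p, ok := parse(msg)
	fmt.Println(ok, p.Command)
}
```

The prefix check keeps detection cheap, while the JSON body carries the structured fields the LLM needs.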
// LoadConfig loads the AI configuration and initializes the provider
func (s *Service) LoadConfig() error {
s.mu.Lock()
@ -343,17 +432,41 @@ func (s *Service) LoadConfig() error {
return nil
}
provider, err := providers.NewFromConfig(cfg)
selectedModel := cfg.GetModel()
selectedProvider, _ := config.ParseModelString(selectedModel)
providerClient, err := providers.NewForModel(cfg, selectedModel)
if err != nil {
log.Warn().Err(err).Msg("Failed to initialize AI provider")
s.provider = nil
return nil // Don't fail startup if provider can't be initialized
// Only fall back to legacy config if no multi-provider credentials are set.
if len(cfg.GetConfiguredProviders()) == 0 && (cfg.Provider != "" || cfg.APIKey != "") {
if legacyClient, legacyErr := providers.NewFromConfig(cfg); legacyErr == nil {
providerClient = legacyClient
selectedProvider = providerClient.Name()
log.Info().
Str("provider", selectedProvider).
Str("model", cfg.GetModel()).
Msg("AI service initialized via legacy config (migration path)")
} else {
log.Warn().Err(legacyErr).Msg("Failed to initialize legacy AI provider")
s.provider = nil
return nil
}
} else {
log.Warn().
Err(err).
Str("selected_model", selectedModel).
Str("selected_provider", selectedProvider).
Strs("configured_providers", cfg.GetConfiguredProviders()).
Msg("AI enabled but selected provider is not configured; check API keys or model selection")
s.provider = nil
return nil
}
}
s.provider = provider
s.provider = providerClient
log.Info().
Str("provider", cfg.Provider).
Str("model", cfg.GetModel()).
Str("provider", selectedProvider).
Str("model", selectedModel).
Bool("autonomous_mode", cfg.AutonomousMode).
Msg("AI service initialized")
@ -400,7 +513,7 @@ func (s *Service) GetDebugContext(req ExecuteRequest) map[string]interface{} {
"hosts": len(state.Hosts),
"pbs_instances": len(state.PBSInstances),
}
// List some VMs/containers for verification
var vmNames []string
for _, vm := range state.VMs {
@ -491,48 +604,48 @@ func isDangerousCommand(cmd string) bool {
"unlink": true,
"shred": true,
// Disk/filesystem destructive operations
"dd": true,
"mkfs": true,
"fdisk": true,
"parted": true,
"wipefs": true,
"sgdisk": true,
"gdisk": true,
"zpool": true, // Allow reads but not modifications
"zfs": true, // Allow reads but not modifications
"lvremove": true,
"vgremove": true,
"pvremove": true,
// System state changes
"reboot": true,
"shutdown": true,
"poweroff": true,
"halt": true,
"init": true,
"systemctl": true, // could stop critical services
"service": true,
// User/permission changes
"chmod": true,
"chown": true,
"useradd": true,
"userdel": true,
"passwd": true,
// Package management
"apt": true,
"apt-get": true,
"dpkg": true,
"yum": true,
"dnf": true,
"pacman": true,
"pip": true,
"npm": true,
// Proxmox destructive
"vzdump": true,
"vzrestore": true,
"pveam": true,
// Network changes
"iptables": true,
"nft": true,
"firewall-cmd": true,
}
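A minimal sketch of how such a deny map is consulted (hypothetical, covering only a few of the entries above; the real `isDangerousCommand` additionally whitelists safe sub-operations):

```go
package main

import (
	"fmt"
	"strings"
)

// dangerous is a small excerpt of the full deny map.
var dangerous = map[string]bool{
	"rm": true, "dd": true, "mkfs": true, "reboot": true, "iptables": true,
}

// isDangerous extracts the first whitespace-separated token and checks
// it against the deny map, mirroring the base-command lookup.
func isDangerous(cmd string) bool {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return false
	}
	return dangerous[fields[0]]
}

func main() {
	fmt.Println(isDangerous("rm -rf /var/lib/vz")) // true
	fmt.Println(isDangerous("df -h"))              // false
}
```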
@ -569,7 +682,7 @@ func isDangerousCommand(cmd string) bool {
}
}
}
// Special case: allow read-only dpkg operations
if baseCmd == "dpkg" {
safeDpkgOps := []string{"-l", "--list", "-L", "--listfiles", "-s", "--status", "-S", "--search", "-p", "--print-avail", "--get-selections"}
for _, safeOp := range safeDpkgOps {
@ -698,14 +811,14 @@ func isReadOnlyCommand(cmd string) bool {
// ConversationMessage represents a message in conversation history
type ConversationMessage struct {
Role string `json:"role"` // "user" or "assistant"
Content string `json:"content"`
}
// ExecuteRequest represents a request to execute an AI prompt
type ExecuteRequest struct {
Prompt string `json:"prompt"`
TargetType string `json:"target_type,omitempty"` // "host", "container", "vm", "node"
TargetID string `json:"target_id,omitempty"`
Context map[string]interface{} `json:"context,omitempty"` // Current metrics, state, etc.
SystemPrompt string `json:"system_prompt,omitempty"` // Override system prompt
@ -717,18 +830,18 @@ type ExecuteRequest struct {
// ExecuteResponse represents the AI's response
type ExecuteResponse struct {
Content string `json:"content"`
Model string `json:"model"`
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
ToolCalls []ToolExecution `json:"tool_calls,omitempty"` // Commands that were executed
}
// ToolExecution represents a tool that was executed during the AI conversation
type ToolExecution struct {
Name string `json:"name"`
Input string `json:"input"` // Human-readable input (e.g., the command)
Output string `json:"output"` // Result of execution
Success bool `json:"success"`
}
@ -786,8 +899,8 @@ type ToolEndData struct {
// ApprovalNeededData is sent when a command needs user approval
type ApprovalNeededData struct {
Command string `json:"command"`
ToolID string `json:"tool_id"` // ID to reference when approving
ToolName string `json:"tool_name"` // "run_command", "read_file", etc.
RunOnHost bool `json:"run_on_host"`
TargetHost string `json:"target_host,omitempty"` // Explicit host to route to
}
@ -799,6 +912,7 @@ func (s *Service) Execute(ctx context.Context, req ExecuteRequest) (*ExecuteResp
defaultProvider := s.provider
agentServer := s.agentServer
cfg := s.cfg
costStore := s.costStore
s.mu.RUnlock()
// Determine the model to use for this request
@ -887,6 +1001,21 @@ Always execute the commands rather than telling the user how to do it.`
return nil, fmt.Errorf("AI request failed: %w", err)
}
if costStore != nil {
costStore.Record(cost.UsageEvent{
Timestamp: time.Now(),
Provider: provider.Name(),
RequestModel: modelString,
ResponseModel: resp.Model,
UseCase: req.UseCase,
InputTokens: resp.InputTokens,
OutputTokens: resp.OutputTokens,
TargetType: req.TargetType,
TargetID: req.TargetID,
FindingID: req.FindingID,
})
}
totalInputTokens += resp.InputTokens
totalOutputTokens += resp.OutputTokens
model = resp.Model
@ -938,6 +1067,7 @@ func (s *Service) ExecuteStream(ctx context.Context, req ExecuteRequest, callbac
defaultProvider := s.provider
agentServer := s.agentServer
cfg := s.cfg
costStore := s.costStore
s.mu.RUnlock()
// Determine the model to use for this request
@ -1058,6 +1188,21 @@ Always execute the commands rather than telling the user how to do it.`
return nil, fmt.Errorf("AI request failed: %w", err)
}
if costStore != nil {
costStore.Record(cost.UsageEvent{
Timestamp: time.Now(),
Provider: provider.Name(),
RequestModel: modelString,
ResponseModel: resp.Model,
UseCase: req.UseCase,
InputTokens: resp.InputTokens,
OutputTokens: resp.OutputTokens,
TargetType: req.TargetType,
TargetID: req.TargetID,
FindingID: req.FindingID,
})
}
log.Debug().Int("iteration", iteration).Msg("AI provider returned successfully")
totalInputTokens += resp.InputTokens
@ -1161,7 +1306,6 @@ Always execute the commands rather than telling the user how to do it.`
}
}
var result string
var execution ToolExecution
@ -1170,7 +1314,14 @@ Always execute the commands rather than telling the user how to do it.`
// We'll break out of the loop after processing all tool calls
// Note: We don't add to toolExecutions here because the approval_needed event
// already tells the frontend to show the approval UI
result = fmt.Sprintf("Awaiting user approval: %s", toolInput)
cmd, _ := tc.Input["command"].(string)
result = formatApprovalNeededToolResult(cmd, tc.ID, "Command requires user approval")
execution = ToolExecution{
Name: tc.Name,
Input: toolInput,
Output: result,
Success: true, // Not an error; awaiting approval
}
} else {
// Stream tool start event
callback(StreamEvent{
@ -1440,15 +1591,11 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
if !s.IsAutonomous() {
decision := s.policy.Evaluate(command)
if decision == agentexec.PolicyBlock {
execution.Output = "Error: This command is blocked by security policy"
execution.Output = formatPolicyBlockedToolResult(command, "This command is blocked by security policy")
return execution.Output, execution
}
if decision == agentexec.PolicyRequireApproval {
// Direct the AI to tell the user about the approval button
execution.Output = fmt.Sprintf("COMMAND_BLOCKED: This command (%s) requires user approval and was NOT executed. "+
"An approval button has been displayed to the user. "+
"DO NOT attempt to run this command again. "+
"Tell the user to click the 'Run' button to execute it.", command)
execution.Output = formatApprovalNeededToolResult(command, tc.ID, "Security policy requires approval")
execution.Success = true // Not an error, just needs approval
return execution.Output, execution
}
@ -1456,7 +1603,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
// Build execution request with proper targeting
execReq := req
// If target_host is explicitly specified by AI, use it for routing
if targetHost != "" {
// Ensure Context map exists
@ -1477,7 +1624,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
Str("command", command).
Msg("AI explicitly specified target_host for command routing")
}
// If run_on_host is true, override the target type to run on host
if runOnHost {
log.Debug().
@ -1576,7 +1723,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
// Build the write command using base64 to safely handle any content
// This avoids issues with special characters, quotes, newlines, etc.
encoded := base64.StdEncoding.EncodeToString([]byte(content))
var command string
if appendMode {
// Append mode: decode and append to file (no backup needed for append)
@ -1591,7 +1738,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
dir := filepath.Dir(path)
tempFile := path + ".pulse-tmp"
backupFile := path + ".bak"
// Build a safe multi-step command:
// - mkdir -p for parent dir
// - if file exists, copy to .bak
@ -1731,7 +1878,7 @@ func (s *Service) getGuestID(req ExecuteRequest) string {
if req.TargetType == "" || req.TargetID == "" {
return ""
}
// For Proxmox targets, include the node info
// Format: instance-node-type-vmid or instance-targetid
return fmt.Sprintf("%s-%s", req.TargetType, req.TargetID)
@ -1811,11 +1958,11 @@ func sanitizeError(err error) error {
if err == nil {
return nil
}
errMsg := err.Error()
// Replace raw TCP connection details with generic message
// e.g., "write tcp 192.168.0.123:7655->192.168.0.134:58004: i/o timeout"
// becomes "connection to agent timed out"
if strings.Contains(errMsg, "i/o timeout") {
if strings.Contains(errMsg, "failed to send command") {
@ -1823,22 +1970,22 @@ func sanitizeError(err error) error {
}
return fmt.Errorf("network timeout - the target may be unreachable")
}
// Replace "write tcp ... connection refused" style errors
if strings.Contains(errMsg, "connection refused") {
return fmt.Errorf("connection refused - the agent may not be running on the target host")
}
// Replace "no such host" errors
if strings.Contains(errMsg, "no such host") {
return fmt.Errorf("host not found - verify the hostname is correct and DNS is working")
}
// Replace "context deadline exceeded" with friendlier message
if strings.Contains(errMsg, "context deadline exceeded") {
return fmt.Errorf("operation timed out - the command may have taken too long")
}
return err
}
@ -1850,7 +1997,7 @@ func (s *Service) executeOnAgent(ctx context.Context, req ExecuteRequest, comman
// Find the appropriate agent using robust routing
agents := s.agentServer.GetConnectedAgents()
// Use the new robust routing logic
routeResult, err := s.routeToAgent(req, command, agents)
if err != nil {
@ -1870,7 +2017,7 @@ func (s *Service) executeOnAgent(ctx context.Context, req ExecuteRequest, comman
}
agentID := routeResult.AgentID
log.Debug().
Str("agent_id", agentID).
Str("agent_hostname", routeResult.AgentHostname).
@ -1952,7 +2099,7 @@ type RunCommandRequest struct {
Command string `json:"command"`
TargetType string `json:"target_type"` // "host", "container", "vm"
TargetID string `json:"target_id"`
RunOnHost bool `json:"run_on_host"` // If true, run on host instead of target
VMID string `json:"vmid,omitempty"`
TargetHost string `json:"target_host,omitempty"` // Explicit host for routing
}
@ -1997,7 +2144,6 @@ func (s *Service) RunCommand(ctx context.Context, req RunCommandRequest) (*RunCo
Msg("RunCommand using explicit target_host for routing")
}
output, err := s.executeOnAgent(ctx, execReq, req.Command)
if err != nil {
return &RunCommandResponse{
@ -2132,7 +2278,6 @@ After install, enable and start the service:
The latest version can be found at: https://api.github.com/repos/rcourtman/Pulse/releases/latest
This is a 3-command job. Don't over-investigate.`
// Add custom context from AI settings (user's infrastructure description)
s.mu.RLock()
cfg := s.cfg
@ -2147,7 +2292,7 @@ This is a 3-command job. Don't over-investigate.`
s.mu.RLock()
hasResourceProvider := s.resourceProvider != nil
s.mu.RUnlock()
if hasResourceProvider {
prompt += s.buildUnifiedResourceContext()
} else {
@ -2194,7 +2339,6 @@ This is a 3-command job. Don't over-investigate.`
}
}
// Add any provided context in a structured way
if len(req.Context) > 0 {
prompt += "\n\n## Current Metrics and State"
@ -2259,39 +2403,39 @@ This is a 3-command job. Don't over-investigate.`
// formatContextKey converts snake_case keys to readable labels
func formatContextKey(key string) string {
replacements := map[string]string{
"guestName": "Guest Name",
"name": "Name",
"type": "Type",
"vmid": "VMID",
"node": "PVE Node (host)",
"guest_node": "PVE Node (host)",
"status": "Status",
"uptime": "Uptime",
"cpu_usage": "CPU Usage",
"cpu_cores": "CPU Cores",
"memory_used": "Memory Used",
"memory_total": "Memory Total",
"memory_usage": "Memory Usage",
"memory_balloon": "Memory Balloon",
"swap_used": "Swap Used",
"swap_total": "Swap Total",
"disk_used": "Disk Used",
"disk_total": "Disk Total",
"disk_usage": "Disk Usage",
"disk_read_rate": "Disk Read Rate",
"disk_write_rate": "Disk Write Rate",
"network_in_rate": "Network In Rate",
"network_out_rate": "Network Out Rate",
"backup_status": "Backup Status",
"last_backup": "Last Backup",
"days_since_backup": "Days Since Backup",
"os_name": "OS Name",
"os_version": "OS Version",
"guest_agent": "Guest Agent",
"ip_addresses": "IP Addresses",
"tags": "Tags",
"user_notes": "User Notes",
"user_annotations": "User Annotations",
}
if label, ok := replacements[key]; ok {
@ -2474,4 +2618,3 @@ func providerDisplayName(provider string) string {
func (s *Service) Reload() error {
return s.LoadConfig()
}


@ -12,13 +12,13 @@ import (
"github.com/rcourtman/pulse-go-rewrite/internal/agentexec"
"github.com/rcourtman/pulse-go-rewrite/internal/ai"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
"github.com/rcourtman/pulse-go-rewrite/internal/ai/providers"
"github.com/rcourtman/pulse-go-rewrite/internal/config"
"github.com/rcourtman/pulse-go-rewrite/internal/utils"
"github.com/rs/zerolog/log"
)
// AISettingsHandler handles AI settings endpoints
type AISettingsHandler struct {
config *config.Config
@ -91,6 +91,11 @@ func (h *AISettingsHandler) SetPatrolRunHistoryPersistence(persistence ai.Patrol
return nil
}
// SetMetricsHistoryProvider sets the metrics history provider for enriched AI context
func (h *AISettingsHandler) SetMetricsHistoryProvider(provider ai.MetricsHistoryProvider) {
h.aiService.SetMetricsHistoryProvider(provider)
}
// StopPatrol stops the background AI patrol service
func (h *AISettingsHandler) StopPatrol() {
h.aiService.StopPatrol()
@ -105,38 +110,38 @@ func (h *AISettingsHandler) GetAlertTriggeredAnalyzer() *ai.AlertTriggeredAnalyz
// API keys are masked for security
type AISettingsResponse struct {
Enabled bool `json:"enabled"`
Provider string `json:"provider"` // DEPRECATED: legacy single provider
APIKeySet bool `json:"api_key_set"` // DEPRECATED: true if legacy API key is configured
Model string `json:"model"`
ChatModel string `json:"chat_model,omitempty"` // Model for interactive chat (empty = use default)
PatrolModel string `json:"patrol_model,omitempty"` // Model for patrol (empty = use default)
BaseURL string `json:"base_url,omitempty"` // DEPRECATED: legacy base URL
Configured bool `json:"configured"` // true if AI is ready to use
AutonomousMode bool `json:"autonomous_mode"` // true if AI can execute without approval
CustomContext string `json:"custom_context"` // user-provided infrastructure context
// OAuth fields for Claude Pro/Max subscription authentication
AuthMethod string `json:"auth_method"` // "api_key" or "oauth"
OAuthConnected bool `json:"oauth_connected"` // true if OAuth tokens are configured
// Patrol settings for token efficiency
PatrolSchedulePreset string `json:"patrol_schedule_preset"` // DEPRECATED: legacy preset
PatrolIntervalMinutes int `json:"patrol_interval_minutes"` // Patrol interval in minutes (0 = disabled)
AlertTriggeredAnalysis bool `json:"alert_triggered_analysis"` // true if AI analyzes when alerts fire
AvailableModels []config.ModelInfo `json:"available_models"` // List of models for current provider
// Multi-provider credentials - shows which providers are configured
AnthropicConfigured bool `json:"anthropic_configured"` // true if Anthropic API key or OAuth is set
OpenAIConfigured bool `json:"openai_configured"` // true if OpenAI API key is set
DeepSeekConfigured bool `json:"deepseek_configured"` // true if DeepSeek API key is set
OllamaConfigured bool `json:"ollama_configured"` // true (always available for attempt)
OllamaBaseURL string `json:"ollama_base_url"` // Ollama server URL
OpenAIBaseURL string `json:"openai_base_url,omitempty"` // Custom OpenAI base URL
ConfiguredProviders []string `json:"configured_providers"` // List of provider names with credentials
}
// AISettingsUpdateRequest is the request body for PUT /api/settings/ai
type AISettingsUpdateRequest struct {
Enabled *bool `json:"enabled,omitempty"`
Provider *string `json:"provider,omitempty"` // DEPRECATED: use model selection instead
APIKey *string `json:"api_key,omitempty"` // DEPRECATED: use per-provider keys
Model *string `json:"model,omitempty"`
ChatModel *string `json:"chat_model,omitempty"` // Model for interactive chat
PatrolModel *string `json:"patrol_model,omitempty"` // Model for background patrol
@ -582,9 +587,9 @@ func (h *AISettingsHandler) HandleListModels(w http.ResponseWriter, r *http.Requ
}
type Response struct {
Models []ModelInfo `json:"models"`
Error string `json:"error,omitempty"`
Cached bool `json:"cached"`
}
models, err := h.aiService.ListModels(ctx)
@ -622,25 +627,25 @@ func (h *AISettingsHandler) HandleListModels(w http.ResponseWriter, r *http.Requ
// AIExecuteRequest is the request body for POST /api/ai/execute
// AIConversationMessage represents a message in conversation history
type AIConversationMessage struct {
Role string `json:"role"` // "user" or "assistant"
Content string `json:"content"`
}
type AIExecuteRequest struct {
Prompt string `json:"prompt"`
TargetType string `json:"target_type,omitempty"` // "host", "container", "vm", "node"
TargetID string `json:"target_id,omitempty"`
Context map[string]interface{} `json:"context,omitempty"` // Current metrics, state, etc.
History []AIConversationMessage `json:"history,omitempty"` // Previous conversation messages
}
// AIExecuteResponse is the response from POST /api/ai/execute
type AIExecuteResponse struct {
Content string `json:"content"`
Model string `json:"model"`
InputTokens int `json:"input_tokens"`
OutputTokens int `json:"output_tokens"`
ToolCalls []ai.ToolExecution `json:"tool_calls,omitempty"` // Commands that were executed
}
// HandleExecute executes an AI prompt (POST /api/ai/execute)
@ -935,7 +940,6 @@ type AIRunCommandRequest struct {
TargetHost string `json:"target_host,omitempty"` // Explicit host for routing
}
// HandleRunCommand executes a single approved command (POST /api/ai/run-command)
func (h *AISettingsHandler) HandleRunCommand(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
@ -957,7 +961,7 @@ func (h *AISettingsHandler) HandleRunCommand(w http.ResponseWriter, r *http.Requ
return
}
log.Debug().Str("body", string(bodyBytes)).Msg("run-command request body")
var req AIRunCommandRequest
if err := json.Unmarshal(bodyBytes, &req); err != nil {
log.Error().Err(err).Str("body", string(bodyBytes)).Msg("Failed to decode JSON body")
@ -2059,7 +2063,7 @@ func (h *AISettingsHandler) HandleAcknowledgeFinding(w http.ResponseWriter, r *h
}
findings := patrol.GetFindings()
// Just acknowledge - don't resolve. Finding stays visible but marked as seen.
// Auto-resolve will remove it when the underlying condition clears.
if !findings.Acknowledge(req.FindingID) {
@ -2126,7 +2130,7 @@ func (h *AISettingsHandler) HandleSnoozeFinding(w http.ResponseWriter, r *http.R
findings := patrol.GetFindings()
duration := time.Duration(req.DurationHours) * time.Hour
if !findings.Snooze(req.FindingID, duration) {
http.Error(w, "Finding not found or already resolved", http.StatusNotFound)
return
@ -2180,7 +2184,7 @@ func (h *AISettingsHandler) HandleResolveFinding(w http.ResponseWriter, r *http.
}
findings := patrol.GetFindings()
// Mark as manually resolved (auto=false since user did it)
if !findings.Resolve(req.FindingID, false) {
http.Error(w, "Finding not found or already resolved", http.StatusNotFound)
@ -2223,8 +2227,8 @@ func (h *AISettingsHandler) HandleDismissFinding(w http.ResponseWriter, r *http.
var req struct {
FindingID string `json:"finding_id"`
Reason string `json:"reason"` // "not_an_issue", "expected_behavior", "will_fix_later"
Note string `json:"note"` // Optional freeform note
}
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
@ -2248,7 +2252,7 @@ func (h *AISettingsHandler) HandleDismissFinding(w http.ResponseWriter, r *http.
}
findings := patrol.GetFindings()
if !findings.Dismiss(req.FindingID, req.Reason, req.Note) {
http.Error(w, "Finding not found", http.StatusNotFound)
return
@ -2303,7 +2307,7 @@ func (h *AISettingsHandler) HandleSuppressFinding(w http.ResponseWriter, r *http
}
findings := patrol.GetFindings()
if !findings.Suppress(req.FindingID) {
http.Error(w, "Finding not found", http.StatusNotFound)
return
@ -2392,6 +2396,40 @@ func (h *AISettingsHandler) HandleGetPatrolRunHistory(w http.ResponseWriter, r *
}
}
// HandleGetAICostSummary returns AI usage rollups (GET /api/ai/cost/summary?days=N).
func (h *AISettingsHandler) HandleGetAICostSummary(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
return
}
// Parse optional days query parameter (default: 30, max: 365)
days := 30
if daysStr := r.URL.Query().Get("days"); daysStr != "" {
var n int
if _, err := fmt.Sscanf(daysStr, "%d", &n); err == nil && n > 0 {
days = n
if days > 365 {
days = 365
}
}
}
var summary cost.Summary
if h.aiService != nil {
summary = h.aiService.GetCostSummary(days)
} else {
summary = cost.Summary{
Days: days,
ProviderModels: []cost.ProviderModelSummary{},
DailyTotals: []cost.DailySummary{},
Totals: cost.ProviderModelSummary{Provider: "all"},
}
}
if err := utils.WriteJSONResponse(w, summary); err != nil {
log.Error().Err(err).Msg("Failed to write AI cost summary response")
}
}
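The days handling in the handler (default 30, cap at 365, ignore invalid or non-positive values) can be isolated into a hypothetical helper for testing; this sketch uses `strconv.Atoi` in place of `fmt.Sscanf` but applies the same rules:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseDays normalizes the raw "days" query parameter: missing,
// invalid, or non-positive input falls back to 30, and anything
// above 365 is clamped.
func parseDays(raw string) int {
	days := 30
	if raw != "" {
		if n, err := strconv.Atoi(raw); err == nil && n > 0 {
			days = n
			if days > 365 {
				days = 365
			}
		}
	}
	return days
}

func main() {
	fmt.Println(parseDays(""), parseDays("90"), parseDays("9999"), parseDays("abc"))
}
```

Parsing into a temporary before assigning keeps a failed or negative parse from leaking into the validated value.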
// HandleGetSuppressionRules returns all suppression rules (GET /api/ai/patrol/suppressions)
func (h *AISettingsHandler) HandleGetSuppressionRules(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
@ -2427,7 +2465,7 @@ func (h *AISettingsHandler) HandleAddSuppressionRule(w http.ResponseWriter, r *h
return
}
// Require authentication
if !CheckAuth(h.config, w, r) {
return
}
@ -2523,7 +2561,7 @@ func (h *AISettingsHandler) HandleDeleteSuppressionRule(w http.ResponseWriter, r
}
findings := patrol.GetFindings()
if !findings.DeleteSuppressionRule(ruleID) {
http.Error(w, "Rule not found", http.StatusNotFound)
return


@ -61,6 +61,7 @@ type Router struct {
updateManager *updates.Manager
updateHistory *updates.UpdateHistory
exportLimiter *RateLimiter
downloadLimiter *RateLimiter
persistence *config.ConfigPersistence
oidcMu sync.Mutex
oidcService *OIDCService
@ -75,6 +76,8 @@ type Router struct {
publicURLDetected bool
bootstrapTokenHash string
bootstrapTokenPath string
checksumMu sync.RWMutex
checksumCache map[string]checksumCacheEntry
}
func pulseBinDir() string {
@ -124,17 +127,19 @@ func NewRouter(cfg *config.Config, monitor *monitoring.Monitor, wsHub *websocket
updateManager.SetHistory(updateHistory)
r := &Router{
mux: http.NewServeMux(),
config: cfg,
monitor: monitor,
wsHub: wsHub,
reloadFunc: reloadFunc,
updateManager: updateManager,
updateHistory: updateHistory,
exportLimiter: NewRateLimiter(5, 1*time.Minute), // 5 attempts per minute
downloadLimiter: NewRateLimiter(60, 1*time.Minute), // downloads/installers per minute per IP
persistence: config.NewConfigPersistence(cfg.DataPath),
serverVersion: strings.TrimSpace(serverVersion),
projectRoot: projectRoot,
checksumCache: make(map[string]checksumCacheEntry),
}
r.initializeBootstrapToken()
@ -1090,10 +1095,11 @@ func (r *Router) setupRoutes() {
r.mux.HandleFunc("/api/ai/knowledge/clear", RequireAuth(r.config, r.aiSettingsHandler.HandleClearGuestKnowledge))
r.mux.HandleFunc("/api/ai/debug/context", RequireAdmin(r.config, r.aiSettingsHandler.HandleDebugContext))
r.mux.HandleFunc("/api/ai/agents", RequireAuth(r.config, r.aiSettingsHandler.HandleGetConnectedAgents))
r.mux.HandleFunc("/api/ai/cost/summary", RequireAuth(r.config, r.aiSettingsHandler.HandleGetAICostSummary))
// OAuth endpoints for Claude Pro/Max subscription authentication
r.mux.HandleFunc("/api/ai/oauth/start", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthStart))
r.mux.HandleFunc("/api/ai/oauth/exchange", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthExchange)) // Manual code input
r.mux.HandleFunc("/api/ai/oauth/callback", r.aiSettingsHandler.HandleOAuthCallback) // Public - receives redirect from Anthropic
r.mux.HandleFunc("/api/ai/oauth/disconnect", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthDisconnect))
// AI Patrol routes for background monitoring
@ -1103,8 +1109,8 @@ func (r *Router) setupRoutes() {
r.mux.HandleFunc("/api/ai/patrol/history", RequireAuth(r.config, r.aiSettingsHandler.HandleGetFindingsHistory))
r.mux.HandleFunc("/api/ai/patrol/run", RequireAdmin(r.config, r.aiSettingsHandler.HandleForcePatrol))
r.mux.HandleFunc("/api/ai/patrol/acknowledge", RequireAuth(r.config, r.aiSettingsHandler.HandleAcknowledgeFinding))
r.mux.HandleFunc("/api/ai/patrol/dismiss", RequireAuth(r.config, r.aiSettingsHandler.HandleDismissFinding)) // Dismiss with reason (LLM memory)
r.mux.HandleFunc("/api/ai/patrol/suppress", RequireAuth(r.config, r.aiSettingsHandler.HandleSuppressFinding)) // Permanently suppress (LLM memory)
r.mux.HandleFunc("/api/ai/patrol/snooze", RequireAuth(r.config, r.aiSettingsHandler.HandleSnoozeFinding))
r.mux.HandleFunc("/api/ai/patrol/resolve", RequireAuth(r.config, r.aiSettingsHandler.HandleResolveFinding))
r.mux.HandleFunc("/api/ai/patrol/runs", RequireAuth(r.config, r.aiSettingsHandler.HandleGetPatrolRunHistory))
@ -1125,23 +1131,23 @@ func (r *Router) setupRoutes() {
// Agent WebSocket for AI command execution
r.mux.HandleFunc("/api/agent/ws", r.handleAgentWebSocket)
// Docker agent download endpoints (public but rate limited)
r.mux.HandleFunc("/install-docker-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadInstallScript)) // Serves the Docker agent install script
r.mux.HandleFunc("/install-container-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadContainerAgentInstallScript))
r.mux.HandleFunc("/download/pulse-docker-agent", r.downloadLimiter.Middleware(r.handleDownloadAgent))
// Host agent download endpoints (public but rate limited)
r.mux.HandleFunc("/install-host-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadHostAgentInstallScript))
r.mux.HandleFunc("/install-host-agent.ps1", r.downloadLimiter.Middleware(r.handleDownloadHostAgentInstallScriptPS))
r.mux.HandleFunc("/uninstall-host-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadHostAgentUninstallScript))
r.mux.HandleFunc("/uninstall-host-agent.ps1", r.downloadLimiter.Middleware(r.handleDownloadHostAgentUninstallScriptPS))
r.mux.HandleFunc("/download/pulse-host-agent", r.downloadLimiter.Middleware(r.handleDownloadHostAgent))
r.mux.HandleFunc("/download/pulse-host-agent.sha256", r.downloadLimiter.Middleware(r.handleDownloadHostAgent))
// Unified Agent endpoints (public but rate limited)
r.mux.HandleFunc("/install.sh", r.downloadLimiter.Middleware(r.handleDownloadUnifiedInstallScript))
r.mux.HandleFunc("/install.ps1", r.downloadLimiter.Middleware(r.handleDownloadUnifiedInstallScriptPS))
r.mux.HandleFunc("/download/pulse-agent", r.downloadLimiter.Middleware(r.handleDownloadUnifiedAgent))
r.mux.HandleFunc("/api/agent/version", r.handleAgentVersion)
r.mux.HandleFunc("/api/server/info", r.handleServerInfo)
@ -1405,6 +1411,16 @@ func (r *Router) StartPatrol(ctx context.Context) {
}
}
// Connect patrol to metrics history for enriched context (trends, predictions)
if r.monitor != nil {
if metricsHistory := r.monitor.GetMetricsHistory(); metricsHistory != nil {
adapter := ai.NewMetricsHistoryAdapter(metricsHistory)
if adapter != nil {
r.aiSettingsHandler.SetMetricsHistoryProvider(adapter)
}
}
}
r.aiSettingsHandler.StartPatrol(ctx)
}
}
@ -2620,13 +2636,13 @@ func (r *Router) handleState(w http.ResponseWriter, req *http.Request) {
}
state := r.monitor.GetState()
// Also populate the unified resource store (Phase 1 of unified architecture)
// This runs on every state request to keep resources up-to-date
if r.resourceHandlers != nil {
r.resourceHandlers.PopulateFromSnapshot(state)
}
frontendState := state.ToFrontend()
if err := utils.WriteJSONResponse(w, frontendState); err != nil {
@ -3771,26 +3787,19 @@ func (r *Router) handleDownloadAgent(w http.ResponseWriter, req *http.Request) {
continue
}
checksum, err := r.cachedSHA256(candidate, info)
if err != nil {
log.Error().Err(err).Str("path", candidate).Msg("Failed to compute docker agent checksum")
continue
}
file, err := os.Open(candidate)
if err != nil {
log.Error().Err(err).Str("path", candidate).Msg("Failed to open docker agent binary for download")
continue
}
w.Header().Set("X-Checksum-Sha256", checksum)
http.ServeContent(w, req, filepath.Base(candidate), info.ModTime(), file)
file.Close()
return
@ -4041,22 +4050,73 @@ func sortedHostAgentKeys(missing map[string]agentbinaries.HostAgentBinary) []str
return keys
}
type checksumCacheEntry struct {
checksum string
modTime time.Time
size int64
}
func (r *Router) cachedSHA256(filePath string, info os.FileInfo) (string, error) {
if filePath == "" {
return "", fmt.Errorf("empty file path")
}
if info == nil {
var err error
info, err = os.Stat(filePath)
if err != nil {
return "", err
}
}
r.checksumMu.RLock()
entry, ok := r.checksumCache[filePath]
r.checksumMu.RUnlock()
if ok && entry.size == info.Size() && entry.modTime.Equal(info.ModTime()) {
return entry.checksum, nil
}
file, err := os.Open(filePath)
if err != nil {
return "", err
}
defer file.Close()
hasher := sha256.New()
if _, err := io.Copy(hasher, file); err != nil {
return "", err
}
checksum := hex.EncodeToString(hasher.Sum(nil))
r.checksumMu.Lock()
if r.checksumCache == nil {
r.checksumCache = make(map[string]checksumCacheEntry)
}
r.checksumCache[filePath] = checksumCacheEntry{
checksum: checksum,
modTime: info.ModTime(),
size: info.Size(),
}
r.checksumMu.Unlock()
return checksum, nil
}
// serveChecksum computes and serves the SHA256 checksum of a file
func (r *Router) serveChecksum(w http.ResponseWriter, filePath string) {
info, err := os.Stat(filePath)
if err != nil {
http.Error(w, "Failed to stat file", http.StatusInternalServerError)
return
}
checksum, err := r.cachedSHA256(filePath, info)
if err != nil {
http.Error(w, "Failed to compute checksum", http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/plain")
fmt.Fprintf(w, "%s\n", checksum)
}


@ -1,9 +1,6 @@
package api
import (
"net/http"
"os"
"path/filepath"
@ -139,26 +136,19 @@ func (r *Router) handleDownloadUnifiedAgent(w http.ResponseWriter, req *http.Req
continue
}
checksum, err := r.cachedSHA256(candidate, info)
if err != nil {
log.Error().Err(err).Str("path", candidate).Msg("Failed to compute unified agent checksum")
continue
}
file, err := os.Open(candidate)
if err != nil {
log.Error().Err(err).Str("path", candidate).Msg("Failed to open unified agent binary for download")
continue
}
w.Header().Set("X-Checksum-Sha256", checksum)
http.ServeContent(w, req, filepath.Base(candidate), info.ModTime(), file)
file.Close()
return


@ -0,0 +1,59 @@
package api
import (
"crypto/sha256"
"fmt"
"net/http"
"net/http/httptest"
"os"
"path/filepath"
"strings"
"testing"
"time"
)
func TestHandleDownloadUnifiedAgentSetsChecksumAndInvalidatesOnChange(t *testing.T) {
binDir := setupTempPulseBin(t)
filePath := filepath.Join(binDir, "pulse-agent-linux-amd64")
payload1 := []byte("agent-binary-v1")
if err := os.WriteFile(filePath, payload1, 0o755); err != nil {
t.Fatalf("failed to write test binary: %v", err)
}
req1 := httptest.NewRequest(http.MethodGet, "/download/pulse-agent?arch=linux-amd64", nil)
rr1 := httptest.NewRecorder()
router := &Router{checksumCache: make(map[string]checksumCacheEntry)}
router.handleDownloadUnifiedAgent(rr1, req1)
if rr1.Code != http.StatusOK {
t.Fatalf("expected 200 OK, got %d", rr1.Code)
}
expected1 := fmt.Sprintf("%x", sha256.Sum256(payload1))
if got := rr1.Header().Get("X-Checksum-Sha256"); got != expected1 {
t.Fatalf("unexpected checksum header: got %q want %q", got, expected1)
}
// Ensure modtime changes for invalidation.
time.Sleep(10 * time.Millisecond)
payload2 := []byte("agent-binary-v2")
if err := os.WriteFile(filePath, payload2, 0o755); err != nil {
t.Fatalf("failed to rewrite test binary: %v", err)
}
req2 := httptest.NewRequest(http.MethodGet, "/download/pulse-agent?arch=linux-amd64", nil)
rr2 := httptest.NewRecorder()
router.handleDownloadUnifiedAgent(rr2, req2)
expected2 := fmt.Sprintf("%x", sha256.Sum256(payload2))
if got := rr2.Header().Get("X-Checksum-Sha256"); got != expected2 {
t.Fatalf("checksum did not update after file change: got %q want %q", got, expected2)
}
if strings.TrimSpace(rr2.Body.String()) != string(payload2) {
t.Fatalf("unexpected response body after update")
}
}


@ -0,0 +1,41 @@
package config
import (
"testing"
"time"
)
func TestSaveLoadAIUsageHistory(t *testing.T) {
dir := t.TempDir()
cp := NewConfigPersistence(dir)
now := time.Now()
events := []AIUsageEventRecord{
{
Timestamp: now,
Provider: "openai",
RequestModel: "openai:gpt-4o",
InputTokens: 123,
OutputTokens: 45,
UseCase: "chat",
TargetType: "vm",
TargetID: "vm-101",
},
}
if err := cp.SaveAIUsageHistory(events); err != nil {
t.Fatalf("SaveAIUsageHistory: %v", err)
}
loaded, err := cp.LoadAIUsageHistory()
if err != nil {
t.Fatalf("LoadAIUsageHistory: %v", err)
}
if len(loaded.Events) != 1 {
t.Fatalf("expected 1 event, got %d", len(loaded.Events))
}
if loaded.Events[0].Provider != "openai" || loaded.Events[0].InputTokens != 123 {
t.Fatalf("loaded event mismatch: %+v", loaded.Events[0])
}
}


@ -171,6 +171,7 @@ type DiscoveryConfig struct {
EnvironmentOverride string `json:"environment_override,omitempty"`
SubnetAllowlist []string `json:"subnet_allowlist,omitempty"`
SubnetBlocklist []string `json:"subnet_blocklist,omitempty"`
IPBlocklist []string `json:"ip_blocklist,omitempty"` // Individual IPs to skip (auto-populated with configured Proxmox hosts)
MaxHostsPerScan int `json:"max_hosts_per_scan,omitempty"`
MaxConcurrent int `json:"max_concurrent,omitempty"`
EnableReverseDNS bool `json:"enable_reverse_dns"`
@ -203,6 +204,9 @@ func CloneDiscoveryConfig(cfg DiscoveryConfig) DiscoveryConfig {
if cfg.SubnetBlocklist != nil {
clone.SubnetBlocklist = append([]string(nil), cfg.SubnetBlocklist...)
}
if cfg.IPBlocklist != nil {
clone.IPBlocklist = append([]string(nil), cfg.IPBlocklist...)
}
return clone
}
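The `append([]string(nil), src...)` idiom used throughout `CloneDiscoveryConfig` allocates a fresh backing array, so mutating the clone cannot alias the original config. A small standalone illustration (`cloneStrings` is illustrative, not the committed code):

```go
package main

import "fmt"

// cloneStrings shows the append-to-nil idiom: it copies into a new
// backing array, and preserves nil-ness the way the config clone does.
func cloneStrings(src []string) []string {
	if src == nil {
		return nil
	}
	return append([]string(nil), src...)
}

func main() {
	orig := []string{"10.0.0.5", "10.0.0.6"}
	clone := cloneStrings(orig)
	clone[0] = "192.168.1.1" // mutate the clone only
	fmt.Println(orig[0], clone[0])
	// → 10.0.0.5 192.168.1.1 (original untouched)
}
```

A plain assignment (`clone.IPBlocklist = cfg.IPBlocklist`) would share storage, so a later `append` or element write on one side could silently corrupt the other.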


@ -20,21 +20,22 @@ import (
// ConfigPersistence handles saving and loading configuration
type ConfigPersistence struct {
mu sync.RWMutex
tx *importTransaction
configDir string
alertFile string
emailFile string
webhookFile string
appriseFile string
nodesFile string
systemFile string
oidcFile string
apiTokensFile string
aiFile string
aiFindingsFile string
aiPatrolRunsFile string
aiUsageHistoryFile string
crypto *crypto.CryptoManager
}
// NewConfigPersistence creates a new config persistence manager.
@ -67,19 +68,20 @@ func newConfigPersistence(configDir string) (*ConfigPersistence, error) {
}
cp := &ConfigPersistence{
configDir: configDir,
alertFile: filepath.Join(configDir, "alerts.json"),
emailFile: filepath.Join(configDir, "email.enc"),
webhookFile: filepath.Join(configDir, "webhooks.enc"),
appriseFile: filepath.Join(configDir, "apprise.enc"),
nodesFile: filepath.Join(configDir, "nodes.enc"),
systemFile: filepath.Join(configDir, "system.json"),
oidcFile: filepath.Join(configDir, "oidc.enc"),
apiTokensFile: filepath.Join(configDir, "api_tokens.json"),
aiFile: filepath.Join(configDir, "ai.enc"),
aiFindingsFile: filepath.Join(configDir, "ai_findings.json"),
aiPatrolRunsFile: filepath.Join(configDir, "ai_patrol_runs.json"),
aiUsageHistoryFile: filepath.Join(configDir, "ai_usage_history.json"),
crypto: cryptoMgr,
}
log.Debug().
@ -1382,24 +1384,24 @@ type AIFindingsData struct {
// AIFindingRecord is a persisted finding with full history
type AIFindingRecord struct {
ID string `json:"id"`
Severity string `json:"severity"`
Category string `json:"category"`
ResourceID string `json:"resource_id"`
ResourceName string `json:"resource_name"`
ResourceType string `json:"resource_type"`
Node string `json:"node,omitempty"`
Title string `json:"title"`
Description string `json:"description"`
Recommendation string `json:"recommendation,omitempty"`
Evidence string `json:"evidence,omitempty"`
DetectedAt time.Time `json:"detected_at"`
LastSeenAt time.Time `json:"last_seen_at"`
ResolvedAt *time.Time `json:"resolved_at,omitempty"`
AutoResolved bool `json:"auto_resolved"`
AcknowledgedAt *time.Time `json:"acknowledged_at,omitempty"`
SnoozedUntil *time.Time `json:"snoozed_until,omitempty"`
AlertID string `json:"alert_id,omitempty"`
}
// SaveAIFindings persists AI findings to disk
@ -1474,26 +1476,26 @@ func (c *ConfigPersistence) LoadAIFindings() (*AIFindingsData, error) {
// PatrolRunHistoryData represents persisted patrol run history with metadata
type PatrolRunHistoryData struct {
Version int `json:"version"`
LastSaved time.Time `json:"last_saved"`
Runs []PatrolRunRecord `json:"runs"`
}
// PatrolRunRecord represents a single patrol check run
type PatrolRunRecord struct {
ID string `json:"id"`
StartedAt time.Time `json:"started_at"`
CompletedAt time.Time `json:"completed_at"`
DurationMs int64 `json:"duration_ms"`
Type string `json:"type"` // "quick" or "deep"
ResourcesChecked int `json:"resources_checked"`
// Breakdown by resource type
NodesChecked int `json:"nodes_checked"`
GuestsChecked int `json:"guests_checked"`
DockerChecked int `json:"docker_checked"`
StorageChecked int `json:"storage_checked"`
HostsChecked int `json:"hosts_checked"`
PBSChecked int `json:"pbs_checked"`
// Findings from this run
NewFindings int `json:"new_findings"`
ExistingFindings int `json:"existing_findings"`
@ -1508,6 +1510,96 @@ type PatrolRunRecord struct {
OutputTokens int `json:"output_tokens,omitempty"` // Tokens received from AI
}
// AIUsageHistoryData represents persisted AI usage history with metadata
type AIUsageHistoryData struct {
Version int `json:"version"`
LastSaved time.Time `json:"last_saved"`
Events []AIUsageEventRecord `json:"events"`
}
// AIUsageEventRecord is a persisted usage event for an AI provider call.
// This intentionally excludes prompt/response content for privacy.
type AIUsageEventRecord struct {
Timestamp time.Time `json:"timestamp"`
Provider string `json:"provider"`
RequestModel string `json:"request_model"`
ResponseModel string `json:"response_model,omitempty"`
UseCase string `json:"use_case,omitempty"` // "chat" or "patrol"
InputTokens int `json:"input_tokens,omitempty"`
OutputTokens int `json:"output_tokens,omitempty"`
TargetType string `json:"target_type,omitempty"`
TargetID string `json:"target_id,omitempty"`
FindingID string `json:"finding_id,omitempty"`
}
// SaveAIUsageHistory persists AI usage events to disk.
func (c *ConfigPersistence) SaveAIUsageHistory(events []AIUsageEventRecord) error {
c.mu.Lock()
defer c.mu.Unlock()
if err := c.EnsureConfigDir(); err != nil {
return err
}
data := AIUsageHistoryData{
Version: 1,
LastSaved: time.Now(),
Events: events,
}
jsonData, err := json.Marshal(data)
if err != nil {
return err
}
if err := c.writeConfigFileLocked(c.aiUsageHistoryFile, jsonData, 0600); err != nil {
return err
}
log.Debug().
Str("file", c.aiUsageHistoryFile).
Int("count", len(events)).
Msg("AI usage history saved")
return nil
}
// LoadAIUsageHistory loads AI usage events from disk.
func (c *ConfigPersistence) LoadAIUsageHistory() (*AIUsageHistoryData, error) {
c.mu.RLock()
defer c.mu.RUnlock()
data, err := os.ReadFile(c.aiUsageHistoryFile)
if err != nil {
if os.IsNotExist(err) {
return &AIUsageHistoryData{
Version: 1,
Events: make([]AIUsageEventRecord, 0),
}, nil
}
return nil, err
}
var usageData AIUsageHistoryData
if err := json.Unmarshal(data, &usageData); err != nil {
log.Error().Err(err).Str("file", c.aiUsageHistoryFile).Msg("Failed to parse AI usage history file")
return &AIUsageHistoryData{
Version: 1,
Events: make([]AIUsageEventRecord, 0),
}, nil
}
if usageData.Events == nil {
usageData.Events = make([]AIUsageEventRecord, 0)
}
log.Info().
Str("file", c.aiUsageHistoryFile).
Int("count", len(usageData.Events)).
Time("last_saved", usageData.LastSaved).
Msg("AI usage history loaded")
return &usageData, nil
}
// SavePatrolRunHistory persists patrol run history to disk
func (c *ConfigPersistence) SavePatrolRunHistory(runs []PatrolRunRecord) error {
c.mu.Lock()


@ -113,6 +113,19 @@ func ApplyConfigToProfile(profile *envdetect.EnvironmentProfile, cfg config.Disc
if cfg.HTTPTimeout > 0 {
profile.Policy.HTTPTimeout = time.Duration(cfg.HTTPTimeout) * time.Millisecond
}
// Apply IP blocklist (individual IPs to skip, e.g. already-configured Proxmox hosts)
for _, ipStr := range cfg.IPBlocklist {
ipStr = strings.TrimSpace(ipStr)
if ipStr == "" {
continue
}
if ip := net.ParseIP(ipStr); ip != nil {
profile.IPBlocklist = append(profile.IPBlocklist, ip)
} else {
profile.Warnings = append(profile.Warnings, fmt.Sprintf("Invalid IP in blocklist: %s", ipStr))
}
}
}
func shouldPruneContainerNetworks(env envdetect.Environment) bool {


@ -3265,6 +3265,91 @@ func (m *Monitor) baseIntervalForInstanceType(instanceType InstanceType) time.Du
}
}
// getConfiguredHostIPs returns a list of IP addresses from all configured Proxmox hosts.
// This is used to prevent discovery from probing hosts we already know about.
// Caller must hold m.mu.RLock or m.mu.Lock.
func (m *Monitor) getConfiguredHostIPs() []string {
if m.config == nil {
return nil
}
seen := make(map[string]struct{})
var ips []string
addHost := func(host string) {
// Parse the host to extract IP/hostname
host = strings.TrimSpace(host)
if host == "" {
return
}
// Remove scheme if present
if strings.HasPrefix(host, "https://") {
host = strings.TrimPrefix(host, "https://")
} else if strings.HasPrefix(host, "http://") {
host = strings.TrimPrefix(host, "http://")
}
// Remove port if present
if colonIdx := strings.LastIndex(host, ":"); colonIdx != -1 {
// Check if it's an IPv6 address
if !strings.Contains(host[colonIdx:], "]") {
host = host[:colonIdx]
}
}
// Remove trailing path
if slashIdx := strings.Index(host, "/"); slashIdx != -1 {
host = host[:slashIdx]
}
host = strings.TrimSpace(host)
if host == "" {
return
}
// Check if it's already an IP
if ip := net.ParseIP(host); ip != nil {
if _, exists := seen[host]; !exists {
seen[host] = struct{}{}
ips = append(ips, host)
}
return
}
// Try to resolve hostname to IP
if addrs, err := net.LookupIP(host); err == nil && len(addrs) > 0 {
for _, addr := range addrs {
// Prefer IPv4
if v4 := addr.To4(); v4 != nil {
ipStr := v4.String()
if _, exists := seen[ipStr]; !exists {
seen[ipStr] = struct{}{}
ips = append(ips, ipStr)
}
break
}
}
}
}
// Add PVE hosts
for _, pve := range m.config.PVEInstances {
addHost(pve.Host)
// Also add cluster endpoints
for _, ep := range pve.ClusterEndpoints {
addHost(ep.Host)
addHost(ep.IP)
}
}
// Add PBS hosts
for _, pbs := range m.config.PBSInstances {
addHost(pbs.Host)
}
// Add PMG hosts
for _, pmg := range m.config.PMGInstances {
addHost(pmg.Host)
}
return ips
}
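The manual scheme/port/path stripping in `addHost` can also be expressed with the standard library. A hedged alternative sketch (`normalizeHost` is illustrative, not the committed code):

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// normalizeHost extracts the bare hostname/IP from values like
// "https://pve1.lan:8006/api2/json" or "10.0.0.5:8006", using
// net/url and net.SplitHostPort instead of manual trimming.
func normalizeHost(raw string) string {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return ""
	}
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // give url.Parse a scheme to anchor on
	}
	u, err := url.Parse(raw)
	if err != nil || u.Host == "" {
		return ""
	}
	host := u.Host
	if h, _, err := net.SplitHostPort(host); err == nil {
		host = h // SplitHostPort also unbrackets IPv6 literals
	}
	return strings.Trim(host, "[]")
}

func main() {
	for _, in := range []string{
		"https://pve1.lan:8006/api2/json",
		"10.0.0.5:8006",
		"[::1]:8006",
	} {
		fmt.Printf("%q -> %q\n", in, normalizeHost(in))
	}
}
```

`net.SplitHostPort` handles bracketed IPv6 (`[::1]:8006`) correctly, which the `LastIndex(":")` approach above only approximates; the committed code trades that edge case for zero parsing overhead on the common IPv4 path.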
// Start begins the monitoring loop
func (m *Monitor) Start(ctx context.Context, wsHub *websocket.Hub) {
pollingInterval := m.effectivePVEPollingInterval()
@ -3292,7 +3377,11 @@ func (m *Monitor) Start(ctx context.Context, wsHub *websocket.Hub) {
if m.config == nil {
return config.DefaultDiscoveryConfig()
}
cfg := config.CloneDiscoveryConfig(m.config.Discovery)
// Auto-populate IPBlocklist with configured Proxmox host IPs to avoid
// probing hosts we already know about (reduces PBS auth failure log spam)
cfg.IPBlocklist = m.getConfiguredHostIPs()
return cfg
}
m.discoveryService = discovery.NewService(wsHub, 5*time.Minute, discoverySubnet, cfgProvider)
if m.discoveryService != nil {
@ -7301,6 +7390,12 @@ func (m *Monitor) GetMetricsStore() *metrics.Store {
return m.metricsStore
}
// GetMetricsHistory returns the in-memory metrics history for trend analysis
// This is used by the AI context builder to compute trends and predictions
func (m *Monitor) GetMetricsHistory() *MetricsHistory {
return m.metricsHistory
}
// shouldSkipNodeMetrics returns true if we should skip detailed metric polling
// for the given node because a host agent is providing richer data.
// This helps reduce API load when agents are active.


@ -234,9 +234,33 @@ func (s *Scanner) DiscoverServersWithCallbacks(ctx context.Context, subnet strin
seenIPs := make(map[string]struct{})
// Pre-populate seenIPs with blocked IPs to skip them during scanning
// This prevents probing already-configured Proxmox hosts (reduces PBS auth failure log spam)
blockedCount := 0
if activeProfile != nil {
for _, ip := range activeProfile.IPBlocklist {
if ip == nil {
continue
}
if ip4 := ip.To4(); ip4 != nil {
seenIPs[ip4.String()] = struct{}{}
blockedCount++
}
}
if blockedCount > 0 {
log.Debug().
Int("blocked_ips", blockedCount).
Msg("Pre-populated blocked IPs to skip during discovery")
}
}
// Calculate total targets and phases for progress tracking
// Use a preview map to ensure we count only unique IPs that will actually be scanned
// Copy blocked IPs to preview map as well
previewSeen := make(map[string]struct{})
for ip := range seenIPs {
previewSeen[ip] = struct{}{}
}
var totalTargets int
var validPhases []envdetect.SubnetPhase
phases := append([]envdetect.SubnetPhase(nil), activeProfile.Phases...)


@ -86,6 +86,7 @@ type EnvironmentProfile struct {
Type Environment // Detected environment.
Phases []SubnetPhase // Subnet scanning phases.
ExtraTargets []net.IP // IPs to always probe.
IPBlocklist []net.IP // Individual IPs to skip (auto-populated with configured Proxmox hosts).
Policy ScanPolicy // Applied scan policy.
Confidence float64 // Overall confidence (0.0 - 1.0).
Warnings []string // Non-fatal detection warnings.