mirror of
https://github.com/rcourtman/Pulse.git
synced 2026-04-28 03:20:11 +00:00
feat(ai): Add enriched context with historical trends and predictions

Phase 1 of Pulse AI differentiation:
- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
This commit is contained in:
parent
cbb89c4b6a
commit
88d419dd5b
24 changed files with 4269 additions and 295 deletions
407 .agent/docs/PULSE_AI_ARCHITECTURE.md Normal file
@@ -0,0 +1,407 @@
# Pulse AI Architecture: Long-Term Vision

## The Core Problem

Pulse AI currently provides "AI that can talk to your infrastructure." But this is becoming a commodity. Any user can:

1. Install Claude Code / Cursor / Windsurf
2. Give it SSH access to their Proxmox nodes
3. Ask "What's wrong with my infrastructure?"

**We need to provide value that a stateless AI session cannot.**

---
## The Fundamental Insight

A stateless AI with SSH access can answer: **"What is the current state?"**

Pulse, with its continuous monitoring, can answer:
- **"How has this changed over time?"**
- **"What does 'normal' look like for YOUR infrastructure?"**
- **"What's about to go wrong?"**
- **"Have we seen this pattern before?"**
- **"What did you do last time this happened?"**

These require **persistent context** that accumulates over time. This is our moat.

---
## Architecture Principles

### 1. Context is King

The AI is only as useful as the context we provide. We should think of Pulse as a **context accumulation engine** that happens to have an AI interface.

Every piece of data Pulse collects should be available to the AI in a digestible form:
- Real-time metrics
- Historical trends
- User annotations
- Alert history
- Previous AI findings
- Configuration changes
- Remediation history

### 2. Time-Aware Intelligence

The AI should always know:
- What's happening **now**
- What happened **before** (trends, history)
- What will likely happen **next** (forecasts)
- What's **different** from normal (anomalies)

### 3. Learning From Operations

Every interaction with Pulse teaches it about the user's infrastructure:
- Dismissed findings → "This is expected behavior"
- User notes → "This VM runs the critical database"
- Alert patterns → "This resource is flaky on Tuesdays"
- Remediation actions → "Last time this happened, we restarted the service"

### 4. Proactive, Not Just Reactive

The goal isn't just to answer questions. It's to:
- Surface problems before users ask
- Predict capacity issues weeks in advance
- Notice patterns humans would miss
- Remember what humans would forget

---
## Data Architecture

### Layer 1: Real-Time State (Already Have)

```
StateSnapshot
├── Nodes[]
├── VMs[]
├── Containers[]
├── Storage[]
├── DockerHosts[]
├── PBSInstances[]
├── Hosts[]
└── PMGInstances[]
```

This is what we send to the AI today. Point-in-time. Commodity.

### Layer 2: Historical Metrics (Partially Have)

```
MetricsHistory
├── NodeMetrics[nodeID] → {CPU[], Memory[], Disk[]} over time
├── GuestMetrics[guestID] → {CPU[], Memory[], Network[]} over time
└── StorageMetrics[storageID] → {Usage[], Used[], Total[]} over time
```

We collect this for the frontend trendlines, but **don't expose it to the AI**.

### Layer 3: Computed Insights (Need to Build)

```
InsightsStore
├── Trends[resourceID] → {direction, rate_of_change, forecast}
├── Baselines[resourceID] → {normal_cpu_range, normal_memory_range, typical_patterns}
├── Anomalies[resourceID] → {current_deviations, severity}
├── Correlations[] → {resource_a, resource_b, relationship}
└── Predictions[] → {resource, metric, predicted_event, eta}
```

This is computed from historical data and provides **derived intelligence**.

### Layer 4: Operational Memory (Partially Have)

```
OperationalMemory
├── Findings[findingID] → {status, user_response, resolution}
├── Knowledge[guestID] → {user_notes, learned_facts}
├── AlertHistory[] → {alert, duration, resolution, user_action}
├── RemediationLog[] → {problem, action_taken, outcome, timestamp}
└── ChangeLog[] → {resource, what_changed, when, detected_impact}
```

This captures **what happened and how it was handled**.

---
## The AI Context Pipeline

When the AI needs context (for chat, patrol, or alert analysis), we build it in layers:

```
┌─────────────────────────────────────────────────────────────┐
│                      CONTEXT ASSEMBLY                       │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  1. CURRENT STATE (required)                                │
│     - Real-time metrics for relevant resources              │
│     - Current alerts and their status                       │
│                                                             │
│  2. HISTORICAL CONTEXT (high value)                         │
│     - Trends: "Memory has been growing 3%/day for 5 days"   │
│     - Baselines: "Normal CPU for this VM is 5-15%"          │
│     - Anomalies: "Current 45% is 3σ above normal"           │
│                                                             │
│  3. OPERATIONAL CONTEXT (essential for continuity)          │
│     - Previous findings for this resource                   │
│     - User notes: "This is the production database"         │
│     - Past remediations: "We increased RAM last month"      │
│                                                             │
│  4. PREDICTIVE CONTEXT (proactive value)                    │
│     - Forecasts: "At current rate, disk full in 12 days"    │
│     - Pattern alerts: "This usually fails after X"          │
│     - Correlations: "When A spikes, B usually follows"      │
│                                                             │
│  5. USER CONTEXT (personalization)                          │
│     - Infrastructure notes: "This is a homelab"             │
│     - Preferences: "I prefer conservative recommendations"  │
│     - Expertise level: "User is comfortable with CLI"       │
│                                                             │
└─────────────────────────────────────────────────────────────┘
```

---
## Implementation Roadmap

### Phase 1: Historical Context Integration

**Goal**: Make the AI aware of trends and history, not just current state.

1. **Create `internal/ai/context/` package**
   - `historical.go` - Pull data from MetricsHistory
   - `trends.go` - Compute trend direction, rate of change
   - `formatter.go` - Format for AI consumption

2. **Trend Computation**
   - Simple linear regression for direction
   - Rate of change calculation
   - Stability classification (stable/growing/declining/volatile)

3. **Integrate into Patrol and Chat**
   - `buildEnrichedContext()` replaces `buildInfrastructureSummary()`
   - Include "Last 24h" and "Last 7d" summaries

**Example output:**
```markdown
## VM: webserver (node: minipc)
Current: CPU=12%, Memory=67%, Disk=45%
24h Trend: CPU stable (8-15%), Memory growing +1.2%/hr, Disk stable
7d Trend: Memory +15% total (was 52% a week ago)
Baseline: CPU normal=5-20%, Memory normal=45-60% (currently elevated)
```
### Phase 2: Anomaly Detection

**Goal**: Automatically detect when something is "unusual" for this specific infrastructure.

1. **Baseline Learning**
   - Track rolling statistics per resource (mean, std dev, percentiles)
   - Time-of-day / day-of-week patterns
   - Persist baselines across restarts

2. **Anomaly Scoring**
   - Statistical deviation from baseline
   - Pattern breaks (e.g., usually low at night, now high)
   - Sudden changes vs. gradual drift

3. **Anomaly Context for AI**
   - "This is unusual" annotations
   - Confidence levels
   - Similar past anomalies and outcomes

**Example output:**
```markdown
⚠️ ANOMALY: VM 'database' memory at 89%
- Baseline for this time: 45-55%
- Current value is 4.2σ above normal
- Similar anomaly 2 weeks ago led to OOM (resolved by restart)
```
### Phase 3: Operational Memory

**Goal**: The AI remembers what happened and what worked.

1. **Remediation Logging**
   - When AI suggests/executes a fix, log it
   - Track outcome (did it work? for how long?)
   - Link to findings

2. **Change Detection**
   - Detect configuration changes (new VMs, resource changes)
   - Correlate changes with subsequent issues
   - "This problem started 2 days after you added GPU passthrough"

3. **Solution Database**
   - Index past problems and solutions
   - "We've seen this before: [link to past finding]"
   - "Last time, restarting the service fixed it"

**Example output:**
```markdown
## Historical Context for VM 'webserver'
- Created: 6 months ago
- Last modified: 2 weeks ago (RAM increased 4GB→8GB)
- Past issues:
  - 2 weeks ago: High memory (resolved by RAM increase)
  - 1 month ago: Disk full (resolved by log rotation)
- User note: "Runs production web app, critical 9-5"
```
### Phase 4: Predictive Intelligence

**Goal**: Warn users before problems occur.

1. **Capacity Forecasting**
   - Extrapolate growth trends
   - "Storage will be full in X days at current rate"
   - Account for patterns (e.g., weekly backup spikes)

2. **Failure Prediction**
   - Resources that fail periodically (e.g., OOM every 2 weeks)
   - Predict next occurrence
   - "This container typically OOMs every ~10 days, last was 8 days ago"

3. **Correlation-Based Alerts**
   - "When VM A memory exceeds 80%, VM B usually crashes within 2 hours"
   - Learn these from historical data

**Example output:**
```markdown
## Predictions
⏰ Storage 'local-zfs': Full in ~18 days at current growth rate
⏰ Container 'logstash': Historically OOMs every 10-14 days (last: 9 days ago)
⏰ Backup jobs: Growing 5% per week, will exceed window in ~6 weeks
```
### Phase 5: Multi-Resource Correlation

**Goal**: Understand relationships between resources.

1. **Automatic Correlation Detection**
   - When A spikes, does B spike?
   - When A restarts, does B show errors?
   - Statistical correlation over time

2. **Dependency Mapping**
   - User-provided: "This VM depends on that NFS storage"
   - Inferred: "These 3 containers always restart together"

3. **Cascade Analysis**
   - "If node X goes down, these 5 critical VMs are affected"
   - "Storage Y failing would impact 12 backup jobs"

---
## AI Prompt Structure

With this architecture, a typical AI prompt would look like:

```markdown
# Infrastructure Analysis Request

## Target Resource
VM 'database' (ID: 102, Node: pve-main)

## Current State
- Status: running
- CPU: 78% (normal: 15-30%)
- Memory: 92% (normal: 60-75%)
- Disk: 67% (stable)
- Uptime: 45 days

## Historical Context (7 days)
- Memory: Growing +2.1%/day (was 77% 7 days ago)
- CPU: Elevated since 3 days ago (was 20%)
- Pattern: No daily cycles detected, continuous growth

## Anomaly Score: HIGH
- Memory 2.8σ above baseline
- CPU 3.1σ above baseline
- Combined anomaly score: 87/100

## Operational History
- Last issue: 3 months ago, high memory (user added swap, resolved)
- User notes: "Production PostgreSQL, critical, no downtime allowed"
- Related resources: Depends on storage 'ceph-ssd', accessed by VMs 105, 107, 112

## Recent Changes
- 4 days ago: VM 105 ('app-server') was updated
- 3 days ago: This VM's CPU started increasing

## Predictions
- At current rate, memory will hit 100% in ~4 days
- Similar pattern to last incident (high memory leading to OOM)

## User Question
"Why is my database server slow?"
```

**This context is impossible to replicate with a stateless SSH session.**

---
## Success Metrics

How do we know Pulse AI is providing value?

1. **Predictive Accuracy**
   - Did our capacity forecasts come true?
   - Did predicted failures occur?

2. **Time to Resolution**
   - How long from problem detection to resolution?
   - Compare AI-assisted vs. manual

3. **Proactive Catches**
   - Problems found by patrol before user noticed
   - Predictions that led to preventive action

4. **User Engagement**
   - Are users adding notes? (means they trust the system)
   - Are they dismissing findings with reasons? (feedback loop)
   - Repeat usage of chat feature

5. **Context Utilization**
   - Is the AI using historical context in responses?
   - Are predictions being cited in findings?

---
## Technical Considerations

### Data Retention
- Short-term (24h): High-resolution metrics for immediate analysis
- Medium-term (7-30d): Hourly aggregates for trend analysis
- Long-term (90d+): Daily summaries for baseline/pattern learning

### Performance
- Context building must be fast (<100ms)
- Precompute expensive analytics (trends, baselines) on schedule
- Cache formatted context, invalidate on significant changes

### Storage
- Baselines and insights are small, store in SQLite or JSON
- Historical metrics can grow; implement rollup/aggregation
- Consider time-series database for scale (InfluxDB, TimescaleDB)

### Privacy
- All data stays local (no cloud sync of infrastructure data)
- AI context is built locally, only prompts go to API
- User controls what context is included

---
## Summary

The path to differentiating Pulse AI:

| Today | Tomorrow |
|-------|----------|
| "Here's your current state" | "Here's what's changed and why it matters" |
| "This metric is high" | "This is unusual for YOUR infrastructure" |
| "You should check X" | "Last time this happened, you did Y and it worked" |
| "Something might be wrong" | "X will fail in 5 days if this continues" |
| Stateless queries | Accumulated operational intelligence |

**The AI becomes more valuable the longer Pulse runs.** This is the moat.
709 .agent/docs/PULSE_AI_IMPLEMENTATION_PLAN.md Normal file
@@ -0,0 +1,709 @@
# Pulse AI Implementation Plan

This document outlines the concrete implementation steps to realize the Pulse AI vision.

---
## Current State Audit

### What We Have

| Component | Location | Status |
|-----------|----------|--------|
| Real-time state | `models.StateSnapshot` | ✅ Complete |
| Metrics collection | `monitoring.MetricsHistory` | ✅ Collecting, exposed to AI |
| Finding persistence | `ai.FindingsStore` | ✅ Works |
| Knowledge store | `ai/knowledge.Store` | ✅ Per-guest notes |
| Alert context | `ai.buildAlertContext()` | ✅ Current alerts only |
| User annotations | `buildUserAnnotationsContext()` | ✅ Basic |
| Base patrol | `patrol.go` | ✅ Heuristics + optional AI |
| **AI Context package** | `ai/context/` | ✅ **NEW - Phase 1** |
| **Trend computation** | `ai/context/trends.go` | ✅ **NEW - Linear regression** |
| **Context builder** | `ai/context/builder.go` | ✅ **NEW - Orchestration** |
| **Metrics adapter** | `ai/metrics_history_adapter.go` | ✅ **NEW - Wiring** |

### What's Missing

| Component | Impact | Priority | Status |
|-----------|--------|----------|--------|
| Historical context for AI | Core differentiator | P0 | ✅ Done |
| Trend computation | Predictive capability | P0 | ✅ Done |
| Baseline learning | Anomaly detection | P1 | 🔲 Next |
| Change detection | Root cause analysis | P1 | 🔲 Planned |
| Remediation logging | Operational memory | P2 | 🔲 Planned |
| Correlation engine | Advanced insights | P2 | 🔲 Future |
| Capacity forecasting | Proactive alerts | P1 | ⚡ Partial (storage predictions) |

---
## Phase 1: Foundation - AI Context Package

**Goal**: Create a clean abstraction for building AI context with historical data.

### 1.1 New Package Structure

```
internal/ai/context/
├── builder.go      # Main context builder orchestrator
├── current.go      # Current state formatting (refactor from patrol)
├── historical.go   # Historical metrics integration
├── trends.go       # Trend computation
├── insights.go     # Combined insights (anomalies, predictions)
├── formatter.go    # AI-friendly text formatting
└── types.go        # Shared types
```

### 1.2 Core Types

```go
// types.go

// ResourceContext contains all context for a single resource
type ResourceContext struct {
	ResourceID   string
	ResourceType string // "node", "vm", "container", "storage", "docker_host"
	ResourceName string

	// Current state
	Current CurrentState

	// Historical analysis
	Trends    map[string]Trend    // metric -> trend
	Baselines map[string]Baseline // metric -> baseline
	Anomalies []Anomaly

	// Operational memory
	PastFindings    []FindingSummary
	UserNotes       []string
	RecentChanges   []Change
	LastRemediation *RemediationRecord
}

// Trend represents the direction and rate of change for a metric
type Trend struct {
	Metric      string
	Direction   TrendDirection // stable, growing, declining, volatile
	RatePerHour float64        // rate of change per hour
	RatePerDay  float64        // rate of change per day
	Current     float64
	Average24h  float64
	Average7d   float64
	Min24h      float64
	Max24h      float64
	DataPoints  int     // how much history we have
	Confidence  float64 // 0-1, based on data quality
}

type TrendDirection string

const (
	TrendStable    TrendDirection = "stable"
	TrendGrowing   TrendDirection = "growing"
	TrendDeclining TrendDirection = "declining"
	TrendVolatile  TrendDirection = "volatile"
)

// Baseline represents learned "normal" for a metric
type Baseline struct {
	Metric     string
	Mean       float64
	StdDev     float64
	P5         float64 // 5th percentile
	P95        float64 // 95th percentile
	SampleSize int
	LearnedAt  time.Time
}

// Anomaly represents a detected deviation from normal
type Anomaly struct {
	Metric      string
	Current     float64
	Expected    float64 // baseline mean
	Deviation   float64 // standard deviations from mean
	Severity    string  // "low", "medium", "high", "critical"
	Since       time.Time
	Description string
}

// Prediction represents a forecasted event
type Prediction struct {
	ResourceID string
	Metric     string
	Event      string // "capacity_full", "oom", "pattern_repeat"
	ETA        time.Time
	Confidence float64
	Basis      string // explanation of prediction
}
```
### 1.3 Context Builder

```go
// builder.go

type ContextBuilder struct {
	stateProvider  StateProvider
	metricsHistory *monitoring.MetricsHistory
	findingsStore  *FindingsStore
	knowledgeStore *knowledge.Store
	baselineStore  *BaselineStore

	// Configuration
	includeTrends    bool
	includeBaselines bool
	includeHistory   bool
	historicalWindow time.Duration
}

// BuildForResource creates comprehensive context for a single resource
func (b *ContextBuilder) BuildForResource(resourceID string) (*ResourceContext, error)

// BuildForInfrastructure creates summarized context for all infrastructure
func (b *ContextBuilder) BuildForInfrastructure() (*InfrastructureContext, error)

// FormatForAI converts context to AI-consumable markdown
func (b *ContextBuilder) FormatForAI(ctx *ResourceContext) string

// FormatInfrastructureForAI converts full infrastructure context
func (b *ContextBuilder) FormatInfrastructureForAI(ctx *InfrastructureContext) string
```
### 1.4 Trend Computation

```go
// trends.go

// ComputeTrend calculates trend from historical data points
func ComputeTrend(points []monitoring.MetricPoint, window time.Duration) Trend {
	if len(points) < 2 {
		return Trend{Confidence: 0}
	}

	// Calculate basic statistics
	avg, min, max, stddev := computeStats(points)

	// Linear regression for direction and rate
	slope, r2 := linearRegression(points)

	// Classify direction
	direction := classifyTrend(slope, stddev, avg)

	// Rate per hour/day
	ratePerHour := slope * 3600 // slope is per second
	ratePerDay := ratePerHour * 24

	return Trend{
		Direction:   direction,
		RatePerHour: ratePerHour,
		RatePerDay:  ratePerDay,
		Current:     points[len(points)-1].Value,
		Average24h:  avg,
		Min24h:      min,
		Max24h:      max,
		DataPoints:  len(points),
		Confidence:  r2,
	}
}

func classifyTrend(slope, stddev, avg float64) TrendDirection {
	// Normalize slope relative to value magnitude
	if avg == 0 {
		avg = 1 // avoid division by zero
	}
	normalizedSlope := (slope * 3600) / avg // hourly change as fraction of avg

	// Threshold based on volatility
	threshold := 0.01 // 1% per hour is significant

	if stddev/avg > 0.2 {
		return TrendVolatile
	}
	if normalizedSlope > threshold {
		return TrendGrowing
	}
	if normalizedSlope < -threshold {
		return TrendDeclining
	}
	return TrendStable
}
```
### 1.5 Integration with Existing Code

```go
// In patrol.go, replace buildInfrastructureSummary:

func (p *PatrolService) buildEnrichedContext(state models.StateSnapshot) string {
	builder := context.NewBuilder(
		p.stateProvider,
		p.metricsHistory,
		p.findings,
		p.knowledgeStore,
		p.baselineStore,
	)

	infraCtx, err := builder.BuildForInfrastructure()
	if err != nil {
		log.Warn().Err(err).Msg("Failed to build enriched context, falling back")
		return p.buildBasicSummary(state)
	}

	return builder.FormatInfrastructureForAI(infraCtx)
}
```

---
## Phase 2: Baseline Learning

**Goal**: Learn what "normal" looks like for each resource so we can detect anomalies.

### 2.1 Baseline Store

```go
// internal/ai/baseline/store.go

type Store struct {
	mu        sync.RWMutex
	baselines map[string]*ResourceBaseline // resourceID -> baselines

	persistence Persistence

	// Configuration
	learningWindow time.Duration // how far back to learn from (default: 7 days)
	minSamples     int           // minimum samples needed (default: 100)
	updateInterval time.Duration // how often to recompute (default: 1 hour)
}

type ResourceBaseline struct {
	ResourceID  string
	LastUpdated time.Time

	Metrics map[string]*MetricBaseline // metric name -> baseline
}

type MetricBaseline struct {
	Mean        float64
	StdDev      float64
	Percentiles map[int]float64 // 5, 25, 50, 75, 95
	SampleCount int

	// Time-of-day patterns (optional, phase 2+)
	HourlyMeans [24]float64
}

// Learn computes baselines from historical data
func (s *Store) Learn(resourceID string, history *monitoring.MetricsHistory) error

// GetBaseline returns the baseline for a resource/metric
func (s *Store) GetBaseline(resourceID, metric string) (*MetricBaseline, bool)

// IsAnomaly checks if a value is anomalous given the baseline
func (s *Store) IsAnomaly(resourceID, metric string, value float64) (bool, float64)
```
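`Learn` is declared above without a body. One way to fill the `Mean`, `StdDev`, and `Percentiles` fields from a window of samples — a hedged sketch using nearest-rank percentiles and a hypothetical `computeBaseline` helper, not the real store code:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// computeBaseline derives mean, population standard deviation, and the
// percentile set used by MetricBaseline from a window of samples.
func computeBaseline(samples []float64) (mean, stddev float64, pct map[int]float64) {
	n := float64(len(samples))
	for _, v := range samples {
		mean += v
	}
	mean /= n
	for _, v := range samples {
		stddev += (v - mean) * (v - mean)
	}
	stddev = math.Sqrt(stddev / n)

	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	pct = map[int]float64{}
	for _, p := range []int{5, 25, 50, 75, 95} {
		// Nearest-rank percentile: index ceil(p/100 * n) - 1.
		idx := int(math.Ceil(float64(p)/100*n)) - 1
		if idx < 0 {
			idx = 0
		}
		pct[p] = sorted[idx]
	}
	return
}

func main() {
	var cpu []float64
	for i := 1; i <= 100; i++ {
		cpu = append(cpu, float64(i))
	}
	mean, sd, pct := computeBaseline(cpu)
	fmt.Printf("mean=%.1f stddev=%.1f p95=%.0f\n", mean, sd, pct[95])
	// prints mean=50.5 stddev=28.9 p95=95
}
```

The real `Learn` would run this per metric over `learningWindow`, skip metrics with fewer than `minSamples` points, and store the result under the resource ID.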
### 2.2 Background Learning Loop

```go
// Run as part of patrol service or separate goroutine

func (s *Store) StartLearningLoop(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			s.updateAllBaselines()
		}
	}
}

func (s *Store) updateAllBaselines() {
	// Get list of all resources with metrics
	resources := s.metricsHistory.GetResourceIDs()

	for _, resourceID := range resources {
		if err := s.Learn(resourceID, s.metricsHistory); err != nil {
			log.Warn().Err(err).Str("resource", resourceID).Msg("Failed to update baseline")
		}
	}

	// Persist updated baselines
	s.save()
}
```
### 2.3 Anomaly Detection

```go
// internal/ai/anomaly/detector.go

type Detector struct {
	baselineStore *baseline.Store

	// Thresholds
	warningThreshold  float64 // default: 2.0 std devs
	criticalThreshold float64 // default: 3.0 std devs
}

type Detection struct {
	ResourceID   string
	Metric       string
	CurrentValue float64
	ExpectedMean float64
	StdDev       float64
	ZScore       float64
	Severity     AnomalySeverity
	DetectedAt   time.Time
}

func (d *Detector) Check(resourceID, metric string, value float64) *Detection {
	baseline, ok := d.baselineStore.GetBaseline(resourceID, metric)
	if !ok || baseline.SampleCount < 50 || baseline.StdDev == 0 {
		return nil // not enough data (or zero variance) to judge
	}

	zScore := (value - baseline.Mean) / baseline.StdDev
	absZ := math.Abs(zScore)

	if absZ < d.warningThreshold {
		return nil // within normal range
	}

	severity := AnomalyWarning
	if absZ >= d.criticalThreshold {
		severity = AnomalyCritical
	}

	return &Detection{
		ResourceID:   resourceID,
		Metric:       metric,
		CurrentValue: value,
		ExpectedMean: baseline.Mean,
		StdDev:       baseline.StdDev,
		ZScore:       zScore,
		Severity:     severity,
		DetectedAt:   time.Now(),
	}
}
```

---
## Phase 3: Operational Memory

**Goal**: Remember what happened, what users said, and what worked.

### 3.1 Change Detection

```go
// internal/ai/memory/changes.go

type ChangeDetector struct {
	previousState map[string]ResourceSnapshot
	mu            sync.RWMutex

	changes     []Change
	maxChanges  int
	persistence Persistence
}

type Change struct {
	ID          string
	ResourceID  string
	ChangeType  ChangeType
	Before      interface{}
	After       interface{}
	DetectedAt  time.Time
	Description string
}

type ChangeType string

const (
	ChangeCreated  ChangeType = "created"
	ChangeDeleted  ChangeType = "deleted"
	ChangeConfig   ChangeType = "config"   // RAM, CPU allocation changed
	ChangeStatus   ChangeType = "status"   // started, stopped
	ChangeMigrated ChangeType = "migrated" // moved to different node
)

func (d *ChangeDetector) Detect(current models.StateSnapshot) []Change {
	// Compare current state to previous
	// Detect new resources, deleted resources, config changes
	// Store changes and return new ones
}
```
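`Detect` above is only a stub. The diffing idea can be sketched against a simplified snapshot type (`guest` and `detectChanges` below are illustrative stand-ins for `models.StateSnapshot` and the real method, without persistence or config diffing):

```go
package main

import (
	"fmt"
	"time"
)

// guest is a simplified stand-in for a VM/container in a state snapshot.
type guest struct {
	ID     string
	Status string // "running", "stopped", ...
	Node   string
}

// detectChanges diffs two snapshots keyed by resource ID and reports
// created, deleted, status, and migration changes.
func detectChanges(prev, curr map[string]guest) []string {
	var changes []string
	now := time.Now().Format(time.RFC3339)
	for id, c := range curr {
		p, existed := prev[id]
		switch {
		case !existed:
			changes = append(changes, fmt.Sprintf("[%s] %s created", now, id))
		case p.Status != c.Status:
			changes = append(changes, fmt.Sprintf("[%s] %s status %s -> %s", now, id, p.Status, c.Status))
		case p.Node != c.Node:
			changes = append(changes, fmt.Sprintf("[%s] %s migrated %s -> %s", now, id, p.Node, c.Node))
		}
	}
	for id := range prev {
		if _, still := curr[id]; !still {
			changes = append(changes, fmt.Sprintf("[%s] %s deleted", now, id))
		}
	}
	return changes
}

func main() {
	prev := map[string]guest{"vm-101": {"vm-101", "running", "pve1"}}
	curr := map[string]guest{
		"vm-101": {"vm-101", "stopped", "pve1"},
		"vm-102": {"vm-102", "running", "pve2"},
	}
	for _, c := range detectChanges(prev, curr) {
		fmt.Println(c)
	}
}
```

The timestamps matter for the "this problem started 2 days after you added GPU passthrough" correlation: each change keeps its detection time so later issues can be matched against it.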
### 3.2 Remediation Logging

```go
// internal/ai/memory/remediation.go

type RemediationLog struct {
	mu      sync.RWMutex
	records []RemediationRecord

	persistence Persistence
}

type RemediationRecord struct {
	ID         string
	Timestamp  time.Time
	ResourceID string
	FindingID  string        // linked AI finding if any
	Problem    string        // what was wrong
	Action     string        // what was done
	Outcome    Outcome       // did it work?
	Duration   time.Duration // how long until resolved
	Note       string        // optional user/AI note
}

type Outcome string

const (
	OutcomeResolved Outcome = "resolved"
	OutcomePartial  Outcome = "partial"
	OutcomeFailed   Outcome = "failed"
	OutcomeUnknown  Outcome = "unknown"
)

// Log records a remediation action
func (r *RemediationLog) Log(record RemediationRecord) error

// GetForResource returns remediation history for a resource
func (r *RemediationLog) GetForResource(resourceID string, limit int) []RemediationRecord

// GetSimilar finds similar past remediations
func (r *RemediationLog) GetSimilar(problem string, limit int) []RemediationRecord
```
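`GetSimilar` is declared above without a body. One lightweight approach is word-overlap (Jaccard) similarity between problem descriptions; a real implementation might use embeddings instead. The `similarity` helper is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// similarity scores two problem descriptions by Jaccard word overlap:
// |intersection| / |union| of their lowercased word sets, in [0, 1].
func similarity(a, b string) float64 {
	setA := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(a)) {
		setA[w] = true
	}
	setB := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(b)) {
		setB[w] = true
	}
	var inter float64
	for w := range setB {
		if setA[w] {
			inter++
		}
	}
	union := float64(len(setA)+len(setB)) - inter
	if union == 0 {
		return 0
	}
	return inter / union
}

func main() {
	s := similarity("high memory usage on database VM", "database VM high memory")
	fmt.Printf("%.2f\n", s)
}
```

`GetSimilar` could score every stored `Problem` against the query, sort descending, and return the top `limit` records whose score clears a small threshold.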
### 3.3 Integration Points

When the AI executes a command:
```go
func (s *Service) onToolComplete(toolID, command, output string, success bool) {
	// Log the remediation attempt
	s.remediationLog.Log(RemediationRecord{
		ID:         uuid.New().String(),
		Timestamp:  time.Now(),
		ResourceID: s.currentContext.TargetID,
		FindingID:  s.currentContext.FindingID,
		Problem:    s.currentContext.Problem,
		Action:     command,
		Outcome:    outcomeFromSuccess(success),
	})
}
```

When a finding is resolved:
```go
func (s *FindingsStore) Resolve(findingID string, auto bool) bool {
	// Link to any remediation actions
	// Record what was done
}
```

---
## Phase 4: Capacity Forecasting

**Goal**: Predict when resources will run out.

### 4.1 Forecaster

```go
// internal/ai/forecast/capacity.go

type CapacityForecaster struct {
	metricsHistory *monitoring.MetricsHistory
	minDataPoints  int // minimum points needed for forecast
}

type CapacityForecast struct {
	ResourceID   string
	Metric       string
	CurrentUsage float64
	Limit        float64

	GrowthRate float64   // per day
	ETA        time.Time // when it hits limit
	DaysLeft   float64
	Confidence float64 // 0-1

	// Projection points for visualization
	Projection []ProjectionPoint
}

func (f *CapacityForecaster) Forecast(resourceID, metric string, limit float64) (*CapacityForecast, error) {
	points := f.metricsHistory.GetMetrics(resourceID, metric, 7*24*time.Hour)
	if len(points) < f.minDataPoints {
		return nil, ErrInsufficientData
	}

	// Linear regression for growth rate
	slope, r2 := linearRegression(points)
	if slope <= 0 {
		return nil, nil // not growing
	}

	current := points[len(points)-1].Value
	remaining := limit - current
	hoursUntilFull := remaining / (slope * 3600)

	if hoursUntilFull <= 0 {
		return nil, nil // already at limit
	}

	eta := time.Now().Add(time.Duration(hoursUntilFull) * time.Hour)

	return &CapacityForecast{
		ResourceID:   resourceID,
		Metric:       metric,
		CurrentUsage: current,
		Limit:        limit,
		GrowthRate:   slope * 86400, // per day
		ETA:          eta,
		DaysLeft:     hoursUntilFull / 24,
		Confidence:   r2,
	}, nil
}
```

### 4.2 Integration with Patrol

```go
func (p *PatrolService) generateForecasts(state models.StateSnapshot) []Prediction {
	var predictions []Prediction

	// Forecast storage capacity
	for _, storage := range state.Storage {
		if storage.Total == 0 {
			continue
		}
		forecast, err := p.forecaster.Forecast(storage.ID, "used", float64(storage.Total))
		if err != nil || forecast == nil {
			continue
		}

		if forecast.DaysLeft < 30 && forecast.Confidence > 0.5 {
			predictions = append(predictions, Prediction{
				ResourceID: storage.ID,
				Metric:     "storage_capacity",
				Event:      "capacity_full",
				ETA:        forecast.ETA,
				Confidence: forecast.Confidence,
				Basis:      fmt.Sprintf("Growing %.1f GB/day", forecast.GrowthRate/1e9),
			})
		}
	}

	// Forecast VM memory (could predict OOM)
	// Forecast backup storage growth
	// etc.

	return predictions
}
```

---

## File System Layout (Final)

```
internal/ai/
├── context/
│   ├── builder.go       # Main orchestrator
│   ├── current.go       # Current state extraction
│   ├── historical.go    # Historical data integration
│   ├── trends.go        # Trend computation
│   ├── formatter.go     # AI-friendly formatting
│   └── types.go         # Shared types
├── baseline/
│   ├── store.go         # Baseline storage and learning
│   ├── persistence.go   # Disk persistence
│   └── learning.go      # Statistical learning
├── anomaly/
│   ├── detector.go      # Anomaly detection
│   └── types.go
├── forecast/
│   ├── capacity.go      # Capacity forecasting
│   └── patterns.go      # Pattern-based prediction
├── memory/
│   ├── changes.go       # Change detection
│   ├── remediation.go   # Remediation logging
│   └── persistence.go
├── knowledge/           # (existing)
│   ├── store.go
│   └── store_test.go
├── providers/           # (existing)
├── findings.go          # (existing)
├── patrol.go            # (existing, will use new context/)
├── service.go           # (existing, will use new context/)
└── routing.go           # (existing)
```

---

## Migration Strategy

### Step 1: Add without changing

Create new packages (`context/`, `baseline/`, etc.) that work alongside existing code. Don't break anything.

### Step 2: Wire up to MetricsHistory

Pass `*monitoring.MetricsHistory` to the AI service at startup. This is required for historical context.

### Step 3: Switch patrol to enriched context

Replace `buildInfrastructureSummary` with `buildEnrichedContext` behind a feature flag.

### Step 4: Add baseline learning

Start computing baselines in the background. Initially just store them; don't act on them.

### Step 5: Enable anomaly annotations

Add anomaly context to AI prompts. Let the AI mention anomalies in findings.

### Step 6: Add forecasts

Enable capacity forecasting. Create new finding types for predicted issues.

### Step 7: Phase out old code

Remove deprecated methods once the new system is stable.

---

## Testing Strategy

1. **Unit tests** for trend computation, baseline learning, anomaly detection
2. **Integration tests** with mock metrics history
3. **Golden file tests** for AI context formatting (ensure consistent output)
4. **Baseline learning tests** with synthetic time-series data
5. **Forecast accuracy tests** with historical data validation

---

## Success Criteria

Phase 1 complete when:
- AI prompts include historical trends for all resources
- "24h trend" visible in patrol output

Phase 2 complete when:
- Baselines computed automatically
- Anomalies flagged in AI context
- "X is unusual" appearing in findings

Phase 3 complete when:
- Changes detected and logged
- Remediation history queryable
- "Last time this happened..." in AI responses

Phase 4 complete when:
- Capacity forecasts generated
- "Full in X days" predictions accurate
- Predictive findings created before issues occur

410 internal/ai/context/builder.go (new file)
@@ -0,0 +1,410 @@

package context

import (
	"strings"
	"time"

	"github.com/rcourtman/pulse-go-rewrite/internal/models"
	"github.com/rs/zerolog/log"
)

// MetricsHistoryProvider is the interface for accessing historical metrics.
// This avoids importing the monitoring package directly.
type MetricsHistoryProvider interface {
	GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint
	GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint
	GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint
	GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint
}

// KnowledgeProvider provides user annotations and notes.
type KnowledgeProvider interface {
	GetNotes(guestID string) []string
	FormatAllForContext() string
}

// FindingsProvider provides past findings for operational memory.
type FindingsProvider interface {
	GetDismissedForContext() string
	GetPastFindingsForResource(resourceID string) []string
}

// Builder constructs enriched AI context from multiple data sources.
type Builder struct {
	// Data sources
	metricsHistory MetricsHistoryProvider
	knowledge      KnowledgeProvider
	findings       FindingsProvider

	// Configuration
	trendWindow24h  time.Duration
	trendWindow7d   time.Duration
	includeHistory  bool
	includeTrends   bool
	includeBaseline bool
}

// NewBuilder creates a new context builder.
func NewBuilder() *Builder {
	return &Builder{
		trendWindow24h:  24 * time.Hour,
		trendWindow7d:   7 * 24 * time.Hour,
		includeHistory:  true,
		includeTrends:   true,
		includeBaseline: false, // disabled until baseline store is implemented
	}
}

// WithMetricsHistory sets the metrics history provider.
func (b *Builder) WithMetricsHistory(mh MetricsHistoryProvider) *Builder {
	b.metricsHistory = mh
	return b
}

// WithKnowledge sets the knowledge provider for user notes.
func (b *Builder) WithKnowledge(k KnowledgeProvider) *Builder {
	b.knowledge = k
	return b
}

// WithFindings sets the findings provider for operational memory.
func (b *Builder) WithFindings(f FindingsProvider) *Builder {
	b.findings = f
	return b
}

// BuildForInfrastructure creates comprehensive context for the entire infrastructure.
func (b *Builder) BuildForInfrastructure(state models.StateSnapshot) *InfrastructureContext {
	ctx := &InfrastructureContext{
		GeneratedAt: time.Now(),
	}

	// Process nodes
	for _, node := range state.Nodes {
		trends := b.computeNodeTrends(node.ID)
		resourceCtx := FormatNodeForContext(node, trends)
		b.enrichWithNotes(&resourceCtx)
		ctx.Nodes = append(ctx.Nodes, resourceCtx)
	}

	// Process VMs
	for _, vm := range state.VMs {
		if vm.Template {
			continue
		}
		trends := b.computeGuestTrends(vm.ID)
		resourceCtx := FormatGuestForContext(
			vm.ID, vm.Name, vm.Node, "vm", vm.Status,
			vm.CPU, vm.Memory.Usage, vm.Disk.Usage,
			vm.Uptime, vm.LastBackup, trends,
		)
		b.enrichWithNotes(&resourceCtx)
		ctx.VMs = append(ctx.VMs, resourceCtx)
	}

	// Process containers
	for _, ct := range state.Containers {
		if ct.Template {
			continue
		}
		trends := b.computeGuestTrends(ct.ID)
		resourceCtx := FormatGuestForContext(
			ct.ID, ct.Name, ct.Node, "container", ct.Status,
			ct.CPU, ct.Memory.Usage, ct.Disk.Usage,
			ct.Uptime, ct.LastBackup, trends,
		)
		b.enrichWithNotes(&resourceCtx)
		ctx.Containers = append(ctx.Containers, resourceCtx)
	}

	// Process storage
	for _, storage := range state.Storage {
		trends := b.computeStorageTrends(storage.ID)
		resourceCtx := FormatStorageForContext(storage, trends)

		// Add capacity predictions for storage
		if predictions := b.computeStoragePredictions(storage, trends); len(predictions) > 0 {
			resourceCtx.Predictions = predictions
			ctx.Predictions = append(ctx.Predictions, predictions...)
		}

		ctx.Storage = append(ctx.Storage, resourceCtx)
	}

	// Process Docker hosts
	for _, dh := range state.DockerHosts {
		resourceCtx := b.buildDockerHostContext(dh)
		ctx.DockerHosts = append(ctx.DockerHosts, resourceCtx)
	}

	// Process agent hosts
	for _, host := range state.Hosts {
		resourceCtx := b.buildHostContext(host)
		ctx.Hosts = append(ctx.Hosts, resourceCtx)
	}

	// Calculate totals
	ctx.TotalResources = len(ctx.Nodes) + len(ctx.VMs) + len(ctx.Containers) +
		len(ctx.Storage) + len(ctx.DockerHosts) + len(ctx.Hosts)

	log.Debug().
		Int("nodes", len(ctx.Nodes)).
		Int("vms", len(ctx.VMs)).
		Int("containers", len(ctx.Containers)).
		Int("storage", len(ctx.Storage)).
		Int("predictions", len(ctx.Predictions)).
		Msg("Built enriched infrastructure context")

	return ctx
}

// computeNodeTrends computes trends for a node's metrics.
func (b *Builder) computeNodeTrends(nodeID string) map[string]Trend {
	trends := make(map[string]Trend)

	if b.metricsHistory == nil || !b.includeTrends {
		return trends
	}

	// Compute 24h trends for key metrics
	for _, metric := range []string{"cpu", "memory"} {
		points := b.metricsHistory.GetNodeMetrics(nodeID, metric, b.trendWindow24h)
		if len(points) >= 3 {
			trend := ComputeTrend(points, metric, b.trendWindow24h)
			trends[metric+"_24h"] = trend
		}
	}

	// Also compute 7d trends for capacity planning
	for _, metric := range []string{"cpu", "memory"} {
		points := b.metricsHistory.GetNodeMetrics(nodeID, metric, b.trendWindow7d)
		if len(points) >= 10 {
			trend := ComputeTrend(points, metric, b.trendWindow7d)
			trends[metric+"_7d"] = trend
		}
	}

	return trends
}

// computeGuestTrends computes trends for a guest's metrics.
func (b *Builder) computeGuestTrends(guestID string) map[string]Trend {
	trends := make(map[string]Trend)

	if b.metricsHistory == nil || !b.includeTrends {
		return trends
	}

	// Get all metrics at once for efficiency
	allMetrics := b.metricsHistory.GetAllGuestMetrics(guestID, b.trendWindow7d)

	for metric, points := range allMetrics {
		if len(points) < 3 {
			continue
		}

		// Compute 24h trend
		recent := filterRecentPoints(points, b.trendWindow24h)
		if len(recent) >= 3 {
			trend := ComputeTrend(recent, metric, b.trendWindow24h)
			trends[metric+"_24h"] = trend
		}

		// Compute 7d trend if enough data
		if len(points) >= 10 {
			trend := ComputeTrend(points, metric, b.trendWindow7d)
			trends[metric+"_7d"] = trend
		}
	}

	return trends
}

// computeStorageTrends computes trends for storage.
func (b *Builder) computeStorageTrends(storageID string) map[string]Trend {
	trends := make(map[string]Trend)

	if b.metricsHistory == nil || !b.includeTrends {
		return trends
	}

	allMetrics := b.metricsHistory.GetAllStorageMetrics(storageID, b.trendWindow7d)

	// Focus on the usage metric for storage
	if points, ok := allMetrics["usage"]; ok && len(points) >= 3 {
		recent := filterRecentPoints(points, b.trendWindow24h)
		if len(recent) >= 3 {
			trends["usage_24h"] = ComputeTrend(recent, "usage", b.trendWindow24h)
		}
		if len(points) >= 10 {
			trends["usage_7d"] = ComputeTrend(points, "usage", b.trendWindow7d)
		}
	}

	return trends
}

// computeStoragePredictions generates capacity predictions for storage.
func (b *Builder) computeStoragePredictions(storage models.Storage, trends map[string]Trend) []Prediction {
	var predictions []Prediction

	// Use the 7d trend for a more stable prediction
	trend, ok := trends["usage_7d"]
	if !ok || trend.DataPoints < 10 {
		return predictions
	}

	// Only predict if growing
	if trend.Direction != TrendGrowing || trend.RatePerDay <= 0 {
		return predictions
	}

	// Current usage
	currentPct := storage.Usage
	if currentPct == 0 && storage.Total > 0 {
		currentPct = float64(storage.Used) / float64(storage.Total) * 100
	}

	// Calculate days until 90% (warning) and 100% (critical)
	for _, threshold := range []struct {
		pct   float64
		event string
	}{
		{90, "storage_warning_90pct"},
		{100, "storage_full"},
	} {
		if currentPct >= threshold.pct {
			continue // already past this threshold
		}

		remaining := threshold.pct - currentPct
		daysUntil := remaining / trend.RatePerDay

		if daysUntil > 0 && daysUntil <= 30 { // only predict within 30 days
			predictions = append(predictions, Prediction{
				ResourceID: storage.ID,
				Metric:     "usage",
				Event:      threshold.event,
				ETA:        time.Now().Add(time.Duration(daysUntil*24) * time.Hour),
				DaysUntil:  daysUntil,
				Confidence: trend.Confidence,
				Basis:      formatPredictionBasis(trend),
				GrowthRate: trend.RatePerDay,
				CurrentPct: currentPct,
			})
		}
	}

	return predictions
}

// formatPredictionBasis creates the explanation for a prediction.
func formatPredictionBasis(trend Trend) string {
	return "Growing " + formatRate(trend.RatePerDay) + " based on " +
		formatDuration(trend.Period) + " of data"
}

// buildDockerHostContext creates context for a Docker host.
func (b *Builder) buildDockerHostContext(host models.DockerHost) ResourceContext {
	displayName := host.Hostname
	if host.DisplayName != "" {
		displayName = host.DisplayName
	}

	ctx := ResourceContext{
		ResourceID:   host.ID,
		ResourceType: "docker_host",
		ResourceName: displayName,
		Status:       host.Status,
		Uptime:       time.Duration(host.UptimeSeconds) * time.Second,
	}

	// Note: Docker hosts don't have the same trend data as Proxmox resources.
	// We could add container-level trends in the future.

	return ctx
}

// buildHostContext creates context for an agent host.
func (b *Builder) buildHostContext(host models.Host) ResourceContext {
	displayName := host.Hostname
	if host.DisplayName != "" {
		displayName = host.DisplayName
	}

	// Calculate CPU and memory from host data
	cpuPct := 0.0
	if len(host.LoadAverage) > 0 && host.CPUCount > 0 {
		cpuPct = host.LoadAverage[0] / float64(host.CPUCount) * 100
	}

	memPct := 0.0
	if host.Memory.Total > 0 {
		memPct = float64(host.Memory.Used) / float64(host.Memory.Total) * 100
	}

	ctx := ResourceContext{
		ResourceID:    host.ID,
		ResourceType:  "host",
		ResourceName:  displayName,
		CurrentCPU:    cpuPct,
		CurrentMemory: memPct,
		Status:        host.Status,
		Uptime:        time.Duration(host.UptimeSeconds) * time.Second,
	}

	return ctx
}

// enrichWithNotes adds user annotations to the context.
func (b *Builder) enrichWithNotes(ctx *ResourceContext) {
	if b.knowledge == nil {
		return
	}

	notes := b.knowledge.GetNotes(ctx.ResourceID)
	if len(notes) > 0 {
		ctx.UserNotes = notes
	}
}

// filterRecentPoints filters points to only those within the given duration.
func filterRecentPoints(points []MetricPoint, duration time.Duration) []MetricPoint {
	cutoff := time.Now().Add(-duration)
	result := make([]MetricPoint, 0, len(points))
	for _, p := range points {
		if p.Timestamp.After(cutoff) {
			result = append(result, p)
		}
	}
	return result
}

// MergeContexts combines context for a targeted analysis with relevant infrastructure context.
func (b *Builder) MergeContexts(target *ResourceContext, infrastructure *InfrastructureContext) string {
	// For targeted requests, highlight the target first, then add relevant related context
	var result strings.Builder

	result.WriteString("# Target Resource\n")
	result.WriteString(FormatResourceContext(*target))
	result.WriteString("\n")

	// Add related resources (same node, dependencies, etc.)
	// This could be expanded with dependency mapping in the future.
	if target.Node != "" {
		result.WriteString("\n## Related Resources\n")
		// Find other resources on the same node
		for _, vm := range infrastructure.VMs {
			if vm.Node == target.Node && vm.ResourceID != target.ResourceID {
				result.WriteString(FormatResourceContext(vm))
			}
		}
		for _, ct := range infrastructure.Containers {
			if ct.Node == target.Node && ct.ResourceID != target.ResourceID {
				result.WriteString(FormatResourceContext(ct))
			}
		}
	}

	return result.String()
}

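To make the threshold arithmetic in `computeStoragePredictions` concrete, here is a tiny standalone sketch with made-up numbers (a pool at 72% used, growing 1.5 percentage points per day), mirroring the `daysUntil := remaining / trend.RatePerDay` step:

```go
package main

import "fmt"

func main() {
	// Hypothetical storage pool: 72% used, 7d trend growing 1.5 points/day.
	currentPct, ratePerDay := 72.0, 1.5

	for _, threshold := range []struct {
		pct   float64
		event string
	}{{90, "storage_warning_90pct"}, {100, "storage_full"}} {
		// Days until this threshold at the current linear growth rate.
		daysUntil := (threshold.pct - currentPct) / ratePerDay
		if daysUntil > 0 && daysUntil <= 30 { // same 30-day prediction window
			fmt.Printf("%s in %.1f days\n", threshold.event, daysUntil)
		}
	}
}
```

With these numbers the 90% warning lands at exactly 12 days and the full prediction at about 18.7 days, so both fall inside the 30-day window and would be emitted.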
429 internal/ai/context/formatter.go (new file)
@@ -0,0 +1,429 @@

package context

import (
	"fmt"
	"strings"
	"time"

	"github.com/rcourtman/pulse-go-rewrite/internal/models"
)

// FormatResourceContext formats a single resource's context for AI consumption.
func FormatResourceContext(ctx ResourceContext) string {
	var sb strings.Builder

	// Header with resource identity
	typeLabel := formatResourceType(ctx.ResourceType)
	sb.WriteString(fmt.Sprintf("### %s: %s", typeLabel, ctx.ResourceName))
	if ctx.Node != "" && ctx.ResourceType != "node" {
		sb.WriteString(fmt.Sprintf(" (on %s)", ctx.Node))
	}
	sb.WriteString("\n")

	// Current state
	sb.WriteString(fmt.Sprintf("**Status**: %s", ctx.Status))
	if ctx.Uptime > 0 {
		sb.WriteString(fmt.Sprintf(" | **Uptime**: %s", formatDuration(ctx.Uptime)))
	}
	sb.WriteString("\n")

	// Current metrics
	var metrics []string
	if ctx.CurrentCPU >= 0 {
		metrics = append(metrics, fmt.Sprintf("CPU: %.1f%%", ctx.CurrentCPU))
	}
	if ctx.CurrentMemory >= 0 {
		metrics = append(metrics, fmt.Sprintf("Memory: %.1f%%", ctx.CurrentMemory))
	}
	if ctx.CurrentDisk >= 0 {
		metrics = append(metrics, fmt.Sprintf("Disk: %.1f%%", ctx.CurrentDisk))
	}
	if len(metrics) > 0 {
		sb.WriteString("**Current**: " + strings.Join(metrics, " | ") + "\n")
	}

	// Trends section (the differentiating context)
	if len(ctx.Trends) > 0 {
		var trendLines []string
		for metric, trend := range ctx.Trends {
			if trend.DataPoints < 3 {
				continue // skip if not enough data
			}
			line := formatTrendLine(metric, trend)
			if line != "" {
				trendLines = append(trendLines, line)
			}
		}
		if len(trendLines) > 0 {
			sb.WriteString("**Trends**: ")
			sb.WriteString(strings.Join(trendLines, " | "))
			sb.WriteString("\n")
		}
	}

	// Anomalies (high value: what's unusual)
	if len(ctx.Anomalies) > 0 {
		sb.WriteString("**⚠️ Anomalies**: ")
		var anomalyDescs []string
		for _, a := range ctx.Anomalies {
			anomalyDescs = append(anomalyDescs, a.Description)
		}
		sb.WriteString(strings.Join(anomalyDescs, "; "))
		sb.WriteString("\n")
	}

	// Predictions (proactive value)
	if len(ctx.Predictions) > 0 {
		sb.WriteString("**⏰ Predictions**: ")
		var predDescs []string
		for _, p := range ctx.Predictions {
			predDescs = append(predDescs, fmt.Sprintf("%s in ~%.0f days", p.Event, p.DaysUntil))
		}
		sb.WriteString(strings.Join(predDescs, "; "))
		sb.WriteString("\n")
	}

	// User notes (context that only Pulse knows)
	if len(ctx.UserNotes) > 0 {
		sb.WriteString("**User Notes**: ")
		sb.WriteString(strings.Join(ctx.UserNotes, "; "))
		sb.WriteString("\n")
	}

	// Past issues (operational memory)
	if len(ctx.PastIssues) > 0 || ctx.LastRemediation != "" {
		sb.WriteString("**History**: ")
		if ctx.LastRemediation != "" {
			sb.WriteString(ctx.LastRemediation)
		}
		if len(ctx.PastIssues) > 0 {
			sb.WriteString(" Past issues: " + strings.Join(ctx.PastIssues, "; "))
		}
		sb.WriteString("\n")
	}

	return sb.String()
}

// formatTrendLine creates a compact trend description.
func formatTrendLine(metric string, trend Trend) string {
	if trend.DataPoints < 3 {
		return ""
	}

	metricLabel := strings.Title(metric)

	// Direction with rate
	var directionStr string
	switch trend.Direction {
	case TrendGrowing:
		rate := formatRate(trend.RatePerDay)
		directionStr = fmt.Sprintf("↑ %s", rate)
	case TrendDeclining:
		rate := formatRate(-trend.RatePerDay) // make positive for display
		directionStr = fmt.Sprintf("↓ %s", rate)
	case TrendVolatile:
		directionStr = "⚡ volatile"
	case TrendStable:
		directionStr = "→ stable"
	default:
		return ""
	}

	// Include the range only if the variation is significant
	rangeStr := ""
	if trend.Max-trend.Min > 5 {
		rangeStr = fmt.Sprintf(" (%.0f-%.0f%%)", trend.Min, trend.Max)
	}

	return fmt.Sprintf("%s: %s%s", metricLabel, directionStr, rangeStr)
}

// formatRate formats a rate value appropriately.
func formatRate(ratePerDay float64) string {
	absRate := ratePerDay
	if absRate < 0 {
		absRate = -absRate
	}

	if absRate >= 1 {
		return fmt.Sprintf("%.1f/day", absRate)
	}
	// Convert to per hour if < 1/day
	ratePerHour := absRate / 24
	if ratePerHour >= 0.1 {
		return fmt.Sprintf("%.1f/hr", ratePerHour)
	}
	return "slow"
}

// FormatInfrastructureContext formats the full infrastructure context for the AI.
func FormatInfrastructureContext(ctx *InfrastructureContext) string {
	var sb strings.Builder

	sb.WriteString("# Infrastructure State with Historical Context\n\n")
	sb.WriteString(fmt.Sprintf("*Generated at %s | Monitoring %d resources*\n\n",
		ctx.GeneratedAt.Format("2006-01-02 15:04"),
		ctx.TotalResources))

	// Global anomalies first (high priority)
	if len(ctx.Anomalies) > 0 {
		sb.WriteString("## ⚠️ Current Anomalies\n")
		for _, a := range ctx.Anomalies {
			sb.WriteString(fmt.Sprintf("- **%s**: %s\n", a.Metric, a.Description))
		}
		sb.WriteString("\n")
	}

	// Predictions (proactive insights)
	if len(ctx.Predictions) > 0 {
		sb.WriteString("## ⏰ Predictions\n")
		for _, p := range ctx.Predictions {
			sb.WriteString(fmt.Sprintf("- **%s** on %s: %s (%.0f days, %.0f%% confidence)\n",
				p.Event, p.ResourceID, p.Basis, p.DaysUntil, p.Confidence*100))
		}
		sb.WriteString("\n")
	}

	// Recent changes (what's different)
	if len(ctx.Changes) > 0 {
		sb.WriteString("## 🔄 Recent Changes\n")
		for _, c := range ctx.Changes {
			sb.WriteString(fmt.Sprintf("- %s: %s\n", c.ResourceName, c.Description))
		}
		sb.WriteString("\n")
	}

	// Resources by type
	if len(ctx.Nodes) > 0 {
		sb.WriteString("## Proxmox Nodes\n")
		for _, r := range ctx.Nodes {
			sb.WriteString(FormatResourceContext(r))
			sb.WriteString("\n")
		}
	}

	if len(ctx.VMs) > 0 {
		sb.WriteString("## Virtual Machines\n")
		for _, r := range ctx.VMs {
			sb.WriteString(FormatResourceContext(r))
		}
		sb.WriteString("\n")
	}

	if len(ctx.Containers) > 0 {
		sb.WriteString("## LXC Containers\n")
		for _, r := range ctx.Containers {
			sb.WriteString(FormatResourceContext(r))
		}
		sb.WriteString("\n")
	}

	if len(ctx.Storage) > 0 {
		sb.WriteString("## Storage\n")
		for _, r := range ctx.Storage {
			sb.WriteString(FormatResourceContext(r))
		}
		sb.WriteString("\n")
	}

	if len(ctx.DockerHosts) > 0 {
		sb.WriteString("## Docker Hosts\n")
		for _, r := range ctx.DockerHosts {
			sb.WriteString(FormatResourceContext(r))
		}
		sb.WriteString("\n")
	}

	if len(ctx.Hosts) > 0 {
		sb.WriteString("## Agent Hosts\n")
		for _, r := range ctx.Hosts {
			sb.WriteString(FormatResourceContext(r))
		}
		sb.WriteString("\n")
	}

	return sb.String()
}

// FormatCompactSummary creates a brief overview suitable for context-limited prompts.
func FormatCompactSummary(ctx *InfrastructureContext) string {
	var sb strings.Builder

	sb.WriteString(fmt.Sprintf("Infrastructure: %d resources\n", ctx.TotalResources))

	// Count by status
	var healthy, warning, critical int
	countResource := func(resources []ResourceContext) {
		for _, r := range resources {
			switch {
			case len(r.Anomalies) > 0:
				critical++
			case hasGrowingTrend(r):
				warning++
			default:
				healthy++
			}
		}
	}
	countResource(ctx.Nodes)
	countResource(ctx.VMs)
	countResource(ctx.Containers)
	countResource(ctx.Storage)
	countResource(ctx.DockerHosts)
	countResource(ctx.Hosts)

	sb.WriteString(fmt.Sprintf("Health: %d healthy, %d warning, %d critical\n", healthy, warning, critical))

	if len(ctx.Anomalies) > 0 {
		sb.WriteString(fmt.Sprintf("Anomalies: %d active\n", len(ctx.Anomalies)))
	}

	if len(ctx.Predictions) > 0 {
		// Show the most urgent prediction
		earliest := ctx.Predictions[0]
		for _, p := range ctx.Predictions[1:] {
			if p.DaysUntil < earliest.DaysUntil {
				earliest = p
			}
		}
		sb.WriteString(fmt.Sprintf("⏰ Nearest: %s in %.0f days\n", earliest.Event, earliest.DaysUntil))
	}

	return sb.String()
}

// hasGrowingTrend checks whether any metric trend is concerning.
func hasGrowingTrend(r ResourceContext) bool {
	for _, t := range r.Trends {
		if t.Direction == TrendGrowing && t.RatePerDay > 1 {
			return true
		}
	}
	return false
}

// formatResourceType converts an internal type to a display label.
func formatResourceType(t string) string {
	switch t {
	case "node":
		return "Node"
	case "vm":
		return "VM"
	case "container":
		return "Container"
	case "storage":
		return "Storage"
	case "docker_host":
		return "Docker Host"
	case "docker_container":
		return "Docker Container"
	case "host":
		return "Host"
	default:
		return strings.Title(t)
	}
}

// formatDuration formats a duration in human-readable form.
func formatDuration(d time.Duration) string {
	if d < time.Minute {
		return fmt.Sprintf("%ds", int(d.Seconds()))
	}
	if d < time.Hour {
		return fmt.Sprintf("%dm", int(d.Minutes()))
	}
	if d < 24*time.Hour {
		hours := int(d.Hours())
		mins := int(d.Minutes()) % 60
		if mins > 0 {
			return fmt.Sprintf("%dh%dm", hours, mins)
		}
		return fmt.Sprintf("%dh", hours)
	}
	days := int(d.Hours() / 24)
	hours := int(d.Hours()) % 24
	if hours > 0 {
		return fmt.Sprintf("%dd%dh", days, hours)
	}
	return fmt.Sprintf("%dd", days)
}

// FormatBackupStatus creates a human-readable backup status.
func FormatBackupStatus(lastBackup time.Time) string {
	if lastBackup.IsZero() {
		return "never"
	}
	age := time.Since(lastBackup)
	if age < 24*time.Hour {
		return fmt.Sprintf("%.0fh ago", age.Hours())
	}
	days := age.Hours() / 24
	return fmt.Sprintf("%.0fd ago", days)
}

// FormatNodeForContext creates context for a Proxmox node.
func FormatNodeForContext(node models.Node, trends map[string]Trend) ResourceContext {
	// Calculate memory percentage
	memPct := 0.0
	if node.Memory.Total > 0 {
		memPct = float64(node.Memory.Used) / float64(node.Memory.Total) * 100
	}

	ctx := ResourceContext{
		ResourceID:    node.ID,
		ResourceType:  "node",
		ResourceName:  node.Name,
		CurrentCPU:    node.CPU * 100, // convert from 0-1 to percentage
		CurrentMemory: memPct,
		Status:        node.Status,
		Uptime:        time.Duration(node.Uptime) * time.Second,
		Trends:        trends,
	}

	return ctx
}

// FormatGuestForContext creates context for a VM or container.
func FormatGuestForContext(
	id, name, node, guestType, status string,
	cpu, memUsage, diskUsage float64,
	uptime int64,
	lastBackup time.Time,
	trends map[string]Trend,
) ResourceContext {
	ctx := ResourceContext{
		ResourceID:    id,
		ResourceType:  guestType,
		ResourceName:  name,
		Node:          node,
		CurrentCPU:    cpu * 100, // convert from 0-1 to percentage
		CurrentMemory: memUsage * 100,
		CurrentDisk:   diskUsage * 100,
		Status:        status,
		Uptime:        time.Duration(uptime) * time.Second,
		Trends:        trends,
	}

	return ctx
}

// FormatStorageForContext creates context for storage.
func FormatStorageForContext(storage models.Storage, trends map[string]Trend) ResourceContext {
	usagePct := storage.Usage
	if usagePct == 0 && storage.Total > 0 {
		usagePct = float64(storage.Used) / float64(storage.Total) * 100
	}

	ctx := ResourceContext{
		ResourceID:   storage.ID,
		ResourceType: "storage",
		ResourceName: storage.Name,
		Node:         storage.Node,
CurrentDisk: usagePct,
|
||||
Status: storage.Status,
|
||||
Trends: trends,
|
||||
}
|
||||
|
||||
return ctx
|
||||
}
|
||||
327 internal/ai/context/trends.go Normal file

@@ -0,0 +1,327 @@
package context

import (
	"math"
	"sort"
	"time"
)

// ComputeTrend calculates trend from historical data points.
// This is the core function that transforms raw metrics into meaningful insights.
func ComputeTrend(points []MetricPoint, metricName string, period time.Duration) Trend {
	trend := Trend{
		Metric:     metricName,
		Direction:  TrendStable,
		Period:     period,
		DataPoints: len(points),
	}

	if len(points) < 2 {
		trend.Confidence = 0
		return trend
	}

	// Sort by timestamp to ensure correct order
	sorted := make([]MetricPoint, len(points))
	copy(sorted, points)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Timestamp.Before(sorted[j].Timestamp)
	})

	// Calculate basic statistics
	stats := computeStats(sorted)
	trend.Average = stats.Mean
	trend.Min = stats.Min
	trend.Max = stats.Max
	trend.StdDev = stats.StdDev
	trend.Current = sorted[len(sorted)-1].Value

	// Perform linear regression to get slope and fit quality
	regression := linearRegression(sorted)
	trend.Confidence = regression.R2

	// Convert slope from "per second" to "per hour" and "per day"
	trend.RatePerHour = regression.Slope * 3600
	trend.RatePerDay = regression.Slope * 86400

	// Classify the trend direction
	trend.Direction = classifyTrend(regression.Slope, stats.Mean, stats.StdDev)

	return trend
}

// computeStats calculates basic statistics for a set of metric points
func computeStats(points []MetricPoint) Stats {
	if len(points) == 0 {
		return Stats{}
	}

	stats := Stats{
		Count: len(points),
		Min:   points[0].Value,
		Max:   points[0].Value,
	}

	for _, p := range points {
		stats.Sum += p.Value
		if p.Value < stats.Min {
			stats.Min = p.Value
		}
		if p.Value > stats.Max {
			stats.Max = p.Value
		}
	}

	stats.Mean = stats.Sum / float64(stats.Count)

	// Calculate standard deviation
	var sumSquares float64
	for _, p := range points {
		diff := p.Value - stats.Mean
		sumSquares += diff * diff
	}
	stats.StdDev = math.Sqrt(sumSquares / float64(stats.Count))

	return stats
}

// linearRegression performs simple linear regression on time-series data.
// Returns slope (change per second), intercept, and R² (goodness of fit).
func linearRegression(points []MetricPoint) LinearRegressionResult {
	if len(points) < 2 {
		return LinearRegressionResult{}
	}

	n := float64(len(points))

	// Use time relative to first point for numerical stability
	baseTime := points[0].Timestamp

	var sumX, sumY, sumXY, sumX2, sumY2 float64
	for _, p := range points {
		x := p.Timestamp.Sub(baseTime).Seconds() // seconds since start
		y := p.Value

		sumX += x
		sumY += y
		sumXY += x * y
		sumX2 += x * x
		sumY2 += y * y
	}

	// Calculate slope and intercept using least squares
	denominator := n*sumX2 - sumX*sumX
	if math.Abs(denominator) < 1e-10 {
		// All x values are the same (no time span)
		return LinearRegressionResult{R2: 0}
	}

	slope := (n*sumXY - sumX*sumY) / denominator
	intercept := (sumY - slope*sumX) / n

	// Calculate R² (coefficient of determination)
	meanY := sumY / n
	var ssRes, ssTot float64 // Sum of squares residual and total
	for _, p := range points {
		x := p.Timestamp.Sub(baseTime).Seconds()
		yPred := slope*x + intercept
		ssRes += (p.Value - yPred) * (p.Value - yPred)
		ssTot += (p.Value - meanY) * (p.Value - meanY)
	}

	r2 := 0.0
	if ssTot > 0 {
		r2 = 1 - (ssRes / ssTot)
	}
	// Clamp R² to [0, 1] (can be negative for very bad fits)
	if r2 < 0 {
		r2 = 0
	}

	return LinearRegressionResult{
		Slope:     slope,
		Intercept: intercept,
		R2:        r2,
	}
}

// classifyTrend determines the trend direction based on slope and statistics.
// We normalize the slope relative to the metric's magnitude to avoid
// false positives on high-value metrics.
func classifyTrend(slopePerSecond, mean, stdDev float64) TrendDirection {
	// If there's no significant variation, it's stable
	if stdDev < 0.01 && math.Abs(slopePerSecond) < 1e-10 {
		return TrendStable
	}

	// If standard deviation is high relative to mean, it's volatile
	if mean > 0 && stdDev/mean > 0.3 {
		return TrendVolatile
	}

	// Convert slope to hourly rate for easier reasoning
	hourlyRate := slopePerSecond * 3600

	// Determine significance threshold based on the metric's scale.
	// For percentage metrics (0-100), we care about ~0.1% per hour (~2.4% per day);
	// this catches slow-growing issues before they become critical.
	// For larger absolute metrics, we care about ~0.5% of mean per hour.
	threshold := 0.1 // Default threshold for percentage metrics
	if mean > 100 {
		// For larger absolute values, use relative threshold
		threshold = mean * 0.005
	}

	// Check if the hourly change is significant
	if hourlyRate > threshold {
		return TrendGrowing
	}
	if hourlyRate < -threshold {
		return TrendDeclining
	}

	return TrendStable
}

// ComputePercentiles calculates percentile values from a sorted slice of points
func ComputePercentiles(points []MetricPoint, percentiles ...int) map[int]float64 {
	result := make(map[int]float64)
	if len(points) == 0 {
		return result
	}

	// Extract values and sort
	values := make([]float64, len(points))
	for i, p := range points {
		values[i] = p.Value
	}
	sort.Float64s(values)

	for _, p := range percentiles {
		if p < 0 || p > 100 {
			continue
		}

		// Calculate index for percentile
		idx := float64(p) / 100.0 * float64(len(values)-1)
		lower := int(math.Floor(idx))
		upper := int(math.Ceil(idx))

		if lower >= len(values) {
			lower = len(values) - 1
		}
		if upper >= len(values) {
			upper = len(values) - 1
		}

		if lower == upper {
			result[p] = values[lower]
		} else {
			// Linear interpolation between adjacent values
			frac := idx - float64(lower)
			result[p] = values[lower]*(1-frac) + values[upper]*frac
		}
	}

	return result
}

// TrendSummary generates a human-readable summary of a trend
func TrendSummary(t Trend) string {
	if t.DataPoints < 2 {
		return "insufficient data"
	}

	directionStr := ""
	switch t.Direction {
	case TrendGrowing:
		directionStr = "growing"
	case TrendDeclining:
		directionStr = "declining"
	case TrendVolatile:
		directionStr = "volatile"
	case TrendStable:
		directionStr = "stable"
	}

	// Format rate based on magnitude
	rateStr := ""
	if t.Direction == TrendGrowing || t.Direction == TrendDeclining {
		absRate := math.Abs(t.RatePerDay)
		if absRate > 1 {
			rateStr = formatFloat(absRate, 1) + "/day"
		} else {
			rateStr = formatFloat(math.Abs(t.RatePerHour), 2) + "/hr"
		}
	}

	if rateStr != "" {
		return directionStr + " " + rateStr
	}
	return directionStr
}

// formatFloat formats a float with the given precision, trimming trailing zeros
func formatFloat(v float64, precision int) string {
	return trimTrailingZeros(floatToString(v, precision))
}

func floatToString(v float64, precision int) string {
	switch precision {
	case 0:
		return intToString(int(math.Round(v)))
	case 1:
		return intToString(int(v)) + "." + intToString(int(math.Round((v-float64(int(v)))*10)))
	case 2:
		return intToString(int(v)) + "." + padLeft(intToString(int(math.Round((v-float64(int(v)))*100))), 2, '0')
	default:
		mult := math.Pow(10, float64(precision))
		return intToString(int(v)) + "." + padLeft(intToString(int(math.Round((v-float64(int(v)))*mult))), precision, '0')
	}
}

func intToString(i int) string {
	if i < 0 {
		return "-" + intToString(-i)
	}
	if i < 10 {
		return string(rune('0' + i))
	}
	return intToString(i/10) + string(rune('0'+i%10))
}

func padLeft(s string, length int, pad rune) string {
	for len(s) < length {
		s = string(pad) + s
	}
	return s
}

func trimTrailingZeros(s string) string {
	if s == "" {
		return s
	}
	// Find decimal point
	dotIdx := -1
	for i, c := range s {
		if c == '.' {
			dotIdx = i
			break
		}
	}
	if dotIdx == -1 {
		return s // No decimal point
	}

	// Trim trailing zeros after decimal
	end := len(s)
	for end > dotIdx+1 && s[end-1] == '0' {
		end--
	}
	// Also trim decimal if nothing after it
	if end == dotIdx+1 {
		end = dotIdx
	}
	return s[:end]
}
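The least-squares fit in `linearRegression` can be exercised in isolation. A self-contained sketch of the same slope and R² arithmetic (standalone `main`; `fitLine` is an illustrative name, not a function from this commit):

```go
package main

import (
	"fmt"
	"math"
)

// fitLine computes a least-squares slope and R² over paired samples,
// mirroring the math in linearRegression: x in seconds, y in metric units.
func fitLine(xs, ys []float64) (slope, r2 float64) {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumX2 float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumX2 += xs[i] * xs[i]
	}
	den := n*sumX2 - sumX*sumX
	if math.Abs(den) < 1e-10 {
		return 0, 0 // no time span: no usable fit
	}
	slope = (n*sumXY - sumX*sumY) / den
	intercept := (sumY - slope*sumX) / n

	meanY := sumY / n
	var ssRes, ssTot float64
	for i := range xs {
		pred := slope*xs[i] + intercept
		ssRes += (ys[i] - pred) * (ys[i] - pred)
		ssTot += (ys[i] - meanY) * (ys[i] - meanY)
	}
	if ssTot > 0 {
		r2 = 1 - ssRes/ssTot
	}
	return slope, r2
}

func main() {
	// Memory rising 0.417 pts/hour, sampled hourly for 24h (as in the tests below).
	xs := make([]float64, 24)
	ys := make([]float64, 24)
	for i := 0; i < 24; i++ {
		xs[i] = float64(i) * 3600 // seconds since first sample
		ys[i] = 50 + float64(i)*0.417
	}
	slope, r2 := fitLine(xs, ys)
	fmt.Printf("rate/day=%.2f r2=%.2f\n", slope*86400, r2)
}
```

The per-second slope scales to the `RatePerHour`/`RatePerDay` figures by multiplying by 3600 and 86400, which is all `ComputeTrend` does with it.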
250 internal/ai/context/trends_test.go Normal file

@@ -0,0 +1,250 @@
package context

import (
	"testing"
	"time"
)

func TestComputeTrend_Growing(t *testing.T) {
	// Create growing data (10% per day)
	now := time.Now()
	points := make([]MetricPoint, 24)
	for i := 0; i < 24; i++ {
		// 10% per day = ~0.417% per hour
		points[i] = MetricPoint{
			Value:     50 + float64(i)*0.417,
			Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
		}
	}

	trend := ComputeTrend(points, "memory", 24*time.Hour)

	if trend.Direction != TrendGrowing {
		t.Errorf("Expected TrendGrowing, got %s", trend.Direction)
	}

	// Rate should be ~10% per day
	if trend.RatePerDay < 8 || trend.RatePerDay > 12 {
		t.Errorf("Expected rate ~10/day, got %.2f", trend.RatePerDay)
	}

	if trend.DataPoints != 24 {
		t.Errorf("Expected 24 data points, got %d", trend.DataPoints)
	}
}

func TestComputeTrend_Stable(t *testing.T) {
	// Create stable data with small fluctuations
	now := time.Now()
	points := make([]MetricPoint, 24)
	for i := 0; i < 24; i++ {
		// Small random-looking variation around 50%, but no trend
		offset := float64(i%3-1) * 0.2
		points[i] = MetricPoint{
			Value:     50 + offset,
			Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
		}
	}

	trend := ComputeTrend(points, "cpu", 24*time.Hour)

	if trend.Direction != TrendStable {
		t.Errorf("Expected TrendStable, got %s (rate: %.4f/hr)", trend.Direction, trend.RatePerHour)
	}
}

func TestComputeTrend_Declining(t *testing.T) {
	// Create declining data
	now := time.Now()
	points := make([]MetricPoint, 24)
	for i := 0; i < 24; i++ {
		points[i] = MetricPoint{
			Value:     80 - float64(i)*0.5, // -12% per day
			Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
		}
	}

	trend := ComputeTrend(points, "disk", 24*time.Hour)

	if trend.Direction != TrendDeclining {
		t.Errorf("Expected TrendDeclining, got %s", trend.Direction)
	}
}

func TestComputeTrend_Volatile(t *testing.T) {
	// Create volatile data with high variance
	now := time.Now()
	points := make([]MetricPoint, 24)
	for i := 0; i < 24; i++ {
		// Alternating high/low values
		value := 50.0
		if i%2 == 0 {
			value = 80.0
		} else {
			value = 20.0
		}
		points[i] = MetricPoint{
			Value:     value,
			Timestamp: now.Add(time.Duration(-24+i) * time.Hour),
		}
	}

	trend := ComputeTrend(points, "cpu", 24*time.Hour)

	if trend.Direction != TrendVolatile {
		t.Errorf("Expected TrendVolatile, got %s (stddev: %.2f, mean: %.2f)",
			trend.Direction, trend.StdDev, trend.Average)
	}
}

func TestComputeTrend_InsufficientData(t *testing.T) {
	// Only one data point
	points := []MetricPoint{
		{Value: 50, Timestamp: time.Now()},
	}

	trend := ComputeTrend(points, "memory", 24*time.Hour)

	if trend.Confidence != 0 {
		t.Errorf("Expected 0 confidence with insufficient data, got %.2f", trend.Confidence)
	}
}

func TestLinearRegression_Perfect(t *testing.T) {
	// Perfect linear data: y = 2x + 10
	now := time.Now()
	points := make([]MetricPoint, 10)
	for i := 0; i < 10; i++ {
		points[i] = MetricPoint{
			Value:     10 + float64(i)*2,
			Timestamp: now.Add(time.Duration(i) * time.Second),
		}
	}

	result := linearRegression(points)

	// Slope should be 2 per second
	if result.Slope < 1.9 || result.Slope > 2.1 {
		t.Errorf("Expected slope ~2, got %.4f", result.Slope)
	}

	// R² should be 1 (perfect fit)
	if result.R2 < 0.99 {
		t.Errorf("Expected R² ~1, got %.4f", result.R2)
	}
}

func TestComputePercentiles(t *testing.T) {
	now := time.Now()
	// Create 100 points with values 1-100
	points := make([]MetricPoint, 100)
	for i := 0; i < 100; i++ {
		points[i] = MetricPoint{
			Value:     float64(i + 1),
			Timestamp: now.Add(time.Duration(i) * time.Second),
		}
	}

	percentiles := ComputePercentiles(points, 5, 50, 95)

	// P5 should be ~5
	if percentiles[5] < 4 || percentiles[5] > 6 {
		t.Errorf("Expected P5 ~5, got %.2f", percentiles[5])
	}

	// P50 should be ~50
	if percentiles[50] < 49 || percentiles[50] > 51 {
		t.Errorf("Expected P50 ~50, got %.2f", percentiles[50])
	}

	// P95 should be ~95
	if percentiles[95] < 94 || percentiles[95] > 96 {
		t.Errorf("Expected P95 ~95, got %.2f", percentiles[95])
	}
}

func TestTrendSummary(t *testing.T) {
	tests := []struct {
		name     string
		trend    Trend
		expected string
	}{
		{
			name: "growing fast",
			trend: Trend{
				Direction:   TrendGrowing,
				RatePerDay:  5.5,
				RatePerHour: 0.23,
				DataPoints:  24,
			},
			expected: "growing 5.5/day",
		},
		{
			name: "growing slow",
			trend: Trend{
				Direction:   TrendGrowing,
				RatePerDay:  0.5,
				RatePerHour: 0.02,
				DataPoints:  24,
			},
			expected: "growing 0.02/hr",
		},
		{
			name: "stable",
			trend: Trend{
				Direction:  TrendStable,
				DataPoints: 24,
			},
			expected: "stable",
		},
		{
			name: "volatile",
			trend: Trend{
				Direction:  TrendVolatile,
				DataPoints: 24,
			},
			expected: "volatile",
		},
		{
			name: "insufficient data",
			trend: Trend{
				DataPoints: 1,
			},
			expected: "insufficient data",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			result := TrendSummary(tt.trend)
			if result != tt.expected {
				t.Errorf("Expected %q, got %q", tt.expected, result)
			}
		})
	}
}

func TestComputeStats(t *testing.T) {
	points := []MetricPoint{
		{Value: 10},
		{Value: 20},
		{Value: 30},
		{Value: 40},
		{Value: 50},
	}

	stats := computeStats(points)

	if stats.Count != 5 {
		t.Errorf("Expected count 5, got %d", stats.Count)
	}
	if stats.Min != 10 {
		t.Errorf("Expected min 10, got %.2f", stats.Min)
	}
	if stats.Max != 50 {
		t.Errorf("Expected max 50, got %.2f", stats.Max)
	}
	if stats.Mean != 30 {
		t.Errorf("Expected mean 30, got %.2f", stats.Mean)
	}
}
180 internal/ai/context/types.go Normal file

@@ -0,0 +1,180 @@
// Package context provides AI context building with historical data integration.
// This package transforms raw metrics and state into meaningful, time-aware context
// that differentiates Pulse AI from stateless AI assistants.
package context

import (
	"time"

	"github.com/rcourtman/pulse-go-rewrite/internal/types"
)

// MetricPoint is an alias for the shared type
type MetricPoint = types.MetricPoint

// TrendDirection indicates whether a metric is growing, stable, or declining
type TrendDirection string

const (
	TrendStable    TrendDirection = "stable"    // No significant change
	TrendGrowing   TrendDirection = "growing"   // Increasing over time
	TrendDeclining TrendDirection = "declining" // Decreasing over time
	TrendVolatile  TrendDirection = "volatile"  // Fluctuating significantly
)

// Trend represents the direction and rate of change for a metric
type Trend struct {
	Metric      string         // Name of the metric (cpu, memory, disk)
	Direction   TrendDirection // Overall direction
	RatePerHour float64        // Change per hour (in metric units, e.g., percentage points)
	RatePerDay  float64        // Change per day
	Current     float64        // Most recent value
	Average     float64        // Average over the period
	Min         float64        // Minimum value
	Max         float64        // Maximum value
	StdDev      float64        // Standard deviation
	DataPoints  int            // Number of data points used
	Period      time.Duration  // Time period analyzed
	Confidence  float64        // 0-1 confidence based on data quality (R² for linear fit)
}

// Baseline represents learned "normal" behavior for a metric
type Baseline struct {
	Metric      string    // Name of the metric
	Mean        float64   // Average value
	StdDev      float64   // Standard deviation
	P5          float64   // 5th percentile (low boundary)
	P50         float64   // Median
	P95         float64   // 95th percentile (high boundary)
	Min         float64   // Observed minimum
	Max         float64   // Observed maximum
	SampleCount int       // Number of samples used
	LearnedAt   time.Time // When baseline was computed
}

// Anomaly represents a detected deviation from normal behavior
type Anomaly struct {
	Metric      string    // Which metric is anomalous
	Current     float64   // Current value
	Expected    float64   // Expected value (baseline mean)
	Deviation   float64   // Number of standard deviations from mean
	Severity    string    // "low", "medium", "high", "critical"
	Since       time.Time // When the anomaly started (if known)
	Description string    // Human-readable description
}

// Prediction represents a forecasted future event
type Prediction struct {
	ResourceID string    // Which resource this prediction is for
	Metric     string    // Which metric
	Event      string    // Type of predicted event (capacity_full, oom, etc.)
	ETA        time.Time // When the event is predicted to occur
	DaysUntil  float64   // Days until event
	Confidence float64   // 0-1 confidence level
	Basis      string    // Explanation of how prediction was made
	GrowthRate float64   // Rate of change used for projection
	CurrentPct float64   // Current usage percentage
}

// Change represents a detected configuration or state change
type Change struct {
	ResourceID   string      // Which resource changed
	ResourceName string      // Display name
	ChangeType   ChangeType  // Type of change
	Before       interface{} // Previous value (nil for creation)
	After        interface{} // New value (nil for deletion)
	DetectedAt   time.Time   // When change was detected
	Description  string      // Human-readable description
}

// ChangeType categorizes types of changes
type ChangeType string

const (
	ChangeCreated     ChangeType = "created"     // New resource appeared
	ChangeDeleted     ChangeType = "deleted"     // Resource disappeared
	ChangeConfig      ChangeType = "config"      // Configuration change (RAM, CPU)
	ChangeStatus      ChangeType = "status"      // Status change (started, stopped)
	ChangeMigrated    ChangeType = "migrated"    // Moved to different node
	ChangePerformance ChangeType = "performance" // Significant performance shift
)

// ResourceTrends contains all trend data for a single resource
type ResourceTrends struct {
	ResourceID    string           // Unique identifier
	ResourceType  string           // node, vm, container, storage, docker_host
	ResourceName  string           // Display name
	Trends        map[string]Trend // Metric name -> trend data
	DataAvailable bool             // Whether we have historical data for this resource
	OldestData    time.Time        // Timestamp of oldest data point
	NewestData    time.Time        // Timestamp of newest data point
}

// ResourceContext contains all context for a single resource
type ResourceContext struct {
	ResourceID   string
	ResourceType string // "node", "vm", "container", "storage", "docker_host"
	ResourceName string
	Node         string // Parent node (for guests)

	// Current state (point-in-time)
	CurrentCPU    float64
	CurrentMemory float64
	CurrentDisk   float64
	Status        string
	Uptime        time.Duration

	// Historical analysis
	Trends    map[string]Trend    // metric -> trend (24h and 7d)
	Baselines map[string]Baseline // metric -> baseline
	Anomalies []Anomaly           // Current anomalies

	// Predictions
	Predictions []Prediction

	// Operational memory
	UserNotes       []string // User-provided annotations
	PastIssues      []string // Summary of past findings
	LastRemediation string   // What was done last time
	RecentChanges   []Change // Recent configuration changes
}

// InfrastructureContext contains summarized context for the entire infrastructure
type InfrastructureContext struct {
	// Timestamp of this context snapshot
	GeneratedAt time.Time

	// Summary statistics
	TotalResources    int
	ResourcesWithData int // Resources with historical data available

	// Categorized resources with their context
	Nodes       []ResourceContext
	VMs         []ResourceContext
	Containers  []ResourceContext
	Storage     []ResourceContext
	DockerHosts []ResourceContext
	Hosts       []ResourceContext

	// Global insights
	Anomalies   []Anomaly    // Cross-infrastructure anomalies
	Predictions []Prediction // Capacity and failure predictions
	Changes     []Change     // Recent changes across infrastructure
}

// Stats contains summary statistics for a metric
type Stats struct {
	Count  int
	Min    float64
	Max    float64
	Sum    float64
	Mean   float64
	StdDev float64
}

// LinearRegressionResult contains the results of linear regression
type LinearRegressionResult struct {
	Slope     float64 // Rate of change per second
	Intercept float64 // Y-intercept
	R2        float64 // Coefficient of determination (0-1)
}
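The `Prediction` fields above (`CurrentPct`, `GrowthRate`, `DaysUntil`) imply a simple linear projection: days until a capacity threshold is (threshold − current) / rate-per-day. A minimal sketch of that arithmetic; `daysUntil` is an illustrative helper, not the commit's builder code:

```go
package main

import (
	"fmt"
	"time"
)

// daysUntil projects how many days until usage, growing linearly at
// ratePerDay (percentage points per day), crosses threshold.
// Returns ok=false when the metric is flat/declining or already past it.
func daysUntil(currentPct, ratePerDay, threshold float64) (float64, bool) {
	if ratePerDay <= 0 || currentPct >= threshold {
		return 0, false
	}
	return (threshold - currentPct) / ratePerDay, true
}

func main() {
	current, rate := 72.0, 1.5 // storage 72% full, growing 1.5 pts/day
	if d, ok := daysUntil(current, rate, 90); ok {
		fmt.Printf("90%% in %.0f days (ETA %s)\n", d,
			time.Now().AddDate(0, 0, int(d)).Format("2006-01-02"))
	}
	if d, ok := daysUntil(current, rate, 100); ok {
		fmt.Printf("100%% in %.1f days\n", d)
	}
}
```

A projection like this is only as trustworthy as the fit behind it, which is why `Trend.Confidence` carries the regression's R² alongside the rate.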
289 internal/ai/cost/store.go Normal file

@@ -0,0 +1,289 @@
package cost

import (
	"sort"
	"strings"
	"sync"
	"time"

	"github.com/rs/zerolog/log"
)

// UsageEvent represents a single AI provider call for cost/token tracking.
// It intentionally excludes prompt/response content for privacy.
type UsageEvent struct {
	Timestamp     time.Time `json:"timestamp"`
	Provider      string    `json:"provider"`
	RequestModel  string    `json:"request_model"`
	ResponseModel string    `json:"response_model,omitempty"`
	UseCase       string    `json:"use_case,omitempty"` // "chat" or "patrol"
	InputTokens   int       `json:"input_tokens,omitempty"`
	OutputTokens  int       `json:"output_tokens,omitempty"`
	TargetType    string    `json:"target_type,omitempty"`
	TargetID      string    `json:"target_id,omitempty"`
	FindingID     string    `json:"finding_id,omitempty"`
}

// Persistence defines the storage contract for usage history.
type Persistence interface {
	SaveUsageHistory(events []UsageEvent) error
	LoadUsageHistory() ([]UsageEvent, error)
}

// DefaultMaxDays is the default retention window for raw usage events.
const DefaultMaxDays = 90

// Store provides thread-safe usage tracking with optional persistence.
type Store struct {
	mu          sync.RWMutex
	events      []UsageEvent
	maxDays     int
	persistence Persistence

	// Debounced persistence to avoid frequent disk writes.
	saveTimer    *time.Timer
	savePending  bool
	saveDebounce time.Duration
}

// NewStore creates a new usage store.
func NewStore(maxDays int) *Store {
	if maxDays <= 0 {
		maxDays = DefaultMaxDays
	}
	return &Store{
		events:       make([]UsageEvent, 0),
		maxDays:      maxDays,
		saveDebounce: 5 * time.Second,
	}
}

// SetPersistence sets persistence and loads any existing history.
func (s *Store) SetPersistence(p Persistence) error {
	s.mu.Lock()
	s.persistence = p
	s.mu.Unlock()

	if p == nil {
		return nil
	}

	events, err := p.LoadUsageHistory()
	if err != nil {
		return err
	}

	s.mu.Lock()
	s.events = events
	s.trimLocked(time.Now())
	s.mu.Unlock()
	return nil
}

// Record appends a usage event and schedules persistence.
func (s *Store) Record(event UsageEvent) {
	if event.Timestamp.IsZero() {
		event.Timestamp = time.Now()
	}

	s.mu.Lock()
	s.events = append(s.events, event)
	s.trimLocked(time.Now())
	s.scheduleSaveLocked()
	s.mu.Unlock()
}

// GetSummary returns a rollup of usage over the last N days.
func (s *Store) GetSummary(days int) Summary {
	if days <= 0 {
		days = 30
	}

	now := time.Now()
	cutoff := now.AddDate(0, 0, -days)

	s.mu.RLock()
	events := make([]UsageEvent, 0, len(s.events))
	for _, e := range s.events {
		if !e.Timestamp.Before(cutoff) {
			events = append(events, e)
		}
	}
	s.mu.RUnlock()

	type pmKey struct {
		provider string
		model    string
	}

	pmTotals := make(map[pmKey]*ProviderModelSummary)
	dailyTotals := make(map[string]*DailySummary)

	var totalInput, totalOutput int64

	for _, e := range events {
		provider := e.Provider
		model := normalizeModel(provider, e.RequestModel, e.ResponseModel)

		k := pmKey{provider: provider, model: model}
		pm := pmTotals[k]
		if pm == nil {
			pm = &ProviderModelSummary{Provider: provider, Model: model}
			pmTotals[k] = pm
		}
		pm.InputTokens += int64(e.InputTokens)
		pm.OutputTokens += int64(e.OutputTokens)

		totalInput += int64(e.InputTokens)
		totalOutput += int64(e.OutputTokens)

		date := e.Timestamp.Format("2006-01-02")
		ds := dailyTotals[date]
		if ds == nil {
			ds = &DailySummary{Date: date}
			dailyTotals[date] = ds
		}
		ds.InputTokens += int64(e.InputTokens)
		ds.OutputTokens += int64(e.OutputTokens)
	}

	providerModels := make([]ProviderModelSummary, 0, len(pmTotals))
	for _, pm := range pmTotals {
		pm.TotalTokens = pm.InputTokens + pm.OutputTokens
		providerModels = append(providerModels, *pm)
	}
	sort.Slice(providerModels, func(i, j int) bool {
		if providerModels[i].Provider == providerModels[j].Provider {
			return providerModels[i].Model < providerModels[j].Model
		}
		return providerModels[i].Provider < providerModels[j].Provider
	})

	daily := make([]DailySummary, 0, len(dailyTotals))
	for _, ds := range dailyTotals {
		ds.TotalTokens = ds.InputTokens + ds.OutputTokens
		daily = append(daily, *ds)
	}
	sort.Slice(daily, func(i, j int) bool {
		return daily[i].Date < daily[j].Date
	})

	totals := ProviderModelSummary{
		Provider:     "all",
		InputTokens:  totalInput,
		OutputTokens: totalOutput,
		TotalTokens:  totalInput + totalOutput,
	}

	return Summary{
		Days:           days,
		ProviderModels: providerModels,
		DailyTotals:    daily,
		Totals:         totals,
	}
}

// Flush immediately writes any pending changes to persistence.
func (s *Store) Flush() error {
	s.mu.Lock()
	if s.saveTimer != nil {
		s.saveTimer.Stop()
	}
	s.savePending = false
	events := make([]UsageEvent, len(s.events))
	copy(events, s.events)
	p := s.persistence
	s.mu.Unlock()

	if p != nil {
		return p.SaveUsageHistory(events)
	}
	return nil
}

func (s *Store) trimLocked(now time.Time) {
	if s.maxDays <= 0 {
		return
	}
	cutoff := now.AddDate(0, 0, -s.maxDays)
	filtered := s.events[:0]
	for _, e := range s.events {
		if !e.Timestamp.Before(cutoff) {
			filtered = append(filtered, e)
		}
	}
	s.events = filtered
}

func (s *Store) scheduleSaveLocked() {
	if s.persistence == nil {
		return
	}

	if s.saveTimer != nil {
		s.saveTimer.Stop()
	}

	s.savePending = true
	s.saveTimer = time.AfterFunc(s.saveDebounce, func() {
		s.mu.Lock()
		if !s.savePending {
			s.mu.Unlock()
			return
		}
		s.savePending = false
		events := make([]UsageEvent, len(s.events))
		copy(events, s.events)
		p := s.persistence
		s.mu.Unlock()

		if p != nil {
			if err := p.SaveUsageHistory(events); err != nil {
				log.Error().Err(err).Msg("Failed to save AI usage history")
			}
		}
	})
}

func normalizeModel(provider, requestModel, responseModel string) string {
	if requestModel != "" {
		parts := strings.SplitN(requestModel, ":", 2)
		if len(parts) == 2 && parts[0] == provider {
			return parts[1]
}
|
||||
return requestModel
|
||||
}
|
||||
if responseModel != "" {
|
||||
parts := strings.SplitN(responseModel, ":", 2)
|
||||
if len(parts) == 2 && parts[0] == provider {
|
||||
return parts[1]
|
||||
}
|
||||
return responseModel
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// ProviderModelSummary is a rollup for a provider/model pair.
|
||||
type ProviderModelSummary struct {
|
||||
Provider string `json:"provider"`
|
||||
Model string `json:"model"`
|
||||
InputTokens int64 `json:"input_tokens"`
|
||||
OutputTokens int64 `json:"output_tokens"`
|
||||
TotalTokens int64 `json:"total_tokens"`
|
||||
}
|
||||
|
||||
// DailySummary is a rollup for a single day across all providers.
|
||||
type DailySummary struct {
|
||||
Date string `json:"date"`
|
||||
InputTokens int64 `json:"input_tokens"`
|
||||
OutputTokens int64 `json:"output_tokens"`
|
||||
TotalTokens int64 `json:"total_tokens"`
|
||||
}
|
||||
|
||||
// Summary is returned by the cost summary API.
|
||||
type Summary struct {
|
||||
Days int `json:"days"`
|
||||
ProviderModels []ProviderModelSummary `json:"provider_models"`
|
||||
DailyTotals []DailySummary `json:"daily_totals"`
|
||||
Totals ProviderModelSummary `json:"totals"`
|
||||
}
|
||||
113 internal/ai/cost/store_test.go Normal file

@@ -0,0 +1,113 @@
package cost

import (
	"testing"
	"time"
)

func TestSummaryGroupsByProviderModelAndDailyTotals(t *testing.T) {
	store := NewStore(90)
	now := time.Now()

	day1 := now.Add(-24 * time.Hour)
	day2 := now.Add(-48 * time.Hour)

	store.Record(UsageEvent{
		Timestamp:    day1,
		Provider:     "openai",
		RequestModel: "openai:gpt-4o",
		InputTokens:  100,
		OutputTokens: 50,
		UseCase:      "chat",
	})
	store.Record(UsageEvent{
		Timestamp:    day1,
		Provider:     "openai",
		RequestModel: "openai:gpt-4o",
		InputTokens:  10,
		OutputTokens: 5,
		UseCase:      "chat",
	})
	store.Record(UsageEvent{
		Timestamp:    day2,
		Provider:     "openai",
		RequestModel: "openai:gpt-4o-mini",
		InputTokens:  20,
		OutputTokens: 10,
		UseCase:      "patrol",
	})
	store.Record(UsageEvent{
		Timestamp:    now,
		Provider:     "anthropic",
		RequestModel: "anthropic:claude-opus-4-5-20251101",
		InputTokens:  200,
		OutputTokens: 100,
		UseCase:      "chat",
	})

	summary := store.GetSummary(3)

	if len(summary.ProviderModels) != 3 {
		t.Fatalf("expected 3 provider models, got %d", len(summary.ProviderModels))
	}

	type key struct{ provider, model string }
	got := make(map[key]ProviderModelSummary)
	for _, pm := range summary.ProviderModels {
		got[key{pm.Provider, pm.Model}] = pm
	}

	openaiGpt4o := got[key{"openai", "gpt-4o"}]
	if openaiGpt4o.InputTokens != 110 || openaiGpt4o.OutputTokens != 55 {
		t.Fatalf("openai gpt-4o tokens wrong: %+v", openaiGpt4o)
	}

	openaiMini := got[key{"openai", "gpt-4o-mini"}]
	if openaiMini.InputTokens != 20 || openaiMini.OutputTokens != 10 {
		t.Fatalf("openai gpt-4o-mini tokens wrong: %+v", openaiMini)
	}

	anthropicOpus := got[key{"anthropic", "claude-opus-4-5-20251101"}]
	if anthropicOpus.InputTokens != 200 || anthropicOpus.OutputTokens != 100 {
		t.Fatalf("anthropic opus tokens wrong: %+v", anthropicOpus)
	}

	// Daily totals across all providers.
	dailyGot := make(map[string]DailySummary)
	for _, d := range summary.DailyTotals {
		dailyGot[d.Date] = d
	}

	d1Key := day1.Format("2006-01-02")
	if dailyGot[d1Key].InputTokens != 110 || dailyGot[d1Key].OutputTokens != 55 {
		t.Fatalf("daily totals for %s wrong: %+v", d1Key, dailyGot[d1Key])
	}

	d2Key := day2.Format("2006-01-02")
	if dailyGot[d2Key].InputTokens != 20 || dailyGot[d2Key].OutputTokens != 10 {
		t.Fatalf("daily totals for %s wrong: %+v", d2Key, dailyGot[d2Key])
	}

	todayKey := now.Format("2006-01-02")
	if dailyGot[todayKey].InputTokens != 200 || dailyGot[todayKey].OutputTokens != 100 {
		t.Fatalf("daily totals for %s wrong: %+v", todayKey, dailyGot[todayKey])
	}
}

func TestRetentionTrimsOldEvents(t *testing.T) {
	store := NewStore(1)
	old := time.Now().Add(-48 * time.Hour)

	store.Record(UsageEvent{
		Timestamp:    old,
		Provider:     "openai",
		RequestModel: "openai:gpt-4o",
		InputTokens:  10,
		OutputTokens: 10,
	})

	summary := store.GetSummary(7)
	if len(summary.ProviderModels) != 0 {
		t.Fatalf("expected old event to be trimmed, got %+v", summary.ProviderModels)
	}
}
61 internal/ai/cost_persistence.go Normal file

@@ -0,0 +1,61 @@
package ai

import (
	"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
	"github.com/rcourtman/pulse-go-rewrite/internal/config"
)

// CostPersistenceAdapter bridges ConfigPersistence to cost.Persistence.
type CostPersistenceAdapter struct {
	config *config.ConfigPersistence
}

// NewCostPersistenceAdapter creates a new adapter.
func NewCostPersistenceAdapter(cfg *config.ConfigPersistence) *CostPersistenceAdapter {
	return &CostPersistenceAdapter{config: cfg}
}

// SaveUsageHistory saves usage events to disk via ConfigPersistence.
func (a *CostPersistenceAdapter) SaveUsageHistory(events []cost.UsageEvent) error {
	records := make([]config.AIUsageEventRecord, len(events))
	for i, e := range events {
		records[i] = config.AIUsageEventRecord{
			Timestamp:     e.Timestamp,
			Provider:      e.Provider,
			RequestModel:  e.RequestModel,
			ResponseModel: e.ResponseModel,
			UseCase:       e.UseCase,
			InputTokens:   e.InputTokens,
			OutputTokens:  e.OutputTokens,
			TargetType:    e.TargetType,
			TargetID:      e.TargetID,
			FindingID:     e.FindingID,
		}
	}
	return a.config.SaveAIUsageHistory(records)
}

// LoadUsageHistory loads usage events from disk via ConfigPersistence.
func (a *CostPersistenceAdapter) LoadUsageHistory() ([]cost.UsageEvent, error) {
	data, err := a.config.LoadAIUsageHistory()
	if err != nil {
		return nil, err
	}

	events := make([]cost.UsageEvent, len(data.Events))
	for i, r := range data.Events {
		events[i] = cost.UsageEvent{
			Timestamp:     r.Timestamp,
			Provider:      r.Provider,
			RequestModel:  r.RequestModel,
			ResponseModel: r.ResponseModel,
			UseCase:       r.UseCase,
			InputTokens:   r.InputTokens,
			OutputTokens:  r.OutputTokens,
			TargetType:    r.TargetType,
			TargetID:      r.TargetID,
			FindingID:     r.FindingID,
		}
	}
	return events, nil
}
85 internal/ai/metrics_history_adapter.go Normal file

@@ -0,0 +1,85 @@
package ai

import (
	"time"

	"github.com/rcourtman/pulse-go-rewrite/internal/monitoring"
)

// MetricsHistoryAdapter adapts monitoring.MetricsHistory to the MetricsHistoryProvider interface.
// This allows the patrol service to use the monitoring package's metrics history
// without creating a direct package dependency.
type MetricsHistoryAdapter struct {
	history *monitoring.MetricsHistory
}

// NewMetricsHistoryAdapter creates an adapter for the monitoring.MetricsHistory.
func NewMetricsHistoryAdapter(history *monitoring.MetricsHistory) *MetricsHistoryAdapter {
	if history == nil {
		return nil
	}
	return &MetricsHistoryAdapter{history: history}
}

// GetNodeMetrics returns historical metrics for a node.
func (a *MetricsHistoryAdapter) GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint {
	if a.history == nil {
		return nil
	}
	points := a.history.GetNodeMetrics(nodeID, metricType, duration)
	return convertMetricPoints(points)
}

// GetGuestMetrics returns historical metrics for a guest.
func (a *MetricsHistoryAdapter) GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint {
	if a.history == nil {
		return nil
	}
	points := a.history.GetGuestMetrics(guestID, metricType, duration)
	return convertMetricPoints(points)
}

// GetAllGuestMetrics returns all metrics for a guest.
func (a *MetricsHistoryAdapter) GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint {
	if a.history == nil {
		return nil
	}
	metricsMap := a.history.GetAllGuestMetrics(guestID, duration)
	return convertMetricsMap(metricsMap)
}

// GetAllStorageMetrics returns all metrics for storage.
func (a *MetricsHistoryAdapter) GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint {
	if a.history == nil {
		return nil
	}
	metricsMap := a.history.GetAllStorageMetrics(storageID, duration)
	return convertMetricsMap(metricsMap)
}

// convertMetricPoints converts from monitoring.MetricPoint to ai.MetricPoint.
func convertMetricPoints(points []monitoring.MetricPoint) []MetricPoint {
	if points == nil {
		return nil
	}
	result := make([]MetricPoint, len(points))
	for i, p := range points {
		result[i] = MetricPoint{
			Value:     p.Value,
			Timestamp: p.Timestamp,
		}
	}
	return result
}

// convertMetricsMap converts a map of metric types to their points.
func convertMetricsMap(metricsMap map[string][]monitoring.MetricPoint) map[string][]MetricPoint {
	if metricsMap == nil {
		return nil
	}
	result := make(map[string][]MetricPoint, len(metricsMap))
	for key, points := range metricsMap {
		result[key] = convertMetricPoints(points)
	}
	return result
}
@@ -10,6 +10,7 @@ import (
 	"sync"
 	"time"
 
+	aicontext "github.com/rcourtman/pulse-go-rewrite/internal/ai/context"
 	"github.com/rcourtman/pulse-go-rewrite/internal/ai/knowledge"
 	"github.com/rcourtman/pulse-go-rewrite/internal/models"
 	"github.com/rs/zerolog/log"
@@ -208,6 +209,7 @@ type PatrolService struct {
 	config         PatrolConfig
 	findings       *FindingsStore
 	knowledgeStore *knowledge.Store // For per-resource notes in patrol context
+	metricsHistory MetricsHistoryProvider // For trend analysis and predictions
 
 	// Cached thresholds (recalculated when thresholdProvider changes)
 	thresholds PatrolThresholds
@@ -329,6 +331,15 @@ func (p *PatrolService) SetKnowledgeStore(store *knowledge.Store) {
 	p.knowledgeStore = store
 }
 
+// SetMetricsHistoryProvider sets the metrics history provider for enriched context
+// This enables the patrol service to compute trends and predictions based on historical data
+func (p *PatrolService) SetMetricsHistoryProvider(provider MetricsHistoryProvider) {
+	p.mu.Lock()
+	defer p.mu.Unlock()
+	p.metricsHistory = provider
+	log.Info().Msg("AI Patrol: Metrics history provider set for enriched context")
+}
+
 // GetConfig returns the current patrol configuration
 func (p *PatrolService) GetConfig() PatrolConfig {
 	p.mu.RLock()
@@ -1441,8 +1452,9 @@ func (p *PatrolService) runAIAnalysis(ctx context.Context, state models.StateSna
 		return nil, fmt.Errorf("AI service not available")
 	}
 
-	// Build infrastructure summary for the AI
-	summary := p.buildInfrastructureSummary(state)
+	// Build enriched infrastructure context with trends and predictions
+	// Falls back to basic summary if metrics history is not available
+	summary := p.buildEnrichedContext(state)
 	if summary == "" {
 		return nil, nil // Nothing to analyze
 	}
@@ -1656,7 +1668,7 @@ func (p *PatrolService) buildInfrastructureSummary(state models.StateSnapshot) s
 			dh.Hostname, dh.Status, len(dh.Containers)))
 		for _, c := range dh.Containers {
 			sb.WriteString(fmt.Sprintf(" - %s: State=%s, CPU=%.1f%%, Memory=%.1f%%, Restarts=%d\n",
-				c.Name, c.State, c.CPUPercent, c.MemoryPercent, c.RestartCount))
+				c.Name, c.State, c.CPUPercent, c.MemoryPercent, c.RestartCount))
 		}
 	}
 	sb.WriteString("\n")
@@ -1665,6 +1677,141 @@
	return sb.String()
}

// buildEnrichedContext creates context with historical trends and predictions
// Falls back to basic summary if metrics history is not available
func (p *PatrolService) buildEnrichedContext(state models.StateSnapshot) string {
	p.mu.RLock()
	metricsHistory := p.metricsHistory
	knowledgeStore := p.knowledgeStore
	p.mu.RUnlock()

	// If no metrics history, fall back to basic summary
	if metricsHistory == nil {
		log.Debug().Msg("AI Patrol: No metrics history available, using basic summary")
		return p.buildInfrastructureSummary(state)
	}

	// Build enriched context using the context package
	builder := aicontext.NewBuilder().
		WithMetricsHistory(&metricsHistoryShim{provider: metricsHistory})

	// Add knowledge store if available
	if knowledgeStore != nil {
		builder = builder.WithKnowledge(&knowledgeShim{store: knowledgeStore})
	}

	// Build full infrastructure context with trends
	infraCtx := builder.BuildForInfrastructure(state)
	if infraCtx == nil {
		log.Warn().Msg("AI Patrol: Failed to build enriched context, falling back")
		return p.buildInfrastructureSummary(state)
	}

	// Format for AI consumption
	formatted := aicontext.FormatInfrastructureContext(infraCtx)

	log.Debug().
		Int("resources", infraCtx.TotalResources).
		Int("predictions", len(infraCtx.Predictions)).
		Msg("AI Patrol: Built enriched context with trends")

	return formatted
}

// metricsHistoryShim adapts ai.MetricsHistoryProvider to aicontext.MetricsHistoryProvider
type metricsHistoryShim struct {
	provider MetricsHistoryProvider
}

func (s *metricsHistoryShim) GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []aicontext.MetricPoint {
	if s.provider == nil {
		return nil
	}
	points := s.provider.GetNodeMetrics(nodeID, metricType, duration)
	return convertToContextPoints(points)
}

func (s *metricsHistoryShim) GetGuestMetrics(guestID string, metricType string, duration time.Duration) []aicontext.MetricPoint {
	if s.provider == nil {
		return nil
	}
	points := s.provider.GetGuestMetrics(guestID, metricType, duration)
	return convertToContextPoints(points)
}

func (s *metricsHistoryShim) GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]aicontext.MetricPoint {
	if s.provider == nil {
		return nil
	}
	metricsMap := s.provider.GetAllGuestMetrics(guestID, duration)
	return convertToContextMetricsMap(metricsMap)
}

func (s *metricsHistoryShim) GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]aicontext.MetricPoint {
	if s.provider == nil {
		return nil
	}
	metricsMap := s.provider.GetAllStorageMetrics(storageID, duration)
	return convertToContextMetricsMap(metricsMap)
}

// knowledgeShim adapts knowledge.Store to aicontext.KnowledgeProvider
type knowledgeShim struct {
	store *knowledge.Store
}

func (k *knowledgeShim) GetNotes(guestID string) []string {
	if k.store == nil {
		return nil
	}
	knowledge, err := k.store.GetKnowledge(guestID)
	if err != nil || knowledge == nil {
		return nil
	}
	// Extract note contents
	var notes []string
	for _, note := range knowledge.Notes {
		notes = append(notes, note.Content)
	}
	return notes
}

func (k *knowledgeShim) FormatAllForContext() string {
	if k.store == nil {
		return ""
	}
	return k.store.FormatAllForContext()
}

// convertToContextPoints converts ai.MetricPoint to aicontext.MetricPoint
// Since both are aliases for types.MetricPoint, this is just a type assertion
func convertToContextPoints(points []MetricPoint) []aicontext.MetricPoint {
	if points == nil {
		return nil
	}
	// Both types are aliases for types.MetricPoint, so they're compatible
	result := make([]aicontext.MetricPoint, len(points))
	for i, p := range points {
		result[i] = aicontext.MetricPoint{
			Value:     p.Value,
			Timestamp: p.Timestamp,
		}
	}
	return result
}

// convertToContextMetricsMap converts a map of metric points
func convertToContextMetricsMap(metricsMap map[string][]MetricPoint) map[string][]aicontext.MetricPoint {
	if metricsMap == nil {
		return nil
	}
	result := make(map[string][]aicontext.MetricPoint, len(metricsMap))
	for key, points := range metricsMap {
		result[key] = convertToContextPoints(points)
	}
	return result
}

// buildPatrolPrompt creates the prompt for AI analysis
// Includes user feedback context to prevent re-raising dismissed findings
func (p *PatrolService) buildPatrolPrompt(summary string) string {
@@ -1685,13 +1832,20 @@ func (p *PatrolService) buildPatrolPrompt(summary string) string {
 %s
 
 Analyze the above and report any findings using the structured format. Focus on:
-- Resources showing high utilization
-- Patterns that might indicate problems
+- Resources showing high utilization or concerning trends (look for ↑ growing indicators)
+- Predictions showing resources approaching capacity (look for ⏰ predictions)
+- Anomalies flagged as unusual (look for ⚠️ anomalies)
+- Patterns that might indicate problems over time (compare 24h vs 7d trends)
 - Missing backups or stale backup schedules
 - Unbalanced resource distribution
 - Any anomalies or concerns
 
-If everything looks healthy, say so briefly.`, summary)
+IMPORTANT: The context includes historical trends (24h and 7d) where available. Use this to provide actionable insights:
+- A resource that's "growing 5%%/day" needs proactive attention
+- A resource that's "stable" with high usage may just need monitoring
+- A "volatile" resource may indicate workload issues
+
+If predictions show a resource will be full within 7 days, flag it as high priority.
+If everything looks healthy with stable trends, say so briefly.`, summary)
 
 	var contextAdditions strings.Builder
@@ -3,6 +3,7 @@ package ai
 import (
 	"context"
 	"encoding/base64"
+	"encoding/json"
 	"fmt"
 	"io"
 	"net/http"
@@ -15,10 +16,12 @@
 
 	"github.com/google/uuid"
 	"github.com/rcourtman/pulse-go-rewrite/internal/agentexec"
+	"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
 	"github.com/rcourtman/pulse-go-rewrite/internal/ai/knowledge"
 	"github.com/rcourtman/pulse-go-rewrite/internal/ai/providers"
 	"github.com/rcourtman/pulse-go-rewrite/internal/config"
 	"github.com/rcourtman/pulse-go-rewrite/internal/models"
+	"github.com/rcourtman/pulse-go-rewrite/internal/types"
 	"github.com/rs/zerolog/log"
 )
@@ -38,6 +41,7 @@ type Service struct {
 	stateProvider    StateProvider
 	alertProvider    AlertProvider
 	knowledgeStore   *knowledge.Store
+	costStore        *cost.Store
 	resourceProvider ResourceProvider // Unified resource model provider (Phase 2)
 	patrolService    *PatrolService   // Background AI monitoring service
 	metadataProvider MetadataProvider // Enables AI to update resource URLs
@@ -50,12 +54,16 @@
 func NewService(persistence *config.ConfigPersistence, agentServer *agentexec.Server) *Service {
 	// Initialize knowledge store
 	var knowledgeStore *knowledge.Store
+	costStore := cost.NewStore(cost.DefaultMaxDays)
 	if persistence != nil {
 		var err error
 		knowledgeStore, err = knowledge.NewStore(persistence.DataDir())
 		if err != nil {
 			log.Warn().Err(err).Msg("Failed to initialize knowledge store")
 		}
+		if err := costStore.SetPersistence(NewCostPersistenceAdapter(persistence)); err != nil {
+			log.Warn().Err(err).Msg("Failed to initialize AI usage cost store")
+		}
 	}
 
 	return &Service{
@@ -63,6 +71,7 @@ func NewService(persistence *config.ConfigPersistence, agentServer *agentexec.Se
 		agentServer:    agentServer,
 		policy:         agentexec.DefaultPolicy(),
 		knowledgeStore: knowledgeStore,
+		costStore:      costStore,
 	}
 }
@@ -108,6 +117,28 @@ func (s *Service) GetAIConfig() *config.AIConfig {
 	return s.cfg
 }
 
+// GetCostSummary returns usage rollups for the last N days.
+func (s *Service) GetCostSummary(days int) cost.Summary {
+	s.mu.RLock()
+	store := s.costStore
+	s.mu.RUnlock()
+
+	if store == nil {
+		if days <= 0 {
+			days = 30
+		}
+		return cost.Summary{
+			Days:           days,
+			ProviderModels: []cost.ProviderModelSummary{},
+			DailyTotals:    []cost.DailySummary{},
+			Totals: cost.ProviderModelSummary{
+				Provider: "all",
+			},
+		}
+	}
+	return store.GetSummary(days)
+}
+
 // SetPatrolThresholdProvider sets the threshold provider for patrol
 // This should be called with an AlertThresholdAdapter to connect patrol to user-configured thresholds
 func (s *Service) SetPatrolThresholdProvider(provider ThresholdProvider) {
@@ -120,6 +151,30 @@
 	}
 }
 
+// MetricsHistoryProvider provides access to historical metrics for trend analysis
+// This interface matches the monitoring.MetricsHistory methods we need
+type MetricsHistoryProvider interface {
+	GetNodeMetrics(nodeID string, metricType string, duration time.Duration) []MetricPoint
+	GetGuestMetrics(guestID string, metricType string, duration time.Duration) []MetricPoint
+	GetAllGuestMetrics(guestID string, duration time.Duration) map[string][]MetricPoint
+	GetAllStorageMetrics(storageID string, duration time.Duration) map[string][]MetricPoint
+}
+
+// MetricPoint is an alias for the shared metric point type
+type MetricPoint = types.MetricPoint
+
+// SetMetricsHistoryProvider sets the metrics history provider for enriched AI context
+// This enables the AI to see trends, anomalies, and predictions based on historical data
+func (s *Service) SetMetricsHistoryProvider(provider MetricsHistoryProvider) {
+	s.mu.RLock()
+	patrol := s.patrolService
+	s.mu.RUnlock()
+
+	if patrol != nil {
+		patrol.SetMetricsHistoryProvider(provider)
+	}
+}
+
 // StartPatrol starts the background patrol service
 func (s *Service) StartPatrol(ctx context.Context) {
 	s.mu.RLock()
@@ -325,6 +380,40 @@ func extractVMIDFromCommand(command string) (vmid int, requiresOwnerNode bool, f
 	return 0, false, false
 }
 
+// formatApprovalNeededToolResult returns a structured tool result for commands that require approval.
+// It is encoded as a marker + JSON so the LLM can reliably detect it.
+func formatApprovalNeededToolResult(command, toolID, reason string) string {
+	payload := map[string]interface{}{
+		"type":           "approval_required",
+		"command":        command,
+		"tool_id":        toolID,
+		"reason":         reason,
+		"how_to_approve": "Ask the user to click the approval button shown in the UI.",
+		"do_not_retry":   true,
+	}
+	b, err := json.Marshal(payload)
+	if err != nil {
+		// Fallback to a safe plain-text marker.
+		return fmt.Sprintf("APPROVAL_REQUIRED: %s", command)
+	}
+	return "APPROVAL_REQUIRED: " + string(b)
+}
+
+// formatPolicyBlockedToolResult returns a structured tool result for commands blocked by policy.
+func formatPolicyBlockedToolResult(command, reason string) string {
+	payload := map[string]interface{}{
+		"type":         "policy_blocked",
+		"command":      command,
+		"reason":       reason,
+		"do_not_retry": true,
+	}
+	b, err := json.Marshal(payload)
+	if err != nil {
+		return fmt.Sprintf("POLICY_BLOCKED: %s", reason)
+	}
+	return "POLICY_BLOCKED: " + string(b)
+}
+
 // LoadConfig loads the AI configuration and initializes the provider
 func (s *Service) LoadConfig() error {
 	s.mu.Lock()
@@ -343,17 +432,41 @@ func (s *Service) LoadConfig() error {
 		return nil
 	}
 
-	provider, err := providers.NewFromConfig(cfg)
+	selectedModel := cfg.GetModel()
+	selectedProvider, _ := config.ParseModelString(selectedModel)
+
+	providerClient, err := providers.NewForModel(cfg, selectedModel)
 	if err != nil {
-		log.Warn().Err(err).Msg("Failed to initialize AI provider")
-		s.provider = nil
-		return nil // Don't fail startup if provider can't be initialized
+		// Only fall back to legacy config if no multi-provider credentials are set.
+		if len(cfg.GetConfiguredProviders()) == 0 && (cfg.Provider != "" || cfg.APIKey != "") {
+			if legacyClient, legacyErr := providers.NewFromConfig(cfg); legacyErr == nil {
+				providerClient = legacyClient
+				selectedProvider = providerClient.Name()
+				log.Info().
+					Str("provider", selectedProvider).
+					Str("model", cfg.GetModel()).
+					Msg("AI service initialized via legacy config (migration path)")
+			} else {
+				log.Warn().Err(legacyErr).Msg("Failed to initialize legacy AI provider")
+				s.provider = nil
+				return nil
+			}
+		} else {
+			log.Warn().
+				Err(err).
+				Str("selected_model", selectedModel).
+				Str("selected_provider", selectedProvider).
+				Strs("configured_providers", cfg.GetConfiguredProviders()).
+				Msg("AI enabled but selected provider is not configured; check API keys or model selection")
+			s.provider = nil
+			return nil
+		}
 	}
 
-	s.provider = provider
+	s.provider = providerClient
 	log.Info().
-		Str("provider", cfg.Provider).
-		Str("model", cfg.GetModel()).
+		Str("provider", selectedProvider).
+		Str("model", selectedModel).
 		Bool("autonomous_mode", cfg.AutonomousMode).
 		Msg("AI service initialized")
@@ -400,7 +513,7 @@ func (s *Service) GetDebugContext(req ExecuteRequest) map[string]interface{} {
		"hosts":         len(state.Hosts),
		"pbs_instances": len(state.PBSInstances),
	}

	// List some VMs/containers for verification
	var vmNames []string
	for _, vm := range state.VMs {
@@ -491,48 +604,48 @@ func isDangerousCommand(cmd string) bool {
	"unlink": true,
	"shred":  true,
	// Disk/filesystem destructive operations
	"dd":       true,
	"mkfs":     true,
	"fdisk":    true,
	"parted":   true,
	"wipefs":   true,
	"sgdisk":   true,
	"gdisk":    true,
	"zpool":    true, // Allow reads but not modifications
	"zfs":      true, // Allow reads but not modifications
	"lvremove": true,
	"vgremove": true,
	"pvremove": true,
	// System state changes
	"reboot":    true,
	"shutdown":  true,
	"poweroff":  true,
	"halt":      true,
	"init":      true,
	"systemctl": true, // could stop critical services
	"service":   true,
	// User/permission changes
	"chmod":   true,
	"chown":   true,
	"useradd": true,
	"userdel": true,
	"passwd":  true,
	// Package management
	"apt":     true,
	"apt-get": true,
	"dpkg":    true,
	"yum":     true,
	"dnf":     true,
	"pacman":  true,
	"pip":     true,
	"npm":     true,
	// Proxmox destructive
	"vzdump":    true,
	"vzrestore": true,
	"pveam":     true,
	// Network changes
	"iptables":     true,
	"nft":          true,
	"firewall-cmd": true,
}
@@ -569,7 +682,7 @@ func isDangerousCommand(cmd string) bool {
		}
	}
	// Special case: allow read-only dpkg operations
	if baseCmd == "dpkg" {
		safeDpkgOps := []string{"-l", "--list", "-L", "--listfiles", "-s", "--status", "-S", "--search", "-p", "--print-avail", "--get-selections"}
		for _, safeOp := range safeDpkgOps {
@@ -698,14 +811,14 @@ func isReadOnlyCommand(cmd string) bool {

// ConversationMessage represents a message in conversation history
type ConversationMessage struct {
	Role    string `json:"role"` // "user" or "assistant"
	Content string `json:"content"`
}

// ExecuteRequest represents a request to execute an AI prompt
type ExecuteRequest struct {
	Prompt       string                 `json:"prompt"`
	TargetType   string                 `json:"target_type,omitempty"` // "host", "container", "vm", "node"
	TargetID     string                 `json:"target_id,omitempty"`
	Context      map[string]interface{} `json:"context,omitempty"`       // Current metrics, state, etc.
	SystemPrompt string                 `json:"system_prompt,omitempty"` // Override system prompt
@@ -717,18 +830,18 @@ type ExecuteRequest struct {

// ExecuteResponse represents the AI's response
type ExecuteResponse struct {
	Content      string          `json:"content"`
	Model        string          `json:"model"`
	InputTokens  int             `json:"input_tokens"`
	OutputTokens int             `json:"output_tokens"`
	ToolCalls    []ToolExecution `json:"tool_calls,omitempty"` // Commands that were executed
}

// ToolExecution represents a tool that was executed during the AI conversation
type ToolExecution struct {
	Name    string `json:"name"`
	Input   string `json:"input"`  // Human-readable input (e.g., the command)
	Output  string `json:"output"` // Result of execution
	Success bool   `json:"success"`
}
@@ -786,8 +899,8 @@ type ToolEndData struct {
// ApprovalNeededData is sent when a command needs user approval
type ApprovalNeededData struct {
	Command    string `json:"command"`
	ToolID     string `json:"tool_id"`   // ID to reference when approving
	ToolName   string `json:"tool_name"` // "run_command", "read_file", etc.
	RunOnHost  bool   `json:"run_on_host"`
	TargetHost string `json:"target_host,omitempty"` // Explicit host to route to
}
@@ -799,6 +912,7 @@ func (s *Service) Execute(ctx context.Context, req ExecuteRequest) (*ExecuteResp
	defaultProvider := s.provider
	agentServer := s.agentServer
	cfg := s.cfg
+	costStore := s.costStore
	s.mu.RUnlock()

	// Determine the model to use for this request
@@ -887,6 +1001,21 @@ Always execute the commands rather than telling the user how to do it.`
		return nil, fmt.Errorf("AI request failed: %w", err)
	}

+	if costStore != nil {
+		costStore.Record(cost.UsageEvent{
+			Timestamp:     time.Now(),
+			Provider:      provider.Name(),
+			RequestModel:  modelString,
+			ResponseModel: resp.Model,
+			UseCase:       req.UseCase,
+			InputTokens:   resp.InputTokens,
+			OutputTokens:  resp.OutputTokens,
+			TargetType:    req.TargetType,
+			TargetID:      req.TargetID,
+			FindingID:     req.FindingID,
+		})
+	}
+
	totalInputTokens += resp.InputTokens
	totalOutputTokens += resp.OutputTokens
	model = resp.Model
@@ -938,6 +1067,7 @@ func (s *Service) ExecuteStream(ctx context.Context, req ExecuteRequest, callbac
	defaultProvider := s.provider
	agentServer := s.agentServer
	cfg := s.cfg
+	costStore := s.costStore
	s.mu.RUnlock()

	// Determine the model to use for this request
@@ -1058,6 +1188,21 @@ Always execute the commands rather than telling the user how to do it.`
		return nil, fmt.Errorf("AI request failed: %w", err)
	}

+	if costStore != nil {
+		costStore.Record(cost.UsageEvent{
+			Timestamp:     time.Now(),
+			Provider:      provider.Name(),
+			RequestModel:  modelString,
+			ResponseModel: resp.Model,
+			UseCase:       req.UseCase,
+			InputTokens:   resp.InputTokens,
+			OutputTokens:  resp.OutputTokens,
+			TargetType:    req.TargetType,
+			TargetID:      req.TargetID,
+			FindingID:     req.FindingID,
+		})
+	}
+
	log.Debug().Int("iteration", iteration).Msg("AI provider returned successfully")

	totalInputTokens += resp.InputTokens
@@ -1161,7 +1306,6 @@ Always execute the commands rather than telling the user how to do it.`
		}
	}

	var result string
	var execution ToolExecution
@@ -1170,7 +1314,14 @@ Always execute the commands rather than telling the user how to do it.`
	// We'll break out of the loop after processing all tool calls
	// Note: We don't add to toolExecutions here because the approval_needed event
	// already tells the frontend to show the approval UI
-	result = fmt.Sprintf("Awaiting user approval: %s", toolInput)
+	cmd, _ := tc.Input["command"].(string)
+	result = formatApprovalNeededToolResult(cmd, tc.ID, "Command requires user approval")
+	execution = ToolExecution{
+		Name:    tc.Name,
+		Input:   toolInput,
+		Output:  result,
+		Success: true, // Not an error; awaiting approval
+	}
} else {
	// Stream tool start event
	callback(StreamEvent{
@@ -1440,15 +1591,11 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
	if !s.IsAutonomous() {
		decision := s.policy.Evaluate(command)
		if decision == agentexec.PolicyBlock {
-			execution.Output = "Error: This command is blocked by security policy"
+			execution.Output = formatPolicyBlockedToolResult(command, "This command is blocked by security policy")
			return execution.Output, execution
		}
		if decision == agentexec.PolicyRequireApproval {
-			// Direct the AI to tell the user about the approval button
-			execution.Output = fmt.Sprintf("COMMAND_BLOCKED: This command (%s) requires user approval and was NOT executed. "+
-				"An approval button has been displayed to the user. "+
-				"DO NOT attempt to run this command again. "+
-				"Tell the user to click the 'Run' button to execute it.", command)
+			execution.Output = formatApprovalNeededToolResult(command, tc.ID, "Security policy requires approval")
			execution.Success = true // Not an error, just needs approval
			return execution.Output, execution
		}
@@ -1456,7 +1603,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid

	// Build execution request with proper targeting
	execReq := req

	// If target_host is explicitly specified by AI, use it for routing
	if targetHost != "" {
		// Ensure Context map exists
@@ -1477,7 +1624,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
			Str("command", command).
			Msg("AI explicitly specified target_host for command routing")
	}

	// If run_on_host is true, override the target type to run on host
	if runOnHost {
		log.Debug().
@@ -1576,7 +1723,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
	// Build the write command using base64 to safely handle any content
	// This avoids issues with special characters, quotes, newlines, etc.
	encoded := base64.StdEncoding.EncodeToString([]byte(content))

	var command string
	if appendMode {
		// Append mode: decode and append to file (no backup needed for append)
@@ -1591,7 +1738,7 @@ func (s *Service) executeTool(ctx context.Context, req ExecuteRequest, tc provid
	dir := filepath.Dir(path)
	tempFile := path + ".pulse-tmp"
	backupFile := path + ".bak"

	// Build a safe multi-step command:
	// - mkdir -p for parent dir
	// - if file exists, copy to .bak
@@ -1731,7 +1878,7 @@ func (s *Service) getGuestID(req ExecuteRequest) string {
	if req.TargetType == "" || req.TargetID == "" {
		return ""
	}

	// For Proxmox targets, include the node info
	// Format: instance-node-type-vmid or instance-targetid
	return fmt.Sprintf("%s-%s", req.TargetType, req.TargetID)
@@ -1811,11 +1958,11 @@ func sanitizeError(err error) error {
	if err == nil {
		return nil
	}

	errMsg := err.Error()

	// Replace raw TCP connection details with generic message
	// e.g., "write tcp 192.168.0.123:7655->192.168.0.134:58004: i/o timeout"
	// becomes "connection to agent timed out"
	if strings.Contains(errMsg, "i/o timeout") {
		if strings.Contains(errMsg, "failed to send command") {
@@ -1823,22 +1970,22 @@ func sanitizeError(err error) error {
		}
		return fmt.Errorf("network timeout - the target may be unreachable")
	}

	// Replace "write tcp ... connection refused" style errors
	if strings.Contains(errMsg, "connection refused") {
		return fmt.Errorf("connection refused - the agent may not be running on the target host")
	}

	// Replace "no such host" errors
	if strings.Contains(errMsg, "no such host") {
		return fmt.Errorf("host not found - verify the hostname is correct and DNS is working")
	}

	// Replace "context deadline exceeded" with friendlier message
	if strings.Contains(errMsg, "context deadline exceeded") {
		return fmt.Errorf("operation timed out - the command may have taken too long")
	}

	return err
}
@@ -1850,7 +1997,7 @@ func (s *Service) executeOnAgent(ctx context.Context, req ExecuteRequest, comman

	// Find the appropriate agent using robust routing
	agents := s.agentServer.GetConnectedAgents()

	// Use the new robust routing logic
	routeResult, err := s.routeToAgent(req, command, agents)
	if err != nil {
@@ -1870,7 +2017,7 @@ func (s *Service) executeOnAgent(ctx context.Context, req ExecuteRequest, comman
	}

	agentID := routeResult.AgentID

	log.Debug().
		Str("agent_id", agentID).
		Str("agent_hostname", routeResult.AgentHostname).
@@ -1952,7 +2099,7 @@ type RunCommandRequest struct {
	Command    string `json:"command"`
	TargetType string `json:"target_type"` // "host", "container", "vm"
	TargetID   string `json:"target_id"`
	RunOnHost  bool   `json:"run_on_host"` // If true, run on host instead of target
	VMID       string `json:"vmid,omitempty"`
	TargetHost string `json:"target_host,omitempty"` // Explicit host for routing
}
@@ -1997,7 +2144,6 @@ func (s *Service) RunCommand(ctx context.Context, req RunCommandRequest) (*RunCo
		Msg("RunCommand using explicit target_host for routing")
	}

	output, err := s.executeOnAgent(ctx, execReq, req.Command)
	if err != nil {
		return &RunCommandResponse{
@@ -2132,7 +2278,6 @@ After install, enable and start the service:
The latest version can be found at: https://api.github.com/repos/rcourtman/Pulse/releases/latest
This is a 3-command job. Don't over-investigate.`

	// Add custom context from AI settings (user's infrastructure description)
	s.mu.RLock()
	cfg := s.cfg
@@ -2147,7 +2292,7 @@ This is a 3-command job. Don't over-investigate.`
	s.mu.RLock()
	hasResourceProvider := s.resourceProvider != nil
	s.mu.RUnlock()

	if hasResourceProvider {
		prompt += s.buildUnifiedResourceContext()
	} else {
@@ -2194,7 +2339,6 @@ This is a 3-command job. Don't over-investigate.`
		}
	}

	// Add any provided context in a structured way
	if len(req.Context) > 0 {
		prompt += "\n\n## Current Metrics and State"
@@ -2259,39 +2403,39 @@ This is a 3-command job. Don't over-investigate.`
// formatContextKey converts snake_case keys to readable labels
func formatContextKey(key string) string {
	replacements := map[string]string{
		"guestName":         "Guest Name",
		"name":              "Name",
		"type":              "Type",
		"vmid":              "VMID",
		"node":              "PVE Node (host)",
		"guest_node":        "PVE Node (host)",
		"status":            "Status",
		"uptime":            "Uptime",
		"cpu_usage":         "CPU Usage",
		"cpu_cores":         "CPU Cores",
		"memory_used":       "Memory Used",
		"memory_total":      "Memory Total",
		"memory_usage":      "Memory Usage",
		"memory_balloon":    "Memory Balloon",
		"swap_used":         "Swap Used",
		"swap_total":        "Swap Total",
		"disk_used":         "Disk Used",
		"disk_total":        "Disk Total",
		"disk_usage":        "Disk Usage",
		"disk_read_rate":    "Disk Read Rate",
		"disk_write_rate":   "Disk Write Rate",
		"network_in_rate":   "Network In Rate",
		"network_out_rate":  "Network Out Rate",
		"backup_status":     "Backup Status",
		"last_backup":       "Last Backup",
		"days_since_backup": "Days Since Backup",
		"os_name":           "OS Name",
		"os_version":        "OS Version",
		"guest_agent":       "Guest Agent",
		"ip_addresses":      "IP Addresses",
		"tags":              "Tags",
		"user_notes":        "User Notes",
		"user_annotations":  "User Annotations",
	}

	if label, ok := replacements[key]; ok {
@@ -2474,4 +2618,3 @@ func providerDisplayName(provider string) string {
func (s *Service) Reload() error {
	return s.LoadConfig()
}
@@ -12,13 +12,13 @@ import (

	"github.com/rcourtman/pulse-go-rewrite/internal/agentexec"
	"github.com/rcourtman/pulse-go-rewrite/internal/ai"
	"github.com/rcourtman/pulse-go-rewrite/internal/ai/cost"
	"github.com/rcourtman/pulse-go-rewrite/internal/ai/providers"
	"github.com/rcourtman/pulse-go-rewrite/internal/config"
	"github.com/rcourtman/pulse-go-rewrite/internal/utils"
	"github.com/rs/zerolog/log"
)

// AISettingsHandler handles AI settings endpoints
type AISettingsHandler struct {
	config *config.Config
@@ -91,6 +91,11 @@ func (h *AISettingsHandler) SetPatrolRunHistoryPersistence(persistence ai.Patrol
	return nil
}

+// SetMetricsHistoryProvider sets the metrics history provider for enriched AI context
+func (h *AISettingsHandler) SetMetricsHistoryProvider(provider ai.MetricsHistoryProvider) {
+	h.aiService.SetMetricsHistoryProvider(provider)
+}
+
// StopPatrol stops the background AI patrol service
func (h *AISettingsHandler) StopPatrol() {
	h.aiService.StopPatrol()
@@ -105,38 +110,38 @@ func (h *AISettingsHandler) GetAlertTriggeredAnalyzer() *ai.AlertTriggeredAnalyz
// API keys are masked for security
type AISettingsResponse struct {
	Enabled        bool   `json:"enabled"`
	Provider       string `json:"provider"`    // DEPRECATED: legacy single provider
	APIKeySet      bool   `json:"api_key_set"` // DEPRECATED: true if legacy API key is configured
	Model          string `json:"model"`
	ChatModel      string `json:"chat_model,omitempty"`   // Model for interactive chat (empty = use default)
	PatrolModel    string `json:"patrol_model,omitempty"` // Model for patrol (empty = use default)
	BaseURL        string `json:"base_url,omitempty"`     // DEPRECATED: legacy base URL
	Configured     bool   `json:"configured"`             // true if AI is ready to use
	AutonomousMode bool   `json:"autonomous_mode"`        // true if AI can execute without approval
	CustomContext  string `json:"custom_context"`         // user-provided infrastructure context
	// OAuth fields for Claude Pro/Max subscription authentication
	AuthMethod     string `json:"auth_method"`     // "api_key" or "oauth"
	OAuthConnected bool   `json:"oauth_connected"` // true if OAuth tokens are configured
	// Patrol settings for token efficiency
	PatrolSchedulePreset   string             `json:"patrol_schedule_preset"`   // DEPRECATED: legacy preset
	PatrolIntervalMinutes  int                `json:"patrol_interval_minutes"`  // Patrol interval in minutes (0 = disabled)
	AlertTriggeredAnalysis bool               `json:"alert_triggered_analysis"` // true if AI analyzes when alerts fire
	AvailableModels        []config.ModelInfo `json:"available_models"`         // List of models for current provider
	// Multi-provider credentials - shows which providers are configured
	AnthropicConfigured bool     `json:"anthropic_configured"`      // true if Anthropic API key or OAuth is set
	OpenAIConfigured    bool     `json:"openai_configured"`         // true if OpenAI API key is set
	DeepSeekConfigured  bool     `json:"deepseek_configured"`       // true if DeepSeek API key is set
	OllamaConfigured    bool     `json:"ollama_configured"`         // true (always available for attempt)
	OllamaBaseURL       string   `json:"ollama_base_url"`           // Ollama server URL
	OpenAIBaseURL       string   `json:"openai_base_url,omitempty"` // Custom OpenAI base URL
	ConfiguredProviders []string `json:"configured_providers"`      // List of provider names with credentials
}

// AISettingsUpdateRequest is the request body for PUT /api/settings/ai
type AISettingsUpdateRequest struct {
	Enabled     *bool   `json:"enabled,omitempty"`
	Provider    *string `json:"provider,omitempty"` // DEPRECATED: use model selection instead
	APIKey      *string `json:"api_key,omitempty"`  // DEPRECATED: use per-provider keys
	Model       *string `json:"model,omitempty"`
	ChatModel   *string `json:"chat_model,omitempty"`   // Model for interactive chat
	PatrolModel *string `json:"patrol_model,omitempty"` // Model for background patrol
@@ -582,9 +587,9 @@ func (h *AISettingsHandler) HandleListModels(w http.ResponseWriter, r *http.Requ
	}

	type Response struct {
		Models []ModelInfo `json:"models"`
		Error  string      `json:"error,omitempty"`
		Cached bool        `json:"cached"`
	}

	models, err := h.aiService.ListModels(ctx)
@@ -622,25 +627,25 @@ func (h *AISettingsHandler) HandleListModels(w http.ResponseWriter, r *http.Requ
// AIExecuteRequest is the request body for POST /api/ai/execute
// AIConversationMessage represents a message in conversation history
type AIConversationMessage struct {
	Role    string `json:"role"` // "user" or "assistant"
	Content string `json:"content"`
}

type AIExecuteRequest struct {
	Prompt     string                  `json:"prompt"`
	TargetType string                  `json:"target_type,omitempty"` // "host", "container", "vm", "node"
	TargetID   string                  `json:"target_id,omitempty"`
	Context    map[string]interface{}  `json:"context,omitempty"` // Current metrics, state, etc.
	History    []AIConversationMessage `json:"history,omitempty"` // Previous conversation messages
}

// AIExecuteResponse is the response from POST /api/ai/execute
type AIExecuteResponse struct {
	Content      string             `json:"content"`
	Model        string             `json:"model"`
	InputTokens  int                `json:"input_tokens"`
	OutputTokens int                `json:"output_tokens"`
	ToolCalls    []ai.ToolExecution `json:"tool_calls,omitempty"` // Commands that were executed
}

// HandleExecute executes an AI prompt (POST /api/ai/execute)
@@ -935,7 +940,6 @@ type AIRunCommandRequest struct {
	TargetHost string `json:"target_host,omitempty"` // Explicit host for routing
}

// HandleRunCommand executes a single approved command (POST /api/ai/run-command)
func (h *AISettingsHandler) HandleRunCommand(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
@@ -957,7 +961,7 @@ func (h *AISettingsHandler) HandleRunCommand(w http.ResponseWriter, r *http.Requ
		return
	}
	log.Debug().Str("body", string(bodyBytes)).Msg("run-command request body")

	var req AIRunCommandRequest
	if err := json.Unmarshal(bodyBytes, &req); err != nil {
		log.Error().Err(err).Str("body", string(bodyBytes)).Msg("Failed to decode JSON body")
@@ -2059,7 +2063,7 @@ func (h *AISettingsHandler) HandleAcknowledgeFinding(w http.ResponseWriter, r *h
	}

	findings := patrol.GetFindings()

	// Just acknowledge - don't resolve. Finding stays visible but marked as seen.
	// Auto-resolve will remove it when the underlying condition clears.
	if !findings.Acknowledge(req.FindingID) {
@@ -2126,7 +2130,7 @@ func (h *AISettingsHandler) HandleSnoozeFinding(w http.ResponseWriter, r *http.R

	findings := patrol.GetFindings()
	duration := time.Duration(req.DurationHours) * time.Hour

	if !findings.Snooze(req.FindingID, duration) {
		http.Error(w, "Finding not found or already resolved", http.StatusNotFound)
		return
@@ -2180,7 +2184,7 @@ func (h *AISettingsHandler) HandleResolveFinding(w http.ResponseWriter, r *http.
	}

	findings := patrol.GetFindings()

	// Mark as manually resolved (auto=false since user did it)
	if !findings.Resolve(req.FindingID, false) {
		http.Error(w, "Finding not found or already resolved", http.StatusNotFound)
@@ -2223,8 +2227,8 @@ func (h *AISettingsHandler) HandleDismissFinding(w http.ResponseWriter, r *http.

	var req struct {
		FindingID string `json:"finding_id"`
		Reason    string `json:"reason"` // "not_an_issue", "expected_behavior", "will_fix_later"
		Note      string `json:"note"`   // Optional freeform note
	}
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, "Invalid request body", http.StatusBadRequest)
@@ -2248,7 +2252,7 @@ func (h *AISettingsHandler) HandleDismissFinding(w http.ResponseWriter, r *http.
	}

	findings := patrol.GetFindings()

	if !findings.Dismiss(req.FindingID, req.Reason, req.Note) {
		http.Error(w, "Finding not found", http.StatusNotFound)
		return
@@ -2303,7 +2307,7 @@ func (h *AISettingsHandler) HandleSuppressFinding(w http.ResponseWriter, r *http
	}

	findings := patrol.GetFindings()

	if !findings.Suppress(req.FindingID) {
		http.Error(w, "Finding not found", http.StatusNotFound)
		return
@@ -2392,6 +2396,40 @@ func (h *AISettingsHandler) HandleGetPatrolRunHistory(w http.ResponseWriter, r *
	}
}

+// HandleGetAICostSummary returns AI usage rollups (GET /api/ai/cost/summary?days=N).
+func (h *AISettingsHandler) HandleGetAICostSummary(w http.ResponseWriter, r *http.Request) {
+	if r.Method != http.MethodGet {
+		http.Error(w, "Method not allowed", http.StatusMethodNotAllowed)
+		return
+	}
+
+	// Parse optional days query parameter (default: 30, max: 365)
+	days := 30
+	if daysStr := r.URL.Query().Get("days"); daysStr != "" {
+		if _, err := fmt.Sscanf(daysStr, "%d", &days); err == nil && days > 0 {
+			if days > 365 {
+				days = 365
+			}
+		}
+	}
+
+	var summary cost.Summary
+	if h.aiService != nil {
+		summary = h.aiService.GetCostSummary(days)
+	} else {
+		summary = cost.Summary{
+			Days:           days,
+			ProviderModels: []cost.ProviderModelSummary{},
+			DailyTotals:    []cost.DailySummary{},
+			Totals:         cost.ProviderModelSummary{Provider: "all"},
+		}
+	}
+
+	if err := utils.WriteJSONResponse(w, summary); err != nil {
+		log.Error().Err(err).Msg("Failed to write AI cost summary response")
+	}
+}
+
// HandleGetSuppressionRules returns all suppression rules (GET /api/ai/patrol/suppressions)
func (h *AISettingsHandler) HandleGetSuppressionRules(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
@@ -2427,7 +2465,7 @@ func (h *AISettingsHandler) HandleAddSuppressionRule(w http.ResponseWriter, r *h
		return
	}

	// Require authentication
	if !CheckAuth(h.config, w, r) {
		return
	}
@@ -2523,7 +2561,7 @@ func (h *AISettingsHandler) HandleDeleteSuppressionRule(w http.ResponseWriter, r
	}

	findings := patrol.GetFindings()

	if !findings.DeleteSuppressionRule(ruleID) {
		http.Error(w, "Rule not found", http.StatusNotFound)
		return
@@ -61,6 +61,7 @@ type Router struct {
	updateManager   *updates.Manager
	updateHistory   *updates.UpdateHistory
	exportLimiter   *RateLimiter
+	downloadLimiter *RateLimiter
	persistence     *config.ConfigPersistence
	oidcMu          sync.Mutex
	oidcService     *OIDCService
@@ -75,6 +76,8 @@ type Router struct {
	publicURLDetected  bool
	bootstrapTokenHash string
	bootstrapTokenPath string
+	checksumMu         sync.RWMutex
+	checksumCache      map[string]checksumCacheEntry
}

func pulseBinDir() string {
@@ -124,17 +127,19 @@ func NewRouter(cfg *config.Config, monitor *monitoring.Monitor, wsHub *websocket
	updateManager.SetHistory(updateHistory)

	r := &Router{
		mux:             http.NewServeMux(),
		config:          cfg,
		monitor:         monitor,
		wsHub:           wsHub,
		reloadFunc:      reloadFunc,
		updateManager:   updateManager,
		updateHistory:   updateHistory,
		exportLimiter:   NewRateLimiter(5, 1*time.Minute), // 5 attempts per minute
+		downloadLimiter: NewRateLimiter(60, 1*time.Minute), // downloads/installers per minute per IP
		persistence:     config.NewConfigPersistence(cfg.DataPath),
		serverVersion:   strings.TrimSpace(serverVersion),
		projectRoot:     projectRoot,
+		checksumCache:   make(map[string]checksumCacheEntry),
	}

	r.initializeBootstrapToken()
@@ -1090,10 +1095,11 @@ func (r *Router) setupRoutes() {
     r.mux.HandleFunc("/api/ai/knowledge/clear", RequireAuth(r.config, r.aiSettingsHandler.HandleClearGuestKnowledge))
     r.mux.HandleFunc("/api/ai/debug/context", RequireAdmin(r.config, r.aiSettingsHandler.HandleDebugContext))
     r.mux.HandleFunc("/api/ai/agents", RequireAuth(r.config, r.aiSettingsHandler.HandleGetConnectedAgents))
+    r.mux.HandleFunc("/api/ai/cost/summary", RequireAuth(r.config, r.aiSettingsHandler.HandleGetAICostSummary))
     // OAuth endpoints for Claude Pro/Max subscription authentication
     r.mux.HandleFunc("/api/ai/oauth/start", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthStart))
     r.mux.HandleFunc("/api/ai/oauth/exchange", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthExchange)) // Manual code input
-    r.mux.HandleFunc("/api/ai/oauth/callback", r.aiSettingsHandler.HandleOAuthCallback) // Public - receives redirect from Anthropic
+    r.mux.HandleFunc("/api/ai/oauth/callback", r.aiSettingsHandler.HandleOAuthCallback) // Public - receives redirect from Anthropic
     r.mux.HandleFunc("/api/ai/oauth/disconnect", RequireAdmin(r.config, r.aiSettingsHandler.HandleOAuthDisconnect))

     // AI Patrol routes for background monitoring
@@ -1103,8 +1109,8 @@ func (r *Router) setupRoutes() {
     r.mux.HandleFunc("/api/ai/patrol/history", RequireAuth(r.config, r.aiSettingsHandler.HandleGetFindingsHistory))
     r.mux.HandleFunc("/api/ai/patrol/run", RequireAdmin(r.config, r.aiSettingsHandler.HandleForcePatrol))
     r.mux.HandleFunc("/api/ai/patrol/acknowledge", RequireAuth(r.config, r.aiSettingsHandler.HandleAcknowledgeFinding))
-    r.mux.HandleFunc("/api/ai/patrol/dismiss", RequireAuth(r.config, r.aiSettingsHandler.HandleDismissFinding)) // Dismiss with reason (LLM memory)
-    r.mux.HandleFunc("/api/ai/patrol/suppress", RequireAuth(r.config, r.aiSettingsHandler.HandleSuppressFinding)) // Permanently suppress (LLM memory)
+    r.mux.HandleFunc("/api/ai/patrol/dismiss", RequireAuth(r.config, r.aiSettingsHandler.HandleDismissFinding)) // Dismiss with reason (LLM memory)
+    r.mux.HandleFunc("/api/ai/patrol/suppress", RequireAuth(r.config, r.aiSettingsHandler.HandleSuppressFinding)) // Permanently suppress (LLM memory)
     r.mux.HandleFunc("/api/ai/patrol/snooze", RequireAuth(r.config, r.aiSettingsHandler.HandleSnoozeFinding))
     r.mux.HandleFunc("/api/ai/patrol/resolve", RequireAuth(r.config, r.aiSettingsHandler.HandleResolveFinding))
     r.mux.HandleFunc("/api/ai/patrol/runs", RequireAuth(r.config, r.aiSettingsHandler.HandleGetPatrolRunHistory))
@@ -1125,23 +1131,23 @@ func (r *Router) setupRoutes() {
     // Agent WebSocket for AI command execution
     r.mux.HandleFunc("/api/agent/ws", r.handleAgentWebSocket)

-    // Docker agent download endpoints
-    r.mux.HandleFunc("/install-docker-agent.sh", r.handleDownloadInstallScript) // Serves the Docker agent install script
-    r.mux.HandleFunc("/install-container-agent.sh", r.handleDownloadContainerAgentInstallScript)
-    r.mux.HandleFunc("/download/pulse-docker-agent", r.handleDownloadAgent)
+    // Docker agent download endpoints (public but rate limited)
+    r.mux.HandleFunc("/install-docker-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadInstallScript)) // Serves the Docker agent install script
+    r.mux.HandleFunc("/install-container-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadContainerAgentInstallScript))
+    r.mux.HandleFunc("/download/pulse-docker-agent", r.downloadLimiter.Middleware(r.handleDownloadAgent))

-    // Host agent download endpoints
-    r.mux.HandleFunc("/install-host-agent.sh", r.handleDownloadHostAgentInstallScript)
-    r.mux.HandleFunc("/install-host-agent.ps1", r.handleDownloadHostAgentInstallScriptPS)
-    r.mux.HandleFunc("/uninstall-host-agent.sh", r.handleDownloadHostAgentUninstallScript)
-    r.mux.HandleFunc("/uninstall-host-agent.ps1", r.handleDownloadHostAgentUninstallScriptPS)
-    r.mux.HandleFunc("/download/pulse-host-agent", r.handleDownloadHostAgent)
-    r.mux.HandleFunc("/download/pulse-host-agent.sha256", r.handleDownloadHostAgent)
+    // Host agent download endpoints (public but rate limited)
+    r.mux.HandleFunc("/install-host-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadHostAgentInstallScript))
+    r.mux.HandleFunc("/install-host-agent.ps1", r.downloadLimiter.Middleware(r.handleDownloadHostAgentInstallScriptPS))
+    r.mux.HandleFunc("/uninstall-host-agent.sh", r.downloadLimiter.Middleware(r.handleDownloadHostAgentUninstallScript))
+    r.mux.HandleFunc("/uninstall-host-agent.ps1", r.downloadLimiter.Middleware(r.handleDownloadHostAgentUninstallScriptPS))
+    r.mux.HandleFunc("/download/pulse-host-agent", r.downloadLimiter.Middleware(r.handleDownloadHostAgent))
+    r.mux.HandleFunc("/download/pulse-host-agent.sha256", r.downloadLimiter.Middleware(r.handleDownloadHostAgent))

-    // Unified Agent endpoints
-    r.mux.HandleFunc("/install.sh", r.handleDownloadUnifiedInstallScript)
-    r.mux.HandleFunc("/install.ps1", r.handleDownloadUnifiedInstallScriptPS)
-    r.mux.HandleFunc("/download/pulse-agent", r.handleDownloadUnifiedAgent)
+    // Unified Agent endpoints (public but rate limited)
+    r.mux.HandleFunc("/install.sh", r.downloadLimiter.Middleware(r.handleDownloadUnifiedInstallScript))
+    r.mux.HandleFunc("/install.ps1", r.downloadLimiter.Middleware(r.handleDownloadUnifiedInstallScriptPS))
+    r.mux.HandleFunc("/download/pulse-agent", r.downloadLimiter.Middleware(r.handleDownloadUnifiedAgent))

     r.mux.HandleFunc("/api/agent/version", r.handleAgentVersion)
     r.mux.HandleFunc("/api/server/info", r.handleServerInfo)
@@ -1405,6 +1411,16 @@ func (r *Router) StartPatrol(ctx context.Context) {
         }
     }

+    // Connect patrol to metrics history for enriched context (trends, predictions)
+    if r.monitor != nil {
+        if metricsHistory := r.monitor.GetMetricsHistory(); metricsHistory != nil {
+            adapter := ai.NewMetricsHistoryAdapter(metricsHistory)
+            if adapter != nil {
+                r.aiSettingsHandler.SetMetricsHistoryProvider(adapter)
+            }
+        }
+    }
+
     r.aiSettingsHandler.StartPatrol(ctx)
     }
 }
@@ -2620,13 +2636,13 @@ func (r *Router) handleState(w http.ResponseWriter, req *http.Request) {
     }

     state := r.monitor.GetState()

     // Also populate the unified resource store (Phase 1 of unified architecture)
     // This runs on every state request to keep resources up-to-date
     if r.resourceHandlers != nil {
         r.resourceHandlers.PopulateFromSnapshot(state)
     }

     frontendState := state.ToFrontend()

     if err := utils.WriteJSONResponse(w, frontendState); err != nil {
@@ -3771,26 +3787,19 @@ func (r *Router) handleDownloadAgent(w http.ResponseWriter, req *http.Request) {
             continue
         }

+        checksum, err := r.cachedSHA256(candidate, info)
+        if err != nil {
+            log.Error().Err(err).Str("path", candidate).Msg("Failed to compute docker agent checksum")
+            continue
+        }
+
         file, err := os.Open(candidate)
         if err != nil {
             log.Error().Err(err).Str("path", candidate).Msg("Failed to open docker agent binary for download")
             continue
         }

-        hasher := sha256.New()
-        if _, err := io.Copy(hasher, file); err != nil {
-            file.Close()
-            log.Error().Err(err).Str("path", candidate).Msg("Failed to hash docker agent binary")
-            continue
-        }
-
-        if _, err := file.Seek(0, io.SeekStart); err != nil {
-            file.Close()
-            log.Error().Err(err).Str("path", candidate).Msg("Failed to rewind docker agent binary")
-            continue
-        }
-
-        w.Header().Set("X-Checksum-Sha256", hex.EncodeToString(hasher.Sum(nil)))
+        w.Header().Set("X-Checksum-Sha256", checksum)
        http.ServeContent(w, req, filepath.Base(candidate), info.ModTime(), file)
        file.Close()
        return
@@ -4041,22 +4050,73 @@ func sortedHostAgentKeys(missing map[string]agentbinaries.HostAgentBinary) []str
     return keys
 }

-// serveChecksum computes and serves the SHA256 checksum of a file
-func (r *Router) serveChecksum(w http.ResponseWriter, filepath string) {
-    file, err := os.Open(filepath)
-    if err != nil {
-        http.Error(w, "Failed to open file", http.StatusInternalServerError)
-        return
-    }
-    defer file.Close()
-
-    hasher := sha256.New()
-    if _, err := io.Copy(hasher, file); err != nil {
-        http.Error(w, "Failed to compute checksum", http.StatusInternalServerError)
-        return
-    }
-
-    checksum := hex.EncodeToString(hasher.Sum(nil))
-    w.Header().Set("Content-Type", "text/plain")
-    fmt.Fprintf(w, "%s\n", checksum)
-}
+type checksumCacheEntry struct {
+    checksum string
+    modTime time.Time
+    size int64
+}
+
+func (r *Router) cachedSHA256(filePath string, info os.FileInfo) (string, error) {
+    if filePath == "" {
+        return "", fmt.Errorf("empty file path")
+    }
+
+    if info == nil {
+        var err error
+        info, err = os.Stat(filePath)
+        if err != nil {
+            return "", err
+        }
+    }
+
+    r.checksumMu.RLock()
+    entry, ok := r.checksumCache[filePath]
+    r.checksumMu.RUnlock()
+    if ok && entry.size == info.Size() && entry.modTime.Equal(info.ModTime()) {
+        return entry.checksum, nil
+    }
+
+    file, err := os.Open(filePath)
+    if err != nil {
+        return "", err
+    }
+    defer file.Close()
+
+    hasher := sha256.New()
+    if _, err := io.Copy(hasher, file); err != nil {
+        return "", err
+    }
+
+    checksum := hex.EncodeToString(hasher.Sum(nil))
+
+    r.checksumMu.Lock()
+    if r.checksumCache == nil {
+        r.checksumCache = make(map[string]checksumCacheEntry)
+    }
+    r.checksumCache[filePath] = checksumCacheEntry{
+        checksum: checksum,
+        modTime: info.ModTime(),
+        size: info.Size(),
+    }
+    r.checksumMu.Unlock()
+
+    return checksum, nil
+}
+
+// serveChecksum computes and serves the SHA256 checksum of a file
+func (r *Router) serveChecksum(w http.ResponseWriter, filePath string) {
+    info, err := os.Stat(filePath)
+    if err != nil {
+        http.Error(w, "Failed to stat file", http.StatusInternalServerError)
+        return
+    }
+
+    checksum, err := r.cachedSHA256(filePath, info)
+    if err != nil {
+        http.Error(w, "Failed to compute checksum", http.StatusInternalServerError)
+        return
+    }
+
+    w.Header().Set("Content-Type", "text/plain")
+    fmt.Fprintf(w, "%s\n", checksum)
+}
@@ -1,9 +1,6 @@
 package api

 import (
-    "crypto/sha256"
-    "encoding/hex"
-    "io"
     "net/http"
     "os"
     "path/filepath"
@@ -139,26 +136,19 @@ func (r *Router) handleDownloadUnifiedAgent(w http.ResponseWriter, req *http.Req
             continue
         }

+        checksum, err := r.cachedSHA256(candidate, info)
+        if err != nil {
+            log.Error().Err(err).Str("path", candidate).Msg("Failed to compute unified agent checksum")
+            continue
+        }
+
         file, err := os.Open(candidate)
         if err != nil {
             log.Error().Err(err).Str("path", candidate).Msg("Failed to open unified agent binary for download")
             continue
         }

-        hasher := sha256.New()
-        if _, err := io.Copy(hasher, file); err != nil {
-            file.Close()
-            log.Error().Err(err).Str("path", candidate).Msg("Failed to hash unified agent binary")
-            continue
-        }
-
-        if _, err := file.Seek(0, io.SeekStart); err != nil {
-            file.Close()
-            log.Error().Err(err).Str("path", candidate).Msg("Failed to rewind unified agent binary")
-            continue
-        }
-
-        w.Header().Set("X-Checksum-Sha256", hex.EncodeToString(hasher.Sum(nil)))
+        w.Header().Set("X-Checksum-Sha256", checksum)
        http.ServeContent(w, req, filepath.Base(candidate), info.ModTime(), file)
        file.Close()
        return
internal/api/unified_agent_download_test.go (new file, 59 lines)
@@ -0,0 +1,59 @@
+package api
+
+import (
+    "crypto/sha256"
+    "fmt"
+    "net/http"
+    "net/http/httptest"
+    "os"
+    "path/filepath"
+    "strings"
+    "testing"
+    "time"
+)
+
+func TestHandleDownloadUnifiedAgentSetsChecksumAndInvalidatesOnChange(t *testing.T) {
+    binDir := setupTempPulseBin(t)
+    filePath := filepath.Join(binDir, "pulse-agent-linux-amd64")
+
+    payload1 := []byte("agent-binary-v1")
+    if err := os.WriteFile(filePath, payload1, 0o755); err != nil {
+        t.Fatalf("failed to write test binary: %v", err)
+    }
+
+    req1 := httptest.NewRequest(http.MethodGet, "/download/pulse-agent?arch=linux-amd64", nil)
+    rr1 := httptest.NewRecorder()
+
+    router := &Router{checksumCache: make(map[string]checksumCacheEntry)}
+    router.handleDownloadUnifiedAgent(rr1, req1)
+
+    if rr1.Code != http.StatusOK {
+        t.Fatalf("expected 200 OK, got %d", rr1.Code)
+    }
+
+    expected1 := fmt.Sprintf("%x", sha256.Sum256(payload1))
+    if got := rr1.Header().Get("X-Checksum-Sha256"); got != expected1 {
+        t.Fatalf("unexpected checksum header: got %q want %q", got, expected1)
+    }
+
+    // Ensure modtime changes for invalidation.
+    time.Sleep(10 * time.Millisecond)
+    payload2 := []byte("agent-binary-v2")
+    if err := os.WriteFile(filePath, payload2, 0o755); err != nil {
+        t.Fatalf("failed to rewrite test binary: %v", err)
+    }
+
+    req2 := httptest.NewRequest(http.MethodGet, "/download/pulse-agent?arch=linux-amd64", nil)
+    rr2 := httptest.NewRecorder()
+    router.handleDownloadUnifiedAgent(rr2, req2)
+
+    expected2 := fmt.Sprintf("%x", sha256.Sum256(payload2))
+    if got := rr2.Header().Get("X-Checksum-Sha256"); got != expected2 {
+        t.Fatalf("checksum did not update after file change: got %q want %q", got, expected2)
+    }
+
+    if strings.TrimSpace(rr2.Body.String()) != string(payload2) {
+        t.Fatalf("unexpected response body after update")
+    }
+}
internal/config/ai_usage_persistence_test.go (new file, 41 lines)
@@ -0,0 +1,41 @@
+package config
+
+import (
+    "testing"
+    "time"
+)
+
+func TestSaveLoadAIUsageHistory(t *testing.T) {
+    dir := t.TempDir()
+    cp := NewConfigPersistence(dir)
+
+    now := time.Now()
+    events := []AIUsageEventRecord{
+        {
+            Timestamp: now,
+            Provider: "openai",
+            RequestModel: "openai:gpt-4o",
+            InputTokens: 123,
+            OutputTokens: 45,
+            UseCase: "chat",
+            TargetType: "vm",
+            TargetID: "vm-101",
+        },
+    }
+
+    if err := cp.SaveAIUsageHistory(events); err != nil {
+        t.Fatalf("SaveAIUsageHistory: %v", err)
+    }
+
+    loaded, err := cp.LoadAIUsageHistory()
+    if err != nil {
+        t.Fatalf("LoadAIUsageHistory: %v", err)
+    }
+
+    if len(loaded.Events) != 1 {
+        t.Fatalf("expected 1 event, got %d", len(loaded.Events))
+    }
+    if loaded.Events[0].Provider != "openai" || loaded.Events[0].InputTokens != 123 {
+        t.Fatalf("loaded event mismatch: %+v", loaded.Events[0])
+    }
+}
@@ -171,6 +171,7 @@ type DiscoveryConfig struct {
     EnvironmentOverride string `json:"environment_override,omitempty"`
     SubnetAllowlist []string `json:"subnet_allowlist,omitempty"`
     SubnetBlocklist []string `json:"subnet_blocklist,omitempty"`
+    IPBlocklist []string `json:"ip_blocklist,omitempty"` // Individual IPs to skip (auto-populated with configured Proxmox hosts)
     MaxHostsPerScan int `json:"max_hosts_per_scan,omitempty"`
     MaxConcurrent int `json:"max_concurrent,omitempty"`
     EnableReverseDNS bool `json:"enable_reverse_dns"`
@@ -203,6 +204,9 @@ func CloneDiscoveryConfig(cfg DiscoveryConfig) DiscoveryConfig {
     if cfg.SubnetBlocklist != nil {
         clone.SubnetBlocklist = append([]string(nil), cfg.SubnetBlocklist...)
     }
+    if cfg.IPBlocklist != nil {
+        clone.IPBlocklist = append([]string(nil), cfg.IPBlocklist...)
+    }
     return clone
 }
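`CloneDiscoveryConfig` copies each slice with `append([]string(nil), src...)` rather than plain assignment, because assignment would alias the backing array and let writes through the clone mutate the original config. A tiny generic illustration of the difference (standalone example, not Pulse code):

```go
package main

import "fmt"

// cloneStrings returns an independent copy of src; mutating the
// original afterwards does not affect the clone. A nil input stays nil.
func cloneStrings(src []string) []string {
	if src == nil {
		return nil
	}
	return append([]string(nil), src...)
}

func main() {
	orig := []string{"10.0.0.0/8"}

	aliased := orig              // shares the backing array with orig
	cloned := cloneStrings(orig) // independent copy

	aliased[0] = "changed"
	fmt.Println(orig[0], cloned[0]) // → changed 10.0.0.0/8
}
```

Preserving `nil` versus empty also matters here: the `if cfg.IPBlocklist != nil` guard keeps an unset list unset in the clone instead of turning it into an empty slice.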
@@ -20,21 +20,22 @@ import (

 // ConfigPersistence handles saving and loading configuration
 type ConfigPersistence struct {
-    mu sync.RWMutex
-    tx *importTransaction
-    configDir string
-    alertFile string
-    emailFile string
-    webhookFile string
-    appriseFile string
-    nodesFile string
-    systemFile string
-    oidcFile string
-    apiTokensFile string
-    aiFile string
-    aiFindingsFile string
-    aiPatrolRunsFile string
-    crypto *crypto.CryptoManager
+    mu sync.RWMutex
+    tx *importTransaction
+    configDir string
+    alertFile string
+    emailFile string
+    webhookFile string
+    appriseFile string
+    nodesFile string
+    systemFile string
+    oidcFile string
+    apiTokensFile string
+    aiFile string
+    aiFindingsFile string
+    aiPatrolRunsFile string
+    aiUsageHistoryFile string
+    crypto *crypto.CryptoManager
 }

 // NewConfigPersistence creates a new config persistence manager.
@@ -67,19 +68,20 @@ func newConfigPersistence(configDir string) (*ConfigPersistence, error) {
     }

     cp := &ConfigPersistence{
-        configDir: configDir,
-        alertFile: filepath.Join(configDir, "alerts.json"),
-        emailFile: filepath.Join(configDir, "email.enc"),
-        webhookFile: filepath.Join(configDir, "webhooks.enc"),
-        appriseFile: filepath.Join(configDir, "apprise.enc"),
-        nodesFile: filepath.Join(configDir, "nodes.enc"),
-        systemFile: filepath.Join(configDir, "system.json"),
-        oidcFile: filepath.Join(configDir, "oidc.enc"),
-        apiTokensFile: filepath.Join(configDir, "api_tokens.json"),
-        aiFile: filepath.Join(configDir, "ai.enc"),
-        aiFindingsFile: filepath.Join(configDir, "ai_findings.json"),
-        aiPatrolRunsFile: filepath.Join(configDir, "ai_patrol_runs.json"),
-        crypto: cryptoMgr,
+        configDir: configDir,
+        alertFile: filepath.Join(configDir, "alerts.json"),
+        emailFile: filepath.Join(configDir, "email.enc"),
+        webhookFile: filepath.Join(configDir, "webhooks.enc"),
+        appriseFile: filepath.Join(configDir, "apprise.enc"),
+        nodesFile: filepath.Join(configDir, "nodes.enc"),
+        systemFile: filepath.Join(configDir, "system.json"),
+        oidcFile: filepath.Join(configDir, "oidc.enc"),
+        apiTokensFile: filepath.Join(configDir, "api_tokens.json"),
+        aiFile: filepath.Join(configDir, "ai.enc"),
+        aiFindingsFile: filepath.Join(configDir, "ai_findings.json"),
+        aiPatrolRunsFile: filepath.Join(configDir, "ai_patrol_runs.json"),
+        aiUsageHistoryFile: filepath.Join(configDir, "ai_usage_history.json"),
+        crypto: cryptoMgr,
     }

     log.Debug().
@@ -1382,24 +1384,24 @@ type AIFindingsData struct {

 // AIFindingRecord is a persisted finding with full history
 type AIFindingRecord struct {
-    ID string `json:"id"`
-    Severity string `json:"severity"`
-    Category string `json:"category"`
-    ResourceID string `json:"resource_id"`
-    ResourceName string `json:"resource_name"`
-    ResourceType string `json:"resource_type"`
-    Node string `json:"node,omitempty"`
-    Title string `json:"title"`
-    Description string `json:"description"`
-    Recommendation string `json:"recommendation,omitempty"`
-    Evidence string `json:"evidence,omitempty"`
-    DetectedAt time.Time `json:"detected_at"`
-    LastSeenAt time.Time `json:"last_seen_at"`
+    ID string `json:"id"`
+    Severity string `json:"severity"`
+    Category string `json:"category"`
+    ResourceID string `json:"resource_id"`
+    ResourceName string `json:"resource_name"`
+    ResourceType string `json:"resource_type"`
+    Node string `json:"node,omitempty"`
+    Title string `json:"title"`
+    Description string `json:"description"`
+    Recommendation string `json:"recommendation,omitempty"`
+    Evidence string `json:"evidence,omitempty"`
+    DetectedAt time.Time `json:"detected_at"`
+    LastSeenAt time.Time `json:"last_seen_at"`
     ResolvedAt *time.Time `json:"resolved_at,omitempty"`
-    AutoResolved bool `json:"auto_resolved"`
+    AutoResolved bool `json:"auto_resolved"`
     AcknowledgedAt *time.Time `json:"acknowledged_at,omitempty"`
     SnoozedUntil *time.Time `json:"snoozed_until,omitempty"`
-    AlertID string `json:"alert_id,omitempty"`
+    AlertID string `json:"alert_id,omitempty"`
 }

 // SaveAIFindings persists AI findings to disk
@@ -1474,26 +1476,26 @@ func (c *ConfigPersistence) LoadAIFindings() (*AIFindingsData, error) {

 // PatrolRunHistoryData represents persisted patrol run history with metadata
 type PatrolRunHistoryData struct {
-    Version int `json:"version"`
-    LastSaved time.Time `json:"last_saved"`
-    Runs []PatrolRunRecord `json:"runs"`
+    Version int `json:"version"`
+    LastSaved time.Time `json:"last_saved"`
+    Runs []PatrolRunRecord `json:"runs"`
 }

 // PatrolRunRecord represents a single patrol check run
 type PatrolRunRecord struct {
-    ID string `json:"id"`
-    StartedAt time.Time `json:"started_at"`
-    CompletedAt time.Time `json:"completed_at"`
-    DurationMs int64 `json:"duration_ms"`
-    Type string `json:"type"` // "quick" or "deep"
-    ResourcesChecked int `json:"resources_checked"`
+    ID string `json:"id"`
+    StartedAt time.Time `json:"started_at"`
+    CompletedAt time.Time `json:"completed_at"`
+    DurationMs int64 `json:"duration_ms"`
+    Type string `json:"type"` // "quick" or "deep"
+    ResourcesChecked int `json:"resources_checked"`
     // Breakdown by resource type
-    NodesChecked int `json:"nodes_checked"`
-    GuestsChecked int `json:"guests_checked"`
-    DockerChecked int `json:"docker_checked"`
-    StorageChecked int `json:"storage_checked"`
-    HostsChecked int `json:"hosts_checked"`
-    PBSChecked int `json:"pbs_checked"`
+    NodesChecked int `json:"nodes_checked"`
+    GuestsChecked int `json:"guests_checked"`
+    DockerChecked int `json:"docker_checked"`
+    StorageChecked int `json:"storage_checked"`
+    HostsChecked int `json:"hosts_checked"`
+    PBSChecked int `json:"pbs_checked"`
     // Findings from this run
     NewFindings int `json:"new_findings"`
     ExistingFindings int `json:"existing_findings"`
@@ -1508,6 +1510,96 @@ type PatrolRunRecord struct {
     OutputTokens int `json:"output_tokens,omitempty"` // Tokens received from AI
 }

+// AIUsageHistoryData represents persisted AI usage history with metadata
+type AIUsageHistoryData struct {
+    Version int `json:"version"`
+    LastSaved time.Time `json:"last_saved"`
+    Events []AIUsageEventRecord `json:"events"`
+}
+
+// AIUsageEventRecord is a persisted usage event for an AI provider call.
+// This intentionally excludes prompt/response content for privacy.
+type AIUsageEventRecord struct {
+    Timestamp time.Time `json:"timestamp"`
+    Provider string `json:"provider"`
+    RequestModel string `json:"request_model"`
+    ResponseModel string `json:"response_model,omitempty"`
+    UseCase string `json:"use_case,omitempty"` // "chat" or "patrol"
+    InputTokens int `json:"input_tokens,omitempty"`
+    OutputTokens int `json:"output_tokens,omitempty"`
+    TargetType string `json:"target_type,omitempty"`
+    TargetID string `json:"target_id,omitempty"`
+    FindingID string `json:"finding_id,omitempty"`
+}
+
+// SaveAIUsageHistory persists AI usage events to disk.
+func (c *ConfigPersistence) SaveAIUsageHistory(events []AIUsageEventRecord) error {
+    c.mu.Lock()
+    defer c.mu.Unlock()
+
+    if err := c.EnsureConfigDir(); err != nil {
+        return err
+    }
+
+    data := AIUsageHistoryData{
+        Version: 1,
+        LastSaved: time.Now(),
+        Events: events,
+    }
+
+    jsonData, err := json.Marshal(data)
+    if err != nil {
+        return err
+    }
+
+    if err := c.writeConfigFileLocked(c.aiUsageHistoryFile, jsonData, 0600); err != nil {
+        return err
+    }
+
+    log.Debug().
+        Str("file", c.aiUsageHistoryFile).
+        Int("count", len(events)).
+        Msg("AI usage history saved")
+    return nil
+}
+
+// LoadAIUsageHistory loads AI usage events from disk.
+func (c *ConfigPersistence) LoadAIUsageHistory() (*AIUsageHistoryData, error) {
+    c.mu.RLock()
+    defer c.mu.RUnlock()
+
+    data, err := os.ReadFile(c.aiUsageHistoryFile)
+    if err != nil {
+        if os.IsNotExist(err) {
+            return &AIUsageHistoryData{
+                Version: 1,
+                Events: make([]AIUsageEventRecord, 0),
+            }, nil
+        }
+        return nil, err
+    }
+
+    var usageData AIUsageHistoryData
+    if err := json.Unmarshal(data, &usageData); err != nil {
+        log.Error().Err(err).Str("file", c.aiUsageHistoryFile).Msg("Failed to parse AI usage history file")
+        return &AIUsageHistoryData{
+            Version: 1,
+            Events: make([]AIUsageEventRecord, 0),
+        }, nil
+    }
+
+    if usageData.Events == nil {
+        usageData.Events = make([]AIUsageEventRecord, 0)
+    }
+
+    log.Info().
+        Str("file", c.aiUsageHistoryFile).
+        Int("count", len(usageData.Events)).
+        Time("last_saved", usageData.LastSaved).
+        Msg("AI usage history loaded")
+    return &usageData, nil
+}
+
 // SavePatrolRunHistory persists patrol run history to disk
 func (c *ConfigPersistence) SavePatrolRunHistory(runs []PatrolRunRecord) error {
     c.mu.Lock()
@@ -113,6 +113,19 @@ func ApplyConfigToProfile(profile *envdetect.EnvironmentProfile, cfg config.Disc
     if cfg.HTTPTimeout > 0 {
         profile.Policy.HTTPTimeout = time.Duration(cfg.HTTPTimeout) * time.Millisecond
     }
+
+    // Apply IP blocklist (individual IPs to skip, e.g. already-configured Proxmox hosts)
+    for _, ipStr := range cfg.IPBlocklist {
+        ipStr = strings.TrimSpace(ipStr)
+        if ipStr == "" {
+            continue
+        }
+        if ip := net.ParseIP(ipStr); ip != nil {
+            profile.IPBlocklist = append(profile.IPBlocklist, ip)
+        } else {
+            profile.Warnings = append(profile.Warnings, fmt.Sprintf("Invalid IP in blocklist: %s", ipStr))
+        }
+    }
 }

 func shouldPruneContainerNetworks(env envdetect.Environment) bool {
@@ -3265,6 +3265,91 @@ func (m *Monitor) baseIntervalForInstanceType(instanceType InstanceType) time.Du
     }
 }

+// getConfiguredHostIPs returns a list of IP addresses from all configured Proxmox hosts.
+// This is used to prevent discovery from probing hosts we already know about.
+// Caller must hold m.mu.RLock or m.mu.Lock.
+func (m *Monitor) getConfiguredHostIPs() []string {
+    if m.config == nil {
+        return nil
+    }
+
+    seen := make(map[string]struct{})
+    var ips []string
+
+    addHost := func(host string) {
+        // Parse the host to extract IP/hostname
+        host = strings.TrimSpace(host)
+        if host == "" {
+            return
+        }
+        // Remove scheme if present
+        if strings.HasPrefix(host, "https://") {
+            host = strings.TrimPrefix(host, "https://")
+        } else if strings.HasPrefix(host, "http://") {
+            host = strings.TrimPrefix(host, "http://")
+        }
+        // Remove port if present
+        if colonIdx := strings.LastIndex(host, ":"); colonIdx != -1 {
+            // Check if it's an IPv6 address
+            if !strings.Contains(host[colonIdx:], "]") {
+                host = host[:colonIdx]
+            }
+        }
+        // Remove trailing path
+        if slashIdx := strings.Index(host, "/"); slashIdx != -1 {
+            host = host[:slashIdx]
+        }
+        host = strings.TrimSpace(host)
+        if host == "" {
+            return
+        }
+        // Check if it's already an IP
+        if ip := net.ParseIP(host); ip != nil {
+            if _, exists := seen[host]; !exists {
+                seen[host] = struct{}{}
+                ips = append(ips, host)
+            }
+            return
+        }
+        // Try to resolve hostname to IP
+        if addrs, err := net.LookupIP(host); err == nil && len(addrs) > 0 {
+            for _, addr := range addrs {
+                // Prefer IPv4
+                if v4 := addr.To4(); v4 != nil {
+                    ipStr := v4.String()
+                    if _, exists := seen[ipStr]; !exists {
+                        seen[ipStr] = struct{}{}
+                        ips = append(ips, ipStr)
+                    }
+                    break
+                }
+            }
+        }
+    }
+
+    // Add PVE hosts
+    for _, pve := range m.config.PVEInstances {
+        addHost(pve.Host)
+        // Also add cluster endpoints
+        for _, ep := range pve.ClusterEndpoints {
+            addHost(ep.Host)
+            addHost(ep.IP)
+        }
+    }
+
+    // Add PBS hosts
+    for _, pbs := range m.config.PBSInstances {
+        addHost(pbs.Host)
+    }
+
+    // Add PMG hosts
+    for _, pmg := range m.config.PMGInstances {
+        addHost(pmg.Host)
+    }
+
+    return ips
+}
+
 // Start begins the monitoring loop
 func (m *Monitor) Start(ctx context.Context, wsHub *websocket.Hub) {
     pollingInterval := m.effectivePVEPollingInterval()
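`getConfiguredHostIPs` normalizes each configured endpoint by hand: strip the scheme, strip the port (with an IPv6 bracket check), strip any trailing path, then parse or resolve what remains. The same normalization can also be expressed with the standard library's `net/url`, whose `Hostname()` already drops the port and IPv6 brackets. A sketch of that alternative, not the code used in the commit:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// hostOnly reduces an endpoint like "https://pve1.lan:8006/api2" to "pve1.lan".
func hostOnly(raw string) string {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return ""
	}
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // url.Parse needs a scheme to locate the host part
	}
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	return u.Hostname() // drops the port and IPv6 brackets
}

func main() {
	fmt.Println(hostOnly("https://pve1.lan:8006/api2")) // → pve1.lan
	fmt.Println(hostOnly("192.168.1.10:8006"))          // → 192.168.1.10
	fmt.Println(hostOnly("https://[fd00::1]:8006"))     // → fd00::1
}
```

The hand-rolled version in the diff avoids a `net/url` dependency inside the hot path and keeps its behavior explicit; the `url.Parse` form trades that for less string surgery.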
@@ -3292,7 +3377,11 @@ func (m *Monitor) Start(ctx context.Context, wsHub *websocket.Hub) {
         if m.config == nil {
             return config.DefaultDiscoveryConfig()
         }
-        return config.CloneDiscoveryConfig(m.config.Discovery)
+        cfg := config.CloneDiscoveryConfig(m.config.Discovery)
+        // Auto-populate IPBlocklist with configured Proxmox host IPs to avoid
+        // probing hosts we already know about (reduces PBS auth failure log spam)
+        cfg.IPBlocklist = m.getConfiguredHostIPs()
+        return cfg
     }
     m.discoveryService = discovery.NewService(wsHub, 5*time.Minute, discoverySubnet, cfgProvider)
     if m.discoveryService != nil {
@@ -7301,6 +7390,12 @@ func (m *Monitor) GetMetricsStore() *metrics.Store {
     return m.metricsStore
 }

+// GetMetricsHistory returns the in-memory metrics history for trend analysis
+// This is used by the AI context builder to compute trends and predictions
+func (m *Monitor) GetMetricsHistory() *MetricsHistory {
+    return m.metricsHistory
+}
+
 // shouldSkipNodeMetrics returns true if we should skip detailed metric polling
 // for the given node because a host agent is providing richer data.
 // This helps reduce API load when agents are active.
@@ -234,9 +234,33 @@ func (s *Scanner) DiscoverServersWithCallbacks(ctx context.Context, subnet strin

     seenIPs := make(map[string]struct{})

+    // Pre-populate seenIPs with blocked IPs to skip them during scanning
+    // This prevents probing already-configured Proxmox hosts (reduces PBS auth failure log spam)
+    blockedCount := 0
+    if activeProfile != nil {
+        for _, ip := range activeProfile.IPBlocklist {
+            if ip == nil {
+                continue
+            }
+            if ip4 := ip.To4(); ip4 != nil {
+                seenIPs[ip4.String()] = struct{}{}
+                blockedCount++
+            }
+        }
+        if blockedCount > 0 {
+            log.Debug().
+                Int("blocked_ips", blockedCount).
+                Msg("Pre-populated blocked IPs to skip during discovery")
+        }
+    }
+
     // Calculate total targets and phases for progress tracking
     // Use a preview map to ensure we count only unique IPs that will actually be scanned
+    // Copy blocked IPs to preview map as well
+    previewSeen := make(map[string]struct{})
+    for ip := range seenIPs {
+        previewSeen[ip] = struct{}{}
+    }
     var totalTargets int
     var validPhases []envdetect.SubnetPhase
     phases := append([]envdetect.SubnetPhase(nil), activeProfile.Phases...)
@@ -86,6 +86,7 @@ type EnvironmentProfile struct {
     Type Environment // Detected environment.
     Phases []SubnetPhase // Subnet scanning phases.
     ExtraTargets []net.IP // IPs to always probe.
+    IPBlocklist []net.IP // Individual IPs to skip (auto-populated with configured Proxmox hosts).
     Policy ScanPolicy // Applied scan policy.
     Confidence float64 // Overall confidence (0.0 - 1.0).
     Warnings []string // Non-fatal detection warnings.