Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).
Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service
Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type
This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
- Create Intelligence struct that aggregates all AI subsystems
- Add /api/ai/intelligence endpoint for system-wide and per-resource insights
- Wire Intelligence into PatrolService as a facade (not replacement)
- Add TypeScript types and API client for frontend
- Add unit tests for Intelligence orchestrator
- Fix pre-existing test failures using diagnostic commands instead of actionable ones
The Intelligence orchestrator provides:
- System-wide health scoring (A-F grades)
- Aggregated findings, predictions, correlations
- Per-resource context generation for AI prompts
- Learning progress tracking
This unifies access to AI subsystems without replacing existing code paths.
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
their infrastructure's normal behavior patterns
- Added baseline import to service.go
Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
Multiple frontend components were using - as a fallback
when guest.id was falsy. This format drops the node component, which is
critical for clustered setups where the same VMID can exist on different
nodes.
Changes:
- GuestDrawer.tsx: Updated guestId() and handleAskAI() to use canonical format
- GuestRow.tsx: Updated buildGuestId() to use canonical format
- Dashboard.tsx: Updated handleGuestRowClick() and guest rendering loop,
also fixed legacy metadata fallback to use consistent keying
- ThresholdsTable.tsx: Updated guestsGroupedByNode() to use canonical format
Backend changes:
- Removed temporary debug logging added during investigation
- Added alert history section to AI buildEnrichedResourceContext() function
The backend generates VM/Container IDs in instance:node:vmid format (e.g.,
delly:delly:101) via makeGuestID(). This format is now consistently used
across all frontend fallbacks to prevent AI context, metadata, overrides,
and metrics from colliding or desyncing in clustered environments.
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior
Related to #864
- Add HandleLicenseFeatures handler that was missing from license_handlers.go
- Add /api/license/features route to router
- Update AI service and metadata provider
- Update frontend license API and components
- Fix CI build failure caused by tests referencing unimplemented method
Fixes#858
The patrol interval setting was not being properly applied due to:
1. ReconfigurePatrol() was setting the deprecated QuickCheckInterval field
instead of the preferred Interval field
2. SetConfig() was comparing raw field values instead of using GetInterval()
to compare effective intervals, causing change detection to fail
3. The API response was missing interval_ms, preventing the frontend from
displaying the correct interval
Changes:
- Update StartPatrol() and ReconfigurePatrol() to use the Interval field
- Fix SetConfig() to use GetInterval() for interval comparison
- Add IntervalMs to PatrolStatusResponse and include it in the API response
- Add CollapsibleSection component with animated expand/collapse
- Wrap all 6 resource sections (Nodes, VMs, PBS, Storage, Backups, Snapshots) with accordion UI
- Add section icons and resource counts in headers
- Add expand all / collapse all buttons for quick navigation
- Make help banner dismissible with localStorage persistence
- Add Ctrl/Cmd+F keyboard shortcut to focus search
- Add keyboard shortcut hint badge on search input
- Add icons to tab navigation for quick identification
- Improve mobile tab labels with shorter text on small screens
- Create reusable components: ThresholdBadge, ResourceCard, GlobalDefaultsRow
- Create useCollapsedSections hook with localStorage persistence
- Default less-used sections (Storage, Backups, Snapshots, PBS) to collapsed
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.
Changed from:
const metadata = guestMetadata()[guestId] || ...
customUrl={metadata?.customUrl}
To:
const getMetadata = () => guestMetadata()[guestId] || ...
customUrl={getMetadata()?.customUrl}
This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
Backend:
- Add smart provider fallback when selected model's provider isn't configured
- Automatically switch to a model from a configured provider instead of failing
- Log warning when fallback occurs for visibility
Frontend (AISettings.tsx):
- Add helper functions to check if model's provider is configured
- Group model dropdown: configured providers first, unconfigured marked with ⚠️
- Add inline warning when selecting model from unconfigured provider
- Validate on save that model's provider is configured (or being added)
- Warn before clearing last configured provider (would disable AI)
- Warn before clearing provider that current model uses
- Add patrol interval validation (must be 0 or >= 10 minutes)
- Show red border + inline error for invalid patrol intervals 1-9
- Update patrol interval hint: '(0=off, 10+ to enable)'
These changes prevent confusing '500 Internal Server Error' and
'AI is not enabled or configured' errors when model/provider mismatch.
Fixes issue where Ollama users get 'I'm a large language model, I can't do XYZ'
responses when trying to use the AI assistant. The problem was that the
Ollama provider was not passing tool definitions to the API.
Changes:
- Add Tools field to ollamaRequest struct
- Add ollamaTool, ollamaToolFunction, ollamaToolCall structs
- Convert tools from ChatRequest to Ollama format in Chat()
- Parse tool_calls from Ollama response
- Set StopReason to 'tool_use' when model requests tool execution
- Handle tool results in multi-turn conversations
Requires Ollama v0.3.0+ and a tool-capable model (llama3.1+, mistral-nemo, etc.)
Closes: Discussion #845 comment by misterlegend
- Add integration tests for Ollama provider (17 tests against real API)
- Add unit tests for baseline, correlation, patterns, memory, knowledge, cost packages
- Add context formatter and builder tests
- Add factory tests for provider initialization
- Add Makefile targets: test-integration, test-all
- Clean up test theatre (removed struct field tests)
Integration tests require Ollama at OLLAMA_URL (default: 192.168.0.124:11434)
Run with: make test-integration
- Replace verbose info banner with streamlined layout
- Add collapsible 'Advanced Model Selection' accordion for Chat/Patrol models
- Make AI Patrol Settings section collapsible with inline summary badges
- Compact Cost Controls into single-row inline layout
- Reduce form spacing for tighter presentation
- Remove unused formHelpText import
Also includes:
- OpenAI provider fixes for max_tokens parameters
- Security setup CSRF and 401 fixes
- Minor UI tweaks
Backend fixes:
- Strip provider prefix (anthropic:, openai:, deepseek:, ollama:) in all
provider Chat methods and constructors for robust handling
- Models are now correctly parsed regardless of caller format
Frontend fixes:
- Tool cards now persist in AI chat after approval execution by adding
to streamEvents array
- Dashboard now listens for pulse:metadata-changed custom event
- AI Chat emits this event when set_resource_url tool completes
- Guest URL icons now update instantly when AI sets them
- Add setup modal that appears when enabling AI without configured provider
- Modal allows selecting provider (Anthropic, OpenAI, DeepSeek, Ollama)
- Enter API key/URL and enable AI in one smooth flow
- Reorder backend to apply API keys before enabled check
- Fix Ollama to strip 'ollama:' prefix from model names
- Simplify backend error message for unconfigured providers
1. resources/store.go: Implement sorting in Query.Execute()
- Added sortResources function with support for common fields
- Supports: name, type, status, cpu, memory, disk, last_seen
- Both ascending and descending order supported
2. ai/service.go: Implement hasAgentForTarget properly
- Now maps target to specific agent based on hostname/node
- Uses ResourceProvider lookup for container→host mapping
- Supports cluster peer routing for Proxmox clusters
- Properly handles single-agent vs multi-agent scenarios
The set_resource_url tool had an incorrect example ID format ('pve1-delly-101')
which caused the AI to save URLs with wrong IDs that didn't match the actual
guest IDs used by Pulse ('instance-VMID' format like 'delly-150').
This fix updates the tool description to clearly document the correct format,
so URLs saved by the AI will now properly appear in the dashboard.
- Frontend: Add ociImage memo to extract clean image name from osTemplate
- Frontend: Show OCI image name in type badge tooltip
- Frontend: Display OCI image in OS column when no guest agent info available
- Frontend: Include ociImage in AI context data for selected OCI containers
- Backend: Differentiate OCI containers as 'oci_container' type in AI context
- Backend: Add Metadata field to ResourceContext for extensibility
- Backend: Include oci_image in container metadata for AI analysis
- Backend: Update section heading to 'LXC/OCI Containers' in AI context
This follows Docker container patterns to avoid duplicating work.
- Add DOMPurify sanitization for AI chat markdown rendering (XSS fix)
- Configure DOMPurify to add target=_blank and rel=noopener to links
- Update system prompt to align with command approval policy
- Clarify safe vs destructive commands in prompt
- Improve patrol auto-fix mode guidance with safe operation list
- Add verification requirements for auto-fix actions
- Update observe-only mode to be clearer about read-only restrictions
Create internal/ai/correlation package:
1. Correlation Detector (detector.go):
- Tracks events across resources
- Detects when events on one resource follow events on another
- Calculates average delay between correlated events
- Confidence scoring based on occurrence count
- Persists to ai_correlations.json
2. Features:
- GetCorrelations() - All detected relationships
- GetCorrelationsForResource() - Relationships for one resource
- GetDependencies() - What resources depend on this one
- GetDependsOn() - What this resource depends on
- PredictCascade() - Predict what will be affected
- FormatForContext() - AI-consumable summary
3. Integration:
- Wire to alert history in router startup
- Map alert types to correlation event types
- Add correlation context to enriched AI context
Example AI context now includes:
'When local-zfs experiences high usage, database often follows within 5 minutes'
This enables the AI to understand infrastructure dependencies
and predict cascade failures.
All tests passing.
Create internal/ai/patterns package:
1. Pattern Detector (detector.go):
- Records historical events (high memory, OOM, restarts, etc.)
- Detects recurring failure patterns
- Calculates average interval between occurrences
- Computes confidence based on pattern consistency
- Predicts when failures will occur again
- Persists to ai_patterns.json
2. Event types tracked:
- high_memory, high_cpu, disk_full
- oom, restart, unresponsive
- backup_failed
3. Integration:
- Wire PatternDetector into router startup
- Add to AI context in buildEnrichedContext
- FormatForContext generates failure predictions
Example AI context now includes:
'OOM events typically occurs every ~10 days (next expected in ~3 days)'
This enables proactive alerts before problems recur.
All tests passing.
Phase 4 - Remediation logging integration:
1. logRemediation hook after tool execution:
- Only logs run_command tools (main remediation action)
- Records resourceID, resourceType, findingID
- Extracts problem summary from user prompt
- Truncates output for storage (max 1000 chars)
- Distinguishes automatic (patrol) vs manual (chat) actions
2. buildRemediationContext for system prompts:
- Shows 'Past Successful Fixes for Similar Issues' section
- Uses keyword matching to find relevant past fixes
- Shows 'Remediation History for This Resource' section
- Includes timestamps and outcomes
This enables the AI to say things like:
- 'This worked before: apt clean to free 6GB (resolved)'
- 'Last time on this resource: restarted nginx (resolved)'
All tests passing.
Complete Phase 3 integration:
- Initialize ChangeDetector and RemediationLog in StartPatrol
- Add SetChangeDetector/SetRemediationLog to handler chain:
Router -> AISettingsHandler -> Service -> PatrolService
- Persist change history to ai_changes.json
- Persist remediation log to ai_remediations.json
- Both use the Pulse config directory for storage
Operational memory is now fully integrated:
- Change detector tracks infrastructure changes on each patrol
- Recent changes (24h) are appended to AI context
- Remediation log ready for command execution logging
All tests passing.
Integrate operational memory into patrol context:
- Add changeDetector and remediationLog fields to PatrolService
- Add SetChangeDetector and SetRemediationLog methods
- Integrate change detection into buildEnrichedContext
- Convert state to ResourceSnapshots for change tracking
- Append recent changes summary to AI context
The AI now sees a 'Recent Infrastructure Changes (24h)' section
showing events like:
- VM 'web-server' status changed: running → stopped (2h ago)
- 'db-server' migrated from node1 to node2 (4h ago)
- 'web-server' memory increased: 4 GB → 8 GB (1d ago)
All tests passing.
Phase 3 of Pulse AI differentiation:
Create internal/ai/memory package with:
1. Change Detection (changes.go):
- Tracks infrastructure changes: creation, deletion, config changes
- Detects status changes (started, stopped)
- Detects VM/container migrations between nodes
- Detects CPU/memory configuration changes
- Detects backup completions
- Persists change history to ai_changes.json
- GetChangesSummary for AI context
2. Remediation Logging (remediation.go):
- Records actions taken to fix problems
- Tracks command, output, and outcome
- Links to AI findings via findingID
- GetSimilar finds past similar problems
- GetSuccessfulRemediations for learning
- Persists to ai_remediations.json
3. Type exports (memory_exports.go):
- Clean re-exports from ai package
This enables the AI to say things like:
- 'This VM was migrated 2 hours ago'
- 'Memory was increased from 4GB to 8GB yesterday'
- 'Last time this happened, restarting nginx resolved it'
All tests passing.
Complete Phase 2 baseline integration:
- Add baseline_exports.go for clean type aliasing
- Wire baseline store initialization into StartPatrol
- Implement startBaselineLearning background loop
- Runs initial learning after 5 min delay
- Updates baselines every hour from metrics history
- Learns from 7 days of data for nodes, VMs, containers
- Add SetBaselineStore methods throughout the chain
(Router -> AIHandler -> Service -> PatrolService)
- Persists baselines to data directory as JSON
The baseline learning loop:
1. Starts automatically when AI patrol starts
2. Queries metrics history for all resources
3. Computes mean, stddev, percentiles for cpu/memory/disk
4. Saves baselines to disk for durability
5. Anomaly detection uses these baselines in context builder
All tests passing.
Phase 2 of Pulse AI differentiation:
- Create internal/ai/baseline package for learned baselines
- Implement statistical baseline learning with mean, stddev, percentiles
- Add z-score based anomaly detection with severity classification
(low, medium, high, critical based on standard deviations)
- Integrate baseline provider into context builder
- Wire baseline store into patrol service with adapters
- Add anomaly enrichment to resource contexts
Key features:
- Learn computes baseline from historical metric data points
- IsAnomaly and CheckAnomaly detect deviations from normal
- Persists baselines to disk as JSON for durability
- Formatted anomaly descriptions for AI consumption
Example: 'Memory is high above normal (85.2% vs typical 42.1% ± 8.3%)'
The baseline store needs to be initialized and triggered to learn
from metrics history. Next step is adding the learning loop.
All tests passing.
Phase 1 of Pulse AI differentiation:
- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions
This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights
All tests passing. Falls back to basic summary if metrics history unavailable.
Backend:
- Add per-provider API key fields to AIConfig (AnthropicAPIKey, OpenAIAPIKey, DeepSeekAPIKey, OllamaBaseURL, OpenAIBaseURL)
- Add NewForProvider() and NewForModel() factory functions for multi-provider instantiation
- Update ListModels() to aggregate models from all configured providers with provider:model format
- Update Execute/ExecuteStream to dynamically create provider based on selected model
- Update TestConnection to use multi-provider aware provider creation
- Add helper functions: HasProvider(), GetConfiguredProviders(), GetAPIKeyForProvider(), GetBaseURLForProvider(), ParseModelString(), FormatModelString()
Frontend:
- Remove legacy single-provider UI (provider grid, single API key input, single base URL)
- Add accordion-style UI for configuring all providers independently
- Add model grouping by provider in selectors using optgroup
- Update AIChat model dropdown with grouped provider sections
- Add helper functions for parsing provider from model ID and grouping models
API:
- Add multi-provider fields to AISettingsResponse and AISettingsUpdateRequest
- Add /api/ai/models endpoint for dynamic model listing
- Update settings handlers for per-provider credential management
Users can now:
1. View all suppression rules (both from dismissed findings and manually created)
2. Create manual rules like 'ignore performance issues on debian-go'
3. Delete rules when they want alerts to come back
Backend:
- Added SuppressionRule type for user-defined rules
- Added suppressionRules storage to FindingsStore
- Added AddSuppressionRule/GetSuppressionRules/DeleteSuppressionRule methods
- Added isSuppressedInternal check for manual rules
- Added API handlers and routes for /api/ai/patrol/suppressions
Frontend:
- Added SuppressionRule interface
- Added getSuppressionRules/addSuppressionRule/deleteSuppressionRule API functions
- Added getDismissedFindings for viewing dismissed findings
Example usage:
POST /api/ai/patrol/suppressions
{
'resource_id': 'debian-go',
'category': 'performance',
'description': 'Dev container runs hot - expected'
}
The main issue was that finding IDs included the title, which the LLM
generates differently each time. 'High CPU on minipc' vs 'Node minipc
experiencing high CPU load' got different IDs, making dismissals useless.
Changes:
1. LLM findings now get IDs based on resource+category only, not title
2. Add() now checks if finding is suppressed before adding as new
3. Add() now checks dismissed findings and only reactivates on severity escalation
4. IsSuppressed() now matches by resource+category only, not title
5. Added isSuppressedInternal() for use when lock is already held
Now when you dismiss 'performance issues on minipc', any future patrol finding
about performance on minipc will be recognized as the same issue and stay dismissed.
The ForcePatrol() function was using the HTTP request context, which gets
cancelled immediately when the API response is sent. This caused LLM analysis
to fail with 'context canceled' before it could complete.
Now uses context.Background() so the goroutine runs independently of the
HTTP request lifecycle.
Also fixed dropdown hover gap issue in the dismiss menu.