Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.
- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
- Fix API-only mode to accept Bearer tokens and query params
- Fix data race in API token validation using fine-grained locking
- Fix unified agent download serving wrong binary for invalid arch
- Fix AI infra discovery running when AI disabled and missing stop mechanism
- Update patrol.go to use chat service for AI execution
- Update service.go with chat service provider integration
- Add patrol streaming endpoint to router
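A minimal sketch of the error-path pattern above: return a partial response carrying token counts alongside the error, so the caller can record spend on both paths. All type and function names here are illustrative, not the real Pulse API.

```go
package main

import "fmt"

// TokenUsage captures tokens consumed by one LLM call (illustrative fields).
type TokenUsage struct {
	InputTokens  int
	OutputTokens int
}

// PatrolResponse is a sketch of a response that carries token counts.
type PatrolResponse struct {
	Findings []string
	Usage    TokenUsage
}

// runPatrol returns a partial response alongside the error so the caller
// can always record tokens that were already consumed before the failure.
func runPatrol() (*PatrolResponse, error) {
	resp := &PatrolResponse{Usage: TokenUsage{InputTokens: 1200}}
	// ... the LLM call fails midway through analysis ...
	return resp, fmt.Errorf("analysis failed")
}

// recordUsage stands in for thread-safe cost recording (ai.Service.RecordUsage).
func recordUsage(u TokenUsage) int { return u.InputTokens + u.OutputTokens }

func main() {
	resp, err := runPatrol()
	if resp != nil { // record on both success and error paths
		fmt.Println("recorded tokens:", recordUsage(resp.Usage))
	}
	if err != nil {
		fmt.Println("patrol error:", err)
	}
}
```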
Major new AI capabilities for infrastructure monitoring:
Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows
Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection
Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording
New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions
Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
- Add ChatStream method to all providers (Anthropic, OpenAI, Gemini, Ollama)
for real-time streaming of AI responses with tool call support
- Add StreamingProvider interface with StreamEvent types for content,
thinking, tool_start, tool_end, done, and error events
- Add notable models feature that fetches model metadata from models.dev
to identify recent/recommended models (within last 3 months)
- Add Notable field to ModelInfo struct to flag "latest and greatest" models
- Add SupportsThinking method to check for extended reasoning capability
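A sketch of what the streaming interface could look like, with the event kinds named above delivered through a callback. The identifiers are assumptions based on the event names in this change, not the real Pulse types.

```go
package main

import "fmt"

// StreamEventType enumerates the event kinds named in the change.
type StreamEventType string

const (
	EventContent   StreamEventType = "content"
	EventThinking  StreamEventType = "thinking"
	EventToolStart StreamEventType = "tool_start"
	EventToolEnd   StreamEventType = "tool_end"
	EventDone      StreamEventType = "done"
	EventError     StreamEventType = "error"
)

// StreamEvent is one chunk emitted during a streaming chat response.
type StreamEvent struct {
	Type StreamEventType
	Text string
}

// StreamingProvider is implemented by providers that support ChatStream.
type StreamingProvider interface {
	ChatStream(prompt string, onEvent func(StreamEvent)) error
}

// mockProvider demonstrates the callback contract with a canned response.
type mockProvider struct{}

func (mockProvider) ChatStream(prompt string, onEvent func(StreamEvent)) error {
	onEvent(StreamEvent{Type: EventContent, Text: "hello"})
	onEvent(StreamEvent{Type: EventDone})
	return nil
}

func main() {
	var p StreamingProvider = mockProvider{}
	_ = p.ChatStream("hi", func(e StreamEvent) { fmt.Println(e.Type, e.Text) })
}
```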
The streaming support enables real-time AI chat responses instead of
waiting for complete responses. The notable models feature helps users
identify which models are current and recommended.
OpenCode client improvements:
- Fix session listing with proper timestamp parsing
- Add model selection with provider inference (anthropic, google, etc.)
- Add session management APIs (summarize, diff, fork, revert)
- Generate session titles from the first user message
Control level refactoring:
- IsAutonomous() helper for cleaner checks
- Legacy autonomous_mode maps to control_level for backwards compat
- Simplified system instructions (rely on tool descriptions instead)
Includes tests for model provider inference.
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.
Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()
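The fix pattern, reduced to a sketch: a dedicated mutex guards the sample map that both the main collection loop and the update-check goroutine touch. Field names follow the commit; the surrounding types are simplified.

```go
package main

import (
	"fmt"
	"sync"
)

// agent holds per-container CPU samples shared across goroutines.
type agent struct {
	cpuMu            sync.Mutex // guards prevContainerCPU
	prevContainerCPU map[string]float64
}

// recordSample writes a sample under the lock.
func (a *agent) recordSample(id string, cpu float64) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	a.prevContainerCPU[id] = cpu
}

// pruneStaleCPUSamples deletes entries for containers no longer running.
func (a *agent) pruneStaleCPUSamples(live map[string]bool) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	for id := range a.prevContainerCPU {
		if !live[id] {
			delete(a.prevContainerCPU, id)
		}
	}
}

func main() {
	a := &agent{prevContainerCPU: map[string]float64{}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // concurrent writers no longer crash the agent
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			a.recordSample(fmt.Sprintf("ct%d", n), float64(n))
		}(i)
	}
	wg.Wait()
	fmt.Println(len(a.prevContainerCPU))
}
```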
Related to #1063
Allow users to set custom disk usage thresholds per mounted filesystem
on host agents, rather than applying a single threshold to all volumes.
This addresses NAS/NVR use cases where some volumes (e.g., NVR storage)
intentionally run at 99% while others need strict monitoring.
Backend:
- Check for disk-specific overrides before using HostDefaults.Disk
- Override key format: host:<hostId>/disk:<mountpoint>
- Support both custom thresholds and disable per-disk
Frontend:
- Add 'hostDisk' resource type
- Add "Host Disks" collapsible section in Thresholds → Hosts tab
- Group disks by host for easier navigation
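The override lookup described above could be sketched as follows. The key format is from the commit; the function shape and names are assumptions.

```go
package main

import "fmt"

// diskThresholdFor returns the disk-specific override when one exists,
// falling back to the host-wide default. Override keys use the
// "host:<hostId>/disk:<mountpoint>" format.
func diskThresholdFor(overrides map[string]float64, hostID, mount string, hostDefault float64) float64 {
	key := fmt.Sprintf("host:%s/disk:%s", hostID, mount)
	if v, ok := overrides[key]; ok {
		return v
	}
	return hostDefault
}

func main() {
	overrides := map[string]float64{"host:nas1/disk:/mnt/nvr": 99}
	fmt.Println(diskThresholdFor(overrides, "nas1", "/mnt/nvr", 85))  // NVR volume runs hot by design
	fmt.Println(diskThresholdFor(overrides, "nas1", "/mnt/data", 85)) // strict default elsewhere
}
```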
Closes #1103
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go
Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.
Related to #1023
- Remove redundant nil checks before len() calls
- Mark unused parameters with underscore
- Convert if/else chains to switch statements for cleaner code
- Add test assertions to resolve unused write warnings in patrol_test.go
- Add Start/Stop lifecycle methods to AlertTriggeredAnalyzer
- Periodic cleanup of lastAnalyzed map every 30 minutes
- Prevents memory growth from stale cooldown entries
- Document that ai package feature constants are aliases of license constants
- Call Start() in StartPatrol and Stop() in StopPatrol
- Add tests for Start/Stop lifecycle
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)
This caused duplicate responses when using Ollama and other providers.
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.
Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., 96% = warns at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go
When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold
When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%
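The two modes above can be sketched as a single function; the offsets match the commit, while the function name and signature are illustrative.

```go
package main

import "fmt"

// patrolLevels computes the watch and warning levels for a configured
// alert threshold. With proactive mode off (the new default), Warning
// fires at the exact threshold; with it on, both levels fire early.
func patrolLevels(threshold float64, proactive bool) (watch, warning float64) {
	if proactive {
		return threshold - 15, threshold - 5
	}
	return threshold - 5, threshold
}

func main() {
	w, warn := patrolLevels(96, false)
	fmt.Printf("exact mode: watch=%v warning=%v\n", w, warn)
	w, warn = patrolLevels(96, true)
	fmt.Printf("proactive mode: watch=%v warning=%v\n", w, warn)
}
```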
Related to #951
- patrol.go: Auto-fix now requires both config flag AND ai_autofix license
- service.go: IsAutonomous() checks for ai_autofix license before enabling
- ai_handlers.go: API returns 402 if enabling auto-fix/autonomous without license
Adds structured XML finding responses to the demo mock AI service.
This prevents the background patrol service from failing with 'Analysis failed'
when running in demo mode without a real LLM provider.
Ensures the AI settings endpoint reports enabled=true and configured=true
when running in demo mode (PULSE_MOCK_MODE=true), even if no provider is
configured. This unlocks the frontend UI to allow interaction with the
mock AI assistant.
Allows the AI Assistant to provide realistic canned responses on the
live demo server without needing a real API key. Handled automatically
when PULSE_MOCK_MODE=true and no provider is configured.
When MOCK_ENABLED=true, Pulse now injects realistic AI patrol
findings to showcase the AI features without requiring actual
LLM API calls. This enables the demo instance to demonstrate:
- Critical/warning/info findings with realistic content
- Patrol run history
- Actionable recommendations
Also includes refinements to dismissal logic from earlier work:
- Only 'not_an_issue' creates permanent suppression
- 'expected_behavior' and 'will_fix_later' just acknowledge
Addresses #866 - agents were logging 'WebSocket connection failed' warnings
even during normal reconnection scenarios (server restart, network blip, etc).
Changes:
- Normal close errors (1000, 1001, connection reset) now log at Debug level
- Only log Warning after 3+ consecutive failures
- Changed 'Connecting to Pulse' from Info to Debug to reduce noise
- Successful connections still log at Info level
The WebSocket is only used for AI command execution, not metrics, so
transient disconnections don't affect monitoring functionality.
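The quieter logging policy could be sketched like this: normal close codes stay at Debug, and Warning only appears after 3 or more consecutive failures. Close-code handling is simplified to the codes named above; the real agent also treats connection resets as normal.

```go
package main

import "fmt"

// reconnectLogLevel picks a log level for a failed WebSocket connection
// attempt, based on the close code and the consecutive-failure count.
func reconnectLogLevel(closeCode int, consecutiveFailures int) string {
	normalClose := closeCode == 1000 || closeCode == 1001
	if normalClose || consecutiveFailures < 3 {
		return "debug" // expected churn: server restart, network blip
	}
	return "warn" // persistent failure: worth surfacing
}

func main() {
	fmt.Println(reconnectLogLevel(1001, 1)) // server going away: quiet
	fmt.Println(reconnectLogLevel(1006, 5)) // abnormal, repeated: loud
}
```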
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).
Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service
Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type
This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
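A minimal sketch of the deterministic check: compare a live value to the learned baseline and grade the deviation. The z-score cutoffs here are illustrative, not the real CheckResourceAnomalies thresholds.

```go
package main

import (
	"fmt"
	"math"
)

// classifyAnomaly grades how far a live metric sits from its learned
// baseline, using a simple z-score. No LLM involved.
func classifyAnomaly(live, baselineMean, baselineStddev float64) string {
	if baselineStddev == 0 {
		return "unknown" // no variance learned yet
	}
	z := math.Abs(live-baselineMean) / baselineStddev
	switch {
	case z >= 3:
		return "critical"
	case z >= 2:
		return "warning"
	default:
		return "normal"
	}
}

func main() {
	fmt.Println(classifyAnomaly(95, 40, 10)) // far above baseline
	fmt.Println(classifyAnomaly(45, 40, 10)) // within normal range
}
```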
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
their infrastructure's normal behavior patterns
- Added baseline import to service.go
Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
Multiple frontend components were building a shortened fallback ID when
guest.id was falsy. That fallback format drops the node component, which is
critical for clustered setups where the same VMID can exist on different
nodes.
Changes:
- GuestDrawer.tsx: Updated guestId() and handleAskAI() to use canonical format
- GuestRow.tsx: Updated buildGuestId() to use canonical format
- Dashboard.tsx: Updated handleGuestRowClick() and guest rendering loop,
also fixed legacy metadata fallback to use consistent keying
- ThresholdsTable.tsx: Updated guestsGroupedByNode() to use canonical format
Backend changes:
- Removed temporary debug logging added during investigation
- Added alert history section to AI buildEnrichedResourceContext() function
The backend generates VM/Container IDs in instance:node:vmid format (e.g.,
delly:delly:101) via makeGuestID(). This format is now consistently used
across all frontend fallbacks to prevent AI context, metadata, overrides,
and metrics from colliding or desyncing in clustered environments.
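The canonical format above can be sketched as a one-liner; the real makeGuestID lives in the backend, this is just an illustration of why the node component matters.

```go
package main

import "fmt"

// makeGuestID builds the canonical instance:node:vmid guest ID.
func makeGuestID(instance, node, vmid string) string {
	return fmt.Sprintf("%s:%s:%s", instance, node, vmid)
}

func main() {
	// The same VMID on two cluster nodes yields distinct IDs, so
	// metadata, overrides, and metrics no longer collide.
	fmt.Println(makeGuestID("delly", "delly", "101"))
	fmt.Println(makeGuestID("delly", "minipc", "101"))
}
```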
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior
Related to #864
- Add HandleLicenseFeatures handler that was missing from license_handlers.go
- Add /api/license/features route to router
- Update AI service and metadata provider
- Update frontend license API and components
- Fix CI build failure caused by tests referencing unimplemented method
Fixes #858
The patrol interval setting was not being properly applied due to:
1. ReconfigurePatrol() was setting the deprecated QuickCheckInterval field
instead of the preferred Interval field
2. SetConfig() was comparing raw field values instead of using GetInterval()
to compare effective intervals, causing change detection to fail
3. The API response was missing interval_ms, preventing the frontend from
displaying the correct interval
Changes:
- Update StartPatrol() and ReconfigurePatrol() to use the Interval field
- Fix SetConfig() to use GetInterval() for interval comparison
- Add IntervalMs to PatrolStatusResponse and include it in the API response
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.
Changed from:
const metadata = guestMetadata()[guestId] || ...
customUrl={metadata?.customUrl}
To:
const getMetadata = () => guestMetadata()[guestId] || ...
customUrl={getMetadata()?.customUrl}
This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
Backend:
- Add smart provider fallback when selected model's provider isn't configured
- Automatically switch to a model from a configured provider instead of failing
- Log warning when fallback occurs for visibility
Frontend (AISettings.tsx):
- Add helper functions to check if model's provider is configured
- Group model dropdown: configured providers first, unconfigured marked with ⚠️
- Add inline warning when selecting model from unconfigured provider
- Validate on save that model's provider is configured (or being added)
- Warn before clearing last configured provider (would disable AI)
- Warn before clearing provider that current model uses
- Add patrol interval validation (must be 0 or >= 10 minutes)
- Show red border + inline error for invalid patrol intervals (1-9 minutes)
- Update patrol interval hint: '(0=off, 10+ to enable)'
These changes prevent confusing '500 Internal Server Error' and
'AI is not enabled or configured' errors when the selected model's
provider isn't configured.
1. resources/store.go: Implement sorting in Query.Execute()
- Added sortResources function with support for common fields
- Supports: name, type, status, cpu, memory, disk, last_seen
- Both ascending and descending order supported
2. ai/service.go: Implement hasAgentForTarget properly
- Now maps target to specific agent based on hostname/node
- Uses ResourceProvider lookup for container→host mapping
- Supports cluster peer routing for Proxmox clusters
- Properly handles single-agent vs multi-agent scenarios
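The sorting half of this change could look roughly like the sketch below: a field switch plus sort.Slice, with descending order as a flag. Only a subset of the listed fields is shown, and the names are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a trimmed-down stand-in for the store's resource type.
type resource struct {
	Name string
	CPU  float64
}

// sortResources orders resources by the requested field, ascending or
// descending. Descending reuses the same comparator with swapped indices.
func sortResources(rs []resource, field string, desc bool) {
	less := func(i, j int) bool {
		switch field {
		case "cpu":
			return rs[i].CPU < rs[j].CPU
		default: // "name"
			return rs[i].Name < rs[j].Name
		}
	}
	if desc {
		sort.Slice(rs, func(i, j int) bool { return less(j, i) })
	} else {
		sort.Slice(rs, less)
	}
}

func main() {
	rs := []resource{{"web", 0.9}, {"db", 0.2}, {"cache", 0.5}}
	sortResources(rs, "cpu", true)
	for _, r := range rs {
		fmt.Println(r.Name, r.CPU)
	}
}
```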
The set_resource_url tool had an incorrect example ID format ('pve1-delly-101')
which caused the AI to save URLs with wrong IDs that didn't match the actual
guest IDs used by Pulse ('instance-VMID' format like 'delly-150').
This fix updates the tool description to clearly document the correct format,
so URLs saved by the AI will now properly appear in the dashboard.
- Add DOMPurify sanitization for AI chat markdown rendering (XSS fix)
- Configure DOMPurify to add target=_blank and rel=noopener to links
- Update system prompt to align with command approval policy
- Clarify safe vs destructive commands in prompt
- Improve patrol auto-fix mode guidance with safe operation list
- Add verification requirements for auto-fix actions
- Update observe-only mode to be clearer about read-only restrictions