75 commits

Author SHA1 Message Date
rcourtman
1de1392c9b Preserve provider metadata in AI model lists (#1320) 2026-03-25 13:08:15 +00:00
rcourtman
5f372e257f Respect patrol model provider in quick analysis 2026-03-25 13:01:43 +00:00
rcourtman
d852964696 fix(ai): record patrol and QuickAnalysis token usage in cost store for budget enforcement
Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.

- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
  success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
  of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
2026-03-01 19:19:47 +00:00
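The error-path change in d852964696 (return a partial response carrying token counts, then record it whether or not the call succeeded) could look roughly like the sketch below. The `Usage` and `CostStore` types, `runPatrol`, and `runAndRecord` are hypothetical stand-ins, not the project's actual types; only the `RecordUsage` name comes from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Usage holds token counts for one LLM call (hypothetical shape).
type Usage struct {
	InputTokens  int
	OutputTokens int
}

// CostStore is a stand-in for the cost store; it only sums tokens.
type CostStore struct {
	mu    sync.Mutex
	total Usage
}

// RecordUsage adds usage under a lock so concurrent callers are safe.
func (s *CostStore) RecordUsage(u Usage) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.total.InputTokens += u.InputTokens
	s.total.OutputTokens += u.OutputTokens
}

// Total returns a copy of the accumulated usage.
func (s *CostStore) Total() Usage {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.total
}

// runPatrol simulates a patrol call. On failure it still returns the
// tokens consumed so far instead of nil.
func runPatrol(fail bool) (*Usage, error) {
	u := &Usage{InputTokens: 1200, OutputTokens: 300}
	if fail {
		return u, errors.New("provider timeout")
	}
	return u, nil
}

// runAndRecord records usage on both the success and the error path.
func runAndRecord(store *CostStore, fail bool) error {
	u, err := runPatrol(fail)
	if u != nil {
		store.RecordUsage(*u)
	}
	return err
}

func main() {
	store := &CostStore{}
	_ = runAndRecord(store, false)
	err := runAndRecord(store, true)
	fmt.Println(store.Total(), err)
}
```

Because the failed run is also counted, a budget check that reads the store sees all spend, not just the spend from successful runs.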
rcourtman
24f5b1cb31 fix(patrol): cap per-run tokens and reset patrol session history 2026-02-24 11:29:47 +00:00
rcourtman
60f9e6f07f security: fix multiple vulnerabilities (SAML, SSRF, Auth)
Addressed several security findings:
- SAML: Sanitized RelayState to prevent open redirects
- SAML: Fixed logout to properly invalidate server-side sessions
- Auth: Added auth, rate limiting, and logout checks to password change endpoint
- AI: Added admin/scope gating (ai:execute) for command execution
- AI: Blocked private IP ranges in fetch_url to prevent SSRF
- Config: Enforced settings:read/write scopes for export/import
- Agent: Added agent:exec scope requirement for WebSockets
2026-02-03 18:39:15 +00:00
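As an illustration of the "blocked private IP ranges" item in 60f9e6f07f, here is a minimal sketch of an address check using Go's standard `net/netip` package. The function name `isBlockedAddr` and the exact set of refused ranges are assumptions; the commit does not say how fetch_url implements its check. A real SSRF guard would also need to apply this to the addresses a hostname resolves to at connection time.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isBlockedAddr reports whether a literal IP should be refused as a
// fetch target. Hypothetical helper, not the project's code.
func isBlockedAddr(s string) bool {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return true // not an IP literal: refuse rather than guess
	}
	addr = addr.Unmap() // treat ::ffff:10.0.0.1 as 10.0.0.1
	return addr.IsPrivate() || // RFC 1918 and fc00::/7
		addr.IsLoopback() ||
		addr.IsLinkLocalUnicast() || // includes 169.254.0.0/16
		addr.IsUnspecified()
}

func main() {
	for _, s := range []string{"10.0.0.5", "127.0.0.1", "169.254.169.254", "8.8.8.8"} {
		fmt.Println(s, isBlockedAddr(s))
	}
}
```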
rcourtman
935326ebb7 fix(api/ai): resolve critical auth, agent download, and lifecycle issues
- Fix API-only mode to accept Bearer tokens and query params
- Fix data race in API token validation using fine-grained locking
- Fix unified agent download serving wrong binary for invalid arch
- Fix AI infra discovery running when AI disabled and missing stop mechanism
2026-02-03 16:35:12 +00:00
rcourtman
0c802e7083 fix(patrol): improve service lifecycle, graceful shutdown, and concurrency 2026-02-01 16:27:25 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
03b5586ac8 refactor(ai): update patrol and service to use chat service adapter
- Update patrol.go to use chat service for AI execution
- Update service.go with chat service provider integration
- Add patrol streaming endpoint to router
2026-01-28 21:24:34 +00:00
rcourtman
e194e17159 Update AI core services and adapters
AI module improvements:

Patrol System:
- Better trigger handling
- Improved history persistence
- Enhanced coverage testing

Knowledge Store:
- Extended functionality
- Better test coverage

Adapters:
- Discovery adapter updates
- Investigation adapter improvements

Unified Bridge:
- Setup improvements
- Better test coverage

Alert handling and service updates.
2026-01-28 16:51:53 +00:00
rcourtman
7f7edfceb4 test: expand backend coverage 2026-01-25 21:08:44 +00:00
rcourtman
ff2841a5c6 Fix patrol scoping and config propagation 2026-01-24 23:07:55 +00:00
rcourtman
27f1a11acb feat: add AI Intelligence system with investigation and forecasting
Major new AI capabilities for infrastructure monitoring:

Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows

Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection

Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording

New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions

Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
2026-01-24 22:41:43 +00:00
rcourtman
37e7aebc98 feat: enhance AI patrol with streaming and improved findings
- Add streaming support to patrol operations
- Improve finding detection and reporting
- Enhance agentic chat capabilities
- Add alert integration improvements
2026-01-22 22:30:35 +00:00
rcourtman
4fe3d7df77 feat(ai): Add streaming support and notable models to AI providers
- Add ChatStream method to all providers (Anthropic, OpenAI, Gemini, Ollama)
  for real-time streaming of AI responses with tool call support
- Add StreamingProvider interface with StreamEvent types for content,
  thinking, tool_start, tool_end, done, and error events
- Add notable models feature that fetches model metadata from models.dev
  to identify recent/recommended models (within last 3 months)
- Add Notable field to ModelInfo struct to flag "latest and greatest" models
- Add SupportsThinking method to check for extended reasoning capability

The streaming support enables real-time AI chat responses instead of
waiting for complete responses. The notable models feature helps users
identify which models are current and recommended.
2026-01-19 19:10:58 +00:00
rcourtman
c26f0e6e6c feat(ai): improve OpenCode integration and control level handling
OpenCode client improvements:
- Fix session listing with proper timestamp parsing
- Model selection with provider inference (anthropic, google, etc)
- Add session management APIs (summarize, diff, fork, revert)
- Generate session titles from the first user message

Control level refactoring:
- IsAutonomous() helper for cleaner checks
- Legacy autonomous_mode maps to control_level for backwards compat
- Simplified system instructions (rely on tool descriptions instead)

Includes tests for model provider inference.
2026-01-17 14:43:28 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
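The fix in 035436ad6e follows the usual Go pattern of guarding a shared map with a mutex. A minimal sketch is below. The field names `cpuMu` and `prevContainerCPU` come from the commit message; the `agent` struct layout, the method signatures, and the `recordConcurrently` helper are assumptions made for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
)

// agent holds per-container CPU samples shared between goroutines.
type agent struct {
	cpuMu            sync.Mutex
	prevContainerCPU map[string]uint64
}

func newAgent() *agent {
	return &agent{prevContainerCPU: make(map[string]uint64)}
}

// recordSample stores the latest CPU total for a container.
func (a *agent) recordSample(id string, total uint64) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	a.prevContainerCPU[id] = total
}

// pruneStaleCPUSamples drops samples for containers that are no longer live.
func (a *agent) pruneStaleCPUSamples(live map[string]bool) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	for id := range a.prevContainerCPU {
		if !live[id] {
			delete(a.prevContainerCPU, id)
		}
	}
}

// sampleCount returns the number of tracked containers.
func (a *agent) sampleCount() int {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	return len(a.prevContainerCPU)
}

// recordConcurrently writes n samples from n goroutines; without the
// mutex this would risk "fatal error: concurrent map writes".
func recordConcurrently(a *agent, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			a.recordSample("c"+strconv.Itoa(i), uint64(i))
		}(i)
	}
	wg.Wait()
}

func main() {
	a := newAgent()
	recordConcurrently(a, 50)
	a.pruneStaleCPUSamples(map[string]bool{"c1": true, "c2": true})
	fmt.Println(a.sampleCount())
}
```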
rcourtman
9cd53814a3 feat(alerts): add per-volume disk thresholds for host agents
Allow users to set custom disk usage thresholds per mounted filesystem
on host agents, rather than applying a single threshold to all volumes.

This addresses NAS/NVR use cases where some volumes (e.g., NVR storage)
intentionally run at 99% while others need strict monitoring.

Backend:
- Check for disk-specific overrides before using HostDefaults.Disk
- Override key format: host:<hostId>/disk:<mountpoint>
- Support both custom thresholds and disable per-disk

Frontend:
- Add 'hostDisk' resource type
- Add "Host Disks" collapsible section in Thresholds → Hosts tab
- Group disks by host for easier navigation

Closes #1103
2026-01-13 23:38:20 +00:00
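The override lookup described in 9cd53814a3 might be sketched as follows. The key format `host:<hostId>/disk:<mountpoint>` is taken from the commit message; the `DiskThreshold` type and both function names are hypothetical.

```go
package main

import "fmt"

// DiskThreshold is a hypothetical threshold record.
type DiskThreshold struct {
	Trigger  float64 // usage percentage that raises an alert
	Disabled bool    // true turns alerting off for this disk
}

// diskOverrideKey builds the per-disk override key in the format
// host:<hostId>/disk:<mountpoint>.
func diskOverrideKey(hostID, mountpoint string) string {
	return "host:" + hostID + "/disk:" + mountpoint
}

// resolveDiskThreshold prefers a disk-specific override and falls back
// to the host-wide default.
func resolveDiskThreshold(overrides map[string]DiskThreshold, hostDefault DiskThreshold, hostID, mountpoint string) DiskThreshold {
	if t, ok := overrides[diskOverrideKey(hostID, mountpoint)]; ok {
		return t
	}
	return hostDefault
}

func main() {
	overrides := map[string]DiskThreshold{
		diskOverrideKey("nas1", "/mnt/nvr"): {Disabled: true},
	}
	def := DiskThreshold{Trigger: 90}
	fmt.Println(resolveDiskThreshold(overrides, def, "nas1", "/mnt/nvr"))
	fmt.Println(resolveDiskThreshold(overrides, def, "nas1", "/"))
}
```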
rcourtman
b177812fd3 revert: remove accidentally committed WIP OpenCode changes
Reverts unintended changes from 4e064aa0 that broke the frontend build.
The workflow fix for cmd/pulse package build remains intact.
2026-01-13 09:15:42 +00:00
rcourtman
4e064aa0cc fix: build entire cmd/pulse package, not just main.go
The static binary build was only compiling main.go, missing bootstrap.go
and config.go which define osExit, bootstrapTokenCmd, and configCmd.
2026-01-13 09:06:21 +00:00
rcourtman
b2a6cd0fa3 fix(agent): add FreeBSD platform support to agent download and UI (#1051)
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
  so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
  with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go

Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
2026-01-11 23:51:12 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
31c704c7a7 refactor: fix lint issues in internal/ai package
- Remove redundant nil checks before len() calls
- Mark unused parameters with underscore
- Convert if/else chains to switch statements for cleaner code
- Add test assertions to resolve unused write warnings in patrol_test.go
2026-01-02 19:53:01 +00:00
rcourtman
180cddb55b refactor: use license package constants for Pro features in AI service 2026-01-02 14:11:56 +00:00
rcourtman
c2de1b256b fix(pro): add cleanup goroutine for alert analyzer memory leak
- Add Start/Stop lifecycle methods to AlertTriggeredAnalyzer
- Periodic cleanup of lastAnalyzed map every 30 minutes
- Prevents memory growth from stale cooldown entries
- Document that ai package feature constants are aliases of license constants
- Call Start() in StartPatrol and Stop() in StopPatrol
- Add tests for Start/Stop lifecycle
2026-01-02 13:12:24 +00:00
rcourtman
ae1c39960f fix: Remove duplicate AI chat response streaming (issue #947)
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)

This caused duplicate responses when using Ollama and other providers.
2025-12-29 09:18:05 +00:00
rcourtman
3040800e7b fix: AI Patrol now respects exact user-configured thresholds
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.

Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., 96% = warns at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go

When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold

When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%

Related to #951
2025-12-29 08:40:34 +00:00
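The two threshold modes listed in 3040800e7b reduce to simple arithmetic. The sketch below assumes that "threshold - 5%" means subtracting five percentage points from the configured value; the function name `patrolThresholds` and the clamp at zero are also assumptions.

```go
package main

import "fmt"

// patrolThresholds derives the watch and warning levels from a
// configured alert threshold, all in percent.
func patrolThresholds(configured float64, proactive bool) (watch, warning float64) {
	if proactive {
		// Early-warning mode: both levels sit below the configured value.
		watch, warning = configured-15, configured-5
	} else {
		// Default mode: warn at the exact threshold, watch slightly before.
		watch, warning = configured-5, configured
	}
	if watch < 0 {
		watch = 0
	}
	if warning < 0 {
		warning = 0
	}
	return watch, warning
}

func main() {
	fmt.Println(patrolThresholds(96, false))
	fmt.Println(patrolThresholds(96, true))
}
```

With a configured threshold of 96, the default mode yields a watch level of 91 and a warning level of 96, while proactive mode yields 81 and 91.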
rcourtman
fe3b4ed5b6 fix: require Pro license for auto-fix and autonomous mode
- patrol.go: Auto-fix now requires both config flag AND ai_autofix license
- service.go: IsAutonomous() checks for ai_autofix license before enabling
- ai_handlers.go: API returns 402 if enabling auto-fix/autonomous without license
2025-12-25 21:26:46 +00:00
rcourtman
d74eae3a3e fix(demo): support patrol analysis mock
Adds structured XML finding responses to the demo mock AI service.
This prevents the background patrol service from failing with 'Analysis failed'
when running in demo mode without a real LLM provider.
2025-12-23 18:48:50 +00:00
rcourtman
03d7147615 fix(ai): force enabled state in demo mode
Ensures the AI settings endpoint reports enabled=true and configured=true
when running in demo mode (PULSE_MOCK_MODE=true), even if no provider is
configured. This unlocks the frontend UI to allow interaction with the
mock AI assistant.
2025-12-23 18:39:34 +00:00
rcourtman
ead3e9ec7e feat(ai): add mock chat response for demo mode
Allows the AI Assistant to provide realistic canned responses on the
live demo server without needing a real API key. Handled automatically
when PULSE_MOCK_MODE=true and no provider is configured.
2025-12-23 18:34:38 +00:00
rcourtman
b75728922c feat: add demo AI findings for mock mode
When MOCK_ENABLED=true, Pulse now injects realistic AI patrol
findings to showcase the AI features without requiring actual
LLM API calls. This enables the demo instance to demonstrate:

- Critical/warning/info findings with realistic content
- Patrol run history
- Actionable recommendations

Also includes refinements to dismissal logic from earlier work:
- Only 'not_an_issue' creates permanent suppression
- 'expected_behavior' and 'will_fix_later' just acknowledge
2025-12-22 17:16:26 +00:00
rcourtman
28ac86c8ab fix: reduce WebSocket reconnection log noise in host agent
Addresses #866 - agents were logging 'WebSocket connection failed' warnings
even during normal reconnection scenarios (server restart, network blip, etc).

Changes:
- Normal close errors (1000, 1001, connection reset) now log at Debug level
- Only log Warning after 3+ consecutive failures
- Changed 'Connecting to Pulse' from Info to Debug to reduce noise
- Successful connections still log at Info level

The WebSocket is only used for AI command execution, not metrics, so
transient disconnections don't affect monitoring functionality.
2025-12-22 14:11:23 +00:00
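The logging policy from 28ac86c8ab can be read as a small decision function. The sketch below is one interpretation: the names are hypothetical, the commit does not say how a normal close interacts with the failure counter, and this version assumes normal closes always log at debug level.

```go
package main

import "fmt"

// warnAfterFailures is the consecutive-failure count at which the
// agent starts warning ("3+" in the commit message).
const warnAfterFailures = 3

// isNormalCloseCode reports whether a WebSocket close code is one of
// the routine ones: 1000 (normal closure) or 1001 (going away).
func isNormalCloseCode(code int) bool {
	return code == 1000 || code == 1001
}

// reconnectLogLevel picks a log level for a failed or closed connection.
func reconnectLogLevel(normalClose bool, consecutiveFailures int) string {
	if normalClose {
		return "debug"
	}
	if consecutiveFailures >= warnAfterFailures {
		return "warn"
	}
	return "debug"
}

func main() {
	fmt.Println(reconnectLogLevel(isNormalCloseCode(1001), 5))
	fmt.Println(reconnectLogLevel(isNormalCloseCode(1006), 1))
	fmt.Println(reconnectLogLevel(isNormalCloseCode(1006), 3))
}
```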
rcourtman
4e893117cd fix: correct patrol interval logging
The log was showing QuickCheckInterval (deprecated, always 0) instead of
the actual Interval field. This caused confusing 'interval: 0' logs.
2025-12-21 21:52:57 +00:00
rcourtman
d9f1f7accd feat(ai): add real-time anomaly detection endpoint
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).

Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service

Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type

This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
2025-12-21 10:52:54 +00:00
rcourtman
96573f4aca feat: enhance AI baseline context visibility and incident timeline improvements
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
  status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
  their infrastructure's normal behavior patterns
- Added baseline import to service.go

Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
2025-12-21 00:14:20 +00:00
rcourtman
5173fc3162 fix: normalize guest ID fallbacks to canonical instance:node:vmid format
Multiple frontend components were using a fallback ID without the node
component when guest.id was falsy. The node component is critical for
clustered setups where the same VMID can exist on different nodes.

Changes:
- GuestDrawer.tsx: Updated guestId() and handleAskAI() to use canonical format
- GuestRow.tsx: Updated buildGuestId() to use canonical format
- Dashboard.tsx: Updated handleGuestRowClick() and guest rendering loop,
  also fixed legacy metadata fallback to use consistent keying
- ThresholdsTable.tsx: Updated guestsGroupedByNode() to use canonical format

Backend changes:
- Removed temporary debug logging added during investigation
- Added alert history section to AI buildEnrichedResourceContext() function

The backend generates VM/Container IDs in instance:node:vmid format (e.g.,
delly:delly:101) via makeGuestID(). This format is now consistently used
across all frontend fallbacks to prevent AI context, metadata, overrides,
and metrics from colliding or desyncing in clustered environments.
2025-12-20 22:11:35 +00:00
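To illustrate why the node component matters in 5173fc3162, here is a minimal sketch of an ID builder in the canonical instance:node:vmid format. The commit names `makeGuestID()` but not its signature, so the parameters below are an assumption.

```go
package main

import "fmt"

// makeGuestID builds a guest ID in instance:node:vmid form.
// Hypothetical signature; only the output format comes from the commit.
func makeGuestID(instance, node string, vmid int) string {
	return fmt.Sprintf("%s:%s:%d", instance, node, vmid)
}

func main() {
	// The same VMID on two nodes of one cluster yields distinct IDs.
	fmt.Println(makeGuestID("delly", "delly", 101))
	fmt.Println(makeGuestID("delly", "minipc", 101))
}
```

The node name `minipc` is made up for the example. An ID that omits the node would be identical for both guests.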
rcourtman
ae522c9a2b fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior

Related to #864
2025-12-20 20:42:23 +00:00
rcourtman
781442cdd0 test: Add comprehensive tests for Host Agent threshold normalization with Trigger=0. Related to #864 2025-12-20 20:32:59 +00:00
rcourtman
db5e79bb37 fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864 2025-12-20 20:25:20 +00:00
rcourtman
7f05d87809 fix: add missing HandleLicenseFeatures method and related changes
- Add HandleLicenseFeatures handler that was missing from license_handlers.go
- Add /api/license/features route to router
- Update AI service and metadata provider
- Update frontend license API and components
- Fix CI build failure caused by tests referencing unimplemented method
2025-12-19 22:59:52 +00:00
rcourtman
4d1138793d feat(license): add initial license implementation structure to fix build 2025-12-19 17:01:57 +00:00
rcourtman
0d6aaff253 fix: AI Patrol frequency not obeying settings
Fixes #858

The patrol interval setting was not being properly applied due to:

1. ReconfigurePatrol() was setting the deprecated QuickCheckInterval field
   instead of the preferred Interval field

2. SetConfig() was comparing raw field values instead of using GetInterval()
   to compare effective intervals, causing change detection to fail

3. The API response was missing interval_ms, preventing the frontend from
   displaying the correct interval

Changes:
- Update StartPatrol() and ReconfigurePatrol() to use the Interval field
- Fix SetConfig() to use GetInterval() for interval comparison
- Add IntervalMs to PatrolStatusResponse and include it in the API response
2025-12-18 21:33:50 +00:00
rcourtman
c91307be94 fix: guest URL icon now appears/disappears immediately after AI sets/removes it
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.

Changed from:
  const metadata = guestMetadata()[guestId] || ...
  customUrl={metadata?.customUrl}

To:
  const getMetadata = () => guestMetadata()[guestId] || ...
  customUrl={getMetadata()?.customUrl}

This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
2025-12-18 14:42:47 +00:00
rcourtman
54fc259221 fix(ai): improve AI settings UX with validation and smart fallbacks
Backend:
- Add smart provider fallback when selected model's provider isn't configured
- Automatically switch to a model from a configured provider instead of failing
- Log warning when fallback occurs for visibility

Frontend (AISettings.tsx):
- Add helper functions to check if model's provider is configured
- Group model dropdown: configured providers first, unconfigured marked with ⚠️
- Add inline warning when selecting model from unconfigured provider
- Validate on save that model's provider is configured (or being added)
- Warn before clearing last configured provider (would disable AI)
- Warn before clearing provider that current model uses
- Add patrol interval validation (must be 0 or >= 10 minutes)
- Show red border + inline error for invalid patrol intervals 1-9
- Update patrol interval hint: '(0=off, 10+ to enable)'

These changes prevent confusing '500 Internal Server Error' and
'AI is not enabled or configured' errors when model/provider mismatch.
2025-12-17 18:30:19 +00:00
rcourtman
7acff2215c style: remove emojis from AI context formatting and prompts
Replaced emoji indicators with text equivalents for better cross-platform
compatibility and cleaner LLM prompts.
2025-12-13 21:26:49 +00:00
rcourtman
26802cd7bf feat(backend): Implement remaining TODOs
1. resources/store.go: Implement sorting in Query.Execute()
   - Added sortResources function with support for common fields
   - Supports: name, type, status, cpu, memory, disk, last_seen
   - Both ascending and descending order supported

2. ai/service.go: Implement hasAgentForTarget properly
   - Now maps target to specific agent based on hostname/node
   - Uses ResourceProvider lookup for container→host mapping
   - Supports cluster peer routing for Proxmox clusters
   - Properly handles single-agent vs multi-agent scenarios
2025-12-13 13:21:23 +00:00
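Field-based sorting with both directions, as described for `sortResources` in 26802cd7bf, can be sketched with the standard `sort` package. The `Resource` struct here is a reduced, hypothetical shape covering only two of the listed fields, and the function signature is an assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// Resource is a reduced, hypothetical resource record.
type Resource struct {
	Name string
	CPU  float64
}

// sortResources sorts in place by the named field. Unknown fields fall
// back to sorting by name.
func sortResources(rs []Resource, field string, descending bool) {
	less := func(i, j int) bool {
		switch field {
		case "cpu":
			return rs[i].CPU < rs[j].CPU
		default:
			return rs[i].Name < rs[j].Name
		}
	}
	sort.SliceStable(rs, func(i, j int) bool {
		if descending {
			return less(j, i) // swap arguments to reverse the order
		}
		return less(i, j)
	})
}

func main() {
	rs := []Resource{{"web", 40}, {"db", 85}, {"cache", 10}}
	sortResources(rs, "cpu", true)
	fmt.Println(rs)
}
```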
rcourtman
23a27b5b93 fix: correct AI tool description for guest resource ID format
The set_resource_url tool had an incorrect example ID format ('pve1-delly-101')
which caused the AI to save URLs with wrong IDs that didn't match the actual
guest IDs used by Pulse ('instance-VMID' format like 'delly-150').

This fix updates the tool description to clearly document the correct format,
so URLs saved by the AI will now properly appear in the dashboard.
2025-12-12 21:28:34 +00:00
rcourtman
8b077f69ce feat: AI security and policy improvements for 5.0
- Add DOMPurify sanitization for AI chat markdown rendering (XSS fix)
- Configure DOMPurify to add target=_blank and rel=noopener to links
- Update system prompt to align with command approval policy
- Clarify safe vs destructive commands in prompt
- Improve patrol auto-fix mode guidance with safe operation list
- Add verification requirements for auto-fix actions
- Update observe-only mode to be clearer about read-only restrictions
2025-12-12 17:38:55 +00:00