Commit graph

65 commits

Author SHA1 Message Date
rcourtman
d9f1f7accd feat(ai): add real-time anomaly detection endpoint
Add /api/ai/intelligence/anomalies endpoint that compares live metrics
against learned baselines to surface deviations - all deterministic
(no LLM required).

Backend:
- Add AnomalyReport struct with severity classification
- Add CheckResourceAnomalies method to baseline store
- Add HandleGetAnomalies API handler
- Add GetStateProvider getter to AI service

Frontend:
- Add AnomalyReport and AnomaliesResponse types
- Add getAnomalies API function
- Add AnomalySeverity type

This is the first step toward surfacing deterministic intelligence
directly in the UI without requiring LLM interaction.
2025-12-21 10:52:54 +00:00
rcourtman
417a523d85 feat(ai): add unified Intelligence orchestrator
- Create Intelligence struct that aggregates all AI subsystems
- Add /api/ai/intelligence endpoint for system-wide and per-resource insights
- Wire Intelligence into PatrolService as a facade (not replacement)
- Add TypeScript types and API client for frontend
- Add unit tests for Intelligence orchestrator
- Fix pre-existing test failures using diagnostic commands instead of actionable ones

The Intelligence orchestrator provides:
- System-wide health scoring (A-F grades)
- Aggregated findings, predictions, correlations
- Per-resource context generation for AI prompts
- Learning progress tracking

This unifies access to AI subsystems without replacing existing code paths.
2025-12-21 10:32:02 +00:00
rcourtman
96573f4aca feat: enhance AI baseline context visibility and incident timeline improvements
Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
  status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
  their infrastructure's normal behavior patterns
- Added baseline import to service.go

Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
2025-12-21 00:14:20 +00:00
rcourtman
5173fc3162 fix: normalize guest ID fallbacks to canonical instance:node:vmid format
Multiple frontend components were using an instance-vmid ID as a fallback
when guest.id was falsy. This format drops the node component, which is
critical for clustered setups where the same VMID can exist on different
nodes.

Changes:
- GuestDrawer.tsx: Updated guestId() and handleAskAI() to use canonical format
- GuestRow.tsx: Updated buildGuestId() to use canonical format
- Dashboard.tsx: Updated handleGuestRowClick() and guest rendering loop,
  also fixed legacy metadata fallback to use consistent keying
- ThresholdsTable.tsx: Updated guestsGroupedByNode() to use canonical format

Backend changes:
- Removed temporary debug logging added during investigation
- Added alert history section to AI buildEnrichedResourceContext() function

The backend generates VM/Container IDs in instance:node:vmid format (e.g.,
delly:delly:101) via makeGuestID(). This format is now consistently used
across all frontend fallbacks to prevent AI context, metadata, overrides,
and metrics from colliding or desyncing in clustered environments.
2025-12-20 22:11:35 +00:00
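A minimal Go sketch of the canonical ID format described in commit 5173fc3162. Only the `instance:node:vmid` format and the name `makeGuestID` come from the commit message; the signature below is an assumption.

```go
package main

import "fmt"

// makeGuestID builds the canonical instance:node:vmid identifier.
// The parameter list is assumed; only the output format is documented.
func makeGuestID(instance, node string, vmid int) string {
	return fmt.Sprintf("%s:%s:%d", instance, node, vmid)
}

func main() {
	// The same VMID on two nodes of one cluster yields two distinct IDs,
	// which an instance-vmid fallback could not guarantee.
	fmt.Println(makeGuestID("delly", "delly", 101))  // delly:delly:101
	fmt.Println(makeGuestID("delly", "minipc", 101)) // delly:minipc:101
}
```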
rcourtman
ae522c9a2b fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior

Related to #864
2025-12-20 20:42:23 +00:00
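The behaviour fixed in commit ae522c9a2b (a trigger of 0 must survive normalization) might look roughly like this. The `threshold` type, the negative-means-unset rule, and the 5-point clear margin are all illustrative assumptions, not Pulse's actual code.

```go
package main

import "fmt"

// threshold is a hypothetical stand-in for a Pulse alert threshold.
type threshold struct {
	Trigger float64 // alert fires at or above this value; 0 disables alerting
	Clear   float64 // alert clears at or below this value
}

// normalizeThreshold fills in defaults without overriding an explicit 0.
// Only a negative trigger is treated as "unset" in this sketch.
func normalizeThreshold(t, def threshold) threshold {
	if t.Trigger < 0 {
		return def
	}
	if t.Trigger == 0 {
		// Explicitly disabled: keep 0 instead of restoring the default.
		return threshold{}
	}
	if t.Clear <= 0 || t.Clear >= t.Trigger {
		// Illustrative hysteresis margin; the real margin is not stated in the log.
		t.Clear = t.Trigger - 5
		if t.Clear < 0 {
			t.Clear = 0
		}
	}
	return t
}

func main() {
	def := threshold{Trigger: 85, Clear: 80}
	fmt.Printf("%+v\n", normalizeThreshold(threshold{Trigger: 0}, def))  // stays disabled
	fmt.Printf("%+v\n", normalizeThreshold(threshold{Trigger: -1}, def)) // falls back to default
	fmt.Printf("%+v\n", normalizeThreshold(threshold{Trigger: 90}, def)) // clear value derived
}
```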
rcourtman
781442cdd0 test: Add comprehensive tests for Host Agent threshold normalization with Trigger=0. Related to #864
2025-12-20 20:32:59 +00:00
rcourtman
db5e79bb37 fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864
2025-12-20 20:25:20 +00:00
rcourtman
7f05d87809 fix: add missing HandleLicenseFeatures method and related changes
- Add HandleLicenseFeatures handler that was missing from license_handlers.go
- Add /api/license/features route to router
- Update AI service and metadata provider
- Update frontend license API and components
- Fix CI build failure caused by tests referencing unimplemented method
2025-12-19 22:59:52 +00:00
rcourtman
65e38fac91 test: improve test coverage for AI, license, config, and monitoring packages
New test files:
- internal/ai/providers/gemini_test.go: Comprehensive Gemini provider tests
- internal/api/ai_intelligence_handlers_test.go: AI intelligence endpoint tests
- internal/api/ai_patrol_handlers_test.go: AI patrol endpoint tests
- internal/api/license_handlers_test.go: License API handler tests
- internal/api/security_oidc_response_test.go: OIDC response formatting tests
- internal/config/ai_config_test.go: AI configuration function tests
- internal/config/persistence_ai_test.go: AI config persistence tests
- internal/config/persistence_extended_test.go: Extended persistence tests
- internal/license/persistence_test.go: License persistence tests
- internal/license/pubkey_test.go: Public key handling tests
- internal/monitoring/host_agent_temps_test.go: Temperature processing tests

Enhanced existing files:
- internal/api/updates_test.go: Added update handler tests
- internal/license/license_test.go: Added Service method tests

Coverage improvements:
- ai/providers: 57.3% -> 73.0% (+15.7%)
- license: 78.3% -> 85.9% (+7.6%)
- config: 49.7% -> 53.9% (+4.2%)
- monitoring: 49.8% -> 50.8% (+1.0%)
- api: 28.4% -> 29.8% (+1.4%)
2025-12-19 22:49:30 +00:00
rcourtman
a1f811cb9e test(ai): improve AI package test coverage from 59.7% to 69.5%
Add comprehensive tests for:
- alert_triggered.go: analysis functions (92%+ coverage)
- patrol_history_persistence.go: all store methods (100%)
- patrol.go: helper functions and getters (100%)
- findings.go: Add edge cases, severity escalation (100%)
- Export functions: all config/detector constructors (100%)

New test files created:
- patrol_history_persistence_test.go
- exports_test.go
- service_extended_test.go
- service_remediation_test.go
- service_tools_test.go
- mock_test.go

Also add coverage.html to .gitignore to exclude generated coverage reports.
2025-12-19 21:53:06 +00:00
rcourtman
4d1138793d feat(license): add initial license implementation structure to fix build
2025-12-19 17:01:57 +00:00
rcourtman
0d6aaff253 fix: AI Patrol frequency not obeying settings
Fixes #858

The patrol interval setting was not being properly applied due to:

1. ReconfigurePatrol() was setting the deprecated QuickCheckInterval field
   instead of the preferred Interval field

2. SetConfig() was comparing raw field values instead of using GetInterval()
   to compare effective intervals, causing change detection to fail

3. The API response was missing interval_ms, preventing the frontend from
   displaying the correct interval

Changes:
- Update StartPatrol() and ReconfigurePatrol() to use the Interval field
- Fix SetConfig() to use GetInterval() for interval comparison
- Add IntervalMs to PatrolStatusResponse and include it in the API response
2025-12-18 21:33:50 +00:00
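Point 2 of commit 0d6aaff253 (compare effective intervals, not raw fields) can be sketched as follows. The field names `Interval` and `QuickCheckInterval` and the method name `GetInterval` come from the commit message; the struct layout, the 15-minute default, and `intervalChanged` are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// patrolConfig holds the two fields named in the commit; the rest is assumed.
type patrolConfig struct {
	Interval           time.Duration // preferred
	QuickCheckInterval time.Duration // deprecated
}

// defaultInterval is an illustrative default, not taken from the log.
const defaultInterval = 15 * time.Minute

// GetInterval returns the effective interval: the preferred field if set,
// otherwise the deprecated one, otherwise the default.
func (c patrolConfig) GetInterval() time.Duration {
	if c.Interval > 0 {
		return c.Interval
	}
	if c.QuickCheckInterval > 0 {
		return c.QuickCheckInterval
	}
	return defaultInterval
}

// intervalChanged compares effective intervals rather than raw fields.
func intervalChanged(old, next patrolConfig) bool {
	return old.GetInterval() != next.GetInterval()
}

func main() {
	old := patrolConfig{QuickCheckInterval: 30 * time.Minute}
	next := patrolConfig{Interval: 10 * time.Minute, QuickCheckInterval: 30 * time.Minute}

	// Comparing only the deprecated field misses the change.
	fmt.Println(old.QuickCheckInterval != next.QuickCheckInterval) // false
	fmt.Println(intervalChanged(old, next))                        // true
}
```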
rcourtman
0182cc8310 feat(thresholds): add collapsible accordion sections and UX improvements
- Add CollapsibleSection component with animated expand/collapse
- Wrap all 6 resource sections (Nodes, VMs, PBS, Storage, Backups, Snapshots) with accordion UI
- Add section icons and resource counts in headers
- Add expand all / collapse all buttons for quick navigation
- Make help banner dismissible with localStorage persistence
- Add Ctrl/Cmd+F keyboard shortcut to focus search
- Add keyboard shortcut hint badge on search input
- Add icons to tab navigation for quick identification
- Improve mobile tab labels with shorter text on small screens
- Create reusable components: ThresholdBadge, ResourceCard, GlobalDefaultsRow
- Create useCollapsedSections hook with localStorage persistence
- Default less-used sections (Storage, Backups, Snapshots, PBS) to collapsed
2025-12-18 15:47:44 +00:00
rcourtman
c91307be94 fix: guest URL icon now appears/disappears immediately after AI sets/removes it
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.

Changed from:
  const metadata = guestMetadata()[guestId] || ...
  customUrl={metadata?.customUrl}

To:
  const getMetadata = () => guestMetadata()[guestId] || ...
  customUrl={getMetadata()?.customUrl}

This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
2025-12-18 14:42:47 +00:00
rcourtman
54fc259221 fix(ai): improve AI settings UX with validation and smart fallbacks
Backend:
- Add smart provider fallback when selected model's provider isn't configured
- Automatically switch to a model from a configured provider instead of failing
- Log warning when fallback occurs for visibility

Frontend (AISettings.tsx):
- Add helper functions to check if model's provider is configured
- Group model dropdown: configured providers first, unconfigured marked with ⚠️
- Add inline warning when selecting model from unconfigured provider
- Validate on save that model's provider is configured (or being added)
- Warn before clearing last configured provider (would disable AI)
- Warn before clearing provider that current model uses
- Add patrol interval validation (must be 0 or >= 10 minutes)
- Show red border + inline error for invalid patrol intervals 1-9
- Update patrol interval hint: '(0=off, 10+ to enable)'

These changes prevent confusing '500 Internal Server Error' and
'AI is not enabled or configured' errors when model/provider mismatch.
2025-12-17 18:30:19 +00:00
rcourtman
71e1b5dc86 test: expand AI provider test coverage with HTTP mocks
2025-12-17 15:53:56 +00:00
rcourtman
969fa0e509 test: add unit tests for AI, Kubernetes agent, and clients
2025-12-17 12:47:36 +00:00
rcourtman
667119269d feat: Add tool/function calling support to Ollama provider
Fixes issue where Ollama users get 'I'm a large language model, I can't do XYZ'
responses when trying to use the AI assistant. The problem was that the
Ollama provider was not passing tool definitions to the API.

Changes:
- Add Tools field to ollamaRequest struct
- Add ollamaTool, ollamaToolFunction, ollamaToolCall structs
- Convert tools from ChatRequest to Ollama format in Chat()
- Parse tool_calls from Ollama response
- Set StopReason to 'tool_use' when model requests tool execution
- Handle tool results in multi-turn conversations

Requires Ollama v0.3.0+ and a tool-capable model (llama3.1+, mistral-nemo, etc.)

Closes: Discussion #845 comment by misterlegend
2025-12-17 11:54:32 +00:00
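For commit 667119269d, a rough sketch of what converting a tool definition to Ollama's format could look like. The struct names `ollamaTool` and `ollamaToolFunction` come from the commit message; the JSON field layout follows the tool format of Ollama's chat API, and `toOllamaTool` and `encode` are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ollamaToolFunction describes one callable function.
type ollamaToolFunction struct {
	Name        string         `json:"name"`
	Description string         `json:"description"`
	Parameters  map[string]any `json:"parameters"` // JSON Schema object
}

// ollamaTool wraps a function in the {"type":"function", ...} envelope.
type ollamaTool struct {
	Type     string             `json:"type"`
	Function ollamaToolFunction `json:"function"`
}

// toOllamaTool is a hypothetical converter from a generic tool definition.
func toOllamaTool(name, description string, params map[string]any) ollamaTool {
	return ollamaTool{
		Type: "function",
		Function: ollamaToolFunction{
			Name:        name,
			Description: description,
			Parameters:  params,
		},
	}
}

// encode returns the JSON form that would go into the request's tools array.
func encode(t ollamaTool) string {
	b, err := json.Marshal(t)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	tool := toOllamaTool("run_command", "Run a shell command", map[string]any{"type": "object"})
	fmt.Println(encode(tool))
}
```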
rcourtman
b79d04f734 Add comprehensive AI test coverage
- Add integration tests for Ollama provider (17 tests against real API)
- Add unit tests for baseline, correlation, patterns, memory, knowledge, cost packages
- Add context formatter and builder tests
- Add factory tests for provider initialization
- Add Makefile targets: test-integration, test-all
- Clean up test theatre (removed struct field tests)

Integration tests require Ollama at OLLAMA_URL (default: 192.168.0.124:11434)
Run with: make test-integration
2025-12-16 12:33:06 +00:00
rcourtman
47ced7c97e feat(ui): make AI settings page more compact and user-friendly
- Replace verbose info banner with streamlined layout
- Add collapsible 'Advanced Model Selection' accordion for Chat/Patrol models
- Make AI Patrol Settings section collapsible with inline summary badges
- Compact Cost Controls into single-row inline layout
- Reduce form spacing for tighter presentation
- Remove unused formHelpText import

Also includes:
- OpenAI provider fixes for max_tokens parameters
- Security setup CSRF and 401 fixes
- Minor UI tweaks
2025-12-16 09:20:09 +00:00
rcourtman
fb01d87b00 fix: strip provider prefix from all AI provider models and instant URL refresh
Backend fixes:
- Strip provider prefix (anthropic:, openai:, deepseek:, ollama:) in all
  provider Chat methods and constructors for robust handling
- Models are now correctly parsed regardless of caller format

Frontend fixes:
- Tool cards now persist in AI chat after approval execution by adding
  to streamEvents array
- Dashboard now listens for pulse:metadata-changed custom event
- AI Chat emits this event when set_resource_url tool completes
- Guest URL icons now update instantly when AI sets them
2025-12-15 19:18:09 +00:00
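The prefix stripping from commit fb01d87b00 could be sketched like this. The four prefixes are the ones listed in the commit message; the function name and the example model names are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// providerPrefixes lists the prefixes named in the commit message.
var providerPrefixes = []string{"anthropic:", "openai:", "deepseek:", "ollama:"}

// stripProviderPrefix removes a leading provider prefix if one is present.
// Only known prefixes are removed, so Ollama tags such as "llama3.1:8b",
// which contain a colon themselves, are left intact.
func stripProviderPrefix(model string) string {
	for _, p := range providerPrefixes {
		if strings.HasPrefix(model, p) {
			return strings.TrimPrefix(model, p)
		}
	}
	return model
}

func main() {
	for _, m := range []string{"ollama:llama3.1:8b", "llama3.1:8b", "openai:gpt-4o"} {
		fmt.Println(stripProviderPrefix(m))
	}
}
```

Matching against a fixed list rather than splitting on the first colon is the design choice that keeps colon-tagged model names safe.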
rcourtman
3d6da91ac0 feat(ai): improve AI settings first-time setup UX
- Add setup modal that appears when enabling AI without configured provider
- Modal allows selecting provider (Anthropic, OpenAI, DeepSeek, Ollama)
- Enter API key/URL and enable AI in one smooth flow
- Reorder backend to apply API keys before enabled check
- Fix Ollama to strip 'ollama:' prefix from model names
- Simplify backend error message for unconfigured providers
2025-12-15 18:59:19 +00:00
rcourtman
0e8c8d51ca fix(ai): add fallback default model when Ollama model is empty
When model is not explicitly set in config or request, fall back to
llama3 to prevent 'model is required' errors from Ollama.
2025-12-15 16:59:51 +00:00
rcourtman
8687d69242 fix(ai): normalize Ollama base URL to prevent 405 errors
Users sometimes enter URLs with trailing slashes or include the /api path:
- http://host:11434/  -> would become http://host:11434//api/chat
- http://host:11434/api -> would become http://host:11434/api/api/chat

Now we strip trailing slashes and /api suffix during client initialization.

Fixes #847
2025-12-15 16:51:52 +00:00
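The normalization in commit 8687d69242 amounts to something like the following sketch; `normalizeBaseURL` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBaseURL strips trailing slashes and a trailing /api segment so
// that appending "/api/chat" always produces a well-formed path.
func normalizeBaseURL(raw string) string {
	u := strings.TrimSpace(raw)
	u = strings.TrimRight(u, "/")
	u = strings.TrimSuffix(u, "/api")
	return strings.TrimRight(u, "/")
}

func main() {
	for _, in := range []string{
		"http://host:11434",
		"http://host:11434/",
		"http://host:11434/api",
		"http://host:11434/api/",
	} {
		fmt.Println(normalizeBaseURL(in) + "/api/chat")
	}
}
```

All four inputs yield `http://host:11434/api/chat`.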
rcourtman
7acff2215c style: remove emojis from AI context formatting and prompts
Replaced emoji indicators with text equivalents for better cross-platform
compatibility and cleaner LLM prompts.
2025-12-13 21:26:49 +00:00
rcourtman
26802cd7bf feat(backend): Implement remaining TODOs
1. resources/store.go: Implement sorting in Query.Execute()
   - Added sortResources function with support for common fields
   - Supports: name, type, status, cpu, memory, disk, last_seen
   - Both ascending and descending order supported

2. ai/service.go: Implement hasAgentForTarget properly
   - Now maps target to specific agent based on hostname/node
   - Uses ResourceProvider lookup for container→host mapping
   - Supports cluster peer routing for Proxmox clusters
   - Properly handles single-agent vs multi-agent scenarios
2025-12-13 13:21:23 +00:00
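The sorting added in commit 26802cd7bf might be structured along these lines. The name `sortResources` comes from the commit message; the `resource` type, the two fields covered here, and the fallback to name are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a hypothetical, trimmed-down resource record.
type resource struct {
	Name string
	CPU  float64
}

// sortResources orders rs in place by the named field.
// Unknown fields fall back to sorting by name.
func sortResources(rs []resource, field string, desc bool) {
	less := func(a, b resource) bool {
		if field == "cpu" {
			return a.CPU < b.CPU
		}
		return a.Name < b.Name
	}
	sort.SliceStable(rs, func(i, j int) bool {
		if desc {
			return less(rs[j], rs[i])
		}
		return less(rs[i], rs[j])
	})
}

func main() {
	rs := []resource{{"web", 12}, {"db", 80}, {"cache", 35}}
	sortResources(rs, "cpu", true)
	fmt.Println(rs) // [{db 80} {cache 35} {web 12}]
}
```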
rcourtman
23a27b5b93 fix: correct AI tool description for guest resource ID format
The set_resource_url tool had an incorrect example ID format ('pve1-delly-101')
which caused the AI to save URLs with wrong IDs that didn't match the actual
guest IDs used by Pulse ('instance-VMID' format like 'delly-150').

This fix updates the tool description to clearly document the correct format,
so URLs saved by the AI will now properly appear in the dashboard.
2025-12-12 21:28:34 +00:00
rcourtman
6aefeca979 feat: Enhance OCI container display and AI context
- Frontend: Add ociImage memo to extract clean image name from osTemplate
- Frontend: Show OCI image name in type badge tooltip
- Frontend: Display OCI image in OS column when no guest agent info available
- Frontend: Include ociImage in AI context data for selected OCI containers
- Backend: Differentiate OCI containers as 'oci_container' type in AI context
- Backend: Add Metadata field to ResourceContext for extensibility
- Backend: Include oci_image in container metadata for AI analysis
- Backend: Update section heading to 'LXC/OCI Containers' in AI context

This follows Docker container patterns to avoid duplicating work.
2025-12-12 18:00:09 +00:00
rcourtman
8b077f69ce feat: AI security and policy improvements for 5.0
- Add DOMPurify sanitization for AI chat markdown rendering (XSS fix)
- Configure DOMPurify to add target=_blank and rel=noopener to links
- Update system prompt to align with command approval policy
- Clarify safe vs destructive commands in prompt
- Improve patrol auto-fix mode guidance with safe operation list
- Add verification requirements for auto-fix actions
- Update observe-only mode to be clearer about read-only restrictions
2025-12-12 17:38:55 +00:00
rcourtman
6f0379f879 feat(api): Add AI intelligence API endpoints
Expose learned AI intelligence data via REST API:

New endpoints:
- GET /api/ai/intelligence/patterns - Detected failure patterns
- GET /api/ai/intelligence/predictions - Failure predictions
- GET /api/ai/intelligence/correlations - Resource correlations
- GET /api/ai/intelligence/changes - Recent infrastructure changes
- GET /api/ai/intelligence/baselines - Learned baselines

All endpoints support ?resource_id filter for per-resource queries.
Changes endpoint supports ?hours filter (default: 24).

Backend additions:
- ai_intelligence_handlers.go - Handler implementations
- baseline.Store.GetAllBaselines() - Flat baseline export
- patrol.GetChangeDetector() - Access change detector

This enables frontend to display:
- 'OOM expected in 3 days based on pattern'
- 'When storage-1 is full, database VM restarts'
- 'VM memory baseline: 60-75%'

All tests passing.
2025-12-12 14:49:46 +00:00
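The filters described in commit 6f0379f879 (`?resource_id` and `?hours`, default 24) could be parsed roughly as below. `parseChangesQuery` is a hypothetical helper, and treating invalid or non-positive `hours` as the default is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// defaultHours matches the default stated in the commit message.
const defaultHours = 24

// parseChangesQuery extracts the resource_id and hours filters from a raw
// query string, falling back to the default window on bad input.
func parseChangesQuery(rawQuery string) (resourceID string, hours int) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", defaultHours
	}
	hours, err = strconv.Atoi(q.Get("hours"))
	if err != nil || hours <= 0 {
		hours = defaultHours
	}
	return q.Get("resource_id"), hours
}

func main() {
	fmt.Println(parseChangesQuery("resource_id=delly:delly:101&hours=6"))
	fmt.Println(parseChangesQuery(""))
	fmt.Println(parseChangesQuery("hours=abc"))
}
```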
rcourtman
9539ddaa6b feat(ai): Add multi-resource correlation detection (Phase 6)
Create internal/ai/correlation package:

1. Correlation Detector (detector.go):
   - Tracks events across resources
   - Detects when events on one resource follow events on another
   - Calculates average delay between correlated events
   - Confidence scoring based on occurrence count
   - Persists to ai_correlations.json

2. Features:
   - GetCorrelations() - All detected relationships
   - GetCorrelationsForResource() - Relationships for one resource
   - GetDependencies() - What resources depend on this one
   - GetDependsOn() - What this resource depends on
   - PredictCascade() - Predict what will be affected
   - FormatForContext() - AI-consumable summary

3. Integration:
   - Wire to alert history in router startup
   - Map alert types to correlation event types
   - Add correlation context to enriched AI context

Example AI context now includes:
'When local-zfs experiences high usage, database often follows within 5 minutes'

This enables the AI to understand infrastructure dependencies
and predict cascade failures.

All tests passing.
2025-12-12 14:26:10 +00:00
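The core idea of commit 9539ddaa6b (count how often an event on one resource follows an event on another, and average the delay) can be sketched as follows. The types, the fixed time window, and the one-follow-up-per-source rule are assumptions; confidence scoring and persistence are omitted.

```go
package main

import (
	"fmt"
	"time"
)

// event is a hypothetical record of something happening on a resource.
type event struct {
	Resource string
	At       time.Time
}

// correlation summarises how often target followed source.
type correlation struct {
	Count    int
	AvgDelay time.Duration
}

// correlate counts source events that are followed by a target event within
// window and averages the delay. Events are assumed sorted by time.
func correlate(events []event, source, target string, window time.Duration) correlation {
	var total time.Duration
	count := 0
	for _, s := range events {
		if s.Resource != source {
			continue
		}
		for _, t := range events {
			if t.Resource != target {
				continue
			}
			if d := t.At.Sub(s.At); d > 0 && d <= window {
				total += d
				count++
				break // at most one follow-up per source event
			}
		}
	}
	if count == 0 {
		return correlation{}
	}
	return correlation{Count: count, AvgDelay: total / time.Duration(count)}
}

// at returns a timestamp the given number of minutes after a fixed
// reference time (example data only).
func at(minutes int) time.Time {
	return time.Date(2025, 12, 12, 0, 0, 0, 0, time.UTC).Add(time.Duration(minutes) * time.Minute)
}

func main() {
	events := []event{
		{"local-zfs", at(0)}, {"database", at(4)},
		{"local-zfs", at(60)}, {"database", at(66)},
	}
	c := correlate(events, "local-zfs", "database", 10*time.Minute)
	fmt.Printf("%d occurrences, average delay %s\n", c.Count, c.AvgDelay) // 2 occurrences, average delay 5m0s
}
```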
rcourtman
e76e86b298 feat(ai): Add failure pattern detection for predictive intelligence (Phase 5)
Create internal/ai/patterns package:

1. Pattern Detector (detector.go):
   - Records historical events (high memory, OOM, restarts, etc.)
   - Detects recurring failure patterns
   - Calculates average interval between occurrences
   - Computes confidence based on pattern consistency
   - Predicts when failures will occur again
   - Persists to ai_patterns.json

2. Event types tracked:
   - high_memory, high_cpu, disk_full
   - oom, restart, unresponsive
   - backup_failed

3. Integration:
   - Wire PatternDetector into router startup
   - Add to AI context in buildEnrichedContext
   - FormatForContext generates failure predictions

Example AI context now includes:
'OOM events typically occur every ~10 days (next expected in ~3 days)'

This enables proactive alerts before problems recur.

All tests passing.
2025-12-12 14:11:28 +00:00
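The interval and prediction steps from commit e76e86b298 reduce to simple arithmetic on timestamps. The sketch below uses hypothetical names and leaves out the confidence score, which the commit message mentions but does not define.

```go
package main

import (
	"fmt"
	"time"
)

// predictNext returns the mean gap between occurrences and the time the
// next one would be expected. Timestamps are assumed sorted ascending.
func predictNext(occurrences []time.Time) (avg time.Duration, next time.Time, ok bool) {
	n := len(occurrences)
	if n < 2 {
		return 0, time.Time{}, false
	}
	first, last := occurrences[0], occurrences[n-1]
	// The mean of n-1 consecutive gaps equals the total span divided by n-1.
	avg = last.Sub(first) / time.Duration(n-1)
	return avg, last.Add(avg), true
}

// days returns timestamps the given numbers of days after a fixed
// reference date (example data only).
func days(offsets ...int) []time.Time {
	ref := time.Date(2025, 11, 1, 0, 0, 0, 0, time.UTC)
	out := make([]time.Time, 0, len(offsets))
	for _, d := range offsets {
		out = append(out, ref.AddDate(0, 0, d))
	}
	return out
}

func main() {
	if avg, next, ok := predictNext(days(0, 10, 20)); ok {
		fmt.Printf("every ~%.0f days, next expected %s\n", avg.Hours()/24, next.Format("2006-01-02"))
	}
}
```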
rcourtman
6a8745c7b3 feat(ai): Log command executions and show remediation history in prompts
Phase 4 - Remediation logging integration:

1. logRemediation hook after tool execution:
   - Only logs run_command tools (main remediation action)
   - Records resourceID, resourceType, findingID
   - Extracts problem summary from user prompt
   - Truncates output for storage (max 1000 chars)
   - Distinguishes automatic (patrol) vs manual (chat) actions

2. buildRemediationContext for system prompts:
   - Shows 'Past Successful Fixes for Similar Issues' section
   - Uses keyword matching to find relevant past fixes
   - Shows 'Remediation History for This Resource' section
   - Includes timestamps and outcomes

This enables the AI to say things like:
- 'This worked before: apt clean to free 6GB (resolved)'
- 'Last time on this resource: restarted nginx (resolved)'

All tests passing.
2025-12-12 14:02:14 +00:00
rcourtman
c63d7828a0 feat(ai): Wire operational memory into router startup
Complete Phase 3 integration:

- Initialize ChangeDetector and RemediationLog in StartPatrol
- Add SetChangeDetector/SetRemediationLog to handler chain:
  Router -> AISettingsHandler -> Service -> PatrolService
- Persist change history to ai_changes.json
- Persist remediation log to ai_remediations.json
- Both use the Pulse config directory for storage

Operational memory is now fully integrated:
- Change detector tracks infrastructure changes on each patrol
- Recent changes (24h) are appended to AI context
- Remediation log ready for command execution logging

All tests passing.
2025-12-12 13:54:38 +00:00
rcourtman
58e7091666 feat(ai): Wire change detection into patrol service
Integrate operational memory into patrol context:

- Add changeDetector and remediationLog fields to PatrolService
- Add SetChangeDetector and SetRemediationLog methods
- Integrate change detection into buildEnrichedContext
- Convert state to ResourceSnapshots for change tracking
- Append recent changes summary to AI context

The AI now sees a 'Recent Infrastructure Changes (24h)' section
showing events like:
- VM 'web-server' status changed: running → stopped (2h ago)
- 'db-server' migrated from node1 to node2 (4h ago)
- 'web-server' memory increased: 4 GB → 8 GB (1d ago)

All tests passing.
2025-12-12 13:53:04 +00:00
rcourtman
7ed985a690 feat(ai): Add operational memory (Phase 3) - change detection and remediation logging
Phase 3 of Pulse AI differentiation:

Create internal/ai/memory package with:

1. Change Detection (changes.go):
   - Tracks infrastructure changes: creation, deletion, config changes
   - Detects status changes (started, stopped)
   - Detects VM/container migrations between nodes
   - Detects CPU/memory configuration changes
   - Detects backup completions
   - Persists change history to ai_changes.json
   - GetChangesSummary for AI context

2. Remediation Logging (remediation.go):
   - Records actions taken to fix problems
   - Tracks command, output, and outcome
   - Links to AI findings via findingID
   - GetSimilar finds past similar problems
   - GetSuccessfulRemediations for learning
   - Persists to ai_remediations.json

3. Type exports (memory_exports.go):
   - Clean re-exports from ai package

This enables the AI to say things like:
- 'This VM was migrated 2 hours ago'
- 'Memory was increased from 4GB to 8GB yesterday'
- 'Last time this happened, restarting nginx resolved it'

All tests passing.
2025-12-12 13:49:37 +00:00
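Change detection as described in commit 7ed985a690 is essentially a diff between two snapshots of the infrastructure. A minimal sketch, assuming a hypothetical `snapshot` type keyed by resource name and covering only creation, deletion, status changes, and migrations:

```go
package main

import (
	"fmt"
	"sort"
)

// snapshot is a hypothetical point-in-time view of one resource.
type snapshot struct {
	Status string
	Node   string
}

// detectChanges compares two snapshot maps and describes the differences.
// IDs are visited in sorted order so the output is deterministic.
func detectChanges(prev, curr map[string]snapshot) []string {
	seen := make(map[string]struct{})
	for id := range prev {
		seen[id] = struct{}{}
	}
	for id := range curr {
		seen[id] = struct{}{}
	}
	ids := make([]string, 0, len(seen))
	for id := range seen {
		ids = append(ids, id)
	}
	sort.Strings(ids)

	var changes []string
	for _, id := range ids {
		p, had := prev[id]
		c, has := curr[id]
		switch {
		case !had:
			changes = append(changes, id+" created")
		case !has:
			changes = append(changes, id+" deleted")
		default:
			if p.Status != c.Status {
				changes = append(changes, fmt.Sprintf("%s status changed: %s -> %s", id, p.Status, c.Status))
			}
			if p.Node != c.Node {
				changes = append(changes, fmt.Sprintf("%s migrated from %s to %s", id, p.Node, c.Node))
			}
		}
	}
	return changes
}

func main() {
	prev := map[string]snapshot{
		"web-server": {"running", "node1"},
		"db-server":  {"running", "node1"},
	}
	curr := map[string]snapshot{
		"web-server": {"stopped", "node1"},
		"db-server":  {"running", "node2"},
	}
	for _, c := range detectChanges(prev, curr) {
		fmt.Println(c)
	}
}
```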
rcourtman
21abb6ef01 Clarify AI cost estimates with pricing coverage
2025-12-12 13:19:03 +00:00
rcourtman
4aea5ed730 Unify provider/model normalization for AI cost export
2025-12-12 13:04:42 +00:00
rcourtman
c598069da3 Add AI cost export and top target rollups
2025-12-12 12:55:39 +00:00
rcourtman
54a3c3c47d Persist AI cost budget and allow history reset
2025-12-12 12:10:58 +00:00
rcourtman
b3f283e7f5 Improve AI cost dashboard ranges and breakdowns
2025-12-12 11:35:41 +00:00
rcourtman
8310974634 feat(ai): Wire baseline learning loop into router startup
Complete Phase 2 baseline integration:

- Add baseline_exports.go for clean type aliasing
- Wire baseline store initialization into StartPatrol
- Implement startBaselineLearning background loop
  - Runs initial learning after 5 min delay
  - Updates baselines every hour from metrics history
  - Learns from 7 days of data for nodes, VMs, containers
- Add SetBaselineStore methods throughout the chain
  (Router -> AIHandler -> Service -> PatrolService)
- Persists baselines to data directory as JSON

The baseline learning loop:
1. Starts automatically when AI patrol starts
2. Queries metrics history for all resources
3. Computes mean, stddev, percentiles for cpu/memory/disk
4. Saves baselines to disk for durability
5. Anomaly detection uses these baselines in context builder

All tests passing.
2025-12-12 11:29:47 +00:00
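Step 3 of the learning loop in commit 8310974634 (mean, stddev, percentiles) could be computed as in the sketch below. The `baseline` type, the use of the sample standard deviation, and the nearest-rank percentile method are assumptions; the commit message does not say which variants Pulse uses.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// baseline is a hypothetical summary of one metric's normal behaviour.
type baseline struct {
	Mean, StdDev, P50, P95 float64
	Samples                int
}

// learn computes a baseline from historical values.
// It reports false when there are too few samples for a standard deviation.
func learn(values []float64) (baseline, bool) {
	n := len(values)
	if n < 2 {
		return baseline{}, false
	}
	var sum float64
	for _, v := range values {
		sum += v
	}
	mean := sum / float64(n)

	var sq float64
	for _, v := range values {
		sq += (v - mean) * (v - mean)
	}
	std := math.Sqrt(sq / float64(n-1)) // sample standard deviation

	sorted := append([]float64(nil), values...) // copy so the input is not reordered
	sort.Float64s(sorted)
	return baseline{
		Mean:    mean,
		StdDev:  std,
		P50:     percentile(sorted, 50),
		P95:     percentile(sorted, 95),
		Samples: n,
	}, true
}

// percentile returns the nearest-rank percentile of a sorted, non-empty slice.
func percentile(sorted []float64, p float64) float64 {
	rank := int(math.Ceil(p / 100 * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	b, ok := learn([]float64{40, 42, 44, 46, 48})
	fmt.Printf("%+v %v\n", b, ok)
}
```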
rcourtman
5a77fab633 feat(ai): Add baseline learning and anomaly detection (Phase 2)
Phase 2 of Pulse AI differentiation:

- Create internal/ai/baseline package for learned baselines
- Implement statistical baseline learning with mean, stddev, percentiles
- Add z-score based anomaly detection with severity classification
  (low, medium, high, critical based on standard deviations)
- Integrate baseline provider into context builder
- Wire baseline store into patrol service with adapters
- Add anomaly enrichment to resource contexts

Key features:
- Learn computes baseline from historical metric data points
- IsAnomaly and CheckAnomaly detect deviations from normal
- Persists baselines to disk as JSON for durability
- Formatted anomaly descriptions for AI consumption
  Example: 'Memory is high above normal (85.2% vs typical 42.1% ± 8.3%)'

The baseline store needs to be initialized and triggered to learn
from metrics history. Next step is adding the learning loop.

All tests passing.
2025-12-12 11:26:31 +00:00
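The z-score check from commit 5a77fab633 can be illustrated with the numbers in its own example (85.2% against a typical 42.1% ± 8.3%). The four severity names come from the commit message; the cut-off values and the function name are assumptions.

```go
package main

import (
	"fmt"
	"math"
)

// classify returns the z-score of value against a baseline and a severity.
// The thresholds are illustrative; the log only says severity is based on
// the number of standard deviations from the mean.
func classify(value, mean, stddev float64) (z float64, severity string) {
	if stddev == 0 {
		return 0, "none" // no spread learned yet, so nothing can be judged
	}
	z = (value - mean) / stddev
	switch a := math.Abs(z); {
	case a >= 4:
		severity = "critical"
	case a >= 3:
		severity = "high"
	case a >= 2.5:
		severity = "medium"
	case a >= 2:
		severity = "low"
	default:
		severity = "none"
	}
	return z, severity
}

func main() {
	z, sev := classify(85.2, 42.1, 8.3)
	fmt.Printf("z=%.2f severity=%s\n", z, sev) // z=5.19 severity=critical
}
```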
rcourtman
716a0b8c4d Fix DeepSeek cost attribution and pricing
2025-12-12 10:49:56 +00:00
rcourtman
50c171e3b5 Add estimated USD to AI cost dashboard
2025-12-12 10:43:07 +00:00
rcourtman
88d419dd5b feat(ai): Add enriched context with historical trends and predictions
Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
2025-12-12 09:45:57 +00:00
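The trend and capacity prediction from commit 88d419dd5b rest on an ordinary least-squares slope. A minimal sketch, assuming usage is sampled in percent with time measured in days; the function names are hypothetical and trend classification (growing/declining/stable/volatile) is not covered.

```go
package main

import "fmt"

// slope computes the least-squares slope of ys over xs.
// It reports false when the inputs are too short, mismatched, or degenerate.
func slope(xs, ys []float64) (float64, bool) {
	if len(xs) < 2 || len(xs) != len(ys) {
		return 0, false
	}
	n := float64(len(xs))
	var sx, sy, sxy, sxx float64
	for i := range xs {
		sx += xs[i]
		sy += ys[i]
		sxy += xs[i] * ys[i]
		sxx += xs[i] * xs[i]
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0, false // all x values identical
	}
	return (n*sxy - sx*sy) / den, true
}

// daysUntil estimates how many days remain until usage reaches limit,
// given a growth rate in percent per day.
func daysUntil(current, limit, perDay float64) (float64, bool) {
	if perDay <= 0 || current >= limit {
		return 0, false
	}
	return (limit - current) / perDay, true
}

func main() {
	days := []float64{0, 1, 2, 3}
	usage := []float64{70, 72, 74, 76} // percent used
	rate, _ := slope(days, usage)
	d90, _ := daysUntil(76, 90, rate)
	d100, _ := daysUntil(76, 100, rate)
	fmt.Println(rate, d90, d100) // 2 7 12
}
```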
rcourtman
e842f523b7 feat: Implement multi-provider AI support
Backend:
- Add per-provider API key fields to AIConfig (AnthropicAPIKey, OpenAIAPIKey, DeepSeekAPIKey, OllamaBaseURL, OpenAIBaseURL)
- Add NewForProvider() and NewForModel() factory functions for multi-provider instantiation
- Update ListModels() to aggregate models from all configured providers with provider:model format
- Update Execute/ExecuteStream to dynamically create provider based on selected model
- Update TestConnection to use multi-provider aware provider creation
- Add helper functions: HasProvider(), GetConfiguredProviders(), GetAPIKeyForProvider(), GetBaseURLForProvider(), ParseModelString(), FormatModelString()

Frontend:
- Remove legacy single-provider UI (provider grid, single API key input, single base URL)
- Add accordion-style UI for configuring all providers independently
- Add model grouping by provider in selectors using optgroup
- Update AIChat model dropdown with grouped provider sections
- Add helper functions for parsing provider from model ID and grouping models

API:
- Add multi-provider fields to AISettingsResponse and AISettingsUpdateRequest
- Add /api/ai/models endpoint for dynamic model listing
- Update settings handlers for per-provider credential management
2025-12-11 16:00:45 +00:00
rcourtman
40236317fb feat(ai): Add suppression rules management API and UI
Users can now:
1. View all suppression rules (both from dismissed findings and manually created)
2. Create manual rules like 'ignore performance issues on debian-go'
3. Delete rules when they want alerts to come back

Backend:
- Added SuppressionRule type for user-defined rules
- Added suppressionRules storage to FindingsStore
- Added AddSuppressionRule/GetSuppressionRules/DeleteSuppressionRule methods
- Added isSuppressedInternal check for manual rules
- Added API handlers and routes for /api/ai/patrol/suppressions

Frontend:
- Added SuppressionRule interface
- Added getSuppressionRules/addSuppressionRule/deleteSuppressionRule API functions
- Added getDismissedFindings for viewing dismissed findings

Example usage:
POST /api/ai/patrol/suppressions
{
  'resource_id': 'debian-go',
  'category': 'performance',
  'description': 'Dev container runs hot - expected'
}
2025-12-11 00:12:18 +00:00
rcourtman
33af1627f4 fix(ai): Make LLM finding IDs stable across patrol runs
The main issue was that finding IDs included the title, which the LLM
generates differently each time. 'High CPU on minipc' vs 'Node minipc
experiencing high CPU load' got different IDs, making dismissals useless.

Changes:
1. LLM findings now get IDs based on resource+category only, not title
2. Add() now checks if finding is suppressed before adding as new
3. Add() now checks dismissed findings and only reactivates on severity escalation
4. IsSuppressed() now matches by resource+category only, not title
5. Added isSuppressedInternal() for use when lock is already held

Now when you dismiss 'performance issues on minipc', any future patrol finding
about performance on minipc will be recognized as the same issue and stay dismissed.
2025-12-11 00:03:17 +00:00
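Change 1 of commit 33af1627f4 (IDs based on resource and category only) could be implemented with a short hash. The hashing scheme, the lowercasing, and the name `findingID` are assumptions; only the choice of inputs comes from the commit message.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// findingID derives a stable ID from the resource and category only.
// The LLM-generated title is deliberately not part of the key, so a
// reworded title maps to the same finding.
func findingID(resourceID, category string) string {
	key := strings.ToLower(resourceID) + "|" + strings.ToLower(category)
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:8]) // 16 hex characters
}

func main() {
	// Both of these titles from the commit message would share one ID:
	// "High CPU on minipc" and "Node minipc experiencing high CPU load".
	fmt.Println(findingID("minipc", "performance"))
	fmt.Println(findingID("minipc", "storage"))
}
```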
rcourtman
9a32c4fdae fix(ai): Use context.Background() for forced patrol runs
The ForcePatrol() function was using the HTTP request context, which gets
cancelled immediately when the API response is sent. This caused LLM analysis
to fail with 'context canceled' before it could complete.

Now uses context.Background() so the goroutine runs independently of the
HTTP request lifecycle.

Also fixed dropdown hover gap issue in the dismiss menu.
2025-12-10 23:31:21 +00:00
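The failure mode behind commit 9a32c4fdae can be reproduced in isolation. In this sketch `runPatrol` is a hypothetical stand-in for the LLM analysis, and cancelling the context by hand simulates the HTTP handler returning.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPatrol simulates long-running work that aborts when ctx is cancelled.
func runPatrol(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// demo runs the same work under a cancelled request-style context and
// under context.Background().
func demo() (requestErr, backgroundErr error) {
	reqCtx, cancel := context.WithCancel(context.Background())
	cancel() // the handler has returned, so the request context is cancelled

	requestErr = runPatrol(reqCtx, 20*time.Millisecond)
	backgroundErr = runPatrol(context.Background(), 20*time.Millisecond)
	return requestErr, backgroundErr
}

func main() {
	requestErr, backgroundErr := demo()
	fmt.Println(requestErr)    // context canceled
	fmt.Println(backgroundErr) // <nil>
}
```

A background context has no deadline at all, so a real implementation may still want its own timeout via `context.WithTimeout`.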