Commit graph

30 commits

Author SHA1 Message Date
rcourtman
c575c7e295 fix(patrol): rename wearout JSON field to ssd_life_remaining_pct (#1300)
The AI also receives disk data via tool calls (pulse_metrics type="disks"),
not just the patrol context table. The raw JSON field "wearout" was
ambiguous — rename to "ssd_life_remaining_pct" so the field name itself
communicates that 100 = healthy.
2026-02-27 23:12:27 +00:00
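As a rough illustration of the rename in c575c7e295, the sketch below shows a Go struct whose JSON tag carries the new key. Only the key `ssd_life_remaining_pct` comes from the commit message; the `DiskInfo` type, its `Device` field, and the `encodeDisk` helper are hypothetical names used for illustration and may not match the real code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DiskInfo is a hypothetical stand-in for the disk record returned to the AI.
// Only the JSON key "ssd_life_remaining_pct" is taken from the commit message.
type DiskInfo struct {
	Device string `json:"device"`
	// SSDLifeRemainingPct is the percentage of rated life left: 100 = healthy.
	SSDLifeRemainingPct int `json:"ssd_life_remaining_pct"`
}

// encodeDisk marshals a disk record into the JSON a tool call might return.
func encodeDisk(d DiskInfo) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", fmt.Errorf("encode disk: %w", err)
	}
	return string(b), nil
}

func main() {
	out, err := encodeDisk(DiskInfo{Device: "nvme0n1", SSDLifeRemainingPct: 100})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"device":"nvme0n1","ssd_life_remaining_pct":100}
}
```

With a self-describing key, a model reading the raw JSON no longer has to guess whether 100 means "fully worn" or "fully healthy".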
rcourtman
3006f51b60 fix(patrol): clarify wearout semantics so AI knows 100% = healthy (#1300)
The patrol context table header said "Wearout" and the tool returned a raw
"wearout" JSON field with no indication that 100 = full life remaining.
The AI interpreted "wearout: 100" as fully worn out and raised false
"100% Disk Wearout" findings on healthy NVMe drives.

Rename the patrol table column to "SSD Life Remaining (100%=new)" and
update the data type comment to clarify the semantics.
2026-02-27 23:05:02 +00:00
rcourtman
7efcec3120 fix(agents,ai): host URL field, AI Docker routing, Proxmox registration logging (#1197, #1210, #1267)
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.

#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.

#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.

(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
2026-02-18 12:57:09 +00:00
rcourtman
69e3286e5e security: fix AI OAuth scope bypass, approval replay attacks, and approval endpoint scope gating
- OAuth endpoints now require settings:write scope (not just admin)
- Approval endpoints now require ai:execute scope
- Added CommandHash to approvals for replay protection
- Approvals are now single-use (consumed on first use)
- consumeApprovalWithValidation validates command matches approval
2026-02-03 19:15:15 +00:00
rcourtman
36eb381c26 test(ai): add validation tests for file tools
2026-02-02 19:24:11 +00:00
rcourtman
712e5846ec test(ai): add unit tests for discovery adapter
- Add comprehensive tests for DiscoveryMCPAdapter in internal/ai/tools/discovery_adapter_test.go
- Validate strict delegation to DiscoverySource and data transformation
2026-02-02 15:04:45 +00:00
rcourtman
b6bd9fd2d4 feat(ai): add RegisterTool method for runtime tool registration
2026-02-02 11:14:55 +00:00
rcourtman
81ec5c525a feat(ai): parallelize tool execution and refine knowledge extraction
- Implement parallel execution for read-only tools in agentic loop
- Optimize negative marker summaries to be more informative
- Fix memory percentage scaling in query tools
- Add derived memory stats (avg/max) to extraction logic
- Add explicit fresh data intent detection to bypass knowledge gate
- Update associated tests
2026-02-01 00:12:36 +00:00
rcourtman
9b0fb527f5 feat(patrol): implement patrol findings, evaluation, and investigation logic
- Add core Patrol system for automated investigations
- Implement findings management and deduplication logic
- Add evaluation framework (patrol_eval) with quality assertions and scenarios
- Add patrol-specific tools and executor integration
- Add E2E test matrix script
2026-01-31 16:23:08 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors
2026-01-30 19:02:14 +00:00
rcourtman
e85ec858fd fix(ai): discovery transient error handling, agentic loop detection, and read-only classification
- Discovery: classify transient errors (429, timeout, connection refused, etc.)
  and return IsError:true so models stop retrying rate-limited calls
- Agentic loop: detect identical tool calls repeated >3 times and block with
  LOOP_DETECTED error, forcing the model to try a different approach
- OpenAI provider: skip tool_choice for DeepSeek Reasoner which doesn't support it
- Read-only classifier: fix curl -I case sensitivity (uppercase flags lowered),
  add iostat/vmstat/mpstat/sar/lxc-ls/lxc-info/nc -z to allowlist,
  fix 2>&1 false positive in input redirect detection
2026-01-29 18:29:54 +00:00
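The loop detection in e85ec858fd (identical tool calls repeated more than 3 times are blocked with a LOOP_DETECTED error) might be sketched like this. The `LoopDetector` type, its `Check` method, and the way calls are keyed by name plus raw argument JSON are assumptions; only the threshold and the error label come from the commit message.

```go
package main

import "fmt"

// maxIdenticalCalls is the number of identical calls allowed before blocking.
const maxIdenticalCalls = 3

// LoopDetector counts identical tool calls within one agentic loop.
type LoopDetector struct {
	counts map[string]int
}

func NewLoopDetector() *LoopDetector {
	return &LoopDetector{counts: make(map[string]int)}
}

// Check records a call and returns an error once the same tool name and
// argument payload have been seen more than maxIdenticalCalls times.
func (d *LoopDetector) Check(name, argsJSON string) error {
	key := name + "\x00" + argsJSON // NUL separator avoids ambiguous keys
	d.counts[key]++
	if n := d.counts[key]; n > maxIdenticalCalls {
		return fmt.Errorf("LOOP_DETECTED: %s called %d times with identical arguments; try a different approach", name, n)
	}
	return nil
}

func main() {
	d := NewLoopDetector()
	for i := 1; i <= 4; i++ {
		fmt.Println(i, d.Check("pulse_query", `{"type":"vm"}`))
	}
}
```

Returning the block as a tool error, rather than silently dropping the call, gives the model a reason to change strategy.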
rcourtman
f83356b430 feat(ai): add patrol-specific tools for agentic finding creation
Add three new patrol tools that enable the LLM to create findings via
tool calls instead of relying on output parsing:

- patrol_report_finding: Create a structured finding with validation
- patrol_resolve_finding: Mark a finding as resolved
- patrol_get_findings: Query active findings for a resource

These tools are only functional during a patrol run when PatrolFindingCreator
is set on the executor. This approach is more reliable than parsing
JSON from LLM output.
2026-01-28 23:18:42 +00:00
rcourtman
9c2f8a3284 refactor(ai): remove obsolete tool and chat files
Remove files that were consolidated into other modules:
- chat/patrol.go, patrol_test.go → moved to chat/service.go
- tools_infrastructure.go → merged into tools_storage.go
- tools_intelligence.go → merged into tools_metrics.go
- tools_patrol.go → merged into tools_alerts.go
- tools_profiles.go, tools_profiles_test.go → removed (unused)

Update related test file references.
2026-01-28 21:30:24 +00:00
rcourtman
a75393d1c5 refactor(ai): consolidate tool implementations into domain-specific files
- Merge tools_infrastructure.go, tools_intelligence.go, tools_patrol.go,
  tools_profiles.go into their respective domain tools
- Expand tools_control.go with command execution logic
- Expand tools_discovery.go with resource discovery handlers
- Expand tools_storage.go with storage-related operations
- Expand tools_metrics.go with metrics functionality
- Update tests to match new structure

This consolidation reduces file count and groups related functionality together.
2026-01-28 21:21:28 +00:00
rcourtman
23ff4d1337 chore: remove remaining gitignored files from tracking
- analyze_coverage.py (local coverage analysis script)
- coverage_summary.txt (coverage output)
- mock.env (environment file)
2026-01-28 21:19:52 +00:00
rcourtman
0013d64c7b Consolidate and extend AI tool suite
Major tools refactoring for better organization and capabilities:

New consolidated tools:
- pulse_query: Unified resource search, get, config, topology operations
- pulse_read: Safe read-only command execution with NonInteractiveOnly
- pulse_control: Guest lifecycle control (start/stop/restart)
- pulse_docker: Docker container operations
- pulse_file: Safe file read/write operations
- pulse_kubernetes: K8s resource management
- pulse_metrics: Performance metrics retrieval
- pulse_alerts: Alert management
- pulse_storage: Storage pool operations
- pulse_knowledge: Note-taking and recall
- pulse_pmg: Proxmox Mail Gateway integration

Executor improvements:
- Cleaner tool registration pattern
- Better error handling and recovery
- Protocol layer for result formatting
- Enhanced adapter interfaces

Includes comprehensive tests for:
- File and Docker operations
- Kubernetes control operations
- Command execution safety
2026-01-28 16:50:25 +00:00
rcourtman
b2e0ae3fdb Add ExecutionIntent classification and NonInteractiveOnly enforcement
Implement safety layers for command execution:

ExecutionIntent classifies commands as:
- ObservationOnly: Pure read (status, logs, metrics)
- SideEffects: May change state (restart, write, delete)

NonInteractiveOnly enforces safe command forms:
- Blocks interactive commands (vim, top without -b, etc)
- Blocks unbounded streaming (tail -f without limit)
- Suggests safe alternatives in error messages

Add phantom execution detection:
- Catches when model claims actions without using tools
- Skips check when tools actually succeeded (fixes false positives)

Includes comprehensive tests for:
- Intent classification accuracy
- Interactive command blocking
- Strict resolution validation
2026-01-28 16:49:00 +00:00
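The two safety layers described in b2e0ae3fdb can be illustrated with a deliberately small sketch. The `ExecutionIntent`, `ObservationOnly`, and `SideEffects` names come from the commit message; the function names, the command lists, and the suggested alternatives are assumptions. A real classifier would also have to handle pipes, redirects, and quoting, which this sketch ignores, and it blocks every `tail -f` rather than only unbounded ones.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecutionIntent says whether a command only observes or may change state.
type ExecutionIntent int

const (
	ObservationOnly ExecutionIntent = iota
	SideEffects
)

// classifyIntent treats a short allowlist as read-only and everything else
// as potentially state-changing (the fail-safe default).
func classifyIntent(cmd string) ExecutionIntent {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return SideEffects
	}
	switch fields[0] {
	case "cat", "ls", "df", "uptime", "journalctl":
		return ObservationOnly
	case "systemctl":
		if len(fields) > 1 && fields[1] == "status" {
			return ObservationOnly
		}
	}
	return SideEffects
}

// checkNonInteractive rejects commands that would wait for a terminal or
// stream forever, and suggests a bounded alternative in the error.
func checkNonInteractive(cmd string) error {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return fmt.Errorf("empty command")
	}
	has := func(flag string) bool {
		for _, f := range fields[1:] {
			if f == flag {
				return true
			}
		}
		return false
	}
	switch fields[0] {
	case "vim", "vi", "nano":
		return fmt.Errorf("%s is interactive; read the file with cat or head instead", fields[0])
	case "top":
		if !has("-b") {
			return fmt.Errorf("top is interactive; use 'top -b -n 1' instead")
		}
	case "tail":
		if has("-f") {
			return fmt.Errorf("tail -f streams without bound; use 'tail -n 100' instead")
		}
	}
	return nil
}

func main() {
	for _, cmd := range []string{"systemctl status nginx", "systemctl restart nginx", "top", "top -b -n 1"} {
		fmt.Printf("%-26s intent=%d err=%v\n", cmd, classifyIntent(cmd), checkNonInteractive(cmd))
	}
}
```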
rcourtman
7f7edfceb4 test: expand backend coverage
2026-01-25 21:08:44 +00:00
rcourtman
27f1a11acb feat: add AI Intelligence system with investigation and forecasting
Major new AI capabilities for infrastructure monitoring:

Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows

Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection

Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording

New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions

Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
2026-01-24 22:41:43 +00:00
rcourtman
c93b54ce9f refactor: clean up AI tools and remove deprecated code
- Remove deprecated tool functions
- Simplify control helpers
- Clean up test files
2026-01-22 22:31:04 +00:00
rcourtman
422efdde61 Restore UI improvements and refine Docker/Hosts display
- Restore 'mini' mode for StackedDiskBar.
- Restore layout fixes (fixed table layout, mobile columns) for Docker and Hosts tables.
- Remove 'Ask AI' and AI context selection features.
- Docker: Use compact 'Cube' icon for Podman pods to prevent name obstruction.
- Docker: Show concise image names (strip registry URL).
- Backend: Include pending fixes for AI providers.
2026-01-22 18:03:35 +00:00
rcourtman
defe298ddd Refactor: AI provider and executor multi-tenancy support
- Updated AI providers and tests for context/tenant awareness
- Refactored tool executor for multi-tenant state handling
- Added new tests for Docker control and update tools
2026-01-22 16:51:45 +00:00
rcourtman
798f6a8deb Refactor: Update AI tools and tests for multi-tenancy
- Refactored tool execution to handle tenant-scoped contexts
- Added new tests for infrastructure, control, and kubernetes tools
- Improved test coverage for agentic chat and approval store
2026-01-22 16:43:08 +00:00
rcourtman
267d5f97e5 Support: Fix OpenAI tool schema error by ensuring properties field is always present
- Removed omitempty from InputSchema.Properties
- Ensures OpenAI accepts tools with no input parameters
2026-01-22 16:41:57 +00:00
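The effect of the change in 267d5f97e5 can be reproduced with Go's standard `encoding/json` package. In the sketch below, `InputSchema.Properties` is the field named in the commit message; the `Type` field, the map value type, and the `schemaWithOmit` comparison struct are assumptions added for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// schemaWithOmit mirrors the old behaviour: an empty map is dropped entirely.
type schemaWithOmit struct {
	Type       string         `json:"type"`
	Properties map[string]any `json:"properties,omitempty"`
}

// InputSchema mirrors the fix: "properties" is always emitted.
type InputSchema struct {
	Type       string         `json:"type"`
	Properties map[string]any `json:"properties"`
}

// mustJSON marshals v and panics on failure, which keeps the example short.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(fmt.Sprintf("marshal: %v", err))
	}
	return string(b)
}

func main() {
	empty := map[string]any{}
	fmt.Println(mustJSON(schemaWithOmit{Type: "object", Properties: empty}))
	// {"type":"object"}
	fmt.Println(mustJSON(InputSchema{Type: "object", Properties: empty}))
	// {"type":"object","properties":{}}
}
```

Removing `omitempty` is only half of the fix in this sketch: a nil map still marshals as `"properties":null`, so a tool with no parameters should initialise the field to an empty map.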
rcourtman
6e2cae2363 feat(ui): add history chart components for guest drawer
- HistoryChart: single metric visualization (CPU, memory, disk)
- UnifiedHistoryChart: combined multi-metric view
- Support for time range selection (1h to 90d)
- Responsive charts with proper dark mode support
- Fix corrupted tools_query_test.go from stash merge
2026-01-22 00:46:52 +00:00
rcourtman
f293f41499 refactor: consolidate AI tools tests
- Remove executor_test.go (tests moved to specific tool test files)
- Refactor infrastructure, patrol, profiles, and query tests
- Add query tool enhancements for better resource filtering
2026-01-22 00:43:41 +00:00
rcourtman
36622d2c17 Hide unavailable AI tools
2026-01-20 17:19:47 +00:00
rcourtman
b57b4a7c3c Tighten AI chat routing and context display
2026-01-20 16:30:55 +00:00
rcourtman
96b7370f7b test: improve coverage for API, AI, Alerts, and Frontend Utils
- Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3)
- Improve test coverage for AI tools, chat service, and session management
- Enhance alert and notification tests (ResolvedAlert, Webhook)
- Add frontend unit tests for utils (searchHistory, tagColors, temperature, url)
- Add proximity client API tests
2026-01-20 15:52:39 +00:00
rcourtman
17ca31a557 refactor(ai): Replace mcp package with tools package for direct tool execution
This refactoring removes the MCP (Model Context Protocol) server layer and
converts AI tools to be called directly by the chat service.

Key changes:
- Rename package from internal/ai/mcp to internal/ai/tools
- Remove server.go - tools no longer exposed via MCP server
- Tools are now called directly by the chat service via ExecuteTool()

New tools added:
- Kubernetes: clusters, nodes, pods, deployments (4 tools)
- PMG: mail gateway status, mail stats, queues, spam stats (4 tools)
- Infrastructure: snapshots, PBS jobs, backup tasks, network stats,
  disk I/O, cluster status, swarm, services, tasks, recent tasks,
  physical disks, RAID status, host Ceph, resource disks (14 tools)
- Patrol: connection health, resolved alerts (2 tools)

Test coverage:
- Added comprehensive test files for adapters, infrastructure,
  patrol, profiles, and query tools

Total tools: 50 (was ~25)
2026-01-19 19:17:24 +00:00