The AI also receives disk data via tool calls (pulse_metrics type="disks"),
not just the patrol context table. The raw JSON field "wearout" was
ambiguous; rename it to "ssd_life_remaining_pct" so the field name itself
communicates that 100 = healthy.
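
A minimal sketch of the renamed field, for illustration only. The DiskMetric
type, its Device field, and EncodeDisk are hypothetical; only the JSON key
"ssd_life_remaining_pct" and the "100 = healthy" semantics come from this
change.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// DiskMetric is a hypothetical, trimmed-down disk record. Only the JSON key
// "ssd_life_remaining_pct" is taken from the commit message.
type DiskMetric struct {
	Device string `json:"device"`
	// Percentage of rated SSD life remaining: 100 = new/healthy, 0 = worn out.
	SSDLifeRemainingPct int `json:"ssd_life_remaining_pct"`
}

// EncodeDisk renders a disk record as the JSON a tool call would return.
func EncodeDisk(d DiskMetric) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := EncodeDisk(DiskMetric{Device: "nvme0n1", SSDLifeRemainingPct: 100})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"device":"nvme0n1","ssd_life_remaining_pct":100}
}
```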
The patrol context table header said "Wearout" and the tool returned a raw
"wearout" JSON field with no indication that 100 = full life remaining.
The AI interpreted "wearout: 100" as fully worn out and raised false
"100% Disk Wearout" findings on healthy NVMe drives.
Rename the patrol table column to "SSD Life Remaining (100%=new)" and
update the data type comment to clarify the semantics.
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.
#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.
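
The #1210 rule can be sketched roughly as follows. This is not the actual
connectedAgentHostnames implementation: the Guest type, the function name,
and the map-based inputs are assumptions made for the example.

```go
package main

import "fmt"

// Guest is a hypothetical VM/LXC record: its hostname and the Proxmox node
// it runs on.
type Guest struct {
	Hostname string
	Node     string
}

// ReachableDockerHosts returns the Docker host hostnames considered
// reachable: those with a directly connected agent, plus those whose
// matching VM/LXC runs on a node that has a connected Proxmox agent.
func ReachableDockerHosts(dockerHosts []string, direct map[string]bool, guests []Guest, connectedNodes map[string]bool) map[string]bool {
	// Index guests by hostname so each Docker host is a single lookup.
	nodeByGuest := make(map[string]string, len(guests))
	for _, g := range guests {
		nodeByGuest[g.Hostname] = g.Node
	}

	reachable := make(map[string]bool)
	for _, h := range dockerHosts {
		if direct[h] {
			reachable[h] = true
			continue
		}
		// Fall back to the Proxmox agent on the node hosting the guest.
		if node, ok := nodeByGuest[h]; ok && connectedNodes[node] {
			reachable[h] = true
		}
	}
	return reachable
}

func main() {
	r := ReachableDockerHosts(
		[]string{"docker-a", "docker-b"},
		map[string]bool{},
		[]Guest{{Hostname: "docker-a", Node: "pve1"}},
		map[string]bool{"pve1": true},
	)
	fmt.Println(r["docker-a"], r["docker-b"])
}
```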
#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.
(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
- OAuth endpoints now require settings:write scope (not just admin)
- Approval endpoints now require ai:execute scope
- Added CommandHash to approvals for replay protection
- Approvals are now single-use (consumed on first use)
- consumeApprovalWithValidation validates command matches approval
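
For illustration, single-use approvals bound to a command hash might look
like the sketch below. The store, type, and error names are hypothetical,
and SHA-256 is an assumed choice of hash; only CommandHash, the single-use
rule, and the command-match check come from the list above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

var (
	ErrApprovalNotFound = errors.New("approval not found")
	ErrApprovalConsumed = errors.New("approval already consumed")
	ErrCommandMismatch  = errors.New("command does not match approval")
)

// Approval binds an approval ID to the hash of the exact approved command.
type Approval struct {
	CommandHash string
	Consumed    bool
}

// ApprovalStore is a hypothetical in-memory store.
type ApprovalStore struct {
	mu        sync.Mutex
	approvals map[string]*Approval
}

func NewApprovalStore() *ApprovalStore {
	return &ApprovalStore{approvals: make(map[string]*Approval)}
}

// HashCommand returns the hex-encoded SHA-256 of the command (assumed hash).
func HashCommand(cmd string) string {
	sum := sha256.Sum256([]byte(cmd))
	return hex.EncodeToString(sum[:])
}

// Approve records an approval for exactly this command.
func (s *ApprovalStore) Approve(id, cmd string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.approvals[id] = &Approval{CommandHash: HashCommand(cmd)}
}

// ConsumeWithValidation succeeds once per approval, and only when the
// command being executed hashes to the approved value.
func (s *ApprovalStore) ConsumeWithValidation(id, cmd string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	a, ok := s.approvals[id]
	if !ok {
		return ErrApprovalNotFound
	}
	if a.Consumed {
		return ErrApprovalConsumed
	}
	if a.CommandHash != HashCommand(cmd) {
		return ErrCommandMismatch
	}
	a.Consumed = true
	return nil
}

func main() {
	s := NewApprovalStore()
	s.Approve("a1", "systemctl restart nginx")
	fmt.Println(s.ConsumeWithValidation("a1", "rm -rf /"))                // mismatch
	fmt.Println(s.ConsumeWithValidation("a1", "systemctl restart nginx")) // <nil>
	fmt.Println(s.ConsumeWithValidation("a1", "systemctl restart nginx")) // consumed
}
```

In this sketch a mismatched command does not consume the approval; whether
the real code behaves the same way is not stated here.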
- Add comprehensive tests for DiscoveryMCPAdapter in internal/ai/tools/discovery_adapter_test.go
- Validate strict delegation to DiscoverySource and data transformation
- Discovery: classify transient errors (429, timeout, connection refused, etc.)
and return IsError:true so models stop retrying rate-limited calls
- Agentic loop: detect identical tool calls repeated >3 times and block with
LOOP_DETECTED error, forcing the model to try a different approach
- OpenAI provider: skip tool_choice for DeepSeek Reasoner, which doesn't support it
- Read-only classifier: fix curl -I case sensitivity (uppercase flags were being lowercased),
add iostat/vmstat/mpstat/sar/lxc-ls/lxc-info/nc -z to allowlist,
fix 2>&1 false positive in input redirect detection
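
The loop detection described above could be sketched as follows. The
LoopDetector type and its key format are hypothetical, and this version
counts identical calls across a whole run rather than only consecutive
ones, which is an assumption; only the ">3 times" threshold and the
LOOP_DETECTED error name come from the list.

```go
package main

import "fmt"

// maxIdenticalCalls is the number of identical calls allowed before blocking.
const maxIdenticalCalls = 3

// LoopDetector counts identical tool calls (same name and same arguments).
type LoopDetector struct {
	counts map[string]int
}

func NewLoopDetector() *LoopDetector {
	return &LoopDetector{counts: make(map[string]int)}
}

// Check records a call and returns a LOOP_DETECTED error once the same
// tool+arguments pair has been seen more than maxIdenticalCalls times.
func (d *LoopDetector) Check(tool, argsJSON string) error {
	// NUL separator keeps ("a", "bc") distinct from ("ab", "c").
	key := tool + "\x00" + argsJSON
	d.counts[key]++
	if n := d.counts[key]; n > maxIdenticalCalls {
		return fmt.Errorf("LOOP_DETECTED: %s called %d times with identical arguments; try a different approach", tool, n)
	}
	return nil
}

func main() {
	d := NewLoopDetector()
	for i := 1; i <= 4; i++ {
		err := d.Check("pulse_discovery", `{"host":"pve1"}`)
		fmt.Printf("call %d: blocked=%v\n", i, err != nil)
	}
}
```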
Add three new patrol tools that enable the LLM to create findings via
tool calls instead of relying on output parsing:
- patrol_report_finding: Create a structured finding with validation
- patrol_resolve_finding: Mark a finding as resolved
- patrol_get_findings: Query active findings for a resource
These tools are only functional during a patrol run when PatrolFindingCreator
is set on the executor. This approach is more reliable than parsing
JSON from LLM output.
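
A rough sketch of the gating described above, assuming PatrolFindingCreator
is an interface held by the executor. The method set, the Finding fields,
the severity values, and the in-memory creator are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Finding is a hypothetical structured finding.
type Finding struct {
	ResourceID string
	Severity   string
	Title      string
}

// PatrolFindingCreator is named in the commit; this method set is assumed.
type PatrolFindingCreator interface {
	CreateFinding(f Finding) (string, error)
}

// Executor carries a creator only while a patrol run is active.
type Executor struct {
	findingCreator PatrolFindingCreator // nil outside a patrol run
}

var ErrNotInPatrol = errors.New("patrol_report_finding is only available during a patrol run")

// ReportFinding validates the finding and hands it to the creator.
func (e *Executor) ReportFinding(f Finding) (string, error) {
	if e.findingCreator == nil {
		return "", ErrNotInPatrol
	}
	if f.ResourceID == "" || f.Title == "" {
		return "", errors.New("resource_id and title are required")
	}
	switch f.Severity {
	case "info", "warning", "critical": // assumed severity levels
	default:
		return "", fmt.Errorf("invalid severity %q", f.Severity)
	}
	return e.findingCreator.CreateFinding(f)
}

// memoryCreator is a stand-in creator used only by this example.
type memoryCreator struct{ findings []Finding }

func (m *memoryCreator) CreateFinding(f Finding) (string, error) {
	m.findings = append(m.findings, f)
	return fmt.Sprintf("finding-%d", len(m.findings)), nil
}

func main() {
	f := Finding{ResourceID: "vm-101", Severity: "warning", Title: "High memory"}

	_, err := (&Executor{}).ReportFinding(f)
	fmt.Println(err) // not in a patrol run

	id, err := (&Executor{findingCreator: &memoryCreator{}}).ReportFinding(f)
	fmt.Println(id, err)
}
```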
Remove files that were consolidated into other modules:
- chat/patrol.go, patrol_test.go → moved to chat/service.go
- tools_infrastructure.go → merged into tools_storage.go
- tools_intelligence.go → merged into tools_metrics.go
- tools_patrol.go → merged into tools_alerts.go
- tools_profiles.go, tools_profiles_test.go → removed (unused)
Update related test file references.
- Merge tools_infrastructure.go, tools_intelligence.go, tools_patrol.go,
tools_profiles.go into their respective domain tools
- Expand tools_control.go with command execution logic
- Expand tools_discovery.go with resource discovery handlers
- Expand tools_storage.go with storage-related operations
- Expand tools_metrics.go with metrics functionality
- Update tests to match new structure
This consolidation reduces file count and groups related functionality together.
Major new AI capabilities for infrastructure monitoring:
Investigation System:
- Autonomous finding investigation with configurable autonomy levels
- Investigation orchestrator with rate limiting and guardrails
- Safety checks for read-only mode enforcement
- Chat-based investigation with approval workflows
Forecasting & Remediation:
- Trend forecasting for resource capacity planning
- Remediation engine for generating fix proposals
- Circuit breaker for AI operation protection
Unified Findings:
- Unified store bridging alerts and AI findings
- Correlation and root cause analysis
- Incident coordinator with metrics recording
New Frontend:
- AI Intelligence page with patrol controls
- Investigation drawer for finding details
- Unified findings panel with actions
Supporting Infrastructure:
- Learning store for user preference tracking
- Proxmox event ingestion and correlation
- Enhanced patrol with investigation triggers
- Restore 'mini' mode for StackedDiskBar.
- Restore layout fixes (fixed table layout, mobile columns) for Docker and Hosts tables.
- Remove 'Ask AI' and AI context selection features.
- Docker: Use compact 'Cube' icon for Podman pods to prevent name obstruction.
- Docker: Show concise image names (strip registry URL).
- Backend: Include pending fixes for AI providers.
- Updated AI providers and tests for context/tenant awareness
- Refactored tool executor for multi-tenant state handling
- Added new tests for Docker control and update tools
- Refactored tool execution to handle tenant-scoped contexts
- Added new tests for infrastructure, control, and kubernetes tools
- Improved test coverage for agentic chat and approval store
- HistoryChart: single metric visualization (CPU, memory, disk)
- UnifiedHistoryChart: combined multi-metric view
- Support for time range selection (1h to 90d)
- Responsive charts with proper dark mode support
- Fix corrupted tools_query_test.go from stash merge
- Remove executor_test.go (tests moved to specific tool test files)
- Refactor infrastructure, patrol, profiles, and query tests
- Add query tool enhancements for better resource filtering
- Add comprehensive tests for internal/api/config_handlers.go (Phases 1-3)
- Improve test coverage for AI tools, chat service, and session management
- Enhance alert and notification tests (ResolvedAlert, Webhook)
- Add frontend unit tests for utils (searchHistory, tagColors, temperature, url)
- Add Proxmox client API tests
This refactoring removes the MCP (Model Context Protocol) server layer and
converts AI tools to be called directly by the chat service.
Key changes:
- Rename package from internal/ai/mcp to internal/ai/tools
- Remove server.go - tools no longer exposed via MCP server
- Tools are now called directly by the chat service via ExecuteTool()
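
As a sketch of what direct invocation can look like, the example below
dispatches by tool name through an in-process registry. The handler
signature, Register, and the error value are hypothetical; only the
ExecuteTool name comes from this change, and its real signature is not
shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// ToolHandler is a hypothetical handler signature.
type ToolHandler func(args map[string]any) (string, error)

// Executor holds tool handlers in-process; no server sits in between.
type Executor struct {
	handlers map[string]ToolHandler
}

var ErrUnknownTool = errors.New("unknown tool")

func NewExecutor() *Executor {
	return &Executor{handlers: make(map[string]ToolHandler)}
}

// Register adds or replaces the handler for a tool name.
func (e *Executor) Register(name string, h ToolHandler) {
	e.handlers[name] = h
}

// ExecuteTool looks up the handler by name and calls it directly.
func (e *Executor) ExecuteTool(name string, args map[string]any) (string, error) {
	h, ok := e.handlers[name]
	if !ok {
		return "", fmt.Errorf("%w: %s", ErrUnknownTool, name)
	}
	return h(args)
}

func main() {
	e := NewExecutor()
	e.Register("pulse_metrics", func(args map[string]any) (string, error) {
		return fmt.Sprintf("metrics for type=%v", args["type"]), nil
	})
	out, err := e.ExecuteTool("pulse_metrics", map[string]any{"type": "disks"})
	fmt.Println(out, err)
}
```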
New tools added:
- Kubernetes: clusters, nodes, pods, deployments (4 tools)
- PMG: mail gateway status, mail stats, queues, spam stats (4 tools)
- Infrastructure: snapshots, PBS jobs, backup tasks, network stats,
disk I/O, cluster status, swarm, services, tasks, recent tasks,
physical disks, RAID status, host Ceph, resource disks (14 tools)
- Patrol: connection health, resolved alerts (2 tools)
Test coverage:
- Added comprehensive test files for adapters, infrastructure,
patrol, profiles, and query tools
Total tools: 50 (was ~25)