Three follow-up fixes:
1. RestartAIChat() now performs the full post-start wiring (MCP providers,
patrol adapter, investigation orchestrator) when the service starts for
the first time via Restart(). Previously these were only wired via
StartAIChat(), leaving first-time configure with a partially wired service.
2. The Ollama→OpenAI-compatible fallback in createProviderForModel is now
guarded by !strings.HasPrefix(modelStr, "ollama:") so explicit
"ollama:llama3" models are never silently rerouted to a different provider.
3. Windows install script registration check now uses the $Hostname override
(if set) instead of always looking up $env:COMPUTERNAME, so post-install
verification works correctly when a custom hostname is specified.
When Pulse starts before AI is configured, legacyService is nil.
Saving AI settings called Restart() which bailed immediately on the
nil check, leaving the service unstarted (503 on /api/ai/sessions)
until a full process restart.
Merged the nil and !IsRunning checks so first-time configure now
starts the service inline, same as the already-handled stopped case.
Also: bare model names that ParseModelString routes to Ollama (e.g.
"qwen3-omni") now fall back to a configured custom OpenAI base URL
when Ollama is not explicitly configured — handles manually-typed
model names on self-hosted OpenAI-compatible endpoints.
Fixes#1339, #1296
ZFS zvols (zd*), device-mapper, virtio disks, and other virtual block
devices don't support SMART and were being reported as FAILED. Use lsblk
JSON metadata to filter by device prefix, transport, subsystem, and
vendor/model. Also treat missing smart_status as unknown rather than
failed, and ignore UNKNOWN health in Patrol/AI signals.
createProviderForModel() only handled "provider:model" colon format.
Models like "google/gemini-2.5-flash" or "google/gemini-2.0-flash:free"
(OpenRouter format) failed because the colon split produced invalid
provider names.
Now uses config.ParseModelString() which correctly detects slash-
delimited models as OpenRouter (routed via OpenAI-compatible API).
The Undismiss() method existed on FindingsStore but was never exposed
via the API. Users who dismissed findings as "not_an_issue" had no way
to revert them.
- Add HandleUndismissFinding handler and route
- Add Undismiss() to UnifiedStore for parity with FindingsStore
- Also remove matching explicit suppression rules on undismiss
Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.
- Add RecordUsage() to ai.Service for thread-safe cost recording
- Add recordPatrolUsage() helper to PatrolService, called on both
success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
The AI also receives disk data via tool calls (pulse_metrics type="disks"),
not just the patrol context table. The raw JSON field "wearout" was
ambiguous — rename to "ssd_life_remaining_pct" so the field name itself
communicates that 100 = healthy.
The patrol context table header said "Wearout" and the tool returned a raw
"wearout" JSON field with no indication that 100 = full life remaining.
The AI interpreted "wearout: 100" as fully worn out and raised false
"100% Disk Wearout" findings on healthy NVMe drives.
Rename the patrol table column to "SSD Life Remaining (100%=new)" and
update the data type comment to clarify the semantics.
Show "Pro" badge on the Reporting settings tab so users know upfront
that advanced reporting requires a Pro license, rather than discovering
it after filling out the form.
Downgrade patrol trigger queue-full and rejection messages from Warn to
Debug — these are normal rate-limiting behavior, not actionable warnings.
Recovery notifications were silently disabled for users with pre-5.1.12
configs because the NotifyOnResolve bool field defaults to false when
absent from JSON. Use a *bool probe to detect missing field and default
to true.
Patrol trigger queue filled with warnings when the patrol loop wasn't
running. Gate TriggerPatrolForAlert on p.running and clear the flag
via defer when the loop exits.
saveToDisk used os.WriteFile which doesn't sync to disk before the
atomic rename. On CI runners with aggressive filesystem caching this
can leave the destination file with zero bytes, causing
TestKnowledgeStore_SaveLoad to fail with "unexpected end of JSON input".
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.
#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.
#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.
(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
TriggerPatrolForAlert was enqueuing into adHocTrigger regardless of
whether Patrol was enabled. With patrolLoop not running (disabled),
nothing drained the channel — it filled on the 10th alert and spammed
"Patrol trigger queue full, dropping trigger" on every subsequent alert.
Read p.config.Enabled in the same RLock as triggerManager and return
early when disabled.
Fixes#1258
(cherry picked from commit 69f399469538f0c9cd59084f6429fed8a793c042)
Patrol only pinged the first IP address of each VM/container, causing
false "unreachable" reports for guests with multiple IPs (common with
Windows VMs that have IPv6 or multi-adapter setups). Now probes all
IPs and marks reachable if any responds.
Fixes#1215
Enrich the patrol seed context with service identity (from discovery
store) and network reachability (via ICMP ping through host agents).
The guest metrics table now includes Service and Reachable columns,
and a Service Health Issues section highlights running-but-unreachable
guests. A new SignalGuestUnreachable signal type creates deterministic
findings for unreachable guests.
New files:
- patrol_intelligence.go: GuestProber interface, GuestIntelligence
type, gatherGuestIntelligence() with concurrent per-node probing
- patrol_prober.go: agentExecProber implementation using batch ping
commands via connected host agents
- Unified Proxmox VE discovery by redirecting Node requests to linked Host Agents.
- Added smart deduplication and legacy fallback for Proxmox discovery results.
- Integrated Proxmox Mail Gateway (PMG) into AI Patrol system.
- Added comprehensive tests for discovery redirection and deduplication.
- Redirect PVE node lookups to linked Host Agent ID when available.
- Implement deduplication in discovery lists to prefer Host Agent data over redundant Node entries.
- Add fallback mechanism to original Node ID for discovery retrieval ensuring compatibility with legacy data.
- Update data adapters and added comprehensive unit tests for redirection and deduplication logic.
- KnowledgeStore: use atomic write (temp+rename) to prevent file
corruption from concurrent async saves
- Change password tests: add auth headers since endpoint now requires
authentication
- ClearSession test: expect 2 cookies (pulse_session + pulse_csrf)
matching updated clearSession behavior
- API token test: update to match current behavior where query-string
tokens are accepted (needed for WebSocket connections)
- Host agent config: allow ScopeHostManage to resolve any host, not
just token-bound hosts