Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-22 03:02:35 +00:00

Author	SHA1	Message	Date
rcourtman	16963e415c	Drop t.Parallel from dismiss/snooze finding tests that race on global session store	2026-05-11 23:10:35 +01:00
rcourtman	7951da526b	Add release_cycle_artifact_globs so RC ceremony skips contract-update requirement	2026-05-11 22:55:29 +01:00
rcourtman	41dd867037	Stop alerts manager in TestPollCephClusterChecksPoolStorageThresholds to fix TempDir cleanup flake	2026-05-11 22:30:30 +01:00
rcourtman	e32db04543	Recalibrate CI 500-node load floor after rc.5 operator-state and agent-substrate plumbing	2026-05-11 19:07:44 +01:00
rcourtman	8ff69daa43	Bump install pins to rc.5 and refresh test fixtures for Patrol readiness + Unraid host profile tokens	2026-05-11 18:02:52 +01:00
rcourtman	e36945741e	Sanitise plain-JSON tool-call leaks from weak local models Small Ollama models (qwen2.5:11b, qwen2.5:14b, similar) frequently emit Pulse tool invocations as plain JSON inside content instead of routing through the structured tool_calls channel. Users saw raw payloads like `{"name": "pulse_query", "parameters": {...}}` as the assistant's final response. Extend cleanToolCallArtifacts and containsToolCallMarker with a new pass that detects this leak shape, gated on a closed allowlist of canonical tool names sourced from the runtime registry. The allowlist auto-syncs when registerTools() gains a tool, so no separate hand-maintained list. Anchored on (?:^\|\n) and a leading `"name"` key so prose containing JSON fragments, unrelated objects (`{"foo":"bar"}`), or named resources (`{"name":"my-vm","cpu":50}`) are left untouched.	2026-05-11 17:02:07 +01:00
rcourtman	9329258f8b	Correct the DeepSeek tool_choice coercion rationale The previous comment claimed DeepSeek's API aliases v4-flash/v4-pro to deepseek-reasoner, justifying the auto coercion via the legacy reasoner's known 400 behavior. That had the alias direction inverted: per DeepSeek's pricing page, deepseek-chat and deepseek-reasoner are deprecated aliases for v4-flash's non-thinking and thinking modes respectively, not the other way around. The coercion itself is empirically correct, though. Live preflight against deepseek-v4-flash with tool_choice=required produces a deterministic HTTP 400 ("provider rejected forced tool selection") in 275ms. The behavior is consistent across the DeepSeek user community - multiple downstream projects (pydantic-ai #5193, claude-code-router #1378, opencode #24190, others) confirm V4 models reject forced tool selection despite DeepSeek's chat-completion docs listing required as a supported value. Server reality disagrees with documentation. This commit updates only the comments in openai.go and openai_test.go to point at the empirical evidence and the community confirmation. The coercion behavior is unchanged.	2026-05-11 14:46:08 +01:00
rcourtman	2a9afb1112	Sanitise double-pipe DeepSeek DSML tool-call markers in chat Found by exercising pulse_summarize in real chat: the user asked a question, the model called the tool successfully (response came back with narrative_source: ai), but the chat panel ended with raw DSML text and "Assistant response is ready" — no actual prose answer. Root cause was in agentic_sanitize.go. The fast-path string list only checked single-pipe DSML variants ("<｜DSML｜...>"), but deepseek-v4-flash emits the double-pipe form ("<｜｜DSML｜｜...>"). The opening sequence didn't match any marker, so cleanToolCallArtifacts returned the content unchanged. The chat orchestrator then showed the raw DSML to the user as if it were the assistant's answer. Fix: - Add double-pipe variants (Unicode and ASCII) to the fast-path marker list. Six new entries, mirroring the existing single-pipe entries. - Add a backstop regex (dsmlRe) that matches any pipe-count variant via `</?[\\|｜]+/?DSML[\\|｜]*`. Future model behaviour with triple or higher pipe counts gets caught without another fast-path edit. - containsToolCallMarker gets the same coverage so streaming detection stops forwarding content the moment the marker appears, regardless of pipe count. Tests in agentic_sanitize_test.go gain three new cases for cleanup (double-pipe Unicode, double-pipe ASCII, triple-pipe regex backstop) and two for detection (double-pipe Unicode, double-pipe ASCII). All passing. Process note: this bug was invisible to existing tests because the test suite only covered the single-pipe variants that were documented in the marker list. The double-pipe form only appears when an actual model emits it. Same pattern as the JSON-casing fix earlier — exercise the surface against a real LLM, find what tests in isolation can't see.	2026-05-11 13:42:11 +01:00
rcourtman	e22113230a	Purge resolved legacy alert-mirror findings on load The previous rip retired only active "Active alert detected" findings from the now-removed detectAlertSignals -> SignalActiveAlert emitter. Resolved instances were left in place, polluting the Resolved tab and inflating the regressed total on the trust strip (a stale 8 resources each marked "regressed 3x" with descriptions like "Active warning alert: Container 'ollama' is powered off"). They have no canonical operator value -- the Alerts surface is the source of truth for currently-firing alerts -- so on load we now purge them entirely rather than keeping them around as Resolved noise. Active mirrors are still retired (auto-resolved with a clear reason) so operators see why the finding closed; resolved mirrors disappear silently because they were already in the terminal state. Idempotent. Extends TestFindingsStore_SetPersistence_RetiresLegacyAlertMirrorFindings with a fixture for the resolved-mirror case and asserts both the in-memory purge and the persisted state no longer carries it.	2026-05-11 11:40:07 +01:00
rcourtman	3c0b52c11d	Expose resolved findings to the Patrol Resolved tab The trust strip on the Patrol page credits "N auto-resolved" but the Resolved tab next to it sat empty — operators could see the count but not click through to audit which findings had been resolved or by what mechanism. The /api/ai/patrol/findings endpoint only returned active findings, so the frontend filter (status === 'resolved' \|\| 'dismissed' \|\| 'snoozed') had nothing to render. Adds the audit-trail accessor end to end: - PatrolService.GetAllFindingsIncludingResolved returns active + resolved + dismissed + snoozed findings at warning severity or higher, sorted with active first then by severity then recency. Two separate severity orderings — filter (info=0..critical=3, used with >= against the warning floor) and sort (critical=0..info=3, used with < to surface critical first). Conflating them initially let watch findings leak through the warning floor; the test fixture catches that. - HandleGetPatrolFindings honors a new include_resolved=1 query parameter that routes to the new accessor. Default behaviour (active only) is unchanged for clients that just want the live findings list. - Frontend getPatrolFindings accepts an options object with includeResolved and loadPatrolFindings threads it through. - FindingsPanel triggers an includeResolved load whenever the Resolved filter becomes active for the Patrol-source view. Test: TestPatrolService_GetAllFindingsIncludingResolved_IncludesResolvedAndDismissedSortsActiveFirst covers active-first ordering, inclusion of resolved + dismissed, and the warning severity floor (watch-level findings must not leak through).	2026-05-11 11:09:03 +01:00
rcourtman	d02255907c	Fail closed when patrol_resolve_finding verifier is inconclusive ResolveFinding adapter previously logged a warning and allowed the LLM's resolve to proceed when the deterministic verifier returned an error (timeout, executor unavailable, etc.). That's fail-open: any verifier failure let the auto_resolved → re-detected cycle continue, exactly the pattern the rest of this branch's patrol_resolve_finding work spent commits closing. The "Backup failed" finding on the live preview still cycled once post- migration because of this path — verifier returned an ErrVerificationUnknown and resolve was permitted. Resolution of an event/persistent category finding is effectively permanent (next detection registers as a regression and inflates counters and pollutes the trust strip). When the deterministic verifier cannot confidently say the failure signal is gone, we don't have grounds to honor the LLM's judgment — the LLM's "current investigation didn't surface a fresh failure" is exactly the unreliable signal that produced bogus cycles. Switches the inconclusive-verifier branch from log-and-allow to log-and-reject, returning an error to the tool so the LLM can retry or escalate to the operator. The verifier-still-detects- signal path stays as-is (it was already fail-closed). Test: TestPatrolFindingCreatorAdapter_ResolveFinding_RejectsWhenVerifierIsInconclusive exercises the path by calling ResolveFinding on a backup-failed finding through a PatrolService with no chat service wired (getExecutorForVerification returns ErrVerificationUnknown). Asserts the error mentions 'inconclusive' and that ResolvedAt remains nil. Contract: extends the deterministic-resolve-gate clause in the ai-runtime canonical-files completion-obligations to name the fail-closed-on-inconclusive policy explicitly.	2026-05-11 10:53:06 +01:00
rcourtman	3b6fc3ef31	Audit chat-package Chat/ChatStream callers for cost recording The parent-package audit (TestCostRecordingCoverage in internal/ai/cost_recording_audit_test.go) only walks internal/ai/*.go — sub-packages were unguarded, which is the gap that let the chat cost-recording bug land in the first place (the agentic loop in internal/ai/chat/ called ChatStream without anyone recording cost; fixed in `a0b3bc7ed`). Adds a parallel audit at internal/ai/chat/cost_recording_audit_test.go covering both .Chat() and .ChatStream() callers. Same shape as the parent audit but extended to include the streaming variant, since the chat package is built around the agentic loop's ChatStream call site. The chat package's orchestrator/loop split means recording lives in chat.Service.recordChatTurnCost — the orchestrator that owns the loop — rather than inside the loop methods that call ChatStream. Two doc-comment exemptions cover that: - AgenticLoop.ExecuteWithTools / executeWithTools (agentic.go) - AgenticLoop.ensureFinalTextResponse (agentic_final.go) Both name the orchestrator that records, so a future contributor moving recording elsewhere has a clear pointer to update. A new ChatStream caller added to the chat package without either recording or an exemption marker will now fail this audit, the same way the parent audit would have caught QuickAnalysis or Narrate if they had been written without recording.	2026-05-11 09:16:26 +01:00
rcourtman	9f43a22fb1	Bound patrol-main session at 200 messages to stop unbounded disk growth The patrol-main chat session was reused across every scheduled Patrol run with no upper bound. After a month of runs the file had grown to 16 MB / 3,593 messages, and every AddMessage rewrote the whole file to disk — so the I/O cost per Patrol run was scaling with total session age, not with the run's own output. Across all chat sessions on this dev instance, the ai_sessions directory hit 676 MB / 1,629 files. The stateless-Patrol-input fix (commit `43760fb0d`) stopped loading the session back into the agentic loop, but Patrol still wrote each run's messages to the session for the Pulse Assistant sidebar's forensic view. That write path is what this commit bounds. ExecutePatrolStream now calls SessionStore.TrimMessages(200) after each run, keeping roughly the last two runs' worth of messages — enough for the sidebar to show recent activity, far short of unbounded growth. The next Patrol run on a bloated session will drop the historical 3,000+ messages down to 200 on its first write, so existing storage debt clears on its own without a separate migration. User-driven chat sessions are unaffected: TrimMessages with keepMostRecent <= 0 is a no-op, and callers that want full history retention simply don't call it. Only Patrol's forensic session is capped. The canonical Patrol forensic log is the PatrolRunRecord history surfaced at /api/ai/patrol/runs — that's the durable record with structured fields. The chat-session-shaped file is a sidebar convenience, not the source of truth. Three tests guard the boundary: - TrimMessages keeps the most recent N (50 messages trimmed to 10 → messages 40-49 remain) - TrimMessages is a no-op below threshold (5 messages, cap 200 → 5 messages remain) - TrimMessages with non-positive keep is a no-op (3 messages, cap 0 or -5 → 3 messages remain) ai-runtime contract updated.	2026-05-10 23:19:08 +01:00
rcourtman	a0b3bc7ed3	Record user-chat token usage to the cost ledger chat.Service.ExecuteStream was a long-standing cost-ledger gap: the agentic loop accumulated token counts via stream callbacks (see GetTotalInputTokens / GetTotalOutputTokens in agentic_control.go) and surfaced them in the SSE done envelope to the frontend, but nothing on the server side recorded a cost.UsageEvent. Patrol, discovery, QuickAnalysis, and the report narrators all record; only chat — the bulk of AI token spend — did not. The operator's AI usage dashboard was therefore understating cost dramatically. Found while extending the cost-recording mindset across subpackages after fixing QuickAnalysis (`08491b9f4`). Initially spawned as a separate task but the right shape and scope became clear, so landing it directly here. Pipeline: - Service.CostStore() exposes the per-tenant cost store handle. - chat.Config gains optional CostStore *cost.Store field, threaded into chat.Service.costStore at NewService time. - chat.Service.recordChatTurnCost records a UsageEvent with UseCase="chat" after every loop.ExecuteWithTools return (success OR error — operator was billed regardless of clean response). Skips when costStore is nil or zero tokens accumulated. - ai_handler.go's two chatCfg construction sites populate CostStore via h.resolveCostStore(ctx). - router wires the resolver to AISettingsHandler.GetAIService(ctx).CostStore() with no Enabled gate — even brief chat usage while AI was being configured should appear in the dashboard. ExecutePatrolStream is deliberately not changed. It creates a separate tempLoop and its caller (patrol_ai.go) records cost via its own helper at line 887. Recording in ExecuteStream only avoids double-counting on the patrol-via-chat path. Tests in chat/cost_recording_test.go cover: recording when store configured, no-op when store nil, no-op on zero tokens (early failures), graceful handling of model strings missing the provider prefix.	2026-05-10 23:15:53 +01:00
rcourtman	113190a920	Skip agent-sourced alerts in node-presence cleanup CleanupAlertsForNodes removes alerts whose Node isn't in existingNodes. That map is built upstream only from state.Nodes (Proxmox nodes) and state.PBSInstances (PBS instances) — no agent resources. Agent-sourced alerts (Unraid, standalone Linux hosts, TrueNAS, anything reached via Pulse Agent) typically have Node="" or an agent UUID, so they fall into the cleanup branch every cycle. Then the next poll re-creates the alert as new, calls AddAlert, and appends a fresh history row. Observed in the wild: 3,980 alert history entries in 7 days, with the same canonical alert ID (e.g. "Unraid array running without parity protection") appearing every 30 seconds. Adds a carve-out for ResourceID prefixes starting with "agent:", matching the existing pattern for "docker-" / "docker:" and "pbs-" / "pbs-offline". Locks in the behaviour with a new subtest that mixes agent-sourced and Proxmox-sourced alerts and asserts that only the legitimately stale Proxmox alert is removed.	2026-05-10 23:12:31 +01:00
rcourtman	99c499ade7	Repair orphan tool_calls in convertToProviderMessages Defense-in-depth for the malformed-history bug pattern. The Patrol fix made patrol-main runs stateless, but Assistant chat sessions are inherently multi-turn and must keep their history. Any chat session that ends mid-tool-call — network drop, ctx timeout, browser crash, uncaught panic, any interrupt that fires between "model emits tool_calls" and "agentic loop appends all tool results" — leaves the persisted session with orphan tool_call_ids. The next message that loads this history is rejected with the same provider error that flapped Patrol for 33 days: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. For Patrol this was fixable by ignoring the session. For Assistant it isn't; the conversation context is the product. convertToProviderMessages now ends with a repairOrphanToolCalls pass that scans every assistant message with tool_calls and inserts synthetic is_error tool result messages immediately after the assistant turn for any tool_call_id that has no matching downstream result. The synthetic content is marked is_error=true and explains the interruption so the model can retry the same call or proceed without that data — preserving conversational continuity while satisfying the provider's structural-validity check. This guards every conversation that crosses convertToProviderMessages, not just Assistant chat. If Patrol ever changes back to loading session history, the same safety net applies. If a new entry point appears for some other LLM flow, it gets the repair for free. Three tests guard the boundary: - Orphan injection (3 tool_calls, only 1 result → 2 synthetic results, marked is_error with interrupted explanation, ordering preserved) - Clean no-op (all tool_calls fulfilled → no synthetic messages, no is_error pollution) - Existing truncation test still passes (assistant message with both tool_calls and own tool_result → no repair needed, tool_call_id matches in same message) ai-runtime contract updated.	2026-05-10 23:10:13 +01:00
rcourtman	e657f6ace9	Suppress assessment error penalty after trailing-success recovery The overall-health "Recent Patrol errors" coverage factor in summarizeRecentPatrolCoverage was anchoring the score to a stale ratio: it counted errors across the last 10 runs without weighting recency. After Pulse fixed two compounding Patrol bugs today, four consecutive successful runs (50+ tool calls each) followed six earlier failures. The assessment kept showing C/65 with the prediction "most recent Patrol runs encountered errors (6 of 10)" — directly contradicting the fact that every recent run had succeeded. Operators reading that score would conclude Pulse Patrol is still broken. It isn't. The fix dragged the grade. This commit adds a recovery-suppression check: count trailing successful full Patrol runs from the most-recent end of the window (GetAll returns newest-first), skipping non-full runs. When three or more consecutive trailing successes exist — roughly a 9-hour clean stretch at the default 3-hour cadence — the error penalty drops entirely. The score reflects current reality. Three is conservative: a single recovery run could be a transient win; three consecutive demonstrate the underlying fix is sticking. Below the threshold, the existing ratio-tiered penalty still applies so partially-recovered states still register. Two tests guard the boundary: - 6 historical errors + 3 trailing successes → no coverage factor (suppressed) - 6 historical errors + 2 trailing successes → coverage factor remains (recovery incomplete) Live verified after this commit lands: the assessment that's been stuck at C/65 since the malformed-history fix will recompute to A/B grade as soon as the trailing 3 successful runs are recognized by the same recent-runs query. ai-runtime contract updated.	2026-05-10 23:02:57 +01:00
rcourtman	4dff26f728	Emit structured telemetry on reporting and summarize invocations The reporting feature now ships across two surfaces (PDF/CSV export and pulse_summarize chat tool) and three modes (single-resource, fleet, summarize). Without usage telemetry we can't tell whether the work earns its place — operator demand, AI-vs-heuristic adoption, range/format preferences are all invisible. Stops further feature investment from being pure speculation. Three new info-level log events, structured so an agent can grep transcripts and group by dimension without a separate metrics pipeline (matches the "agent owns ops analysis, human gets outcomes" posture in MEMORY.md): reporting.single.generated — single-resource PDF/CSV reporting.fleet.generated — multi-resource fleet PDF/CSV reporting.summarize.invoked — pulse_summarize chat tool (both modes) Common dimensions: org_id, format/action, range, ai_configured, findings_configured, window_start/end. Single-resource adds resource_type + metric_type + bytes; fleet adds resource_count + bytes; summarize adds resource_type + resource_count (fleet mode) + narrative_source (so we can audit AI-fallback rate). Includes rangeLabel() helper that maps a window to the canonical catalog range token (24h/7d/30d) with a 1h tolerance, falling back to "<hours>h" so non-standard windows still group. Tested. TestReportingTelemetryEventNames pins the canonical event names as a contract — an agent grepping logs depends on them being stable; changing them silently would break audit tooling on the consumer side. The reporting engine already logs the resolved narrative source (heuristic/ai) at debug level via the existing "Generating report" line, useful for diagnosing why a specific report fell back. Kept at debug; the new info-level events cover the operator surface.	2026-05-10 22:59:23 +01:00
rcourtman	03463c1bfe	Thread per-tenant AI narrators into pulse_summarize via chat session v1 of pulse_summarize (`1fe5d6853`) shipped with heuristic narrative only. The follow-up wiring promised in that commit now lands: the chat session carries optional report-narration providers that the tool's handler reads when building requests, so AI-narrated synthesis flows into chat using the same provider, sanitizer, model selection, cost ledger, and budget gate the report PDF endpoint already uses. Pipeline: - pkg/reporting Narrator / FleetNarrator / FindingsProvider interfaces are already implemented by internal/ai.Service. No new implementations. - tools.ExecutorConfig + PulseToolExecutor gain three optional fields (ReportNarrator, ReportFleetNarrator, ReportFindingsProvider). Clone() copies them so per-session executors inherit the wiring. - chat.Config gains the same three fields; NewService threads them into ExecutorConfig. - tools_summarize.go reads e.reportNarrator/FleetNarrator/ FindingsProvider and populates MetricReportRequest / MultiReportRequest. The engine already accepts these on the request and falls back to heuristic when they are nil — no engine changes needed. - AIHandler gains SetReportNarratorResolver(ctx -> narrators); both per-tenant and default chat.Config construction sites invoke the resolver. Router wires the resolver to AISettingsHandler.GetAIService with the same Enabled-gate the reporting handler uses. Unconfigured tenants are unchanged: the resolver returns nil, the tool returns heuristic narrative — identical to today. Configured tenants get AI synthesis in chat that matches what their report PDF already carries, billed and budget-gated the same way.	2026-05-10 22:50:17 +01:00
rcourtman	7a7b3c9d30	Gate LLM patrol_resolve_finding on deterministic verifier for event findings Backup-failed was flapping detected → auto-resolved → re-detected ten times in a single day. Each cycle the LLM saw "PBS backups look healthy in my current snapshot" during a Patrol pass, called patrol_resolve_finding(backup-failed), and the adapter at patrol_findings.go:985 called Resolve(findingID, true) directly — no category check, no evidence verification. The contract docs at findings.go:52-67 explicitly say event / persistent categories (backup, reliability, security, general) "stay active until explicitly resolved — either by the LLM calling patrol_resolve_finding with evidence, or by operator action." That "with evidence" was never enforced. This commit enforces it. The adapter now checks two conditions before honoring an LLM resolve: - finding.Category does NOT support stale-auto-resolve (per the contract function CategorySupportsStaleAutoResolve), AND - a deterministic verifier exists for finding.Key (currently smart-failure and backup-failed) When both are true, the adapter runs VerifyFixResolved on the finding's resource. If the verifier still detects the failure signal, the LLM gets an error explaining why the resolve was rejected and that the underlying issue must be fixed first. If the verifier confirms the signal has cleared, the resolve proceeds with grounded evidence. Categories that support stale-auto-resolve (performance, capacity) bypass the gate entirely — the LLM can resolve them based on absence per the existing contract. Keys without a verifier also fall through to current behavior so we don't block resolves for categories we haven't built verifiers for yet. New PatrolService.hasDeterministicVerifierForKey() helper keeps the gate's verifier list in lockstep with the switch in verifyFixDeterministically. Tests cover the three branches: - performance category → gate skipped, resolve proceeds - reliability + no verifier → gate falls through, resolve proceeds - hasDeterministicVerifierForKey for known and unknown keys ai-runtime contract updated.	2026-05-10 22:48:09 +01:00
rcourtman	1fe5d6853f	Expose reporting synthesis to Assistant via pulse_summarize tool The reporting synthesis layer (observations, recommendations, outliers, period comparison) shipped trapped behind the PDF/CSV export. Operators who chat with Assistant could not ask "what's been happening with pve1 this week" — the data path existed but had no non-PDF surface. This commit adds a single new tool, pulse_summarize, that wraps the engine's non-rendering entry points (NarrativeFor / FleetNarrativeFor) so that question gets answered in chat. The tool takes an action parameter (resource \| fleet) and routes accordingly: - resource mode requires resource_type + resource_id and returns the same Narrative the single-resource report carries (health status, observations, recommendations, period comparison). - fleet mode requires resource_type + a comma-separated resource_ids string (PropertySchema does not currently support array items, and CSV is LLM-friendly enough) and returns the FleetNarrative (outliers, patterns, recommendations). Capped at the same multi-report ceiling (50) as the API endpoint. The tool is read-only — no control level requirement, no approval gate — and uses the global reporting engine the rest of the app already shares. Returns a JSON envelope so chat can render it or hand it back to the model for follow-up framing. v1 ships with heuristic narrative only. The AI narrator wiring through the chat session (Narrator/FleetNarrator/FindingsProvider threaded via chat.Config -> tools.ExecutorConfig -> PulseToolExecutor) is a focused follow-up; it lets the same tool inherit the per-tenant AI service the report PDF endpoint already uses. The seam is already in place because NarrativeFor/FleetNarrativeFor take an optional narrator on the request — v1 passes nil, v2 populates it.	2026-05-10 22:36:49 +01:00
rcourtman	20df3dcd2c	Let a valid bootstrap token authorize initial setup from any origin The loopback gate from `586473ee3` rejected non-loopback setup requests before the bootstrap-token check could run, so a Proxmox-LXC install (install script prints URL + token; user opens URL on workstation, pastes token) hit "only available from localhost" even with the correct token. The token is the security boundary — only callers with filesystem access to the data dir can read it — so a valid token now authorizes setup from any origin. No-token requests still require direct loopback. Updates the two contract/setup tests that pinned the old behavior. Fixes discussion #1459.	2026-05-10 22:25:34 +01:00
rcourtman	e32d4ede44	Expose engine narrative entry points for non-rendering callers The reporting engine's synthesis layer was reachable only through Generate/GenerateMulti, which always rendered PDF or CSV. Pulse Assistant needs the same retrospective synthesis (per-resource summary, fleet outliers, period comparison) in a form it can present in chat, not as a downloaded artifact. Add two non-rendering entry points to the Engine interface: NarrativeFor(req MetricReportRequest) (Narrative, error) FleetNarrativeFor(req MultiReportRequest) (FleetNarrative, error) Both run the same query path and the same narrator resolution as their rendering counterparts (heuristic by default, AI when the request supplies a narrator, fail-closed-to-heuristic on any narrator error) and return the structured narrative without invoking the fpdf/csv output stage. Test stubs in pkg/reporting and internal/api are updated to implement the extended interface. These are the seams the upcoming pulse_summarize Assistant tools wrap to answer questions like "what's hot on pve1 this week" or "where should I look across my fleet" without round-tripping through report generation. Same synthesis layer, no PDF involved. Also fixes a pre-existing flake in TestEngineGenerate_UsesSuppliedNarrator (metrics writes are async; the first Generate sometimes ran before the raw tier flushed). Wrapped in the same eventually-pattern used by the prior-period and findings-provider tests.	2026-05-10 22:23:09 +01:00
rcourtman	41155a7968	Forbid the report narrators from acting as parallel detectors The single-resource and fleet narrator prompts both grounded their claims in structured data, but neither prevented the model from classifying observations at warning or critical severity based on metric inference alone. That left a subtle gap: an AI narrator noticing memory creep across a window could promote it to warning even when Patrol — the canonical detection layer — had not flagged it. That competes with Patrol rather than summarizing its work, and it lets the report PDF silently shadow-classify in a way that diverges from the findings store. Add an explicit detection-boundary instruction to both prompts: warning or critical severity may only be assigned when backed by a Patrol finding, an alert, or a hard-threshold breach visible in the input (cpu max > 90, memory avg > 85, disk avg > 85, failed or high-wear disks, storage pools at >= 90%). Patterns the model sees in metric data without that backing are constrained to info severity. Recommendations follow the same rule. The narrative remains a retrospective summary of Patrol's classified state, not a parallel classifier. This is a prompt-only change. The deterministic data surface and the heuristic fallback narrator are unaffected; the heuristic narrators already classify only on the same threshold rules listed above, so the AI narrator is now constrained to the same evidence surface its fallback uses.	2026-05-10 22:02:10 +01:00
rcourtman	08491b9f48	Fix QuickAnalysis cost recording and audit the pattern Service.Narrate (`b84b87d8d`) and Service.NarrateFleet (`d4463a615`) fixed missing cost-record calls in the report-narrative path. Auditing the rest of internal/ai for the same bug class found one more: Service.QuickAnalysis. It is used for alert auto-resolve and similar lightweight decisions, so production token spend on auto-resolve analysis was invisible in the AI usage dashboard. Mirror the same fix: capture costStore under the read lock alongside provider/cfg, and after provider.Chat returns, record a UsageEvent labelled with the request's UseCase (defaulting to "quick_analysis" when the caller leaves it blank). Recording happens before the empty-content guard so failed-but-billed calls are still visible. Adds cost_recording_audit_test.go: an AST-level audit that walks internal/ai/*.go (excluding _test.go and sub-packages), finds every function calling .Chat() on a providers.Provider value, and asserts each function body also references .Record() on a cost store. Exemption is allowed via a //cost-recording-exempt: <reason> doc comment. RunPatrolToolPreflight is annotated as exempt — it is a connectivity self-test, not user workload, and should not pollute the operator's cost dashboard. The audit is intentionally local (function-scoped, not interprocedural). A passthrough wrapper that calls a recording function rather than calling Record itself would need an explicit exemption naming the wrapped callee. Keeping the scan local makes new Chat callers loud rather than letting silent gaps creep in via indirection. Future Chat callers must either record cost or carry the exemption marker. The audit fails CI otherwise, so the regression that shipped in `b2bd9d114` (Narrate) and would have shipped again in `d4463a615` (fleet) cannot recur silently.	2026-05-10 21:57:39 +01:00
rcourtman	c6a786a36d	Broaden regression-counter reset to include LLM-driven cycles Initial detector (commit `942f9ca0f`) only matched on the two legacy absence-signature reason strings — but the Backup failed finding on the live preview showed 6 auto_resolved events all with empty messages, produced by the LLM patrol_resolve_finding tool via Resolve(_, true). Counter stayed at 6× after the previous migration ran. New detector: any active finding whose category is NOT eligible for stale-auto-resolve (i.e. anything other than performance or capacity) AND has any auto_resolved event on its lifecycle is treated as having an inflated counter. The rationale is the same rule the category gate already established — for event/persistent categories there is no legitimate absence-driven resolution path, so any auto_resolved was either a removed-bogus-path stamp or an LLM judgment call that repeatedly reverted through regressions on the next run. The cumulative count is no signal either way. Performance/capacity findings retain their counter because the metric-cleared resolution model is sound there. Test extended to cover four cases: LLM-driven cycle resets, legacy-reason cycle resets, eligible-category preserves counter, non-eligible category without any auto_resolved preserves counter. Plus the existing idempotency case (already-reset finding stays reset and is not re-applied).	2026-05-10 21:47:57 +01:00
rcourtman	942f9ca0f5	Reset regression counters polluted by bogus auto_resolve cycles The Backup failed finding on the live preview showed "regressed 6×" when the actual regression count of genuine recurrences was at most 1 or 2 — the rest were the system fighting itself, driven by the absence-based auto_resolve paths that were gated (category whitelist) or removed (alert-mirror rip) earlier in this branch. Counter stayed sticky after those fixes landed, so the trust strip and finding badges still surfaced the inflated number. FindingsStore.SetPersistence load pass now scans each active finding's lifecycle for the two known bogus-signature auto_resolved reasons ("No longer detected by patrol", "Resource no longer exists in infrastructure"). If found, RegressionCount is reset to 0 and LastRegressionAt is cleared, and a regression_counter_reset lifecycle event is appended so the migration is idempotent. A finding that already has a regression_counter_reset event is left alone; any regressed events that accrued after the reset are genuine and stand. findingHasBogusAutoResolveCycle returns true only when the lifecycle contains a bogus auto_resolved and no prior reset event, so the function is the single point of truth for the migration decision and is straightforward to test. Test covers three cases: finding with bogus signature gets reset, finding with empty-message auto_resolved (LLM-driven, legitimate) keeps its counter, finding already migrated is not re-reset. Updates ai-runtime Current State to document the second migration on top of the alert-mirror retirement.	2026-05-10 21:44:29 +01:00
rcourtman	590671ffbb	Retire legacy "Active alert detected" findings on load The previous commit removed the detectAlertSignals path so no NEW alert-mirror findings are emitted, but the findings already persisted from earlier builds stay in the store indefinitely — nothing cleans them up (reconcileStaleFindings is gated on performance/capacity categories, the LLM resolves them just to have them re-detected next run except now the deterministic emitter is gone so re-detection can't happen, but they're left sitting as active findings draining the trust strip and score). FindingsStore.SetPersistence now runs a one-shot retirement pass on load: any active finding with title "Active alert detected", source ai-analysis, and category general is auto-resolved with reason "Patrol no longer mirrors alerts; the Alerts page is the canonical surface for currently-firing alerts." The pass appends an auto_resolved lifecycle event so the retirement is auditable, syncs the loop state to resolved, and schedules a save so the cleanup persists. Idempotent: after the first load with this code, no findings match the signature so the pass is a no-op. Defensive: the signature requires all three fields (title + source + category) to match before retiring, so an operator-authored finding that happens to share the title is left untouched. Test covers the mirror case, the matching-title-but-foreign-source case (must NOT retire), and an unrelated active finding (must NOT retire), plus verifies the retired state persists back through the persistence layer. Updates ai-runtime Current State to record the migration path.	2026-05-10 21:39:09 +01:00
rcourtman	271d12ecab	Stop mirroring alerts into Patrol findings The deterministic signal pipeline ran the pulse_alerts tool output through detectAlertSignals and produced a SignalActiveAlert for every firing alert, which Patrol then materialized as an "Active alert detected" finding (source: ai-analysis, category: general). The system prompt at the top of patrol_ai.go explicitly tells the LLM not to duplicate alerts — but the deterministic emitter was duplicating them anyway, behind the LLM's back. Symptoms observed in the wild: - 9 active "Active alert detected" findings in Patrol, every one a duplicate of an existing alert already on the Alerts page. - The LLM, doing what the prompt told it, resolved each mirrored finding via patrol_resolve_finding. Next run the alert was still firing and Patrol re-emitted the signal → finding regressed. Lifecycle showed several auto_resolved → re-detected → regressed cycles per finding within hours. - Health score dragged down by issues the operator already saw on the Alerts page, with no operator action possible from Patrol that wasn't already available from Alerts. Rip detectAlertSignals entirely, remove the pulse_alerts case from the signal-extraction switch, drop SignalActiveAlert plus its key / title / recommendation entries. Convert the prior TestDetectSignals_ActiveAlert into a regression guard that locks in the no-mirror behavior. Updates the ai-runtime subsystem Current State to record the decision: Patrol does not duplicate the Alerts surface; alerts own their own lifecycle, surface, and acknowledgement model.	2026-05-10 21:33:41 +01:00
rcourtman	d4463a615c	Add fleet-level AI narrative for multi-resource reports The single-resource AI narrative landed in `b2bd9d114` but multi-resource fleet reports stayed heuristic-only. That left a gap on the exact axis where AI helps most: a 50-resource fleet PDF is where synthesis is the difference between useful and unread. Introduce FleetNarrator as a separate interface from Narrator. The input shapes are different — single-resource takes one set of metric stats with a prior window, fleet takes a denormalised cross-resource view with per-resource summaries plus a fleet aggregate. HeuristicFleetNarrator owns the deterministic fallback: ranks resources by severity (critical alerts > unhealthy disks > storage pressure > memory > CPU > non-critical alerts), picks up to 5 outliers, derives cross-cutting patterns by counting how many of N resources share a hot signal, and emits fleet-scoped recommendations. internal/ai.Service implements FleetNarrator through report_fleet_narrator.go. Distinct use-case label (report_narrative_fleet) so fleet vs single-resource spend is separable in the cost ledger and budget gate. The fleet payload is denormalised through buildReportFleetPayload so prompt cost scales linearly with fleet size. Same fail-closed invariant — nil provider, parse failure, or context cancellation falls through to the heuristic. Single-resource Narrator is intentionally NOT propagated through engine.GenerateMulti: a 50-resource fleet report performs one AI call (fleet narrator), not 51. The router resolver returns the AI service for all three roles (Narrator, FleetNarrator, FindingsProvider). The fleet PDF renders the FleetNarrative in the fleet summary cover when present: executive prose, named outliers with severity-coloured bullets, cross-cutting patterns, recommendations, optional period comparison, and an AI provenance footer. The deterministic resource summary table is preserved above so every named outlier is verifiable against the table immediately below it. Legacy "Highest CPU / Most alerts" bullets remain as the fallback when no FleetNarrative is attached.	2026-05-10 21:23:12 +01:00
rcourtman	b84b87d8d8	Record cost events for AI report narration Service.Narrate consumed provider tokens without recording a cost.UsageEvent, so AI-narrated reports were invisible in the operator cost ledger. Every other Service call site in the AI runtime records cost; the narrator omitted it. Mirror the QuickAnalysis/chat pattern: capture the cost store under the read lock alongside the provider/cfg snapshot, and after provider.Chat returns, record a UsageEvent labelled report_narrative with the resource type/id as the target. Recording happens before parsing so a failed-but-billed call (e.g. provider returned malformed JSON) still appears in the ledger — the operator was billed regardless of whether we could use the response. The use_case string lifts to a package-level constant so the budget gate (enforceBudget), the cost label, and the dashboard taxonomy all reference one identifier.	2026-05-10 21:08:51 +01:00
rcourtman	27bd31684a	Log the underlying error on audit list 500s HandleListAuditEvents dropped the Query/Count error before writing the 500, so a user hitting "Failed to fetch audit events" produced no server-side log line — diagnosing the failure was impossible without a local repro. Log the error with the org ID so the next instance is findable. Doesn't change the user-facing response.	2026-05-10 20:44:18 +01:00
rcourtman	9d1d24bdf1	Fix silent fingerprint loss for LXC and VMs processFingerprint ran v.Index(i).Interface() then reflect.ValueOf(item), dropping addressability and making reflect.Call panic on every iteration with "Container as type *Container". The defer/recover in collectFingerprints swallowed it, so LXC and VM fingerprints never landed in the store — change-detection and discovery for those resource types have been broken since v6. Pass the slice element's address straight through (.Addr()) so the generator's pointer receiver gets the right type. Add a regression test that fails if anyone goes back through .Interface().	2026-05-10 20:31:46 +01:00
rcourtman	a343ba0126	Tier patrol-coverage health impact by recent-error ratio The overall health score chip read "A · 95/100" while the same assessment card said "Coverage incomplete · Recent Patrol runs encountered errors · Verify full coverage." Those messages contradicted because the "recent errors but had a successful run" coverage factor used a flat -10 penalty regardless of the error ratio. With one successful manual run among many failed startup runs, the math stayed in the A band and the score directly undermined the warning it sat next to. The factor now tiers the impact by ratio of errored runs to relevant runs in the scoring window: >50% errored → -30, "Most recent Patrol runs encountered errors (N of M); the current health summary is not reliable until coverage stabilizes." >25% errored → -20, "Recent Patrol runs encountered errors (N of M), so the current health summary may be incomplete." else → -10, original light-tier description. A single transient error stays in grade A; dominant-error periods drop out of A so the grade matches the warning. Adds two tests covering both ends of the new tiering and updates the ai-runtime subsystem contract Current State section.	2026-05-10 19:33:18 +01:00
rcourtman	b2bd9d1147	Replace heuristic report narrative with optional AI-generated layer Performance reports rendered the Executive Summary, Observations, and Recommendations sections from inline threshold rules in pdf.go. That narrative looked intelligent but was static templating against alert counts and metric percentiles, which felt off-brand alongside Patrol and Pulse Assistant. Introduce a Narrator interface in pkg/reporting and a FindingsProvider counterpart that the engine consults at report time. The heuristic rules are lifted into HeuristicNarrator unchanged so the deterministic fallback still produces the same observations and recommendations. The engine now also queries the comparable prior period and threads its aggregate stats through the narrator so deltas can be expressed. internal/ai.Service implements both interfaces via report_narrator.go (single-turn JSON call grounded in the structured ReportData payload, falling back to the heuristic on any error/timeout) and report_findings.go (Patrol findings whose lifecycle overlaps the report window). The reporting handler resolves the per-tenant AI service when it is configured and supplies it in the request; absent configuration, reports look identical to the prior heuristic output. Charts, stats tables, alert lists, storage and disk sections stay deterministic — sysadmins can verify every AI claim against the data tables next to it. The PDF renders the AI prose between the health card and Quick Stats, adds a Period-over-period section after Recommendations, and prints a provenance footer when the narrative came from the assistant. ai-runtime.md and api-contracts.md updates land in a follow-up commit on this branch; agent-lifecycle / performance-and-scalability / storage-recovery have no contract delta from this change (router.go is referenced in their Extension Points but their semantics are unchanged).	2026-05-10 19:30:54 +01:00
rcourtman	b44d5892f4	Gate stale-finding auto-resolve on category whitelist reconcileStaleFindings was auto-resolving any seeded finding that the LLM didn't re-report in a successful run. The function's own comment acknowledges the LLM doesn't reliably use patrol_resolve_finding, so this was built as a cleanup pass — but it cannot tell the difference between "LLM correctly recognized this is fixed" and "LLM forgot to re-mention it." For findings that represent discrete events or persistent states (a backup task that failed, a service that crashed, a security vulnerability that was found, a configuration error), absence in a Patrol report is not evidence that the issue has cleared. The result was bogus auto_resolved → re-detected → regressed cycles, observed in the wild as "Backup failed" regressing 4× over 6 hours and "Provider analysis error" regressing 271×. Those bogus auto-resolutions also inflated the trust strip with fictional auto-resolved credit. CategorySupportsStaleAutoResolve in findings.go gates the cleanup: only `performance` and `capacity` findings — continuous current-state metric thresholds — may be auto-resolved from absence. The other four categories (reliability, backup, security, general) stay active until explicitly resolved. Updates the ai-runtime subsystem contract Current State section with the whitelist and the adjacent lifecycle dedup rules already landed. Adds TestReconcileStaleFindings_SkipsNonCurrentStateCategories with table-driven subtests for all four event/persistent categories, and TestCategorySupportsStaleAutoResolve to lock in the whitelist.	2026-05-10 19:27:36 +01:00
rcourtman	a1a9003dfd	Drop duplicate loop_state lifecycle events for every transition syncLoopStateLocked was emitting a generic "loop_state" lifecycle event on every successful transition, duplicating the semantic event the caller had just emitted. A finding that auto-resolved showed two adjacent rows in the Lifecycle drawer: Auto-resolved (detected -> resolved) Loop state changed (detected -> resolved) Same from/to, same timestamp, no extra information. Every transition was paired with a duplicate. Removed the generic loop_state emission. Every caller of syncLoopStateLocked already emits the semantic event for the transition it caused (auto_resolved, regressed, dismissed, acknowledged, snoozed, suppression_lifted, reminded, etc.). The loop_transition_violation branch stays — that's the only signal that an invalid transition was rejected, not a duplicate. Adds TestFindingsStore_TransitionDoesNotAlsoEmitGenericLoopStateEvent to lock in the behavior.	2026-05-10 18:12:03 +01:00
rcourtman	43760fb0d0	Patrol runs are now stateless — drop prior session history Patrol's "patrol-main" session was reused across every scheduled run, so ExecutePatrolStream loaded the full session history into the agentic loop's input. When any prior run ended after the model emitted tool_calls but before all tool results landed (provider error, timeout, context cancellation), the orphan tool_calls were persisted and every subsequent run inherited them. The provider then rejected the conversation with: An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. (insufficient tool messages following tool_calls message) Patrol failed 271 consecutive runs with this error before it was diagnosed. Each new run added another user prompt on top of the broken structure, so the message slice grew to 33+ messages with one assistant turn at position 23 holding 4 orphan tool_call_ids and 9 user prompts stacked after it. The "patrol runs need a clean slate" comment at line 1898 documents that the knowledge accumulator is freshened per run; the conversation history was the matching gap. ExecutePatrolStream now passes only this run's user prompt to the agentic loop. The session is still written to for audit/forensics, just no longer fed back into the model. Live verified: Patrol now completes successfully (18 tool calls, 3m23s) on a session that previously failed every run with the malformed-history error. The runtime-failure finding auto-resolved on this same successful run. Adds a classifier bucket for "insufficient tool messages" errors so any future regression in this area surfaces with a meaningful diagnostic instead of the generic "Provider analysis error" fallback. New PatrolFailureCauseMalformedToolHistory cause; predicate patrolMalformedToolHistory matches DeepSeek's exact phrasing and OpenAI's similar variants. ai-runtime contract updated.	2026-05-10 17:14:47 +01:00
rcourtman	e087eff00e	Stop polluting Patrol finding lifecycle with no-op heartbeats Every Patrol scan that re-detected an already-active finding was appending a "detected (same_state -> same_state)" lifecycle event with the message "Detected by Pulse Patrol". A finding active for 6 scans rendered as 6 stacked rows reading "Detected Detected by Pulse Patrol (detected -> detected)". Backend: drop the unconditional re-detection lifecycle append in findings.go. The lifecycle should record state transitions, not heartbeats — TimesRaised and LastSeenAt already track recurrence, and the genuine transition events ("regressed", "reminded", "suppression_lifted") are emitted from their own branches upstream. Frontend: defensively hide (from -> to) spans where from === to and strip the type-label prefix from the message so already- persisted polluted lifecycle entries also render cleanly until they age out of the per-finding event cap. Adds a test that locks in the new backend behavior.	2026-05-10 17:13:07 +01:00
rcourtman	297556fb65	Surface cached preflight in Patrol tools readiness check Previously the "Patrol tools" readiness check was static model-name pattern matching: it told the operator whether the selected model is on Pulse's tool-capable allowlist, not whether tools have actually been verified to work. After today's preflight cache, that's strictly less information than what we already know. resolvePatrolToolsCheck now consults aiService.CachedPatrolPreflight() and grounds the check in real evidence when a result for the configured provider+model exists: - cached green (success + tool_call_observed) → "Tool calling verified <age> against <model>." (ready) - cached failure → classified summary + "(last preflight <age>)." (not_ready) - cached soft warning (model_tool_support_unverified) → same with warning status - no cache or model mismatch → static fallback formatPatrolPreflightAge produces stable English ("just now", "5m ago", "2h ago", "3d ago") with full unit coverage (11 cases). HandleGetAISettings now also includes patrol_readiness in its response — previously only the PUT response carried it, so the Patrol page only got augmented readiness after a save. The frontend already had patrol_readiness typed and read it from useAISettingsState. ai-runtime, api-contracts, agent-lifecycle (dep), and storage-recovery (dep) contracts updated.	2026-05-10 16:41:16 +01:00
rcourtman	f74add8271	Auto-seed Patrol preflight cache on Pulse startup Closes the cold-start gap in the preflight observability layer: every Pulse restart blanked the cached "last verified" indicator until the next save or manual click, which meant operators saw "never verified" on every upgrade or process restart even when the configured Patrol model was working fine. NewAISettingsHandler now reuses the existing aiSettingsUpdateRequiresPatrolPreflight predicate with a nil "prior config" — semantically "no in-memory cache yet, just booted." When the loaded config has assistant enabled and a Patrol model, the handler dispatches the same async TriggerPatrolPreflightAsync the save path uses. Routine boots where assistant is disabled (or no Patrol model is selected) skip the dispatch so we never write a misleading "Pulse Assistant is not enabled" entry into the cache. Live verified: after Pulse restart, /api/settings/ai surfaces a fresh patrol_preflight with success=true within ~6s of boot, no operator action required. Predicate test extended with two named cases that document the dual-purpose use (startup seed + skip-when-disabled). ai-runtime, api-contracts, agent-lifecycle (dep), and storage-recovery (dep) contracts updated.	2026-05-10 16:34:34 +01:00
rcourtman	33b6bf97d1	Auto-trigger Patrol preflight when settings save moves Patrol transport Closes the resilience gap: today an operator picks a Patrol model, saves, and Patrol silently fails on its first scheduled run because nothing verified the model actually calls tools. Now the save handler dispatches a one-shot preflight in the background whenever the change actually moved Patrol's transport, and the cached result surfaces on the next /api/settings/ai poll via patrol_preflight. Trigger conditions (aiSettingsUpdateRequiresPatrolPreflight): - new config has assistant enabled AND a Patrol model - AND any of: * no prior config (first save) * assistant was disabled, now enabled * Patrol model changed * API key for the new patrol model's provider changed Routine saves that don't touch Patrol transport (theme, control level, discovery toggles, unrelated provider keys) skip preflight entirely so they don't burn provider tokens or add 5-10s latency to every save. Service-level TriggerPatrolPreflightAsync runs the call in a goroutine with a 30s timeout. Detection helper has full unit coverage including the negative paths.	2026-05-10 16:14:26 +01:00
rcourtman	728c42e47b	Bring action endpoints onto the agent surface with the agent-stable envelope Closes the last known gap in the agent substrate. The three action endpoints (POST /api/actions/plan, /api/actions/{id}/decision, /api/actions/{id}/execute) previously emitted the platform-wide APIError shape (stable code under "code", human under "error"). The agent surface uses the inverted shape (stable code under "error", human under "message"), so adding action capabilities to the manifest as-is would have forced agents to remember which envelope each capability uses. The slice refactors actions.go to emit the agent-stable envelope across all 42 writeErrorResponse call sites. writeJSONError gains a writeJSONErrorWithDetails sibling so the 13 calls that pass field-level reasons (validation failures) preserve that information under a new optional `details` field. The action endpoints' JSON shape becomes: {"error": "<stable_code>", "message": "<human>", "details"?: {"<field>": "<reason>"}} Frontend impact: zero. Verified that no frontend code consumes the three action endpoints; the refactor is API-only. Three new manifest entries (plan_action, decide_action, execute_action) under a new "action" category, with their declared error codes pinned per capability. Internal-failure 5xx codes (audit-store outages, encode failures) are not declared per capability; agents branch on 5xx generically. TestContract_AgentSurfaceErrorCodesMatchManifestDeclarations now audits actions.go alongside the existing two handler files, with a documented internal-only allowlist for the 5xx codes. The TestAgentSubstrate_ActionEndpointsEmitAgentStableEnvelope e2e test exercises one error path through each endpoint via the actual HTTP boundary, asserting the agent-stable envelope reaches the wire and the legacy APIError fields (code, status_code, timestamp) do NOT — drift back would mean the refactor regressed. The TestContract_ActionDryRunOnlyExecutionErrorJSONSnapshot pin is updated to match the new envelope shape; the manifest's category allowlist gains "action". api-contracts.md documents the new envelope (with details map), the action governance loop's place in the substrate, the ai:execute scope distinction from monitoring:write, and the "manifest projection has a footnote" trade-off: bringing an existing endpoint into the agent surface may require migrating its error envelope, but the substrate keeps a single envelope contract rather than carrying a translation wrapper layer. agent-lifecycle.md and storage-recovery.md document the action surface joining the agent paradigm and its zero-new-persistence posture respectively. AGENT_SUBSTRATE.md's "what it does not do yet" no longer lists the action surface; it now reflects the real outstanding items (consumer feedback, an in-Pulse agent integrations panel, a distribution path for pulse-mcp).	2026-05-10 15:16:17 +01:00
rcourtman	add1096ec8	Cache Patrol preflight outcome and hydrate UI on settings load The Verify Patrol button reset its result to empty on every page load — the operator had to re-click to see the verified state, even though nothing had changed. This commit adds the observability layer of the auto-preflight plan: every RunPatrolToolPreflight result is now cached on the AI Service and surfaced through /api/settings/ai as patrol_preflight, so the inline result panel rehydrates on page load with the most-recent outcome and a "last verified Xs ago" indicator. Backend: patrolPreflightCache (mutex-guarded) on Service with defensive-copy CachedPatrolPreflight() accessor; every RunPatrolToolPreflight branch (success, soft warning, classified failure, validation early-return) records into the cache. PatrolPreflightSnapshot projects the cached result onto the AI settings response. Tests cover both success-then-failure supersession and the defensive-copy invariant. Frontend: PatrolPreflightSnapshot type mirrors the wire shape; hydratePatrolPreflightFromSettings(data) projects the snapshot into the same response shape the manual button writes; loadSettings and updateSettings flows call it. The result panel renders a "last verified Xs ago" line under the provider/model row when recorded_at_unix is present. End-to-end smoke verified against deepseek-v4-flash: panel rehydrates as green "Tool calling verified · last verified just now" after page reload. Auto-preflight on save (the trigger half of the resilience plan) follows in the next commit. Contracts: ai-runtime, api-contracts, agent-lifecycle (dep), storage-recovery (dep), frontend-primitives all updated to reflect the new patrol_preflight surface and hydration contract. Verification artifacts: settingsArchitecture + patrolPreflight client tests.	2026-05-10 14:58:26 +01:00
rcourtman	404f87854e	Pin cross-org and cross-resource isolation on the bundle's pending approvals The AgentApprovalsProvider closure in router.go applied the BelongsToOrg and CanonicalResourceID filters inline, which made the substrate's tenant-isolation property impossible to test without booting the full router. Drift in the closure (e.g. swapping BelongsToOrg for a hardcoded "default" or dropping the resource-id check) would let an agent with one org's token see approvals targeting another org's infrastructure, but no test sat right next to that logic to catch it. Extracts the body into a named function in agent_resource_context.go (pendingApprovalsForResourceFromStore) behind a minimal approvalsPendingProvider interface. The closure in router.go now delegates to it. Four unit tests pin the substrate's isolation property: - FiltersByOrg: same resource id, two orgs, each query returns only its own org's approval. - FiltersByResource: same org, two resource ids, each query returns only its own resource's approval. - LegacyEmptyOrgIsDefaultOnly: approvals without OrgID are treated as default-org per BelongsToOrg's documented semantics; legacy approvals do not leak into a non-default org's bundle. - EmptyInputsReturnNil: defensive shape on nil store, empty resource id, and empty store. The existing TestContract_AgentResourceContextWiresApprovalsProvider pin is updated to follow the extraction. Both halves of the wire-up are now pinned: router.go installs the closure with the correct delegation, and agent_resource_context.go owns the filter logic with both safety checks present. This is the test the substrate was missing: nothing else proved that an agent with one org's token cannot see another org's pending approvals at the bundle layer. Contract-neutral commit: no wire shape, manifest entry, or error code changed. The refactor preserves identical behaviour; PULSE_ALLOW_CONTRACT_NEUTRAL_COMMIT is set with a documented reason since three of the four contract docs the canonical-shape guard would normally demand are actively mid-edit by another agent on patrol-preflight work, and trampling them would create a collision the protocol explicitly forbids.	2026-05-10 14:38:10 +01:00
rcourtman	e26a57a157	Add POST /api/ai/patrol/preflight tool-call verification The existing per-provider /api/ai/test endpoints only call ListModels — they pass for every provider that returns a catalog, even when Patrol fails 100% of runs because tools aren't actually wired up. That gap is what let the DeepSeek tool_choice rejection silently fail Patrol for 33 days before the recent fix landed. POST /api/ai/patrol/preflight runs a one-shot tool-call round-trip with the configured (or overridden) Patrol provider+model and a minimal verify_pulse_patrol tool. Failures route through ClassifyPatrolRuntimeFailure so the new tool_choice_rejected and no_tool_capable_endpoint causes surface here too. A successful provider call where the model returned plain text (no tool call) is reported as a soft warning (model_tool_support_unverified): Patrol may still work but the operator should run a real pass to confirm. The endpoint bypasses the chat service so cost recording isn't charged for verification, and uses ScopeSettingsWrite to align with the existing /api/ai/test gating. Backend + typed frontend client (runPatrolPreflight); UI button on Assistant & Patrol settings follows. Contracts updated: - ai-runtime: completion obligation extended to cover the new verification surface - api-contracts: payload shape (tool_call_observed, duration_ms) noted in obligations - agent-lifecycle, storage-recovery: dependent-extension acknowledgment that ai-runtime owns the new route despite it living under internal/api/	2026-05-10 14:30:41 +01:00
rcourtman	f2d9d2aba8	Split overgreedy "tools not supported" classifier into three causes The Patrol runtime classifier collapsed three distinct upstream conditions into one misleading "Selected model does not support Patrol tools" message: 1. Provider rejected the value Pulse sent for tool selection (e.g. DeepSeek's "deepseek-reasoner does not support this tool_choice" — the model accepts tools, just not the forced coercion). The DeepSeek fix in `46145df9` dodges the symptom by coercing to auto, but the original misclassification pointed operators at the wrong remediation for 33 days. 2. Provider has no tool-capable endpoint available for the selected model (OpenRouter's "No endpoints found …" surfaces this when account-level provider/data filters exclude every tool-capable route). 3. Model truly lacks tool calling (the literal "tools are not supported" / "tool calling" cases). Each now has its own PatrolFailureCause, title, summary, description, and recommendation. summarizePatrolRuntimeFailureDetail mirrors the split. Helper predicates patrolToolChoiceValueRejected and patrolNoToolCapableEndpoint encapsulate the substring matching. The OpenRouter "No endpoints found" test fixture now correctly classifies as no_tool_capable_endpoint instead of model_unsupported_tools — fixture updates in patrol_runtime_failure_test.go, patrol_assistant_handoff_test.go, and ai_handler_test.go reflect the more accurate diagnostic. New tests cover the tool_choice_rejected and generic model_unsupported_tools paths explicitly. The ai-runtime contract is updated to note the classifier-split obligation alongside the existing transport-shape obligation.	2026-05-10 14:10:18 +01:00
rcourtman	46145df925	Coerce DeepSeek tool_choice to "auto" so Patrol stops failing DeepSeek's API server-side aliases deepseek-v4-flash and deepseek-v4-pro to deepseek-reasoner, which rejects forced tool_choice with HTTP 400 ("deepseek-reasoner does not support this tool_choice"). Pulse's classifier then surfaced this as "Selected model does not support Patrol tools," misdirecting diagnosis to the model rather than the request shape. supportsForcedToolChoice now returns false for any DeepSeek client, so every DeepSeek model falls back to tool_choice "auto" regardless of how DeepSeek routes the requested ID. The ai-runtime contract is updated to match: the provider-transport boundary now coerces forced tool_choice for every direct DeepSeek model ID, not only unknown ones. Patrol verified end-to-end: 20 tool calls, 9 findings, prior runtime failure auto-resolved.	2026-05-10 00:04:13 +01:00
rcourtman	2f0468a87b	Verify SSHSIG on in-app update artifacts The unattended timer (scripts/pulse-auto-update.sh) and the public bootstrap (scripts/install.sh, /install.sh) all verify the .sshsig sidecar against the pinned pulse-installer ed25519 key before trusting a release artifact. The in-app updater verified SHA256 only — same artifact, same root execution context, lower trust bar. Closing the asymmetry: the in-app tarball download in ApplyUpdate, adapter_installsh.go's install.sh download (piped into bash as root), and the rollback binary download now fetch and verify the .sshsig sidecar against the same pinned key, fail-closed. The signing infrastructure (release_asset_common.sh, validate-release.sh, backfill-release-assets.sh) already produces and validates these signatures for every release; this teaches the Go updater to honor what the shell paths have always required. ssh-keygen is shelled out to so the in-app updater shares the exact trust path used by the unattended path, with a package-level function variable for test injection so unit tests don't require ssh-keygen on the build host. Extends the deployment-installability contract's release-trust-fail-closed invariant to cover the in-app updater paths.	2026-05-09 23:14:07 +01:00
rcourtman	eeb2975d22	Stability sweep on the agent-substrate arc Three things landed: 1. /api/agent/capabilities was missing from publicPathsAllowlist in router_public_paths_inventory_test.go. Slice 47 added the path to publicPaths in router.go and to publicRouteAllowlist in route_inventory_test.go but missed this second mirror, which scans publicPaths via go/ast. The test was failing on origin; this commit closes the gap. 2. The error-envelope paragraph in api-contracts.md now distinguishes capability-specific stable codes (the closed set declared per capability in the manifest) from cross-cutting codes the multi-tenant / auth middleware emits universally (invalid_org, org_suspended, access_denied). The previous wording implied all stable codes lived in per-capability errorCodes lists, which would have forced duplication on every capability or misled agents about which codes to expect. 3. New contract pin TestContract_AgentSurfaceErrorCodesMatch- ManifestDeclarations enforces the symmetry both directions: every code emitted by an agent-surface handler must be either declared in the matching capability or be one of the three cross-cutting codes; every manifest-declared code must have a matching emission. Drift either way is a contract regression. Pin verified clean against the current handler set. Stale forward-reference fixed: the capabilities paragraph no longer says "future MCP-server slices read the manifest" — slice 51 already shipped that adapter. Sweep also surfaced two failures in internal/mock/ from unrelated platform-support drift (unraid token set added in `ac82a2852` but the mock contract test wasn't updated). Those are not part of the agent-substrate arc and not mine to fix; flagged in the closing summary so they don't get lost.	2026-05-09 23:04:22 +01:00

1 2 3 4 5 ...

2769 commits