Small Ollama models (qwen2.5:11b, qwen2.5:14b, similar) frequently emit
Pulse tool invocations as plain JSON inside content instead of routing
through the structured tool_calls channel. Users saw raw payloads like
`{"name": "pulse_query", "parameters": {...}}` as the assistant's final
response.
Extend cleanToolCallArtifacts and containsToolCallMarker with a new pass
that detects this leak shape, gated on a closed allowlist of canonical
tool names sourced from the runtime registry. The allowlist auto-syncs
when registerTools() gains a tool, so no separate hand-maintained list.
Anchored on (?:^|\n) and a leading `"name"` key so prose containing JSON
fragments, unrelated objects (`{"foo":"bar"}`), or named resources
(`{"name":"my-vm","cpu":50}`) are left untouched.
Cluster peers without a Pulse Unified Agent are still monitored via the
cluster's PVE API token — the agent adds richer telemetry (hardware,
sensors, OS metadata) but isn't the only source of monitoring. The
previous "N node(s) unmonitored" copy misrepresented the state.
Reframes the banner as an action opportunity, matching the adjacent
"Review & Deploy" CTA. Gate logic unchanged.
The "N nodes unmonitored" banner gated purely on absence of a Pulse
Unified Agent (r.agent?.agentId). Offline cluster members (status
'offline') were counted as deploy candidates, which is wrong on two
fronts: those nodes are typically still covered by the cluster's PVE
API token (so they aren't truly unmonitored, just unreachable), and an
offline host is precisely the case where deploying an agent cannot
succeed.
Exclude status === 'offline' from the unmonitored count.
The previous comment claimed DeepSeek's API aliases v4-flash/v4-pro to
deepseek-reasoner, justifying the auto coercion via the legacy
reasoner's known 400 behavior. That had the alias direction inverted:
per DeepSeek's pricing page, deepseek-chat and deepseek-reasoner are
deprecated aliases for v4-flash's non-thinking and thinking modes
respectively, not the other way around.
The coercion itself is empirically correct, though. Live preflight
against deepseek-v4-flash with tool_choice=required produces a
deterministic HTTP 400 ("provider rejected forced tool selection") in
275ms. The behavior is consistent across the DeepSeek user community
- multiple downstream projects (pydantic-ai #5193, claude-code-router
#1378, opencode #24190, others) confirm V4 models reject forced tool
selection despite DeepSeek's chat-completion docs listing required as
a supported value. Server reality disagrees with documentation.
This commit updates only the comments in openai.go and openai_test.go
to point at the empirical evidence and the community confirmation.
The coercion behavior is unchanged.
The readiness banner previously rendered only the summary string
("Provider connection issue (last preflight 2h ago)"), even though
the underlying payload already carried the provider, model, preflight
duration, tool-call observation, and recommendation. Operators had to
open dev tools to find out which provider failed and what to check.
Now the banner inlines:
- Provider and model that preflight tried to reach
- Preflight duration and whether a tool call was observed
- The preflight recommendation text
The state hook gains a patrolPreflight memo that reads from the same
AISettings payload the rest of the patrol surface already consumes.
Found by exercising pulse_summarize in real chat: the user asked a
question, the model called the tool successfully (response came back
with narrative_source: ai), but the chat panel ended with raw DSML
text and "Assistant response is ready" — no actual prose answer.
Root cause was in agentic_sanitize.go. The fast-path string list
only checked single-pipe DSML variants ("<|DSML|...>"), but
deepseek-v4-flash emits the double-pipe form ("<||DSML||...>").
The opening sequence didn't match any marker, so cleanToolCallArtifacts
returned the content unchanged. The chat orchestrator then showed
the raw DSML to the user as if it were the assistant's answer.
Fix:
- Add double-pipe variants (Unicode and ASCII) to the fast-path
marker list. Six new entries, mirroring the existing single-pipe
entries.
- Add a backstop regex (dsmlRe) that matches any pipe-count variant
via `</?[\||]+/?DSML[\||]*`. Future model behaviour with triple
or higher pipe counts gets caught without another fast-path edit.
- containsToolCallMarker gets the same coverage so streaming
detection stops forwarding content the moment the marker
appears, regardless of pipe count.
Tests in agentic_sanitize_test.go gain three new cases for cleanup
(double-pipe Unicode, double-pipe ASCII, triple-pipe regex
backstop) and two for detection (double-pipe Unicode, double-pipe
ASCII). All passing.
Process note: this bug was invisible to existing tests because the
test suite only covered the single-pipe variants that were
documented in the marker list. The double-pipe form only appears
when an actual model emits it. Same pattern as the JSON-casing fix
earlier — exercise the surface against a real LLM, find what tests
in isolation can't see.
The previous rip retired only active "Active alert detected" findings
from the now-removed detectAlertSignals -> SignalActiveAlert emitter.
Resolved instances were left in place, polluting the Resolved tab and
inflating the regressed total on the trust strip (a stale 8 resources
each marked "regressed 3x" with descriptions like "Active warning
alert: Container 'ollama' is powered off"). They have no canonical
operator value -- the Alerts surface is the source of truth for
currently-firing alerts -- so on load we now purge them entirely
rather than keeping them around as Resolved noise. Active mirrors are
still retired (auto-resolved with a clear reason) so operators see
why the finding closed; resolved mirrors disappear silently because
they were already in the terminal state. Idempotent.
Extends TestFindingsStore_SetPersistence_RetiresLegacyAlertMirrorFindings
with a fixture for the resolved-mirror case and asserts both the
in-memory purge and the persisted state no longer carries it.
Found by exercising pulse_summarize in a real chat session. The
chat-tool response surfaced:
"observations": [{"Text": "...", "Severity": "info"}]
The AI narrator's system prompt (report_narrator.go) tells the
model to emit lowercase keys:
{"text": "...", "severity": "..."}
The model was being taught one schema and shown a different one
in the tool response for the same shape. NarrativeBullet,
FleetOutlier, Narrative, and FleetNarrative had no JSON tags, so
embedded struct fields serialised with their Go names.
Add struct tags so the wire shape matches the prompt schema. Pure
marshaling change — JSON tags don't affect Go field access, so
PDF rendering (which reads fields directly) is unchanged. Tests
in narrative_json_test.go pin the shape so the inconsistency
can't reappear silently.
Process note: this is a class of bug that only appears when an
LLM actually consumes the output. No unit test caught it; no
review of the code showed it; the model running through the chat
path is what surfaced it. Another argument for "exercise the
artifact" — even the tool surface that looks correct in
isolation has hidden inconsistencies you only see when something
external reads it.
loadDashboardData and other unscoped loadPatrolFindings callers were
overwriting the Resolved/All tab data set with the active-only payload
on every poll, causing those tabs to blink empty between refreshes. A
module-level sticky preference now flips to true the first time any
caller asks for includeResolved and keeps subsequent unscoped loads
expanded, so the Resolved and All tabs hold their data across polling.
Investigate's prompt instructs the LLM to "actively use the Pulse
tools to gather fresh evidence about the current state of the
affected resource." Verify fix asks "did the applied fix actually
clear the underlying condition right now." Both prompts are about
CURRENT state — irrelevant for findings whose status is resolved,
dismissed, or snoozed, where there is no current condition to
investigate or verify against.
Gates both buttons on finding.status === 'active'. Explain, Why,
and Discuss stay on closed findings — retrospective questions
("walk me through what we know," "what caused this") remain
meaningful for audit-trail review.
(The hasAppliedFix gate stays on Verify fix; the new active-status
gate just adds an additional necessary condition.)
The Resolved-tab fix triggered an includeResolved load when the
operator selected 'resolved'. The 'all' filter had the same
problem one layer over — it shows whatever's in sourceFindings()
without further filtering, but sourceFindings was loaded
active-only, so All and Active rendered identically.
Extends the same load trigger to the 'all' filter so both
history-bearing tabs (All and Resolved) hydrate the full
active+resolved+dismissed+snoozed payload.
Found by actually generating two PDFs against the dev server and
holding them in hand — neither was visible by reading code alone.
1. HEALTHY on empty data was misleading. A report against a resource
with zero data points and no alerts showed a green HEALTHY card
with "All systems operating normally," contradicting the
"Data Points: 0" line on the cover. A user reading the report
would believe their resource was operating cleanly when really
Pulse had no metrics to evaluate. writeExecutiveSummary now
detects TotalPoints == 0 and len(Summary.ByMetric) == 0 and
renders a muted grey "NO DATA / No metrics reported during the
selected window" card instead.
2. AI discoverability gap. With AI unconfigured (or failing), the
PDF is functionally identical to what it was before the AI
narrative work landed — no AI prose, no period comparison, no
provenance footer. A user has zero signal that AI-narrated
reports are a separate Pulse Assistant capability. Adds a
one-line muted tip at the end of the executive summary when
Narrative.Source == NarrativeSourceHeuristic pointing at
Settings. Fleet path gets the same nudge scoped to fleet
synthesis. Mutually exclusive with the AI provenance disclaimer
so we never show both.
Tests in pdf_ux_test.go inflate FlateDecode'd content streams to
substring-check the actual rendered text, covering empty-data ->
NO DATA, quiet-with-data -> HEALTHY (regression guard), heuristic
narrative -> tip, AI narrative -> disclaimer + no tip, and the
fleet-heuristic tip.
The trust strip on the Patrol page credits "N auto-resolved" but
the Resolved tab next to it sat empty — operators could see the
count but not click through to audit which findings had been
resolved or by what mechanism. The /api/ai/patrol/findings
endpoint only returned active findings, so the frontend filter
(status === 'resolved' || 'dismissed' || 'snoozed') had nothing
to render.
Adds the audit-trail accessor end to end:
- PatrolService.GetAllFindingsIncludingResolved returns active +
resolved + dismissed + snoozed findings at warning severity or
higher, sorted with active first then by severity then recency.
Two separate severity orderings — filter (info=0..critical=3,
used with >= against the warning floor) and sort
(critical=0..info=3, used with < to surface critical first).
Conflating them initially let watch findings leak through the
warning floor; the test fixture catches that.
- HandleGetPatrolFindings honors a new include_resolved=1 query
parameter that routes to the new accessor. Default behaviour
(active only) is unchanged for clients that just want the live
findings list.
- Frontend getPatrolFindings accepts an options object with
includeResolved and loadPatrolFindings threads it through.
- FindingsPanel triggers an includeResolved load whenever the
Resolved filter becomes active for the Patrol-source view.
Test: TestPatrolService_GetAllFindingsIncludingResolved_IncludesResolvedAndDismissedSortsActiveFirst
covers active-first ordering, inclusion of resolved + dismissed,
and the warning severity floor (watch-level findings must not
leak through).
ResolveFinding adapter previously logged a warning and allowed the
LLM's resolve to proceed when the deterministic verifier returned
an error (timeout, executor unavailable, etc.). That's fail-open:
any verifier failure let the auto_resolved → re-detected cycle
continue, exactly the pattern the rest of this branch's
patrol_resolve_finding work spent commits closing. The "Backup
failed" finding on the live preview still cycled once post-
migration because of this path — verifier returned an
ErrVerificationUnknown and resolve was permitted.
Resolution of an event/persistent category finding is effectively
permanent (next detection registers as a regression and inflates
counters and pollutes the trust strip). When the deterministic
verifier cannot confidently say the failure signal is gone, we
don't have grounds to honor the LLM's judgment — the LLM's
"current investigation didn't surface a fresh failure" is exactly
the unreliable signal that produced bogus cycles.
Switches the inconclusive-verifier branch from log-and-allow to
log-and-reject, returning an error to the tool so the LLM can
retry or escalate to the operator. The verifier-still-detects-
signal path stays as-is (it was already fail-closed).
Test: TestPatrolFindingCreatorAdapter_ResolveFinding_RejectsWhenVerifierIsInconclusive
exercises the path by calling ResolveFinding on a backup-failed
finding through a PatrolService with no chat service wired
(getExecutorForVerification returns ErrVerificationUnknown). Asserts
the error mentions 'inconclusive' and that ResolvedAt remains nil.
Contract: extends the deterministic-resolve-gate clause in the
ai-runtime canonical-files completion-obligations to name the
fail-closed-on-inconclusive policy explicitly.
Adding the four new rubric buttons (Investigate, Why, Verify fix,
Create rule from this) plus the Remember-as-expected rename
expanded the action area to 15 buttons in two undifferentiated
flat rows — same visual weight, no intent grouping. Operator had
to read the row left-to-right to discover what was where.
Row 1 splits into:
- Ask Pulse Assistant cluster: Explain | Investigate | Why |
(Verify fix, conditional) | Discuss
- Operator memory + share cluster: Add Note | Copy summary
Row 2 splits into:
- Decide cluster: Acknowledge | Mark resolved
- Delay cluster: Snooze 1h | 24h | 7d
- Dismiss/promote cluster: Dismiss: Not an issue | Remember as
expected | Dismiss: Later | Create rule from this
Each cluster sits in its own flex container; clusters separated
by a faint vertical divider on sm+ screens (border-l +
border-border-subtle) and stack with extra gap on narrow screens.
"Discuss with Assistant" shortened to "Discuss" to free horizontal
space without losing meaning — the surrounding cluster context
already says "Pulse Assistant."
No new behaviour. The button handlers, prompt builders, API
helpers, and conditional gates are unchanged. This is pure layout
so the surface matches the intent hierarchy the seven-button
rubric implies.
Last of the seven contextual entries from the captured Pulse
Intelligence rubric. "Remember as expected" handles one
instance; "Create rule from this" promotes the pattern: any
future finding on the same {resource, category} pair auto-
dismisses inside the backend's existing
FindingsStore.isSuppressedInternal /
MatchesSuppressionRule machinery rather than surfacing as a new
finding.
The backend endpoint already exists
(POST /api/ai/patrol/suppressions →
HandleAddSuppressionRule → FindingsStore.AddSuppressionRule).
The button + inline confirm panel is the missing surface:
- frontend-modern/src/api/patrol.ts gains
createSuppressionRuleFromFinding(input) that POSTs to the
existing endpoint with the finding's resource + category +
operator-supplied reason.
- FindingsPanel adds a Create rule button at the end of the
lifecycle action row, plus an inline confirmation that
surfaces the rule scope (resource + category), requires a
reason, and explains the future-auto-dismiss commitment.
Submission goes through aiIntelligenceStore.loadDashboardData
so the local view reflects the audit trail of record.
- Mirrors the visual pattern of the existing dismiss-confirmation
panel but uses neutral surface styling because this isn't a
dismissal — it's a permanent commitment, distinct from
Remember as expected which dismisses just this instance.
No backend changes; the rule machinery and the API endpoint are
unchanged. This is the surface piece. Type-check clean.
The locked-state description advertised the v5 capability set ("PDF
and CSV performance reports plus current-state VM inventory
exports") and never caught up with what the v6 reporting feature
actually delivers behind the gate: AI-narrated executive summary,
fleet outlier detection with named resources, period-over-period
comparison, and Patrol findings rolled into the narrative.
Update both the LockedState (what non-Pro users see on the upsell)
and Guidance (what Pro users see on the enabled surface) copy so
they match what ships. The locked-state copy is honest about the
AI being optional — narration uses Pulse Assistant when configured
and falls back to a deterministic summary otherwise — so users who
haven't set up Assistant don't think the Pro feature is gated
behind a separate AI configuration.
No structural change to the catalog: same fields, same JSON shape,
same downstream consumers. Frontend renders these strings directly
from the catalog endpoint, so the copy update propagates without
any frontend code change.
Fifth of the seven contextual entries from the captured Pulse
Intelligence direction. The captured rubric named this button
"Remember this is expected" — future-looking, "Pulse should know
this state is expected" — but the surface labelled it
"Dismiss: Expected", which reads past-looking and groups with
two unrelated dismissal intents (Not an issue, Later).
Renames the button to "Remember as expected" and updates the
confirmation panel:
- Header verb tracks intent: "Remembering as expected" for the
expected_behavior reason, "Dismiss as: ..." for the other
reasons.
- Confirmation copy now says Pulse will "remember that this state
is expected on this resource" alongside the existing
acknowledgement-and-no-renotify framing.
- Button tooltip explains the future-looking commitment so
operators understand the intent before clicking.
No wire change — the dismiss reason is still expected_behavior,
and operator-state on the resource still drives auto-acknowledge
behaviour for future similar findings (already in place from
earlier work). This is the rubric-alignment piece; the
operator-memory machinery underneath is unchanged.
Fourth of the seven contextual Assistant entries. Verify fix is
the post-remediation confirmation step: after a fix has run, the
operator asks the Assistant whether the underlying condition
actually cleared, rather than trusting the fix command's exit
code or the LLM's prior self-verification.
- Widens PatrolAssistantFindingIntent to include 'verify_fix'.
- buildPatrolAssistantFindingPrompt gains a verify_fix branch
that directs the LLM to check the current evidence against the
original signal that fired the finding (metrics, resource
state, recent alerts, service health), then synthesize: is the
condition cleared, what evidence supports that judgment, how
confident, and is there residual risk to monitor for. Tool
calls are allowed; state-changing commands are explicitly
forbidden — verification is read-only.
- FindingsPanel adds a Verify fix button after Why, gated by
hasAppliedFix() which returns true for investigation outcomes
fix_executed, fix_verified, fix_verification_failed, and
fix_verification_unknown. For fix_queued (no fix has run yet)
and fix_failed (fix didn't complete) the button is hidden
because there is nothing applied to verify.
- autoSendInitialPrompt extends to verify_fix; Discuss with
Assistant unchanged.
Test: new verify_fix-intent prompt-builder case asserts the
verification dimensions (condition cleared, evidence, confidence,
residual / monitor) and the read-only safety boundary, and
isn't either Discuss or Investigate phrasing.
Third of the seven contextual Assistant entries from the captured
Pulse Intelligence direction. Where Explain says "tell me what we
know" and Investigate says "go find out what's true now," Why
says "what caused this":
- Widens PatrolAssistantFindingIntent to include 'why'.
- buildPatrolAssistantFindingPrompt gains a why branch that
directs the LLM toward cause signals — recent changes around
detection time, learned correlations, prior incident memory,
regression history — rather than current state. The prompt
asks for: what most likely caused this to fire now, what
evidence in the attached context supports that cause, what
would have to be true for the cause to recur. Tool calls for
verification are allowed; state-changing commands still
require operator approval.
- FindingsPanel adds a Why button between Investigate and Copy
summary, with a handleWhyFinding handler routing through
openFindingInAssistant.
- autoSendInitialPrompt now triggers for explain / investigate /
why; Discuss with Assistant unchanged.
Test: new why-intent prompt-builder case asserts the prompt
mentions cause-focused signals (recent changes, correlations,
prior incidents, regressions), the synthesis dimensions (caused,
evidence, recur), and the operator-approval boundary, and isn't
either Discuss or Investigate phrasing.
The parent-package audit (TestCostRecordingCoverage in
internal/ai/cost_recording_audit_test.go) only walks
internal/ai/*.go — sub-packages were unguarded, which is the gap
that let the chat cost-recording bug land in the first place (the
agentic loop in internal/ai/chat/ called ChatStream without anyone
recording cost; fixed in a0b3bc7ed).
Adds a parallel audit at internal/ai/chat/cost_recording_audit_test.go
covering both .Chat() and .ChatStream() callers. Same shape as the
parent audit but extended to include the streaming variant, since
the chat package is built around the agentic loop's ChatStream
call site.
The chat package's orchestrator/loop split means recording lives
in chat.Service.recordChatTurnCost — the orchestrator that owns
the loop — rather than inside the loop methods that call
ChatStream. Two doc-comment exemptions cover that:
- AgenticLoop.ExecuteWithTools / executeWithTools (agentic.go)
- AgenticLoop.ensureFinalTextResponse (agentic_final.go)
Both name the orchestrator that records, so a future contributor
moving recording elsewhere has a clear pointer to update.
A new ChatStream caller added to the chat package without either
recording or an exemption marker will now fail this audit, the
same way the parent audit would have caught QuickAnalysis or
Narrate if they had been written without recording.
Pulse Intelligence direction named seven contextual Assistant entry
points: Explain, Investigate, Prepare fix, Verify fix, Why did
this happen, Remember this is expected, Create rule from this.
Explain was the only first-class one. Investigate was rolled into
"Discuss with Assistant" with a generic open-ended prompt.
Splits Investigate off as its own action. Where Explain says
"tell me what we already know," Investigate says "go find out
what's true right now":
- Widens PatrolAssistantFindingIntent to 'discuss' | 'explain' |
'investigate'.
- buildPatrolAssistantFindingPrompt gains an investigate branch:
the prompt explicitly instructs the LLM to use its Pulse tools
(metrics, alerts, resource state, recent changes, correlations)
to gather fresh evidence, then synthesize root cause +
confidence + safe next step + whether the recommended action
still holds. Any command-running step must route through
governed approval, not the LLM's own judgment.
- FindingsPanel adds an Investigate button between Explain and
Copy summary, and a handleInvestigateFinding handler that
routes through openFindingInAssistant with the new intent.
- Both Explain and Investigate now set autoSendInitialPrompt:
true (single condition update); Discuss stays false.
Test: a new investigate-intent prompt-builder case asserts the
prompt mentions active tool use, the synthesis dimensions
(root cause, confidence, safe next step), and the governed-
approval safety boundary, and isn't the Discuss seed.
Pulse Assistant has been reactive: clicking Explain on a Patrol
finding opened the drawer, pre-filled a substantive investigation
prompt, and then waited for the operator to press Enter. The
captured Pulse Intelligence direction calls for the opposite —
when the operator clicks an action button, the analysis should
already be in flight by the time they land on the drawer, not
parked on a textarea waiting for a second confirmation step.
Adds autoSendInitialPrompt to AIChatContext. When true, the chat
surface fires handleSubmit immediately after the initialPrompt is
written to the input (deferred via queueMicrotask so the input
signal has propagated and the drawer-open effects have settled).
handleSubmit's existing guards against empty prompts and concurrent
submissions make this safe. clearAutoSendFlag is the symmetric
clearer so the flag doesn't persist across opens.
Wires Explain to autoSendInitialPrompt: true; Discuss with
Assistant stays false because that entry is open-ended by
design. The same plumbing is what future action-style entries
(Investigate, Verify fix) will route through.
Tests: two new aiChat-store cases lock in the flag plumbing and
the default-undefined behaviour. All 15 aiChat-store + 70
AIChat-component tests pass. Verified live: clicking Explain
clears the textarea and surfaces the in-flight Thinking
indicator without the operator pressing Enter.
The Infrastructure workspace previously inverted the default by
initializing focusedNavigationExpanded=false, which meant landing on
Settings (which defaults to the Infrastructure tab) always opened with
the navigation collapsed to a 4rem rail. On a typical desktop the 18rem
sidebar isn't crowding the workspace, but the rail-by-default cost a
click on every settings visit.
Remove the focused-navigation special case and the now-trivial wrappers
around setSidebarCollapsed; let Infrastructure share the same
sidebar-defaults-expanded behavior as every other settings tab. Manual
collapse/expand via the chevron is unchanged.
Two small UX touches on Settings → Infrastructure.
1. The empty-cell placeholders for endpoint and coverage on the cluster
header row and on member rows were rendered with an ASCII hyphen while
elsewhere in the same table (the member actions cell, the 'unknown'
Source badge, and other tables across the app) the convention is an
em-dash. Unify on em-dash so the placeholder reads as a single visual
language.
2. The cluster header row's lastActivityText is an aggregation: the
oldest timestamp across the cluster API connection, member liveness,
and member agent connections. That biases toward surfacing the most
stale source, but the column header just says 'STATUS' so a viewer can
mistake it for a most-recent timestamp. Add a title attribute on the
cluster row's timestamp explaining the aggregation. Non-cluster rows
keep no tooltip since their timestamp is single-sourced.
Two adjacent UX wrinkles on Settings → Infrastructure:
1. The primary cluster member's subtitle read "API contact" while the
Source column rendered an "Agent" badge directly beside it. Two unrelated
axes (role within cluster vs how Pulse reads this row) shared visually
adjacent space and parsed as a contradiction. Rename the subtitle to
"Primary node" so it no longer collides with the Source badge.
2. A clustered PVE member without an attached agent fell back to
Source = 'unknown' (rendered as "— No source attached"). That row IS
read via the cluster API, so 'unknown' is incorrect. Default the
fallback to 'api'; agent-installed members keep 'agent'. Invisible in
the current single-cluster homelab setup but accurate for mixed-agent
clusters going forward.
Follow-up to 112d42801: the test asserted the old hardcoded 'api' badge
for a PVE cluster whose members both carry agents. With the row builder
now using sourceFor() aggregation, the expected source is 'both'.
The Infrastructure connected-systems table hardcoded the cluster header
row's source to 'api', so a PVE cluster whose nodes all carry agents
displayed as 'API' while a standalone PVE host with the same setup
correctly displayed as 'API + Agent'. The backend already folds member
agents into ConnectionSystem.Components as attachments
(connections_grouping.go), so the row builder can use the same
sourceFor() aggregation it uses for non-cluster rows.
The patrol-main chat session was reused across every scheduled
Patrol run with no upper bound. After a month of runs the file
had grown to 16 MB / 3,593 messages, and every AddMessage
rewrote the whole file to disk — so the I/O cost per Patrol
run was scaling with total session age, not with the run's own
output. Across all chat sessions on this dev instance, the
ai_sessions directory hit 676 MB / 1,629 files.
The stateless-Patrol-input fix (commit 43760fb0d) stopped
loading the session back into the agentic loop, but Patrol
still wrote each run's messages to the session for the Pulse
Assistant sidebar's forensic view. That write path is what
this commit bounds.
ExecutePatrolStream now calls SessionStore.TrimMessages(200)
after each run, keeping roughly the last two runs' worth of
messages — enough for the sidebar to show recent activity, far
short of unbounded growth. The next Patrol run on a bloated
session will drop the historical 3,000+ messages down to 200
on its first write, so existing storage debt clears on its
own without a separate migration.
User-driven chat sessions are unaffected: TrimMessages with
keepMostRecent <= 0 is a no-op, and callers that want full
history retention simply don't call it. Only Patrol's
forensic session is capped.
The canonical Patrol forensic log is the PatrolRunRecord
history surfaced at /api/ai/patrol/runs — that's the durable
record with structured fields. The chat-session-shaped file
is a sidebar convenience, not the source of truth.
Three tests guard the boundary:
- TrimMessages keeps the most recent N (50 messages
trimmed to 10 → messages 40-49 remain)
- TrimMessages is a no-op below threshold (5 messages,
cap 200 → 5 messages remain)
- TrimMessages with non-positive keep is a no-op (3
messages, cap 0 or -5 → 3 messages remain)
ai-runtime contract updated.
Follow-up to a0b3bc7ed which closed the chat.Service cost-ledger
gap. ai-runtime.md gains a Current State paragraph documenting:
- The pre-fix bug (chat accumulated tokens via SSE done envelope
but never recorded a cost.UsageEvent server-side; chat is the
bulk of AI token spend so the dashboard was dramatically
understating cost).
- The fix shape (recordChatTurnCost runs after every loop return,
success or error since the operator was billed regardless).
- The threading path (chat.Config.CostStore wired by the router
from AISettingsHandler.GetAIService.CostStore()).
- The double-recording invariant (ExecutePatrolStream is
deliberately not changed; its caller patrol_ai.go records via
its own helper).
- UseCase="chat" matches the canonical taxonomy noted on
cost.UsageEvent.UseCase ("chat" or "patrol").
chat.Service.ExecuteStream was a long-standing cost-ledger gap: the
agentic loop accumulated token counts via stream callbacks (see
GetTotalInputTokens / GetTotalOutputTokens in agentic_control.go)
and surfaced them in the SSE done envelope to the frontend, but
nothing on the server side recorded a cost.UsageEvent. Patrol,
discovery, QuickAnalysis, and the report narrators all record; only
chat — the bulk of AI token spend — did not. The operator's AI
usage dashboard was therefore understating cost dramatically.
Found while extending the cost-recording mindset across subpackages
after fixing QuickAnalysis (08491b9f4). Initially spawned as a
separate task but the right shape and scope became clear, so landing
it directly here.
Pipeline:
- Service.CostStore() exposes the per-tenant cost store handle.
- chat.Config gains optional CostStore *cost.Store field, threaded
into chat.Service.costStore at NewService time.
- chat.Service.recordChatTurnCost records a UsageEvent with
UseCase="chat" after every loop.ExecuteWithTools return (success
OR error — operator was billed regardless of clean response).
Skips when costStore is nil or zero tokens accumulated.
- ai_handler.go's two chatCfg construction sites populate CostStore
via h.resolveCostStore(ctx).
- router wires the resolver to AISettingsHandler.GetAIService(ctx).CostStore()
with no Enabled gate — even brief chat usage while AI was being
configured should appear in the dashboard.
ExecutePatrolStream is deliberately not changed. It creates a
separate tempLoop and its caller (patrol_ai.go) records cost via
its own helper at line 887. Recording in ExecuteStream only avoids
double-counting on the patrol-via-chat path.
Tests in chat/cost_recording_test.go cover: recording when store
configured, no-op when store nil, no-op on zero tokens (early
failures), graceful handling of model strings missing the
provider prefix.
Extends 688d00a55 (Workloads guest rows) to the remaining tables and
shared cells: Infrastructure host/PMG/PBS rows, Alerts desktop and
mobile rows, Recovery history and protected inventory, AI cost
dashboard cards, ResponsiveMetricCell fallback, the infrastructure
summary table CPU-temp cell, the AGENT/CHECKER node columns under
Settings → Infrastructure, and the help-icon example bullets.
Each "—" or "-" that signals "no value here" now carries
aria-hidden="true" so screen readers skip the decoration and announce
just the cell label / surrounding context. Dashes with informational
title attributes (e.g. "Disk stats unavailable…") are left audible —
the title is the accessible name and should be read.
Verified live: 24 of 25 dash spans on the Workloads page now hidden,
the only audible one is the title-bearing disk-status span.
CleanupAlertsForNodes removes alerts whose Node isn't in
existingNodes. That map is built upstream only from state.Nodes
(Proxmox nodes) and state.PBSInstances (PBS instances) — no agent
resources. Agent-sourced alerts (Unraid, standalone Linux hosts,
TrueNAS, anything reached via Pulse Agent) typically have Node=""
or an agent UUID, so they fall into the cleanup branch every
cycle. Then the next poll re-creates the alert as new, calls
AddAlert, and appends a fresh history row. Observed in the wild:
3,980 alert history entries in 7 days, with the same canonical
alert ID (e.g. "Unraid array running without parity protection")
appearing every 30 seconds.
Adds a carve-out for ResourceID prefixes starting with "agent:",
matching the existing pattern for "docker-" / "docker:" and
"pbs-" / "pbs-offline". Locks in the behaviour with a new subtest
that mixes agent-sourced and Proxmox-sourced alerts and asserts
that only the legitimately stale Proxmox alert is removed.
Defense-in-depth for the malformed-history bug pattern. The
Patrol fix made patrol-main runs stateless, but Assistant
chat sessions are inherently multi-turn and must keep their
history. Any chat session that ends mid-tool-call — network
drop, ctx timeout, browser crash, uncaught panic, any
interrupt that fires between "model emits tool_calls" and
"agentic loop appends all tool results" — leaves the
persisted session with orphan tool_call_ids. The next message
that loads this history is rejected with the same provider
error that flapped Patrol for 33 days:
An assistant message with 'tool_calls' must be followed by
tool messages responding to each 'tool_call_id'.
For Patrol this was fixable by ignoring the session. For
Assistant it isn't; the conversation context is the product.
convertToProviderMessages now ends with a repairOrphanToolCalls
pass that scans every assistant message with tool_calls and
inserts synthetic is_error tool result messages immediately
after the assistant turn for any tool_call_id that has no
matching downstream result. The synthetic content is marked
is_error=true and explains the interruption so the model can
retry the same call or proceed without that data — preserving
conversational continuity while satisfying the provider's
structural-validity check.
This guards every conversation that crosses
convertToProviderMessages, not just Assistant chat. If Patrol
ever changes back to loading session history, the same safety
net applies. If a new entry point appears for some other LLM
flow, it gets the repair for free.
Three tests guard the boundary:
- Orphan injection (3 tool_calls, only 1 result → 2
synthetic results, marked is_error with interrupted
explanation, ordering preserved)
- Clean no-op (all tool_calls fulfilled → no synthetic
messages, no is_error pollution)
- Existing truncation test still passes (assistant message
with both tool_calls and own tool_result → no repair
needed, tool_call_id matches in same message)
ai-runtime contract updated.
Workloads guest rows render — (or -) as a visual no-data signal in
every cell where a guest doesn't expose that metric (info, vmid,
disk, ip, uptime, node, image, namespace, context, backup). Each one
was a plain <span>, so screen readers narrated "dash" alongside every
cell label.
Mark every dash that conveys "no value" with aria-hidden="true" so SR
users hear the column label and skip the placeholder. Dashes that
carry an informational title attribute (e.g. "Disk stats
unavailable…") are intentionally left visible to assistive tech —
title is the accessible name and replacing it with aria-hidden would
drop real context.
Visual unchanged; tested live via DOM probe — 25 of 27 dash spans on
the Workloads page now carry aria-hidden, with the two title-bearing
dashes still announceable.
The overall-health "Recent Patrol errors" coverage factor in
summarizeRecentPatrolCoverage was anchoring the score to a
stale ratio: it counted errors across the last 10 runs without
weighting recency. After Pulse fixed two compounding Patrol
bugs today, four consecutive successful runs (50+ tool calls
each) followed six earlier failures. The assessment kept
showing C/65 with the prediction "most recent Patrol runs
encountered errors (6 of 10)" — directly contradicting the
fact that *every* recent run had succeeded.
Operators reading that score would conclude Pulse Patrol is
still broken. It isn't. The fix dragged the grade.
This commit adds a recovery-suppression check: count trailing
successful full Patrol runs from the most-recent end of the
window (GetAll returns newest-first), skipping non-full runs.
When three or more consecutive trailing successes exist —
roughly a 9-hour clean stretch at the default 3-hour cadence —
the error penalty drops entirely. The score reflects current
reality.
Three is conservative: a single recovery run could be a
transient win; three consecutive demonstrate the underlying
fix is sticking. Below the threshold, the existing ratio-tiered
penalty still applies so partially-recovered states still
register.
Two tests guard the boundary:
- 6 historical errors + 3 trailing successes → no coverage
factor (suppressed)
- 6 historical errors + 2 trailing successes → coverage
factor remains (recovery incomplete)
Live verified after this commit lands: the assessment that's
been stuck at C/65 since the malformed-history fix will
recompute to A/B grade as soon as the trailing 3 successful
runs are recognized by the same recent-runs query.
ai-runtime contract updated.
The Pulse Assistant briefing and prompt for a powered-off alert
rendered "Current value 0.0%; threshold 0.0%" because the backend
sends value=0 and threshold=0 for state alerts (which have no
metric semantics). That line is misleading to the operator and
gives the LLM no useful signal.
Adds isMetricAlertType / isStateAlertType helpers to
frontend-modern/src/utils/alerts.ts naming the state-alert set
(powered-off, unreachable, offline, host-offline, connectivity,
docker-container-state, docker-container-health,
docker-host-offline). State alerts represent binary or enumerated
conditions, not metric threshold crossings.
The alert handoff builder routes through that helper:
- Briefing detailLines omit the value/threshold line when the
alert is a state alert.
- Prompt omits the **Current Value:** and **Threshold:** lines.
- Prompt now includes **Message:** so the actual signal is
surfaced (was previously dropped from the prompt).
- Prompt step 2 swaps "Check related metrics" for "Check what
changed recently for this resource (state events, recent
commands, related alerts)" — the right question for a
binary-state alert.
Two new tests cover the state-alert and metric-alert branches.
The reporting feature now ships across two surfaces (PDF/CSV export
and pulse_summarize chat tool) and three modes (single-resource,
fleet, summarize). Without usage telemetry we can't tell whether the
work earns its place — operator demand, AI-vs-heuristic adoption,
range/format preferences are all invisible. Stops further feature
investment from being pure speculation.
Three new info-level log events, structured so an agent can grep
transcripts and group by dimension without a separate metrics
pipeline (matches the "agent owns ops analysis, human gets outcomes"
posture in MEMORY.md):
reporting.single.generated — single-resource PDF/CSV
reporting.fleet.generated — multi-resource fleet PDF/CSV
reporting.summarize.invoked — pulse_summarize chat tool (both modes)
Common dimensions: org_id, format/action, range, ai_configured,
findings_configured, window_start/end. Single-resource adds
resource_type + metric_type + bytes; fleet adds resource_count +
bytes; summarize adds resource_type + resource_count (fleet mode) +
narrative_source (so we can audit AI-fallback rate).
Includes rangeLabel() helper that maps a window to the canonical
catalog range token (24h/7d/30d) with a 1h tolerance, falling back
to "<hours>h" so non-standard windows still group. Tested.
TestReportingTelemetryEventNames pins the canonical event names as
a contract — an agent grepping logs depends on them being stable;
changing them silently would break audit tooling on the consumer
side.
The reporting engine already logs the resolved narrative source
(heuristic/ai) at debug level via the existing "Generating report"
line, useful for diagnosing why a specific report fell back. Kept
at debug; the new info-level events cover the operator surface.
Follow-up to 03463c1bf which threaded the per-tenant report
narrators through chat.Config -> tools.ExecutorConfig ->
PulseToolExecutor so pulse_summarize can produce AI-narrated
synthesis in chat instead of heuristic-only. ai-runtime.md's
Current State paragraph documents the wiring:
- chat.Config carries three optional fields (ReportNarrator,
ReportFleetNarrator, ReportFindingsProvider) threaded through
to the executor at session construction time.
- The router installs a SetReportNarratorResolver closure that
mirrors the reporting handler's pattern, asking the
AISettingsHandler for the per-tenant ai.Service and returning
it as the implementation for all three roles when AI is
enabled.
- Unconfigured tenants still get the heuristic fallback —
matching the report PDF's graceful-degradation posture.
- AI-narrated chat synthesis uses the same provider, sanitizer,
model selection, cost ledger (report_narrative /
report_narrative_fleet use-cases), and budget gate the report
PDF endpoint enforces, so there is exactly one canonical
synthesis path for both surfaces.
v1 of pulse_summarize (1fe5d6853) shipped with heuristic narrative
only. The follow-up wiring promised in that commit now lands: the
chat session carries optional report-narration providers that the
tool's handler reads when building requests, so AI-narrated synthesis
flows into chat using the same provider, sanitizer, model selection,
cost ledger, and budget gate the report PDF endpoint already uses.
Pipeline:
- pkg/reporting Narrator / FleetNarrator / FindingsProvider interfaces
are already implemented by internal/ai.Service. No new
implementations.
- tools.ExecutorConfig + PulseToolExecutor gain three optional fields
(ReportNarrator, ReportFleetNarrator, ReportFindingsProvider).
Clone() copies them so per-session executors inherit the wiring.
- chat.Config gains the same three fields; NewService threads them
into ExecutorConfig.
- tools_summarize.go reads e.reportNarrator/FleetNarrator/
FindingsProvider and populates MetricReportRequest /
MultiReportRequest. The engine already accepts these on the request
and falls back to heuristic when they are nil — no engine changes
needed.
- AIHandler gains SetReportNarratorResolver(ctx -> narrators); both
per-tenant and default chat.Config construction sites invoke the
resolver. Router wires the resolver to AISettingsHandler.GetAIService
with the same Enabled-gate the reporting handler uses.
Unconfigured tenants are unchanged: the resolver returns nil, the
tool returns heuristic narrative — identical to today. Configured
tenants get AI synthesis in chat that matches what their report PDF
already carries, billed and budget-gated the same way.
Backup-failed was flapping detected → auto-resolved → re-detected
ten times in a single day. Each cycle the LLM saw "PBS backups
look healthy in my current snapshot" during a Patrol pass, called
patrol_resolve_finding(backup-failed), and the adapter at
patrol_findings.go:985 called Resolve(findingID, true) directly —
no category check, no evidence verification.
The contract docs at findings.go:52-67 explicitly say event /
persistent categories (backup, reliability, security, general)
"stay active until explicitly resolved — either by the LLM calling
patrol_resolve_finding with evidence, or by operator action." That
"with evidence" was never enforced.
This commit enforces it. The adapter now checks two conditions
before honoring an LLM resolve:
- finding.Category does NOT support stale-auto-resolve (per the
contract function CategorySupportsStaleAutoResolve), AND
- a deterministic verifier exists for finding.Key (currently
smart-failure and backup-failed)
When both are true, the adapter runs VerifyFixResolved on the
finding's resource. If the verifier still detects the failure
signal, the LLM gets an error explaining why the resolve was
rejected and that the underlying issue must be fixed first. If
the verifier confirms the signal has cleared, the resolve
proceeds with grounded evidence.
Categories that support stale-auto-resolve (performance, capacity)
bypass the gate entirely — the LLM can resolve them based on
absence per the existing contract. Keys without a verifier also
fall through to current behavior so we don't block resolves for
categories we haven't built verifiers for yet.
New PatrolService.hasDeterministicVerifierForKey() helper keeps
the gate's verifier list in lockstep with the switch in
verifyFixDeterministically.
Tests cover the three branches:
- performance category → gate skipped, resolve proceeds
- reliability + no verifier → gate falls through, resolve proceeds
- hasDeterministicVerifierForKey for known and unknown keys
ai-runtime contract updated.
Follow-up to e32d4ede4 (NarrativeFor + FleetNarrativeFor entrypoints)
and 1fe5d6853 (pulse_summarize tool). api-contracts.md gains a Current
State paragraph documenting that the Engine interface now exposes two
non-rendering entry points alongside Generate/GenerateMulti, with the
explicit invariant that test stubs implementing the interface must
implement these methods so the contract is honoured across the entire
surface, not just the export-shaped subset.
ai-runtime.md was updated in the parallel-agent commit ee2de2703
(which picked up the pulse_summarize paragraph when restating
auto-resolve gating), so no further edit is needed there.
The reporting synthesis layer (observations, recommendations,
outliers, period comparison) shipped trapped behind the PDF/CSV
export. Operators who chat with Assistant could not ask "what's been
happening with pve1 this week" — the data path existed but had no
non-PDF surface. This commit adds a single new tool, pulse_summarize,
that wraps the engine's non-rendering entry points (NarrativeFor /
FleetNarrativeFor) so that question gets answered in chat.
The tool takes an action parameter (resource | fleet) and routes
accordingly:
- resource mode requires resource_type + resource_id and returns the
same Narrative the single-resource report carries (health status,
observations, recommendations, period comparison).
- fleet mode requires resource_type + a comma-separated resource_ids
string (PropertySchema does not currently support array items, and
CSV is LLM-friendly enough) and returns the FleetNarrative
(outliers, patterns, recommendations). Capped at the same
multi-report ceiling (50) as the API endpoint.
The tool is read-only — no control level requirement, no approval
gate — and uses the global reporting engine the rest of the app
already shares. Returns a JSON envelope so chat can render it or
hand it back to the model for follow-up framing.
v1 ships with heuristic narrative only. The AI narrator wiring
through the chat session (Narrator/FleetNarrator/FindingsProvider
threaded via chat.Config -> tools.ExecutorConfig -> PulseToolExecutor)
is a focused follow-up; it lets the same tool inherit the per-tenant
AI service the report PDF endpoint already uses. The seam is
already in place because NarrativeFor/FleetNarrativeFor take an
optional narrator on the request — v1 passes nil, v2 populates it.
reconcileStaleFindings (commit b44d5892f) and the resource-absent
gate added in commit d6bb89a1c both use the same
CategorySupportsStaleAutoResolve helper, but the contract Current
State only documented the first path. Rewords the paragraph so
both are covered explicitly: stale-cleanup and resource-absent
both gate on the whitelist, both reject event/persistent
categories, and the bogus-absence examples extend to cover the
resource-absent failure mode (transient agent reconnect,
container churn, refresh gap).
A handful of helper texts, descriptions, and side counts used
text-slate-500 directly, so they didn't pick up the contrast fix in
e4f38d5. Switch each body-text caller to text-muted: AI settings
dialog helper text, AI provider helper text and link rows, ResourcePicker
empty-state copy / resource IDs / "+N more tags" / "N selected" footer,
AIModelSelectionSection "(loading...)" tag, ConfiguredNodeTables cluster
node count, and the PatrolIntelligenceHeader plan-restriction note.
Live contrast measurement after the change: every visible muted-style
text on the Patrol page now reads between 6.92 and 7.58 on its
resolved background — well above the WCAG AA 4.5 floor it was missing
on bg-surface-alt.
Icon-tint usages of text-slate-500 (Lucide icons, chevron rotations,
hover-state controls) are left as-is — those are deliberate color
choices, not muted-text intent.
text-muted = slate-500 against bg-surface-alt = slate-100 measured a
4.34 contrast ratio — below the WCAG AA threshold of 4.5 for normal
text. Move to slate-600. New ratios: 6.92 on bg-surface-alt and 7.58
on bg-surface — both pass comfortably. Dark mode already passed at
5.09 and stays untouched.