Workloads guest rows render — (or -) as a visual no-data signal in
every cell where a guest doesn't expose that metric (info, vmid,
disk, ip, uptime, node, image, namespace, context, backup). Each one
was a plain <span>, so screen readers narrated "dash" alongside every
cell label.
Mark every dash that conveys "no value" with aria-hidden="true" so SR
users hear the column label and skip the placeholder. Dashes that
carry an informational title attribute (e.g. "Disk stats
unavailable…") are intentionally left visible to assistive tech —
title is the accessible name and replacing it with aria-hidden would
drop real context.
Visual unchanged; tested live via DOM probe — 25 of 27 dash spans on
the Workloads page now carry aria-hidden, with the two title-bearing
dashes still announceable.
The overall-health "Recent Patrol errors" coverage factor in
summarizeRecentPatrolCoverage was anchoring the score to a
stale ratio: it counted errors across the last 10 runs without
weighting recency. After Pulse fixed two compounding Patrol
bugs today, four consecutive successful runs (50+ tool calls
each) followed six earlier failures. The assessment kept
showing C/65 with the prediction "most recent Patrol runs
encountered errors (6 of 10)" — directly contradicting the
fact that *every* recent run had succeeded.
Operators reading that score would conclude Pulse Patrol is
still broken. It isn't. The fix dragged the grade.
This commit adds a recovery-suppression check: count trailing
successful full Patrol runs from the most-recent end of the
window (GetAll returns newest-first), skipping non-full runs.
When three or more consecutive trailing successes exist —
roughly a 9-hour clean stretch at the default 3-hour cadence —
the error penalty drops entirely. The score reflects current
reality.
Three is conservative: a single recovery run could be a
transient win; three consecutive demonstrate the underlying
fix is sticking. Below the threshold, the existing ratio-tiered
penalty still applies so partially-recovered states still
register.
Two tests guard the boundary:
- 6 historical errors + 3 trailing successes → no coverage
factor (suppressed)
- 6 historical errors + 2 trailing successes → coverage
factor remains (recovery incomplete)
Live verified after this commit lands: the assessment that's
been stuck at C/65 since the malformed-history fix will
recompute to A/B grade as soon as the trailing 3 successful
runs are recognized by the same recent-runs query.
ai-runtime contract updated.
The Pulse Assistant briefing and prompt for a powered-off alert
rendered "Current value 0.0%; threshold 0.0%" because the backend
sends value=0 and threshold=0 for state alerts (which have no
metric semantics). That line is misleading to the operator and
gives the LLM no useful signal.
Adds isMetricAlertType / isStateAlertType helpers to
frontend-modern/src/utils/alerts.ts naming the state-alert set
(powered-off, unreachable, offline, host-offline, connectivity,
docker-container-state, docker-container-health,
docker-host-offline). State alerts represent binary or enumerated
conditions, not metric threshold crossings.
The alert handoff builder routes through that helper:
- Briefing detailLines omit the value/threshold line when the
alert is a state alert.
- Prompt omits the **Current Value:** and **Threshold:** lines.
- Prompt now includes **Message:** so the actual signal is
surfaced (was previously dropped from the prompt).
- Prompt step 2 swaps "Check related metrics" for "Check what
changed recently for this resource (state events, recent
commands, related alerts)" — the right question for a
binary-state alert.
Two new tests cover the state-alert and metric-alert branches.
The reporting feature now ships across two surfaces (PDF/CSV export
and pulse_summarize chat tool) and three modes (single-resource,
fleet, summarize). Without usage telemetry we can't tell whether the
work earns its place — operator demand, AI-vs-heuristic adoption,
range/format preferences are all invisible. Stops further feature
investment from being pure speculation.
Three new info-level log events, structured so an agent can grep
transcripts and group by dimension without a separate metrics
pipeline (matches the "agent owns ops analysis, human gets outcomes"
posture in MEMORY.md):
reporting.single.generated — single-resource PDF/CSV
reporting.fleet.generated — multi-resource fleet PDF/CSV
reporting.summarize.invoked — pulse_summarize chat tool (both modes)
Common dimensions: org_id, format/action, range, ai_configured,
findings_configured, window_start/end. Single-resource adds
resource_type + metric_type + bytes; fleet adds resource_count +
bytes; summarize adds resource_type + resource_count (fleet mode) +
narrative_source (so we can audit AI-fallback rate).
Includes rangeLabel() helper that maps a window to the canonical
catalog range token (24h/7d/30d) with a 1h tolerance, falling back
to "<hours>h" so non-standard windows still group. Tested.
TestReportingTelemetryEventNames pins the canonical event names as
a contract — an agent grepping logs depends on them being stable;
changing them silently would break audit tooling on the consumer
side.
The reporting engine already logs the resolved narrative source
(heuristic/ai) at debug level via the existing "Generating report"
line, useful for diagnosing why a specific report fell back. Kept
at debug; the new info-level events cover the operator surface.
Follow-up to 03463c1bf which threaded the per-tenant report
narrators through chat.Config -> tools.ExecutorConfig ->
PulseToolExecutor so pulse_summarize can produce AI-narrated
synthesis in chat instead of heuristic-only. ai-runtime.md's
Current State paragraph documents the wiring:
- chat.Config carries three optional fields (ReportNarrator,
ReportFleetNarrator, ReportFindingsProvider) threaded through
to the executor at session construction time.
- The router installs a SetReportNarratorResolver closure that
mirrors the reporting handler's pattern, asking the
AISettingsHandler for the per-tenant ai.Service and returning
it as the implementation for all three roles when AI is
enabled.
- Unconfigured tenants still get the heuristic fallback —
matching the report PDF's graceful-degradation posture.
- AI-narrated chat synthesis uses the same provider, sanitizer,
model selection, cost ledger (report_narrative /
report_narrative_fleet use-cases), and budget gate the report
PDF endpoint enforces, so there is exactly one canonical
synthesis path for both surfaces.
v1 of pulse_summarize (1fe5d6853) shipped with heuristic narrative
only. The follow-up wiring promised in that commit now lands: the
chat session carries optional report-narration providers that the
tool's handler reads when building requests, so AI-narrated synthesis
flows into chat using the same provider, sanitizer, model selection,
cost ledger, and budget gate the report PDF endpoint already uses.
Pipeline:
- pkg/reporting Narrator / FleetNarrator / FindingsProvider interfaces
are already implemented by internal/ai.Service. No new
implementations.
- tools.ExecutorConfig + PulseToolExecutor gain three optional fields
(ReportNarrator, ReportFleetNarrator, ReportFindingsProvider).
Clone() copies them so per-session executors inherit the wiring.
- chat.Config gains the same three fields; NewService threads them
into ExecutorConfig.
- tools_summarize.go reads e.reportNarrator/FleetNarrator/
FindingsProvider and populates MetricReportRequest /
MultiReportRequest. The engine already accepts these on the request
and falls back to heuristic when they are nil — no engine changes
needed.
- AIHandler gains SetReportNarratorResolver(ctx -> narrators); both
per-tenant and default chat.Config construction sites invoke the
resolver. Router wires the resolver to AISettingsHandler.GetAIService
with the same Enabled-gate the reporting handler uses.
Unconfigured tenants are unchanged: the resolver returns nil, the
tool returns heuristic narrative — identical to today. Configured
tenants get AI synthesis in chat that matches what their report PDF
already carries, billed and budget-gated the same way.
Backup-failed was flapping detected → auto-resolved → re-detected
ten times in a single day. Each cycle the LLM saw "PBS backups
look healthy in my current snapshot" during a Patrol pass, called
patrol_resolve_finding(backup-failed), and the adapter at
patrol_findings.go:985 called Resolve(findingID, true) directly —
no category check, no evidence verification.
The contract docs at findings.go:52-67 explicitly say event /
persistent categories (backup, reliability, security, general)
"stay active until explicitly resolved — either by the LLM calling
patrol_resolve_finding with evidence, or by operator action." That
"with evidence" was never enforced.
This commit enforces it. The adapter now checks two conditions
before honoring an LLM resolve:
- finding.Category does NOT support stale-auto-resolve (per the
contract function CategorySupportsStaleAutoResolve), AND
- a deterministic verifier exists for finding.Key (currently
smart-failure and backup-failed)
When both are true, the adapter runs VerifyFixResolved on the
finding's resource. If the verifier still detects the failure
signal, the LLM gets an error explaining why the resolve was
rejected and that the underlying issue must be fixed first. If
the verifier confirms the signal has cleared, the resolve
proceeds with grounded evidence.
Categories that support stale-auto-resolve (performance, capacity)
bypass the gate entirely — the LLM can resolve them based on
absence per the existing contract. Keys without a verifier also
fall through to current behavior so we don't block resolves for
categories we haven't built verifiers for yet.
New PatrolService.hasDeterministicVerifierForKey() helper keeps
the gate's verifier list in lockstep with the switch in
verifyFixDeterministically.
Tests cover the three branches:
- performance category → gate skipped, resolve proceeds
- reliability + no verifier → gate falls through, resolve proceeds
- hasDeterministicVerifierForKey for known and unknown keys
ai-runtime contract updated.
Follow-up to e32d4ede4 (NarrativeFor + FleetNarrativeFor entrypoints)
and 1fe5d6853 (pulse_summarize tool). api-contracts.md gains a Current
State paragraph documenting that the Engine interface now exposes two
non-rendering entry points alongside Generate/GenerateMulti, with the
explicit invariant that test stubs implementing the interface must
implement these methods so the contract is honoured across the entire
surface, not just the export-shaped subset.
ai-runtime.md was updated in the parallel-agent commit ee2de2703
(which picked up the pulse_summarize paragraph when restating
auto-resolve gating), so no further edit is needed there.
The reporting synthesis layer (observations, recommendations,
outliers, period comparison) shipped trapped behind the PDF/CSV
export. Operators who chat with Assistant could not ask "what's been
happening with pve1 this week" — the data path existed but had no
non-PDF surface. This commit adds a single new tool, pulse_summarize,
that wraps the engine's non-rendering entry points (NarrativeFor /
FleetNarrativeFor) so that question gets answered in chat.
The tool takes an action parameter (resource | fleet) and routes
accordingly:
- resource mode requires resource_type + resource_id and returns the
same Narrative the single-resource report carries (health status,
observations, recommendations, period comparison).
- fleet mode requires resource_type + a comma-separated resource_ids
string (PropertySchema does not currently support array items, and
CSV is LLM-friendly enough) and returns the FleetNarrative
(outliers, patterns, recommendations). Capped at the same
multi-report ceiling (50) as the API endpoint.
The tool is read-only — no control level requirement, no approval
gate — and uses the global reporting engine the rest of the app
already shares. Returns a JSON envelope so chat can render it or
hand it back to the model for follow-up framing.
v1 ships with heuristic narrative only. The AI narrator wiring
through the chat session (Narrator/FleetNarrator/FindingsProvider
threaded via chat.Config -> tools.ExecutorConfig -> PulseToolExecutor)
is a focused follow-up; it lets the same tool inherit the per-tenant
AI service the report PDF endpoint already uses. The seam is
already in place because NarrativeFor/FleetNarrativeFor take an
optional narrator on the request — v1 passes nil, v2 populates it.
reconcileStaleFindings (commit b44d5892f) and the resource-absent
gate added in commit d6bb89a1c both use the same
CategorySupportsStaleAutoResolve helper, but the contract Current
State only documented the first path. Rewords the paragraph so
both are covered explicitly: stale-cleanup and resource-absent
both gate on the whitelist, both reject event/persistent
categories, and the bogus-absence examples extend to cover the
resource-absent failure mode (transient agent reconnect,
container churn, refresh gap).
A handful of helper texts, descriptions, and side counts used
text-slate-500 directly, so they didn't pick up the contrast fix in
e4f38d5. Switch each body-text caller to text-muted: AI settings
dialog helper text, AI provider helper text and link rows, ResourcePicker
empty-state copy / resource IDs / "+N more tags" / "N selected" footer,
AIModelSelectionSection "(loading...)" tag, ConfiguredNodeTables cluster
node count, and the PatrolIntelligenceHeader plan-restriction note.
Live contrast measurement after the change: every visible muted-style
text on the Patrol page now reads between 6.92 and 7.58 on its
resolved background — well above the WCAG AA 4.5 floor it was missing
on bg-surface-alt.
Icon-tint usages of text-slate-500 (Lucide icons, chevron rotations,
hover-state controls) are left as-is — those are deliberate color
choices, not muted-text intent.
text-muted = slate-500 against bg-surface-alt = slate-100 measured a
4.34 contrast ratio — below the WCAG AA threshold of 4.5 for normal
text. Move to slate-600. New ratios: 6.92 on bg-surface-alt and 7.58
on bg-surface — both pass comfortably. Dark mode already passed at
5.09 and stays untouched.
Two visible issues in the Pulse Assistant drawer when opened on a
Patrol finding via "Discuss with Assistant":
1. The Attention line rendered the last-regression timestamp as a
raw ISO string with microseconds and timezone offset
("last regression 2026-05-10T22:02:11.519513+01:00"). The rest
of the UI uses relative time and the briefing copy was the
outlier. The LLM consuming the briefing handles "24 mins ago"
just as well as a raw timestamp, and the structured handoff
metadata still carries precise timestamps for any caller that
needs them.
2. The Attention line ended with "loop detected" on every active
finding because loop_state=detected is the default initial
state for any active finding. Rendering "loop detected" added
no information — only meaningful loop states (awaiting approval,
remediation failed, timed out, etc.) need surfacing in the
attention reason.
Adds a formatBriefingTimestamp helper that wraps formatRelativeTime
with sensible defaults and routes all three "last regression"
sites through it. Updates the briefing-test fixture to pin the
system clock so the relative-time assertion is deterministic
against the fixed regression timestamp.
The loopback gate from 586473ee3 rejected non-loopback setup requests
before the bootstrap-token check could run, so a Proxmox-LXC install
(install script prints URL + token; user opens URL on workstation,
pastes token) hit "only available from localhost" even with the correct
token. The token is the security boundary — only callers with
filesystem access to the data dir can read it — so a valid token now
authorizes setup from any origin. No-token requests still require
direct loopback.
Updates the two contract/setup tests that pinned the old behavior.
Fixes discussion #1459.
Each card in the Workloads guest drawer (System, Guest Info, Memory,
Backup, Tags, Filesystems, Network) was a plain <div> with uppercase
styling. They are subsections of the drawer's existing <h2>, so make
them <h3>. Visual styling is identical — same Tailwind classes — only
the tag changes. Screen-reader users now get a navigable heading
outline inside the drawer.
The reporting engine's synthesis layer was reachable only through
Generate/GenerateMulti, which always rendered PDF or CSV. Pulse
Assistant needs the same retrospective synthesis (per-resource
summary, fleet outliers, period comparison) in a form it can present
in chat, not as a downloaded artifact.
Add two non-rendering entry points to the Engine interface:
NarrativeFor(req MetricReportRequest) (*Narrative, error)
FleetNarrativeFor(req MultiReportRequest) (*FleetNarrative, error)
Both run the same query path and the same narrator resolution as their
rendering counterparts (heuristic by default, AI when the request
supplies a narrator, fail-closed-to-heuristic on any narrator error)
and return the structured narrative without invoking the fpdf/csv
output stage. Test stubs in pkg/reporting and internal/api are
updated to implement the extended interface.
These are the seams the upcoming pulse_summarize Assistant tools wrap
to answer questions like "what's hot on pve1 this week" or "where
should I look across my fleet" without round-tripping through report
generation. Same synthesis layer, no PDF involved.
Also fixes a pre-existing flake in TestEngineGenerate_UsesSuppliedNarrator
(metrics writes are async; the first Generate sometimes ran before
the raw tier flushed). Wrapped in the same eventually-pattern used by
the prior-period and findings-provider tests.
Follow-up to the narrator prompt changes that forbid acting as a
parallel detector. ai-runtime.md gains a Current State paragraph
documenting that both report narrator system prompts encode an
explicit detection-boundary invariant: warning or critical severity
classifications must be backed by a Patrol finding, an alert, or a
hard-threshold breach in the structured input, not by metric
inference alone. Patterns the narrator notices without that backing
are constrained to info severity. This keeps the narrative
retrospective on Patrol's work and prevents silent
shadow-classification competing with Patrol's detection rules.
api-contracts.md gains an equivalent paragraph in the reporting
contract section. The same rule applies to outliers and patterns
on the fleet path, and to recommendations on both. The deterministic
heuristic narrators were already constrained to the same threshold
rules; this aligns the AI path with the same evidence surface the
fallback uses, so the report PDF cannot become a back-door
detection surface that diverges from the findings store.
The single-resource and fleet narrator prompts both grounded their
claims in structured data, but neither prevented the model from
classifying observations at warning or critical severity based on
metric inference alone. That left a subtle gap: an AI narrator
noticing memory creep across a window could promote it to warning
even when Patrol — the canonical detection layer — had not flagged
it. That competes with Patrol rather than summarizing its work,
and it lets the report PDF silently shadow-classify in a way that
diverges from the findings store.
Add an explicit detection-boundary instruction to both prompts:
warning or critical severity may only be assigned when backed by
a Patrol finding, an alert, or a hard-threshold breach visible in
the input (cpu max > 90, memory avg > 85, disk avg > 85, failed
or high-wear disks, storage pools at >= 90%). Patterns the model
sees in metric data without that backing are constrained to info
severity. Recommendations follow the same rule. The narrative
remains a retrospective summary of Patrol's classified state, not
a parallel classifier.
This is a prompt-only change. The deterministic data surface and
the heuristic fallback narrator are unaffected; the heuristic
narrators already classify only on the same threshold rules listed
above, so the AI narrator is now constrained to the same evidence
surface its fallback uses.
Service.Narrate (b84b87d8d) and Service.NarrateFleet (d4463a615)
fixed missing cost-record calls in the report-narrative path. Auditing
the rest of internal/ai for the same bug class found one more:
Service.QuickAnalysis. It is used for alert auto-resolve and similar
lightweight decisions, so production token spend on auto-resolve
analysis was invisible in the AI usage dashboard.
Mirror the same fix: capture costStore under the read lock alongside
provider/cfg, and after provider.Chat returns, record a UsageEvent
labelled with the request's UseCase (defaulting to "quick_analysis"
when the caller leaves it blank). Recording happens before the
empty-content guard so failed-but-billed calls are still visible.
Adds cost_recording_audit_test.go: an AST-level audit that walks
internal/ai/*.go (excluding _test.go and sub-packages), finds every
function calling .Chat() on a providers.Provider value, and asserts
each function body also references .Record() on a cost store.
Exemption is allowed via a //cost-recording-exempt: <reason> doc
comment. RunPatrolToolPreflight is annotated as exempt — it is a
connectivity self-test, not user workload, and should not pollute
the operator's cost dashboard.
The audit is intentionally local (function-scoped, not
interprocedural). A passthrough wrapper that calls a recording
function rather than calling Record itself would need an explicit
exemption naming the wrapped callee. Keeping the scan local makes
new Chat callers loud rather than letting silent gaps creep in via
indirection.
Future Chat callers must either record cost or carry the exemption
marker. The audit fails CI otherwise, so the regression that shipped
in b2bd9d114 (Narrate) and would have shipped again in d4463a615
(fleet) cannot recur silently.
Initial detector (commit 942f9ca0f) only matched on the two legacy
absence-signature reason strings — but the Backup failed finding
on the live preview showed 6 auto_resolved events all with empty
messages, produced by the LLM patrol_resolve_finding tool via
Resolve(_, true). Counter stayed at 6× after the previous
migration ran.
New detector: any active finding whose category is NOT eligible
for stale-auto-resolve (i.e. anything other than performance or
capacity) AND has any auto_resolved event on its lifecycle is
treated as having an inflated counter. The rationale is the same
rule the category gate already established — for event/persistent
categories there is no legitimate absence-driven resolution path,
so any auto_resolved was either a removed-bogus-path stamp or an
LLM judgment call that repeatedly reverted through regressions on
the next run. The cumulative count is no signal either way.
Performance/capacity findings retain their counter because the
metric-cleared resolution model is sound there.
Test extended to cover four cases: LLM-driven cycle resets,
legacy-reason cycle resets, eligible-category preserves counter,
non-eligible category without any auto_resolved preserves counter.
Plus the existing idempotency case (already-reset finding stays
reset and is not re-applied).
The Backup failed finding on the live preview showed "regressed 6×"
when the actual regression count of genuine recurrences was at
most 1 or 2 — the rest were the system fighting itself, driven by
the absence-based auto_resolve paths that were gated (category
whitelist) or removed (alert-mirror rip) earlier in this branch.
Counter stayed sticky after those fixes landed, so the trust strip
and finding badges still surfaced the inflated number.
FindingsStore.SetPersistence load pass now scans each active
finding's lifecycle for the two known bogus-signature auto_resolved
reasons ("No longer detected by patrol", "Resource no longer
exists in infrastructure"). If found, RegressionCount is reset to
0 and LastRegressionAt is cleared, and a regression_counter_reset
lifecycle event is appended so the migration is idempotent. A
finding that already has a regression_counter_reset event is left
alone; any regressed events that accrued after the reset are
genuine and stand.
findingHasBogusAutoResolveCycle returns true only when the
lifecycle contains a bogus auto_resolved and no prior reset event,
so the function is the single point of truth for the migration
decision and is straightforward to test. Test covers three cases:
finding with bogus signature gets reset, finding with empty-message
auto_resolved (LLM-driven, legitimate) keeps its counter, finding
already migrated is not re-reset.
Updates ai-runtime Current State to document the second migration
on top of the alert-mirror retirement.
The previous commit removed the detectAlertSignals path so no NEW
alert-mirror findings are emitted, but the findings already
persisted from earlier builds stay in the store indefinitely —
nothing cleans them up (reconcileStaleFindings is gated on
performance/capacity categories, the LLM resolves them just to
have them re-detected next run except now the deterministic
emitter is gone so re-detection can't happen, but they're left
sitting as active findings draining the trust strip and score).
FindingsStore.SetPersistence now runs a one-shot retirement pass
on load: any active finding with title "Active alert detected",
source ai-analysis, and category general is auto-resolved with
reason "Patrol no longer mirrors alerts; the Alerts page is the
canonical surface for currently-firing alerts." The pass appends
an auto_resolved lifecycle event so the retirement is auditable,
syncs the loop state to resolved, and schedules a save so the
cleanup persists.
Idempotent: after the first load with this code, no findings
match the signature so the pass is a no-op. Defensive: the
signature requires all three fields (title + source + category)
to match before retiring, so an operator-authored finding that
happens to share the title is left untouched. Test covers the
mirror case, the matching-title-but-foreign-source case (must
NOT retire), and an unrelated active finding (must NOT retire),
plus verifies the retired state persists back through the
persistence layer.
Updates ai-runtime Current State to record the migration path.
The deterministic signal pipeline ran the pulse_alerts tool output
through detectAlertSignals and produced a SignalActiveAlert for
every firing alert, which Patrol then materialized as an
"Active alert detected" finding (source: ai-analysis, category:
general). The system prompt at the top of patrol_ai.go explicitly
tells the LLM not to duplicate alerts — but the deterministic
emitter was duplicating them anyway, behind the LLM's back.
Symptoms observed in the wild:
- 9 active "Active alert detected" findings in Patrol, every one a
duplicate of an existing alert already on the Alerts page.
- The LLM, doing what the prompt told it, resolved each mirrored
finding via patrol_resolve_finding. Next run the alert was still
firing and Patrol re-emitted the signal → finding regressed.
Lifecycle showed several auto_resolved → re-detected → regressed
cycles per finding within hours.
- Health score dragged down by issues the operator already saw on
the Alerts page, with no operator action possible from Patrol
that wasn't already available from Alerts.
Rip detectAlertSignals entirely, remove the pulse_alerts case from
the signal-extraction switch, drop SignalActiveAlert plus its key
/ title / recommendation entries. Convert the prior
TestDetectSignals_ActiveAlert into a regression guard that locks
in the no-mirror behavior.
Updates the ai-runtime subsystem Current State to record the
decision: Patrol does not duplicate the Alerts surface; alerts
own their own lifecycle, surface, and acknowledgement model.
Follow-up to the fleet-level AI narrative refactor: ai-runtime.md
gains a Current State paragraph documenting that internal/ai.Service
also implements pkg/reporting.FleetNarrator with its own use-case
label (report_narrative_fleet) so fleet vs single-resource spend is
distinguishable in the cost ledger and budget gate, and that the
single-resource narrator is intentionally not propagated through the
multi-report path. api-contracts.md gains a paragraph documenting
the new optional FleetNarrator field on MultiReportRequest, the new
FleetNarrative field on MultiReportData, the rendered fleet section
(executive prose, named outliers, cross-cutting patterns,
recommendations, optional period-comparison, AI provenance footer),
and the explicit invariant that the deterministic resource summary
table stays rendered from the same per-resource aggregates so every
named outlier is verifiable against the table below.
Dependent subsystems (agent-lifecycle, performance-and-scalability,
storage-recovery) remain unchanged: their Extension Points reference
internal/api/ broadly but agent lifecycle, perf scaling, and storage
recovery semantics have no delta from this change.
The single-resource AI narrative landed in b2bd9d114 but multi-resource
fleet reports stayed heuristic-only. That left a gap on the exact axis
where AI helps most: a 50-resource fleet PDF is where synthesis is the
difference between useful and unread.
Introduce FleetNarrator as a separate interface from Narrator. The
input shapes are different — single-resource takes one set of metric
stats with a prior window, fleet takes a denormalised cross-resource
view with per-resource summaries plus a fleet aggregate.
HeuristicFleetNarrator owns the deterministic fallback: ranks
resources by severity (critical alerts > unhealthy disks > storage
pressure > memory > CPU > non-critical alerts), picks up to 5
outliers, derives cross-cutting patterns by counting how many of N
resources share a hot signal, and emits fleet-scoped recommendations.
internal/ai.Service implements FleetNarrator through
report_fleet_narrator.go. Distinct use-case label
(report_narrative_fleet) so fleet vs single-resource spend is
separable in the cost ledger and budget gate. The fleet payload is
denormalised through buildReportFleetPayload so prompt cost scales
linearly with fleet size. Same fail-closed invariant — nil provider,
parse failure, or context cancellation falls through to the heuristic.
Single-resource Narrator is intentionally NOT propagated through
engine.GenerateMulti: a 50-resource fleet report performs one AI call
(fleet narrator), not 51. The router resolver returns the AI service
for all three roles (Narrator, FleetNarrator, FindingsProvider).
The fleet PDF renders the FleetNarrative in the fleet summary cover
when present: executive prose, named outliers with severity-coloured
bullets, cross-cutting patterns, recommendations, optional period
comparison, and an AI provenance footer. The deterministic resource
summary table is preserved above so every named outlier is verifiable
against the table immediately below it. Legacy "Highest CPU / Most
alerts" bullets remain as the fallback when no FleetNarrative is
attached.
The toast container had no live-region wiring, so screen readers
missed every notification — success or failure. Make the container a
named region landmark, and give each toast a role/aria-live derived
from its type: error and warning use role="alert" + assertive (they
interrupt), success and info use role="status" + polite. aria-atomic
ensures the whole toast (title plus message) is read as one unit.
Every drawer (Workloads guest, PMG instance, K8s namespaces and
deployments, Swarm services, generic resource detail) wrapped its body
in a plain <div>, so screen-reader users had no landmark to jump to
and no announced name when a row expanded. Convert each wrapping
<div> to <section aria-labelledby> with a heading inside.
For the generic resource detail drawer, the existing visible name
becomes an <h2> (visual styling unchanged via m-0). For the other
drawers, a screen-reader-only <h2> carries the entity name (guest
name, PMG hostname, cluster name) so the landmark is named without
visible duplication.
Service.Narrate consumed provider tokens without recording a
cost.UsageEvent, so AI-narrated reports were invisible in the operator
cost ledger. Every other Service call site in the AI runtime records
cost; the narrator omitted it.
Mirror the QuickAnalysis/chat pattern: capture the cost store under
the read lock alongside the provider/cfg snapshot, and after
provider.Chat returns, record a UsageEvent labelled
report_narrative with the resource type/id as the target. Recording
happens before parsing so a failed-but-billed call (e.g. provider
returned malformed JSON) still appears in the ledger — the operator
was billed regardless of whether we could use the response.
The use_case string lifts to a package-level constant so the budget
gate (enforceBudget), the cost label, and the dashboard taxonomy all
reference one identifier.
Follow-up to the report narrative refactor: ai-runtime.md gains a
Current State paragraph documenting that internal/ai.Service now
implements pkg/reporting.Narrator and pkg/reporting.FindingsProvider,
and that the narrator/findings surfaces inherit the same provider,
sanitizer, model selection, budget enforcement, and fail-closed
governance as the rest of the canonical AI runtime. api-contracts.md
gains a paragraph documenting the new optional Narrator and
FindingsProvider fields on MetricReportRequest, the new Narrative,
PriorPeriod, and Findings fields on ReportData, the three new PDF
sections (executive prose, Period-over-period changes, AI provenance
footer), and the explicit invariant that the deterministic data
surface (charts, stats, alerts, storage, disks) stays rendered from
the same aggregates so every AI claim is verifiable against adjacent
data. Multi-resource fleet reports intentionally remain heuristic-only
at this transport layer.
The dependent subsystems flagged by the staged-shape guard
(agent-lifecycle, performance-and-scalability, storage-recovery)
genuinely have no contract delta from the prior commit: their
Extension Points reference internal/api/ broadly, but agent
lifecycle, performance scaling, and storage recovery semantics are
unchanged.
The Recipients (one per line) textarea trimmed and filtered empty lines
on every keystroke. Pressing Enter at the end of a line ended up as a
trailing empty entry, the filter dropped it, and the controlled value
snapped back to the single line — so typing past line one was
impossible, and leading/trailing whitespace got eaten mid-type. Paste
still worked because it dropped multiple non-empty lines in one event.
Pass raw split lines up during edit and do the trim+filter inside
buildEmailConfigPayload at save time, so the textarea stops fighting
the cursor and the wire payload is still clean.
HandleListAuditEvents dropped the Query/Count error before writing the
500, so a user hitting "Failed to fetch audit events" produced no
server-side log line — diagnosing the failure was impossible without a
local repro. Log the error with the org ID so the next instance is
findable. Doesn't change the user-facing response.
processFingerprint ran v.Index(i).Interface() then reflect.ValueOf(item),
dropping addressability and making reflect.Call panic on every iteration
with "Container as type *Container". The defer/recover in
collectFingerprints swallowed it, so LXC and VM fingerprints never
landed in the store — change-detection and discovery for those resource
types have been broken since v6.
Pass the slice element's address straight through (.Addr()) so the
generator's pointer receiver gets the right type. Add a regression test
that fails if anyone goes back through .Interface().
The Configure Patrol drawer paired a master "Patrol Running" toggle
with a "Run every: Disabled" dropdown for the no-schedule case. Two
contradictory wordings on the same surface — easy to misread the
schedule label as "Patrol is disabled" when the master toggle is
actually on and triggered/manual runs are still working.
Rename the zero-interval preset to "Off" and rewrite the helper
text to spell out that manual runs and alert/anomaly triggers still
fire when the schedule is off. The change is naming only; the
underlying interval value (0) is unchanged so persisted settings
continue to round-trip.
Updates the canonical-presets test assertion to match.
The patrol run-row drill-down rendered "1 existing issue remain"
because the pluralization only toggled the noun, not the verb.
Visible whenever an errored or unchanged run still had exactly one
active finding (e.g. the Provider analysis error scenario observed
in the wild).
Adds a singular-case test alongside the existing plural one with a
regression guard against the previous wording.
The overall health score chip read "A · 95/100" while the same
assessment card said "Coverage incomplete · Recent Patrol runs
encountered errors · Verify full coverage." Those messages
contradicted because the "recent errors but had a successful run"
coverage factor used a flat -10 penalty regardless of the error
ratio. With one successful manual run among many failed startup
runs, the math stayed in the A band and the score directly
undermined the warning it sat next to.
The factor now tiers the impact by ratio of errored runs to
relevant runs in the scoring window:
>50% errored → -30, "Most recent Patrol runs encountered errors
(N of M); the current health summary is not
reliable until coverage stabilizes."
>25% errored → -20, "Recent Patrol runs encountered errors
(N of M), so the current health summary
may be incomplete."
else → -10, original light-tier description.
A single transient error stays in grade A; dominant-error periods
drop out of A so the grade matches the warning. Adds two tests
covering both ends of the new tiering and updates the ai-runtime
subsystem contract Current State section.
Performance reports rendered the Executive Summary, Observations, and
Recommendations sections from inline threshold rules in pdf.go. That
narrative looked intelligent but was static templating against alert
counts and metric percentiles, which felt off-brand alongside Patrol
and Pulse Assistant.
Introduce a Narrator interface in pkg/reporting and a FindingsProvider
counterpart that the engine consults at report time. The heuristic
rules are lifted into HeuristicNarrator unchanged so the deterministic
fallback still produces the same observations and recommendations.
The engine now also queries the comparable prior period and threads
its aggregate stats through the narrator so deltas can be expressed.
internal/ai.Service implements both interfaces via report_narrator.go
(single-turn JSON call grounded in the structured ReportData payload,
falling back to the heuristic on any error/timeout) and
report_findings.go (Patrol findings whose lifecycle overlaps the
report window). The reporting handler resolves the per-tenant AI
service when it is configured and supplies it in the request; absent
configuration, reports look identical to the prior heuristic output.
Charts, stats tables, alert lists, storage and disk sections stay
deterministic — sysadmins can verify every AI claim against the data
tables next to it. The PDF renders the AI prose between the health
card and Quick Stats, adds a Period-over-period section after
Recommendations, and prints a provenance footer when the narrative
came from the assistant.
ai-runtime.md and api-contracts.md updates land in a follow-up commit
on this branch; agent-lifecycle / performance-and-scalability /
storage-recovery have no contract delta from this change (router.go
is referenced in their Extension Points but their semantics are
unchanged).
reconcileStaleFindings was auto-resolving any seeded finding that the
LLM didn't re-report in a successful run. The function's own comment
acknowledges the LLM doesn't reliably use patrol_resolve_finding, so
this was built as a cleanup pass — but it cannot tell the difference
between "LLM correctly recognized this is fixed" and "LLM forgot to
re-mention it." For findings that represent discrete events or
persistent states (a backup task that failed, a service that
crashed, a security vulnerability that was found, a configuration
error), absence in a Patrol report is not evidence that the issue
has cleared. The result was bogus auto_resolved → re-detected →
regressed cycles, observed in the wild as "Backup failed" regressing
4× over 6 hours and "Provider analysis error" regressing 271×.
Those bogus auto-resolutions also inflated the trust strip with
fictional auto-resolved credit.
CategorySupportsStaleAutoResolve in findings.go gates the cleanup:
only `performance` and `capacity` findings — continuous current-state
metric thresholds — may be auto-resolved from absence. The other
four categories (reliability, backup, security, general) stay active
until explicitly resolved.
Updates the ai-runtime subsystem contract Current State section with
the whitelist and the adjacent lifecycle dedup rules already landed.
Adds TestReconcileStaleFindings_SkipsNonCurrentStateCategories with
table-driven subtests for all four event/persistent categories, and
TestCategorySupportsStaleAutoResolve to lock in the whitelist.
LabeledFilterSelect renders a visible <label for={id}> alongside the
<select>, but only callers that pass an explicit id get a
programmatic association — and most callers don't. Add an aria-label
fallback that reuses the visible label prop, so the select is named
regardless of caller usage. Callers can still override with an
explicit aria-label.
syncLoopStateLocked was emitting a generic "loop_state" lifecycle
event on every successful transition, duplicating the semantic
event the caller had just emitted. A finding that auto-resolved
showed two adjacent rows in the Lifecycle drawer:
Auto-resolved (detected -> resolved)
Loop state changed (detected -> resolved)
Same from/to, same timestamp, no extra information. Every
transition was paired with a duplicate.
Removed the generic loop_state emission. Every caller of
syncLoopStateLocked already emits the semantic event for the
transition it caused (auto_resolved, regressed, dismissed,
acknowledged, snoozed, suppression_lifted, reminded, etc.). The
loop_transition_violation branch stays — that's the only signal
that an invalid transition was rejected, not a duplicate.
Adds TestFindingsStore_TransitionDoesNotAlsoEmitGenericLoopStateEvent
to lock in the behavior.
Several inputs relied on placeholder text alone, leaving screen reader
users with no announced field name. Add aria-label to:
- Resource picker's tag filter input
- Webhook custom field key/value and custom header key/value inputs
(per-row indexed labels)
- AI default/chat/Patrol model fallback inputs (used when the picker
has no models to enumerate)
- AI provider credential inputs (API key or server URL, derived from
provider display name)
- Shared SearchField (defaults to title or placeholder)
- Shared TagInput entry field and per-tag remove button
Pure attribute-only change; no behavior, layout, or contract impact.
Patrol's "patrol-main" session was reused across every
scheduled run, so ExecutePatrolStream loaded the full
session history into the agentic loop's input. When any
prior run ended after the model emitted tool_calls but
before all tool results landed (provider error, timeout,
context cancellation), the orphan tool_calls were persisted
and every subsequent run inherited them. The provider then
rejected the conversation with:
An assistant message with 'tool_calls' must be followed
by tool messages responding to each 'tool_call_id'.
(insufficient tool messages following tool_calls message)
Patrol failed 271 consecutive runs with this error before
it was diagnosed. Each new run added another user prompt
on top of the broken structure, so the message slice grew
to 33+ messages with one assistant turn at position 23
holding 4 orphan tool_call_ids and 9 user prompts stacked
after it.
The "patrol runs need a clean slate" comment at line 1898
documents that the knowledge accumulator is freshened per
run; the conversation history was the matching gap.
ExecutePatrolStream now passes only this run's user prompt
to the agentic loop. The session is still written to for
audit/forensics, just no longer fed back into the model.
Live verified: Patrol now completes successfully (18 tool
calls, 3m23s) on a session that previously failed every
run with the malformed-history error. The runtime-failure
finding auto-resolved on this same successful run.
Adds a classifier bucket for "insufficient tool messages"
errors so any future regression in this area surfaces with
a meaningful diagnostic instead of the generic
"Provider analysis error" fallback. New
PatrolFailureCauseMalformedToolHistory cause; predicate
patrolMalformedToolHistory matches DeepSeek's exact phrasing
and OpenAI's similar variants.
ai-runtime contract updated.
Every Patrol scan that re-detected an already-active finding was
appending a "detected (same_state -> same_state)" lifecycle event
with the message "Detected by Pulse Patrol". A finding active for
6 scans rendered as 6 stacked rows reading
"Detected Detected by Pulse Patrol (detected -> detected)".
Backend: drop the unconditional re-detection lifecycle append in
findings.go. The lifecycle should record state transitions, not
heartbeats — TimesRaised and LastSeenAt already track recurrence,
and the genuine transition events ("regressed", "reminded",
"suppression_lifted") are emitted from their own branches upstream.
Frontend: defensively hide (from -> to) spans where from === to
and strip the type-label prefix from the message so already-
persisted polluted lifecycle entries also render cleanly until
they age out of the per-finding event cap.
Adds a test that locks in the new backend behavior.
The MCP adapter shipped in slice 51 with one install option:
clone the repo and go build. This slice integrates pulse-mcp
into Pulse's existing governed release pipeline so a Pulse
release publishes a pulse-mcp binary alongside the unified agent
and the install scripts that bring it home in one command.
What ships:
- scripts/build-release.sh extended to build pulse-mcp for
the same multi-OS matrix as the unified agent, package
per-platform tarballs and zips, and copy bare binaries to
RELEASE_DIR for /releases/latest/download/ redirect
compatibility.
- .github/workflows/create-release.yml extended to upload
the bare pulse-mcp binaries plus install-mcp.sh and
install-mcp.ps1 as release assets.
- scripts/install-mcp.sh: bash one-line installer that
detects platform/arch, downloads the matching binary from
the configured release (latest by default), verifies SHA256
against the published checksums.txt, places at
~/.local/bin/pulse-mcp (or /usr/local/bin if not writable).
Honors PULSE_MCP_VERSION, PULSE_MCP_BIN_DIR, PULSE_MCP_REPO,
PULSE_MCP_NO_VERIFY env vars; declines Windows shells with
a pointer at the .ps1 sibling.
- scripts/install-mcp.ps1: PowerShell installer for Windows,
placing pulse-mcp.exe at $LOCALAPPDATA\pulse-mcp.
Documentation aligned:
- cmd/pulse-mcp/README.md gains an Install section above
Quick start with three options: one-line installer,
GitHub Release download, go install. Documents the macOS
Gatekeeper bypass since v1 is unnotarized by design.
- The Settings -> API Access agent-integrations panel now
surfaces the curl|bash command above the config snippet so
operators see "install pulse-mcp" before "configure your
MCP client."
- docs/releases/AGENT_PARADIGM.md drops the "no published
distribution path" item from "what it does not do yet" and
documents the Gatekeeper / Homebrew gaps as next-tier
follow-ups.
Trade-offs surfaced and chosen:
- Same cadence as Pulse: pulse-mcp ships per Pulse release,
not on its own track. The MCP server reads the manifest
from the Pulse it talks to, so version alignment is the
natural model.
- No Homebrew tap or core formula in v1. Maintaining a tap
is real ongoing work; foundation supports adding Homebrew
later as a layer.
- No Docker image. Stdio JSON-RPC fights Docker's stdin
/stdout pattern.
- No notarization in v1. SHA256 verification through the
installer preserves the audit trail; README documents the
Gatekeeper bypass.
Subsystem contract: deployment-installability.md gains
scripts/install-mcp.sh, scripts/install-mcp.ps1, and
cmd/pulse-mcp/ in canonical files (mid-list entries
renumbered) plus a paragraph documenting the new MCP entry
point alongside the existing installer family.
Verification artifacts:
- scripts/installtests/build_release_assets_test.go gains
TestBuildReleasePackagesPulseMcpForAllPlatforms which pins
the build/package/copy wiring and the load-bearing
install-mcp.sh helpers (platform detection, SHA256
verification, install-dir resolution).
- scripts/release_control/render_release_body_test.py gains
test_agent_paradigm_release_notes_blurb_documents_-
distribution_path which pins the AGENT_PARADIGM.md draft's
install-mcp.sh reference and the four-axis frame so a
future edit cannot regress the install story silently.
Smoke-tested install-mcp.sh locally on darwin-arm64: platform
detection, install-dir resolution, URL building, and 404 error
handling all correct. The full end-to-end install path becomes
live the moment a Pulse release ships pulse-mcp binaries; the
next RC cut will exercise it.
Continues the labelling sweep: close X buttons in
UpdateConfirmationModal, ChangePasswordModal, SuggestProfileModal,
UserAssignmentsDialog, the Patrol configuration popover, the Toast
dismiss control, the SSO provider test result and metadata preview
dismiss buttons, and the compact ContainerUpdateBadge button (which
becomes icon-only when compact). Each gets aria-label + title; the
inline SVG icons get aria-hidden so the button label is the only
announced text.
The Patrol-enabled toggle and Configure Patrol button were wrapped in
a bordered card with a second inner card around the toggle alone.
That stacked two card-on-card surfaces between the page header and
the assessment summary. Removed both card wrappers; the toggle and
Configure button now sit on a flat row, reducing chrome between the
page title and the actual content.
Two real UX bugs in the Verify Patrol panel:
1. The button silently tested the previously-saved model
instead of the operator's pending dropdown selection.
Clicking Verify after changing the model would re-run
preflight against the OLD model and the operator would
believe the new selection was verified.
2. The result panel rendered the cached green badge even
when the form's current selection was a different model
from the one in the cache, with no visual cue that the
verified result was for something other than what they
were looking at.
Fixes:
- runPatrolToolPreflight now passes form.patrolModel as the
model override on the POST. Empty form value falls through
to the configured shared default on the backend.
- PatrolPreflightControl computes isStaleAgainstFormSelection
by comparing the bare model name in form.patrolModel to
the cached result.model. When stale, the panel switches
to amber tone with a headline that names both models and
a detail line prompting the operator to click Verify Patrol.
- stripModelProvider helper handles "provider:model" prefix.
Architecture guardrail extended with three assertions
covering the form-aware verify path and the staleness
indicator wiring. frontend-primitives contract updated.
Live verified: changing the dropdown from deepseek-v4-flash
to deepseek-v4-pro before saving switches the panel from
green "Tool calling verified" to amber "Verified result is
for deepseek-v4-flash, your current selection is
deepseek/deepseek-v4-pro · Click Verify Patrol to test the
pending selection."
Three close/remove buttons in Settings → RBAC role editor and
ResourcePicker rendered only a Lucide icon with no aria-label or title,
so assistive tech announced them as "button" with no purpose. Add
aria-label + title to each: dialog close, remove permission row, and
remove selected resource chip.
The Patrol Assessment card was rendering a Verification + Latest activity
panel and a metric chips grid for every Patrol view, pushing the trust
strip, tabs, and findings list further down the page. The recommended
next step and Discuss with Assistant entry points stay always-visible —
the rest folds behind a Show details / Hide details toggle, defaulting
to compact.
Sources cleanly into release announcements without forcing a
fresh draft when the next RC or GA cuts.
Sized as a focused source draft (under 130 lines): one-paragraph
headline, three operator-facing bullets, the four-axis frame
(discover, read, write, push) lifted and condensed from
docs/AGENT_SUBSTRATE.md, an integrators-pointing section that
names the two reference adapters and the canonical contract, an
honest "what it does not do yet" paragraph (no published
pulse-mcp distribution; no real-world consumer feedback yet),
and an audit-trail summary.
Style matches the existing docs/releases/RELEASE_NOTES_v6_*.md
draft pattern: technical, plainspoken, no marketing prose,
honest about scope. No em dashes per project rule.
Lives at docs/releases/AGENT_PARADIGM.md alongside the existing
release-notes drafts so a release author finds it where they
already look. The header explicitly frames it as a source
draft to drop into announcements, GitHub prerelease descriptions,
or a "What Changed" section in a versioned release notes file,
trimmed or expanded as the cut requires.
Contract-neutral commit: docs-only addition, no runtime code,
wire shape, manifest entry, error code, version artifact,
release workflow input, governed metadata, or canonical-files
contract changed. PULSE_ALLOW_CONTRACT_NEUTRAL_COMMIT used with
a documented reason since the deployment-installability
verification-artifact rule wants a Python release-promotion
test touched, which would be wrong scope for a markdown draft.
Previously the "Patrol tools" readiness check was static
model-name pattern matching: it told the operator whether
the selected model is on Pulse's tool-capable allowlist,
not whether tools have actually been verified to work.
After today's preflight cache, that's strictly less
information than what we already know.
resolvePatrolToolsCheck now consults
aiService.CachedPatrolPreflight() and grounds the check
in real evidence when a result for the configured
provider+model exists:
- cached green (success + tool_call_observed) →
"Tool calling verified <age> against <model>." (ready)
- cached failure → classified summary + "(last preflight
<age>)." (not_ready)
- cached soft warning (model_tool_support_unverified) →
same with warning status
- no cache or model mismatch → static fallback
formatPatrolPreflightAge produces stable English ("just
now", "5m ago", "2h ago", "3d ago") with full unit
coverage (11 cases).
HandleGetAISettings now also includes patrol_readiness in
its response — previously only the PUT response carried it,
so the Patrol page only got augmented readiness after a
save. The frontend already had patrol_readiness typed and
read it from useAISettingsState.
ai-runtime, api-contracts, agent-lifecycle (dep), and
storage-recovery (dep) contracts updated.