Commit graph

1213 commits

Author SHA1 Message Date
rcourtman
4dff26f728 Emit structured telemetry on reporting and summarize invocations
The reporting feature now ships across two surfaces (PDF/CSV export
and pulse_summarize chat tool) and three modes (single-resource,
fleet, summarize). Without usage telemetry we can't tell whether the
work earns its place — operator demand, AI-vs-heuristic adoption,
range/format preferences are all invisible. Stops further feature
investment from being pure speculation.

Three new info-level log events, structured so an agent can grep
transcripts and group by dimension without a separate metrics
pipeline (matches the "agent owns ops analysis, human gets outcomes"
posture in MEMORY.md):

  reporting.single.generated     — single-resource PDF/CSV
  reporting.fleet.generated      — multi-resource fleet PDF/CSV
  reporting.summarize.invoked    — pulse_summarize chat tool (both modes)

Common dimensions: org_id, format/action, range, ai_configured,
findings_configured, window_start/end. Single-resource adds
resource_type + metric_type + bytes; fleet adds resource_count +
bytes; summarize adds resource_type + resource_count (fleet mode) +
narrative_source (so we can audit AI-fallback rate).

Includes rangeLabel() helper that maps a window to the canonical
catalog range token (24h/7d/30d) with a 1h tolerance, falling back
to "<hours>h" so non-standard windows still group. Tested.

TestReportingTelemetryEventNames pins the canonical event names as
a contract — an agent grepping logs depends on them being stable;
changing them silently would break audit tooling on the consumer
side.

The reporting engine already logs the resolved narrative source
(heuristic/ai) at debug level via the existing "Generating report"
line, useful for diagnosing why a specific report fell back. Kept
at debug; the new info-level events cover the operator surface.
2026-05-10 22:59:23 +01:00
rcourtman
03463c1bfe Thread per-tenant AI narrators into pulse_summarize via chat session
v1 of pulse_summarize (1fe5d6853) shipped with heuristic narrative
only. The follow-up wiring promised in that commit now lands: the
chat session carries optional report-narration providers that the
tool's handler reads when building requests, so AI-narrated synthesis
flows into chat using the same provider, sanitizer, model selection,
cost ledger, and budget gate the report PDF endpoint already uses.

Pipeline:
- pkg/reporting Narrator / FleetNarrator / FindingsProvider interfaces
  are already implemented by internal/ai.Service. No new
  implementations.
- tools.ExecutorConfig + PulseToolExecutor gain three optional fields
  (ReportNarrator, ReportFleetNarrator, ReportFindingsProvider).
  Clone() copies them so per-session executors inherit the wiring.
- chat.Config gains the same three fields; NewService threads them
  into ExecutorConfig.
- tools_summarize.go reads e.reportNarrator/FleetNarrator/
  FindingsProvider and populates MetricReportRequest /
  MultiReportRequest. The engine already accepts these on the request
  and falls back to heuristic when they are nil — no engine changes
  needed.
- AIHandler gains SetReportNarratorResolver(ctx -> narrators); both
  per-tenant and default chat.Config construction sites invoke the
  resolver. Router wires the resolver to AISettingsHandler.GetAIService
  with the same Enabled-gate the reporting handler uses.

Unconfigured tenants are unchanged: the resolver returns nil, the
tool returns heuristic narrative — identical to today. Configured
tenants get AI synthesis in chat that matches what their report PDF
already carries, billed and budget-gated the same way.
2026-05-10 22:50:17 +01:00
rcourtman
20df3dcd2c Let a valid bootstrap token authorize initial setup from any origin
The loopback gate from 586473ee3 rejected non-loopback setup requests
before the bootstrap-token check could run, so a Proxmox-LXC install
(install script prints URL + token; user opens URL on workstation,
pastes token) hit "only available from localhost" even with the correct
token. The token is the security boundary — only callers with
filesystem access to the data dir can read it — so a valid token now
authorizes setup from any origin. No-token requests still require
direct loopback.

Updates the two contract/setup tests that pinned the old behavior.

Fixes discussion #1459.
2026-05-10 22:25:34 +01:00
rcourtman
e32d4ede44 Expose engine narrative entry points for non-rendering callers
The reporting engine's synthesis layer was reachable only through
Generate/GenerateMulti, which always rendered PDF or CSV. Pulse
Assistant needs the same retrospective synthesis (per-resource
summary, fleet outliers, period comparison) in a form it can present
in chat, not as a downloaded artifact.

Add two non-rendering entry points to the Engine interface:

  NarrativeFor(req MetricReportRequest) (*Narrative, error)
  FleetNarrativeFor(req MultiReportRequest) (*FleetNarrative, error)

Both run the same query path and the same narrator resolution as their
rendering counterparts (heuristic by default, AI when the request
supplies a narrator, fail-closed-to-heuristic on any narrator error)
and return the structured narrative without invoking the fpdf/csv
output stage. Test stubs in pkg/reporting and internal/api are
updated to implement the extended interface.

These are the seams the upcoming pulse_summarize Assistant tools wrap
to answer questions like "what's hot on pve1 this week" or "where
should I look across my fleet" without round-tripping through report
generation. Same synthesis layer, no PDF involved.

Also fixes a pre-existing flake in TestEngineGenerate_UsesSuppliedNarrator
(metrics writes are async; the first Generate sometimes ran before
the raw tier flushed). Wrapped in the same eventually-pattern used by
the prior-period and findings-provider tests.
2026-05-10 22:23:09 +01:00
rcourtman
d4463a615c Add fleet-level AI narrative for multi-resource reports
The single-resource AI narrative landed in b2bd9d114 but multi-resource
fleet reports stayed heuristic-only. That left a gap on the exact axis
where AI helps most: a 50-resource fleet PDF is where synthesis is the
difference between useful and unread.

Introduce FleetNarrator as a separate interface from Narrator. The
input shapes are different — single-resource takes one set of metric
stats with a prior window, fleet takes a denormalised cross-resource
view with per-resource summaries plus a fleet aggregate.
HeuristicFleetNarrator owns the deterministic fallback: ranks
resources by severity (critical alerts > unhealthy disks > storage
pressure > memory > CPU > non-critical alerts), picks up to 5
outliers, derives cross-cutting patterns by counting how many of N
resources share a hot signal, and emits fleet-scoped recommendations.

internal/ai.Service implements FleetNarrator through
report_fleet_narrator.go. Distinct use-case label
(report_narrative_fleet) so fleet vs single-resource spend is
separable in the cost ledger and budget gate. The fleet payload is
denormalised through buildReportFleetPayload so prompt cost scales
linearly with fleet size. Same fail-closed invariant — nil provider,
parse failure, or context cancellation falls through to the heuristic.

Single-resource Narrator is intentionally NOT propagated through
engine.GenerateMulti: a 50-resource fleet report performs one AI call
(fleet narrator), not 51. The router resolver returns the AI service
for all three roles (Narrator, FleetNarrator, FindingsProvider).

The fleet PDF renders the FleetNarrative in the fleet summary cover
when present: executive prose, named outliers with severity-coloured
bullets, cross-cutting patterns, recommendations, optional period
comparison, and an AI provenance footer. The deterministic resource
summary table is preserved above so every named outlier is verifiable
against the table immediately below it. Legacy "Highest CPU / Most
alerts" bullets remain as the fallback when no FleetNarrative is
attached.
2026-05-10 21:23:12 +01:00
rcourtman
27bd31684a Log the underlying error on audit list 500s
HandleListAuditEvents dropped the Query/Count error before writing the
500, so a user hitting "Failed to fetch audit events" produced no
server-side log line — diagnosing the failure was impossible without a
local repro. Log the error with the org ID so the next instance is
findable. Doesn't change the user-facing response.
2026-05-10 20:44:18 +01:00
rcourtman
b2bd9d1147 Replace heuristic report narrative with optional AI-generated layer
Performance reports rendered the Executive Summary, Observations, and
Recommendations sections from inline threshold rules in pdf.go. That
narrative looked intelligent but was static templating against alert
counts and metric percentiles, which felt off-brand alongside Patrol
and Pulse Assistant.

Introduce a Narrator interface in pkg/reporting and a FindingsProvider
counterpart that the engine consults at report time. The heuristic
rules are lifted into HeuristicNarrator unchanged so the deterministic
fallback still produces the same observations and recommendations.
The engine now also queries the comparable prior period and threads
its aggregate stats through the narrator so deltas can be expressed.

internal/ai.Service implements both interfaces via report_narrator.go
(single-turn JSON call grounded in the structured ReportData payload,
falling back to the heuristic on any error/timeout) and
report_findings.go (Patrol findings whose lifecycle overlaps the
report window). The reporting handler resolves the per-tenant AI
service when it is configured and supplies it in the request; absent
configuration, reports look identical to the prior heuristic output.

Charts, stats tables, alert lists, storage and disk sections stay
deterministic — sysadmins can verify every AI claim against the data
tables next to it. The PDF renders the AI prose between the health
card and Quick Stats, adds a Period-over-period section after
Recommendations, and prints a provenance footer when the narrative
came from the assistant.

ai-runtime.md and api-contracts.md updates land in a follow-up commit
on this branch; agent-lifecycle / performance-and-scalability /
storage-recovery have no contract delta from this change (router.go
is referenced in their Extension Points but their semantics are
unchanged).
2026-05-10 19:30:54 +01:00
rcourtman
297556fb65 Surface cached preflight in Patrol tools readiness check
Previously the "Patrol tools" readiness check was static
model-name pattern matching: it told the operator whether
the selected model is on Pulse's tool-capable allowlist,
not whether tools have actually been verified to work.
After today's preflight cache, that's strictly less
information than what we already know.

resolvePatrolToolsCheck now consults
aiService.CachedPatrolPreflight() and grounds the check
in real evidence when a result for the configured
provider+model exists:

  - cached green (success + tool_call_observed) →
    "Tool calling verified <age> against <model>." (ready)
  - cached failure → classified summary + "(last preflight
    <age>)." (not_ready)
  - cached soft warning (model_tool_support_unverified) →
    same with warning status
  - no cache or model mismatch → static fallback

formatPatrolPreflightAge produces stable English ("just
now", "5m ago", "2h ago", "3d ago") with full unit
coverage (11 cases).

HandleGetAISettings now also includes patrol_readiness in
its response — previously only the PUT response carried it,
so the Patrol page only got augmented readiness after a
save. The frontend already had patrol_readiness typed and
read it from useAISettingsState.

ai-runtime, api-contracts, agent-lifecycle (dep), and
storage-recovery (dep) contracts updated.
2026-05-10 16:41:16 +01:00
rcourtman
f74add8271 Auto-seed Patrol preflight cache on Pulse startup
Closes the cold-start gap in the preflight observability
layer: every Pulse restart blanked the cached "last verified"
indicator until the next save or manual click, which meant
operators saw "never verified" on every upgrade or process
restart even when the configured Patrol model was working
fine.

NewAISettingsHandler now reuses the existing
aiSettingsUpdateRequiresPatrolPreflight predicate with a nil
"prior config" — semantically "no in-memory cache yet, just
booted." When the loaded config has assistant enabled and a
Patrol model, the handler dispatches the same async
TriggerPatrolPreflightAsync the save path uses. Routine
boots where assistant is disabled (or no Patrol model is
selected) skip the dispatch so we never write a misleading
"Pulse Assistant is not enabled" entry into the cache.

Live verified: after Pulse restart, /api/settings/ai
surfaces a fresh patrol_preflight with success=true within
~6s of boot, no operator action required.

Predicate test extended with two named cases that document
the dual-purpose use (startup seed + skip-when-disabled).
ai-runtime, api-contracts, agent-lifecycle (dep), and
storage-recovery (dep) contracts updated.
2026-05-10 16:34:34 +01:00
rcourtman
33b6bf97d1 Auto-trigger Patrol preflight when settings save moves Patrol transport
Closes the resilience gap: today an operator picks a Patrol
model, saves, and Patrol silently fails on its first
scheduled run because nothing verified the model actually
calls tools. Now the save handler dispatches a one-shot
preflight in the background whenever the change actually
moved Patrol's transport, and the cached result surfaces on
the next /api/settings/ai poll via patrol_preflight.

Trigger conditions (aiSettingsUpdateRequiresPatrolPreflight):
  - new config has assistant enabled AND a Patrol model
  - AND any of:
      * no prior config (first save)
      * assistant was disabled, now enabled
      * Patrol model changed
      * API key for the new patrol model's provider changed

Routine saves that don't touch Patrol transport (theme,
control level, discovery toggles, unrelated provider keys)
skip preflight entirely so they don't burn provider tokens
or add 5-10s latency to every save.

Service-level TriggerPatrolPreflightAsync runs the call in a
goroutine with a 30s timeout. Detection helper has full
unit coverage including the negative paths.
2026-05-10 16:14:26 +01:00
rcourtman
728c42e47b Bring action endpoints onto the agent surface with the agent-stable envelope
Closes the last known gap in the agent substrate. The three
action endpoints (POST /api/actions/plan, /api/actions/{id}/decision,
/api/actions/{id}/execute) previously emitted the platform-wide
APIError shape (stable code under "code", human under "error").
The agent surface uses the inverted shape (stable code under
"error", human under "message"), so adding action capabilities
to the manifest as-is would have forced agents to remember which
envelope each capability uses.

The slice refactors actions.go to emit the agent-stable envelope
across all 42 writeErrorResponse call sites. writeJSONError gains
a writeJSONErrorWithDetails sibling so the 13 calls that pass
field-level reasons (validation failures) preserve that
information under a new optional `details` field. The action
endpoints' JSON shape becomes:

  {"error": "<stable_code>", "message": "<human>",
   "details"?: {"<field>": "<reason>"}}

Frontend impact: zero. Verified that no frontend code consumes
the three action endpoints; the refactor is API-only.

Three new manifest entries (plan_action, decide_action,
execute_action) under a new "action" category, with their
declared error codes pinned per capability. Internal-failure 5xx
codes (audit-store outages, encode failures) are not declared
per capability; agents branch on 5xx generically.
TestContract_AgentSurfaceErrorCodesMatchManifestDeclarations now
audits actions.go alongside the existing two handler files, with
a documented internal-only allowlist for the 5xx codes.

The TestAgentSubstrate_ActionEndpointsEmitAgentStableEnvelope e2e
test exercises one error path through each endpoint via the actual
HTTP boundary, asserting the agent-stable envelope reaches the
wire and the legacy APIError fields (code, status_code, timestamp)
do NOT — drift back would mean the refactor regressed.

The TestContract_ActionDryRunOnlyExecutionErrorJSONSnapshot pin
is updated to match the new envelope shape; the manifest's
category allowlist gains "action".

api-contracts.md documents the new envelope (with details map),
the action governance loop's place in the substrate, the
ai:execute scope distinction from monitoring:write, and the
"manifest projection has a footnote" trade-off: bringing an
existing endpoint into the agent surface may require migrating
its error envelope, but the substrate keeps a single envelope
contract rather than carrying a translation wrapper layer.
agent-lifecycle.md and storage-recovery.md document the action
surface joining the agent paradigm and its zero-new-persistence
posture respectively. AGENT_SUBSTRATE.md's "what it does not do
yet" no longer lists the action surface; it now reflects the
real outstanding items (consumer feedback, an in-Pulse agent
integrations panel, a distribution path for pulse-mcp).
2026-05-10 15:16:17 +01:00
rcourtman
add1096ec8 Cache Patrol preflight outcome and hydrate UI on settings load
The Verify Patrol button reset its result to empty on every
page load — the operator had to re-click to see the verified
state, even though nothing had changed. This commit adds the
observability layer of the auto-preflight plan: every
RunPatrolToolPreflight result is now cached on the AI Service
and surfaced through /api/settings/ai as patrol_preflight, so
the inline result panel rehydrates on page load with the
most-recent outcome and a "last verified Xs ago" indicator.

Backend: patrolPreflightCache (mutex-guarded) on Service with
defensive-copy CachedPatrolPreflight() accessor; every
RunPatrolToolPreflight branch (success, soft warning, classified
failure, validation early-return) records into the cache.
PatrolPreflightSnapshot projects the cached result onto the
AI settings response. Tests cover both success-then-failure
supersession and the defensive-copy invariant.

Frontend: PatrolPreflightSnapshot type mirrors the wire shape;
hydratePatrolPreflightFromSettings(data) projects the snapshot
into the same response shape the manual button writes;
loadSettings and updateSettings flows call it. The result
panel renders a "last verified Xs ago" line under the
provider/model row when recorded_at_unix is present.

End-to-end smoke verified against deepseek-v4-flash: panel
rehydrates as green "Tool calling verified · last verified
just now" after page reload.

Auto-preflight on save (the trigger half of the resilience
plan) follows in the next commit.

Contracts: ai-runtime, api-contracts, agent-lifecycle (dep),
storage-recovery (dep), frontend-primitives all updated to
reflect the new patrol_preflight surface and hydration
contract. Verification artifacts: settingsArchitecture +
patrolPreflight client tests.
2026-05-10 14:58:26 +01:00
rcourtman
404f87854e Pin cross-org and cross-resource isolation on the bundle's pending approvals
The AgentApprovalsProvider closure in router.go applied the
BelongsToOrg and CanonicalResourceID filters inline, which made
the substrate's tenant-isolation property impossible to test
without booting the full router. Drift in the closure (e.g.
swapping BelongsToOrg for a hardcoded "default" or dropping the
resource-id check) would let an agent with one org's token see
approvals targeting another org's infrastructure, but no test
sat right next to that logic to catch it.

Extracts the body into a named function in agent_resource_context.go
(pendingApprovalsForResourceFromStore) behind a minimal
approvalsPendingProvider interface. The closure in router.go
now delegates to it. Four unit tests pin the substrate's
isolation property:

  - FiltersByOrg: same resource id, two orgs, each query returns
    only its own org's approval.
  - FiltersByResource: same org, two resource ids, each query
    returns only its own resource's approval.
  - LegacyEmptyOrgIsDefaultOnly: approvals without OrgID are
    treated as default-org per BelongsToOrg's documented
    semantics; legacy approvals do not leak into a non-default
    org's bundle.
  - EmptyInputsReturnNil: defensive shape on nil store, empty
    resource id, and empty store.

The existing TestContract_AgentResourceContextWiresApprovalsProvider
pin is updated to follow the extraction. Both halves of the
wire-up are now pinned: router.go installs the closure with the
correct delegation, and agent_resource_context.go owns the
filter logic with both safety checks present.

This is the test the substrate was missing: nothing else proved
that an agent with one org's token cannot see another org's
pending approvals at the bundle layer.

Contract-neutral commit: no wire shape, manifest entry, or error
code changed. The refactor preserves identical behaviour;
PULSE_ALLOW_CONTRACT_NEUTRAL_COMMIT is set with a documented
reason since three of the four contract docs the canonical-shape
guard would normally demand are actively mid-edit by another
agent on patrol-preflight work, and trampling them would create
a collision the protocol explicitly forbids.
2026-05-10 14:38:10 +01:00
rcourtman
e26a57a157 Add POST /api/ai/patrol/preflight tool-call verification
The existing per-provider /api/ai/test endpoints only call
ListModels — they pass for every provider that returns a
catalog, even when Patrol fails 100% of runs because tools
aren't actually wired up. That gap is what let the DeepSeek
tool_choice rejection silently fail Patrol for 33 days
before the recent fix landed.

POST /api/ai/patrol/preflight runs a one-shot tool-call
round-trip with the configured (or overridden) Patrol
provider+model and a minimal verify_pulse_patrol tool.
Failures route through ClassifyPatrolRuntimeFailure so the
new tool_choice_rejected and no_tool_capable_endpoint causes
surface here too. A successful provider call where the model
returned plain text (no tool call) is reported as a soft
warning (model_tool_support_unverified): Patrol may still
work but the operator should run a real pass to confirm.

The endpoint bypasses the chat service so cost recording
isn't charged for verification, and uses ScopeSettingsWrite
to align with the existing /api/ai/test gating.

Backend + typed frontend client (runPatrolPreflight); UI
button on Assistant & Patrol settings follows.

Contracts updated:
- ai-runtime: completion obligation extended to cover the
  new verification surface
- api-contracts: payload shape (tool_call_observed,
  duration_ms) noted in obligations
- agent-lifecycle, storage-recovery: dependent-extension
  acknowledgment that ai-runtime owns the new route despite
  it living under internal/api/
2026-05-10 14:30:41 +01:00
rcourtman
f2d9d2aba8 Split overgreedy "tools not supported" classifier into three causes
The Patrol runtime classifier collapsed three distinct upstream
conditions into one misleading "Selected model does not support
Patrol tools" message:

  1. Provider rejected the *value* Pulse sent for tool selection
     (e.g. DeepSeek's "deepseek-reasoner does not support this
     tool_choice" — the model accepts tools, just not the forced
     coercion). The DeepSeek fix in 46145df9 dodges the symptom by
     coercing to auto, but the original misclassification pointed
     operators at the wrong remediation for 33 days.
  2. Provider has no tool-capable endpoint available for the
     selected model (OpenRouter's "No endpoints found …" surfaces
     this when account-level provider/data filters exclude every
     tool-capable route).
  3. Model truly lacks tool calling (the literal "tools are not
     supported" / "tool calling" cases).

Each now has its own PatrolFailureCause, title, summary,
description, and recommendation. summarizePatrolRuntimeFailureDetail
mirrors the split. Helper predicates patrolToolChoiceValueRejected
and patrolNoToolCapableEndpoint encapsulate the substring matching.

The OpenRouter "No endpoints found" test fixture now correctly
classifies as no_tool_capable_endpoint instead of
model_unsupported_tools — fixture updates in
patrol_runtime_failure_test.go, patrol_assistant_handoff_test.go,
and ai_handler_test.go reflect the more accurate diagnostic.
New tests cover the tool_choice_rejected and generic
model_unsupported_tools paths explicitly.

The ai-runtime contract is updated to note the classifier-split
obligation alongside the existing transport-shape obligation.
2026-05-10 14:10:18 +01:00
rcourtman
eeb2975d22 Stability sweep on the agent-substrate arc
Three things landed:

1. /api/agent/capabilities was missing from publicPathsAllowlist
   in router_public_paths_inventory_test.go. Slice 47 added the
   path to publicPaths in router.go and to publicRouteAllowlist
   in route_inventory_test.go but missed this second mirror,
   which scans publicPaths via go/ast. The test was failing on
   origin; this commit closes the gap.

2. The error-envelope paragraph in api-contracts.md now
   distinguishes capability-specific stable codes (the closed
   set declared per capability in the manifest) from
   cross-cutting codes the multi-tenant / auth middleware
   emits universally (invalid_org, org_suspended, access_denied).
   The previous wording implied all stable codes lived in
   per-capability errorCodes lists, which would have forced
   duplication on every capability or misled agents about which
   codes to expect.

3. New contract pin TestContract_AgentSurfaceErrorCodesMatch-
   ManifestDeclarations enforces the symmetry both directions:
   every code emitted by an agent-surface handler must be either
   declared in the matching capability or be one of the three
   cross-cutting codes; every manifest-declared code must have a
   matching emission. Drift either way is a contract regression.
   Pin verified clean against the current handler set.

Stale forward-reference fixed: the capabilities paragraph no
longer says "future MCP-server slices read the manifest" — slice
51 already shipped that adapter.

Sweep also surfaced two failures in internal/mock/ from
unrelated platform-support drift (unraid token set added in
ac82a2852 but the mock contract test wasn't updated). Those are
not part of the agent-substrate arc and not mine to fix; flagged
in the closing summary so they don't get lost.
2026-05-09 23:04:22 +01:00
rcourtman
8aa22d0605 Surface action verification on the action.completed SSE payload
Closes the certainty loop for agents watching the substrate's push
channel. The action audit's read-after-write probe outcome was
already persisted on the audit record, but agents watching
action.completed only learned "the action ran" — they had to fetch
/api/actions/{id} to know whether the read-back probe confirmed
the intended state. That defeated the substrate's
push-notification guarantee for dispatch certainty.

The new agent-stable AgentResourceActionVerification projection
(ran, success, command, note, ranAt — output stays in the audit
record, deliberately omitted from events to keep payloads small)
is now carried on both:

  - the action.completed SSE payload, projected from
    record.Result.Verification by the router-side bridge in
    wireAIChatDependenciesForService, and
  - the resource-context bundle's recentActions surface, via the
    same shared projectAgentResourceVerification helper

so the bundle (depth) and the doorbell (push) speak the same
vocabulary. Refused-before-dispatch failures omit verification
(the probe never runs) so agents branch on field presence to
distinguish "no probe attempted" from "probe ran with empty
result". Three contract pins lock the symmetry: payload field
present, router bridge populates it, bundle parallels.

The capabilities manifest's subscribe_events description now
mentions the verification block so external agents discover the
field through the same path they already use to learn the rest
of the agent surface.
2026-05-09 22:45:15 +01:00
rcourtman
5156c03eed End-to-end test the operator-state write loop through HTTP
Closes the e2e contract proof on the write side. The only write
capability the manifest declares is the operator-state intent
loop (set / get / clear), and this test boots the full router
stack to walk every state of that loop through the actual HTTP
boundary — proving the manifest's declared error codes for
set_operator_state and get_operator_state reach the wire from
the handlers, the URL canonical id authoritatively wins over
body-supplied ids (no scope-confusion writes), and SetAt/SetBy
are server-populated so attribution cannot be spoofed.

The flow exercised:
  GET unset → 404 operator_state_not_set
  PUT valid → 200 with persisted state + server SetAt
  GET → round-trips
  PUT invalid criticality → 400 operator_state_invalid
  DELETE → 204
  GET → 404 operator_state_not_set (loop closed)
  DELETE again → 204 (idempotent)

Two contract pins lock the audit-honesty and error-token
contracts so a future refactor of the handler can't silently
regress either: SetAt/SetBy populated server-side, URL-id wins
over body-id, and the validator's domain error maps to the
stable wire token via errors.Is rather than message-matching.

Together with the read-side e2e (slice 47), the agent surface —
read, write, push — has now been exercised end-to-end as one
substrate.
2026-05-09 22:22:47 +01:00
rcourtman
8cf15fe639 End-to-end test the agent substrate's discovery → triage → depth flow
The unit tests cover each piece in isolation; this test boots the
full router stack and proves the discovery → triage → depth chain
works as one substrate through the actual HTTP boundary an
external agent would hit. It found two real bugs slice 40
introduced and slice 45/46 didn't surface:

- /api/agent/capabilities was documented as unauthenticated but
  was missing from the router's publicPaths list, so the global
  auth middleware was 401'ing the discovery manifest. Fixed by
  adding the path to publicPaths and pinning the contract so it
  cannot regress.

- The error-envelope shape across the agent surface is
  {"error": "<stable_code>", "message": "<human>"}, written via
  writeJSONError — not the {"code": ...} shape I had assumed in
  the docs. Pinned the wire shape on api-contracts.md so the
  documented error contract matches what writeJSONError actually
  writes.

The e2e test exercises capabilities discovery, triage via
fleet-context, and depth via resource-context with an unknown id
to confirm the resource_not_found stable error code reaches the
wire under the canonical "error" key. The subscribe_events SSE
path is probed unauthenticated to confirm it's gated (401) rather
than 404 — discovery's claim is honest.
2026-05-09 22:16:32 +01:00
rcourtman
a168215f6a Add /api/agent/fleet-context for org-wide triage in one read
The substrate had a per-resource bundle but no fleet view, so
"where do I focus?" forced agents to walk every resource id and
bundle each — O(N) round trips that scale with fleet size. The
fleet endpoint returns a thin per-resource rollup in a single
read: identity, operator-intent flags (intentionallyOffline,
neverAutoRemediate, maintenanceWindowActive), per-severity
finding counts, and pending-approval count.

Same auth scope and same provider wiring as the per-resource
bundle — operator-state via the canonical unified store, findings
via AgentFindingsProvider, approvals via AgentApprovalsProvider —
so the fleet sweep is the per-resource bundle's wiring multiplied
by N with no new dependencies. Audit reads are deliberately
omitted from the rollup; agents that want depth on a flagged
resource follow up via /api/agent/resource-context/{id}.

The capabilities manifest declares get_fleet_context with
AgentFleetContext as the response shape so external agents
discover the triage entry point through the same path they
already use to learn the rest of the agent surface.
2026-05-09 22:08:50 +01:00
rcourtman
d8f6b1e508 Bundle pending approvals into the agent resource-context endpoint
The substrate's "everything an agent needs in one read" guarantee
covered identity, operator state, findings, and recent actions but
forced a separate /api/approvals call for pending governance
state. AgentResourceContext now carries pendingApprovals as a
lightweight AgentResourceApprovalSummary projection — same
vocabulary as approval.pending SSE events, so the doorbell and
the bundle agree on shape. AgentApprovalsProvider is the parallel
seam to AgentFindingsProvider; the router wires a closure that
resolves approval.GetStore() at request time, scopes via
BelongsToOrg, and filters by CanonicalResourceID so cross-tenant
or cross-resource pending requests don't leak. Empty arrays
preserve the iteration-safe contract the existing sections
already follow.
2026-05-09 22:00:53 +01:00
rcourtman
7fe9b1c492 Use cursor-help on TagBadges hover-only +N indicator
The "+N" overflow indicator on TagBadges was styled with
`cursor-pointer`, which signals a clickable affordance — but the
element only listens for mouseenter/mouseleave to show a tooltip and
has no click handler. Switch to `cursor-help` so the cursor matches
the actual interaction (hover for more info), avoiding a phantom
click expectation.
2026-05-09 21:52:34 +01:00
rcourtman
52669128e6 Drop redundant policy gates in resource-link routing
Tail of the operator-local-UI redaction sweep (abdde303a, a17f879a1).

resolveKubernetesContextForResource gated on requiresGovernedResourceDisplay
to choose between getPreferredInfrastructureDisplayName and a manual
displayName-or-name fallback. Both branches produce a raw infra name
once we trust that displayName never carries a redacted summary in
local rendering, so the gate is dead complexity. Collapse to a single
call and drop the now-unused requiresGovernedResourceDisplay import.

problemResourcePresentation.getProblemResourceDisplayName has no
production consumers today, but it still routes through the governed
helper. Reclassify it now (same as every other operator-local helper)
so the rule is consistent across the codebase if the surface ever gets
adopted.
2026-05-09 21:31:45 +01:00
rcourtman
51c5d344ce Plumb operator-state and operational memory into investigation findings
Closes the "has context vs uses context" gap that defines Pulse's
agent-paradigm differentiation. The orchestrator (in pulse-pro) used
to receive a Finding with no awareness of the operator's
commitments — Patrol could investigate a resource the operator had
marked never-auto-remediate and propose a restart fix that the
action broker would refuse downstream. The proposal shouldn't have
happened in the first place.

Adds two optional fields to aicontracts.Finding:

- OperatorContext: intentionally offline, never auto-remediate,
  maintenance window with computed active flag, criticality, note.
  Populated in MaybeInvestigateFinding from the same operator-state
  projection the suppression hot path consumes, so investigation
  reasoning and suppression behavior cannot drift apart.
- OperationalMemory: regression count, previous resolved fix
  summary, last regression timestamp, times raised. Populated in
  ToCoreFinding from fields the internal Finding already carries.

ResourceOperatorStateProjection grew a NeverAutoRemediate field —
the investigation read path needs it (so the orchestrator can avoid
proposing fixes the broker would refuse) even though the
suppression hot path doesn't. Same projection serves both reads.

Both fields are nil when there's no signal (fresh finding, no
operator state) so the orchestrator branches on absence rather
than parsing zero-valued structs. The pulse-pro orchestrator
consumes the fields in a separate slice; this slice ships the
in-repo half of the data path.
2026-05-09 21:03:15 +01:00
rcourtman
94bfd48a9d Add /api/agent/events SSE stream for real-time agent notifications
Third slice on the agent-paradigm pivot, closing the substrate
triangle (discovery + bundled reads + push). Agents subscribe once
to a long-lived SSE connection and receive real-time events instead
of polling: finding.created when a new finding is raised, heartbeat
every 15 seconds for keepalive. Each event carries a monotonic ID so
agents can dedupe and reason about ordering across reconnects.

The broadcaster fan-outs to multiple subscribers and drops events
for slow consumers rather than blocking the publish path —
publishers cannot stall on consumer slowness. The findings-runtime
hook in router.go publishes finding.created when the finding is new
AND not auto-dismissed by operator-state suppression (operator
already said to stay quiet about that resource); patrol-cycle
re-detection of existing findings doesn't fire the event.

Capabilities manifest declares the stream under subscribe_events so
external agents discover it through the same channel as the REST
surface. SSE chosen over WebSocket because it's simpler, works
through every HTTP proxy without special-casing, and matches the
existing deploy_handlers pattern; agents that need bidirectional
comms call REST endpoints in parallel.

Tests pin the broadcaster's pub/sub semantics (fan-out, unsubscribe,
slow-consumer drop, monotonic IDs), the SSE handler's stream
contract (text/event-stream, no-cache, X-Accel-Buffering=no), and
the connected/published-event delivery via httptest.NewServer. A
contract test pins the publish-gate semantics so operator-state
suppression and stream notifications stay aligned.
2026-05-09 20:13:31 +01:00
rcourtman
71797f9b21 Add /api/agent/capabilities discovery manifest for agent integrations
Second slice on the agent-paradigm pivot: the discovery document any
external agent (Claude Code, custom integrations, future MCP servers)
needs to learn what Pulse exposes. Each capability declares its
agent-stable name (snake_case), description, category, REST surface,
required scope, response shape, and the closed set of stable error
codes the response may carry. Agents branch on the codes
(operator_state_invalid, resource_not_found, etc.) rather than
parsing human messages.

The manifest is hand-authored, not auto-generated, because the
contract decisions (what's agent-stable, which categories, which
error codes) are product-shaping and must not drift behind code
changes. Adding a capability is a deliberate "this is part of the
agent surface" commitment.

v1 surface includes: get_resource_context (substrate from slice 39),
get/set/clear_operator_state (slice 30), and the finding-lifecycle
actions (acknowledge, snooze, dismiss, resolve). Action-broker
capabilities are not in v1 because they go through approval flow,
not direct dispatch — those need their own contract design.

Tests pin: stable shape, version contract, unique-and-snake_case
names, every capability has method/path/scope, closed category set,
required error codes for the most consequential capabilities. The
manifest is unauthenticated and cacheable (5min); the underlying
capabilities keep their own auth scopes.
2026-05-09 19:52:17 +01:00
rcourtman
14f9270a5e Add /api/agent/resource-context/{id} substrate endpoint for agents
First slice on the agent-paradigm pivot: instead of building more
human-glance UI, expose substrate that any agent (in-process Patrol,
external Claude Code, future MCP-driven setups) can consume in one
read. The endpoint returns the full situated picture of a resource —
identity, operator-set state with server-computed
maintenanceWindowActive flag, active findings as a lightweight
seven-question-schema projection, and recent action audits with
refusal tokens (resource_remediation_locked:, plan_drift:) preserved
verbatim for agent branching.

Substrate is the right shape here: an agent reasoning about a
resource gets everything it needs without chaining four or five calls,
and the projection types decouple agent-stable wire shape from
internal type evolution. Active findings flow through an
AgentFindingsProvider adapter wired in router.go from the patrol
service, keeping the api package free of an internal/ai import.

Always-array fields (activeFindings, recentActions) and
omitempty-on-absent (operatorState) give agents stable iteration and
clean field-presence branching. AgentContextHandler owns the agent
surface as its own type so it evolves independently of resource CRUD.
Each test pins a specific contract: identity round-trip, operator-state
projection with computed flag, empty-state shape, refusal-token
preservation, 404 shape, method gating.
2026-05-09 19:46:16 +01:00
rcourtman
8a70a2c23c Auto-acknowledge findings on intentionally-offline resources
Compounding slice on the per-resource operator-state feature: when an
operator has marked a resource as IntentionallyOffline (via the API
surface from slice 30), new findings against that resource get
auto-dismissed as expected_behavior with the lifecycle event tagged
operator_state_cause=intentionally_offline. Same shape as the
maintenance-window suppression from slice 31 but indefinite — no
scheduled end, no maintenance_end_at metadata.

Restructures the ResourceOperatorStateProvider interface to return a
single ResourceOperatorStateProjection per call rather than a narrow
ActiveMaintenanceWindow method. Adding the second signal would
otherwise have meant a second method and another call per finding;
the projection shape carries every signal in one round-trip and gives
future signals (NeverAutoRemediate, criticality) a stable extension
point.

Maintenance windows take priority over intentionally_offline when
both are active because the time-bounded suppression is more honest
to surface to the operator — they'll see it auto-clear when the
window ends rather than wondering when the indefinite suppression
will lift.
2026-05-09 14:54:45 +01:00
rcourtman
cf2e61ea22 Auto-acknowledge new findings during operator-set maintenance windows
Third slice on the per-resource operator-state feature delivers the
behavioral payoff: when an operator has set a maintenance window on a
resource (via the API surface from slice 30), new findings against
that resource get auto-dismissed as expected_behavior at creation time
rather than firing notifications. The finding still lands in durable
history with a UserNote naming the window and a "dismissed" lifecycle
event tagged operator_state_cause=maintenance_window so the operator
can audit what tripped during the window.

Adds a narrow ResourceOperatorStateProvider interface to internal/ai
so the findings runtime stays free of an internal/unifiedresources
import. The API layer wires an adapter in router.go that projects
unified.ResourceOperatorState through state.IsInMaintenanceAt into
the ActiveMaintenanceWindow shape the findings runtime consumes.

Suppression is opt-in: stores without a provider wired keep the
original new-finding behavior bit-for-bit, so deployments that
haven't adopted the operator-state feature see no behavioral change.
IntentionallyOffline and NeverAutoRemediate land as compounding
slices on the same provider interface.
2026-05-09 14:43:22 +01:00
rcourtman
46646b4293 Add /api/resources/{id}/operator-state GET / PUT / DELETE handlers
Second slice on the per-resource operator-state feature: the API
surface that the storage foundation from slice 29 was designed to
support. A frontend, pulse-cli, or Assistant tool call can now read
the operator-set state for a resource, replace it with PUT, or clear
it with DELETE.

Contract decisions worth preserving:
- GET 404s with stable error code operator_state_not_set when no
  entry exists, distinct from a 200 with default (all-zero) fields.
- PUT replaces the entire record. URL canonicalId wins over body to
  prevent body-manipulation retargeting; server-side setAt/setBy
  populate from request time and authenticated identity, ignoring
  client values so the audit trail stays honest.
- Validation rejections surface 400 with operator_state_invalid so
  frontend can branch on the code without string-matching messages.
- DELETE is idempotent — 204 whether or not an entry was present.
- GET runs under monitoring:read; PUT and DELETE under
  monitoring:write because the state modulates Patrol's behavior.

Finding-suppression and action-broker integrations land in subsequent
slices that consume the same ResourceOperatorState shape.
2026-05-09 14:34:43 +01:00
rcourtman
2bd4621b76 Attribute operator-driven Mark resolved closures as "Resolved by you"
Slice 22 added the manual Mark resolved button (auto_resolved=false) but the
resolution-reason copy still flattened every closure into "Condition
cleared" or "Issue no longer detected" — the operator couldn't tell from
the timeline whether they had closed the loop themselves or Pulse had
auto-detected the condition clearing.

Threads the existing Finding.AutoResolved flag through unified.UnifiedFinding
(Go), router.go conversions, UnifiedFindingRecord (TS), and the store-level
UnifiedFinding. The frontend resolution-reason helper now reads "Resolved by
you <time>" when autoResolved === false, while keeping Patrol's specific
fix outcomes (fix_verified, fix_executed, resolved) priority because those
describe Pulse's actual remediation rather than mere auto-detection.
2026-05-09 11:10:11 +01:00
rcourtman
5cc2f61be0 Surface will_fix_later remind-at on dismiss confirm and dismissed rows
Slice 18 made will_fix_later a real operational commitment server-side, but
the new RemindAt field stayed invisible to operators until the reminder fired
a week later. This wires it through the API surface and renders it where the
operator decides and where they later revisit.

UnifiedFinding (Go and TS) and the Patrol Finding TS shape now carry
RemindAt / remind_at; router.go and AddFromAI mirror it like the other
user-feedback fields. FindingsPanel previews "Pulse will stay quiet for 7
days, then surface again on <date>" on the dismiss confirmation panel before
the operator confirms, badges dismissed-as-will_fix_later rows with
"Reminding <date>" in amber, and adds explanatory copy for the other two
dismissal reasons so all three paths feel deliberate rather than
undifferentiated.
2026-05-09 10:47:21 +01:00
rcourtman
07d1ab51d1 Surface trust metrics on the Patrol page
Wire FindingsStore.GetTrustSummary through PatrolService and the
patrol-status API into the Patrol page so the operator can scan
"is Pulse useful?" at a glance. Adds a small Trust strip above the
Findings/Runs tab bar that renders compact signals: fixes verified,
auto-resolved, dismissed-as-noise, dismissed-as-expected, currently
active, and regressed-at-least-once. The strip is hidden when every
signal is zero so a fresh install sees no empty pill.

Plumbing:
- PatrolService.GetFindingsTrustSummary accessor (delegates to the
  store-level method shipped in the prior slice)
- PatrolStatusResponse carries Trust *FindingsTrustSummary; populated
  from the active patrol service, omitted when no service is available
  (snapshot semantics, not lifetime totals)
- TS FindingsTrustSummary mirror in api/patrol.ts and a trust field on
  PatrolStatus
- PatrolIntelligenceWorkspace reads state.patrolStatus()?.trust and
  conditionally renders the strip

Verification artifacts:
- internal/ai/patrol_test.go: TestPatrolService_GetFindingsTrustSummary
- internal/api/contract_test.go: TestContract_PatrolStatusTrustJSONSnapshot
  pinning the canonical wire shape
- frontend-modern/src/api/__tests__/patrol.test.ts: round-trip test for
  the trust block on the patrol-status response
- frontend-modern/src/features/patrol/__tests__/PatrolIntelligenceWorkspace.test.ts:
  source-text test pinning state.patrolStatus()?.trust read,
  aria-label, and field names so future strip additions go through
  the FindingsTrustSummary contract first.
- frontend-modern/src/features/patrol/__tests__/patrolInvestigationContextModel.test.ts:
  pins that the per-finding context model does not synthesize impact
  from trust counters; trust is an aggregate operator-page concern,
  not a per-finding text source.

Updates the api-contracts, ai-runtime, patrol-intelligence,
agent-lifecycle, frontend-primitives, and storage-recovery contracts
to pin the trust block's shape, the strip's contract-first rule, and
the scope boundary (trust counters are advisory operator context, not
enrollment/storage/recovery action authority).
2026-05-08 21:11:24 +01:00
rcourtman
cff4226531 Pass stored fingerprint into PVE diagnostic test client
The /api/diagnostics handler builds its own test client per PVE node
to run a live connectivity probe. The PBS branch already passed
node.Fingerprint into the test client config, but the PVE branch did
not. With VerifySSL=true and a self-signed Proxmox cert (the standard
configuration), tlsutil.CreateHTTPClientWithTimeout falls into
default-secure mode and validates against the system CA chain, which
fails the handshake even when the actual poller — which DOES pass
the fingerprint — is connecting fine.

The result was that /api/diagnostics reported delly + pi as
"Failed to connect to Proxmox API" while /api/resources was happily
ingesting all 27 workloads from the same hosts. Mirror the PBS
branch by passing node.Fingerprint into the PVE testCfg so the
diagnostic probe uses the same TLS verification path as the runtime
poller.

Add a regression test that spins up an httptest TLS server, captures
its leaf cert SHA-256, configures a PVE instance with VerifySSL=true
and that fingerprint, and asserts computeDiagnostics reports
Connected=true. The pre-fix code fails this with a "tls: bad
certificate" handshake error.
2026-05-08 20:58:32 +01:00
rcourtman
b5c8e00859 Preserve previous successful fix across regressions
Capture the prior InvestigationRecord.ProposedFix.Description into a
new Finding.PreviousResolvedFixSummary field at regression time, before
the InvestigationRecord is cleared. Without this capture the next
investigation starts from blank context whenever a finding regresses,
and operational memory of "what worked last time" is lost.

The summary propagates through:
- FindingsStore.Add regression branch (capture before clear)
- Finding.MarshalJSON / UnmarshalJSON (wire shape)
- Both Finding to UnifiedFinding conversion sites in router.go
- UnifiedFinding (struct + JSON shadow + Marshal/Unmarshal)
- UnifiedStore.AddFromAI update branch (non-empty overwrite)
- Assistant chat context as a "Previous Resolved Fix" line so the LLM
  sees what worked previously rather than blank-slate diagnosing each
  regression.

Adds a unit test that walks the full lifecycle (detect, resolve via
UpdateInvestigationRecord + ResolveWithReason, re-detect) and asserts
PreviousResolvedFixSummary is preserved while InvestigationRecord is
cleared, plus two chat-context tests covering the surfaces-when-set and
omits-when-empty cases. Adds a contract test pinning the canonical
"previous_resolved_fix_summary" JSON key. Updates the api-contracts,
ai-runtime, and the dependent agent-lifecycle, performance-and-
scalability, and storage-recovery contracts to pin the operational
memory propagation rule and its scope boundary.
2026-05-08 19:45:44 +01:00
rcourtman
10ff1c4dcf Surface Finding.Impact through UnifiedFinding to operators
Carry the Finding.Impact text added in the previous slice through the
Finding to UnifiedFinding boundary and onto the FindingsPanel surface
so the runtime-failure consequence-if-ignored copy is visible to the
operator. Add Impact to the UnifiedFinding struct, JSON snapshot, and
both Marshal/Unmarshal mirrors; copy f.Impact into both Finding to
UnifiedFinding conversion sites in router.go; mirror impact in the TS
UnifiedFindingRecord and Finding API types and the aiIntelligence
store normalizers; render an Impact line between Description and
Recommendation in FindingsPanel.

Also fix the FindingsStore.Add dedup-merge path so re-detected findings
overwrite existing.Impact alongside Description and Recommendation
rather than preserving the stale empty value left by an older binary.
Without this fix, a freshly-classified runtime failure with new Impact
text would be merged onto the persisted finding but the Impact field
would be silently dropped.

Verified end-to-end against the live runtime: triggered a Patrol run,
watched the runtime-failure finding regenerate, confirmed the
operator-visible card now renders "Impact: While Patrol cannot
analyze..." between Description and Recommendation. Updates the
api-contracts, ai-runtime, patrol-intelligence, and the dependent
agent-lifecycle, performance-and-scalability, and storage-recovery
contracts to pin the propagation rule and the dedup-merge invariant.
2026-05-08 17:31:19 +01:00
rcourtman
e7b5650233 Add impact and rollback to investigation records
Promote the seven-field investigation-record shape so Patrol findings
can carry consequence-if-ignored context and a record-level rollback
plan alongside the existing verification array. The shared
aicontracts.InvestigationRecord struct gains top-level Impact and
Rollback fields with matching TS mirrors, normalizes Rollback to an
empty slice, and the Patrol-owned investigation surface renders an
explicit "Impact not assessed" / "Rollback not specified" placeholder
so the operator-visible gap is conspicuous to both the operator and
Assistant when Patrol has not populated them. Backend default leaves
both empty rather than fabricating analysis from severity/category.
Also closes the existing Trigger.cause drift between Go and TS so
frontend handoff context preserves backend-attributed failure cause,
and updates the api-contracts, ai-runtime, frontend-primitives, and
patrol-intelligence subsystem contracts to pin the new shape.
2026-05-08 16:47:55 +01:00
rcourtman
f88f0fcf1a Align DeepSeek V4 Patrol readiness 2026-05-08 11:34:07 +01:00
rcourtman
ac82a28521 Fix Unraid agent host profile detection 2026-05-08 11:05:14 +01:00
rcourtman
d0940db33b Clamp Patrol monitor-only autonomy saves
Refs #1463
2026-05-08 02:21:31 +01:00
rcourtman
dd15d23a35 Hydrate Patrol requester in Assistant handoffs
- hydrate approval requester identity into backend handoff actions
- include requester provenance in refreshed finding action context
- document the backend requester authority boundary
2026-05-08 00:26:04 +01:00
rcourtman
ea3e1b216a Persist Patrol approval requester identity
- store requester provenance on approval records
- carry requester metadata through approval APIs and Assistant handoffs
- document the safe Patrol approval provenance boundary
2026-05-08 00:12:09 +01:00
rcourtman
f563adf136 Mark Patrol queued fixes as Patrol actions
- derive approval requester identity from investigation-fix approvals

- stamp pending action-audit lifecycle events with Patrol as the producer

- document the requester boundary for Patrol handoffs and timelines
2026-05-07 23:48:12 +01:00
rcourtman
4736358acc Drive agent host profiles from platform manifest 2026-05-07 23:42:15 +01:00
rcourtman
cf8931031d Record Patrol fix approvals in action audit
- seed queued investigation fixes with governed action plans

- persist planned and pending lifecycle evidence for Patrol approvals

- cover the approval adapter and subsystem contracts
2026-05-07 23:26:39 +01:00
rcourtman
bbaa6c0a6c Cover Patrol handoff approval mode
- force non-empty Patrol finding handoffs into approval-required mode

- add direct helper coverage for finding, scoped, resource, action, and metadata handoffs

- align subsystem contracts with the backend-owned clamp
2026-05-07 23:08:56 +01:00
rcourtman
f13a89db62 Clear Patrol runtime issue after scoped success
- mark DeepSeek V4 Flash and Pro tool-ready for Patrol readiness
- resolve synthetic Patrol runtime findings after provider-backed scoped runs
- cover readiness and run-history resolved counts
2026-05-07 22:42:38 +01:00
rcourtman
7da942226a Clarify Pulse Agent host profile support
Separate first-class platform support from Pulse Agent host profiles and classify Unraid as an agent-backed host profile while preserving it as presentation-only platform vocabulary.
2026-05-07 22:28:24 +01:00
rcourtman
d9331d39e4 Move Patrol run Assistant handoffs server-side
- rehydrate Patrol run context from Patrol history for chat requests and follow-up sessions
- send only browser-safe run metadata from the frontend
- classify and redact provider runtime failures in handoff prompts and briefings
2026-05-07 21:58:06 +01:00
rcourtman
f7992e8e78 Return safe provider preflight diagnostics
Classify Assistant and Patrol provider-test failures through the Patrol runtime failure taxonomy, redact secret-shaped provider evidence, and preserve safe recommendations in the settings shell.

Refs #1463
2026-05-07 20:45:18 +01:00