vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-20 09:23:27 +00:00

Author	SHA1	Message	Date
rcourtman	d4463a615c	Add fleet-level AI narrative for multi-resource reports The single-resource AI narrative landed in `b2bd9d114` but multi-resource fleet reports stayed heuristic-only. That left a gap on the exact axis where AI helps most: a 50-resource fleet PDF is where synthesis is the difference between useful and unread. Introduce FleetNarrator as a separate interface from Narrator. The input shapes are different: single-resource takes one set of metric stats with a prior window, fleet takes a denormalised cross-resource view with per-resource summaries plus a fleet aggregate. HeuristicFleetNarrator owns the deterministic fallback: ranks resources by severity (critical alerts > unhealthy disks > storage pressure > memory > CPU > non-critical alerts), picks up to 5 outliers, derives cross-cutting patterns by counting how many of N resources share a hot signal, and emits fleet-scoped recommendations. internal/ai.Service implements FleetNarrator through report_fleet_narrator.go. Distinct use-case label (report_narrative_fleet) so fleet vs single-resource spend is separable in the cost ledger and budget gate. The fleet payload is denormalised through buildReportFleetPayload so prompt cost scales linearly with fleet size. Same fail-closed invariant: nil provider, parse failure, or context cancellation falls through to the heuristic. Single-resource Narrator is intentionally NOT propagated through engine.GenerateMulti: a 50-resource fleet report performs one AI call (fleet narrator), not 51. The router resolver returns the AI service for all three roles (Narrator, FleetNarrator, FindingsProvider). The fleet PDF renders the FleetNarrative in the fleet summary cover when present: executive prose, named outliers with severity-coloured bullets, cross-cutting patterns, recommendations, optional period comparison, and an AI provenance footer. The deterministic resource summary table is preserved above so every named outlier is verifiable against the table immediately below it. Legacy "Highest CPU / Most alerts" bullets remain as the fallback when no FleetNarrative is attached.	2026-05-10 21:23:12 +01:00
rcourtman	b2bd9d1147	Replace heuristic report narrative with optional AI-generated layer Performance reports rendered the Executive Summary, Observations, and Recommendations sections from inline threshold rules in pdf.go. That narrative looked intelligent but was static templating against alert counts and metric percentiles, which felt off-brand alongside Patrol and Pulse Assistant. Introduce a Narrator interface in pkg/reporting and a FindingsProvider counterpart that the engine consults at report time. The heuristic rules are lifted into HeuristicNarrator unchanged so the deterministic fallback still produces the same observations and recommendations. The engine now also queries the comparable prior period and threads its aggregate stats through the narrator so deltas can be expressed. internal/ai.Service implements both interfaces via report_narrator.go (single-turn JSON call grounded in the structured ReportData payload, falling back to the heuristic on any error/timeout) and report_findings.go (Patrol findings whose lifecycle overlaps the report window). The reporting handler resolves the per-tenant AI service when it is configured and supplies it in the request; absent configuration, reports look identical to the prior heuristic output. Charts, stats tables, alert lists, storage and disk sections stay deterministic — sysadmins can verify every AI claim against the data tables next to it. The PDF renders the AI prose between the health card and Quick Stats, adds a Period-over-period section after Recommendations, and prints a provenance footer when the narrative came from the assistant. ai-runtime.md and api-contracts.md updates land in a follow-up commit on this branch; agent-lifecycle / performance-and-scalability / storage-recovery have no contract delta from this change (router.go is referenced in their Extension Points but their semantics are unchanged).	2026-05-10 19:30:54 +01:00
rcourtman	51c5d344ce	Plumb operator-state and operational memory into investigation findings Closes the "has context vs uses context" gap that defines Pulse's agent-paradigm differentiation. The orchestrator (in pulse-pro) used to receive a Finding with no awareness of the operator's commitments — Patrol could investigate a resource the operator had marked never-auto-remediate and propose a restart fix that the action broker would refuse downstream. The proposal shouldn't have happened in the first place. Adds two optional fields to aicontracts.Finding: - OperatorContext: intentionally offline, never auto-remediate, maintenance window with computed active flag, criticality, note. Populated in MaybeInvestigateFinding from the same operator-state projection the suppression hot path consumes, so investigation reasoning and suppression behavior cannot drift apart. - OperationalMemory: regression count, previous resolved fix summary, last regression timestamp, times raised. Populated in ToCoreFinding from fields the internal Finding already carries. ResourceOperatorStateProjection grew a NeverAutoRemediate field — the investigation read path needs it (so the orchestrator can avoid proposing fixes the broker would refuse) even though the suppression hot path doesn't. Same projection serves both reads. Both fields are nil when there's no signal (fresh finding, no operator state) so the orchestrator branches on absence rather than parsing zero-valued structs. The pulse-pro orchestrator consumes the fields in a separate slice; this slice ships the in-repo half of the data path.	2026-05-09 21:03:15 +01:00
rcourtman	0dd3f8bedb	Surface per-endpoint reasons in cluster "no healthy nodes" error When every cluster endpoint failed health, getHealthyClient wrapped the failure as `no healthy nodes available in cluster X (all N endpoints unreachable: [...])`, dropping the per-endpoint reason from cc.lastError. The connections aggregator's auth-error regex (401/403/unauthorized/forbidden/authentication/...) only sees the outer message, so a token rejected with 401 on every endpoint of a clustered PVE connection surfaced as `state: "unreachable"` / `adapterHealth: "blocked"` instead of `state: "unauthorized"` / `credentialStatus: "invalid"` — the same Settings → Connections brokenness the rest of today's commits set out to remove. Single-node `pve:pi` already classified the same kind of failure correctly because its error came straight from the per-instance client; only the cluster wrapper masked it. Surface each unhealthy endpoint's already-sanitized reason in the outer error. The "no healthy nodes available" prefix is preserved so existing callers that test for it (monitor_polling_storage.go, internal cluster_client passthroughs, existing tests) keep working. Add a regression test covering both shapes: - all endpoints failed auth → wrapped error contains "Authentication failed" so the aggregator regex now matches. - endpoint with no recorded reason → wrapped error includes the fallback "no recorded reason" text rather than a bare URL.	2026-05-08 21:10:14 +01:00
rcourtman	e7b5650233	Add impact and rollback to investigation records Promote the seven-field investigation-record shape so Patrol findings can carry consequence-if-ignored context and a record-level rollback plan alongside the existing verification array. The shared aicontracts.InvestigationRecord struct gains top-level Impact and Rollback fields with matching TS mirrors, normalizes Rollback to an empty slice, and the Patrol-owned investigation surface renders an explicit "Impact not assessed" / "Rollback not specified" placeholder so the operator-visible gap is conspicuous to both the operator and Assistant when Patrol has not populated them. Backend default leaves both empty rather than fabricating analysis from severity/category. Also closes the existing Trigger.cause drift between Go and TS so frontend handoff context preserves backend-attributed failure cause, and updates the api-contracts, ai-runtime, frontend-primitives, and patrol-intelligence subsystem contracts to pin the new shape.	2026-05-08 16:47:55 +01:00
rcourtman	ea3e1b216a	Persist Patrol approval requester identity - store requester provenance on approval records - carry requester metadata through approval APIs and Assistant handoffs - document the safe Patrol approval provenance boundary	2026-05-08 00:12:09 +01:00
rcourtman	d2625c4dfb	Persist Patrol settings with readiness handoff Refs #1463	2026-05-07 19:26:00 +01:00
rcourtman	86244d8c13	Track runtime build in license activation	2026-05-06 23:45:37 +01:00
rcourtman	df71bcdf09	Restore commercial monitored-system admission hook contract	2026-05-06 18:04:59 +01:00
rcourtman	b84fc2301a	Surface paid runtime mismatch in licensing	2026-05-06 17:18:35 +01:00
rcourtman	75e3cb76fd	Add structured Patrol investigation records	2026-05-06 16:31:51 +01:00
rcourtman	edae6d1edc	refactor: split alert config and callbacks Extract alert config types, normalization, and identity helpers into internal/alerts/config while preserving the existing alerts package API through aliases and wrappers. Move Manager callback lifecycle state into a same-package callbackBus, keeping public Set/Subscribe methods unchanged. Harden metrics SQLite artifacts to owner-only permissions and cover permissive umask behavior. Proof: go test -json ./internal/api -count=1; go test ./internal/alerts/... ./internal/monitoring ./internal/ai/... ./internal/websocket ./internal/config ./pkg/metrics; go test ./internal/alerts/... ./pkg/metrics	2026-05-06 13:01:32 +01:00
rcourtman	d6ca8b12e6	Add agentless availability targets Refs #1460	2026-05-06 10:35:34 +01:00
rcourtman	0895916283	Fix self-hosted startup web listener fail-fast Refs #1461	2026-05-06 09:16:54 +01:00
rcourtman	d7225a45a0	Fix Proxmox guest memory fallbacks Also fixes Ceph pool threshold resource identity. Refs #1341	2026-05-05 14:59:29 +01:00
rcourtman	81b31e4d3b	Remove monitored-system volume caps Retire runtime/API/UI monitored-system volume enforcement now that infrastructure monitoring is no longer capped. Keep only legacy metadata scrubbing and purchase-start compatibility for old max_monitored_systems references. Rename the remaining preview surface to monitored-system impact and make previews explanatory rather than save-blocking. Update subsystem contracts and RA7 evidence for the caps-retired invariant.	2026-05-05 12:59:59 +01:00
rcourtman	632f0af7f3	Keep uncapped continuity from writing raw caps	2026-05-05 09:33:44 +01:00
rcourtman	82a2494ffa	Add action execution safety contract	2026-05-04 23:19:58 +01:00
rcourtman	2040285085	Add action decision API	2026-05-04 22:56:55 +01:00
rcourtman	c436e1a2a2	Add CLI fleet connection reads	2026-05-04 08:40:34 +01:00
rcourtman	863f214c10	Add CLI action audit reads	2026-05-04 00:18:19 +01:00
rcourtman	f0bf88a89d	Add CLI action capability discovery	2026-05-04 00:10:15 +01:00
rcourtman	5fbe723ad9	Add CLI action planning adapter	2026-05-04 00:05:21 +01:00
rcourtman	db97478566	Reduce metrics rollup write amplification Refs #1124	2026-05-03 21:43:20 +01:00
rcourtman	82c54cc39b	Make self-hosted SSO Community-tier Treat OIDC, SAML, and multi-provider SSO as included Community capabilities while retaining advanced_sso as a compatibility key. Remove SAML-specific paywalls and paid-upgrade copy from runtime, settings UI, entitlement snapshots, docs, journey proof, and subsystem contracts. Refs #1449	2026-05-03 12:48:01 +01:00
rcourtman	a3617b923a	Fix remaining RC3 backend CI races	2026-05-01 22:03:22 +01:00
rcourtman	3146d83701	Count Ceph monitors from monitor arrays Refs #1290	2026-05-01 20:28:11 +01:00
rcourtman	575f432183	Make metrics writes idempotent for duplicate samples Refs #1442	2026-05-01 20:28:11 +01:00
rcourtman	1267a817c7	Gate cloud provisioning to hosted checkouts	2026-05-01 14:13:08 +01:00
rcourtman	af7d727d45	Gate RAID rebuild alerts on mdstat operation Parse the /proc/mdstat operation keyword for mdadm arrays and propagate it through host reports, models, unified resources, monitoring views, alert metadata, and AI storage summaries. Treat recovery and reshape as rebuild signals while silencing routine check and resync maintenance, with fallback rebuild detection only when no mdstat operation is available. Tests cover mdstat operation parsing plus recovery, check, and resync alert behavior. Fixes #1446	2026-04-30 14:31:14 +01:00
rcourtman	c7164c2906	Clarify Relay mobile handoff paid copy	2026-04-30 13:18:04 +01:00
rcourtman	99129d0c09	Retire product upgrade metrics runtime Remove local upgrade-metrics API registration, settings payload wiring, startup store migration, and backend conversion recorder hooks from the normal product runtime. Delete the retired conversion/funnel and metering packages from compiled licensing code, and extend diagnostics boundary audits and governance contracts so maintainer commercial analytics cannot return through Settings or diagnostics.	2026-04-30 12:24:22 +01:00
rcourtman	daf825dee6	Remove customer commercial analytics wrappers	2026-04-30 11:46:16 +01:00
rcourtman	48c8d26198	Add paid feature claim proof bundle	2026-04-29 14:18:43 +01:00
rcourtman	f060f261cd	Present Relay as annual-first support tier	2026-04-29 12:49:20 +01:00
rcourtman	0dd3cd804e	Hide MSP-only features from self-hosted Pro plans	2026-04-29 01:02:10 +01:00
rcourtman	5f0078b0d0	Keep synthetic modes out of entitlement payloads	2026-04-29 00:33:53 +01:00
rcourtman	08fef313eb	Rename hosted capacity marker copy	2026-04-29 00:07:18 +01:00
rcourtman	c0ef2d44f3	Keep compatibility-only features out of upgrade URLs	2026-04-28 23:22:20 +01:00
rcourtman	937696508c	Guard self-hosted feature metadata drift	2026-04-28 23:16:12 +01:00
rcourtman	a67845ada0	Retire self-hosted volume caps	2026-04-28 20:36:37 +01:00
rcourtman	c197f6a7a5	Move license test signers to testsupport	2026-04-28 19:12:21 +01:00
rcourtman	b29f398b9d	Fix release-mode licensing test expectations	2026-04-28 18:58:35 +01:00
rcourtman	1d189d3343	Clarify hosted entitlement signing compatibility	2026-04-28 18:47:19 +01:00
rcourtman	2b1d82d965	Retire self-hosted trial posture prompts	2026-04-28 17:39:09 +01:00
rcourtman	7cc980ad1d	Retire self-hosted trial signup control plane	2026-04-28 17:02:04 +01:00
rcourtman	ded190dcab	Retire hosted AI quickstart runtime	2026-04-28 16:11:27 +01:00
rcourtman	b1e179479d	Retire self-hosted AI quickstart surfaces	2026-04-28 15:49:18 +01:00
rcourtman	ecf8fd4299	Keep self-hosted Pro prompts opt-in	2026-04-28 11:23:49 +01:00
rcourtman	fab0e77800	Refine self-hosted Pro value copy	2026-04-28 09:56:03 +01:00

363 commits