Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-19 07:54:10 +00:00

Author	SHA1	Message	Date
rcourtman	4cf16ec9cb	Stabilize summary chart SLOs	2026-05-12 14:40:55 +01:00
rcourtman	1726cf47b4	Harden Patrol and Assistant action boundaries	2026-05-12 12:06:27 +01:00
rcourtman	5ac0484e06	Drop install.sh-smoke push self-test (v5.1.30 fallback was unviable) Some checks are pending Build and Test / Secret Scan (push) Waiting to run Details Build and Test / Frontend & Backend (push) Waiting to run Details Commit `590818744` added a push trigger that re-ran the gate against v5.1.30 on every workflow edit, aiming to register the workflow for API dispatch and to validate it before the next release depended on it. The registration goal was achieved: workflow ID 275278570 is now active and the gate is dispatchable via `gh workflow run install-sh-smoke.yml` and the REST API. The self-test itself was unviable: v5.1.30 doesn't ship `install.sh.sshsig` (v5 didn't sign installers), so the signature-verify step 404s on every run. No current published release is a valid known- good smoke target — rc.5's `install.sh` has the wrong banner / agent installer (the regression this gate exists to catch), and rc.6 doesn't exist yet. The push-triggered run would fail forever, drowning real signals. Drop the push trigger. The workflow is registered, dispatch is verified working, and a dispatch against rc.5 just confirmed the gate correctly fires the banner check ("install.sh banner is not the Pulse server installer") against the broken release. The first time the gate's container portion runs end-to-end will be on rc.6 through the create-release.yml workflow_call. Make the resolve-inputs step a hard fail if tag or version is empty so any future regression that drops inputs surfaces explicitly instead of running against a silent fallback.	2026-05-12 11:55:03 +01:00
rcourtman	5908187445	Self-test install.sh smoke gate on every workflow edit against v5.1.30 Commit `7c0f65425` wired install-sh-smoke.yml into create-release.yml but the workflow has never actually executed — the pre-install structural checks were validated locally against rc.5, but the privileged systemd container portion is unproven on GitHub's cgroup-v2 runners. The first real release through the pipeline would be its trial run, and a bug at the container layer would block the release. Add a push-event self-test that re-runs the full gate against v5.1.30 (a known-good release with the same server-installer banner, the same --version arg handler, and the same ed25519 signing key as v6 RCs) whenever this workflow file changes on pulse/v6-release or main. This both validates the gate continuously and registers the workflow with GitHub's actions/workflows API so it becomes dispatchable via gh CLI and the REST endpoint — workflows on non-default branches with only workflow_call + workflow_dispatch never appear in the API until they have been triggered by a non-dispatch event. Replace direct `${{ inputs.* }}` references with a single resolve step that falls back to v5.1.30 / 5.1.30 / github.repository when no inputs are supplied (push trigger). Drop the now-redundant Resolve release repository step. Behavior under workflow_call from create-release.yml is unchanged: the create-release-supplied tag/version/repository win.	2026-05-12 11:50:19 +01:00
rcourtman	7c0f654253	Wire install.sh smoke gate into create-release.yml release pipeline The smoke gate workflow exists from commit `065ebdb27` but until it is called from create-release.yml it does not actually protect any release. That is exactly the regression class that let rc.1 → rc.5 ship with a broken install.sh: nothing in the release pipeline exercised the documented secure-install flow against the published GitHub Release URL. Wire install-sh-smoke.yml as a downstream workflow_call after validate_release_assets succeeds. Gated on historical_asset_backfill_only != 'true' since asset-backfill flows re-upload to an already-published release and the smoke would just re-confirm what hasn't changed. Pre-install structural checks were verified locally against rc.5 — the gate correctly fires the banner / agent-banner / --version handler assertions against the broken release. The end-to-end container portion (privileged systemd boot, install.sh execution, /api/health, /api/version match) will run for the first time on the next release that publishes through this workflow; existing retry loops on systemd readiness, service activation, and health endpoint absorb transient runner flakes. Add install-sh-smoke.yml to the deployment-installability canonical files and to the release-promotion proof policy's match_files, and add scripts/installtests/build_release_assets_test.go to that policy's exact_files (matching the existing pin set for related policies in the deployment-installability subsystem). Update subsystem_lookup_test.py fixtures that pinned the exact_files list literally. Pinned the create-release.yml wiring in build_release_assets_test.go alongside the validate-release-assets wiring so the smoke step cannot silently be unwired. Document the gate's contract responsibilities in deployment-installability Extension Point 2.	2026-05-12 11:44:04 +01:00
rcourtman	065ebdb276	Add install.sh end-to-end smoke gate against published release Across v6 rc.1 → rc.5 the published install.sh asset was the agent installer rather than the server installer, and the README's pinned ed25519 key did not verify what the pipeline actually signed. The first broke `bash install.sh --version` and the in-product Update button; the second silently failed the README's secure-install ssh-keygen step. Neither was caught by CI because every existing gate operated on the local release/ build, the Docker image, or the helm chart — nothing exercised the documented LXC/systemd install commands against the published release URL. scripts/validate-release.sh now catches asset-identity drift at build time. This workflow catches the rest of the regression class — anything that breaks the actual install at runtime — by running the documented flow end-to-end against the published release. What it does: - Downloads install.sh, install.sh.sshsig, and the linux-amd64 tarball from releases/download/<tag>/. - Extracts the README's pinned pulse-installer ed25519 key and runs the exact ssh-keygen -Y verify command from the README's secure-install snippet against the downloaded asset. - Re-checks the server-installer banner, the --version) arg handler, and the absence of the agent banner — same pins as validate-release.sh, but now against what GitHub is actually serving (not just what was built locally). - Boots jrei/systemd-debian:12 privileged, runs `bash install.sh --archive <tarball> --disable-auto-updates` from inside, waits for systemd pulse.service to become active, hits /api/health, and asserts /api/version reports the expected version. --archive mode is used rather than --version so the workflow doesn't depend on install.sh's self-refetch loop (the re-fetched bytes are the ones we already validated). Auto-updates are disabled to avoid the timer unit doing anything during the smoke run. Triggers are workflow_dispatch + workflow_call only. Wire it into create-release.yml after the next RC validates it green. Pinned in build_release_assets_test.go so silent deletion or weakening of any critical assertion (signature verify, banner check, /api/health hit, version match) trips the test.	2026-05-12 11:25:46 +01:00
rcourtman	b69c8c8007	Wire PULSE_RELAY_ENABLED and PULSE_RELAY_SERVER as real env overrides These two env vars were documented as relay overrides in v6 docs since March 18 (CONFIGURATION.md, RELAY.md, and the frontend-served doc copy) but no code ever read them. Operators trying to bootstrap relay headlessly saw no effect. Implement them rather than remove the documentation. Headless and container deployments now have a real path to enable relay and point it at a private endpoint without going through Settings → Relay. internal/relay/config_env.go: - ApplyEnvOverrides(*Config) mutates relay.Config in place. - PULSE_RELAY_ENABLED accepts true/false/yes/no/1/0/on/off (case- insensitive). Unrecognized values log a warning and leave the file value untouched — important so "unset" reads differently from "explicit false." - PULSE_RELAY_SERVER goes through the existing validateRelayServerURL check; invalid URLs log a warning and fall through. internal/config/persistence_relay.go: LoadRelayConfig calls ApplyEnvOverrides after the file load and after the default-fallback when relay.enc is absent, so the env override applies on every load. Tests cover unset / true / false / garbage-bool / valid-URL / invalid-URL / both-together / nil-config paths in the relay package, plus two end-to-end tests in internal/config that prove the override flows through LoadRelayConfig against a real persisted file and against the missing-file default branch. Restore the env-var docs with the correct default URL (the full wss://relay.pulserelay.pro/ws/instance, not the bare hostname the original aspirational table claimed) and add an explicit precedence note: saving from the UI after an env override persists the env-effective state to disk, so clearing the env alone does not revert. Add internal/relay/config_env_test.go to the relay-runtime registry's desktop-relay-runtime exact_files so the new code surface is proof-tracked. Update the matching pin in subsystem_lookup_test.py. Extend the relay-runtime contract Extension Point 3 to document the override semantics LoadRelayConfig must satisfy.	2026-05-12 11:18:31 +01:00
rcourtman	7cc90a8969	Remove fictional PULSE_RELAY_ENABLED / PULSE_RELAY_SERVER env overrides CONFIGURATION.md and its frontend-served copy advertised an "Environment Overrides" table for relay with two env vars, but neither has ever existed in code. git log -S "PULSE_RELAY_ENABLED" -- 'internal/*.go' is empty; relay config is entirely file-driven (relay.enc) and configured from Settings → Relay. The documented default "relay.pulserelay.pro" was also incorrect — the actual code default is the full ws URL "wss://relay.pulserelay.pro/ws/instance". Replace the misleading table with a clarifying paragraph that states the truth: relay has no env-var overrides, it's UI-configured, and the actual default server URL. Operators trying to deploy headlessly with the documented env vars would silently get the default config because the runtime never reads them; better to say so explicitly. Found during a sweep of all 29 PULSE_ env vars documented in CONFIGURATION.md against the codebase. These two were the only doc-only drift; the other 27 all have real production references.	2026-05-12 11:05:07 +01:00
rcourtman	3b57c880ba	Fix manual systemd install snippet binary path The "Manual systemd install (advanced)" example in docs/INSTALL.md told users to run `sudo install -m 0755 pulse /usr/local/bin/pulse`, but the release tarballs extract to ./bin/pulse, not ./pulse. Following the documented command literally produced "install: cannot stat 'pulse'". Show the download/extract commands explicitly so users know the tarball shape, and correct the install path to `bin/pulse`. validate-release.sh already pins the ./bin/pulse tarball layout, so this is the matching documentation side of that contract.	2026-05-12 10:51:03 +01:00
rcourtman	ce7d7c1956	Fix stale README signature key and guard against future drift The README's secure-install snippet has pinned the wrong ed25519 key since commit `a60fa03d7` (April 22, 2026), so v6 rc.2 through rc.5 all shipped with a documented verification step that does not work. I downloaded the published rc.5 install.sh + install.sh.sshsig and ran ssh-keygen -Y verify with both candidate keys: Ds21c5... (README's pinned key) -> Could not verify signature MZd/DaH... (key embedded in install.sh and pulse-auto-update.sh) -> OK Customers who actually followed the README's secure-install path saw "Could not verify signature" and aborted. Most users curl-pipe the script unverified so the drift went unreported. Replace the stale key in README.md and docs/INSTALL.md with the actual pipeline signing key (MZd/...). Add a validate-release.sh smoke that extracts the README's pinned key and runs the exact ssh-keygen -Y verify command against the signed install.sh.sshsig. Any future drift between documented key and actual signing key fails the release before publish. Lock both the correct-key presence and the stale-key absence in build_release_assets_test.go for README and docs/INSTALL.md so a manual edit cannot regress the docs back to the broken state.	2026-05-12 10:30:42 +01:00
rcourtman	49412357af	Ship the Pulse server install.sh as the GitHub Release asset Every v6 RC (rc.1 through rc.5, ~30 days) shipped the wrong install.sh. build-release.sh was copying the rendered AGENT installer into release/install.sh, but adapter_installsh, scripts/pulse-auto-update.sh, the root install.sh's own --rc/--stable/--version flows, and the README quickstart all fetch that asset and run `bash install.sh --version vX.Y.Z`. Since the agent installer rejects --version with "Unknown argument", the LXC quickstart, the in-product "Update Pulse" button on systemd/proxmoxve deployments, and the pulse-auto-update.sh systemd timer were all broken for every RC. Fix the build-release.sh copy to publish the root server installer. The agent installer continues to ship inside tarballs at ./scripts/install.sh and inside Docker images at /opt/pulse/scripts/install.sh, and is served at the running Pulse server's /install.sh endpoint — none of those paths change. Only the top-level GitHub Releases asset moves from agent to server. Update build_release_assets_test.go to lock in the new publishing rule and ban the reverse drift, replacing the March 18 "legacy root install.sh" guard that was the original mistake. Add a validate-release.sh smoke that catches the regression mode this hid: the published install.sh must have the Pulse server banner, the --version) arg handler, must not contain the agent banner, and `bash install.sh --help` must print the server installer's version-pinning help line. These checks run as part of validate-release-assets.yml against the post-publish asset bundle so a future swap back cannot slip through. Document the asset identity rule and the validate-release.sh guard in the deployment-installability contract so any future change to the publishing pipeline has to update the contract or trip the shape guard.	2026-05-12 10:24:28 +01:00
rcourtman	89379c4b5c	Use effectiveLoadP95Budget for metrics-history load test CI variance	2026-05-12 09:19:58 +01:00
rcourtman	4ccbbf4267	Lazy-load 4 modals behind Show gates to slim index bundle further	2026-05-12 09:16:00 +01:00
rcourtman	da7969fb48	Drop pulse-agent attestation expectations from TestReleaseWorkflowsUseSecretSafeAttestedImageBuilds Some checks are pending Build and Test / Secret Scan (push) Waiting to run Details Build and Test / Frontend & Backend (push) Waiting to run Details	2026-05-12 02:01:45 +01:00
rcourtman	a5d8b43088	Let assertJSONSnapshot exclude dynamic top-level fields for patrol_preflight	2026-05-12 01:38:59 +01:00
rcourtman	ec72977d3e	Skip runtime-defaults raw-node TS imports when integration node_modules absent	2026-05-12 01:14:41 +01:00
rcourtman	8e49e68393	Pre-check integration node_modules before root-playwright wrapper assertion	2026-05-12 01:05:54 +01:00
rcourtman	216b3cae38	Skip root-playwright wrapper check on any tsx/playwright eval failure	2026-05-12 00:57:24 +01:00
rcourtman	86f9159bae	Skip root-playwright wrapper check when @playwright/test isn't installed	2026-05-12 00:49:21 +01:00
rcourtman	659018ed28	Skip acceptance-doc wording-pin when the working-draft doc isn't checked out	2026-05-12 00:40:05 +01:00
rcourtman	96660e7586	Switch script-reference integrity test from rg to git grep for portable CI	2026-05-12 00:30:43 +01:00
rcourtman	5fd05efa83	Add connection-degraded alert for wedged platform connections A Proxmox host wedged on a ZFS deadlock yesterday took the cluster API poll with it (context deadline exceeded). The unified connections aggregator flipped the Connection from active to stale to unreachable, and the Settings / Infrastructure page rendered the right badges, but no top-nav alert ever fired because nothing was actively notifying off that derived state. Patrol's deterministic triage flagged it every minute, but its LLM investigation stage has been broken since 2026-02-26 so flags never escalated into user-visible findings. Result: a 3 hour outage I only noticed because I happened to open Settings. This wires an active notification off the same connection state the Settings badges already use: - internal/alerts/connection.go: new CheckConnection + clearConnectionDegradedAlert that fire connection-degraded after three consecutive stale or unreachable observations. Severity scales: stale warning, unreachable / unauthorized critical. Clear runs through the same recovery-confirmation gate as clearNodeOfflineAlert so a single flap back to active doesn't silently resolve a real outage. Paused, disabled, and non-platform connections are no-ops. - internal/api/connections_alerts.go: snapshot translator that turns api.Connection into the narrow alerts.ConnectionSnapshot view. Keeping the snapshot type inside the alerts package preserves the existing api -> monitoring import direction; the monitor would have cycled if it called back into api directly. - internal/monitoring: new SetConnectionsSnapshotLister hook + a per-tick checkConnectionAlerts call in the main poll loop, alongside the existing evaluate*Agents passes. - internal/api/router.go: register the lister closure on r.monitor so the alerts loop sees the same Connection rows the HTTP handler does. - internal/alerts/specs/types.go: add "connection" to the migration bridge list of accepted ResourceTypes, alongside node / docker-host / proxmox-disk / etc. The connection concept doesn't have a canonical unified resource type yet; this matches the existing pattern for alert-keyed resources that aren't first-class canonical. Test coverage in internal/alerts/connection_test.go covers active never fires, three stale observations escalate from pending to warning, unreachable escalates warning to critical, unauthorized fires critical cold, paused / disabled / agent never fire, recovery confirmation gate, and a stale flap during recovery resets the gate. TestResourceAlertSpecValidateAllowsConnectionMigrationBridgeType mirrors the existing migration-bridge proof tests for the new type.	2026-05-12 00:29:04 +01:00
rcourtman	3323cba053	Stop install-mcp scripts from linking to GitHub blob/main docs	2026-05-11 23:58:45 +01:00
rcourtman	5e2dfe48c7	Lazy-load AIChat to cut frontend entry bundle by 29 percent	2026-05-11 23:58:12 +01:00
rcourtman	62cd23b97b	Refresh frontend bundle-size baseline for post-rc.5 chunk shape	2026-05-11 23:48:00 +01:00
rcourtman	edda19aadf	Catch up frontend tests with collapsed Patrol details and refactored FindingsPanel	2026-05-11 23:37:21 +01:00
rcourtman	2725a325f2	Consolidate AgentIntegrationsPanel doc links into shipped AGENT_SUBSTRATE.md	2026-05-11 23:25:55 +01:00
rcourtman	e1a6dff2a1	Document Pulse Cloud launch in v6 release notes	2026-05-11 23:18:05 +01:00
rcourtman	16963e415c	Drop t.Parallel from dismiss/snooze finding tests that race on global session store	2026-05-11 23:10:35 +01:00
rcourtman	ff65551a1a	Fix expired agent_preflight test fixture by using now-relative claim window	2026-05-11 22:57:57 +01:00
rcourtman	7951da526b	Add release_cycle_artifact_globs so RC ceremony skips contract-update requirement	2026-05-11 22:55:29 +01:00
rcourtman	9a20bbd0b2	Derive RC packet paths from VERSION + glob to eliminate per-RC test churn	2026-05-11 22:36:03 +01:00
rcourtman	816b3985ba	Make blocked-record drift self-fixable via BLESS_GOVERNANCE_FIXTURES env var	2026-05-11 22:31:03 +01:00
rcourtman	41dd867037	Stop alerts manager in TestPollCephClusterChecksPoolStorageThresholds to fix TempDir cleanup flake	2026-05-11 22:30:30 +01:00
rcourtman	920e88ede9	Schedule weekly release-dry-run watchdog against pulse/v6-release Some checks are pending Build and Test / Secret Scan (push) Waiting to run Details Build and Test / Frontend & Backend (push) Waiting to run Details	2026-05-11 22:21:59 +01:00
rcourtman	d38f3d9217	Update release-control fixtures after pulse-agent Docker removal	2026-05-11 22:21:31 +01:00
rcourtman	3b2eef4984	Run build-and-test on pulse/v6-release commits to catch fixture/contract drift continuously	2026-05-11 22:19:45 +01:00
rcourtman	e69da0069f	Stop pushing pulse-agent Docker image; ship agent only as release-asset binaries	2026-05-11 22:19:09 +01:00
rcourtman	e32db04543	Recalibrate CI 500-node load floor after rc.5 operator-state and agent-substrate plumbing	2026-05-11 19:07:44 +01:00
rcourtman	8ff69daa43	Bump install pins to rc.5 and refresh test fixtures for Patrol readiness + Unraid host profile tokens	2026-05-11 18:02:52 +01:00
rcourtman	894ea89af9	Refresh RC5 packet validation range for plain-JSON tool-call sanitisation	2026-05-11 17:09:57 +01:00
rcourtman	e36945741e	Sanitise plain-JSON tool-call leaks from weak local models Small Ollama models (qwen2.5:11b, qwen2.5:14b, similar) frequently emit Pulse tool invocations as plain JSON inside content instead of routing through the structured tool_calls channel. Users saw raw payloads like `{"name": "pulse_query", "parameters": {...}}` as the assistant's final response. Extend cleanToolCallArtifacts and containsToolCallMarker with a new pass that detects this leak shape, gated on a closed allowlist of canonical tool names sourced from the runtime registry. The allowlist auto-syncs when registerTools() gains a tool, so no separate hand-maintained list. Anchored on (?:^\|\n) and a leading `"name"` key so prose containing JSON fragments, unrelated objects (`{"foo":"bar"}`), or named resources (`{"name":"my-vm","cpu":50}`) are left untouched.	2026-05-11 17:02:07 +01:00
rcourtman	366bf8d127	Prepare v6.0.0-rc.5 release packet	2026-05-11 16:52:31 +01:00
rcourtman	52416cec6f	Reword cluster deploy banner from "unmonitored" to "ready for Pulse Agent" Cluster peers without a Pulse Unified Agent are still monitored via the cluster's PVE API token — the agent adds richer telemetry (hardware, sensors, OS metadata) but isn't the only source of monitoring. The previous "N node(s) unmonitored" copy misrepresented the state. Reframes the banner as an action opportunity, matching the adjacent "Review & Deploy" CTA. Gate logic unchanged.	2026-05-11 16:02:53 +01:00
rcourtman	07d73843f0	Hide cluster deploy banner for offline PVE nodes The "N nodes unmonitored" banner gated purely on absence of a Pulse Unified Agent (r.agent?.agentId). Offline cluster members (status 'offline') were counted as deploy candidates, which is wrong on two fronts: those nodes are typically still covered by the cluster's PVE API token (so they aren't truly unmonitored, just unreachable), and an offline host is precisely the case where deploying an agent cannot succeed. Exclude status === 'offline' from the unmonitored count.	2026-05-11 15:59:43 +01:00
rcourtman	9329258f8b	Correct the DeepSeek tool_choice coercion rationale The previous comment claimed DeepSeek's API aliases v4-flash/v4-pro to deepseek-reasoner, justifying the auto coercion via the legacy reasoner's known 400 behavior. That had the alias direction inverted: per DeepSeek's pricing page, deepseek-chat and deepseek-reasoner are deprecated aliases for v4-flash's non-thinking and thinking modes respectively, not the other way around. The coercion itself is empirically correct, though. Live preflight against deepseek-v4-flash with tool_choice=required produces a deterministic HTTP 400 ("provider rejected forced tool selection") in 275ms. The behavior is consistent across the DeepSeek user community - multiple downstream projects (pydantic-ai #5193, claude-code-router #1378, opencode #24190, others) confirm V4 models reject forced tool selection despite DeepSeek's chat-completion docs listing required as a supported value. Server reality disagrees with documentation. This commit updates only the comments in openai.go and openai_test.go to point at the empirical evidence and the community confirmation. The coercion behavior is unchanged.	2026-05-11 14:46:08 +01:00
rcourtman	3031c218cb	Surface preflight diagnosis on Patrol readiness banner The readiness banner previously rendered only the summary string ("Provider connection issue (last preflight 2h ago)"), even though the underlying payload already carried the provider, model, preflight duration, tool-call observation, and recommendation. Operators had to open dev tools to find out which provider failed and what to check. Now the banner inlines: - Provider and model that preflight tried to reach - Preflight duration and whether a tool call was observed - The preflight recommendation text The state hook gains a patrolPreflight memo that reads from the same AISettings payload the rest of the patrol surface already consumes.	2026-05-11 14:31:19 +01:00
rcourtman	2a9afb1112	Sanitise double-pipe DeepSeek DSML tool-call markers in chat Found by exercising pulse_summarize in real chat: the user asked a question, the model called the tool successfully (response came back with narrative_source: ai), but the chat panel ended with raw DSML text and "Assistant response is ready" — no actual prose answer. Root cause was in agentic_sanitize.go. The fast-path string list only checked single-pipe DSML variants ("<｜DSML｜...>"), but deepseek-v4-flash emits the double-pipe form ("<｜｜DSML｜｜...>"). The opening sequence didn't match any marker, so cleanToolCallArtifacts returned the content unchanged. The chat orchestrator then showed the raw DSML to the user as if it were the assistant's answer. Fix: - Add double-pipe variants (Unicode and ASCII) to the fast-path marker list. Six new entries, mirroring the existing single-pipe entries. - Add a backstop regex (dsmlRe) that matches any pipe-count variant via `</?[\\|｜]+/?DSML[\\|｜]*`. Future model behaviour with triple or higher pipe counts gets caught without another fast-path edit. - containsToolCallMarker gets the same coverage so streaming detection stops forwarding content the moment the marker appears, regardless of pipe count. Tests in agentic_sanitize_test.go gain three new cases for cleanup (double-pipe Unicode, double-pipe ASCII, triple-pipe regex backstop) and two for detection (double-pipe Unicode, double-pipe ASCII). All passing. Process note: this bug was invisible to existing tests because the test suite only covered the single-pipe variants that were documented in the marker list. The double-pipe form only appears when an actual model emits it. Same pattern as the JSON-casing fix earlier — exercise the surface against a real LLM, find what tests in isolation can't see.	2026-05-11 13:42:11 +01:00
rcourtman	e22113230a	Purge resolved legacy alert-mirror findings on load The previous rip retired only active "Active alert detected" findings from the now-removed detectAlertSignals -> SignalActiveAlert emitter. Resolved instances were left in place, polluting the Resolved tab and inflating the regressed total on the trust strip (a stale 8 resources each marked "regressed 3x" with descriptions like "Active warning alert: Container 'ollama' is powered off"). They have no canonical operator value -- the Alerts surface is the source of truth for currently-firing alerts -- so on load we now purge them entirely rather than keeping them around as Resolved noise. Active mirrors are still retired (auto-resolved with a clear reason) so operators see why the finding closed; resolved mirrors disappear silently because they were already in the terminal state. Idempotent. Extends TestFindingsStore_SetPersistence_RetiresLegacyAlertMirrorFindings with a fixture for the resolved-mirror case and asserts both the in-memory purge and the persisted state no longer carries it.	2026-05-11 11:40:07 +01:00
rcourtman	c3319b6304	Make Narrative / FleetOutlier JSON shape consistent with prompt schema Found by exercising pulse_summarize in a real chat session. The chat-tool response surfaced: "observations": [{"Text": "...", "Severity": "info"}] The AI narrator's system prompt (report_narrator.go) tells the model to emit lowercase keys: {"text": "...", "severity": "..."} The model was being taught one schema and shown a different one in the tool response for the same shape. NarrativeBullet, FleetOutlier, Narrative, and FleetNarrative had no JSON tags, so embedded struct fields serialised with their Go names. Add struct tags so the wire shape matches the prompt schema. Pure marshaling change — JSON tags don't affect Go field access, so PDF rendering (which reads fields directly) is unchanged. Tests in narrative_json_test.go pin the shape so the inconsistency can't reappear silently. Process note: this is a class of bug that only appears when an LLM actually consumes the output. No unit test caught it; no review of the code showed it; the model running through the chat path is what surfaced it. Another argument for "exercise the artifact" — even the tool surface that looks correct in isolation has hidden inconsistencies you only see when something external reads it.	2026-05-11 11:29:35 +01:00

1 2 3 4 5 ...

6062 commits