Commit graph

2212 commits

Author SHA1 Message Date
rcourtman
bb77f42868 Harden Workloads refresh stability
Prevent Workloads connection-ledger refreshes from tripping the app-level loading fallback, and suppress stale or duplicate auto-registration success notifications.

Add regression coverage for both refresh stability and replayed lifecycle events.
2026-05-14 09:35:02 +01:00
rcourtman
8554754f3e Harden managed dev backend recovery
Fix the local hot-dev backend monitor so a missing Pulse process is counted safely under pipefail, keep backend launch stderr in the debug log, and govern the new runtime helper with focused smoke coverage.
2026-05-14 09:19:12 +01:00
rcourtman
546f805823 Harden infrastructure ledger posture tests
Add pure model coverage for passive agent fingerprint handshakes and source-manager coverage for actionable member posture counts.
2026-05-14 09:03:02 +01:00
rcourtman
a798289a6a Clean up infrastructure ledger posture
Keep cluster source rows focused on source health and hide passive attached-agent config fingerprint handshakes from visible attention counts.
2026-05-14 00:46:43 +01:00
rcourtman
321f563a52 Surface blocked workload inventory sources
Show workload-capable source failures on Workloads and keep matching Proxmox host agents attached to their API source when inventory collection is blocked.
2026-05-14 00:22:18 +01:00
rcourtman
649a601dca Hide resource privacy settings sidebar entry
Keep the resource privacy policy route available for direct governance proof, but remove it from the normal Settings sidebar while it remains an informational read-only surface. Clarify the direct route copy and empty state so it does not appear as a broken settings page.
2026-05-13 23:52:06 +01:00
rcourtman
2be14562ee Preserve infrastructure continuity on first login
Ensure unified resource snapshots include recent standalone host-agent continuity so Infrastructure does not briefly undercount connected systems after login or restart.
2026-05-13 23:36:17 +01:00
rcourtman
cbe595572b Improve action audit refusal presentation
Present stable refusal prefixes through shared frontend helpers and show verification outcomes in resource action history without exposing raw refusal tokens. Extend action audit tests, drawer coverage, and VMware history spec evidence.
2026-05-13 23:03:12 +01:00
rcourtman
0d2f192f78 Align fleet command policy contract docs 2026-05-13 22:26:33 +01:00
rcourtman
bc307b01a8 Align action audit redaction contract docs 2026-05-13 21:59:25 +01:00
rcourtman
ceb9b87cfb Correct fleet config drift truth 2026-05-13 20:38:26 +01:00
rcourtman
17253d27fd Surface connection rollout posture 2026-05-13 20:16:35 +01:00
rcourtman
427b7ff5c6 fix: route patrol update safety through read state 2026-05-13 19:24:25 +01:00
rcourtman
acacb0be9c Fix unified resources contract numbering 2026-05-13 19:14:23 +01:00
rcourtman
adc46e2d0a Harden action audit verification redaction 2026-05-13 19:10:17 +01:00
rcourtman
e8b3c7fcf7 Restore remote config signature compatibility
Keep desired config fingerprints as response metadata derived from the signed command and settings payload.

Use merged agent profile settings when building remote config fingerprints.
2026-05-13 19:00:02 +01:00
rcourtman
554158c575 Add desired config fingerprint metadata 2026-05-13 18:51:24 +01:00
rcourtman
53ebbf97f8 Harden recovery activity timeline ranges 2026-05-13 18:32:25 +01:00
rcourtman
d55888fb7f Correct PBS job health evidence boundaries 2026-05-13 17:05:03 +01:00
rcourtman
fdcbf0f2d3 Align first-session onboarding handoff 2026-05-13 16:18:50 +01:00
rcourtman
1a47c03b2b Fix auto-register refresh notifications 2026-05-13 13:59:11 +01:00
rcourtman
5513781193 Ensure cloud emails use support reply-to 2026-05-13 12:10:02 +01:00
rcourtman
b2e437b198 Separate Patrol finding controls from disclosure 2026-05-12 22:52:36 +01:00
rcourtman
b213b59f7a Prioritize Patrol findings in Assistant handoffs 2026-05-12 22:46:53 +01:00
rcourtman
1e33d3089c Bound Patrol refresh state 2026-05-12 22:35:50 +01:00
rcourtman
bb04bf9042 Keep Patrol findings visible during refresh 2026-05-12 22:13:19 +01:00
rcourtman
5465852be0 Use Patrol source loading for findings panel 2026-05-12 22:06:09 +01:00
rcourtman
3cbd49cb54 Align Patrol coverage caveats with full runs 2026-05-12 21:33:44 +01:00
rcourtman
5b18438528 Use checked wording for limited Patrol recency 2026-05-12 17:53:53 +01:00
rcourtman
80c061cd45 Separate Patrol runtime issues from findings copy 2026-05-12 17:43:28 +01:00
rcourtman
450e7516f0 Keep Patrol findings from showing all-clear copy 2026-05-12 17:34:30 +01:00
rcourtman
22a94f47d9 Skip release publish downstreams for drafts 2026-05-12 17:32:11 +01:00
rcourtman
8b0f3564f6 Fail closed on stale API action plans 2026-05-12 17:32:11 +01:00
rcourtman
3566a4d61d Drive promote-floating-tags via workflow_call from create-release
promote-floating-tags.yml's `workflow_run` chain off publish-docker.yml
silently stopped firing for rc.3 → rc.5 because publish-docker failed at
the now-removed pulse-agent push step. Customers pulling
rcourtman/pulse:latest, :6, or :6.0 stayed on whatever the previous
successful release had tagged — there was no warning anywhere that the
floating tags were stale.

Same fix pattern as install-sh-smoke (commit 7c0f65425) and
publish-helm-chart (commit 14c79a28e): add a workflow_call trigger to
promote-floating-tags.yml and call it explicitly from create-release.yml
after validate_release_assets succeeds.

Gating on validate_release_assets is intentional: that workflow waits
for the docker image to be pullable from the registry (with retry
backoff), so by the time it succeeds the image manifest exists and
re-tagging it to latest/major/minor cannot point at vapor.

The legacy workflow_run trigger stays as the primary path; this just
guarantees promotion even when the chain doesn't fire.

Tag-resolver step now accepts inputs from workflow_call / workflow_dispatch
and only falls back to the workflow_run derivation when inputs are absent,
so all three entry paths converge on the same identity.

Pinned in build_release_assets_test.go:
- new TestPromoteFloatingTagsReachableViaWorkflowCall pins the trigger
  declaration and the input-priority resolver
- existing TestCreateReleaseUploadsPowerShellInstaller extended to pin
  the promote_floating_tags job wiring (uses, tag, prerelease)

Contract delta in deployment-installability.md Extension Point 7
documents the same explicit-workflow_call requirement that applies to
publish-helm-chart, extended to promote-floating-tags.
2026-05-12 16:47:51 +01:00
rcourtman
14c79a28e7 Trigger publish-helm-chart via workflow_call from create-release
v6 rc.1 → rc.5 published successfully but the Helm chart never landed on
rcourtman.github.io/Pulse/index.yaml — the index still ends at v5.1.30.
`helm install pulse pulse/pulse --version 6.0.0-rc.5` returns
chart-not-found; without `--version` helm pulls the latest published
chart (v5.1.30) into a customer's v6 cluster.

Root cause: GitHub does not fire `release: published` for releases that
were created as drafts and later PATCHed to draft=false. create-release.yml
deliberately uses that path so it can upload assets and run
validate-release-assets against the draft before promoting. Inspection of
the workflow run history confirms: every gh-API `release: published` event
since 2026-03-02 has been from manually-dispatched v5 stable cuts; zero
fired for v6 RCs published through the create-release pipeline.

Fix the same way install-sh-smoke was wired in commit 7c0f65425: add a
`workflow_call` trigger to publish-helm-chart.yml and call it explicitly
from create-release.yml as a downstream of validate_release_assets. The
chart-version resolver in publish-helm-chart now accepts inputs from
either workflow_call or workflow_dispatch and only falls back to the
release-event tag when no inputs are present, keeping the legacy
release-event path working for forks / manual gh-CLI publishes that
create with draft=false from the start.

Pinned in build_release_assets_test.go:
- create-release.yml wiring (publish_helm_chart job, version inputs)
- publish-helm-chart.yml workflow_call trigger declaration
- chart-version resolver's input-priority logic

Contract delta in deployment-installability.md Extension Point 7
documents the workflow_call requirement and forbids relying on the
release-published webhook for the create-release.yml draft-promotion
path.

The fix takes effect on the next release through the pipeline. Backfill
of the v6.0.0-rc.5 chart needs a one-time manual dispatch of
publish-helm-chart.yml against chart_version=6.0.0-rc.5.
2026-05-12 16:30:53 +01:00
rcourtman
c6d5c4590a Keep agent heartbeats stream local 2026-05-12 16:14:28 +01:00
rcourtman
ab62b46c1f Fix helm chart agent.enabled by routing through main pulse image
The chart's agent.image.repository defaulted to ghcr.io/rcourtman/pulse-agent,
an image that has never been published. publish-docker.yml only pushes
rcourtman/pulse; the Dockerfile defines an agent_runtime stage that
*could* be published but it isn't, and commit da7969fb4 from earlier in
this session removed the corresponding pulse-agent attestation
expectations — a clear signal the separate agent image was intentionally
dropped without updating the chart. Customers running
`helm install pulse pulse/pulse --set agent.enabled=true` were silently
hitting ImagePullBackOff on the agent DaemonSet.

Route the chart through the main rcourtman/pulse image instead. To make
that work without per-arch chart overrides, the runtime stage in the
Dockerfile now creates an arch-resolved /usr/local/bin/pulse-agent
symlink to the right /opt/pulse/bin/pulse-agent-linux-{amd64,arm64,armv7}
binary. The chart's agent.command default is /usr/local/bin/pulse-agent,
which overrides the server ENTRYPOINT and runs the pod as a unified
agent on whichever arch the node provides. agent.yaml renders the
command via toYaml so list values pass through cleanly.

KUBERNETES.md's DaemonSet example switches from the arch-hardcoded
/opt/pulse/bin/pulse-agent-linux-amd64 to the new arch-resolved path,
restoring multi-arch portability of the docs example.
validate-release.sh asserts the symlink exists, points at one of the
three supported Linux arch binaries, and is executable in the published
image. A new TestHelmAgentRuntimePointsAtRealImage pins the chart
defaults, the template wiring, the Dockerfile symlink, and the
validate-release.sh guard so the regression class can't quietly
resurface.

Governance: extend the helm-chart-release-runtime verification policy's
exact_files to include scripts/installtests/build_release_assets_test.go
(matching its existing pin set for related deployment-installability
policies); update the subsystem_lookup_test.py fixture that pins the
exact_files list; document the agent-image and pulse-agent symlink
contract in deployment-installability.md Extension Point 7.

Verified locally: `helm lint` passes; `helm template --set agent.enabled=true`
renders a DaemonSet with image rcourtman/pulse:6.0.0,
command ["/usr/local/bin/pulse-agent"], args ["--enable-docker", "--enable-host=false"].
End-to-end image build + agent DaemonSet smoke will run via helm_smoke
on the next release once rcourtman/pulse:6.0.0 is published.
2026-05-12 16:11:56 +01:00
rcourtman
0b98cded45 Bulk count agent fleet approvals 2026-05-12 16:00:31 +01:00
rcourtman
f16aa8a65c Parse stored subscription states for entitlement refresh 2026-05-12 15:17:53 +01:00
rcourtman
4cf16ec9cb Stabilize summary chart SLOs 2026-05-12 14:40:55 +01:00
rcourtman
1726cf47b4 Harden Patrol and Assistant action boundaries 2026-05-12 12:06:27 +01:00
rcourtman
7c0f654253 Wire install.sh smoke gate into create-release.yml release pipeline
The smoke gate workflow exists from commit 065ebdb27 but until it is
called from create-release.yml it does not actually protect any release.
That is exactly the regression class that let rc.1 → rc.5 ship with a
broken install.sh: nothing in the release pipeline exercised the
documented secure-install flow against the published GitHub Release URL.

Wire install-sh-smoke.yml as a downstream workflow_call after
validate_release_assets succeeds. Gated on
historical_asset_backfill_only != 'true' since asset-backfill flows
re-upload to an already-published release and the smoke would just
re-confirm what hasn't changed.

Pre-install structural checks were verified locally against rc.5 — the
gate correctly fires the banner / agent-banner / --version handler
assertions against the broken release. The end-to-end container portion
(privileged systemd boot, install.sh execution, /api/health, /api/version
match) will run for the first time on the next release that publishes
through this workflow; existing retry loops on systemd readiness,
service activation, and health endpoint absorb transient runner flakes.

Add install-sh-smoke.yml to the deployment-installability canonical files
and to the release-promotion proof policy's match_files, and add
scripts/installtests/build_release_assets_test.go to that policy's
exact_files (matching the existing pin set for related policies in the
deployment-installability subsystem). Update subsystem_lookup_test.py
fixtures that pinned the exact_files list literally.

Pinned the create-release.yml wiring in build_release_assets_test.go
alongside the validate-release-assets wiring so the smoke step cannot
silently be unwired.

Document the gate's contract responsibilities in
deployment-installability Extension Point 2.
2026-05-12 11:44:04 +01:00
rcourtman
b69c8c8007 Wire PULSE_RELAY_ENABLED and PULSE_RELAY_SERVER as real env overrides
These two env vars were documented as relay overrides in v6 docs since
March 18 (CONFIGURATION.md, RELAY.md, and the frontend-served doc copy)
but no code ever read them. Operators trying to bootstrap relay headlessly
saw no effect.

Implement them rather than remove the documentation. Headless and
container deployments now have a real path to enable relay and point it
at a private endpoint without going through Settings → Relay.

internal/relay/config_env.go:
  - ApplyEnvOverrides(*Config) mutates relay.Config in place.
  - PULSE_RELAY_ENABLED accepts true/false/yes/no/1/0/on/off (case-
    insensitive). Unrecognized values log a warning and leave the file
    value untouched — important so "unset" reads differently from
    "explicit false."
  - PULSE_RELAY_SERVER goes through the existing validateRelayServerURL
    check; invalid URLs log a warning and fall through.

internal/config/persistence_relay.go:
  LoadRelayConfig calls ApplyEnvOverrides after the file load and after
  the default-fallback when relay.enc is absent, so the env override
  applies on every load.

Tests cover unset / true / false / garbage-bool / valid-URL / invalid-URL
/ both-together / nil-config paths in the relay package, plus two
end-to-end tests in internal/config that prove the override flows through
LoadRelayConfig against a real persisted file and against the
missing-file default branch.

Restore the env-var docs with the correct default URL (the full
wss://relay.pulserelay.pro/ws/instance, not the bare hostname the
original aspirational table claimed) and add an explicit precedence note:
saving from the UI after an env override persists the env-effective state
to disk, so clearing the env alone does not revert.

Add internal/relay/config_env_test.go to the relay-runtime registry's
desktop-relay-runtime exact_files so the new code surface is proof-tracked.
Update the matching pin in subsystem_lookup_test.py. Extend the
relay-runtime contract Extension Point 3 to document the override
semantics LoadRelayConfig must satisfy.
2026-05-12 11:18:31 +01:00
rcourtman
49412357af Ship the Pulse server install.sh as the GitHub Release asset
Every v6 RC (rc.1 through rc.5, ~30 days) shipped the wrong install.sh.
build-release.sh was copying the rendered AGENT installer into
release/install.sh, but adapter_installsh, scripts/pulse-auto-update.sh,
the root install.sh's own --rc/--stable/--version flows, and the README
quickstart all fetch that asset and run `bash install.sh --version vX.Y.Z`.
Since the agent installer rejects --version with "Unknown argument", the
LXC quickstart, the in-product "Update Pulse" button on systemd/proxmoxve
deployments, and the pulse-auto-update.sh systemd timer were all broken
for every RC.

Fix the build-release.sh copy to publish the root server installer.
The agent installer continues to ship inside tarballs at ./scripts/install.sh
and inside Docker images at /opt/pulse/scripts/install.sh, and is served
at the running Pulse server's /install.sh endpoint — none of those paths
change. Only the top-level GitHub Releases asset moves from agent to server.

Update build_release_assets_test.go to lock in the new publishing rule
and ban the reverse drift, replacing the March 18 "legacy root install.sh"
guard that was the original mistake.

Add a validate-release.sh smoke that catches the regression mode this hid:
the published install.sh must have the Pulse server banner, the --version)
arg handler, must not contain the agent banner, and `bash install.sh --help`
must print the server installer's version-pinning help line. These checks
run as part of validate-release-assets.yml against the post-publish asset
bundle so a future swap back cannot slip through.

Document the asset identity rule and the validate-release.sh guard in the
deployment-installability contract so any future change to the publishing
pipeline has to update the contract or trip the shape guard.
2026-05-12 10:24:28 +01:00
rcourtman
7951da526b Add release_cycle_artifact_globs so RC ceremony skips contract-update requirement 2026-05-11 22:55:29 +01:00
rcourtman
816b3985ba Make blocked-record drift self-fixable via BLESS_GOVERNANCE_FIXTURES env var 2026-05-11 22:31:03 +01:00
rcourtman
d38f3d9217 Update release-control fixtures after pulse-agent Docker removal 2026-05-11 22:21:31 +01:00
rcourtman
8ff69daa43 Bump install pins to rc.5 and refresh test fixtures for Patrol readiness + Unraid host profile tokens 2026-05-11 18:02:52 +01:00
rcourtman
366bf8d127 Prepare v6.0.0-rc.5 release packet 2026-05-11 16:52:31 +01:00
rcourtman
d02255907c Fail closed when patrol_resolve_finding verifier is inconclusive
ResolveFinding adapter previously logged a warning and allowed the
LLM's resolve to proceed when the deterministic verifier returned
an error (timeout, executor unavailable, etc.). That's fail-open:
any verifier failure let the auto_resolved → re-detected cycle
continue, exactly the pattern the rest of this branch's
patrol_resolve_finding work spent commits closing. The "Backup
failed" finding on the live preview still cycled once post-
migration because of this path — verifier returned an
ErrVerificationUnknown and resolve was permitted.

Resolution of an event/persistent category finding is effectively
permanent (next detection registers as a regression and inflates
counters and pollutes the trust strip). When the deterministic
verifier cannot confidently say the failure signal is gone, we
don't have grounds to honor the LLM's judgment — the LLM's
"current investigation didn't surface a fresh failure" is exactly
the unreliable signal that produced bogus cycles.

Switches the inconclusive-verifier branch from log-and-allow to
log-and-reject, returning an error to the tool so the LLM can
retry or escalate to the operator. The verifier-still-detects-
signal path stays as-is (it was already fail-closed).

Test: TestPatrolFindingCreatorAdapter_ResolveFinding_RejectsWhenVerifierIsInconclusive
exercises the path by calling ResolveFinding on a backup-failed
finding through a PatrolService with no chat service wired
(getExecutorForVerification returns ErrVerificationUnknown). Asserts
the error mentions 'inconclusive' and that ResolvedAt remains nil.

Contract: extends the deterministic-resolve-gate clause in the
ai-runtime canonical-files completion-obligations to name the
fail-closed-on-inconclusive policy explicitly.
2026-05-11 10:53:06 +01:00