Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-22 03:02:35 +00:00

Author	SHA1	Message	Date
rcourtman	7475c8a238	Auto-update Helm chart version to 5.1.30	2026-05-03 19:07:40 +00:00
rcourtman	719e78ce2f	Auto-update Helm chart documentation	2026-05-03 19:07:39 +00:00
rcourtman	8071758ce3	Prepare v5.1.30 release Refs #1454	2026-05-03 19:25:54 +01:00
rcourtman	8337cbc4c9	Fix v5 diagnostics GitHub export Normalize diagnostics collection fields to empty arrays before encoding and harden the sanitized GitHub export path against null arrays so empty v5 installs can still produce issue attachments. Refs #1454	2026-05-03 19:12:24 +01:00
rcourtman	9bfef81d93	Fix v5 update helper installer URL Render the maintenance installer URL into the generated update helper so it does not depend on installer-only shell functions after installation. Add a smoke test that executes the generated helper with fake curl and bash to preserve source-build forwarding. Refs #1454	2026-05-03 18:57:28 +01:00
rcourtman	80adfe848c	Bump postcss to 8.5.13 on release/5.1 Keeps the release/5.1 frontend lockfile above the patched floor for GHSA-qx2v-qp2m-jg93 and aligned with the default-branch Dependabot fix. Refs Dependabot alert #83.	2026-05-01 20:18:00 +01:00
rcourtman	7294f795cb	Auto-update Helm chart version to 5.1.29	2026-05-01 14:44:25 +00:00
rcourtman	08fd10188e	Auto-update Helm chart documentation	2026-05-01 14:44:23 +00:00
rcourtman	858c894023	Prepare v5.1.29 release	2026-05-01 15:04:48 +01:00
rcourtman	84d6aa7ba8	Document issue-first contribution policy Pulse is a single-maintainer project and does not accept unsolicited external pull requests. README, CONTRIBUTING, and a new PULL_REQUEST_TEMPLATE now state this directly so contributors hit the policy before investing time in code, and so PRs opened in error point to issues and discussions as the correct intake. CONTRIBUTING is rewritten end-to-end around the new policy: how to file bugs, feature requests, support questions, and security reports; where to look for context (README, ARCHITECTURE, docs/); and the maintainer-direction carve-out for PRs explicitly requested against tracked issues.	2026-05-01 15:04:41 +01:00
rcourtman	3d3b1a9642	Stop re-notification spam when alert cooldown is disabled (Fixes #1444) shouldNotifyAfterCooldown previously returned true on every call when Schedule.Cooldown was 0 or negative, which the alert evaluation loop runs on every metric tick. With cooldown disabled, an active alert was re-notified on each tick. The UI labels cooldown=0 as "Disabled," so the intuitive contract is "do not re-notify," not "re-notify continuously." Treat <=0 as "first-time only": fire the initial notification, then suppress subsequent re-notifications until the alert clears or the cooldown is configured to a positive value. Level escalation re-notifications remain handled at the call site and are unaffected. Tests cover all three branches: first-time fire with cooldown=0, re-notification suppression with cooldown=0 (named regression guard for #1444), and the same behavior for negative values.	2026-05-01 15:04:27 +01:00
rcourtman	f0f20422da	Always make UpdateProgressModal closable so a stuck update can't lock the UI The modal had no close path when isComplete() was false: the X button was Show-gated on isComplete(), there was no Escape handler, and the backdrop had no onClick. So if the SSE stream dropped, the polling fallback failed, or the update process crashed before writing a terminal status, the modal stayed open with a black backdrop covering the page and no way to dismiss it except a hard browser refresh: the "page is blacked out and you can't press anything" symptom. Make the close path always available: - The X button in the header is no longer Show-gated. Its tooltip and aria-label adapt to clarify that closing during an active update only hides the modal; the update keeps running. - Escape on the document closes the modal while it is open. - Clicking on the backdrop (and only the backdrop, not the modal body) closes the modal. The actual update process is server-side and unaffected: closing just unmounts the modal's local SSE/polling. GlobalUpdateProgressWatcher keeps polling /api/updates/status independently and will surface completion via the existing reload path or via the Updates settings page. Frontend type-check passes and the 447-test vitest suite is green.	2026-04-30 12:01:25 +01:00
rcourtman	611ae5b9f8	Add --agent-id-file so containerized agents keep a stable identity Pulse agents derive their identity from /etc/machine-id by default. In Docker containers (especially nested in LXCs), /etc/machine-id is not guaranteed stable across container recreation: a fresh image instance gets a new machine-id, and the resulting AgentID drift causes the server to reject reports with 401 because the API token is bound to the original AgentID via the bound_agent_id token-metadata check (internal/api/router.go:1448-1458). Refs #1447. Add a --agent-id-file (and PULSE_AGENT_ID_FILE env var) flag that: 1. Reads the persisted AgentID from the file on start, when present, and short-circuits machine-id detection. The user mounts the file as a Docker volume (e.g. -v pulse-agent-id:/var/lib/pulse-agent) so it survives container recreation. 2. On first start (or when the file is missing/empty), the existing machine-id derivation runs and the resolved ID is written to the file atomically (tmp + rename, 0600 perms, parent dir created). Subsequent restarts of the container — even after `docker rm -f` and a fresh `docker run` — read the same ID from the volume and the server keeps recognising the agent. Default is no flag set, which preserves the current /etc/machine-id-derived behaviour for non-containerized installs.	2026-04-30 11:50:08 +01:00
rcourtman	4a5e234c12	Carry forward previous snapshots for guests we cannot poll this cycle When the snapshot-polling budget runs out mid-loop, or a single guest's GetVMSnapshots/GetContainerSnapshots call returns an error, the polling function used to early-return without writing any state. That meant: 1. snapshots successfully fetched for earlier guests in the same cycle were thrown away, and 2. on the next successful cycle, the freshly-polled snapshots replaced the entire instance's snapshot list — wiping out any snapshots whose owning VM had failed to respond this round. For users with a busy production cluster (many guests, intermittent per-VM API failures), this manifests as "new snapshots never appear in the Backups tab" because the failing VM keeps blanking the list the moment a successful poll lands (#1437). Now we read the previous snapshots for the instance up front, track which guests we successfully polled this cycle, and at the end merge the fresh data with previously-known snapshots for any guest we couldn't reach. Successfully-polled guests get their fresh data so new snapshots appear; failed guests keep their last-known list so transient errors do not blank state. The early-return on deadline is removed so the merge runs even on partial-failure cycles. Tests cover the carry-forward path: a fresh successful poll for one VM lands a new snapshot, and a concurrent failed poll for a second VM preserves its previously-known snapshot rather than dropping it.	2026-04-30 11:43:01 +01:00
rcourtman	a53de0fc53	Surface unified-agent filesystems in linked VM/container Overview The qemu-guest-agent's get-fsinfo cannot reliably report ZFS mounts on some guest configurations (notably Proxmox Backup Server), so VMs that have ZFS-formatted partitions show only their EXT4 root and datastore in the VM Overview FILESYSTEMS card while the much larger ZFS dataset holding the actual backups is missing entirely (Fixes #1438). The unified pulse-agent running inside the same guest already has direct OS-level visibility into every mounted filesystem, including ZFS, and Pulse already knows the link between the host agent and its guest via Host.LinkedVMID / Host.LinkedContainerID (set in findLinkedProxmoxEntity by hostname match). GetState now calls StateSnapshot.MergeLinkedHostDisksIntoGuests after producing the snapshot. For each Host with a linked VM or container, that helper: 1. appends host-agent disks to the guest's Disks slice, deduped by mountpoint (qemu-guest-agent entries take precedence so we don't overwrite per-VM-perspective values), and 2. updates the guest's aggregate Disk.{Total,Used,Free,Usage} to include the newly-added partitions so the row total stays consistent with the partitions visible in the FILESYSTEMS card. The merge runs on a defensive copy of the disks slice to avoid mutating the underlying State slice that GetSnapshot shallow-copies. Tests cover the happy path (PBS-shaped fixture mirroring the issue screenshots), the no-link no-op, container linking, empty-mountpoint filtering, and the slice-isolation invariant.	2026-04-30 11:24:47 +01:00
rcourtman	5c65f65a90	Pass keep_alive=30s to Ollama so the model unloads between Patrol runs Ollama keeps the loaded model in RAM for 5 minutes by default after each request, and every new request refreshes that 5-minute window. Pulse never passed keep_alive, so any Ollama traffic (Patrol, alert analysis, AI chat) within 5 minutes of the previous request kept the model warm — and on a server with continuous Pulse activity that meant the model never unloaded, even with Patrol set to a 24-hour interval (Fixes #1425). Pass keep_alive=30s on every Chat and ChatStream request. Short enough that the model unloads shortly after a Patrol burst or one-shot analysis ends, long enough to span the small gaps between sequential calls within a single analysis session (so the model is not reloaded mid-burst). Tests assert that both the streaming and non-streaming Chat paths include the keep_alive field in the Ollama request body.	2026-04-30 10:59:04 +01:00
rcourtman	012c25d604	Use /proc/mdstat operation type to gate RAID rebuilding alerts Distinguish a real rebuild ("recovery" after disk replacement) from routine maintenance ("check" data scrubs, "resync" after unclean shutdown) using the in-progress sync action from /proc/mdstat. The mdadm --detail State field does not reliably surface scrub state on all kernel/distribution combinations (notably Synology DSM), which is why scheduled scrubs were firing "RAID array is rebuilding" warnings every 30 seconds (Fixes #1446). The mdadm parser now extracts the operation keyword from the /proc/mdstat progress line and surfaces it as RAIDArray.Operation alongside the existing speed parse. The alert layer treats "recovery" and "reshape" as rebuild signals; "check" and "resync" are treated as maintenance and do not fire an alert. Stringy State matching is kept as a backstop for arrays without a /proc/mdstat progress line, but "resync" alone in State no longer counts as a rebuild signal. Threaded the new field through the host-agent report, the resources converter, and the monitor's models conversion. Added /proc/mdstat parser tests covering recovery/check/resync/reshape/idle, and end-to-end alert tests for recovery (alerts), check (silent scrub), and resync (silent maintenance).	2026-04-30 10:37:47 +01:00
rcourtman	0464bdbad0	Stop test-config sends from leaking stale auth into shared SMTP manager When the email config passed to sendHTMLEmailWithError differs from the manager's persisted config (a test send with edited but unsaved settings), build a fresh manager so stale Username, Password, AuthRequired, SMTPHost, SMTPPort, TLS, StartTLS, or Provider fields cannot leak into the SMTP exchange. The shared production manager is left untouched. Without this, a relay-mode test (port 25, no credentials) on a deployment that previously had authenticated SMTP saved would still attempt AUTH and fail with "AUTH not available" because the manager's old AuthRequired and credentials persisted (Fixes #1440). When the configs match, the existing reuse path is preserved so the production manager's rate limiter keeps working across grouped sends.	2026-04-30 10:28:47 +01:00
kanylbullen	4557fb8159	Refactor: extract emitFinalToolCalls helper, add EOF tests Address review feedback: - Extract shared tool-call finalization into emitFinalToolCalls closure to eliminate duplication between [DONE] and EOF-fallback paths - Build tool calls in deterministic index order (sorted) - Normalize stopReason consistently in both paths - Add unit tests: - TestOpenAIClient_ChatStream_ToolCallWithSimultaneousEOF: verifies tool calls are parsed when Read returns n>0 and io.EOF together - TestOpenAIClient_ChatStream_ToolCallWithoutDONE: verifies fallback emission when stream ends without [DONE] Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 10:05:59 +01:00
kanylbullen	c9bbe8b3a8	Fix SSE stream parser dropping tool calls on EOF The read loop in ChatStream breaks immediately on io.EOF without processing remaining buffered data. Per Go's io.Reader contract, Read may return both n > 0 and io.EOF simultaneously, so the final bytes (which may contain tool call deltas and [DONE]) are silently discarded. This causes the agentic loop to see tool_calls=0 even though the model correctly produced tool calls in the stream. Changes: - Process pendingData when EOF is received before breaking - Add fallback: emit accumulated tool calls if [DONE] was never reached (server closed connection early) Fixes #1411 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-30 10:05:59 +01:00
rcourtman	94b5bb1b28	Pin Go toolchain to 1.25.9 and bump x/net to 0.51.0 Clears 11 govulncheck findings on release/5.1: - 10 in the Go standard library (crypto/x509 auth bypass and panics, crypto/tls TLS 1.3 KeyUpdate DoS, archive/tar unbounded allocation, html/template XSS, os.Root filesystem escape, net/url IPv6 parse) — fixed by 1.25.9 - 1 in golang.org/x/net (HTTP/2 frame panic, GO-2026-4559) — fixed by 0.51.0 CI uses go-version-file: go.mod with setup-go@v5, which honors the toolchain directive, so workflow builds will pick up 1.25.9. Verified govulncheck reports no vulnerabilities and full Go test suite outcome is unchanged from the v5.1.28 baseline (the TestSetupScriptTokenLifecycleIntegration_PVE failures are pre-existing on release/5.1 and unrelated to the bump).	2026-04-30 10:05:46 +01:00
rcourtman	570fd31548	Bump dompurify to 3.4.1 to fix four DOMPurify advisories Dependabot #79–#82 (CVE-2026-41238/41239/41240, GHSA-39q2-94rc-95cp) all flag dompurify@3.3.3 sanitizer bypasses. Bumps the constraint to ^3.4.0 and locks at 3.4.1. Verified frontend-modern type-check, vite build, and the 447-test vitest suite all pass on the new version.	2026-04-30 09:31:11 +01:00
rcourtman	b204bed8c7	Fix release/5.1 LXC installs defaulting to RC Refs #1435	2026-04-21 17:18:42 +01:00
rcourtman	9fe622b885	Defer QNAP autorun until encrypted volume unlocks (Fixes #1422) QNAP's autorun.sh fires well before encrypted data volumes are unlocked, so the previous one-line entry that invoked start-pulse-agent.sh on the encrypted volume failed immediately: the wrapper did not exist yet, and the agent never started after reboot. Replace the entry with a backgrounded waiter that polls for the wrapper (every 2 s, up to 30 min) and execs it once the volume comes up. On unencrypted volumes the loop exits on the first check, so behaviour is unchanged. A timeout message is logged to /var/log/pulse-agent.log if the volume never unlocks within the window. The block is uninstall-safe: no internal blank lines, so the existing sed marker-to-blank-line range still removes it cleanly.	2026-04-17 11:46:23 +01:00
rcourtman	7e4d4e07bf	Persist QNAP agent updates to data volume (Fixes #1420) On QNAP, /usr/local/bin is a tiny RAM disk that gets wiped on every reboot. The install wrapper stores the real binary under ${QNAP_VOL}/.pulse-agent/<name> and a boot script copies it back into /usr/local/bin. Without refreshing the stored copy, auto-updates applied to the RAM disk were silently reverted on the next reboot. Mirror the Unraid persistence pattern: after the atomic in-place swap, when running on QNAP, rewrite the stored binary via a temp-file rename. Skip when the running binary already is the persistent copy (fallback mode, where the rename step already updated it).	2026-04-17 11:44:17 +01:00
rcourtman	8c8641e5f2	Merge unified host/docker rows when IDs diverge (Fixes #1421) The host-side identifier path applies sanitizeDockerHostSuffix before storing Host.ID, while the docker-side uses AgentKey() raw. For a QNAP unified agent those two derivations can produce different IDs, so the UnifiedAgents merge keyed on d.id === h.id split the single install into two rows. Add a 1:1 hostname fallback: if exactly one unmerged host row and one unmerged docker row share the same hostname, merge them. The strict 1:1 constraint prevents distinct machines that happen to share a hostname from being collapsed together.	2026-04-17 11:38:39 +01:00
rcourtman	6bc3d30548	Preserve Proxmox guest drawer state across refresh ticks Dashboard's group-level <For> iterated over Object.entries(groupedGuests()).sort(...), which produces brand-new tuple arrays on every refresh. Solid's <For> diffs by reference, so every tick it destroyed and recreated all child rows — wiping out GuestDrawer's activeTab signal (snapping Discovery back to Overview), graph hover tooltips, and scroll position inside the expanded row. Iterate over a memoized array of instance-ID strings instead. Primitive equality keeps the outer For stable, so only the guest data inside each group updates on each tick and the drawer's local state survives. Fixes #1427	2026-04-17 11:15:50 +01:00
rcourtman	e1011230b9	Align infra discovery with Patrol interval The infra discovery service auto-started with a hardcoded 5-minute ticker the moment the AI service initialized, regardless of the user's Patrol schedule. Each tick called AnalyzeForDiscovery, which hit the Ollama chat endpoint and reset Ollama's keep_alive (5 min default), so the model never had a chance to unload between requests. Default the discovery interval to 24h and align it with the user's Patrol preset (GetPatrolInterval) when the AI service constructs the discovery service. With Patrol at its 6h default, the LLM now sits idle long enough for Ollama to release it. Fixes #1425	2026-04-17 11:10:14 +01:00
rcourtman	4de1c3745a	Preflight disk space before Pulse updates	2026-04-15 20:56:58 +01:00
rcourtman	0b836aa3af	Fix v5 integration update test defaults	2026-04-14 20:24:58 +01:00
rcourtman	80dfd43f8c	Fix release dry-run integration image build	2026-04-14 20:06:27 +01:00
rcourtman	65670ca011	Make v5 release automation branch-owned	2026-04-14 19:48:25 +01:00
rcourtman	10d0803262	Auto-update Helm chart version to 5.1.28	2026-04-14 19:21:20 +01:00
rcourtman	3a04896e92	Auto-update Helm chart documentation	2026-04-14 19:21:20 +01:00
rcourtman	81661a934a	Move v5 maintenance flow onto release/5.1	2026-04-14 18:34:41 +01:00
rcourtman	c8f1ad75cf	Bump version to 5.1.28	2026-04-14 16:58:58 +01:00
rcourtman	a24af45c67	Add v6 RC announcement surfaces to v5	2026-04-14 16:51:19 +01:00
rcourtman	dfbe2eb873	Suppress noisy recovery notifications	2026-04-13 14:40:12 +01:00
rcourtman	19b2a4e4c4	Clear stale guest per-disk alerts	2026-04-13 14:20:54 +01:00
rcourtman	efb840deae	Fix installer universal bundle fallback	2026-04-13 14:13:11 +01:00
rcourtman	1f0dfd60fc	Lock SAML metadata public URL refresh	2026-04-13 13:48:27 +01:00
rcourtman	5a17456a60	Fix Ceph manager standby parsing	2026-04-13 11:57:12 +01:00
rcourtman	9fb76579cc	Fix backup type-aware orphan detection	2026-04-13 11:54:46 +01:00
rcourtman	3981df57a2	Detect NAS host vendors from platform files	2026-04-13 11:25:27 +01:00
rcourtman	754aa0e39c	Fix linked host agent threshold overrides	2026-04-12 22:47:34 +01:00
rcourtman	5f3a4b79ba	Fix oversized AI discovery responses	2026-04-12 22:33:48 +01:00
rcourtman	2ad288c091	Fix streamed installer entrypoint	2026-04-12 22:30:58 +01:00
rcourtman	95409985b5	Normalize vendor-managed NAS RAID arrays	2026-04-12 22:20:04 +01:00
rcourtman	a86c7120cf	Debounce recovery for poll-driven offline alerts	2026-04-12 22:04:10 +01:00
rcourtman	005f64182f	Respect quiet hours for escalation alerts Apply quiet-hours suppression to escalation notifications so offline and other suppressed categories do not bypass the normal notification rules during escalation. Fixes #1398.	2026-04-12 21:29:32 +01:00

3440 commits