Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-28 19:41:17 +00:00

Author	SHA1	Message	Date
rcourtman	48689137ec	Migrate Docker metadata on observed container recreation (#1054)	2026-03-27 22:50:19 +00:00
rcourtman	8c1d4dcc04	Honor discovery subnet policy for cluster endpoints (#1319)	2026-03-27 16:30:21 +00:00
rcourtman	01f916dcb5	Use linked host-agent disk data for guest fallback (#1319)	2026-03-27 15:56:20 +00:00
rcourtman	ad10e1f116	Discover controller-backed SMART wearout paths (#1368)	2026-03-27 15:42:44 +00:00
rcourtman	2a4432048a	Continue guest-agent polling after transient status failures (#1319)	2026-03-27 14:50:28 +00:00
rcourtman	51abca6421	Treat available guest agents as healthy for VM memory carry-forward (#1319)	2026-03-27 11:04:07 +00:00
rcourtman	963670f01c	Serve fresh alert snapshots from monitor state reads (#1365)	2026-03-27 10:47:56 +00:00
rcourtman	ae66647eb1	Preserve VM memory when healthy guests fall back to false 100% usage (#1319)	2026-03-27 08:27:14 +00:00
rcourtman	f344938403	Retry Linux guest meminfo sooner after transient failures (#1319)	2026-03-26 23:27:54 +00:00
rcourtman	d9b7c99f02	Rotate guest-agent poll priority across QEMU polls (#1319)	2026-03-26 22:20:27 +00:00
rcourtman	fcd2384dd5	Stabilize transient VM disk fallbacks (#1319)	2026-03-26 11:12:23 +00:00
rcourtman	e9bbc35bae	Stabilize repeated low-trust VM memory fallbacks (#1319)	2026-03-26 00:23:29 +00:00
rcourtman	2196327769	Preserve VM guest metadata across transient agent gaps (#1319)	2026-03-26 00:12:19 +00:00
rcourtman	0f70aa053e	Honor disk-exclude for sleeping and Proxmox disks (#1142)	2026-03-26 00:01:59 +00:00
rcourtman	333e66a8e9	Reject shared Docker token host identity collisions (#1366)	2026-03-25 23:36:57 +00:00
rcourtman	48f4438d23	Scale v5 Proxmox guest disk polling	2026-03-25 18:24:47 +00:00
rcourtman	fba1fadccd	Make alert node display name resolution instance-aware (#1218)	2026-03-25 12:44:22 +00:00
rcourtman	9c2a56d351	Respect quiet hours for recovery notifications (#1068)	2026-03-25 12:27:35 +00:00
rcourtman	ffaeea18d6	Scope cluster TLS fingerprints to their own endpoints (#1199)	2026-03-25 12:10:09 +00:00
rcourtman	40249947ed	Fix template backup orphan detection race (#1352)	2026-03-25 10:36:33 +00:00
rcourtman	2fe22c3308	fix(backups): prevent template backups from being flagged as orphaned Proxmox VM/LXC templates are intentionally excluded from the monitored guest list, but their backup files exist on storage. The orphan-detection logic was firing for every template backup because the VMID was never in the guest lookup maps. Fix: track template VMID→node pairs in State.templateVMIDs (unexported, not serialised to API/frontend) during the resources poll loop, expose via StateSnapshot.TemplateVMIDs, and use in both buildGuestLookups() and the storage backup node-resolution map so orphan detection treats template backups as valid. Also preserves the template map through the cluster health grace-period path (zero-resource preservation), the partial-node grace-period path, and clears it on instance removal. Closes #1352	2026-03-17 09:04:22 +00:00
rcourtman	caff845c1a	fix(ui): use Proxmox tag colours from datacenter config Pulse was generating tag colours from a hash of the tag name instead of using the colours configured in Proxmox. Now polls /cluster/options once per PVE instance and merges the tag-style colour map into state, which the frontend uses as the first-priority colour source for tag badges. Falls back to the existing special-tag and hash-based colours when Proxmox hasn't set a custom colour for a tag.	2026-03-15 19:49:46 +00:00
rcourtman	d05a00b931	fix(monitoring): smooth transient VM memory fallback spikes	2026-03-10 23:06:17 +00:00
rcourtman	afcfb23a30	fix(monitoring): retain intermittent FreeBSD SMART data	2026-03-10 22:52:25 +00:00
rcourtman	7dab977d91	Add split memory bar showing Used | Cache | Free segments (#1302) Show reclaimable buff/cache as a distinct amber segment between used (green) and free (gray) in the memory bar. This explains why Pulse's memory percentage differs from Proxmox: Pulse reports cache-aware usage (MemAvailable) while Proxmox includes cache as used (Total-Free). Backend: add Cache field to Memory model, derived from MemInfo (Available - Free). Only uses MemInfo.Free (not FreeMem fallback) to avoid inflating cache by the balloon gap on ballooned VMs. Frontend: StackedMemoryBar renders three segments with tooltip breakdown. Tooltip Free accounts for balloon limit when active. Percentage label and alerts remain cache-aware (unchanged).	2026-03-10 10:16:14 +00:00
rcourtman	7a394ed724	Use explicit success flag for disk carry-forward guard (#1319) Replace the diskUsage <= 0 heuristic with a diskFromAgent bool that is only set when the guest agent actually returns valid filesystem data. Prevents carry-forward from firing on a genuine 0% disk reading.	2026-03-09 18:54:27 +00:00
rcourtman	9c279732f7	Skip disk carry-forward when guest agent is explicitly disabled (#1319) Prevents stale disk data from persisting indefinitely in the efficient poller when a user disables the guest agent after it had been providing data. Matches the fallback poller's agent-disabled exclusion.	2026-03-09 18:37:38 +00:00
rcourtman	abbd0df609	Fix disk metric spikes when guest agent intermittently fails (#1319) Carry forward previous cycle's disk data when the QEMU guest agent times out or errors, instead of falling back to Proxmox cluster/resources which always reports 0 for VM disk usage. Applied to both polling paths (pollVMsAndContainersEfficient and pollVMsWithNodes) with safety guards against uint64 underflow and permanent-failure exclusions.	2026-03-09 18:23:15 +00:00
rcourtman	a4b0771974	Prevent removed host agents from resurrecting via in-flight reports (#1331 ) Host agents removed from the UI would reappear on the next report cycle because there was no rejection mechanism — unlike Docker agents which already had resurrection prevention. Mirror the Docker agent pattern: - Track removed host IDs in a `removedHosts` map with 24hr TTL - Persist removal records in `State.RemovedHosts` for frontend display - Reject reports from removed hosts in `ApplyHostReport()` - Add `AllowHostReenroll()` + API route to clear the block - Show removed host agents in the Settings UI with "Allow re-enroll" - Sync removed-agent maps from state on startup for all agent types - Fix mock integration snapshot missing `RemovedDockerHosts` field	2026-03-09 17:52:34 +00:00
rcourtman	572520ebc6	Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270 ) Move the guest-agent file-read of /proc/meminfo earlier in the memory fallback chain so it runs before RRD, giving real-time MemAvailable that correctly excludes reclaimable buff/cache on Linux VMs. Also add VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use comma-separated privilege strings.	2026-03-09 10:04:28 +00:00
rcourtman	aa139b73fb	Fix intermittent VM disappearance from dashboard (#555 ) Two root causes: (1) When Proxmox cluster/resources returns a partial response (e.g. during migration or transient API issue), VMs missing from a responsive node were silently dropped because the node appeared in nodesWithResources, bypassing grace-period preservation. Now preserves recently-seen guests from online nodes for up to the grace window. (2) The task queue allowed overlapping polls for the same PVE instance — a slower stale poll could overwrite a newer complete VM list. Added per-instance execution lock to skip duplicate scheduled tasks.	2026-03-08 22:16:24 +00:00
rcourtman	ff1bbe2fb8	Guard per-VM guest agent calls with timeout and panic recovery (#1319 ) A broken or hung qemu-agent on one VM could stall the entire polling loop, preventing higher-VMID VMs from being detected. Wrap all guest agent work in a 10s per-VM budget with panic recovery, and add a 2s timeout to GetVMStatus in the efficient poller to match the legacy path.	2026-03-07 22:30:18 +00:00
rcourtman	0dd3fc779b	Fix alert disable notification suppression	2026-03-07 18:40:08 +00:00
rcourtman	499ab812e3	Fix post-release regressions and lock v5 to single-tenant runtime	2026-03-05 23:46:35 +00:00
rcourtman	a4571f580b	fix(monitoring): harden VM memory selection and flag repeated VM usage	2026-03-03 16:19:17 +00:00
rcourtman	ff9dc34687	Fix offline host visibility/alerting across restarts (#1311)	2026-03-03 15:43:29 +00:00
rcourtman	60bdc9a101	fix(memory): skip meminfo-derived when balloon lacks cache metrics (#1302 ) When the balloon driver reports Free but not Buffers or Cached, the meminfo-derived fallback computed memAvailable = Free alone, counting all reclaimable page cache as used memory. This caused Linux VMs to show wildly inflated usage (e.g. 93% when actual is 21%). Now meminfo-derived requires at least one cache metric (Buffers > 0 or Cached > 0) before trusting the value. When missing, the code falls through to RRD/guest-agent/Total-Used fallbacks which provide accurate cache-aware data. Both efficient and traditional polling paths are now consistent.	2026-03-02 11:48:18 +00:00
rcourtman	eb2397d99a	fix(notifications): route escalation notifications to selected channels only (#1259 ) Escalation was calling SendAlert() which always sends to all enabled channels, ignoring the per-level channel selection (email/webhook/all). Add SendAlertToChannels() that snapshots only the requested channel configs and uses a distinct "_escalation" queue type so the dequeue handler skips cooldown writes — preventing interference with the alert manager's own re-notify cadence.	2026-02-26 20:49:10 +00:00
rcourtman	32746e2d2a	fix(monitoring): use RRD memavailable fallback when PVE node cache metrics missing (#1270 ) When Proxmox /nodes/{node}/status returns only total/used/free without available/buffers/cached, EffectiveAvailable() returns Free (non-zero), causing the RRD fallback gate to be skipped. This results in inflated node memory where cache/buffers are counted as "used." Widen the RRD fallback condition from requiring effectiveAvailable == 0 to triggering whenever missingCacheMetrics is true. Add negative caching for failed RRD lookups (2-minute backoff) to avoid repeated retries.	2026-02-21 22:47:20 +00:00
rcourtman	0ae2806f18	fix(memory): add guest agent /proc/meminfo fallback to avoid VM memory inflation (#1270 ) Proxmox status.Mem includes page cache as "used" memory, inflating reported VM usage. The existing fallbacks (balloon meminfo, RRD, linked host agent) were frequently unavailable, causing most VMs to fall through to the inflated status-mem source. Adds a new last-resort fallback that reads /proc/meminfo via the QEMU guest agent file-read endpoint to get accurate MemAvailable. Results are cached (60s positive, 5min negative backoff for unsupported VMs). Also fixes: RRD memavailable fallback missing from traditional polling path, cache key collisions in multi-PVE setups, FreeMem underflow guard inconsistency, and integer overflow in kB-to-bytes conversion.	2026-02-20 13:31:52 +00:00
rcourtman	8c7d507ea4	fix(alerts): make --disk-exclude suppress Proxmox SSD wear/health alerts (#1142 ) The --disk-exclude agent flag only filtered local metric collection but had no effect on server-side Proxmox disk health and SSD wearout alerts, which poll the Proxmox API directly. Users excluding disks (e.g. --disk-exclude sda) still received alerts for those disks. Agent now sends its DiskExclude patterns in each report. The server stores them on the Host model and consults them during Proxmox disk polling — excluded disks get a synthetic healthy status passed to CheckDiskHealth so any existing alerts clear immediately. Also adds FreeBSD pseudo-filesystem types (fdescfs, devfs, linprocfs, linsysfs) to the virtual FS filter and /var/run/ to special mount prefixes, fixing false disk-full alerts on FreeBSD for fdescfs mounts.	2026-02-20 13:31:52 +00:00
rcourtman	fb7582c7e4	fix(memory): use linked Pulse host agent memory to avoid VM inflation (#1270 ) When no guest agent MemInfo or RRD data is available, prefer the linked Pulse host agent's memory (read from /proc/meminfo via gopsutil, which excludes page cache) over Proxmox's status.Mem (total - free, inflated by reclaimable cache). Applied to both efficient and traditional polling paths. Diagnostic fields added to VMMemoryRaw for visibility.	2026-02-19 19:04:19 +00:00
rcourtman	71b8b81af5	fix(monitoring): cache per-VM RRD memory lookups to avoid serial HTTP calls Windows VMs and VMs without qemu-guest-agent triggered an uncached GetVMRRDData HTTP call on every poll cycle. Add vmRRDMemCache using the same read-through cache pattern as nodeRRDMemCache (shared rrdCacheMu, same TTL, same cleanup path). (cherry picked from commit 582f16004a0f275de4c458e5d288be70eee613e4)	2026-02-18 12:57:15 +00:00
rcourtman	efa916ee2a	fix(memory): correct memory reporting for Linux VMs and FreeBSD ZFS ARC Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox RRD's memavailable metric (which excludes reclaimable page cache) when the qemu-guest-agent doesn't provide MemInfo.Available. Previously the fallback was detailedStatus.Mem (total - MemFree), inflating usage to 80%+ on VMs with normal Linux page cache. Mirrors the existing LXC rrd-memavailable path. FreeBSD ZFS ARC (#1264, #1051): The host agent now reads kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts the ARC size from reported memory usage. ZFS ARC is reclaimable under memory pressure (like Linux SReclaimable) but gopsutil counts it as wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms. Fixes #1270 Fixes #1264 Fixes #1051 (cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)	2026-02-18 12:56:53 +00:00
rcourtman	df23d80919	fix(alerts): always send recovery notifications regardless of quiet hours Recovery (all-clear) notifications were being silently suppressed during quiet hours for any non-critical alert. Since powered-off alerts default to Warning level, users who received an alert at 2pm would never get the recovery notification if the VM came back during quiet hours. Quiet hours are intended to suppress noisy firing alerts, not to hide the fact that an issue has resolved. If you got the alert, you should always get the all-clear. Remove the ShouldSuppressResolvedNotification gate from handleAlertResolved. The notifyOnResolve toggle (explicit user preference) is still respected. Fixes #1259	2026-02-18 12:53:09 +00:00
rcourtman	03939c3f9e	fix: deduplicate bind-mounted volumes in disk total calculation The dedup logic only handled btrfs/zfs subvolumes, but Kubernetes bind-mounts the same device at both pod and plugin paths, causing xfs/ext4 volumes to be double-counted. Now deduplicates by device+totalBytes for all filesystem types. Fixes #1158	2026-02-10 21:52:25 +00:00
rcourtman	26776b2075	fix(agent): apply --disk-exclude to Docker agent disk metrics (#1237 ) The Docker agent was not passing the disk exclusion list to hostmetricsCollect(), so excluded mounts appeared in the Docker tab disk totals. Also add server-side fsfilters filtering to Docker report processing for parity with the host agent path.	2026-02-10 16:59:35 +00:00
rcourtman	8a48acef1d	fix: hotfix 5.1.5 — node duplication, alert scrambling, ntfy resolved formatting - fix(models): filter nodes by instance in UpdateNodesForInstance to prevent PVE node duplication across poll cycles (#1214, #1192, #1217) - fix(alerts): sort GetActiveAlerts output for stable ordering, preventing hostname scrambling in frontend (#1218) - fix(notifications): add ntfy-specific resolved webhook formatting with plain-text body and proper headers (#1213) - fix(frontend): respect "hide Docker update actions" setting in DockerFilter Update All button (#1219) - fix(frontend): add missing v prefix to GitHub release tag URLs (#1195) - fix(monitoring): reduce disk detection warning from Warn to Debug to eliminate log spam for pass-through disks (#1216) - chore: bump VERSION to 5.1.5	2026-02-08 11:48:22 +00:00
rcourtman	d1e61d8a8a	fix: ship alerting hotfixes and prepare 5.1.4	2026-02-07 22:05:55 +00:00
rcourtman	13af83f3fc	fix(monitoring): preserve recent PVE nodes on empty polls (#1094)	2026-02-07 14:18:33 +00:00


260 commits