Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-04-29 03:50:18 +00:00

Author	SHA1	Message	Date
rcourtman	f344938403	Retry Linux guest meminfo sooner after transient failures (#1319)	2026-03-26 23:27:54 +00:00
rcourtman	4ad7e51875	Prefer linked host disk metrics for v5 Proxmox nodes	2026-03-25 16:54:00 +00:00
rcourtman	32746e2d2a	fix(monitoring): use RRD memavailable fallback when PVE node cache metrics missing (#1270) When Proxmox /nodes/{node}/status returns only total/used/free without available/buffers/cached, EffectiveAvailable() returns Free (non-zero), causing the RRD fallback gate to be skipped. This results in inflated node memory where cache/buffers are counted as "used." Widen the RRD fallback condition from requiring effectiveAvailable == 0 to triggering whenever missingCacheMetrics is true. Add negative caching for failed RRD lookups (2-minute backoff) to avoid repeated retries.	2026-02-21 22:47:20 +00:00
rcourtman	0ae2806f18	fix(memory): add guest agent /proc/meminfo fallback to avoid VM memory inflation (#1270) Proxmox status.Mem includes page cache as "used" memory, inflating reported VM usage. The existing fallbacks (balloon meminfo, RRD, linked host agent) were frequently unavailable, causing most VMs to fall through to the inflated status-mem source. Adds a new last-resort fallback that reads /proc/meminfo via the QEMU guest agent file-read endpoint to get accurate MemAvailable. Results are cached (60s positive, 5min negative backoff for unsupported VMs). Also fixes: RRD memavailable fallback missing from traditional polling path, cache key collisions in multi-PVE setups, FreeMem underflow guard inconsistency, and integer overflow in kB-to-bytes conversion.	2026-02-20 13:31:52 +00:00
rcourtman	efa916ee2a	fix(memory): correct memory reporting for Linux VMs and FreeBSD ZFS ARC Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox RRD's memavailable metric (which excludes reclaimable page cache) when the qemu-guest-agent doesn't provide MemInfo.Available. Previously the fallback was detailedStatus.Mem (total - MemFree), inflating usage to 80%+ on VMs with normal Linux page cache. Mirrors the existing LXC rrd-memavailable path. FreeBSD ZFS ARC (#1264, #1051): The host agent now reads kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts the ARC size from reported memory usage. ZFS ARC is reclaimable under memory pressure (like Linux SReclaimable) but gopsutil counts it as wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms. Fixes #1270 Fixes #1264 Fixes #1051 (cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)	2026-02-18 12:56:53 +00:00
rcourtman	13af83f3fc	fix(monitoring): preserve recent PVE nodes on empty polls (#1094)	2026-02-07 14:18:33 +00:00
rcourtman	ebc29b4fdb	feat: show pending apt updates for Proxmox nodes (#1083) - Add PendingUpdates and PendingUpdatesCheckedAt fields to Node model - Add GetNodePendingUpdates method to Proxmox client (calls /nodes/{node}/apt/update) - Add 30-minute polling cache to avoid excessive API calls - Add pendingUpdates to frontend Node type - Add color-coded badge in NodeSummaryTable (yellow: 1-9, orange: 10+) - Update test stubs for interface compliance Requires Sys.Audit permission on Proxmox API token to read apt updates.	2026-01-21 10:53:36 +00:00
rcourtman	754e9d1abd	Fix monitoring test panic and goroutine leaks Two critical fixes to prevent test timeouts: 1. Nil map panic in TestPollPVEInstanceUsesRRDMemUsedFallback: - Test monitor was missing nodeLastOnline map initialization - Panic occurred when pollPVEInstance tried to update nodeLastOnline[nodeID] - Caused deadlock when panic recovery tried to acquire already-held mutex - Added nodeLastOnline: make(map[string]time.Time) to test monitor 2. Alert manager goroutine leak in Docker tests: - newTestMonitor() created alert manager but never stopped it - Background goroutines (escalationChecker, periodicSaveAlerts) kept running - Added t.Cleanup(func() { m.alertManager.Stop() }) to test helper These fixes resolve the 10+ minute test timeouts in CI workflows. Related to workflow run 19281508603.	2025-11-11 23:52:24 +00:00
rcourtman	d7766af799	Fix backend test failures blocking release workflow Three categories of fixes: 1. Goroutine leak causing 10-minute timeout: - Add defer mon.notificationMgr.Stop() in monitor_memory_test.go - Background goroutines from notification manager weren't being stopped 2. Database NULL column scanning errors: - Change LastError from string to *string in queue.go - Change PayloadBytes from int to *int in queue.go - SQL NULL values require pointer types in Go 3. SSRF protection blocking test servers: - Check allowlist for localhost before rejecting in notifications.go - Set PULSE_DATA_DIR to temp directory in tests - Add defer nm.Stop() calls to prevent goroutine leaks Fixes for preflight test failures in workflow run 19280879903.	2025-11-11 23:27:03 +00:00
rcourtman	af55362009	Fix inflated RAM usage reporting for LXC containers Related to #553 ## Problem LXC containers showed inflated memory usage (e.g., 90%+ when actual usage was 50-60%, 96% when actual was 61%) because the code used the raw `mem` value from Proxmox's `/cluster/resources` API endpoint. This value comes from cgroup `memory.current` which includes reclaimable cache and buffers, making memory appear nearly full even when plenty is available. ## Root Cause - Nodes: Had sophisticated cache-aware memory calculation with RRD fallbacks - VMs (qemu): Had detailed memory calculation using guest agent meminfo - LXCs: Naively used `res.Mem` directly without any cache-aware correction The Proxmox cluster resources API's `mem` field for LXCs includes cache/buffers (from cgroup memory accounting), which should be excluded for accurate "used" memory. ## Solution Implement cache-aware memory calculation for LXC containers by: 1. Adding `GetLXCRRDData()` method to fetch RRD metrics for LXC containers from `/nodes/{node}/lxc/{vmid}/rrddata` 2. Using RRD `memavailable` to calculate actual used memory (total - available) 3. Falling back to RRD `memused` if `memavailable` is not available 4. Only using cluster resources `mem` value as last resort This matches the approach already used for nodes and VMs, providing consistent cache-aware memory reporting across all resource types. ## Changes - Added `GuestRRDPoint` type and `GetLXCRRDData()` method to pkg/proxmox - Added `GetLXCRRDData()` to ClusterClient for cluster-aware operations - Modified LXC memory calculation in `pollPVEInstance()` to use RRD data when available - Added guest memory snapshot recording for LXC containers - Updated test stubs to implement the new interface method ## Testing - Code compiles successfully - Follows the same proven pattern used for nodes and VMs - Includes diagnostic snapshot recording for troubleshooting	2025-11-06 00:16:18 +00:00
rcourtman	a885fb5472	Surface LXC interface IPs via PVE interfaces API (#596)	2025-10-23 08:07:32 +00:00
rcourtman	b95c01066e	Capture dynamic LXC IP metrics (#596)	2025-10-23 07:50:45 +00:00
rcourtman	be85459db2	Add LXC config metadata for guest drawers (#596)	2025-10-23 07:30:32 +00:00
rcourtman	20ff56aceb	Add coverage for PVE memused fallback #553	2025-10-22 17:14:12 +00:00
rcourtman	7ae393c8ec	Refine Proxmox node memory fallback (#582)	2025-10-22 15:36:26 +00:00

15 commits
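
Several of the commits above (af55362009, efa916ee2a, 32746e2d2a) describe the same cache-aware fallback order for "used" memory: prefer RRD memavailable (total - available), then RRD memused, and only use the raw mem value as a last resort. The following is a minimal Go sketch of that ordering, not the repository's actual code. The rrdPoint type, the usedMemory function, and the "rrd-memused" and "cluster-resources" labels are hypothetical names chosen for illustration; only the ordering itself and the "rrd-memavailable" label are taken from the commit messages.

```go
package main

import "fmt"

// rrdPoint is a hypothetical stand-in for one RRD sample.
// A nil field means the metric was missing from the sample.
type rrdPoint struct {
	MemAvailable *uint64 // bytes available, excluding reclaimable cache
	MemUsed      *uint64 // bytes used as reported by RRD
}

// usedMemory returns the used bytes and a label naming the source.
// Order, as described in the commit messages:
//  1. total - RRD memavailable
//  2. RRD memused
//  3. raw mem value (may include page cache), as a last resort
func usedMemory(total, rawMem uint64, p *rrdPoint) (uint64, string) {
	if p != nil {
		// Guard against underflow: available must not exceed total.
		if p.MemAvailable != nil && *p.MemAvailable <= total {
			return total - *p.MemAvailable, "rrd-memavailable"
		}
		if p.MemUsed != nil && *p.MemUsed <= total {
			return *p.MemUsed, "rrd-memused"
		}
	}
	// Clamp so the reported usage never exceeds total.
	if rawMem > total {
		rawMem = total
	}
	return rawMem, "cluster-resources"
}

func main() {
	const gib = uint64(1) << 30
	avail := 5 * gib

	// With memavailable present, page cache is excluded from "used".
	used, src := usedMemory(8*gib, 7*gib, &rrdPoint{MemAvailable: &avail})
	fmt.Printf("used=%d GiB source=%s\n", used/gib, src)

	// With no RRD data at all, the inflated raw value is the only option.
	used, src = usedMemory(8*gib, 7*gib, nil)
	fmt.Printf("used=%d GiB source=%s\n", used/gib, src)
}
```

Returning the source label next to the value follows the diagnostic-snapshot idea mentioned in af55362009: it makes it visible which fallback produced a given reading when troubleshooting inflated usage.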