Proxmox VM/LXC templates are intentionally excluded from the monitored
guest list, but their backup files exist on storage. The orphan-detection
logic was firing for every template backup because the VMID was never
in the guest lookup maps.
Fix: track template VMID→node pairs in State.templateVMIDs (unexported,
not serialised to API/frontend) during the resources poll loop, expose
them via StateSnapshot.TemplateVMIDs, and use them in both buildGuestLookups() and
the storage backup node-resolution map so orphan detection treats template
backups as valid. Also preserves the template map through the cluster
health grace-period path (zero-resource preservation), the partial-node
grace-period path, and clears it on instance removal.
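A minimal sketch of the orphan check once template VMIDs are consulted. It assumes plain VMID→node maps. `guestLookup` and `isOrphanBackup` are hypothetical names, not Pulse's actual code.

```go
package main

import "fmt"

// guestLookup is a hypothetical VMID -> node map, standing in for the
// maps produced by buildGuestLookups().
type guestLookup map[int]string

// isOrphanBackup reports whether a backup's VMID belongs to neither a
// monitored guest nor a template. Templates stay out of the monitored
// guest list but still count as valid backup owners.
func isOrphanBackup(vmid int, guests, templates guestLookup) bool {
	if _, ok := guests[vmid]; ok {
		return false
	}
	if _, ok := templates[vmid]; ok {
		return false
	}
	return true
}

func main() {
	guests := guestLookup{100: "pve1"}
	templates := guestLookup{9000: "pve2"} // e.g. a cloud-init template
	for _, id := range []int{100, 9000, 123} {
		fmt.Printf("vmid %d orphan=%v\n", id, isOrphanBackup(id, guests, templates))
	}
}
```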
Closes #1352
Pulse was generating tag colours from a hash of the tag name instead
of using the colours configured in Proxmox. Now polls /cluster/options
once per PVE instance and merges the tag-style colour map into state,
which the frontend uses as the first-priority colour source for tag
badges. Falls back to the existing special-tag and hash-based colours
when Proxmox hasn't set a custom colour for a tag.
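The colour priority can be sketched as follows. The sketch is written in Go to match the rest of these examples, although Pulse applies the priority in the frontend. It assumes the `color-map` part of Proxmox's `tag-style` option holds `tag:RRGGBB[:RRGGBB]` entries separated by `;`. `specialTags` and the FNV-to-HSL hash fallback are hypothetical stand-ins for Pulse's existing fallbacks.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// parseColorMap parses a tag-style color-map value.
// Assumed format: "tag:RRGGBB[:RRGGBB];tag2:RRGGBB" (background, optional text colour).
func parseColorMap(s string) map[string]string {
	out := make(map[string]string)
	for _, entry := range strings.Split(s, ";") {
		parts := strings.Split(entry, ":")
		if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
			continue // skip malformed entries
		}
		out[parts[0]] = "#" + parts[1] // keep the background colour only
	}
	return out
}

// specialTags is a hypothetical stand-in for Pulse's built-in special-tag colours.
var specialTags = map[string]string{"production": "#dc2626"}

// tagColor checks the Proxmox map first, then special tags, then a hash.
func tagColor(tag string, proxmox map[string]string) string {
	if c, ok := proxmox[tag]; ok {
		return c
	}
	if c, ok := specialTags[tag]; ok {
		return c
	}
	h := fnv.New32a()
	h.Write([]byte(tag))
	return fmt.Sprintf("hsl(%d, 60%%, 45%%)", h.Sum32()%360) // deterministic hue
}

func main() {
	proxmox := parseColorMap("prod:FF0000:FFFFFF;dev:00FF00")
	for _, tag := range []string{"prod", "production", "misc"} {
		fmt.Println(tag, tagColor(tag, proxmox))
	}
}
```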
Show reclaimable buff/cache as a distinct amber segment between used
(green) and free (gray) in the memory bar. This explains why Pulse's
memory percentage differs from Proxmox: Pulse reports cache-aware
usage (MemAvailable) while Proxmox includes cache as used (Total-Free).
Backend: add Cache field to Memory model, derived from MemInfo
(Available - Free). Only uses MemInfo.Free (not FreeMem fallback) to
avoid inflating cache by the balloon gap on ballooned VMs.
Frontend: StackedMemoryBar renders three segments with tooltip
breakdown. Tooltip Free accounts for balloon limit when active.
Percentage label and alerts remain cache-aware (unchanged).
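The three bar segments follow directly from the MemInfo fields above: used = Total - Available, cache = Available - Free, free = Free. In this minimal sketch, `memSegments` is a hypothetical helper with clamps so that bad inputs cannot underflow.

```go
package main

import "fmt"

// memSegments splits total memory into used (cache-aware), reclaimable
// cache, and free segments that always sum to total.
func memSegments(total, available, free uint64) (used, cache, freeSeg uint64) {
	if available > total {
		available = total
	}
	if free > available {
		free = available // Free should never exceed Available; clamp if it does
	}
	used = total - available
	cache = available - free
	freeSeg = free
	return
}

func main() {
	const gib = 1 << 30
	used, cache, free := memSegments(16*gib, 10*gib, 4*gib)
	fmt.Printf("used=%dGiB cache=%dGiB free=%dGiB\n", used/gib, cache/gib, free/gib)
	// Cache-aware percentage (Pulse) vs Total-Free (Proxmox).
	fmt.Printf("pulse=%.1f%% proxmox=%.1f%%\n",
		100*float64(used)/float64(16*gib), 100*float64(16*gib-free)/float64(16*gib))
}
```

With 16 GiB total, 10 GiB available and 4 GiB free, the segments are 6/6/4 GiB. That gives 37.5% cache-aware usage against Proxmox's 75%.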
Replace the diskUsage <= 0 heuristic with a diskFromAgent bool that is
only set when the guest agent actually returns valid filesystem data.
Prevents carry-forward from firing on a genuine 0% disk reading.
Prevents stale disk data from persisting indefinitely in the efficient
poller when a user disables the guest agent after it had been providing
data. Matches the fallback poller's agent-disabled exclusion.
Carry forward previous cycle's disk data when the QEMU guest agent
times out or errors, instead of falling back to Proxmox cluster/resources
which always reports 0 for VM disk usage. Applied to both polling paths
(pollVMsAndContainersEfficient and pollVMsWithNodes) with safety guards
against uint64 underflow and permanent-failure exclusions.
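The last two commits combine into a single decision: carry the previous values forward only after a transient agent failure, and only if those values came from the agent. In this sketch, `diskSample`, `agentResult` and `resolveDisk` are hypothetical names, and the agent outcome is simplified to three cases.

```go
package main

import "fmt"

// diskSample is a hypothetical per-VM disk record. DiskFromAgent marks
// values that came from a successful guest-agent filesystem query.
type diskSample struct {
	Total, Used   uint64
	DiskFromAgent bool
}

// agentResult is a simplified outcome of one guest-agent query.
type agentResult int

const (
	agentOK             agentResult = iota
	agentTransientError             // timeout or error: carry forward
	agentDisabled                   // agent turned off by the user: never carry forward
)

func resolveDisk(res agentResult, fresh diskSample, prev *diskSample, clusterTotal uint64) diskSample {
	switch res {
	case agentOK:
		fresh.DiskFromAgent = true // a genuine 0% reading is kept as-is
		return fresh
	case agentTransientError:
		// Guard: Used > Total would underflow a later Total-Used subtraction.
		if prev != nil && prev.DiskFromAgent && prev.Used <= prev.Total {
			return *prev
		}
	}
	// Fall back to cluster/resources, which reports 0 used for VMs.
	return diskSample{Total: clusterTotal}
}

func main() {
	prev := &diskSample{Total: 100, Used: 40, DiskFromAgent: true}
	fmt.Println(resolveDisk(agentTransientError, diskSample{}, prev, 100)) // carried forward
	fmt.Println(resolveDisk(agentDisabled, diskSample{}, prev, 100))       // not carried forward
}
```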
Host agents removed from the UI would reappear on the next report cycle
because there was no rejection mechanism — unlike Docker agents which
already had resurrection prevention. Mirror the Docker agent pattern:
- Track removed host IDs in a `removedHosts` map with 24hr TTL
- Persist removal records in `State.RemovedHosts` for frontend display
- Reject reports from removed hosts in `ApplyHostReport()`
- Add `AllowHostReenroll()` + API route to clear the block
- Show removed host agents in the Settings UI with "Allow re-enroll"
- Sync removed-agent maps from state on startup for all agent types
- Fix mock integration snapshot missing `RemovedDockerHosts` field
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
Two root causes: (1) When Proxmox cluster/resources returns a partial
response (e.g. during migration or transient API issue), VMs missing
from a responsive node were silently dropped because the node appeared
in nodesWithResources, bypassing grace-period preservation. Now
preserves recently-seen guests from online nodes for up to the grace
window. (2) The task queue allowed overlapping polls for the same PVE
instance — a slower stale poll could overwrite a newer complete VM list.
Added per-instance execution lock to skip duplicate scheduled tasks.
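A per-instance execution lock that skips duplicates instead of queueing them can be sketched like this. `instanceLocks` and `runPoll` are hypothetical names, and Pulse's task queue may integrate the check differently.

```go
package main

import (
	"fmt"
	"sync"
)

// instanceLocks is a hypothetical per-instance guard: at most one poll per
// PVE instance runs at a time, and duplicates are skipped, not queued.
type instanceLocks struct {
	mu      sync.Mutex
	running map[string]bool
}

func (l *instanceLocks) tryStart(instance string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.running[instance] {
		return false
	}
	if l.running == nil {
		l.running = make(map[string]bool)
	}
	l.running[instance] = true
	return true
}

func (l *instanceLocks) finish(instance string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.running, instance)
}

// runPoll executes poll unless one is already in flight for instance.
// It reports whether poll actually ran.
func runPoll(l *instanceLocks, instance string, poll func()) bool {
	if !l.tryStart(instance) {
		return false
	}
	defer l.finish(instance)
	poll()
	return true
}

func main() {
	var locks instanceLocks
	runPoll(&locks, "pve-a", func() {
		// A second scheduled task for the same instance arrives mid-poll.
		fmt.Println("duplicate ran:", runPoll(&locks, "pve-a", func() {}))
		fmt.Println("other instance ran:", runPoll(&locks, "pve-b", func() {}))
	})
}
```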
A broken or hung qemu-agent on one VM could stall the entire polling
loop, preventing higher-VMID VMs from being detected. Wrap all guest
agent work in a 10s per-VM budget with panic recovery, and add a 2s
timeout to GetVMStatus in the efficient poller to match the legacy path.
When the balloon driver reports Free but not Buffers or Cached, the
meminfo-derived fallback computed memAvailable = Free alone, counting
all reclaimable page cache as used memory. This caused Linux VMs to
show wildly inflated usage (e.g. 93% when actual is 21%).
Now meminfo-derived requires at least one cache metric (Buffers > 0
or Cached > 0) before trusting the value. When missing, the code
falls through to RRD/guest-agent/Total-Used fallbacks which provide
accurate cache-aware data. Both efficient and traditional polling
paths are now consistent.
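A minimal sketch of the new gate. It assumes the meminfo-derived figure is Free + Buffers + Cached, which the text implies but does not state. `balloonMemInfo` and `meminfoDerivedAvailable` are hypothetical names.

```go
package main

import "fmt"

// balloonMemInfo is a hypothetical subset of balloon-driver meminfo (bytes).
type balloonMemInfo struct {
	Total, Free, Buffers, Cached uint64
}

// meminfoDerivedAvailable returns Free+Buffers+Cached, but only when at
// least one cache metric is present. Otherwise it reports false so the
// caller falls through to RRD / guest-agent / Total-Used.
func meminfoDerivedAvailable(m balloonMemInfo) (uint64, bool) {
	if m.Buffers == 0 && m.Cached == 0 {
		return 0, false // Free alone would count all page cache as used
	}
	avail := m.Free + m.Buffers + m.Cached
	if avail > m.Total {
		avail = m.Total
	}
	return avail, true
}

func main() {
	const total = 100
	if _, ok := meminfoDerivedAvailable(balloonMemInfo{Total: total, Free: 7}); !ok {
		fmt.Println("no cache metrics: fall through to RRD / guest agent / Total-Used")
	}
	avail, _ := meminfoDerivedAvailable(balloonMemInfo{Total: total, Free: 7, Cached: 72})
	fmt.Printf("used=%d%%\n", (total-avail)*100/total)
}
```

With Free = 7 alone the VM would read 93% used. Adding Cached = 72 brings it to 21%, matching the example in the commit.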
Escalation was calling SendAlert() which always sends to all enabled
channels, ignoring the per-level channel selection (email/webhook/all).
Add SendAlertToChannels() that snapshots only the requested channel
configs and uses a distinct "_escalation" queue type so the dequeue
handler skips cooldown writes — preventing interference with the alert
manager's own re-notify cadence.
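The channel snapshot and the queue-type check can be sketched as follows. `channelConfig`, `queuedNotification`, the `"<type>_escalation"` naming and `shouldWriteCooldown` are hypothetical illustrations, not the actual `SendAlertToChannels()` implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// channelConfig is a hypothetical notification channel configuration.
type channelConfig struct {
	Kind    string // "email" or "webhook"
	Enabled bool
}

// queuedNotification is a hypothetical queue entry.
type queuedNotification struct {
	Type     string
	Channels []channelConfig
}

// snapshotChannels keeps only enabled configs that match the escalation
// level's selection: "email", "webhook", or "all".
func snapshotChannels(all []channelConfig, selection string) []channelConfig {
	var out []channelConfig
	for _, c := range all {
		if c.Enabled && (selection == "all" || c.Kind == selection) {
			out = append(out, c)
		}
	}
	return out
}

func enqueueEscalation(alertType string, all []channelConfig, selection string) queuedNotification {
	return queuedNotification{Type: alertType + "_escalation", Channels: snapshotChannels(all, selection)}
}

// shouldWriteCooldown is what a dequeue handler could check so that
// escalations do not reset the alert manager's re-notify cooldown.
func shouldWriteCooldown(n queuedNotification) bool {
	return !strings.HasSuffix(n.Type, "_escalation")
}

func main() {
	channels := []channelConfig{{"email", true}, {"webhook", true}}
	n := enqueueEscalation("cpu", channels, "webhook")
	fmt.Println(n.Type, len(n.Channels), shouldWriteCooldown(n))
}
```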
When Proxmox /nodes/{node}/status returns only total/used/free without
available/buffers/cached, EffectiveAvailable() returns Free (non-zero),
causing the RRD fallback gate to be skipped. This results in inflated
node memory where cache/buffers are counted as "used."
Widen the RRD fallback condition from requiring effectiveAvailable == 0
to triggering whenever missingCacheMetrics is true. Add negative caching
for failed RRD lookups (2-minute backoff) to avoid repeated retries.
Proxmox status.Mem includes page cache as "used" memory, inflating
reported VM usage. The existing fallbacks (balloon meminfo, RRD, linked
host agent) were frequently unavailable, causing most VMs to fall
through to the inflated status-mem source.
Adds a new last-resort fallback that reads /proc/meminfo via the QEMU
guest agent file-read endpoint to get accurate MemAvailable. Results
are cached (60s positive, 5min negative backoff for unsupported VMs).
Also fixes: RRD memavailable fallback missing from traditional polling
path, cache key collisions in multi-PVE setups, FreeMem underflow
guard inconsistency, and integer overflow in kB-to-bytes conversion.
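Two of the fixes above can be sketched together: parsing MemAvailable from /proc/meminfo text with an overflow-safe kB-to-bytes conversion, and a cache key scoped by PVE instance. `parseMemAvailable` and `cacheKey` are hypothetical names, and the `instance|node|vmid` key layout is an assumption.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseMemAvailable extracts MemAvailable (reported in kB) and converts it
// to bytes, refusing values whose *1024 would overflow uint64.
func parseMemAvailable(meminfo string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 || fields[0] != "MemAvailable:" {
			continue
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil || kb > math.MaxUint64/1024 {
			return 0, false
		}
		return kb * 1024, true
	}
	return 0, false
}

// cacheKey includes the PVE instance so two clusters that reuse the same
// node name and VMID do not share a cache entry.
func cacheKey(instance, node string, vmid int) string {
	return fmt.Sprintf("%s|%s|%d", instance, node, vmid)
}

func main() {
	sample := "MemTotal:       8038880 kB\nMemFree:         512000 kB\nMemAvailable:   5120000 kB\n"
	if v, ok := parseMemAvailable(sample); ok {
		fmt.Println("MemAvailable bytes:", v, "key:", cacheKey("cluster-a", "pve1", 101))
	}
}
```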
The --disk-exclude agent flag only filtered local metric collection but
had no effect on server-side Proxmox disk health and SSD wearout alerts,
which poll the Proxmox API directly. Users excluding disks (e.g.
--disk-exclude sda) still received alerts for those disks.
Agent now sends its DiskExclude patterns in each report. The server
stores them on the Host model and consults them during Proxmox disk
polling — excluded disks get a synthetic healthy status passed to
CheckDiskHealth so any existing alerts clear immediately.
Also adds FreeBSD pseudo-filesystem types (fdescfs, devfs, linprocfs,
linsysfs) to the virtual FS filter and /var/run/ to special mount
prefixes, fixing false disk-full alerts on FreeBSD for fdescfs mounts.
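A sketch of both filters. It assumes the `--disk-exclude` patterns are shell globs matched against either the full device path or its base name. That matching rule is a guess, not the agent's documented behaviour. `diskExcluded` and `skipMount` are hypothetical names, and the type set lists only the FreeBSD additions from this commit.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// diskExcluded reports whether a device such as "/dev/sda" matches any
// exclude pattern, trying both the full path and the base name.
func diskExcluded(device string, patterns []string) bool {
	base := filepath.Base(device)
	for _, p := range patterns {
		if ok, _ := filepath.Match(p, device); ok {
			return true
		}
		if ok, _ := filepath.Match(p, base); ok {
			return true
		}
	}
	return false
}

// FreeBSD pseudo-filesystems added to the virtual FS filter.
var virtualFSTypes = map[string]bool{"fdescfs": true, "devfs": true, "linprocfs": true, "linsysfs": true}

var specialMountPrefixes = []string{"/var/run/"}

func skipMount(fstype, mountpoint string) bool {
	if virtualFSTypes[fstype] {
		return true
	}
	for _, p := range specialMountPrefixes {
		if strings.HasPrefix(mountpoint, p) {
			return true
		}
	}
	return false
}

func main() {
	exclude := []string{"sda", "nvme*"}
	for _, dev := range []string{"/dev/sda", "/dev/sdb", "/dev/nvme0n1"} {
		fmt.Println(dev, "excluded:", diskExcluded(dev, exclude))
	}
	fmt.Println("skip fdescfs:", skipMount("fdescfs", "/dev/fd"))
}
```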
When no guest agent MemInfo or RRD data is available, prefer the linked
Pulse host agent's memory (read from /proc/meminfo via gopsutil, which
excludes page cache) over Proxmox's status.Mem (total - free, inflated
by reclaimable cache). Applied to both efficient and traditional polling
paths. Diagnostic fields added to VMMemoryRaw for visibility.
Windows VMs and VMs without qemu-guest-agent triggered an uncached
GetVMRRDData HTTP call on every poll cycle. Add vmRRDMemCache using the
same read-through cache pattern as nodeRRDMemCache (shared rrdCacheMu,
same TTL, same cleanup path).
(cherry picked from commit 582f16004a0f275de4c458e5d288be70eee613e4)
Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox
RRD's memavailable metric (which excludes reclaimable page cache) when
the qemu-guest-agent doesn't provide MemInfo.Available. Previously the
fallback was detailedStatus.Mem (total - MemFree), inflating usage to
80%+ on VMs with normal Linux page cache. Mirrors the existing LXC
rrd-memavailable path.
FreeBSD ZFS ARC (#1264, #1051): The host agent now reads
kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts
the ARC size from reported memory usage. ZFS ARC is reclaimable under
memory pressure (like Linux SReclaimable) but gopsutil counts it as
wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS
and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms.
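The adjustment itself is a clamped subtraction. The sketch below shows only the platform-neutral part. On FreeBSD, the raw bytes would come from `unix.SysctlRaw("kstat.zfs.misc.arcstats.size")` (package `golang.org/x/sys/unix`) in a file guarded by `//go:build freebsd`. Decoding them as an 8-byte little-endian integer assumes a little-endian host such as amd64 or arm64. `decodeARCSize` and `arcAdjustedUsed` are hypothetical names.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeARCSize decodes the raw sysctl value, assumed to be an 8-byte
// little-endian uint64.
func decodeARCSize(raw []byte) (uint64, bool) {
	if len(raw) != 8 {
		return 0, false
	}
	return binary.LittleEndian.Uint64(raw), true
}

// arcAdjustedUsed removes the reclaimable ARC from used memory, clamping
// at zero so a larger ARC figure cannot underflow.
func arcAdjustedUsed(used, arcSize uint64) uint64 {
	if arcSize >= used {
		return 0
	}
	return used - arcSize
}

func main() {
	const gib = 1 << 30
	used := uint64(30 * gib)                                 // as reported with ARC counted as wired
	arc, ok := decodeARCSize([]byte{0, 0, 0, 0, 5, 0, 0, 0}) // 5 << 32 bytes = 20 GiB
	if ok {
		fmt.Printf("adjusted used: %d GiB\n", arcAdjustedUsed(used, arc)/gib)
	}
}
```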
Fixes #1270
Fixes #1264
Fixes #1051
(cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)
Recovery (all-clear) notifications were being silently suppressed during
quiet hours for any non-critical alert. Since powered-off alerts default
to Warning level, users who received an alert at 2pm would never get the
recovery notification if the VM came back during quiet hours.
Quiet hours are intended to suppress noisy firing alerts, not to hide
the fact that an issue has resolved. If you got the alert, you should
always get the all-clear.
Remove the ShouldSuppressResolvedNotification gate from handleAlertResolved.
The notifyOnResolve toggle (explicit user preference) is still respected.
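After the change, the resolved path depends only on `notifyOnResolve`. The firing rule in this sketch is a simplified assumption (quiet hours suppress everything except critical alerts) that stands in for Pulse's configurable quiet-hours settings. `alertEvent` and `shouldNotify` are hypothetical names.

```go
package main

import "fmt"

// alertEvent is a hypothetical, simplified notification event.
type alertEvent struct {
	Level    string // "warning" or "critical"
	Resolved bool
}

// shouldNotify: resolved events ignore quiet hours and honour only the
// notifyOnResolve toggle; firing events use a simplified quiet-hours rule.
func shouldNotify(ev alertEvent, notifyOnResolve, inQuietHours bool) bool {
	if ev.Resolved {
		return notifyOnResolve
	}
	return !inQuietHours || ev.Level == "critical"
}

func main() {
	// A warning fired at 2pm; the VM recovers at 3am during quiet hours.
	fmt.Println("send all-clear:", shouldNotify(alertEvent{Level: "warning", Resolved: true}, true, true))
}
```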
Fixes #1259
The dedup logic only handled btrfs/zfs subvolumes, but Kubernetes
bind-mounts the same device at both pod and plugin paths, causing
xfs/ext4 volumes to be double-counted. Now deduplicates by
device+totalBytes for all filesystem types.
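Deduplicating on a device+totalBytes key can be sketched as follows. The `mount` type, `dedupMounts` and the Kubernetes-style paths in `main` are hypothetical illustrations.

```go
package main

import "fmt"

// mount is a hypothetical filesystem usage record.
type mount struct {
	Device     string
	Mountpoint string
	FSType     string
	TotalBytes uint64
	UsedBytes  uint64
}

// dedupMounts keeps the first mount seen for each device+totalBytes pair,
// whatever the filesystem type, so bind mounts are counted once.
func dedupMounts(mounts []mount) []mount {
	type key struct {
		device string
		total  uint64
	}
	seen := make(map[key]bool)
	var out []mount
	for _, m := range mounts {
		k := key{m.Device, m.TotalBytes}
		if seen[k] {
			continue
		}
		seen[k] = true
		out = append(out, m)
	}
	return out
}

func main() {
	ms := []mount{
		{"/dev/sdb", "/var/lib/kubelet/pods/p1/volumes/v1", "xfs", 100, 10},
		{"/dev/sdb", "/var/lib/kubelet/plugins/csi/v1/globalmount", "xfs", 100, 10},
		{"/dev/sdc", "/data", "ext4", 200, 5},
	}
	var used uint64
	for _, m := range dedupMounts(ms) {
		used += m.UsedBytes
	}
	fmt.Println("used after dedup:", used)
}
```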
Fixes #1158
The Docker agent was not passing the disk exclusion list to
hostmetricsCollect(), so excluded mounts appeared in the Docker tab
disk totals. Also add server-side fsfilters filtering to Docker
report processing for parity with the host agent path.
- fix(models): filter nodes by instance in UpdateNodesForInstance to prevent
PVE node duplication across poll cycles (#1214, #1192, #1217)
- fix(alerts): sort GetActiveAlerts output for stable ordering, preventing
hostname scrambling in frontend (#1218)
- fix(notifications): add ntfy-specific resolved webhook formatting with
plain-text body and proper headers (#1213)
- fix(frontend): respect "hide Docker update actions" setting in
DockerFilter Update All button (#1219)
- fix(frontend): add missing v prefix to GitHub release tag URLs (#1195)
- fix(monitoring): reduce disk detection warning from Warn to Debug to
eliminate log spam for pass-through disks (#1216)
- chore: bump VERSION to 5.1.5