registerWithPulse() was a one-shot call at agent startup — if it failed
(timing, transient network, Pulse not ready), the agent silently continued
as a generic Host forever. Wrap the HTTP POST in a retry loop with
exponential backoff (5s, 10s, 20s, 40s, 60s) and distinguish 4xx errors
(no retry) from 5xx/network errors (retry).
Probe ~/.docker/run/docker.sock for RuntimeDocker and RuntimeAuto
before falling back to /var/run/docker.sock. This lets the agent
connect on macOS without requiring DOCKER_HOST to be set manually.
Ref #1200
saveToDisk used os.WriteFile which doesn't sync to disk before the
atomic rename. On CI runners with aggressive filesystem caching this
can leave the destination file with zero bytes, causing
TestKnowledgeStore_SaveLoad to fail with "unexpected end of JSON input".
syscall.SysctlRaw is Darwin-only in Go's standard library; FreeBSD
requires the equivalent from golang.org/x/sys/unix. This fixes the
Docker cross-compilation build failure for the freebsd/amd64 target.
(cherry picked from commit 5fe16c75a075b817f90b7192d8270a7bd6677017)
Windows VMs and VMs without qemu-guest-agent triggered an uncached
GetVMRRDData HTTP call on every poll cycle. Add vmRRDMemCache using the
same read-through cache pattern as nodeRRDMemCache (shared rrdCacheMu,
same TTL, same cleanup path).
(cherry picked from commit 582f16004a0f275de4c458e5d288be70eee613e4)
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.
#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.
#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.
(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox
RRD's memavailable metric (which excludes reclaimable page cache) when
the qemu-guest-agent doesn't provide MemInfo.Available. Previously the
fallback was detailedStatus.Mem (total - MemFree), inflating usage to
80%+ on VMs with normal Linux page cache. Mirrors the existing LXC
rrd-memavailable path.
FreeBSD ZFS ARC (#1264, #1051): The host agent now reads
kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts
the ARC size from reported memory usage. ZFS ARC is reclaimable under
memory pressure (like Linux SReclaimable) but gopsutil counts it as
wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS
and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms.
Fixes#1270Fixes#1264Fixes#1051
(cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)
Docker container URL preserved on update (#1054): container updates
recreate the container with a new runtime ID. The agent now includes
{oldContainerId, newContainerId} in the completion ACK payload; the
server uses this to copy persisted metadata (custom URLs, descriptions,
tags) to the new ID so nothing is lost. Migration is a copy, not a move,
so rollback scenarios still find metadata under the original ID.
Reduce metrics.db write amplification (#1124): add a UNIQUE index on
(resource_type, resource_id, metric_type, timestamp, tier) so rollup
reprocessing after a failed checkpoint uses INSERT OR IGNORE instead of
creating duplicate rows. Existing duplicates are deduplicated once on
startup if the index creation would otherwise fail. Also sets
wal_autocheckpoint(500) to checkpoint the WAL more frequently, preventing
unbounded WAL growth.
Fixes#1054Fixes#1124
FreeBSD auto-update (#1254): determineArch() now includes freebsd in its
OS switch, producing freebsd-amd64/arm64 instead of falling through to
a uname -m fallback that incorrectly returned linux-<arch>. FreeBSD agents
were downloading Linux ELF binaries and failing to exec them.
Docker rootless socket (#1200): buildRuntimeCandidates() now probes
/run/user/<uid>/docker.sock before the system-wide /var/run/docker.sock,
enabling auto-detection of Docker rootless installations.
Duplicate PVE/PBS hosts (#1245, #1252): handleSecureAutoRegister() now
deduplicates by host URL, updating the existing instance's token in-place
instead of appending a duplicate entry on each re-run of the setup script.
Fixes#1254Fixes#1200Fixes#1245Fixes#1252
(cherry picked from commit 0f1d9e9b9fea6c8b9e65872e8a78e25f93653eef)
The enable/disable toggle PUT sends back the flat list-response shape
(no nested oidc/saml objects). handleUpdateSSOProvider was unmarshaling
this directly, leaving OIDC and SAML as nil and overwriting all stored
credentials on every toggle.
Now preserves existing sub-config objects when the incoming payload omits
them, matching the existing ClientSecret preservation behaviour.
Fixes part of #1255
(cherry picked from commit 44868e99d66aa157f5c62d100151a6f8bc940205)
r.ssoConfig was never loaded from persistence in NewRouter(), so on every
restart all SSO providers were silently discarded (handleListSSOProviders
would reinitialize to an empty config on the first request).
Also adds ssoProviders to /api/security/status so the login page can
render SAML/OIDC login buttons for enabled providers.
Fixes part of #1255
(cherry picked from commit 395cd101ff4acb1b7f89ec3d907b84cbec217dc8)
TriggerPatrolForAlert was enqueuing into adHocTrigger regardless of
whether Patrol was enabled. With patrolLoop not running (disabled),
nothing drained the channel — it filled on the 10th alert and spammed
"Patrol trigger queue full, dropping trigger" on every subsequent alert.
Read p.config.Enabled in the same RLock as triggerManager and return
early when disabled.
Fixes#1258
(cherry picked from commit 69f399469538f0c9cd59084f6429fed8a793c042)
Recovery (all-clear) notifications were being silently suppressed during
quiet hours for any non-critical alert. Since powered-off alerts default
to Warning level, users who received an alert at 2pm would never get the
recovery notification if the VM came back during quiet hours.
Quiet hours are intended to suppress noisy firing alerts, not to hide
the fact that an issue has resolved. If you got the alert, you should
always get the all-clear.
Remove the ShouldSuppressResolvedNotification gate from handleAlertResolved.
The notifyOnResolve toggle (explicit user preference) is still respected.
Fixes#1259
When syscall.Exec() fails after the binary has already been atomically
replaced on disk, the old process would log an error and keep running
indefinitely with stale code. The next update check (1 hour later) sees
the on-disk version matches the server and skips the update — so the
restart is never retried.
Now the agent exits with code 1 when this happens, allowing systemd (or
any service manager) to restart it with the new binary. This fixes the
"temperature broken after each upgrade" reports where users had to
manually reinstall the agent after every Pulse server upgrade.
Fixes#1247
The SAML route registration (bee3d05f) was incomplete: the auth
middleware uses exact-match for public paths, so /api/saml/{id}/login
etc. would be blocked. Add prefix-based auth bypass for /api/saml/
paths and update route inventory tests for both SSO and SAML routes.
The SAML handler functions existed but were never registered in
setupRoutes(), causing 404s for all SAML authentication flows.
Adds /api/saml/ prefix route with dispatcher for all 5 endpoints.
The SSO handler functions and frontend were implemented but the HTTP
routes were never registered in setupRoutes(), causing 404 on all
/api/security/sso/providers endpoints.
Fixes#1248
When two standalone (non-clustered) PVE hosts share the same storage (NFS,
etc.), both instances see the same backup files during polling. Each instance
creates its own StorageBackup entry, causing guests with the same VMID on
different hosts to incorrectly show each other's backups.
Detect shared-storage duplicates by checking if the same volid appears across
multiple instances. When it does AND the VMID is ambiguous (exists on multiple
instances), skip the backup in SyncGuestBackupTimes rather than guessing which
instance owns it. This uses the same ambiguity pattern already applied to PBS
backups.
Fixes#1177
The shared storage deduplication key was just the storage name, causing
storages with the same name from different Proxmox instances (or PVE + PBS)
to be incorrectly merged into a single entry. This made one random host
appear to have all storages from all instances.
Include the instance name in the aggregation key so shared storage is only
merged within the same Proxmox cluster/instance.
Fixes#1246
When the top-level temperature.current field is 0 or missing (common
on some SATA drives), temperature was reported as 0°C with no fallback.
Now extracts temperature from ATA SMART attribute 194 (Temperature_Celsius)
or 190 (Airflow_Temperature_Cel) as a fallback.
Fixes#1243
The dedup logic only handled btrfs/zfs subvolumes, but Kubernetes
bind-mounts the same device at both pod and plugin paths, causing
xfs/ext4 volumes to be double-counted. Now deduplicates by
device+totalBytes for all filesystem types.
Fixes#1158
Patrol only pinged the first IP address of each VM/container, causing
false "unreachable" reports for guests with multiple IPs (common with
Windows VMs that have IPv6 or multi-adapter setups). Now probes all
IPs and marks reachable if any responds.
Fixes#1215
When users enable AI discovery without setting an interval, the
default of 0 silently stays in manual-only mode. Now normalizes
0 to 24h on save so discovery actually starts automatically.
Fixes#1225
Docker's one-shot stats API (stream=false) returns PreCPUStats from the
daemon's internal cache, which many Docker versions don't update between
non-streaming reads. This causes every call to return the same stale
PreCPUStats from container start, producing a constant lifetime-average
CPU% (e.g. 3.4%) instead of current usage.
Switch to always using manual delta tracking, which stores the previous
sample from our own reads and computes accurate deltas between collection
cycles. The first cycle returns 0 while establishing a baseline; all
subsequent cycles produce correct current CPU percentages.
Seagate drives pack vendor-specific data in the upper bytes of the
48-bit SMART raw value, causing Power_On_Hours to report billions of
years instead of the actual value. Use smartctl's raw.string field
(e.g. "16951 (223 173 0)") and extract the first integer, which is
the correct interpretation. Falls back to raw.value when the string
is empty or non-numeric.
The Docker agent was not passing the disk exclusion list to
hostmetricsCollect(), so excluded mounts appeared in the Docker tab
disk totals. Also add server-side fsfilters filtering to Docker
report processing for parity with the host agent path.
Relax the Linux-only gate on SMART collection to also run on FreeBSD.
Add FreeBSD disk discovery via sysctl kern.disks (lsblk is Linux-only).
The smartctl invocation and JSON parsing are already platform-agnostic.
Mock seeding wrote Docker container metrics as "docker" but the real
monitor uses "dockerContainer". This made mock-mode charts miss the
SQLite store path after the API normalization fix in 7336ec2d.
Frontend sends resourceType="docker" but the SQLite store uses
"dockerContainer". The /api/metrics-store/history handler now
normalizes the alias so queries return the correct historical data
instead of falling back to a single live data point.
Backend: nodes with the same logical identity (cluster+name) are merged
using a health-weighted preference, preserving host-agent links across
node-ID churn.
Frontend: extract buildMentionResources() with alias-based dedup so
docker hosts and standalone host agents sharing an ID/hostname appear
once in the @ mention autocomplete.
- fix(models): filter nodes by instance in UpdateNodesForInstance to prevent
PVE node duplication across poll cycles (#1214, #1192, #1217)
- fix(alerts): sort GetActiveAlerts output for stable ordering, preventing
hostname scrambling in frontend (#1218)
- fix(notifications): add ntfy-specific resolved webhook formatting with
plain-text body and proper headers (#1213)
- fix(frontend): respect "hide Docker update actions" setting in
DockerFilter Update All button (#1219)
- fix(frontend): add missing v prefix to GitHub release tag URLs (#1195)
- fix(monitoring): reduce disk detection warning from Warn to Debug to
eliminate log spam for pass-through disks (#1216)
- chore: bump VERSION to 5.1.5
The security hardening in beae4c86 added a settings:write scope
requirement to /api/auto-register, but agent install tokens only have
host-agent:report scope. This broke Proxmox auto-registration for all
agent-generated tokens. Accept either settings:write or host-agent:report
scope for auto-registration.
Fixes#1191
Agent tokens created from the Settings UI and the backend install
command handler were missing the agent:exec scope, which was added
as a security requirement in 60f9e6f0. This caused all newly
installed agents to fail registration with "Agent exec token missing
required scope: agent:exec".
Fixes#1191
The in-memory metrics buffer was changed from 1000 to 86400 points per
metric to support 30-day sparklines, but this pre-allocated ~18 MB per
guest (7 slices × 86400 × 32 bytes). With 50 guests that's 920 MB —
explaining why users needed to double their LXC memory after upgrading
to 5.1.0.
- Revert in-memory buffer to 1000 points / 24h retention
- Remove eager slice pre-allocation (use append growth instead)
- Add LTTB (Largest Triangle Three Buckets) downsampling algorithm
- Chart endpoints now use a two-tier strategy: in-memory for ranges
≤ 2h, SQLite persistent store + LTTB for longer ranges
- Reduce frontend ring buffer from 86400 to 2000 points
Related to #1190