Pulse

vrr/Pulse

mirror of https://github.com/rcourtman/Pulse.git synced 2026-05-11 13:05:31 +00:00

Author	SHA1	Message	Date
rcourtman	00afaec2ae	fix(agent): add retry with backoff to Proxmox auto-registration (#1267, #1269, #1261, #1268) registerWithPulse() was a one-shot call at agent startup; if it failed (timing, transient network, Pulse not ready), the agent silently continued as a generic Host forever. Wrap the HTTP POST in a retry loop with exponential backoff (5s, 10s, 20s, 40s, 60s) and distinguish 4xx errors (no retry) from 5xx/network errors (retry).	2026-02-18 16:05:40 +00:00
Surendra Raika	f663aade53	feat(docker): add macOS Docker Desktop socket auto-detection Probe ~/.docker/run/docker.sock for RuntimeDocker and RuntimeAuto before falling back to /var/run/docker.sock. This lets the agent connect on macOS without requiring DOCKER_HOST to be set manually. Ref #1200	2026-02-18 19:23:14 +05:30
rcourtman	5666d6a9e8	fix(ai): fsync knowledge store temp file before rename to prevent empty reads saveToDisk used os.WriteFile which doesn't sync to disk before the atomic rename. On CI runners with aggressive filesystem caching this can leave the destination file with zero bytes, causing TestKnowledgeStore_SaveLoad to fail with "unexpected end of JSON input".	2026-02-18 13:27:47 +00:00
rcourtman	6c720b7aea	fix(freebsd): use golang.org/x/sys/unix.SysctlRaw instead of syscall.SysctlRaw syscall.SysctlRaw is Darwin-only in Go's standard library; FreeBSD requires the equivalent from golang.org/x/sys/unix. This fixes the Docker cross-compilation build failure for the freebsd/amd64 target. (cherry picked from commit 5fe16c75a075b817f90b7192d8270a7bd6677017)	2026-02-18 13:00:02 +00:00
rcourtman	71b8b81af5	fix(monitoring): cache per-VM RRD memory lookups to avoid serial HTTP calls Windows VMs and VMs without qemu-guest-agent triggered an uncached GetVMRRDData HTTP call on every poll cycle. Add vmRRDMemCache using the same read-through cache pattern as nodeRRDMemCache (shared rrdCacheMu, same TTL, same cleanup path). (cherry picked from commit 582f16004a0f275de4c458e5d288be70eee613e4)	2026-02-18 12:57:15 +00:00
rcourtman	7efcec3120	fix(agents,ai): host URL field, AI Docker routing, Proxmox registration logging (#1197, #1210, #1267) #1197: Add Custom URL input to the expanded host row in Settings → Agents. Loads existing URL via HostMetadataAPI on row expand; saves on button click. Only shown for host-type agent rows. #1210: Fix agent_connected always false for Docker hosts on Proxmox VMs. connectedAgentHostnames now also marks Docker host hostnames reachable when their matching VM/LXC has a node with a connected Proxmox agent, mirroring the routing logic already used in the control path. #1267/#1269: Improve Proxmox auto-registration failure logging. Response body is now included in the error message, and the warning directs users to delete the state file to force re-registration rather than claiming the node exists. (cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)	2026-02-18 12:57:09 +00:00
rcourtman	efa916ee2a	fix(memory): correct memory reporting for Linux VMs and FreeBSD ZFS ARC Linux VM page cache (#1270): QEMU VM memory now falls back to Proxmox RRD's memavailable metric (which excludes reclaimable page cache) when the qemu-guest-agent doesn't provide MemInfo.Available. Previously the fallback was detailedStatus.Mem (total - MemFree), inflating usage to 80%+ on VMs with normal Linux page cache. Mirrors the existing LXC rrd-memavailable path. FreeBSD ZFS ARC (#1264, #1051): The host agent now reads kstat.zfs.misc.arcstats.size via SysctlRaw on FreeBSD and subtracts the ARC size from reported memory usage. ZFS ARC is reclaimable under memory pressure (like Linux SReclaimable) but gopsutil counts it as wired/non-reclaimable, causing false 90%+ memory alerts on TrueNAS and FreeBSD hosts. Build-tagged so it compiles cleanly on all platforms. Fixes #1270 Fixes #1264 Fixes #1051 (cherry picked from commit 94502f83ff9ffc6da28aaadc946a2f7d8b4e9bac)	2026-02-18 12:56:53 +00:00
rcourtman	9d8f8b45b5	fix(docker,metrics): preserve container metadata on update and reduce DB writes Docker container URL preserved on update (#1054): container updates recreate the container with a new runtime ID. The agent now includes {oldContainerId, newContainerId} in the completion ACK payload; the server uses this to copy persisted metadata (custom URLs, descriptions, tags) to the new ID so nothing is lost. Migration is a copy, not a move, so rollback scenarios still find metadata under the original ID. Reduce metrics.db write amplification (#1124): add a UNIQUE index on (resource_type, resource_id, metric_type, timestamp, tier) so rollup reprocessing after a failed checkpoint uses INSERT OR IGNORE instead of creating duplicate rows. Existing duplicates are deduplicated once on startup if the index creation would otherwise fail. Also sets wal_autocheckpoint(500) to checkpoint the WAL more frequently, preventing unbounded WAL growth. Fixes #1054 Fixes #1124	2026-02-18 12:56:46 +00:00
rcourtman	7522f6599c	fix(agent): three backend fixes for FreeBSD, Docker rootless, and duplicate PVE hosts FreeBSD auto-update (#1254): determineArch() now includes freebsd in its OS switch, producing freebsd-amd64/arm64 instead of falling through to a uname -m fallback that incorrectly returned linux-<arch>. FreeBSD agents were downloading Linux ELF binaries and failing to exec them. Docker rootless socket (#1200): buildRuntimeCandidates() now probes /run/user/<uid>/docker.sock before the system-wide /var/run/docker.sock, enabling auto-detection of Docker rootless installations. Duplicate PVE/PBS hosts (#1245, #1252): handleSecureAutoRegister() now deduplicates by host URL, updating the existing instance's token in-place instead of appending a duplicate entry on each re-run of the setup script. Fixes #1254 Fixes #1200 Fixes #1245 Fixes #1252 (cherry picked from commit 0f1d9e9b9fea6c8b9e65872e8a78e25f93653eef)	2026-02-18 12:53:25 +00:00
rcourtman	97aee77ae7	fix(sso): preserve oidc/saml sub-config when toggle sends flat update payload The enable/disable toggle PUT sends back the flat list-response shape (no nested oidc/saml objects). handleUpdateSSOProvider was unmarshaling this directly, leaving OIDC and SAML as nil and overwriting all stored credentials on every toggle. Now preserves existing sub-config objects when the incoming payload omits them, matching the existing ClientSecret preservation behaviour. Fixes part of #1255 (cherry picked from commit 44868e99d66aa157f5c62d100151a6f8bc940205)	2026-02-18 12:53:18 +00:00
rcourtman	a210b01a03	fix(sso): load SSO config at startup and expose providers on login page r.ssoConfig was never loaded from persistence in NewRouter(), so on every restart all SSO providers were silently discarded (handleListSSOProviders would reinitialize to an empty config on the first request). Also adds ssoProviders to /api/security/status so the login page can render SAML/OIDC login buttons for enabled providers. Fixes part of #1255 (cherry picked from commit 395cd101ff4acb1b7f89ec3d907b84cbec217dc8)	2026-02-18 12:53:15 +00:00
rcourtman	43af70ca1f	fix(patrol): skip alert triggers when Patrol is disabled TriggerPatrolForAlert was enqueuing into adHocTrigger regardless of whether Patrol was enabled. With patrolLoop not running (disabled), nothing drained the channel — it filled on the 10th alert and spammed "Patrol trigger queue full, dropping trigger" on every subsequent alert. Read p.config.Enabled in the same RLock as triggerManager and return early when disabled. Fixes #1258 (cherry picked from commit 69f399469538f0c9cd59084f6429fed8a793c042)	2026-02-18 12:53:12 +00:00
rcourtman	df23d80919	fix(alerts): always send recovery notifications regardless of quiet hours Recovery (all-clear) notifications were being silently suppressed during quiet hours for any non-critical alert. Since powered-off alerts default to Warning level, users who received an alert at 2pm would never get the recovery notification if the VM came back during quiet hours. Quiet hours are intended to suppress noisy firing alerts, not to hide the fact that an issue has resolved. If you got the alert, you should always get the all-clear. Remove the ShouldSuppressResolvedNotification gate from handleAlertResolved. The notifyOnResolve toggle (explicit user preference) is still respected. Fixes #1259	2026-02-18 12:53:09 +00:00
rcourtman	6f156cd211	fix: exit agent when exec fails after binary replacement during auto-update When syscall.Exec() fails after the binary has already been atomically replaced on disk, the old process would log an error and keep running indefinitely with stale code. The next update check (1 hour later) sees the on-disk version matches the server and skips the update — so the restart is never retried. Now the agent exits with code 1 when this happens, allowing systemd (or any service manager) to restart it with the new binary. This fixes the "temperature broken after each upgrade" reports where users had to manually reinstall the agent after every Pulse server upgrade. Fixes #1247	2026-02-11 14:26:14 +00:00
rcourtman	2fb6ebc25f	fix: add SAML auth bypass and update route inventory tests The SAML route registration (`bee3d05f`) was incomplete: the auth middleware uses exact-match for public paths, so /api/saml/{id}/login etc. would be blocked. Add prefix-based auth bypass for /api/saml/ paths and update route inventory tests for both SSO and SAML routes.	2026-02-11 13:48:16 +00:00
rcourtman	bee3d05f0d	fix: register SAML login flow routes (login, ACS, metadata, logout, SLO) The SAML handler functions existed but were never registered in setupRoutes(), causing 404s for all SAML authentication flows. Adds /api/saml/ prefix route with dispatcher for all 5 endpoints.	2026-02-11 13:29:05 +00:00
rcourtman	89969079b9	fix: register SSO provider API routes The SSO handler functions and frontend were implemented but the HTTP routes were never registered in setupRoutes(), causing 404 on all /api/security/sso/providers endpoints. Fixes #1248	2026-02-11 13:17:51 +00:00
rcourtman	2735204638	fix: skip ambiguous shared-storage backups when VMID exists on multiple instances When two standalone (non-clustered) PVE hosts share the same storage (NFS, etc.), both instances see the same backup files during polling. Each instance creates its own StorageBackup entry, causing guests with the same VMID on different hosts to incorrectly show each other's backups. Detect shared-storage duplicates by checking if the same volid appears across multiple instances. When it does AND the VMID is ambiguous (exists on multiple instances), skip the backup in SyncGuestBackupTimes rather than guessing which instance owns it. This uses the same ambiguity pattern already applied to PBS backups. Fixes #1177	2026-02-11 11:07:28 +00:00
rcourtman	d4ff967815	fix: scope shared storage aggregation to per-instance to prevent cross-instance merging The shared storage deduplication key was just the storage name, causing storages with the same name from different Proxmox instances (or PVE + PBS) to be incorrectly merged into a single entry. This made one random host appear to have all storages from all instances. Include the instance name in the aggregation key so shared storage is only merged within the same Proxmox cluster/instance. Fixes #1246	2026-02-11 09:18:09 +00:00
rcourtman	2ba590d994	fix: fall back to SMART attributes 194/190 for disk temperature When the top-level temperature.current field is 0 or missing (common on some SATA drives), temperature was reported as 0°C with no fallback. Now extracts temperature from ATA SMART attribute 194 (Temperature_Celsius) or 190 (Airflow_Temperature_Cel) as a fallback. Fixes #1243	2026-02-11 09:09:55 +00:00
rcourtman	03939c3f9e	fix: deduplicate bind-mounted volumes in disk total calculation The dedup logic only handled btrfs/zfs subvolumes, but Kubernetes bind-mounts the same device at both pod and plugin paths, causing xfs/ext4 volumes to be double-counted. Now deduplicates by device+totalBytes for all filesystem types. Fixes #1158	2026-02-10 21:52:25 +00:00
rcourtman	42c01c1be5	fix: probe all guest IPs for reachability, not just first Patrol only pinged the first IP address of each VM/container, causing false "unreachable" reports for guests with multiple IPs (common with Windows VMs that have IPv6 or multi-adapter setups). Now probes all IPs and marks reachable if any responds. Fixes #1215	2026-02-10 21:46:11 +00:00
rcourtman	6140cb5be4	fix: auto-default discovery interval to 24h when enabled When users enable AI discovery without setting an interval, the default of 0 silently stays in manual-only mode. Now normalizes 0 to 24h on save so discovery actually starts automatically. Fixes #1225	2026-02-10 21:45:59 +00:00
rcourtman	ae4632b5b5	fix: correct UpdateAlertDelayHours doc comment (0 normalizes to 24, -1 disables)	2026-02-10 21:13:12 +00:00
rcourtman	a68e0050f8	fix(docker): use manual CPU delta tracking instead of stale PreCPUStats (#1229) Docker's one-shot stats API (stream=false) returns PreCPUStats from the daemon's internal cache, which many Docker versions don't update between non-streaming reads. This causes every call to return the same stale PreCPUStats from container start, producing a constant lifetime-average CPU% (e.g. 3.4%) instead of current usage. Switch to always using manual delta tracking, which stores the previous sample from our own reads and computes accurate deltas between collection cycles. The first cycle returns 0 while establishing a baseline; all subsequent cycles produce correct current CPU percentages.	2026-02-10 20:49:29 +00:00
rcourtman	47ceffe0c2	fix(smart): parse raw.string instead of raw.value for SATA attributes (#1239) Seagate drives pack vendor-specific data in the upper bytes of the 48-bit SMART raw value, causing Power_On_Hours to report billions of years instead of the actual value. Use smartctl's raw.string field (e.g. "16951 (223 173 0)") and extract the first integer, which is the correct interpretation. Falls back to raw.value when the string is empty or non-numeric.	2026-02-10 20:42:15 +00:00
rcourtman	26776b2075	fix(agent): apply --disk-exclude to Docker agent disk metrics (#1237) The Docker agent was not passing the disk exclusion list to hostmetricsCollect(), so excluded mounts appeared in the Docker tab disk totals. Also add server-side fsfilters filtering to Docker report processing for parity with the host agent path.	2026-02-10 16:59:35 +00:00
rcourtman	47adcbd8af	feat(agent): add FreeBSD S.M.A.R.T. disk collection support (#1236) Relax the Linux-only gate on SMART collection to also run on FreeBSD. Add FreeBSD disk discovery via sysctl kern.disks (lsblk is Linux-only). The smartctl invocation and JSON parsing are already platform-agnostic.	2026-02-10 12:44:15 +00:00
rcourtman	f7a14feb0f	fix(mock): align Docker container store type with real monitor Mock seeding wrote Docker container metrics as "docker" but the real monitor uses "dockerContainer". This made mock-mode charts miss the SQLite store path after the API normalization fix in `7336ec2d`.	2026-02-09 22:42:08 +00:00
rcourtman	7336ec2d87	fix(metrics): normalize docker resource type in metrics history API (#1229) Frontend sends resourceType="docker" but the SQLite store uses "dockerContainer". The /api/metrics-store/history handler now normalizes the alias so queries return the correct historical data instead of falling back to a single live data point.	2026-02-09 22:33:24 +00:00
rcourtman	c92ccc122e	fix(state): deduplicate PVE nodes and AI mention resources (#1217, #1214) Backend: nodes with the same logical identity (cluster+name) are merged using a health-weighted preference, preserving host-agent links across node-ID churn. Frontend: extract buildMentionResources() with alias-based dedup so docker hosts and standalone host agents sharing an ID/hostname appear once in the @ mention autocomplete.	2026-02-09 22:19:55 +00:00
rcourtman	815c990e85	fix(proxmox): avoid 403 on apt update checks	2026-02-09 20:28:09 +00:00
rcourtman	721be9bce6	fix(config): honor legacy env aliases for docker update-action toggle (#1219)	2026-02-09 14:00:24 +00:00
rcourtman	cedf0c8f0f	fix(temperature): parse string sensor values without zeroing readings (#1224)	2026-02-09 14:00:09 +00:00
rcourtman	0d6fffbb1c	fix(servicediscovery): run automatic refresh for changed/stale resources (#1225)	2026-02-09 14:00:02 +00:00
rcourtman	1f74c12ef8	fix(alerts): preserve docker update delay across host identity churn (#1226)	2026-02-09 13:59:52 +00:00
rcourtman	8a48acef1d	fix: hotfix 5.1.5 — node duplication, alert scrambling, ntfy resolved formatting - fix(models): filter nodes by instance in UpdateNodesForInstance to prevent PVE node duplication across poll cycles (#1214, #1192, #1217) - fix(alerts): sort GetActiveAlerts output for stable ordering, preventing hostname scrambling in frontend (#1218) - fix(notifications): add ntfy-specific resolved webhook formatting with plain-text body and proper headers (#1213) - fix(frontend): respect "hide Docker update actions" setting in DockerFilter Update All button (#1219) - fix(frontend): add missing v prefix to GitHub release tag URLs (#1195) - fix(monitoring): reduce disk detection warning from Warn to Debug to eliminate log spam for pass-through disks (#1216) - chore: bump VERSION to 5.1.5	2026-02-08 11:48:22 +00:00
rcourtman	d1e61d8a8a	fix: ship alerting hotfixes and prepare 5.1.4	2026-02-07 22:05:55 +00:00
rcourtman	f253ed2778	fix(license): harden release key validation and fingerprint logging	2026-02-07 14:18:44 +00:00
rcourtman	6909264a02	fix(alerts): reduce swarm alert noise and preserve notification state (#1096)	2026-02-07 14:18:39 +00:00
rcourtman	13af83f3fc	fix(monitoring): preserve recent PVE nodes on empty polls (#1094)	2026-02-07 14:18:33 +00:00
rcourtman	0f961054c6	fix: allow agent tokens to auto-register Proxmox nodes The security hardening in `beae4c86` added a settings:write scope requirement to /api/auto-register, but agent install tokens only have host-agent:report scope. This broke Proxmox auto-registration for all agent-generated tokens. Accept either settings:write or host-agent:report scope for auto-registration. Fixes #1191	2026-02-04 22:55:25 +00:00
rcourtman	f6338f34fa	fix: add agent:exec scope to generated agent tokens Agent tokens created from the Settings UI and the backend install command handler were missing the agent:exec scope, which was added as a security requirement in `60f9e6f0`. This caused all newly installed agents to fail registration with "Agent exec token missing required scope: agent:exec". Fixes #1191	2026-02-04 22:33:01 +00:00
rcourtman	5bbc4329bd	Remove pprof diagnostics endpoint	2026-02-04 20:44:00 +00:00
rcourtman	a37b59b7e4	Add admin-gated pprof diagnostics endpoint	2026-02-04 20:39:24 +00:00
rcourtman	8bb89c4031	test: add memory regression coverage for AI stores	2026-02-04 19:56:12 +00:00
rcourtman	ee0e89871d	fix: reduce metrics memory 86x by reverting buffer and adding LTTB downsampling The in-memory metrics buffer was changed from 1000 to 86400 points per metric to support 30-day sparklines, but this pre-allocated ~18 MB per guest (7 slices × 86400 × 32 bytes). With 50 guests that's 920 MB — explaining why users needed to double their LXC memory after upgrading to 5.1.0. - Revert in-memory buffer to 1000 points / 24h retention - Remove eager slice pre-allocation (use append growth instead) - Add LTTB (Largest Triangle Three Buckets) downsampling algorithm - Chart endpoints now use a two-tier strategy: in-memory for ranges ≤ 2h, SQLite persistent store + LTTB for longer ranges - Reduce frontend ring buffer from 86400 to 2000 points Related to #1190	2026-02-04 19:49:52 +00:00
rcourtman	d2604a6859	test: add AI memory regression coverage	2026-02-04 19:46:20 +00:00
rcourtman	bcd0dbfc18	Add metrics history memory regression test	2026-02-04 19:35:19 +00:00
rcourtman	049a3e424c	Add memory regression tests for agent and scheduler	2026-02-04 19:33:29 +00:00
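
Commit 03939c3f9e counts each device plus total-size pair once for every filesystem type. Previously only btrfs/zfs subvolumes were deduplicated. Here is a sketch of that rule using a hypothetical `mount` record rather than the agent's real types. The Kubernetes-style paths in the example are illustrative.

```go
package main

import "fmt"

// mount is a hypothetical record for one mounted filesystem.
type mount struct {
	Mountpoint string
	Device     string
	TotalBytes uint64
}

// sumDiskTotal adds up TotalBytes, counting each (device, size) pair once so
// a volume bind-mounted at several paths is not double-counted.
func sumDiskTotal(mounts []mount) uint64 {
	type key struct {
		device string
		total  uint64
	}
	seen := make(map[key]bool)
	var sum uint64
	for _, m := range mounts {
		k := key{m.Device, m.TotalBytes}
		if seen[k] {
			continue // same device and size already counted under another path
		}
		seen[k] = true
		sum += m.TotalBytes
	}
	return sum
}

func main() {
	mounts := []mount{
		{"/", "/dev/sda1", 100},
		// Illustrative: one volume visible at both a pod path and a plugin path.
		{"/var/lib/kubelet/pods/x/volumes/data", "/dev/sdb", 500},
		{"/var/lib/kubelet/plugins/x/globalmount", "/dev/sdb", 500},
	}
	fmt.Println(sumDiskTotal(mounts)) // 600
}
```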
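
Commit a68e0050f8 replaces Docker's cached `PreCPUStats` with deltas computed from the agent's own previous read. This sketch assumes the usual Docker CPU percentage formula: container CPU delta divided by system CPU delta, times online CPUs, times 100. `cpuTracker` and `cpuSample` are hypothetical names. In practice the two counters would come from the container's total CPU usage and the host's system CPU usage in each stats response.

```go
package main

import "fmt"

// cpuSample holds two cumulative counters from one stats read: the
// container's total CPU usage and the host's system CPU usage.
type cpuSample struct {
	containerTotal uint64
	systemTotal    uint64
}

// cpuTracker remembers the previous sample per container from our own reads
// instead of trusting the daemon-provided PreCPUStats.
type cpuTracker struct {
	prev map[string]cpuSample
}

func newCPUTracker() *cpuTracker { return &cpuTracker{prev: make(map[string]cpuSample)} }

// percent returns CPU usage since the previous call for id. The first call,
// or a counter that went backwards (e.g. after a restart), returns 0 and
// just records a new baseline.
func (t *cpuTracker) percent(id string, cur cpuSample, onlineCPUs int) float64 {
	last, ok := t.prev[id]
	t.prev[id] = cur
	if !ok || cur.containerTotal < last.containerTotal || cur.systemTotal <= last.systemTotal {
		return 0
	}
	cpuDelta := float64(cur.containerTotal - last.containerTotal)
	sysDelta := float64(cur.systemTotal - last.systemTotal)
	return cpuDelta / sysDelta * float64(onlineCPUs) * 100
}

func main() {
	t := newCPUTracker()
	fmt.Println(t.percent("web", cpuSample{1000, 10000}, 4)) // 0 (baseline)
	fmt.Println(t.percent("web", cpuSample{1500, 14000}, 4)) // 50
}
```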
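
Commit 47ceffe0c2 takes the first integer from smartctl's `raw.string` and falls back to `raw.value`. Here is a sketch with `smartRaw` and `rawAttrValue` as hypothetical names. It reads the leading run of digits, so `"16951 (223 173 0)"` yields 16951. The large `raw.value` in the example is illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// smartRaw mirrors the "raw" object smartctl emits for each ATA attribute.
type smartRaw struct {
	Value  uint64 `json:"value"`
	String string `json:"string"`
}

// rawAttrValue returns the leading integer of raw.string and falls back to
// raw.value when the string is empty or does not start with a digit.
func rawAttrValue(r smartRaw) uint64 {
	s := strings.TrimSpace(r.String)
	end := 0
	for end < len(s) && s[end] >= '0' && s[end] <= '9' {
		end++
	}
	if end > 0 {
		if n, err := strconv.ParseUint(s[:end], 10, 64); err == nil {
			return n
		}
	}
	return r.Value
}

func main() {
	var r smartRaw
	// Illustrative input: a large raw.value next to the human-readable string.
	if err := json.Unmarshal([]byte(`{"value": 982687425364527, "string": "16951 (223 173 0)"}`), &r); err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(rawAttrValue(r)) // 16951
}
```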
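
Commit ee0e89871d adds LTTB (Largest Triangle Three Buckets) downsampling for chart ranges served from the SQLite store. Below is a minimal Go sketch of the standard algorithm as published by Sveinn Steinarsson, using a hypothetical `point` type. It is not Pulse's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// point is a hypothetical (timestamp, value) sample; data must be sorted by X.
type point struct{ X, Y float64 }

// lttb reduces data to threshold points. It always keeps the first and last
// points; each bucket in between contributes the point that forms the largest
// triangle with the previously kept point and the next bucket's average.
func lttb(data []point, threshold int) []point {
	n := len(data)
	if threshold >= n || threshold < 3 {
		return data // nothing to reduce, or too few output points to form buckets
	}
	out := make([]point, 0, threshold)
	out = append(out, data[0])
	every := float64(n-2) / float64(threshold-2) // bucket width over interior points
	a := 0                                       // index of the last kept point

	for i := 0; i < threshold-2; i++ {
		// Average of the next bucket, used as the triangle's third vertex.
		avgStart := int(math.Floor(float64(i+1)*every)) + 1
		avgEnd := int(math.Floor(float64(i+2)*every)) + 1
		if avgEnd > n {
			avgEnd = n
		}
		var avgX, avgY float64
		for j := avgStart; j < avgEnd; j++ {
			avgX += data[j].X
			avgY += data[j].Y
		}
		cnt := float64(avgEnd - avgStart)
		avgX /= cnt
		avgY /= cnt

		// Keep the point in the current bucket with the largest triangle area.
		start := int(math.Floor(float64(i)*every)) + 1
		end := int(math.Floor(float64(i+1)*every)) + 1
		best, maxArea := start, -1.0
		for j := start; j < end; j++ {
			area := math.Abs((data[a].X-avgX)*(data[j].Y-data[a].Y)-(data[a].X-data[j].X)*(avgY-data[a].Y)) / 2
			if area > maxArea {
				best, maxArea = j, area
			}
		}
		out = append(out, data[best])
		a = best
	}
	return append(out, data[n-1])
}

func main() {
	data := make([]point, 1000)
	for i := range data {
		data[i] = point{X: float64(i), Y: math.Sin(float64(i) / 50)}
	}
	fmt.Println(len(lttb(data, 50))) // 50
}
```

Each bucket keeps the point that forms the largest triangle with its neighbors, so short spikes tend to survive downsampling where plain bucket averaging would flatten them.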

1596 commits