The legacy auto-register endpoint captured TLS fingerprints via
FetchFingerprint() but never persisted them to the node config. Nodes
with self-signed certs registered via the agent would fail with
"x509: certificate signed by unknown authority" on subsequent polls.
Store the fingerprint in all add/update paths for both PVE and PBS,
guard updates against empty-fingerprint clobber when FetchFingerprint
fails, and pass the fingerprint to cluster detection configs.
HandleUndismissFinding now checks both patrol and unified stores
before returning. It returns 404 with an error message when the finding
is not found or not dismissed, instead of silently returning success.
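Roughly, with the store fields, route plumbing, and boolean Undismiss signatures assumed:

```go
// Illustrative shape; only HandleUndismissFinding and Undismiss are from the source.
func (h *Handlers) HandleUndismissFinding(w http.ResponseWriter, r *http.Request) {
	id := r.PathValue("id")
	// Try the patrol store first, then the unified store.
	if !h.patrolStore.Undismiss(id) && !h.unifiedStore.Undismiss(id) {
		http.Error(w, "finding not found or not dismissed", http.StatusNotFound)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}
```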
When the balloon driver reports Free but not Buffers or Cached, the
meminfo-derived fallback computed memAvailable = Free alone, counting
all reclaimable page cache as used memory. This caused Linux VMs to
show wildly inflated usage (e.g. 93% when actual is 21%).
The meminfo-derived path now requires at least one cache metric
(Buffers > 0 or Cached > 0) before trusting the value. When a cache
metric is missing, the code falls through to the RRD, guest-agent, and
Total-Used fallbacks, which provide
accurate cache-aware data. Both efficient and traditional polling
paths are now consistent.
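The gate amounts to the following; the meminfo field names and the derived formula are assumptions:

```go
// Sketch: without a cache metric, the meminfo-derived value is not trusted.
if mem.Buffers > 0 || mem.Cached > 0 {
	memAvailable = mem.Free + mem.Buffers + mem.Cached
} else {
	// Free alone counts reclaimable page cache as used; leave memAvailable
	// unset so the RRD / guest-agent / Total-Used fallbacks run instead.
	memAvailable = 0
}
```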
Add an actions menu to the hosts overview with a "Remove host from
Pulse" button. Includes permission checks (requires settings:write
scope), confirmation handling, and a security regression test for
the delete endpoint scope enforcement.
The backup_existing function copied the entire config directory
(including metrics.db at ~2.5GB) on every upgrade with no cleanup.
On small VMs this filled the disk within a few releases.
The upgrade only swaps the binary; config files are not modified,
so the backup served no practical purpose.
createProviderForModel() only handled "provider:model" colon format.
Models like "google/gemini-2.5-flash" or "google/gemini-2.0-flash:free"
(OpenRouter format) failed because the colon split produced invalid
provider names.
Now uses config.ParseModelString(), which correctly detects slash-delimited
models as OpenRouter (routed via the OpenAI-compatible API).
The Undismiss() method existed on FindingsStore but was never exposed
via the API. Users who dismissed findings as "not_an_issue" had no way
to revert them.
- Add HandleUndismissFinding handler and route
- Add Undismiss() to UnifiedStore for parity with FindingsStore
- Also remove matching explicit suppression rules on undismiss
Auto-register was running the monitor reload in a background goroutine,
so the HTTP response was sent before the poller picked up the new node.
If reload failed or was slow, the node appeared in Settings > Proxmox
(reads config from disk) but not on the main Proxmox tab (reads from
active polling state).
Changed both auto-register paths to reload synchronously, matching the
manual add path (HandleAddNode).
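In outline (handler plumbing and error handling assumed):

```go
// Before: go func() { monitor.Reload(ctx) }() raced the HTTP response.
// After: block until the poller has picked up the new node, as HandleAddNode does.
if err := monitor.Reload(ctx); err != nil {
	http.Error(w, "node saved but reload failed: "+err.Error(), http.StatusInternalServerError)
	return
}
w.WriteHeader(http.StatusOK)
```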
Patrol runs, evaluation passes, and QuickAnalysis calls were consuming
LLM tokens without recording them in the cost store. This made the
cost_budget_usd_30d budget setting ineffective since enforceBudget()
never saw patrol spend.
- Add RecordUsage() to ai.Service for thread-safe cost recording
  (sketched after this list)
- Add recordPatrolUsage() helper to PatrolService, called on both
success and error paths for main patrol and evaluation pass
- Record QuickAnalysis token usage in cost store
- Return partial PatrolResponse (with token counts) on error instead
of nil, so callers can always record consumed tokens
- Propagate partial response through chat_service_adapter on error
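A plausible shape for the recorder; everything besides the RecordUsage() name is an assumption:

```go
type Service struct {
	mu    sync.Mutex
	costs *CostStore
}

// RecordUsage is safe to call concurrently from patrol, evaluation,
// and QuickAnalysis paths, on both success and error.
func (s *Service) RecordUsage(model string, inTokens, outTokens int, costUSD float64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.costs.Add(model, inTokens, outTokens, costUSD) // the data enforceBudget() reads
}
```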
Expose PublicURL from runtime config in the system settings API response
so the frontend displays the actual value instead of the placeholder when
the env var is set.
Add w-full to PVE, PBS, and PMG node tables so they expand to fill the
container in full-width mode.
The AI also receives disk data via tool calls (pulse_metrics type="disks"),
not just the patrol context table. The raw JSON field "wearout" was
ambiguous — rename to "ssd_life_remaining_pct" so the field name itself
communicates that 100 = healthy.
The patrol context table header said "Wearout" and the tool returned a raw
"wearout" JSON field with no indication that 100 = full life remaining.
The AI interpreted "wearout: 100" as fully worn out and raised false
"100% Disk Wearout" findings on healthy NVMe drives.
Rename the patrol table column to "SSD Life Remaining (100%=new)" and
update the data type comment to clarify the semantics.
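The rename itself is just a struct-tag change; the surrounding type here is hypothetical:

```go
type Disk struct {
	// Was `json:"wearout"`. 100 = full life remaining (new drive), 0 = worn out.
	SSDLifeRemainingPct int `json:"ssd_life_remaining_pct"`
}
```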
- Split configuration table into "Installer flags" and "Agent-only flags"
so users know which flags work with `curl | bash` vs the binary directly
- Add missing --cacert and --env flags to installer docs
- Fix --disable-auto-update example (install script doesn't accept it;
use --env PULSE_DISABLE_AUTO_UPDATE=true instead)
- Add --disable-docker/kubernetes/proxmox and --proxmox-type to
install.sh show_help()
- Fix --enable-docker=false in CENTRALIZED_MANAGEMENT.md
All --enable-docker=false, --enable-kubernetes=false, --enable-proxmox=false
references replaced with --disable-docker, --disable-kubernetes, --disable-proxmox.
On QNAP, /usr/local/bin is a tiny RAM disk. The installer was downloading
the binary then mv'ing it there, which failed when the RAM disk was full.
The QNAP-specific logic that copies to the persistent data volume only
ran after that mv.
Move QNAP detection before the download step so INSTALL_DIR points to the
persistent data volume (e.g. /share/CACHEDEV1_DATA/.pulse-agent) directly.
The wrapper script still attempts to copy to /usr/local/bin at boot but
falls back to running from persistent storage if that fails.
Also fixes:
- pkill -f pattern in wrapper could match and kill the wrapper itself
(path contains "pulse-agent"); switched to pkill -x for exact match
- Upgrade detection now checks /usr/local/bin for legacy QNAP installs
- Uninstall cleans up /usr/local/bin runtime copy
The printf '%s\n' calls in shell code within the Go Sprintf template
were being counted as format verbs, causing a build failure (10 verbs
but 9 args). Using %%s produces literal %s in the output.
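This is ordinary fmt.Sprintf escaping; a minimal reproduction, with installDir as an illustrative argument:

```go
// Every % inside the template is a Sprintf verb, including the embedded
// shell's printf, so it must be written %%s to come out as a literal %s.
script := fmt.Sprintf(`#!/bin/sh
printf '%%s\n' "$1"
echo "install dir: %s"
`, installDir)
```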
QNAP wipes /etc/init.d on every reboot, so the agent needs persistent
storage on a data volume and autorun.sh boot persistence via the flash
config partition. Adds detection, install (with watchdog wrapper), and
clean uninstall paths. Flash config mount/umount is fail-safe via
subshell isolation to prevent leaving the partition mounted on write
errors.
The previous hasLiveInventory guard was a single boolean — if any PVE
instance had at least one live guest, orphan detection ran for all
instances. In multi-instance clusters with staggered polling, backups
from instances whose VMs hadn't been polled yet appeared orphaned,
producing false positive alerts with 0m duration.
Replace the global boolean with a per-instance map. PVE storage backups
now only run orphan detection when their specific instance has live
inventory. PBS/PMG backups (which span instances) retain the "any
instance has live guests" check.
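Sketched, with guest and backup field names assumed:

```go
hasLiveInventory := make(map[string]bool) // keyed by PVE instance
for _, g := range guests {
	if g.ResourceID != "" {
		hasLiveInventory[g.Instance] = true
	}
}
// PVE storage backups: gate on the backup's own instance.
if hasLiveInventory[backup.Instance] {
	checkOrphaned(backup)
}
// PBS/PMG backups span instances, so they keep the "any instance live" check.
```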
The PVE setup script had three bugs in the temperature monitoring SSH key setup:
- Nested double quotes in SSH_SENSORS_KEY_ENTRY broke the bash string, causing
"No such file or directory" errors for the key options
- The grep/mv pattern to update authorized_keys destroyed the symlink that
Proxmox maintains from /root/.ssh/authorized_keys to /etc/pve/priv/
- The uninstall path grepped for "# pulse-managed-key" but keys were tagged
"# pulse-sensors", so uninstall never cleaned up sensor keys
Fixes: resolve symlinks with readlink -f before operating, create temp files in
/tmp with mv-then-cp fallback for cross-device moves, escape inner quotes, and
broaden the uninstall filter to match all pulse-prefixed keys.
Show "Pro" badge on the Reporting settings tab so users know upfront
that advanced reporting requires a Pro license, rather than discovering
it after filling out the form.
Downgrade patrol trigger queue-full and rejection messages from Warn to
Debug — these are normal rate-limiting behavior, not actionable warnings.
Model name detection used substring matching (.includes('gemini')) which
falsely required Gemini provider config for OpenRouter model IDs like
"google/gemini-2.5-flash". Now only known provider prefixes are treated
as explicit delimiters, slash-containing names route to OpenAI (OpenRouter
convention), and colons in model names (e.g. "llama3.2:latest") are no
longer misinterpreted as provider prefixes.
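The decision order, sketched in Go with an illustrative prefix set (the actual fix lives in the frontend model-detection code):

```go
var knownProviders = map[string]bool{"openai": true, "anthropic": true, "gemini": true}

func routeModel(id string) string {
	if i := strings.IndexByte(id, ':'); i >= 0 && knownProviders[id[:i]] {
		return id[:i] // explicit "provider:model" prefix
	}
	if strings.Contains(id, "/") {
		return "openai" // OpenRouter convention: slash IDs take the OpenAI-compatible route
	}
	return "default" // "llama3.2:latest": the colon stays part of the model name
}
```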
Backup polling goroutines can snapshot state before VM/container polling
populates the guest inventory. When guestsByVMID is empty, every backup
appears orphaned. Gate orphan detection on hasLiveInventory (at least one
guest with non-empty ResourceID) and preserve existing orphan alerts when
inventory becomes unavailable.
Escalation was calling SendAlert() which always sends to all enabled
channels, ignoring the per-level channel selection (email/webhook/all).
Add SendAlertToChannels() that snapshots only the requested channel
configs and uses a distinct "_escalation" queue type so the dequeue
handler skips cooldown writes — preventing interference with the alert
manager's own re-notify cadence.
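Approximate shape; the signature and queue plumbing are inferred from the description:

```go
func (n *Notifier) SendAlertToChannels(a Alert, channels []string) {
	cfgs := n.snapshotConfigs(channels) // only the per-level selection, not all enabled channels
	for _, c := range cfgs {
		// The "_escalation" queue type makes the dequeue handler skip cooldown
		// writes, so escalations don't disturb the manager's re-notify cadence.
		n.enqueue(queueItem{kind: c.kind + "_escalation", alert: a, config: c})
	}
}
```

Snapshotting the channel configs up front also keeps a concurrent settings edit from changing which channels the escalation hits mid-send.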
Backport from v6 (88d5865a8). Recovery webhook notifications were using
the firing PayloadTemplate, which services like Telegram, Teams, and
Discord silently rejected as malformed. Now uses a three-tier template
pipeline matching the firing path (sketched after the list):
- Tier 1: Custom user template (if configured)
- Tier 2: Service-specific ResolvedPayloadTemplate (Discord green embed,
Telegram chat_id+text, Slack header blocks, Teams MessageCard/Adaptive,
PagerDuty event_action:"resolve", Pushover, Gotify, Mattermost)
- Tier 3: Generic JSON fallback (backward compatible)
Also adds Event, ResolvedAt, ResolvedAtISO fields to WebhookPayloadData.
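The tier selection reduces to a first-match fallthrough; the names below are placeholders:

```go
func resolvedPayload(w Webhook, data PayloadData) (string, error) {
	if w.CustomTemplate != "" { // Tier 1: user template wins
		return render(w.CustomTemplate, data)
	}
	if w.ResolvedPayloadTemplate != "" { // Tier 2: per-service template,
		// e.g. Discord green embed or PagerDuty event_action:"resolve"
		return render(w.ResolvedPayloadTemplate, data)
	}
	return genericJSON(data) // Tier 3: backward-compatible fallback
}
```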
The backend only allows one command per host at a time. The "Update All"
button was firing requests in parallel chunks, causing the second
container per host to fail with 400. Group targets by host and process
them sequentially within each host while still updating different hosts
in parallel.
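The concurrency shape of the fix, sketched in Go for clarity even though the grouping lives in the frontend; groupByHost, Target, and runUpdate are stand-ins:

```go
var wg sync.WaitGroup
for host, targets := range groupByHost(allTargets) {
	wg.Add(1)
	go func(host string, targets []Target) {
		defer wg.Done()
		for _, t := range targets {
			runUpdate(host, t) // sequential: one in-flight command per host
		}
	}(host, targets)
}
wg.Wait() // hosts update in parallel; containers within a host do not
```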
Recovery notifications for Discord, Slack, Teams, PagerDuty, and other
service webhooks were sending a generic JSON payload that lacked the
required format (e.g. Discord needs `embeds`, Slack needs `blocks`),
causing resolved notifications to silently fail.
- Add `prepareResolvedWebhookData` to build template data with Level="resolved"
- Route resolved webhooks through service-specific templates with full
URL rendering, Telegram ChatID extraction, and PagerDuty routing_key
- Custom user templates take precedence over built-in service templates
- Return errors on service template failures instead of falling back to
generic payloads that endpoints would reject
- Fix PagerDuty template to send event_action="resolve" for resolved alerts
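For PagerDuty specifically, the corrected resolve body boils down to fields like these (values and the struct itself are illustrative):

```go
type pagerDutyEvent struct {
	RoutingKey  string `json:"routing_key"`
	EventAction string `json:"event_action"` // "resolve" rather than "trigger"
	DedupKey    string `json:"dedup_key"`    // must match the firing event's key
}
```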
Fire a warning alert immediately when a backup's guest no longer exists
in inventory, without requiring age thresholds to be breached. The
existing alertOrphaned toggle and ignoreVMIDs UI already control this
feature, so no frontend changes are needed.
The isRunning prop used a `cpuPercent > 0` gate that treated idle
containers (0% CPU) as not-running, causing the bar to flip between
a percentage and an em-dash on every poll cycle. Remove the value
guard so visibility depends only on container running state, matching
how memory, disk, and restart columns already behave.
Replace first-match-only logic in upsertMentionResource with a
union-merge algorithm that collects all matching keys, merges losers
into a canonical winner, and re-points aliases. This fixes the case
where a host agent bridges a VM and a DockerHost but only the first
alias match was merged, leaving a duplicate entry in the picker.
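The algorithm in outline, with the helper names as stand-ins:

```go
matches := collectMatchingKeys(resource) // previously: first match only
if len(matches) == 0 {
	insertNew(resource)
	return
}
winner := matches[0] // canonical entry
for _, loser := range matches[1:] {
	mergeInto(winner, loser)      // e.g. fold a DockerHost entry into the VM entry
	repointAliases(winner, loser) // aliases now resolve to the winner
	deleteKey(loser)
}
```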
The v4 installer added mount entries for /run/pulse-sensor-proxy to LXC
container configs. After upgrading to v5 and rebooting, /run (tmpfs) is
wiped and the container fails to start. The installer now detects and
removes these stale mp<N> and lxc.mount.entry references automatically
when run on a PVE host, and the upgrade docs include manual fix steps.
When Proxmox /nodes/{node}/status returns only total/used/free without
available/buffers/cached, EffectiveAvailable() returns Free (non-zero),
causing the RRD fallback gate to be skipped. This results in inflated
node memory where cache/buffers are counted as "used."
Widen the RRD fallback condition from requiring effectiveAvailable == 0
to triggering whenever missingCacheMetrics is true. Add negative caching
for failed RRD lookups (2-minute backoff) to avoid repeated retries.
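The negative cache in sketch form; the map, error, and fetch names are assumptions:

```go
if t, ok := rrdFailedAt[node]; ok && time.Since(t) < 2*time.Minute {
	return 0, errRRDBackoff // recent failure: skip the lookup entirely
}
avail, err := fetchRRDAvailable(node)
if err != nil {
	rrdFailedAt[node] = time.Now() // start the 2-minute backoff window
	return 0, err
}
delete(rrdFailedAt, node)
return avail, nil
```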
HostFrontend was missing LinkedVmId and LinkedContainerId fields, so the
frontend dedup aliases for VM/container agents resolved to undefined and
never matched. Also add .trim() to getStatusColor and default host agent
status to 'online' to fix grey status dots.
When an agent re-registers with the same token, the DHCP matching case
would overwrite the Host field with the agent's local IP — even if the
user had edited it to a public URL or different IP. Now agent source
re-registrations always preserve the existing host, while non-agent
DHCP updates still work. Adds 5 regression tests covering hostname
preservation, public-IP preservation, agent DHCP, non-agent DHCP, and
PBS parity.
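The preservation rule itself is small; field names here are stand-ins:

```go
if req.Source == sourceAgent && existing.Host != "" {
	// Agent re-registrations never overwrite a user-edited host
	// (public URL, pinned IP); keep what's stored.
	req.Host = existing.Host
}
// Non-agent DHCP updates still refresh Host as before.
```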
Three fixes for remaining mention autocomplete issues:
- Status dots now correctly show green/red/yellow for online/offline/
degraded statuses (previously only handled running/stopped/paused)
- Docker hosts merge with their host agent via agentId cross-reference
- VMs and LXC containers merge with host agents running inside them
via linkedVmId/linkedContainerId backend ID aliases