When the agent runs as root, os.Getuid() returns 0 so it only probes
/run/user/0/docker.sock. Rootless Docker installs live at
/run/user/1000/docker.sock (or similar). Glob /run/user/*/docker.sock
and /run/user/*/podman/podman.sock to discover sockets for all users.
Probe ~/.docker/run/docker.sock for RuntimeDocker and RuntimeAuto
before falling back to /var/run/docker.sock. This lets the agent
connect on macOS without requiring DOCKER_HOST to be set manually.
Ref #1200
Docker container URL preserved on update (#1054): container updates
recreate the container with a new runtime ID. The agent now includes
{oldContainerId, newContainerId} in the completion ACK payload; the
server uses this to copy persisted metadata (custom URLs, descriptions,
tags) to the new ID so nothing is lost. Migration is a copy, not a move,
so rollback scenarios still find metadata under the original ID.
Reduce metrics.db write amplification (#1124): add a UNIQUE index on
(resource_type, resource_id, metric_type, timestamp, tier) so rollup
reprocessing after a failed checkpoint uses INSERT OR IGNORE instead of
creating duplicate rows. Existing duplicates are deduplicated once on
startup if the index creation would otherwise fail. Also sets
wal_autocheckpoint(500) to checkpoint the WAL more frequently, preventing
unbounded WAL growth.
Fixes#1054Fixes#1124
FreeBSD auto-update (#1254): determineArch() now includes freebsd in its
OS switch, producing freebsd-amd64/arm64 instead of falling through to
a uname -m fallback that incorrectly returned linux-<arch>. FreeBSD agents
were downloading Linux ELF binaries and failing to exec them.
Docker rootless socket (#1200): buildRuntimeCandidates() now probes
/run/user/<uid>/docker.sock before the system-wide /var/run/docker.sock,
enabling auto-detection of Docker rootless installations.
Duplicate PVE/PBS hosts (#1245, #1252): handleSecureAutoRegister() now
deduplicates by host URL, updating the existing instance's token in-place
instead of appending a duplicate entry on each re-run of the setup script.
Fixes#1254Fixes#1200Fixes#1245Fixes#1252
(cherry picked from commit 0f1d9e9b9fea6c8b9e65872e8a78e25f93653eef)
Docker's one-shot stats API (stream=false) returns PreCPUStats from the
daemon's internal cache, which many Docker versions don't update between
non-streaming reads. This causes every call to return the same stale
PreCPUStats from container start, producing a constant lifetime-average
CPU% (e.g. 3.4%) instead of current usage.
Switch to always using manual delta tracking, which stores the previous
sample from our own reads and computes accurate deltas between collection
cycles. The first cycle returns 0 while establishing a baseline; all
subsequent cycles produce correct current CPU percentages.
The Docker agent was not passing the disk exclusion list to
hostmetricsCollect(), so excluded mounts appeared in the Docker tab
disk totals. Also add server-side fsfilters filtering to Docker
report processing for parity with the host agent path.
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.
Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()
Related to #1063
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.
Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.
Related to #1058
When --docker-runtime=podman is explicitly set, the agent should try
Podman-specific sockets first before falling back to environment
defaults (which try /var/run/docker.sock).
Also adds /var/run/podman/podman.sock as a candidate socket path,
which is used by CoreOS and some Fedora configurations.
Related to #1045
When a Docker agent tries to register with a token that's already bound
to another agent, the error was logged generically as "Failed to send
docker report". Users had to dig into logs to understand the issue.
Now logs a prominent error message:
"DOCKER REGISTRATION FAILED: This API token is already used by another
Docker agent. Each Docker host requires its own unique token. Generate
a new token in Pulse Settings > Agents and reinstall with the new token."
Related to #1027
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard
Related to Discussion #982
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
Fixed an issue where all Docker containers were showing 'click to update'
even when they were up to date. The root cause was comparing the wrong
digest types:
- Previously: Compared ImageID (local config hash) vs registry manifest digest
- Now: Uses RepoDigests from image inspect, which is the actual manifest
digest that Docker received from the registry when pulling the image
For multi-arch images, the registry returns a manifest list digest, while
Docker stores the platform-specific image config digest locally. These
will never match, causing false positives for all multi-arch images.
Changes:
- Added ImageInspectWithRaw to dockerClient interface
- Added getImageRepoDigest method to extract RepoDigest from image
- Added matchesImageReference helper for Docker Hub naming conventions
- Added tests for matchesImageReference
Fixes#955
- Add comprehensive test coverage for agent report, flush buffer, and deps
- Expand flow, HTTP, CPU, and swarm test coverage
- Refactor registry access to use deps interface for better testability
- Add container update and self-update test scenarios
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios
Usage:
pulse-agent --report-ip 192.168.1.100
PULSE_REPORT_IP=192.168.1.100 pulse-agent
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)
This caused duplicate responses when using Ollama and other providers.
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.
Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., 96% = warns at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go
When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold
When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%
Related to #951
The GuestURL field was missing from NodeFrontend and its converter,
causing configured Guest URLs to be ignored when clicking on cluster
node names. The frontend would fall back to the auto-detected IP
instead of using the user-configured Guest URL.
Related to #940
- Add Env field to Container struct in pkg/agents/docker/report.go
- Extract env vars from inspect.Config.Env in Docker agent
- Mask sensitive values (password, secret, key, token, etc.) with ***
- Display env vars in container drawer with green badges (amber for masked)
- Add tests for maskSensitiveEnvVars function
Related to #916
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*
Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*
This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
Related to #823
- Log payload size (in KB and bytes) at debug level
- Warn when payload approaches 400KB (512KB limit)
- Helps diagnose 'request body too large' errors
- Add unit tests for internal/buffer package
- Fix misleading "ring buffer" comment (it's a bounded FIFO queue)
- Remove unused BufferCapacity config field from both agents
- Rewrite flaky integration test to use polling instead of fixed sleeps
- Move normalizeVersion to utils.NormalizeVersion for single source of truth
- Update agentupdate and dockeragent packages to use shared function
- Add 14 test cases for version normalization
This prevents bugs like issue #773 where a fix applied to one copy
but not the other caused an update loop.
The unified agent got the version normalization fix (1b866598), but the
standalone docker agent's checkForUpdates() still used direct string
comparison. When server returns "4.34.0" and agent has "v4.34.0", this
caused an infinite self-update loop.
Apply the same normalizeVersion() function used in the unified agent.
Related to #773
When systemUsage counter goes backward (common in unprivileged LXC
containers), the previous code used the absolute value as systemDelta.
This created an artificially small denominator, inflating CPU to ~100%.
Now leaves systemDelta as 0 on counter reset, falling through to the
time-based calculation which produces accurate results.
Related to #770
- Default enableDocker to false in UI to prevent unintended Docker
agent activation on host-only installs (Related to #766)
- Deploy agent scripts and binaries during web UI upgrades, not just
the main binary (Related to #760)
- Apply symlink resolution fix to standalone docker agent self-update
to prevent cross-device rename failures (Related to #737)
Mark intentionally unused parameters with underscore to:
- Silence unparam warnings for legitimate unused parameters
- Keep function signatures intact for API compatibility
- Remove unused req from serveChecksum helper
- Use container.Summary instead of types.Container
- Use swarmtypes.ServiceListOptions instead of types.ServiceListOptions
- Use swarmtypes.TaskListOptions instead of types.TaskListOptions
These types were deprecated in favor of package-specific types.
Podman can return unstable or empty daemon IDs across API calls. When
the agent fetched info.ID on every report cycle, this could cause the
agent identity to change mid-session, triggering "token already in use"
errors on the server.
Cache the daemon ID at initialization and use it consistently for all
reports.
Related to #740
Issues found during scenario testing:
1. Version propagation: The hostagent and dockeragent packages were
reporting their own Version (0.1.0-dev) instead of the unified
agent's version. Added AgentVersion config field to pass the
parent's version down.
2. macOS legacy cleanup: The install.sh script was missing cleanup
for pulse-docker-agent on macOS.
3. Windows legacy cleanup: The install.ps1 script was missing cleanup
for legacy PulseHostAgent and PulseDockerAgent services.
These fixes ensure:
- Unified agent reports consistent version across host/docker metrics
- Legacy agents are properly removed on all platforms during upgrade
- Users migrating from legacy agents get a clean transition
Add seamless migration path from legacy agents to unified agent:
- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer
Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
Related to #547 and #622
## Samsung SSD Fix (#547)
Samsung 980 and 990 series SSDs have known firmware bugs that cause them to
report incorrect health status (typically FAILED or critical warnings) even
when the drives are actually healthy. This is commonly due to incorrect
temperature threshold reporting in the firmware.
This change adds special handling to detect these drives and skip health
status alerts while still monitoring wearout metrics, which remain reliable.
The fix also clears any existing false alerts for these drives.
Users experiencing these false alerts should update their Samsung SSD firmware
to the latest version from Samsung, which typically resolves the issue.
## Docker Agent CPU Fix (#622)
Addresses issue where Docker container CPU usage shows 0%. The Docker
agent uses ContainerStatsOneShot which typically doesn't populate
PreCPUStats, requiring manual delta tracking between collection cycles.
Changes:
- Fix logic bug where prevContainerCPU was updated before checking if
previous sample existed, causing incorrect delta calculations
- Add comprehensive debug logging showing which calculation method
succeeded (PreCPUStats, system delta, or time-based fallback)
- Add warning after 10 PreCPUStats failures to inform about manual
tracking mode (normal for one-shot stats)
- Add detailed failure logging when CPU calculation cannot complete
Expected behavior: First collection cycle returns 0% (no previous
sample), subsequent cycles show accurate CPU metrics.
This commit addresses multiple issues in the Docker/host agent removal flow:
Agent Stop Fix:
- Add systemctl stop command after agent acknowledgement to prevent systemd restart
- Previous behavior: agent disabled but systemd immediately restarted it (Restart=always)
- New behavior: agent disables itself, sends ack, then stops systemd service completely
UX Improvements:
- Add real-time elapsed time counter during removal wait
- Show progress indicators prominently (no longer hidden in dropdown)
- Display expected time range (30-60 seconds) and last heartbeat
- Auto-show timeout warning after 2 minutes with actionable "Force remove" button
- Add contextual help explaining what's happening at each stage
Security Enhancement:
- Automatically revoke API tokens when removing Docker/host agents
- Previous behavior: tokens remained valid after agent removal
- New behavior: tokens are revoked and persisted immediately on removal
- Prevents removed agents from re-authenticating with old credentials
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)
Also fixes config watcher to consistently use production auth config path instead of following PULSE_DATA_DIR when in mock mode.