Commit graph

51 commits

Author SHA1 Message Date
rcourtman
a31c1a4547 Replace Docker SDK with split Moby modules 2026-03-27 23:50:41 +00:00
rcourtman
0493fb78bf fix(agent): detect rootless Docker/Podman sockets for other users (#1200)
When the agent runs as root, os.Getuid() returns 0 so it only probes
/run/user/0/docker.sock. Rootless Docker installs live at
/run/user/1000/docker.sock (or similar). Glob /run/user/*/docker.sock
and /run/user/*/podman/podman.sock to discover sockets for all users.
2026-03-05 10:01:43 +00:00
Surendra Raika
f663aade53 feat(docker): add macOS Docker Desktop socket auto-detection
Probe ~/.docker/run/docker.sock for RuntimeDocker and RuntimeAuto
before falling back to /var/run/docker.sock. This lets the agent
connect on macOS without requiring DOCKER_HOST to be set manually.

Ref #1200
2026-02-18 19:23:14 +05:30
rcourtman
9d8f8b45b5 fix(docker,metrics): preserve container metadata on update and reduce DB writes
Docker container URL preserved on update (#1054): container updates
recreate the container with a new runtime ID. The agent now includes
{oldContainerId, newContainerId} in the completion ACK payload; the
server uses this to copy persisted metadata (custom URLs, descriptions,
tags) to the new ID so nothing is lost. Migration is a copy, not a move,
so rollback scenarios still find metadata under the original ID.

Reduce metrics.db write amplification (#1124): add a UNIQUE index on
(resource_type, resource_id, metric_type, timestamp, tier) so rollup
reprocessing after a failed checkpoint uses INSERT OR IGNORE instead of
creating duplicate rows. Existing duplicates are deduplicated once on
startup if the index creation would otherwise fail. Also sets
wal_autocheckpoint(500) to checkpoint the WAL more frequently, preventing
unbounded WAL growth.

Fixes #1054
Fixes #1124
2026-02-18 12:56:46 +00:00
rcourtman
7522f6599c fix(agent): three backend fixes for FreeBSD, Docker rootless, and duplicate PVE hosts
FreeBSD auto-update (#1254): determineArch() now includes freebsd in its
OS switch, producing freebsd-amd64/arm64 instead of falling through to
a uname -m fallback that incorrectly returned linux-<arch>. FreeBSD agents
were downloading Linux ELF binaries and failing to exec them.

Docker rootless socket (#1200): buildRuntimeCandidates() now probes
/run/user/<uid>/docker.sock before the system-wide /var/run/docker.sock,
enabling auto-detection of Docker rootless installations.

Duplicate PVE/PBS hosts (#1245, #1252): handleSecureAutoRegister() now
deduplicates by host URL, updating the existing instance's token in-place
instead of appending a duplicate entry on each re-run of the setup script.

Fixes #1254
Fixes #1200
Fixes #1245
Fixes #1252

(cherry picked from commit 0f1d9e9b9fea6c8b9e65872e8a78e25f93653eef)
2026-02-18 12:53:25 +00:00
rcourtman
a68e0050f8 fix(docker): use manual CPU delta tracking instead of stale PreCPUStats (#1229)
Docker's one-shot stats API (stream=false) returns PreCPUStats from the
daemon's internal cache, which many Docker versions don't update between
non-streaming reads. This causes every call to return the same stale
PreCPUStats from container start, producing a constant lifetime-average
CPU% (e.g. 3.4%) instead of current usage.

Switch to always using manual delta tracking, which stores the previous
sample from our own reads and computes accurate deltas between collection
cycles. The first cycle returns 0 while establishing a baseline; all
subsequent cycles produce correct current CPU percentages.
2026-02-10 20:49:29 +00:00
rcourtman
26776b2075 fix(agent): apply --disk-exclude to Docker agent disk metrics (#1237)
The Docker agent was not passing the disk exclusion list to
hostmetricsCollect(), so excluded mounts appeared in the Docker tab
disk totals. Also add server-side fsfilters filtering to Docker
report processing for parity with the host agent path.
2026-02-10 16:59:35 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
95fb896a03 fix: Agent 405 errors when reverse proxy redirects HTTP to HTTPS
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.

Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.

Related to #1058
2026-01-07 17:56:07 +00:00
rcourtman
74ea90e4b3 fix: Podman sockets not prioritized when --docker-runtime=podman
When --docker-runtime=podman is explicitly set, the agent should try
Podman-specific sockets first before falling back to environment
defaults (which try /var/run/docker.sock).

Also adds /var/run/podman/podman.sock as a candidate socket path,
which is used by CoreOS and some Fedora configurations.

Related to #1045
2026-01-06 10:56:37 +00:00
rcourtman
fd7e80ae17 fix: Add clear warning when Docker token is already in use
When a Docker agent tries to register with a token that's already bound
to another agent, the error was logged generically as "Failed to send
docker report". Users had to dig into logs to understand the issue.

Now logs a prominent error message:
"DOCKER REGISTRATION FAILED: This API token is already used by another
Docker agent. Each Docker host requires its own unique token. Generate
a new token in Pulse Settings > Agents and reinstall with the new token."

Related to #1027
2026-01-03 20:56:04 +00:00
rcourtman
e3b3785582 feat(agent): add option to disable Docker update checks
Add PULSE_DISABLE_DOCKER_UPDATE_CHECKS environment variable and
--disable-docker-update-checks flag to disable Docker image update
detection. This is useful for:
- Avoiding Docker Hub rate limits
- Users who don't want update notifications in their dashboard

Related to Discussion #982
2026-01-01 00:20:49 +00:00
rcourtman
4225f905b0 feat: Add manual Docker update check button. Related to #955 2025-12-29 23:37:05 +00:00
rcourtman
d07b471e40 Refactor Docker agent: metrics collection, security checks, and batch updates
- Separated metrics collection into internal/dockeragent/collect.go
- Added agent self-update pre-flight check (--self-test)
- Implemented signed binary verification with key rotation for updates
- Added batch update support to frontend with parallel processing
- Cleaned up agent.go and added startup cleanup for backup containers
- Updated documentation for Docker features and agent security
2025-12-29 17:20:18 +00:00
rcourtman
053a40d7df fix: Docker container update detection showing false positives
Fixed an issue where all Docker containers were showing 'click to update'
even when they were up to date. The root cause was comparing the wrong
digest types:

- Previously: Compared ImageID (local config hash) vs registry manifest digest
- Now: Uses RepoDigests from image inspect, which is the actual manifest
  digest that Docker received from the registry when pulling the image

For multi-arch images, the registry returns a manifest list digest, while
Docker stores the platform-specific image config digest locally. These
will never match, causing false positives for all multi-arch images.

Changes:
- Added ImageInspectWithRaw to dockerClient interface
- Added getImageRepoDigest method to extract RepoDigest from image
- Added matchesImageReference helper for Docker Hub naming conventions
- Added tests for matchesImageReference

Fixes #955
2025-12-29 13:49:04 +00:00
rcourtman
44fa50eed7 feat(dockeragent): improve test coverage and refactor registry dependencies
- Add comprehensive test coverage for agent report, flush buffer, and deps
- Expand flow, HTTP, CPU, and swarm test coverage
- Refactor registry access to use deps interface for better testability
- Add container update and self-update test scenarios
2025-12-29 09:57:45 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
ae1c39960f fix: Remove duplicate AI chat response streaming (issue #947)
Content was being streamed twice:
1. During each iteration of the tool loop (intended for intermediate feedback)
2. Again after the loop ended with finalContent (redundant)

This caused duplicate responses when using Ollama and other providers.
2025-12-29 09:18:05 +00:00
rcourtman
2bf8e044df feat: Add Docker container update capability
- Add container update command handling to unified agent
- Agent can now receive update_container commands from Pulse server
- Pulls latest image, stops container, creates backup, starts new container
- Automatic rollback on failure
- Backup container cleaned up after 5 minutes
- Added comprehensive test coverage for container update logic
2025-12-29 09:00:40 +00:00
rcourtman
3040800e7b fix: AI Patrol now respects exact user-configured thresholds
BREAKING CHANGE: AI Patrol now uses EXACT alert thresholds by default
instead of warning 5-15% before the threshold.

Changes:
- Default behavior: Patrol warns at your configured threshold (e.g., 96% = warns at 96%)
- New setting: 'use_proactive_thresholds' enables the old early-warning behavior
- API: Added use_proactive_thresholds to GET/PUT /api/settings/ai
- Backend: Added SetProactiveMode/GetProactiveMode to PatrolService
- Backend: Added GetThresholds to PatrolService for UI display
- Tests: Updated and added tests for both exact and proactive modes
- Also fixed unused imports in dockeragent/agent.go

When proactive mode is disabled (default):
- Watch: threshold - 5% (slight buffer)
- Warning: exact threshold

When proactive mode is enabled:
- Watch: threshold - 15%
- Warning: threshold - 5%

Related to #951
2025-12-29 08:40:34 +00:00
rcourtman
9f3367da36 fix: Include GuestURL in NodeFrontend for cluster node navigation
The GuestURL field was missing from NodeFrontend and its converter,
causing configured Guest URLs to be ignored when clicking on cluster
node names. The frontend would fall back to the auto-detected IP
instead of using the user-configured Guest URL.

Related to #940
2025-12-28 14:49:49 +00:00
rcourtman
b50872b686 feat: Implement unified update detection system (Phase 1)
Docker container image update detection with full stack implementation:

Backend:
- Add internal/updatedetection package with types, store, registry checker, manager
- Add registry checking to Docker agent (internal/dockeragent/registry.go)
- Add ImageDigest and UpdateStatus fields to container reports
- Add /api/infra-updates API endpoints for querying updates
- Integrate with alert system - fires after 24h of pending updates

Frontend:
- Add UpdateBadge and UpdateIcon components for update indicators
- Add updateStatus to DockerContainer TypeScript interface
- Display blue update badges in Docker unified table image column
- Add 'has:update' search filter support

Features:
- Registry digest comparison for Docker Hub, GHCR, private registries
- Auth token handling for Docker Hub public images
- Caching with 6h TTL (15min for errors)
- Configurable alert delay via UpdateAlertDelayHours (default: 24h)
- Alert metadata includes digests, pending time, image info
2025-12-27 17:58:38 +00:00
rcourtman
86e41effc0 feat: Display environment variables for Docker containers
- Add Env field to Container struct in pkg/agents/docker/report.go
- Extract env vars from inspect.Config.Env in Docker agent
- Mask sensitive values (password, secret, key, token, etc.) with ***
- Display env vars in container drawer with green badges (amber for masked)
- Add tests for maskSensitiveEnvVars function

Related to #916
2025-12-25 23:52:57 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
d12ab31703 feat(docker-agent): add payload size logging for debugging body-too-large errors
Related to #823

- Log payload size (in KB and bytes) at debug level
- Warn when payload approaches 400KB (512KB limit)
- Helps diagnose 'request body too large' errors
2025-12-14 21:10:06 +00:00
rcourtman
b4a33c4f2d Fix offline buffering: add tests, remove unused config, fix flaky test
- Add unit tests for internal/buffer package
- Fix misleading "ring buffer" comment (it's a bounded FIFO queue)
- Remove unused BufferCapacity config field from both agents
- Rewrite flaky integration test to use polling instead of fixed sleeps
2025-12-02 22:31:44 +00:00
courtmanr@gmail.com
caf0c10206 feat: Implement offline buffering for host and docker agents
- Add internal/buffer package with generic ring buffer
- Add buffering logic to host agent for failed reports
- Add buffering logic to docker agent for failed reports
- Add BufferCapacity configuration option
- Add integration tests for buffering logic
2025-12-02 22:12:47 +00:00
rcourtman
b1bc704e3a Consolidate duplicate normalizeVersion functions into shared utility
- Move normalizeVersion to utils.NormalizeVersion for single source of truth
- Update agentupdate and dockeragent packages to use shared function
- Add 14 test cases for version normalization

This prevents bugs like issue #773 where a fix applied to one copy
but not the other caused an update loop.
2025-11-29 22:57:33 +00:00
rcourtman
04d1e1bcf4 Fix standalone docker agent version comparison prefix mismatch
The unified agent got the version normalization fix (1b866598), but the
standalone docker agent's checkForUpdates() still used direct string
comparison. When server returns "4.34.0" and agent has "v4.34.0", this
caused an infinite self-update loop.

Apply the same normalizeVersion() function used in the unified agent.

Related to #773
2025-11-29 00:04:43 +00:00
rcourtman
b5798012fc Fix Docker CPU calculation on systemUsage counter reset
When systemUsage counter goes backward (common in unprivileged LXC
containers), the previous code used the absolute value as systemDelta.
This created an artificially small denominator, inflating CPU to ~100%.

Now leaves systemDelta as 0 on counter reset, falling through to the
time-based calculation which produces accurate results.

Related to #770
2025-11-28 15:07:49 +00:00
rcourtman
d425bc3df4 fix: multiple agent installation and update issues
- Default enableDocker to false in UI to prevent unintended Docker
  agent activation on host-only installs (Related to #766)
- Deploy agent scripts and binaries during web UI upgrades, not just
  the main binary (Related to #760)
- Apply symlink resolution fix to standalone docker agent self-update
  to prevent cross-device rename failures (Related to #737)
2025-11-27 15:49:03 +00:00
rcourtman
8152197207 fix: mark unused parameters to satisfy unparam linter
Mark intentionally unused parameters with underscore to:
- Silence unparam warnings for legitimate unused parameters
- Keep function signatures intact for API compatibility
- Remove unused req from serveChecksum helper
2025-11-27 10:12:48 +00:00
rcourtman
f76c1fb43b chore: update to non-deprecated Docker SDK types
- Use container.Summary instead of types.Container
- Use swarmtypes.ServiceListOptions instead of types.ServiceListOptions
- Use swarmtypes.TaskListOptions instead of types.TaskListOptions

These types were deprecated in favor of package-specific types.
2025-11-27 09:36:05 +00:00
rcourtman
dc4669f9f6 security: harden agent installers and auto-update mechanism
Install script (scripts/install.sh):
- Add multi-platform support: Unraid, OpenRC/Alpine, Synology DSM 6/7
- Add input validation for URL, token format, and interval
- Add binary magic verification (ELF/Mach-O/PE)
- Add cleanup trap for temp files
- Wrap script in main() for partial download protection
- Fix shellcheck compliance issues
- Add curl timeouts

Agent auto-update (agentupdate, dockeragent):
- Enforce TLS 1.2 minimum version
- Make SHA256 checksum verification mandatory
- Add 100MB binary size limit
- Add binary magic verification before replacement
- Add Unraid persistent binary update after self-update
- Add 5-minute download timeout

Frontend:
- Update Linux install description to note auto-detection of init systems
2025-11-26 13:14:58 +00:00
rcourtman
9daf1d5398 fix: cache daemon ID at init to prevent Podman token binding conflicts
Podman can return unstable or empty daemon IDs across API calls. When
the agent fetched info.ID on every report cycle, this could cause the
agent identity to change mid-session, triggering "token already in use"
errors on the server.

Cache the daemon ID at initialization and use it consistently for all
reports.

Related to #740
2025-11-26 10:23:22 +00:00
rcourtman
ae3b78d661 fix: propagate unified agent version and improve legacy cleanup
Issues found during scenario testing:

1. Version propagation: The hostagent and dockeragent packages were
   reporting their own Version (0.1.0-dev) instead of the unified
   agent's version. Added AgentVersion config field to pass the
   parent's version down.

2. macOS legacy cleanup: The install.sh script was missing cleanup
   for pulse-docker-agent on macOS.

3. Windows legacy cleanup: The install.ps1 script was missing cleanup
   for legacy PulseHostAgent and PulseDockerAgent services.

These fixes ensure:
- Unified agent reports consistent version across host/docker metrics
- Legacy agents are properly removed on all platforms during upgrade
- Users migrating from legacy agents get a clean transition
2025-11-25 23:39:10 +00:00
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
courtmanr@gmail.com
4640633430 Improve agent update logging and installer warnings (related to #737) 2025-11-23 22:07:37 +00:00
rcourtman
6fb839cbdf Add log level control for docker agent
Related to #742
2025-11-22 07:43:48 +00:00
rcourtman
b44084af3c Skip false health alerts for Samsung 980/990 SSDs and improve Docker CPU calculation
Related to #547 and #622

## Samsung SSD Fix (#547)
Samsung 980 and 990 series SSDs have known firmware bugs that cause them to
report incorrect health status (typically FAILED or critical warnings) even
when the drives are actually healthy. This is commonly due to incorrect
temperature threshold reporting in the firmware.

This change adds special handling to detect these drives and skip health
status alerts while still monitoring wearout metrics, which remain reliable.
The fix also clears any existing false alerts for these drives.

Users experiencing these false alerts should update their Samsung SSD firmware
to the latest version from Samsung, which typically resolves the issue.

## Docker Agent CPU Fix (#622)
Addresses issue where Docker container CPU usage shows 0%. The Docker
agent uses ContainerStatsOneShot which typically doesn't populate
PreCPUStats, requiring manual delta tracking between collection cycles.

Changes:
- Fix logic bug where prevContainerCPU was updated before checking if
  previous sample existed, causing incorrect delta calculations
- Add comprehensive debug logging showing which calculation method
  succeeded (PreCPUStats, system delta, or time-based fallback)
- Add warning after 10 PreCPUStats failures to inform about manual
  tracking mode (normal for one-shot stats)
- Add detailed failure logging when CPU calculation cannot complete

Expected behavior: First collection cycle returns 0% (no previous
sample), subsequent cycles show accurate CPU metrics.
2025-11-05 19:33:16 +00:00
rcourtman
adda6eea38 Update docker CPU metrics and add OpenRC installer support (Refs #255) 2025-11-04 22:16:50 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
5c4be1921c chore: snapshot current changes 2025-11-02 22:47:55 +00:00
rcourtman
730c6bf864 Fix Docker agent removal and improve security
This commit addresses multiple issues in the Docker/host agent removal flow:

Agent Stop Fix:
- Add systemctl stop command after agent acknowledgement to prevent systemd restart
- Previous behavior: agent disabled but systemd immediately restarted it (Restart=always)
- New behavior: agent disables itself, sends ack, then stops systemd service completely

UX Improvements:
- Add real-time elapsed time counter during removal wait
- Show progress indicators prominently (no longer hidden in dropdown)
- Display expected time range (30-60 seconds) and last heartbeat
- Auto-show timeout warning after 2 minutes with actionable "Force remove" button
- Add contextual help explaining what's happening at each stage

Security Enhancement:
- Automatically revoke API tokens when removing Docker/host agents
- Previous behavior: tokens remained valid after agent removal
- New behavior: tokens are revoked and persisted immediately on removal
- Prevents removed agents from re-authenticating with old credentials
2025-10-29 12:27:36 +00:00
rcourtman
32392d1212 Add disk metrics, block I/O, and mount details to Docker monitoring
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)

Also fixes config watcher to consistently use production auth config path instead of following PULSE_DATA_DIR when in mock mode.
2025-10-29 12:05:36 +00:00
rcourtman
f2acdd59af Normalize docker agent version handling 2025-10-28 08:42:58 +00:00
rcourtman
68ce8e7520 feat: finalize swarm service monitoring (#598) 2025-10-26 09:35:49 +00:00
rcourtman
8e83eaf823 Add container state filtering to Docker agent 2025-10-25 21:40:59 +00:00
rcourtman
79dc620b34 Docker agent: add arch-aware self-update download
Refs #526
2025-10-16 08:43:59 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00