70 commits

Author SHA1 Message Date
rcourtman
f76c1fb43b chore: update to non-deprecated Docker SDK types
- Use container.Summary instead of types.Container
- Use swarmtypes.ServiceListOptions instead of types.ServiceListOptions
- Use swarmtypes.TaskListOptions instead of types.TaskListOptions

These types were deprecated in favor of package-specific types.
2025-11-27 09:36:05 +00:00
rcourtman
dc4669f9f6 security: harden agent installers and auto-update mechanism
Install script (scripts/install.sh):
- Add multi-platform support: Unraid, OpenRC/Alpine, Synology DSM 6/7
- Add input validation for URL, token format, and interval
- Add binary magic verification (ELF/Mach-O/PE)
- Add cleanup trap for temp files
- Wrap script in main() for partial download protection
- Fix shellcheck compliance issues
- Add curl timeouts

Agent auto-update (agentupdate, dockeragent):
- Enforce TLS 1.2 minimum version
- Make SHA256 checksum verification mandatory
- Add 100MB binary size limit
- Add binary magic verification before replacement
- Add Unraid persistent binary update after self-update
- Add 5-minute download timeout
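
The auto-update hardening listed above could look roughly like the following Go sketch. It is an illustration, not the agent's actual code: `newUpdateClient`, `readVerified`, and `verifyBytes` are hypothetical names, and only the limits themselves (TLS 1.2, 100MB, 5 minutes, mandatory SHA256) come from the commit message.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/subtle"
	"crypto/tls"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// maxBinarySize mirrors the 100MB limit named in the commit message.
const maxBinarySize = 100 * 1024 * 1024

// newUpdateClient (hypothetical) refuses TLS versions below 1.2 and bounds the
// whole download, including reading the body, to five minutes.
func newUpdateClient() *http.Client {
	return &http.Client{
		Timeout: 5 * time.Minute,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{MinVersion: tls.VersionTLS12},
		},
	}
}

// readVerified (hypothetical) reads at most limit bytes from r and returns
// them only if their SHA-256 digest matches wantHex. An empty checksum is an
// error, which is what makes verification mandatory in this sketch.
func readVerified(r io.Reader, wantHex string, limit int64) ([]byte, error) {
	if wantHex == "" {
		return nil, errors.New("checksum is mandatory")
	}
	want, err := hex.DecodeString(wantHex)
	if err != nil || len(want) != sha256.Size {
		return nil, errors.New("malformed SHA-256 checksum")
	}
	// Read one byte past the limit so an oversized body can be detected.
	data, err := io.ReadAll(io.LimitReader(r, limit+1))
	if err != nil {
		return nil, fmt.Errorf("read binary: %w", err)
	}
	if int64(len(data)) > limit {
		return nil, fmt.Errorf("binary exceeds %d bytes", limit)
	}
	sum := sha256.Sum256(data)
	if subtle.ConstantTimeCompare(sum[:], want) != 1 {
		return nil, errors.New("checksum mismatch")
	}
	return data, nil
}

// verifyBytes is a convenience wrapper for data that is already in memory.
func verifyBytes(data []byte, wantHex string, limit int64) ([]byte, error) {
	return readVerified(bytes.NewReader(data), wantHex, limit)
}
```

In a real updater the reader would be the HTTP response body, and the verified bytes would be written to a temporary file before the running binary is replaced.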

Frontend:
- Update Linux install description to note auto-detection of init systems
2025-11-26 13:14:58 +00:00
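
The "binary magic verification (ELF/Mach-O/PE)" mentioned in this commit can be illustrated with a small Go sketch. The function name `detectBinaryFormat` is hypothetical, and the install script itself is shell, so this only shows the idea as it might appear on the agent side: inspect the first bytes of a download and refuse anything that is not a known executable format.

```go
package main

import (
	"bytes"
	"encoding/binary"
)

// detectBinaryFormat (hypothetical) returns "elf", "macho", "pe", or "" when
// the header does not start with a recognised executable magic number.
func detectBinaryFormat(header []byte) string {
	// ELF files start with 0x7F followed by the ASCII letters "ELF".
	if len(header) >= 4 && bytes.Equal(header[:4], []byte{0x7f, 'E', 'L', 'F'}) {
		return "elf"
	}
	// PE files start with the DOS stub signature "MZ". A stricter check would
	// also follow the header offset to the "PE\0\0" signature.
	if len(header) >= 2 && header[0] == 'M' && header[1] == 'Z' {
		return "pe"
	}
	if len(header) >= 4 {
		switch binary.BigEndian.Uint32(header[:4]) {
		case 0xfeedface, 0xfeedfacf, // 32-bit and 64-bit, big-endian on disk
			0xcefaedfe, 0xcffaedfe, // 32-bit and 64-bit, little-endian on disk
			0xcafebabe: // universal binary (this value is shared with Java class files)
			return "macho"
		}
	}
	return ""
}
```

A caller would typically reject the download when the result is empty or does not match the platform it is running on.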
rcourtman
9daf1d5398 fix: cache daemon ID at init to prevent Podman token binding conflicts
Podman can return unstable or empty daemon IDs across API calls. When
the agent fetched info.ID on every report cycle, this could cause the
agent identity to change mid-session, triggering "token already in use"
errors on the server.

Cache the daemon ID at initialization and use it consistently for all
reports.

Related to #740
2025-11-26 10:23:22 +00:00
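
The fix described in this commit amounts to reading the daemon ID once and reusing it. A minimal Go sketch of that pattern follows; `infoFunc`, `agent`, and `newAgent` are hypothetical names standing in for the real agent types, and rejecting an empty ID at startup is an assumption rather than confirmed behaviour.

```go
package main

import "errors"

// infoFunc stands in for a call to the container runtime's "info" endpoint
// and returns the daemon ID it reports.
type infoFunc func() (string, error)

// agent is a hypothetical, pared-down agent that holds the cached identity.
type agent struct {
	daemonID string
}

// newAgent fetches the daemon ID exactly once, at initialization.
func newAgent(fetch infoFunc) (*agent, error) {
	id, err := fetch()
	if err != nil {
		return nil, err
	}
	if id == "" {
		// Assumption: an empty ID at startup is treated as an error.
		return nil, errors.New("daemon returned an empty ID")
	}
	return &agent{daemonID: id}, nil
}

// reportID returns the cached ID, so every report in a session carries the
// same identity even if the runtime later returns a different or empty value.
func (a *agent) reportID() string {
	return a.daemonID
}
```

Because `reportID` never calls the runtime again, an unstable ID from Podman can no longer change the agent's identity mid-session.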
rcourtman
ae3b78d661 fix: propagate unified agent version and improve legacy cleanup
Issues found during scenario testing:

1. Version propagation: The hostagent and dockeragent packages were
   reporting their own Version (0.1.0-dev) instead of the unified
   agent's version. Added AgentVersion config field to pass the
   parent's version down.

2. macOS legacy cleanup: The install.sh script was missing cleanup
   for pulse-docker-agent on macOS.

3. Windows legacy cleanup: The install.ps1 script was missing cleanup
   for legacy PulseHostAgent and PulseDockerAgent services.

These fixes ensure:
- Unified agent reports consistent version across host/docker metrics
- Legacy agents are properly removed on all platforms during upgrade
- Users migrating from legacy agents get a clean transition
2025-11-25 23:39:10 +00:00
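
The version propagation in item 1 can be pictured as a simple fallback. In this Go sketch only the `AgentVersion` field name and the `0.1.0-dev` default come from the commit message; `Config`, `packageVersion`, and `reportedVersion` are hypothetical names used for illustration.

```go
package main

// packageVersion is the sub-package's own default, as quoted in the commit.
const packageVersion = "0.1.0-dev"

// Config is a hypothetical, pared-down configuration for a sub-agent.
type Config struct {
	// AgentVersion is set by the unified agent to pass its version down.
	AgentVersion string
}

// reportedVersion prefers the parent's version and falls back to the
// package default only when no parent version was supplied.
func reportedVersion(cfg Config) string {
	if cfg.AgentVersion != "" {
		return cfg.AgentVersion
	}
	return packageVersion
}
```

With this shape, host and Docker reports from the same unified agent carry one version string, while a standalone build still reports its own default.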
rcourtman
ea335546fc feat: improve legacy agent detection and migration UX
Add seamless migration path from legacy agents to unified agent:

- Add AgentType field to report payloads (unified vs legacy detection)
- Update server to detect legacy agents by type instead of version
- Add UI banner showing upgrade command when legacy agents are detected
- Add deprecation notice to install-host-agent.ps1
- Create install-docker-agent.sh stub that redirects to unified installer

Legacy agents (pulse-host-agent, pulse-docker-agent) now show a "Legacy"
badge in the UI with a one-click copy command to upgrade to the unified
agent.
2025-11-25 23:26:22 +00:00
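
Detecting legacy agents by type rather than by version might look like the Go sketch below. The JSON field name `agentType` and the value `"unified"` are assumptions made for illustration; the commit only states that an AgentType field was added to report payloads. The sketch relies on the fact that agents predating the field would send no value at all.

```go
package main

import "encoding/json"

// report is a hypothetical, pared-down report payload.
type report struct {
	AgentType string `json:"agentType"` // assumed field name
	Version   string `json:"version"`
}

// isLegacyPayload decodes a report and treats anything that does not
// identify itself as "unified" (including a missing field) as legacy.
func isLegacyPayload(raw []byte) (bool, error) {
	var r report
	if err := json.Unmarshal(raw, &r); err != nil {
		return false, err
	}
	return r.AgentType != "unified", nil
}
```

Keying on the type avoids guessing from version strings, which legacy and unified agents could otherwise share.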
courtmanr@gmail.com
4640633430 Improve agent update logging and installer warnings (related to #737) 2025-11-23 22:07:37 +00:00
rcourtman
6fb839cbdf Add log level control for docker agent
Related to #742
2025-11-22 07:43:48 +00:00
rcourtman
fdcec85931 Fix critical version embedding issues for 4.26 release
Addresses the root cause of issue #631 (infinite Docker agent restart loop)
and prevents similar issues with host-agent and sensor-proxy.

Changes:
- Set dockeragent.Version default to "dev" instead of hardcoded version
- Add version embedding to server build in Dockerfile
- Add version embedding to host-agent builds (all platforms)
- Add version embedding to sensor-proxy builds (all platforms)

This ensures:
1. Server's /api/agent/version endpoint returns correct v4.26.0
2. Downloaded agent binaries have matching embedded versions
3. Dev builds skip auto-update (Version="dev")
4. No version mismatch triggers infinite restart loops

Related to #631
2025-11-06 11:42:52 +00:00
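
Version embedding of the kind described here is commonly done in Go with the linker's `-X` flag, which overwrites a package-level string variable at build time. The sketch below assumes that approach; `shouldAutoUpdate` and `normalize` are hypothetical helpers, and comparing for inequality (rather than "newer than") is an assumption about the update policy.

```go
package main

import "strings"

// Version defaults to "dev" and is meant to be overridden at build time, e.g.:
//
//	go build -ldflags "-X main.Version=v4.26.0"
var Version = "dev"

// normalize strips surrounding whitespace and a leading "v" so that
// "v4.26.0" and "4.26.0" compare as equal.
func normalize(v string) string {
	return strings.TrimPrefix(strings.TrimSpace(v), "v")
}

// shouldAutoUpdate (hypothetical) reports whether the agent should replace
// itself. Dev builds never update, and matching versions never update, which
// is what prevents a restart loop when the embedded version is correct.
func shouldAutoUpdate(current, server string) bool {
	if current == "" || current == "dev" || server == "" {
		return false
	}
	return normalize(current) != normalize(server)
}
```

If a release binary were built without the `-X` flag and carried a hardcoded version instead, the comparison would never settle and every update would trigger another one, which matches the loop described in issue #631.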
rcourtman
b44084af3c Skip false health alerts for Samsung 980/990 SSDs and improve Docker CPU calculation
Related to #547 and #622

## Samsung SSD Fix (#547)
Samsung 980 and 990 series SSDs have known firmware bugs that cause them to
report incorrect health status (typically FAILED or critical warnings) even
when the drives are actually healthy. This is commonly due to incorrect
temperature threshold reporting in the firmware.

This change adds special handling to detect these drives and skip health
status alerts while still monitoring wearout metrics, which remain reliable.
The fix also clears any existing false alerts for these drives.

Users experiencing these false alerts should update their Samsung SSD firmware
to the latest version from Samsung, which typically resolves the issue.

## Docker Agent CPU Fix (#622)
Addresses an issue where Docker container CPU usage shows 0%. The Docker
agent uses ContainerStatsOneShot which typically doesn't populate
PreCPUStats, requiring manual delta tracking between collection cycles.

Changes:
- Fix logic bug where prevContainerCPU was updated before checking if
  previous sample existed, causing incorrect delta calculations
- Add comprehensive debug logging showing which calculation method
  succeeded (PreCPUStats, system delta, or time-based fallback)
- Add warning after 10 PreCPUStats failures to inform about manual
  tracking mode (normal for one-shot stats)
- Add detailed failure logging when CPU calculation cannot complete

Expected behavior: First collection cycle returns 0% (no previous
sample), subsequent cycles show accurate CPU metrics.
2025-11-05 19:33:16 +00:00
rcourtman
adda6eea38 Update docker CPU metrics and add OpenRC installer support (Refs #255) 2025-11-04 22:16:50 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
5c4be1921c chore: snapshot current changes 2025-11-02 22:47:55 +00:00
rcourtman
730c6bf864 Fix Docker agent removal and improve security
This commit addresses multiple issues in the Docker/host agent removal flow:

Agent Stop Fix:
- Add systemctl stop command after agent acknowledgement to prevent systemd restart
- Previous behavior: agent disabled but systemd immediately restarted it (Restart=always)
- New behavior: agent disables itself, sends ack, then stops systemd service completely

UX Improvements:
- Add real-time elapsed time counter during removal wait
- Show progress indicators prominently (no longer hidden in dropdown)
- Display expected time range (30-60 seconds) and last heartbeat
- Auto-show timeout warning after 2 minutes with actionable "Force remove" button
- Add contextual help explaining what's happening at each stage

Security Enhancement:
- Automatically revoke API tokens when removing Docker/host agents
- Previous behavior: tokens remained valid after agent removal
- New behavior: tokens are revoked and persisted immediately on removal
- Prevents removed agents from re-authenticating with old credentials
2025-10-29 12:27:36 +00:00
rcourtman
32392d1212 Add disk metrics, block I/O, and mount details to Docker monitoring
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)

Also fixes the config watcher to consistently use the production auth config path instead of following PULSE_DATA_DIR when in mock mode.
2025-10-29 12:05:36 +00:00
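
A flag that is enabled by default but can be switched off, like `--collect-disk`, can be expressed with Go's standard `flag` package. This is a sketch under the assumption that the agent uses that package; `parseCollectDisk` is a hypothetical helper and the help text is invented for illustration.

```go
package main

import (
	"flag"
	"io"
)

// parseCollectDisk (hypothetical) parses command-line arguments and returns
// whether disk metrics should be collected. The default is true.
func parseCollectDisk(args []string) (bool, error) {
	fs := flag.NewFlagSet("pulse-docker-agent", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage output quiet in this sketch
	collectDisk := fs.Bool("collect-disk", true, "collect container disk metrics")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *collectDisk, nil
}
```

With the standard `flag` package a boolean must be disabled as `--collect-disk=false`; writing `--collect-disk false` leaves the flag on and treats `false` as a positional argument.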
rcourtman
f2acdd59af Normalize docker agent version handling 2025-10-28 08:42:58 +00:00
rcourtman
68ce8e7520 feat: finalize swarm service monitoring (#598) 2025-10-26 09:35:49 +00:00
rcourtman
8e83eaf823 Add container state filtering to Docker agent 2025-10-25 21:40:59 +00:00
rcourtman
79dc620b34 Docker agent: add arch-aware self-update download
Refs #526
2025-10-16 08:43:59 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00