Commit graph

66 commits

Author SHA1 Message Date
rcourtman
93475f3941 Self-heal stale Proxmox auto-register markers (#1267) 2026-03-25 12:34:50 +00:00
rcourtman
930738593b Pass setup token in Proxmox auto-register requests (#1303) 2026-03-25 11:52:46 +00:00
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
rcourtman
fe0706f614 Fix cluster double-registration invalidating Proxmox credentials (#1319)
Two nodes in the same PVE cluster generated identical Proxmox API token
names, so the second node's setup rotated the shared token and broke the
first node. Include the hostname in the token name so each node gets its
own token. Also refresh the stored cluster credential on the server when
a new endpoint merges into an existing cluster entry.
2026-03-07 22:36:01 +00:00
rcourtman
499ab812e3 Fix post-release regressions and lock v5 to single-tenant runtime 2026-03-05 23:46:35 +00:00
rcourtman
b38488f2da fix(proxmox): stabilize pulse monitor token lifecycle 2026-03-03 10:57:19 +00:00
rcourtman
dacf0f86c4 fix(agent): collect temperature on FreeBSD via sysctl (#1254)
The agent gate only allowed temperature collection on Linux (lm-sensors).
FreeBSD exposes CPU and ACPI thermal zone temperatures via sysctl
(dev.cpu.N.temperature, hw.acpi.thermal.tzN.temperature). Parse sysctl
output directly in Go without shell involvement.
2026-02-20 19:00:40 +00:00
rcourtman
8c7d507ea4 fix(alerts): make --disk-exclude suppress Proxmox SSD wear/health alerts (#1142)
The --disk-exclude agent flag only filtered local metric collection but
had no effect on server-side Proxmox disk health and SSD wearout alerts,
which poll the Proxmox API directly. Users excluding disks (e.g.
--disk-exclude sda) still received alerts for those disks.

Agent now sends its DiskExclude patterns in each report. The server
stores them on the Host model and consults them during Proxmox disk
polling — excluded disks get a synthetic healthy status passed to
CheckDiskHealth so any existing alerts clear immediately.

Also adds FreeBSD pseudo-filesystem types (fdescfs, devfs, linprocfs,
linsysfs) to the virtual FS filter and /var/run/ to special mount
prefixes, fixing false disk-full alerts on FreeBSD for fdescfs mounts.
2026-02-20 13:31:52 +00:00
rcourtman
00afaec2ae fix(agent): add retry with backoff to Proxmox auto-registration (#1267, #1269, #1261, #1268)
registerWithPulse() was a one-shot call at agent startup — if it failed
(timing, transient network, Pulse not ready), the agent silently continued
as a generic Host forever. Wrap the HTTP POST in a retry loop with
exponential backoff (5s, 10s, 20s, 40s, 60s) and distinguish 4xx errors
(no retry) from 5xx/network errors (retry).
2026-02-18 16:05:40 +00:00
rcourtman
7efcec3120 fix(agents,ai): host URL field, AI Docker routing, Proxmox registration logging (#1197, #1210, #1267)
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.

#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.

#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.

(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
2026-02-18 12:57:09 +00:00
rcourtman
47adcbd8af feat(agent): add FreeBSD S.M.A.R.T. disk collection support (#1236)
Relax the Linux-only gate on SMART collection to also run on FreeBSD.
Add FreeBSD disk discovery via sysctl kern.disks (lsblk is Linux-only).
The smartctl invocation and JSON parsing are already platform-agnostic.
2026-02-10 12:44:15 +00:00
rcourtman
815c990e85 fix(proxmox): avoid 403 on apt update checks 2026-02-09 20:28:09 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
316a56299c fix(agent): grant PVEDatastoreAdmin for backup visibility
The unified agent's Proxmox setup was missing the PVEDatastoreAdmin
permission on /storage, causing local PVE backups to not appear in
Pulse's backup overview for users who set up nodes via the agent.

The UI-generated setup script already included this permission, but
the agent path (--enable-proxmox) did not, creating an inconsistency.

Related to #1139
2026-02-03 19:11:25 +00:00
rcourtman
19a67dd4f3 Update core infrastructure components
Config:
- AI configuration improvements
- API tokens handling
- Persistence layer updates

Host Agent:
- Command execution improvements
- Better test coverage

Infrastructure Discovery:
- Service improvements
- Enhanced test coverage

Models:
- State snapshot updates
- Model improvements

Monitoring:
- Polling improvements
- Guest config handling
- Storage config support

WebSocket:
- Hub tenant test updates

Service Discovery:
- New service discovery module
2026-01-28 16:52:35 +00:00
rcourtman
54a3e7f4af feat: add host agent sysinfo and improve test coverage
New Features:
- Add sysinfo module for system information collection
- Enhance agent with improved metrics handling

Test Coverage:
- Add sysinfo tests
- Add commands coverage tests
- Add hostagent coverage tests
- Add mock collector for testing
- Improve agent, metrics, sensors, and proxmox setup tests
2026-01-24 22:42:46 +00:00
rcourtman
8412cc7ddb fix: env overrides and OS-aware test improvements
- Add PBS/PMG polling interval environment variable overrides in config.go
- Fix temp path expectation in detect_root_test.go using filepath.Join
- Use EvalSymlinks for symlink target comparison in self_update_test.go
- Add Linux-only skip for MAC fallback test in agent_new_test.go
- Add OS-aware RAID/SMART assertions in agent_metrics_test.go
2026-01-22 13:49:05 +00:00
rcourtman
a383f06848 fix(test): add stateFileDir to TestRun_Legacy test setup 2026-01-20 17:43:58 +00:00
rcourtman
a6a8efaa65 test: Add comprehensive test coverage across packages
New test files with expanded coverage:

API tests:
- ai_handler_test.go: AI handler unit tests with mocking
- agent_profiles_tools_test.go: Profile management tests
- alerts_endpoints_test.go: Alert API endpoint tests
- alerts_test.go: Updated for interface changes
- audit_handlers_test.go: Audit handler tests
- frontend_embed_test.go: Frontend embedding tests
- metadata_handlers_test.go, metadata_provider_test.go: Metadata tests
- notifications_test.go: Updated for interface changes
- profile_suggestions_test.go: Profile suggestion tests
- saml_service_test.go: SAML authentication tests
- sensor_proxy_gate_test.go: Sensor proxy tests
- updates_test.go: Updated for interface changes

Agent tests:
- dockeragent/signature_test.go: Docker agent signature tests
- hostagent/agent_metrics_test.go: Host agent metrics tests
- hostagent/commands_test.go: Command execution tests
- hostagent/network_helpers_test.go: Network helper tests
- hostagent/proxmox_setup_test.go: Updated setup tests
- kubernetesagent/*_test.go: Kubernetes agent tests

Core package tests:
- monitoring/kubernetes_agents_test.go, reload_test.go
- remoteconfig/client_test.go, signature_test.go
- sensors/collector_test.go
- updates/adapter_installsh_*_test.go: Install adapter tests
- updates/manager_*_test.go: Update manager tests
- websocket/hub_*_test.go: WebSocket hub tests

Library tests:
- pkg/audit/export_test.go: Audit export tests
- pkg/metrics/store_test.go: Metrics store tests
- pkg/proxmox/*_test.go: Proxmox client tests
- pkg/reporting/reporting_test.go: Reporting tests
- pkg/server/*_test.go: Server tests
- pkg/tlsutil/extra_test.go: TLS utility tests

Total: ~8000 lines of new test code
2026-01-19 19:26:18 +00:00
rcourtman
d06ed2edb3 refactor: Add testability improvements to core packages
hostagent/commands.go:
- Extract execCommandContext as mockable variable

hostagent/proxmox_setup.go:
- Convert stateFilePath constants to variables (testable)
- Extract runCommand and lookPath as mockable functions
- Add duplicate comment (minor cleanup needed)

notifications/notifications.go:
- Add GetQueueStats() method for interface compliance
- Used by NotificationMonitor interface

updates/manager.go:
- Add AddSSEClient, RemoveSSEClient, GetSSECachedStatus methods
- Enables interface-based SSE client management

pkg/audit/export.go:
- Minor testability improvements

go.mod/go.sum:
- Add stretchr/objx v0.5.2 (test mocking dependency)
2026-01-19 19:25:38 +00:00
rcourtman
6ed1fdf806 feat(rbac): implement RBAC UI, OIDC group mapping, and API standard auth
- Added Roles and Users settings panels
- Implemented OIDC group-to-role mappings in config and auth flow
- Standardized API token context handling via pkg/auth
- Added Pulse Pro branding and upgrade banners to RBAC features
- Cleanup: Removed empty code blocks and fixed lint errors
2026-01-09 19:16:34 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929) 2026-01-09 14:45:29 +00:00
rcourtman
6019e3e77e fix: normalize custom OpenAI-compatible API URLs (#1067)
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".

- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
2026-01-09 09:13:36 +00:00
rcourtman
5f0214b949 fix: support ReportIP override in Proxmox auto-registration (#1061) 2026-01-08 21:20:51 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
95fb896a03 fix: Agent 405 errors when reverse proxy redirects HTTP to HTTPS
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.

Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.

Related to #1058
2026-01-07 17:56:07 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
c1f4b8f40b feat: PULSE_DISK_EXCLUDE now applies to SMART monitoring. Related to #983
Previously, the PULSE_DISK_EXCLUDE environment variable and --disk-exclude
flag only filtered mount points in the hostmetrics collector. This change
extends the exclusion to SMART data collection.

Changes:
- Updated smartctl.CollectLocal() to accept diskExclude patterns
- Added matchesDeviceExclude() for block device pattern matching
- Patterns support: exact match (sda), prefix (nvme*), contains (*cache*)
- Updated hostagent to pass DiskExclude to SMART collector
- Added comprehensive tests for pattern matching
- Updated documentation
2025-12-31 23:07:01 +00:00
rcourtman
32111c7837 feat: Add --report-ip flag for multi-NIC systems (issue #945)
Allows specifying which IP address the agent should report, useful for:
- Multi-homed systems with separate management networks
- Systems with private monitoring interfaces
- VPN/overlay network scenarios

Usage:
  pulse-agent --report-ip 192.168.1.100
  PULSE_REPORT_IP=192.168.1.100 pulse-agent
2025-12-29 09:28:28 +00:00
rcourtman
39941a3927 fix(agent): use IP that can reach Pulse for registration
When a Proxmox host has multiple network interfaces (management, Ceph,
cluster ring), the agent would use heuristic scoring to pick an IP,
which could select an isolated network instead of the management network.

Now the agent first determines which local IP is actually used to connect
to the Pulse server, ensuring registration uses a reachable IP. Falls back
to the heuristic scoring if connection-based detection fails.

Related to #929
2025-12-27 17:06:20 +00:00
rcourtman
81718fcdaa fix(agent): use specific distro name instead of family for osName
Ubuntu was showing as "debian 24.04" because we used PlatformFamily
(which is "debian" for all Debian derivatives) instead of Platform
(which is "ubuntu" for Ubuntu).

Now uses Platform first, falling back to PlatformFamily only if empty.

Related to #927
2025-12-27 15:59:03 +00:00
rcourtman
861be84f8c fix(agent): improve backward compat for PBS-only hosts. Related to #925
The legacy state file could represent either PVE or PBS registration,
depending on what was installed at the time. Now we check what's
currently installed to determine the correct behavior:
- If PVE is installed: legacy file means PVE was registered
- If PBS-only (no PVE): legacy file means PBS was registered
2025-12-27 10:46:51 +00:00
rcourtman
0865ca3512 feat(agent): detect and register both PVE and PBS on same host. Related to #925
When PBS is installed directly on a PVE host (an officially supported
configuration), the agent now detects and registers BOTH products instead
of only detecting PVE.

Changes:
- Add detectProxmoxTypes() to detect all Proxmox products on a host
- Add RunAll() method to register each detected product separately
- Use per-type state files (proxmox-pve-registered, proxmox-pbs-registered)
  to track registration status for each product independently
- Maintain backward compatibility with legacy single state file
- Add tests for new state file path logic
2025-12-27 10:41:44 +00:00
rcourtman
08c04b78ae feat: add power consumption monitoring (Intel RAPL + AMD Energy)
- Add power.go with Intel RAPL and AMD energy driver support
- Read CPU package, core, and DRAM power consumption in watts
- Sample energy counters over 100ms interval to calculate power
- Add PowerWatts field to Sensors struct for API reporting
- Integrate power collection into host agent sensor gathering
- Add comprehensive tests for power collection module

Supports Intel CPUs (Sandy Bridge+) via RAPL and AMD Ryzen/EPYC
via the amd_energy kernel module.

Closes community-scripts/ProxmoxVE#9575
2025-12-25 21:14:12 +00:00
rcourtman
7dd6c0d57a feat: Collect and display all lm-sensors data (fans, DDR5, etc.)
Extended lm-sensors parsing to capture all sensor readings:
- Fan speeds (RPM) from SuperIO chips like NCT6687
- Additional temperatures (DDR5/RAM, motherboard, etc.)
- All sensors not already captured as CPU/NVMe/GPU

Updated frontend tooltip to display fans and additional sensors
in separate sections with formatted names.

Closes discussion #911
2025-12-25 19:08:03 +00:00
rcourtman
c1422882bd feat: Add disk exclusion filter for host agent. Closes #896
Users can now exclude specific mount points from disk monitoring:
- Via CLI: --disk-exclude /mnt/backup --disk-exclude '/media/*'
- Via env: PULSE_DISK_EXCLUDE=/mnt/backup,*pbs*

Patterns support:
- Exact paths: /mnt/backup
- Prefix patterns: /mnt/ext*
- Contains patterns: *pbs*

This addresses the common case where external disks or
PBS datastores are being monitored but shouldn't be.
2025-12-25 12:04:40 +00:00
rcourtman
8f9d5c1120 feat: Agent collects S.M.A.R.T. disk data via smartctl. Related to #907
- Add smartctl package to collect disk temperature and health data
- Add SMART field to agent Sensors struct
- Host agent now runs smartctl to collect disk temps when available
- Backend processes agent SMART data for temperature display
- Graceful fallback when smartctl not installed
2025-12-25 11:37:53 +00:00
rcourtman
9a9c50f8b1 fix: Properly close command client WebSocket when disabling remotely
When the server disables command execution for an agent, we now properly
call Close() on the command client to tear down the WebSocket connection.
Previously we just set the pointer to nil which left the goroutine running
with an orphaned connection.
2025-12-25 08:09:42 +00:00
rcourtman
55f63d5e96 feat(#903): Remote agent configuration for AI command execution
This implements full remote configuration for the AI command execution setting:

Backend:
- Add CommandsEnabled field to HostMetadata for persistent storage
- Add GetHostAgentConfig/UpdateHostAgentConfig methods to Monitor
- Add /api/agents/host/{id}/config endpoint (GET for agents, PATCH for UI)
- Server includes config in report response for immediate agent application
- Agent parses response and dynamically enables/disables command client

Frontend:
- Add 'AI Commands' toggle column in Managed Agents table
- Toggle immediately updates server config; agent applies on next heartbeat
- Add 'Enable AI command execution' checkbox in agent installer wizard
- Checkbox adds --enable-commands flag to generated install commands

This allows users to:
1. Enable at install time via checkbox in the wizard
2. Toggle remotely via the Managed Agents UI for existing agents
3. Agents apply changes automatically on their next report cycle
2025-12-25 08:07:28 +00:00
rcourtman
598285d3d2 feat: Agent reports CommandsEnabled status to server. Related to #903
- Add CommandsEnabled field to AgentInfo in pkg/agents/host/report.go
- Agent now reports whether AI command execution is enabled
- Server stores and exposes this via Host model
- Frontend can now show which agents have commands enabled
- This provides visibility before implementing remote configuration
2025-12-25 07:55:22 +00:00
rcourtman
2420c2affb feat: Commands disabled by default, require --enable-commands to opt-in
BREAKING CHANGE: AI command execution on agents is now disabled by default.
Users who want AI auto-fix must explicitly enable it with --enable-commands
flag or PULSE_ENABLE_COMMANDS=true environment variable.

Changes:
- Add --enable-commands flag (opt-in for command execution)
- Commands disabled by default for security (defense-in-depth)
- --disable-commands is now deprecated (logs warning, no longer needed)
- PULSE_DISABLE_COMMANDS deprecated in favor of PULSE_ENABLE_COMMANDS
- Update installer script to use --enable-commands
- Backwards compatibility: PULSE_DISABLE_COMMANDS=false still enables commands

This addresses community feedback about secure defaults for arbitrary
command execution on production infrastructure.

Related to #889
2025-12-24 17:36:44 +00:00
rcourtman
92988ae0e6 fix: allow duplicate hostnames for different Proxmox hosts. Related to #891
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.

ROOT CAUSE:
The auto-register handler matched existing nodes by BOTH Host URL and Name.
Matching by name is incorrect - different physical hosts can share hostnames.

FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again

Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
2025-12-24 16:05:07 +00:00
rcourtman
28ac86c8ab fix: reduce WebSocket reconnection log noise in host agent
Addresses #866 - agents were logging 'WebSocket connection failed' warnings
even during normal reconnection scenarios (server restart, network blip, etc).

Changes:
- Normal close errors (1000, 1001, connection reset) now log at Debug level
- Only log Warning after 3+ consecutive failures
- Changed 'Connecting to Pulse' from Info to Debug to reduce noise
- Successful connections still log at Info level

The WebSocket is only used for AI command execution, not metrics, so
transient disconnections don't affect monitoring functionality.
2025-12-22 14:11:23 +00:00
rcourtman
c9fc827f4c fix: Prevent buffering and log actionable error for host agent 403s. Related to discussion #845 2025-12-22 09:51:27 +00:00
rcourtman
ebc3474647 hostagent: avoid identity collisions with MAC fallback (Related to #836) 2025-12-17 20:09:55 +00:00
rcourtman
30f01771ac Add meaningful tests for host agent and exec websocket 2025-12-17 17:02:01 +00:00
rcourtman
d663ba4342 hostagent: avoid host ID collisions and prefer LAN IP 2025-12-17 16:29:59 +00:00
rcourtman
204219ab7f fix(agent): use /etc/machine-id in LXC containers to avoid ID collisions
LXC containers share the host's /sys/class/dmi/id/product_uuid, which
causes gopsutil to return identical HostIDs for all LXC containers on
the same physical host. This results in agent ID collisions where
multiple LXC containers appear as a single host in Pulse.

The fix detects LXC containers and prefers /etc/machine-id (which is
unique per container) over gopsutil's HostID.

Related to #773
2025-12-14 23:05:32 +00:00
rcourtman
927ac76bad feat: AI integration, Docker metrics, RAID display, and infrastructure improvements
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
2025-12-09 09:29:27 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00