88 commits

Author SHA1 Message Date
rcourtman
c51708000f Tighten unified agent hardening proof 2026-04-23 23:37:25 +01:00
rcourtman
9bada35337 Harden unified agent runtime and installer 2026-04-23 23:04:18 +01:00
rcourtman
386099aeee Surface ZFS pool membership on physical disks 2026-04-23 20:38:33 +01:00
rcourtman
60d7db6ef9 Harden agentexec token binding and disk filtering 2026-04-23 15:54:48 +01:00
rcourtman
eb98c13896 Allow insecure dev HTTP agent runtime URLs 2026-04-23 13:48:54 +01:00
rcourtman
b33e21e0e8 Add least-privilege SSH deploy mode 2026-04-22 15:23:02 +01:00
rcourtman
ccb2edc3b8 Require explicit websocket origin continuity 2026-04-22 04:46:13 +01:00
rcourtman
d64f5b2917 Canonicalize loopback-only Pulse transport validation 2026-04-22 04:11:18 +01:00
rcourtman
7b1520b760 Add fingerprint-pinned TLS mode for unified agent 2026-04-22 01:36:46 +01:00
rcourtman
c49176d700 Require TLS for non-loopback agent transport 2026-04-21 23:56:07 +01:00
rcourtman
3ec2c0779e Harden agent command and deploy trust boundaries 2026-04-21 23:50:34 +01:00
rcourtman
c4a4d175ce Fix v6 dry run backend contract regressions 2026-04-20 14:57:49 +01:00
rcourtman
40a25f82d1 Add periodic Proxmox registration health check loop
The cold-startup race: if Pulse and the agent restart together, the
monitor has no connection-health data when the agent calls
checkRegistrationWithPulse at startup. The server defaults to
registered=true (no known-disconnected entry), so the agent skips
re-registration even though the token is stale. The node stays broken
until the next manual agent restart.

Fix: after the initial runProxmoxSetup call, start a background goroutine
that waits 2 minutes (giving the monitor time to poll PVE and record
failure state), then rechecks every 5 minutes via RunHealthCheck.

RunHealthCheck only acts on types that have a local registration marker.
Types without a marker are skipped to prevent uncontrolled token rotation
when Pulse is temporarily unreachable — those need a full startup setup
cycle via RunAll.

Together with the two earlier commits this closes all three stale-token
scenarios: install-time 401, long-running stale state, and cold-startup
race.
2026-04-18 22:25:18 +01:00
rcourtman
501c61b82f Fix PVE stale token self-healing after failed registration
Two gaps in the existing flow allowed a disconnected PVE node to stay
broken indefinitely even after the agent restarted:

1. Server-side: autoRegisteredNodeExists checked only that a PVE/PBS
   instance existed in the config, not whether its connection was
   healthy. A node with a stale token would return registered=true on
   every check, causing the agent to skip re-registration forever.
   Fixed: also consult GetConnectionStatuses(); return registered=false
   when the monitor has a definitive disconnected entry so the agent can
   rotate and re-register.

2. Agent-side: the type-specific registration marker was cleared only on
   success. If rotation succeeded but the Pulse update failed (e.g.
   transient network error), the old marker from a previous successful
   registration persisted, leaving next-startup to skip setup again.
   Fixed: clear the marker before entering the token setup/rotation
   phase so any failure leaves the system in a retriable state.

Together these two fixes make the stale-token scenario self-healing:
the monitor detects the broken connection, the next agent startup sees
registered=false, clears its marker, rotates the token, and updates
Pulse — without manual intervention.
2026-04-18 22:07:30 +01:00
rcourtman
b0b790cf55 Fix PVE token re-registration after agent reinstall
When the agent is reinstalled on a Proxmox host, it rotates the PVE API
token in Proxmox. However, the Pulse server's /api/setup-script-url
endpoint requires the settings:write scope, and agent tokens only have
agent:report, so the resulting 401 aborted the update, leaving Pulse
with a stale token and a disconnected PVE node.

Three-part fix:
- server: accept agent API tokens on /api/auto-register for updating
  existing nodes (new nodes still require setup-token auth)
- agent: fall through instead of aborting when setup token fetch returns
  4xx; send X-API-Token header so the server can authenticate via the
  agent token instead
- update: allow HTTP auto-update URLs for RFC 1918 private network
  addresses (LAN installs without HTTPS no longer block auto-update)
2026-04-18 21:44:42 +01:00
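As an illustration of the third part of this fix, a check that permits plain HTTP only for RFC 1918 addresses could be sketched as follows. The function name `isAllowedUpdateURL` is hypothetical, and the choice to accept only literal IPv4 addresses (without resolving hostnames) is an assumption, not something the commit states.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// isAllowedUpdateURL (hypothetical) accepts any https URL, and accepts http
// only when the host is a literal RFC 1918 IPv4 address. Hostnames are not
// resolved in this sketch.
func isAllowedUpdateURL(raw string) bool {
	u, err := url.Parse(raw)
	if err != nil {
		return false
	}
	switch u.Scheme {
	case "https":
		return true
	case "http":
		ip := net.ParseIP(u.Hostname())
		// To4 restricts the check to IPv4; IsPrivate covers 10.0.0.0/8,
		// 172.16.0.0/12 and 192.168.0.0/16 for IPv4 addresses.
		return ip != nil && ip.To4() != nil && ip.IsPrivate()
	default:
		return false
	}
}

func main() {
	for _, u := range []string{
		"https://pulse.example.com/download",
		"http://192.168.1.10:7655/download",
		"http://8.8.8.8/download",
	} {
		fmt.Println(u, isAllowedUpdateURL(u))
	}
}
```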
rcourtman
d03056f656 Port v5 NAS vendor identity and RAID normalization 2026-04-15 12:54:15 +01:00
rcourtman
05fa111ca1 Stabilize backend race tests for v6 RC publish 2026-04-11 22:46:34 +01:00
rcourtman
7028c95ed0 Hermeticize Linux SMART discovery tests 2026-04-09 21:28:37 +01:00
rcourtman
8de5c60b46 Self-heal stale Proxmox auto-register markers 2026-04-01 19:56:20 +01:00
rcourtman
24fc9a019b Forward-port SMART collector hardening 2026-04-01 15:15:59 +01:00
rcourtman
bf7ca9fa0b Select reachable Proxmox auto-register hosts 2026-04-01 13:49:40 +01:00
rcourtman
82ce98f7ca Prefer hostname endpoints for Proxmox auto-register 2026-04-01 12:49:17 +01:00
rcourtman
2afb96ee13 fix(release): align api and hostagent rc contracts 2026-03-26 17:08:48 +00:00
rcourtman
778a2577b6 feat: Pulse v6 release 2026-03-18 16:06:30 +00:00
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
rcourtman
fe0706f614 Fix cluster double-registration invalidating Proxmox credentials (#1319)
Two nodes in the same PVE cluster generated identical Proxmox API token
names, so the second node's setup rotated the shared token and broke the
first node. Include the hostname in the token name so each node gets its
own token. Also refresh the stored cluster credential on the server when
a new endpoint merges into an existing cluster entry.
2026-03-07 22:36:01 +00:00
rcourtman
499ab812e3 Fix post-release regressions and lock v5 to single-tenant runtime 2026-03-05 23:46:35 +00:00
rcourtman
b38488f2da fix(proxmox): stabilize pulse monitor token lifecycle 2026-03-03 10:57:19 +00:00
rcourtman
dacf0f86c4 fix(agent): collect temperature on FreeBSD via sysctl (#1254)
The agent gate only allowed temperature collection on Linux (lm-sensors).
FreeBSD exposes CPU and ACPI thermal zone temperatures via sysctl
(dev.cpu.N.temperature, hw.acpi.thermal.tzN.temperature). Parse sysctl
output directly in Go without shell involvement.
2026-02-20 19:00:40 +00:00
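Parsing such sysctl output in Go, without a shell, might look like the sketch below. The `name: 45.0C` line format is an assumption about what `sysctl` prints for these keys, and `parseSysctlTemps` is a hypothetical helper rather than the agent's actual function.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSysctlTemps (hypothetical) extracts temperatures in degrees Celsius
// from lines of the assumed form "dev.cpu.0.temperature: 45.0C".
// Lines that do not match are ignored.
func parseSysctlTemps(out string) map[string]float64 {
	temps := make(map[string]float64)
	for _, line := range strings.Split(out, "\n") {
		key, val, ok := strings.Cut(strings.TrimSpace(line), ":")
		if !ok || !strings.HasSuffix(key, ".temperature") {
			continue
		}
		val = strings.TrimSuffix(strings.TrimSpace(val), "C")
		c, err := strconv.ParseFloat(val, 64)
		if err != nil {
			continue
		}
		temps[key] = c
	}
	return temps
}

func main() {
	sample := "dev.cpu.0.temperature: 45.0C\n" +
		"hw.acpi.thermal.tz0.temperature: 27.9C\n" +
		"kern.ostype: FreeBSD\n"
	for name, c := range parseSysctlTemps(sample) {
		fmt.Printf("%s = %.1f\n", name, c)
	}
}
```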
rcourtman
8c7d507ea4 fix(alerts): make --disk-exclude suppress Proxmox SSD wear/health alerts (#1142)
The --disk-exclude agent flag only filtered local metric collection but
had no effect on server-side Proxmox disk health and SSD wearout alerts,
which poll the Proxmox API directly. Users excluding disks (e.g.
--disk-exclude sda) still received alerts for those disks.

Agent now sends its DiskExclude patterns in each report. The server
stores them on the Host model and consults them during Proxmox disk
polling — excluded disks get a synthetic healthy status passed to
CheckDiskHealth so any existing alerts clear immediately.

Also adds FreeBSD pseudo-filesystem types (fdescfs, devfs, linprocfs,
linsysfs) to the virtual FS filter and /var/run/ to special mount
prefixes, fixing false disk-full alerts on FreeBSD for fdescfs mounts.
2026-02-20 13:31:52 +00:00
rcourtman
00afaec2ae fix(agent): add retry with backoff to Proxmox auto-registration (#1267, #1269, #1261, #1268)
registerWithPulse() was a one-shot call at agent startup — if it failed
(timing, transient network, Pulse not ready), the agent silently continued
as a generic Host forever. Wrap the HTTP POST in a retry loop with
exponential backoff (5s, 10s, 20s, 40s, 60s) and distinguish 4xx errors
(no retry) from 5xx/network errors (retry).
2026-02-18 16:05:40 +00:00
rcourtman
7efcec3120 fix(agents,ai): host URL field, AI Docker routing, Proxmox registration logging (#1197, #1210, #1267)
#1197: Add Custom URL input to the expanded host row in Settings → Agents.
Loads existing URL via HostMetadataAPI on row expand; saves on button click.
Only shown for host-type agent rows.

#1210: Fix agent_connected always false for Docker hosts on Proxmox VMs.
connectedAgentHostnames now also marks Docker host hostnames reachable when
their matching VM/LXC has a node with a connected Proxmox agent, mirroring
the routing logic already used in the control path.

#1267/#1269: Improve Proxmox auto-registration failure logging. Response body
is now included in the error message, and the warning directs users to delete
the state file to force re-registration rather than claiming the node exists.

(cherry picked from commit 305f6d3c94f0da4fc970450a6304da57d6d7fe80)
2026-02-18 12:57:09 +00:00
rcourtman
47adcbd8af feat(agent): add FreeBSD S.M.A.R.T. disk collection support (#1236)
Relax the Linux-only gate on SMART collection to also run on FreeBSD.
Add FreeBSD disk discovery via sysctl kern.disks (lsblk is Linux-only).
The smartctl invocation and JSON parsing are already platform-agnostic.
2026-02-10 12:44:15 +00:00
rcourtman
815c990e85 fix(proxmox): avoid 403 on apt update checks 2026-02-09 20:28:09 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
316a56299c fix(agent): grant PVEDatastoreAdmin for backup visibility
The unified agent's Proxmox setup was missing the PVEDatastoreAdmin
permission on /storage, causing local PVE backups to not appear in
Pulse's backup overview for users who set up nodes via the agent.

The UI-generated setup script already included this permission, but
the agent path (--enable-proxmox) did not, creating an inconsistency.

Related to #1139
2026-02-03 19:11:25 +00:00
rcourtman
19a67dd4f3 Update core infrastructure components
Config:
- AI configuration improvements
- API tokens handling
- Persistence layer updates

Host Agent:
- Command execution improvements
- Better test coverage

Infrastructure Discovery:
- Service improvements
- Enhanced test coverage

Models:
- State snapshot updates
- Model improvements

Monitoring:
- Polling improvements
- Guest config handling
- Storage config support

WebSocket:
- Hub tenant test updates

Service Discovery:
- New service discovery module
2026-01-28 16:52:35 +00:00
rcourtman
54a3e7f4af feat: add host agent sysinfo and improve test coverage
New Features:
- Add sysinfo module for system information collection
- Enhance agent with improved metrics handling

Test Coverage:
- Add sysinfo tests
- Add commands coverage tests
- Add hostagent coverage tests
- Add mock collector for testing
- Improve agent, metrics, sensors, and proxmox setup tests
2026-01-24 22:42:46 +00:00
rcourtman
8412cc7ddb fix: env overrides and OS-aware test improvements
- Add PBS/PMG polling interval environment variable overrides in config.go
- Fix temp path expectation in detect_root_test.go using filepath.Join
- Use EvalSymlinks for symlink target comparison in self_update_test.go
- Add Linux-only skip for MAC fallback test in agent_new_test.go
- Add OS-aware RAID/SMART assertions in agent_metrics_test.go
2026-01-22 13:49:05 +00:00
rcourtman
a383f06848 fix(test): add stateFileDir to TestRun_Legacy test setup 2026-01-20 17:43:58 +00:00
rcourtman
a6a8efaa65 test: Add comprehensive test coverage across packages
New test files with expanded coverage:

API tests:
- ai_handler_test.go: AI handler unit tests with mocking
- agent_profiles_tools_test.go: Profile management tests
- alerts_endpoints_test.go: Alert API endpoint tests
- alerts_test.go: Updated for interface changes
- audit_handlers_test.go: Audit handler tests
- frontend_embed_test.go: Frontend embedding tests
- metadata_handlers_test.go, metadata_provider_test.go: Metadata tests
- notifications_test.go: Updated for interface changes
- profile_suggestions_test.go: Profile suggestion tests
- saml_service_test.go: SAML authentication tests
- sensor_proxy_gate_test.go: Sensor proxy tests
- updates_test.go: Updated for interface changes

Agent tests:
- dockeragent/signature_test.go: Docker agent signature tests
- hostagent/agent_metrics_test.go: Host agent metrics tests
- hostagent/commands_test.go: Command execution tests
- hostagent/network_helpers_test.go: Network helper tests
- hostagent/proxmox_setup_test.go: Updated setup tests
- kubernetesagent/*_test.go: Kubernetes agent tests

Core package tests:
- monitoring/kubernetes_agents_test.go, reload_test.go
- remoteconfig/client_test.go, signature_test.go
- sensors/collector_test.go
- updates/adapter_installsh_*_test.go: Install adapter tests
- updates/manager_*_test.go: Update manager tests
- websocket/hub_*_test.go: WebSocket hub tests

Library tests:
- pkg/audit/export_test.go: Audit export tests
- pkg/metrics/store_test.go: Metrics store tests
- pkg/proxmox/*_test.go: Proxmox client tests
- pkg/reporting/reporting_test.go: Reporting tests
- pkg/server/*_test.go: Server tests
- pkg/tlsutil/extra_test.go: TLS utility tests

Total: ~8000 lines of new test code
2026-01-19 19:26:18 +00:00
rcourtman
d06ed2edb3 refactor: Add testability improvements to core packages
hostagent/commands.go:
- Extract execCommandContext as mockable variable

hostagent/proxmox_setup.go:
- Convert stateFilePath constants to variables (testable)
- Extract runCommand and lookPath as mockable functions
- Add duplicate comment (minor cleanup needed)

notifications/notifications.go:
- Add GetQueueStats() method for interface compliance
- Used by NotificationMonitor interface

updates/manager.go:
- Add AddSSEClient, RemoveSSEClient, GetSSECachedStatus methods
- Enables interface-based SSE client management

pkg/audit/export.go:
- Minor testability improvements

go.mod/go.sum:
- Add stretchr/objx v0.5.2 (test mocking dependency)
2026-01-19 19:25:38 +00:00
rcourtman
6ed1fdf806 feat(rbac): implement RBAC UI, OIDC group mapping, and API standard auth
- Added Roles and Users settings panels
- Implemented OIDC group-to-role mappings in config and auth flow
- Standardized API token context handling via pkg/auth
- Added Pulse Pro branding and upgrade banners to RBAC features
- Cleanup: Removed empty code blocks and fixed lint errors
2026-01-09 19:16:34 +00:00
rcourtman
5c4399d69f feat(agent): add DisableCeph toggle, report_ip remote config, and improved IP detection (#929) 2026-01-09 14:45:29 +00:00
rcourtman
6019e3e77e fix: normalize custom OpenAI-compatible API URLs (#1067)
Users providing base URLs like "https://openrouter.ai/api/v1" were
getting HTML error responses because the client used the URL directly
without appending "/chat/completions".

- Normalize baseURL in NewOpenAIClient to ensure it ends with /chat/completions
- Fix modelsEndpoint() to derive /models from the normalized baseURL
- Add tests for URL normalization with various endpoint formats
2026-01-09 09:13:36 +00:00
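The normalization described here could be sketched as follows. `normalizeChatURL` and `modelsURL` are hypothetical helpers, not the bodies of `NewOpenAIClient` or `modelsEndpoint()`, and trimming trailing slashes is an assumption about how edge cases are handled.

```go
package main

import (
	"fmt"
	"strings"
)

const chatSuffix = "/chat/completions"

// normalizeChatURL (hypothetical) makes sure the URL ends with
// /chat/completions exactly once, ignoring trailing slashes.
func normalizeChatURL(base string) string {
	base = strings.TrimRight(base, "/")
	if strings.HasSuffix(base, chatSuffix) {
		return base
	}
	return base + chatSuffix
}

// modelsURL (hypothetical) derives the /models endpoint from a URL
// that has already been normalized.
func modelsURL(normalized string) string {
	return strings.TrimSuffix(normalized, chatSuffix) + "/models"
}

func main() {
	u := normalizeChatURL("https://openrouter.ai/api/v1")
	fmt.Println(u)
	fmt.Println(modelsURL(u))
}
```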
rcourtman
5f0214b949 fix: support ReportIP override in Proxmox auto-registration (#1061) 2026-01-08 21:20:51 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
95fb896a03 fix: Agent 405 errors when reverse proxy redirects HTTP to HTTPS
When a user's reverse proxy redirects HTTP to HTTPS, Go's default HTTP
client behavior converts POST requests to GET on 301/302 redirects
(per HTTP specification). This causes the Pulse server to return 405
"Only POST is allowed" errors.

Added CheckRedirect to all agent HTTP clients (host, docker, kubernetes)
that returns a clear error message guiding users to use the correct
protocol in their --url flag instead of silently following redirects.

Related to #1058
2026-01-07 17:56:07 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
c1f4b8f40b feat: PULSE_DISK_EXCLUDE now applies to SMART monitoring. Related to #983
Previously, the PULSE_DISK_EXCLUDE environment variable and --disk-exclude
flag only filtered mount points in the hostmetrics collector. This change
extends the exclusion to SMART data collection.

Changes:
- Updated smartctl.CollectLocal() to accept diskExclude patterns
- Added matchesDeviceExclude() for block device pattern matching
- Patterns support: exact match (sda), prefix (nvme*), contains (*cache*)
- Updated hostagent to pass DiskExclude to SMART collector
- Added comprehensive tests for pattern matching
- Updated documentation
2025-12-31 23:07:01 +00:00
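The three pattern forms listed in this commit (exact, prefix, contains) could be matched roughly as in the sketch below. This is not the project's `matchesDeviceExclude`; the names `matchPattern` and `matchesAnyExclude` are hypothetical, and comparing against the bare device name (with any `/dev/` prefix stripped) is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// matchPattern (hypothetical) supports three forms:
//   "sda"     exact match
//   "nvme*"   prefix match
//   "*cache*" substring match
func matchPattern(name, pattern string) bool {
	switch {
	case len(pattern) >= 2 && strings.HasPrefix(pattern, "*") && strings.HasSuffix(pattern, "*"):
		return strings.Contains(name, pattern[1:len(pattern)-1])
	case strings.HasSuffix(pattern, "*"):
		return strings.HasPrefix(name, strings.TrimSuffix(pattern, "*"))
	default:
		return name == pattern
	}
}

// matchesAnyExclude (hypothetical) reports whether the device matches any
// pattern. A leading "/dev/" is stripped before comparison (an assumption).
func matchesAnyExclude(device string, patterns []string) bool {
	name := strings.TrimPrefix(device, "/dev/")
	for _, p := range patterns {
		if matchPattern(name, p) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"sda", "nvme*", "*cache*"}
	for _, dev := range []string{"/dev/sda", "/dev/sdb", "/dev/nvme0n1", "/dev/bcache0"} {
		fmt.Println(dev, matchesAnyExclude(dev, patterns))
	}
}
```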