160 commits

Author SHA1 Message Date
rcourtman
93475f3941 Self-heal stale Proxmox auto-register markers (#1267) 2026-03-25 12:34:50 +00:00
rcourtman
8119050819 Accept tokenId/tokenSecret aliases for node config API (#1147) 2026-03-25 12:23:39 +00:00
rcourtman
b20221429f Harden Proxmox setup SSH key handling (#1297) 2026-03-25 11:27:25 +00:00
rcourtman
4d4344911a Harden PVE setup token extraction (#1312) 2026-03-25 11:09:19 +00:00
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
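Illustration for 572520ebc6: a minimal sketch of pulling MemAvailable out of /proc/meminfo text, as the guest-agent file read would return it. The function name and error handling are assumptions for illustration, not the code from the commit.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memAvailableBytes (hypothetical helper) scans /proc/meminfo-style text and
// returns the MemAvailable value converted from kB to bytes.
func memAvailableBytes(meminfo string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		// Expected line shape: "MemAvailable:    123456 kB"
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, fmt.Errorf("parse MemAvailable: %w", err)
			}
			return kb * 1024, nil
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("MemAvailable not found")
}
```

Because MemAvailable already accounts for reclaimable buff/cache, used memory can be derived as MemTotal minus this value without further correction.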
rcourtman
fe0706f614 Fix cluster double-registration invalidating Proxmox credentials (#1319)
Two nodes in the same PVE cluster generated identical Proxmox API token
names, so the second node's setup rotated the shared token and broke the
first node. Include the hostname in the token name so each node gets its
own token. Also refresh the stored cluster credential on the server when
a new endpoint merges into an existing cluster entry.
2026-03-07 22:36:01 +00:00
rcourtman
a6f6f66078 Improve auto-register auth errors and setup token grace window (#1319)
The /api/auto-register endpoint returned a generic "Invalid or expired
setup code" for all auth failures, making cluster registration issues
impossible to diagnose. Now returns specific errors for expired tokens,
wrong scope, invalid API tokens, etc.

Also extend the setup token grace window to /api/auto-register so
multiple cluster nodes can register with the same token within the
1-minute grace period after first use.
2026-03-07 13:39:26 +00:00
rcourtman
499ab812e3 Fix post-release regressions and lock v5 to single-tenant runtime 2026-03-05 23:46:35 +00:00
rcourtman
72be883f4e fix(proxmox): prevent broken TLS config on auto-register fingerprint failure (#1303)
When FetchFingerprint fails during agent auto-registration, set verifySSL
based on whether a fingerprint was captured rather than hardcoding true.
Also heal already-broken nodes (verifySSL=true with empty fingerprint) on
legacy re-register to prevent permanent connection failures with self-signed
Proxmox certs.
2026-03-05 10:01:43 +00:00
rcourtman
8818a740e2 fix(proxmox): prevent setup-script token drift and add lifecycle integration tests (#1312) 2026-03-03 20:11:01 +00:00
rcourtman
b38488f2da fix(proxmox): stabilize pulse monitor token lifecycle 2026-03-03 10:57:19 +00:00
rcourtman
510ec999ab fix(api): store TLS fingerprint during auto-registration (#1303)
The legacy auto-register endpoint captured TLS fingerprints via
FetchFingerprint() but never persisted them to the node config. Nodes
with self-signed certs registered via the agent would fail with
"x509: certificate signed by unknown authority" on subsequent polls.

Store the fingerprint in all add/update paths for both PVE and PBS,
guard updates against empty-fingerprint clobber when FetchFingerprint
fails, and pass the fingerprint to cluster detection configs.
2026-03-02 14:07:18 +00:00
rcourtman
027fd9932c fix(proxmox): make monitor reload synchronous after auto-registration (#1303)
Auto-register was running the monitor reload in a background goroutine,
so the HTTP response was sent before the poller picked up the new node.
If reload failed or was slow, the node appeared in Settings > Proxmox
(reads config from disk) but not on the main Proxmox tab (reads from
active polling state).

Changed both auto-register paths to reload synchronously, matching the
manual add path (HandleAddNode).
2026-03-01 21:04:20 +00:00
rcourtman
7530b66254 fix(setup): escape printf %s in Sprintf template to fix format verb count (#1297)
The printf '%s\n' calls in shell code within the Go Sprintf template
were being counted as format verbs, causing a build failure (10 verbs
but 9 args). Using %%s produces literal %s in the output.
2026-02-27 14:44:41 +00:00
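Illustration for 7530b66254: a small sketch of the escaping rule the commit relies on. In a Go format string, `%%s` produces a literal `%s` for the shell's printf, while `%s` is consumed by Sprintf. The function name and the `$AUTH_KEYS` shell variable are hypothetical.

```go
package main

import "fmt"

// renderKeyAppend (hypothetical) builds one line of shell from a Go template.
// %%s survives as a literal %s for the shell; %s is filled in by Go.
func renderKeyAppend(keyEntry string) string {
	return fmt.Sprintf(`printf '%%s\n' "%s" >> "$AUTH_KEYS"`, keyEntry)
}
```

With a single `%s` in the shell printf, Sprintf would count two verbs against one argument, which is the kind of mismatch the commit describes.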
rcourtman
4c7a79cecb fix(setup): preserve SSH authorized_keys symlink on Proxmox and fix key entry quoting (#1297)
The PVE setup script had three bugs in the temperature monitoring SSH key setup:

- Nested double quotes in SSH_SENSORS_KEY_ENTRY broke the bash string, causing
  "No such file or directory" errors for the key options
- The grep/mv pattern to update authorized_keys destroyed the symlink that
  Proxmox maintains from /root/.ssh/authorized_keys to /etc/pve/priv/
- The uninstall path grepped for "# pulse-managed-key" but keys were tagged
  "# pulse-sensors", so uninstall never cleaned up sensor keys

Fixes: resolve symlinks with readlink -f before operating, create temp files in
/tmp with mv-then-cp fallback for cross-device moves, escape inner quotes, and
broaden the uninstall filter to match all pulse-prefixed keys.
2026-02-27 13:23:03 +00:00
rcourtman
b445f8d8fa fix(agent): preserve user-configured host URL during agent re-registration (#1283)
When an agent re-registers with the same token, the DHCP matching case
would overwrite the Host field with the agent's local IP — even if the
user had edited it to a public URL or different IP. Now agent source
re-registrations always preserve the existing host, while non-agent
DHCP updates still work. Adds 5 regression tests covering hostname
preservation, public-IP preservation, agent DHCP, non-agent DHCP, and
PBS parity.
2026-02-21 12:46:02 +00:00
rcourtman
1d07c1cd30 fix(agent): prevent duplicate PVE entries on agent re-registration (#1245)
Two changes to prevent duplicates in Settings > Virtual Environment:

1. Install script: only clear Proxmox state files on fresh installs,
   not upgrades. Previously every install forced re-registration.

2. Auto-register dedup: match agent re-registrations by server name
   when both the existing entry and new request have Pulse-created
   tokens (pulse-monitor@pam!pulse-*). This catches the case where
   the agent creates a new token after state files are cleared.
2026-02-20 19:38:03 +00:00
rcourtman
7522f6599c fix(agent): three backend fixes for FreeBSD, Docker rootless, and duplicate PVE hosts
FreeBSD auto-update (#1254): determineArch() now includes freebsd in its
OS switch, producing freebsd-amd64/arm64 instead of falling through to
a uname -m fallback that incorrectly returned linux-<arch>. FreeBSD agents
were downloading Linux ELF binaries and failing to exec them.

Docker rootless socket (#1200): buildRuntimeCandidates() now probes
/run/user/<uid>/docker.sock before the system-wide /var/run/docker.sock,
enabling auto-detection of Docker rootless installations.

Duplicate PVE/PBS hosts (#1245, #1252): handleSecureAutoRegister() now
deduplicates by host URL, updating the existing instance's token in-place
instead of appending a duplicate entry on each re-run of the setup script.

Fixes #1254
Fixes #1200
Fixes #1245
Fixes #1252

(cherry picked from commit 0f1d9e9b9fea6c8b9e65872e8a78e25f93653eef)
2026-02-18 12:53:25 +00:00
rcourtman
815c990e85 fix(proxmox): avoid 403 on apt update checks 2026-02-09 20:28:09 +00:00
rcourtman
0f961054c6 fix: allow agent tokens to auto-register Proxmox nodes
The security hardening in beae4c86 added a settings:write scope
requirement to /api/auto-register, but agent install tokens only have
host-agent:report scope. This broke Proxmox auto-registration for all
agent-generated tokens. Accept either settings:write or host-agent:report
scope for auto-registration.

Fixes #1191
2026-02-04 22:55:25 +00:00
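Illustration for 0f961054c6: a minimal sketch of an "either scope is enough" check. The two scope strings come from the commit message; the function names and the slice-of-strings representation of a token's scopes are assumptions.

```go
package main

// hasAnyScope reports whether the token carries at least one accepted scope.
func hasAnyScope(tokenScopes []string, accepted ...string) bool {
	for _, have := range tokenScopes {
		for _, want := range accepted {
			if have == want {
				return true
			}
		}
	}
	return false
}

// canAutoRegister (hypothetical) applies the rule described in the commit:
// settings:write or host-agent:report is sufficient for auto-registration.
func canAutoRegister(tokenScopes []string) bool {
	return hasAnyScope(tokenScopes, "settings:write", "host-agent:report")
}
```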
rcourtman
f6338f34fa fix: add agent:exec scope to generated agent tokens
Agent tokens created from the Settings UI and the backend install
command handler were missing the agent:exec scope, which was added
as a security requirement in 60f9e6f0. This caused all newly
installed agents to fail registration with "Agent exec token missing
required scope: agent:exec".

Fixes #1191
2026-02-04 22:33:01 +00:00
rcourtman
8f92273e33 security: enforce scope checks for AI approvals and config management 2026-02-03 18:40:31 +00:00
rcourtman
beae4c860c fix: address 6 security and reliability issues
Security fixes:
- Auto-register now requires settings:write scope for API tokens
- X-Forwarded-For in auto-register only trusted from verified proxies
- Public URL capture requires authentication (no loopback bypass)
- Lockout reset now uses RequireAdmin for session users

Reliability fixes:
- Docker stop command expiration clears PendingUninstall flag
- Cancelled notifications get completed_at set and are cleaned up
2026-02-03 17:32:44 +00:00
rcourtman
4af5fc4246 refactor(config): rename BackendHost/BackendPort to BindAddress
Simplify server config by consolidating BackendHost and BackendPort into
a single BindAddress field. The port is now solely controlled by FrontendPort.

Changes:
- Replace BackendHost/BackendPort with BindAddress in Config struct
- Add deprecation warning for BACKEND_HOST env var (use BIND_ADDRESS)
- Update connection timeout default from 45s to 60s
- Remove backendPort from SystemSettings and frontend types
- Update server.go to use cfg.BindAddress
- Update all tests to use new config field names
2026-02-01 23:26:32 +00:00
rcourtman
508c9f88f6 fix: Support partial updates for PBS nodes. Related to #1105
Allow updating PBS node settings (like excludeDatastores) without
requiring host to be resent. Match the behavior of PVE/PMG handlers
which only validate and update fields when provided.

Previously, PUT /api/config/nodes/{pbs-id} with just {excludeDatastores: [...]}
would fail with 'host is required' because the handler always called
normalizeNodeHost regardless of whether a new host was provided.
2026-01-23 00:13:28 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
a55bdb7a3a feat(api): security and metrics history improvements
- Require admin + settings:write scope for setup-script-url endpoint
- Add license enforcement for long-term metrics (30d/90d require Pro)
- Add downsampling step calculation for metrics history queries
- Add isContainerSSHRestricted helper for SSH restriction checks
- Clean up temperature proxy references from config handlers
- Minor OIDC and rate limit improvements
2026-01-22 00:44:12 +00:00
rcourtman
7599915b8f refactor(api): remove sensor proxy config from API handlers
- config_handlers.go: remove proxy configuration endpoints
- system_settings.go: remove proxy-related settings
- rate_limit_config.go: update rate limit configuration
- Update related test files
2026-01-21 12:02:46 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
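Illustration for 035436ad6e: a sketch of guarding a shared map with a mutex so that two goroutines can update it safely. The type and method names are hypothetical stand-ins for the agent's prevContainerCPU state and are not taken from the repository.

```go
package main

import "sync"

// cpuTracker (hypothetical) holds the last cumulative CPU sample per container.
type cpuTracker struct {
	mu   sync.Mutex
	prev map[string]uint64 // container ID -> last cumulative CPU value
}

func newCPUTracker() *cpuTracker {
	return &cpuTracker{prev: make(map[string]uint64)}
}

// delta stores the new sample and returns the increase since the last one.
// It returns 0 for the first sample or when the counter went backwards.
func (t *cpuTracker) delta(id string, total uint64) uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	last, seen := t.prev[id]
	t.prev[id] = total
	if !seen || total < last {
		return 0
	}
	return total - last
}

// prune drops samples for containers that are no longer present.
func (t *cpuTracker) prune(alive map[string]bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for id := range t.prev {
		if !alive[id] {
			delete(t.prev, id)
		}
	}
}
```

Every read, write, and delete on the map goes through the same lock, which is what removes the "concurrent map writes" crash when a second collection runs alongside the main loop.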
rcourtman
9b49d3171d feat(pbs): add datastore exclusion to reduce PBS log noise
Users with removable/unmounted datastores (e.g., external HDDs for
offline backup) experienced excessive PBS log entries because Pulse
was querying all datastores including unavailable ones.

Added `excludeDatastores` field to PBS node configuration that accepts
patterns to exclude specific datastores from monitoring:
- Exact names: "exthdd1500gb"
- Prefix patterns: "ext*"
- Suffix patterns: "*hdd"
- Contains patterns: "*removable*"

Pattern matching is case-insensitive.

Fixes #1105
2026-01-14 12:26:18 +00:00
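Illustration for 9b49d3171d: a sketch of the four pattern shapes listed in the commit (exact, prefix, suffix, contains) with case-insensitive matching. Function names are hypothetical, and treating a bare "*" as match-all is an assumption the commit does not state.

```go
package main

import "strings"

// matchesPattern checks one datastore name against one exclusion pattern.
func matchesPattern(name, pattern string) bool {
	n := strings.ToLower(name)
	p := strings.ToLower(pattern)
	leading := strings.HasPrefix(p, "*")
	trailing := strings.HasSuffix(p, "*")
	core := strings.Trim(p, "*")
	if core == "" {
		// Assumption: "*" alone matches everything; an empty pattern matches nothing.
		return leading || trailing
	}
	switch {
	case leading && trailing: // "*removable*"
		return strings.Contains(n, core)
	case trailing: // "ext*"
		return strings.HasPrefix(n, core)
	case leading: // "*hdd"
		return strings.HasSuffix(n, core)
	default: // exact name
		return n == core
	}
}

// isExcluded reports whether any pattern matches the datastore name.
func isExcluded(name string, patterns []string) bool {
	for _, p := range patterns {
		if matchesPattern(name, p) {
			return true
		}
	}
	return false
}
```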
rcourtman
b7f5cfde1c fix: apply subnet preference for cluster nodes in fallback path
When cluster node validation fails (because cluster-reported IPs are on
an internal network unreachable from Pulse), the fallback path was not
applying subnet preference logic. This caused Pulse to continue trying
to connect to internal cluster IPs instead of management network IPs.

Now the fallback path queries node network interfaces via the initial
connection and sets IPOverride to an IP on the same network as the
original connection, just like the validated node path does.

Fixes #929
2026-01-10 15:40:48 +00:00
rcourtman
3e2824a7ff feat: remove Enterprise badges, simplify Pro upgrade prompts
- Replace barrel import in AuditLogPanel.tsx to fix ad-blocker crash
- Remove all Enterprise/Pro badges from nav and feature headers
- Simplify upgrade CTAs to clean 'Upgrade to Pro' links
- Update docs: PULSE_PRO.md, API.md, README.md, SECURITY.md
- Align terminology: single Pro tier, no separate Enterprise tier

Also includes prior refactoring:
- Move auth package to pkg/auth for enterprise reuse
- Export server functions for testability
- Stabilize CLI tests
2026-01-09 16:51:08 +00:00
rcourtman
020553a12d fix: use flexible subnet matching instead of fixed /24
The previous implementation assumed /24 subnets, which failed for
larger networks (e.g., /16 or /20). Now uses progressive subnet
matching that tries /24, /20, and /16 to handle various network sizes.

Example: If connection IP is 10.1.1.5 and a node has 10.1.2.6,
it now correctly identifies them as being on the same network.
2026-01-08 23:24:50 +00:00
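Illustration for 020553a12d: a sketch of progressive subnet matching that tries /24, then /20, then /16, using the standard net/netip package. The function name and return shape are assumptions for illustration.

```go
package main

import "net/netip"

// sameNetwork returns the first prefix length (24, 20, or 16) under which
// both IPv4 addresses fall into the same network, and whether one was found.
func sameNetwork(a, b string) (int, bool) {
	ipA, errA := netip.ParseAddr(a)
	ipB, errB := netip.ParseAddr(b)
	if errA != nil || errB != nil || !ipA.Is4() || !ipB.Is4() {
		return 0, false
	}
	for _, bits := range []int{24, 20, 16} {
		// Prefix masks ipA down to the network address for this length.
		network, err := ipA.Prefix(bits)
		if err != nil {
			return 0, false
		}
		if network.Contains(ipB) {
			return bits, true
		}
	}
	return 0, false
}
```

For the commit's example, 10.1.1.5 and 10.1.2.6 differ at /24 but both sit inside 10.1.0.0/20, so the second attempt succeeds.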
rcourtman
bd1df9f942 feat: automatic subnet preference for cluster node discovery
When discovering cluster nodes, Pulse now automatically prefers IPs
on the same subnet as the initial connection. This fixes the common
issue where Pulse used internal cluster network IPs (e.g., 172.x.x.x)
instead of management network IPs (e.g., 10.x.x.x).

How it works:
1. Extract subnet from initial connection URL (assumes /24 for IPv4)
2. For each discovered node, query /nodes/{node}/network for all IPs
3. If cluster-reported IP is on a different subnet, find an IP on
   the preferred subnet and set it as IPOverride
4. Manual IPOverride settings are preserved and take precedence

This eliminates the need for manual IPOverride configuration in most
multi-network Proxmox setups.

Refs #929, #1066
2026-01-08 23:12:30 +00:00
rcourtman
9e339957c6 fix: Update runtime config when toggling Docker update actions setting
The DisableDockerUpdateActions setting was being saved to disk but not
updated in h.config, causing the UI toggle to appear to revert on page
refresh since the API returned the stale runtime value.

Related to #1023
2026-01-03 11:14:17 +00:00
rcourtman
ee45323312 feat: Allow configuring physical disk polling interval in UI (Related to #1007) 2026-01-01 16:00:28 +00:00
rcourtman
76990a65a7 fix: Preserve user's configured hostname when agent registers with IP
When a node was manually added with a hostname (e.g., pve.example.com)
and then the agent registered using its IP address, the code would
correctly deduplicate but incorrectly overwrite the user's configured
hostname with the agent's IP.

Now when matching by IP resolution (hostname resolves to agent's IP),
we preserve the user's original hostname configuration instead of
replacing it with the IP.

Related to #940
2025-12-28 15:44:40 +00:00
rcourtman
1dff90817f fix: detect duplicate nodes by IP resolution during agent auto-register. Related to #924
When an agent registers using an IP address, check if any existing node's
hostname resolves to that same IP. This prevents duplicates when a node
was manually configured via hostname and later the agent is installed
which registers using the host's IP.

Changes:
- Add extractHostIP() to extract IP from URL if present
- Add resolveHostnameToIP() with 2s timeout for DNS resolution
- During agent auto-registration, check if existing hostname-based
  configs resolve to the new IP and update instead of creating duplicates
- Add test for extractHostIP helper function
2025-12-27 11:02:00 +00:00
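Illustration for 1dff90817f: a sketch of the two helpers the commit describes, one that pulls a literal IP out of a host URL and one that resolves a hostname with a 2-second timeout. The names below are deliberately different from the repository's extractHostIP() and resolveHostnameToIP(), since the bodies are a guess at the approach rather than the actual implementation.

```go
package main

import (
	"context"
	"errors"
	"net"
	"net/url"
	"strings"
	"time"
)

// hostIPFromURL returns the literal IP in a host URL, or "" when the host
// part is a name rather than an address.
func hostIPFromURL(hostURL string) string {
	raw := hostURL
	if !strings.Contains(raw, "://") {
		raw = "https://" + raw // assumption: scheme-less input is host[:port]
	}
	u, err := url.Parse(raw)
	if err != nil {
		return ""
	}
	if ip := net.ParseIP(u.Hostname()); ip != nil {
		return ip.String()
	}
	return ""
}

// resolveWithTimeout looks up a hostname and returns its first address,
// giving up after 2 seconds so a slow resolver cannot stall registration.
func resolveWithTimeout(hostname string) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	addrs, err := net.DefaultResolver.LookupHost(ctx, hostname)
	if err != nil {
		return "", err
	}
	if len(addrs) == 0 {
		return "", errors.New("no addresses returned")
	}
	return addrs[0], nil
}
```

A dedup check would then compare the agent's IP with resolveWithTimeout applied to each existing hostname-based entry, updating the matching entry instead of appending a new one.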
rcourtman
4277aa753c feat(pbs): turnkey PBS setup with password auth
When adding a PBS node with username/password credentials, Pulse now
automatically:
1. Connects to PBS using the provided credentials
2. Creates a 'pulse-monitor@pbs' user with Audit permissions
3. Generates an API token
4. Stores the token instead of the password

This enables one-click PBS setup for Docker/containerized deployments
where you can't easily run the agent installer. Simply enter root@pam
credentials in the UI and Pulse handles the rest.

Falls back to password auth if token creation fails (e.g., old PBS
version or permission issues).
2025-12-26 10:12:04 +00:00
rcourtman
3d671c1824 feat(pbs): add API-based token creation for turnkey PBS setup
- Added PBS client methods: CreateUser, SetUserACL, CreateUserToken
- Added SetupMonitoringAccess() turnkey method that creates user + token
- Updated handleSecureAutoRegister to use PBS API for token creation
- Enables one-click PBS setup for Docker/containerized deployments

When users provide PBS root credentials, Pulse can now create the
monitoring user and API token remotely via the PBS API, eliminating
the need to SSH/exec into the container manually.
2025-12-26 10:08:41 +00:00
rcourtman
a9078e96d1 fix: apply duplicate hostname fix to HandleAddNode (manual UI)
Extended Issue #891 fix to cover manual node addition via the UI:

1. HandleAddNode now checks for duplicates by Host URL (not name)
2. Disambiguator applied to PVE, PBS, and PMG node creation
3. Error message updated: 'host URL already exists' instead of 'name already exists'

This ensures the fix works whether nodes are added via:
- Agent auto-registration ✓
- Manual UI setup ✓

All node creation paths now consistently:
- Match by Host URL only
- Disambiguate duplicate hostnames with IP: 'px1' → 'px1 (10.0.2.224)'
2025-12-24 16:17:37 +00:00
rcourtman
44b25b43ac fix: handle DHCP IP changes without creating duplicates
Follow-up to #891 fix - also match by name+tokenID to handle the case
where the same physical host gets a new IP (DHCP). This ensures:

1. Same hostname + DIFFERENT token = different physical hosts → create separate nodes
2. Same hostname + SAME token = same host with new IP → update existing node

Also updates the host URL when an existing node is matched, so IP changes
are properly reflected in the saved configuration.
2025-12-24 16:09:22 +00:00
rcourtman
92988ae0e6 fix: allow duplicate hostnames for different Proxmox hosts. Related to #891
PROBLEM:
When two Proxmox hosts have the same hostname (e.g., 'px1' on different networks),
the auto-registration was matching by name and overwriting the first with the second.
This has been a recurring issue (#104) with at least 3 prior fix attempts.

ROOT CAUSE:
The auto-register handler matched existing nodes by BOTH Host URL and Name.
Matching by name is incorrect - different physical hosts can share hostnames.

FIXES:
1. Remove name-based matching in auto-registration - match by Host URL only
2. Add disambiguateNodeName() to append IP when duplicate hostnames exist
3. Add regression tests to prevent this from breaking again

Now when registering two hosts named 'px1':
- First becomes: px1
- Second becomes: px1 (10.0.2.224)
Both are stored as separate nodes with their own credentials.
2025-12-24 16:05:07 +00:00
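Illustration for 92988ae0e6: a sketch of the naming rule shown in the commit, where a second host with an already-used name gets its IP appended. This is not the repository's disambiguateNodeName(); the signature and the case-sensitive comparison are assumptions.

```go
package main

import "fmt"

// disambiguate returns name unchanged when no existing node uses it,
// otherwise "name (ip)".
func disambiguate(name, ip string, existingNames []string) string {
	for _, existing := range existingNames {
		if existing == name {
			return fmt.Sprintf("%s (%s)", name, ip)
		}
	}
	return name
}
```

Matching itself stays keyed on the host URL; the suffix only keeps display names distinct.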
rcourtman
e0dc6695fc fix: Per-node TLS fingerprints for cluster peers (TOFU)
When a PVE cluster has unique self-signed certificates on each node, Pulse
would mark secondary nodes as unhealthy because only the primary node's
fingerprint was used for all connections.

Now, during cluster discovery, Pulse captures each node's TLS fingerprint
and uses it when connecting to that specific node. This enables
"Trust On First Use" (TOFU) for clusters with unique per-node certs.

Changes:
- Add Fingerprint field to ClusterEndpoint config
- Add FetchFingerprint() to tlsutil for capturing node certs
- validateNodeAPI() now captures and returns fingerprints during discovery
- NewClusterClient() accepts endpointFingerprints map for per-node certs
- All client creation paths use per-endpoint fingerprints when available

Related to #879
2025-12-24 10:05:03 +00:00
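Illustration for e0dc6695fc: a sketch of per-endpoint fingerprint selection with a fallback to the primary node's fingerprint. The helper names are hypothetical, and the SHA-256, uppercase, colon-separated format is an assumption; the commit does not say which digest or encoding Pulse stores.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// fingerprintSHA256 formats the SHA-256 digest of a certificate's DER bytes
// as uppercase hex pairs joined by colons.
func fingerprintSHA256(der []byte) string {
	sum := sha256.Sum256(der)
	h := strings.ToUpper(hex.EncodeToString(sum[:]))
	pairs := make([]string, 0, len(h)/2)
	for i := 0; i < len(h); i += 2 {
		pairs = append(pairs, h[i:i+2])
	}
	return strings.Join(pairs, ":")
}

// pickFingerprint prefers the fingerprint captured for this endpoint and
// falls back to the primary node's fingerprint when none was recorded.
func pickFingerprint(endpoint string, perEndpoint map[string]string, primary string) string {
	if fp, ok := perEndpoint[endpoint]; ok && fp != "" {
		return fp
	}
	return primary
}
```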
rcourtman
e4732af0f5 fix: use configured Guest URLs for PVE/PBS/PMG navigation (#870)
- Fix PVE nodes: buildNodeUrl in ProxmoxNodesSection.tsx now prioritizes
  guestURL over host (was ignoring guestURL entirely)
- Add PBS support: GuestURL field added to PBSInstance config, model,
  and API handlers
- Add PMG support: GuestURL field added to PMGInstance config, model,
  and API handlers
- Update NodeSummaryTable to use guestURL for PBS nodes
- Frontend types updated for PBS/PMG guestURL support

The Guest URL setting in node configuration now works correctly across
all node types. When set, it takes priority over the Host URL when
clicking on node names to navigate to the Proxmox/PBS/PMG web UI.

Closes #870
2025-12-22 22:05:25 +00:00
rcourtman
397871629c fix: cluster-aware guest deduplication and multi-agent token binding
- Add cluster-aware guest ID generation (clusterName-VMID instead of instanceName-VMID)
  to prevent duplicate VMs/containers when multiple cluster nodes are monitored

- Add cluster deduplication at registration time - when a node is added that belongs
  to an already-configured cluster, merge as endpoint instead of creating duplicate

- Add startup consolidation to automatically merge duplicate cluster instances

- Change host agent token binding from agent GUID to hostname, allowing:
  - Multiple host agents to share a token (each bound by hostname)
  - Agent reinstalls on same host without token conflicts

- Remove 12-character password minimum requirement

- Remove emoji from auto-registration success message

- Fix grouped view node lookup to support both cluster-aware node IDs
  (clusterName-nodeName) and legacy guest grouping keys (instance-nodeName)

Fixes duplicate guests appearing when agents are installed on multiple
cluster nodes. Also improves multi-agent UX by allowing shared tokens.
2025-12-14 10:16:17 +00:00
rcourtman
60c980a921 Show AI cost refresh errors and harden log redaction 2025-12-12 11:05:24 +00:00
rcourtman
8948e84fe5 feat: AI features, agent improvements, and host monitoring enhancements
AI Chat Integration:
- Multi-provider support (Anthropic, OpenAI, Ollama)
- Streaming responses with markdown rendering
- Agent command execution for remote troubleshooting
- Context-aware conversations with host/container metadata

Agent Updates:
- Add --enable-proxmox flag for automatic PVE/PBS token setup
- Improve auto-update with semver comparison (prevents downgrades)
- Add updatedFrom tracking to report previous version after update
- Reduce initial update check delay from 30s to 5s
- Add agent version column to Hosts page table

Host Metrics:
- Add DiskIO stats collection (read/write bytes, ops, time)
- Improve disk filtering to exclude Docker overlay mounts
- Add RAID array monitoring via mdadm
- Enhanced temperature sensor parsing

Frontend:
- New Agent Version column on Hosts overview table
- Improved node modal with agent-first installation flow
- Add DiskIO display in host drawer
- Better responsive handling for metric bars
2025-12-05 10:37:02 +00:00
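Illustration for the "semver comparison (prevents downgrades)" item in 8948e84fe5: a sketch of comparing versions numerically so an agent only moves to a strictly newer release. The function names are hypothetical, and dropping pre-release and build suffixes is a simplification, not full semver precedence.

```go
package main

import (
	"strconv"
	"strings"
)

// parseVersion turns "v1.2.3" or "1.2.3" into its three numeric parts.
// Anything after "-" or "+" is ignored (a simplification).
func parseVersion(v string) ([3]int, bool) {
	var out [3]int
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// shouldUpdate is true only when candidate is strictly newer than current.
// Unparseable versions never trigger an update.
func shouldUpdate(current, candidate string) bool {
	cur, okCur := parseVersion(current)
	cand, okCand := parseVersion(candidate)
	if !okCur || !okCand {
		return false
	}
	for i := 0; i < 3; i++ {
		if cand[i] != cur[i] {
			return cand[i] > cur[i]
		}
	}
	return false
}
```

Comparing numerically rather than as strings matters for cases like 1.9.0 versus 1.10.0, where a plain string comparison would order them the wrong way round.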
rcourtman
bda8056e48 Add refresh-cluster button to detect new Proxmox cluster members
When new nodes are added to a Proxmox cluster after Pulse was
initially configured, they weren't showing up in Settings. The
existing "Refresh" button only triggered network discovery, not
cluster membership re-detection.

Changes:
- Add POST /api/config/nodes/{id}/refresh-cluster endpoint
- Add "Refresh" button in cluster node panel in Settings
- Re-detect cluster membership and update stored endpoints

Related to #799
2025-12-02 22:01:00 +00:00
rcourtman
b4d497ce3b security: Add request body size limits to API handlers
Add http.MaxBytesReader to 16 additional handlers to prevent memory
exhaustion attacks via oversized request bodies:

- docker_agents.go: HandleReport (512KB), HandleCommandAck (8KB),
  HandleSetCustomDisplayName (8KB)
- alerts.go: UpdateAlertConfig (64KB), BulkAcknowledgeAlerts (32KB),
  BulkClearAlerts (32KB)
- config_handlers.go: HandleAddNode, HandleTestConnection,
  HandleUpdateNode, HandleTestNodeConfig (32KB each),
  HandleVerifyTemperatureSSH, HandleExportConfig, HandleDiscoverServers,
  HandleSetupScriptURL (8KB each), HandleImportConfig (1MB),
  HandleUpdateMockMode (16KB)
2025-12-02 16:43:13 +00:00
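Illustration for b4d497ce3b: a sketch of capping a request body with http.MaxBytesReader before decoding JSON. The helper names are hypothetical; http.MaxBytesReader and http.MaxBytesError (Go 1.19+) are standard library. The demoRequest helper exists only so the sketch can be exercised without a running server.

```go
package main

import (
	"encoding/json"
	"errors"
	"net/http"
	"net/http/httptest"
	"strings"
)

// decodeLimited caps the body at limit bytes, decodes JSON into dst, and
// returns the HTTP status a handler could send back.
func decodeLimited(w http.ResponseWriter, r *http.Request, limit int64, dst any) (int, error) {
	r.Body = http.MaxBytesReader(w, r.Body, limit)
	if err := json.NewDecoder(r.Body).Decode(dst); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			return http.StatusRequestEntityTooLarge, err
		}
		return http.StatusBadRequest, err
	}
	return http.StatusOK, nil
}

// demoRequest builds an in-memory POST request for trying the helper.
// The URL path is a placeholder.
func demoRequest(body string) *http.Request {
	return httptest.NewRequest(http.MethodPost, "/example", strings.NewReader(body))
}
```

In a real handler the first argument would be the handler's own ResponseWriter and the limit would be chosen per endpoint, for example 32<<10 for the 32KB cases listed in the commit.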