Commit graph

78 commits

Author SHA1 Message Date
rcourtman
eb02f28f5b test: Add tests for getInstanceConfig, baseIntervalForInstanceType, LoadOIDCConfig
- getInstanceConfig: 33%→100% (nil handling, case-insensitive matching)
- baseIntervalForInstanceType: 50%→100% (all instance types, clamping)
- LoadOIDCConfig: 35%→94% (file not exist, read errors, decryption)
2025-12-01 18:06:15 +00:00
rcourtman
1a23a7d9ec test: Add error path tests for config persistence functions
- LoadAPITokens: invalid JSON, empty file, file not exist
- LoadEmailConfig: invalid JSON, file not exist
- LoadWebhooks: invalid JSON, legacy migration
- LoadNodesConfig: empty arrays, missing fields, corruption recovery
- cleanupOldBackups: non-existent dir, multiple files cleanup
- Bonus: LoadAppriseConfig, LoadAlertConfig, LoadSystemSettings error paths

config package coverage: 46.3% → 48.2%
2025-12-01 17:45:50 +00:00
rcourtman
ed75f2f096 test: Add comprehensive tests for API token management
- Clone: deep copy verification for pointers and slices
- NewAPITokenRecord/NewHashedAPITokenRecord: creation and validation
- Config methods: HasAPITokens, APITokenCount, ActiveAPITokenHashes
- Config methods: HasAPITokenHash, PrimaryAPITokenHash, PrimaryAPITokenHint
- Config methods: ValidateAPIToken, UpsertAPIToken, RemoveAPIToken, SortAPITokens

config package coverage: 43.5% → 46.3%
2025-12-01 17:37:27 +00:00
rcourtman
4aa53c6ce6 test: Add comprehensive tests for CreateProxmoxConfigFromFields function
Cover all branches: user without realm gets @pam appended, user with
realm unchanged, token auth skips user modification, empty user handling,
and fingerprint/verifySSL preservation. Coverage improved from 66.7% to 100%.
2025-12-01 15:22:50 +00:00
rcourtman
ed4a229c8b test: Add LoadAPITokens error path tests
- Test nonexistent file returns empty slice (not error)
- Test empty file returns empty slice
- Test invalid JSON returns error
- Improves LoadAPITokens coverage from 80% to 93.3%
2025-12-01 15:00:56 +00:00
rcourtman
59970afc65 test: Add HasScope edge case tests for API tokens
- Test empty scope always returns true
- Test explicit wildcard scope in list grants any scope
- Improves coverage from 85.7% to 100%
2025-12-01 14:51:58 +00:00
rcourtman
d548287105 test: Add unit tests for api_tokens.go pure functions
Add comprehensive tests for tokenPrefix, tokenSuffix, normalizeScopes,
and IsKnownScope functions. Coverage increased 42.7% -> 43.3%.
2025-12-01 12:32:37 +00:00
rcourtman
5d61864495 Add unit tests for importTransaction (internal/config)
24 test cases covering transaction lifecycle: staging, commit, rollback, and cleanup.
Tests include atomic commit behavior, backup/restore on failure, directory creation,
permission handling, and idempotent cleanup.

First test file for import_transaction.go. Coverage 42.4% → 42.7%.
2025-11-30 21:04:41 +00:00
rcourtman
33e3744f08 Add unit tests for GuestMetadataStore (internal/config)
22 test cases covering CRUD operations (Get, GetAll, Set, Delete,
ReplaceAll), persistence (Load, save with atomic write, directory
creation), legacy ID migration (GetWithLegacyMigration for clustered
and standalone formats), round-trip verification, and concurrency.

First test file for guest_metadata.go, complementing the similar
docker_metadata_test.go added in previous run.
2025-11-30 20:48:05 +00:00
rcourtman
0223a52f82 Fix cluster proxy token collision - store per-node control tokens
When multiple cluster nodes register sensor-proxy, each registration
was overwriting the previous node's control token on the shared
PVEInstance. This caused "Proxy token not recognized" errors on all
but the last-registered node.

Changes:
- Add TemperatureProxyControlToken field to ClusterEndpoint struct
- Store control tokens per-endpoint for cluster registrations
- Check both instance-level and endpoint-level tokens when validating

Related to #738
2025-11-30 20:37:58 +00:00
rcourtman
94c54ae1df Add unit tests for DockerMetadataStore (internal/config)
22 test cases covering:
- Get/GetAll: retrieval of container/service metadata
- GetHostMetadata/GetAllHostMetadata: retrieval of host metadata
- Set: add/update metadata with persistence
- SetHostMetadata: add/update/delete host metadata
- Delete: remove container metadata
- ReplaceAll: bulk replace with nil entry handling
- Load: versioned format, legacy format, nonexistent file, invalid JSON
- Save: directory creation, atomic write
- Round-trip: full persistence cycle verification
- Concurrent access safety

First test file for docker_metadata.go.
2025-11-30 17:20:46 +00:00
rcourtman
1678f4cc88 Add unit tests for OIDC configuration helpers
Test coverage for internal/config/oidc.go functions:
- normaliseList: 8 cases for deduplication, trimming, empty handling
- parseDelimited: 8 cases for comma/space separation
- DefaultRedirectURL: 7 cases for URL construction
- OIDCConfig.Clone: 2 cases for deep copy behavior
- OIDCConfig.ApplyDefaults: 9 cases for defaults and normalization
- OIDCConfig.Validate: 9 cases for validation rules
- OIDCConfig.MergeFromEnv: 5 cases for environment variable merging
- NewOIDCConfig: constructor verification

Total: 56 new test cases (583 lines).
2025-11-30 16:22:43 +00:00
rcourtman
0254fc7397 Add unit tests for discovery config functions (config)
Add tests for CloneDiscoveryConfig and NormalizeDiscoveryConfig:
- CloneDiscoveryConfig: 7 tests verifying deep copy, slice independence
- NormalizeDiscoveryConfig: 16 tests covering defaults, validation,
  sanitization, edge cases

Total: 23 test cases for discovery config handling.
2025-11-30 10:49:03 +00:00
rcourtman
18d3722fb8 Add unit tests for sanitizeCIDRList (config) 2025-11-30 07:35:52 +00:00
rcourtman
0a9669928a Add unit tests for config utility functions (config)
Test coverage for IsPasswordHashed, IsValidDiscoveryEnvironment, and
splitAndTrim functions. 63 test cases covering bcrypt hash validation,
discovery environment validation, and comma-separated string parsing.
2025-11-30 07:27:59 +00:00
rcourtman
e108dc3180 Add unit tests for config encryption functions
Tests encryptWithPassphrase and decryptWithPassphrase:
- Roundtrip encryption/decryption with various data types
- Unique output verification (random salt/nonce)
- Wrong passphrase rejection
- Tampered ciphertext detection
- Minimum size validation
- Base64 roundtrip (matching export/import flow)
2025-11-30 01:48:20 +00:00
rcourtman
dbf32baaa0 Rebuild agent token bindings on API token config reload
When api_tokens.json is modified on disk, the ConfigWatcher reloads
the tokens into memory. However, the Monitor's dockerTokenBindings and
hostTokenBindings maps were not synchronized with the new token set,
causing orphaned bindings when agents reconnect after reinstall.

Add SetAPITokenReloadCallback to ConfigWatcher that triggers Monitor's
new RebuildTokenBindings method after token reload. This method
reconstructs the binding maps from current Docker host and host agent
state, keeping only bindings for tokens that still exist in config.

Related to #773
2025-11-29 14:09:30 +00:00
rcourtman
584bdc88ff fix: add hash check to polling fallback and persist sidebar state
- Add content hash check to config watcher polling path to match fsnotify
  behavior, preventing unnecessary restarts when .env is touched but
  content unchanged (Related to #748)
- Change settings sidebar to expanded by default and persist user
  preference using usePersistentSignal (Related to #764)
2025-11-27 15:24:23 +00:00
rcourtman
ad998a1e2f style: fix staticcheck style warnings
- Merge variable declaration with assignment (S1021)
- Use unconditional strings.TrimPrefix (S1017)
- Remove unnecessary nil checks around range (S1031)
- Remove unnecessary fmt.Sprintf (S1039)
- Use copy() instead of manual loop (S1001)
- Use time.Until instead of t.Sub(time.Now()) (S1024)
- Use buf.String() instead of string(buf.Bytes()) (S1030)
2025-11-27 09:19:33 +00:00
rcourtman
0dc0235f77 chore: remove dead code and unused files
Remove 604 lines of unreachable code identified by deadcode analysis:
- internal/config/credentials.go: unused credential resolver
- internal/config/registration.go: unused registration config
- internal/monitoring/poller.go: unused channel-based polling (keep types)
- internal/api/middleware.go: unused TimeoutHandler, JSONHandler, NewAPIError, ValidationError
- internal/api/security.go: unused IsLockedOut, SecurityHeaders
- internal/api/auth.go: unused min helper
- internal/config/config.go: unused SaveConfig
- internal/config/client_helpers.go: unused CreatePBSConfigFromFields
- internal/logging/logging.go: unused NewRequestID
2025-11-27 00:05:04 +00:00
courtmanr@gmail.com
930c086556 WIP: Save all pending changes including frontend updates and unified agent scaffolding 2025-11-25 11:27:07 +00:00
courtmanr@gmail.com
f4c2bd7c35 Implement UI toggle for Hide Local Login (related to issue #750) 2025-11-25 08:14:19 +00:00
courtmanr@gmail.com
e9665cb1cd Fix: Prevent unnecessary config reloads by checking file content hash 2025-11-24 22:42:24 +00:00
courtmanr@gmail.com
f347dedcdd Add PULSE_AUTH_HIDE_LOCAL_LOGIN option to hide password form
Implements #750 - allows hiding the username/password login form when
using OIDC SSO to avoid user confusion, while maintaining security.

- Added HideLocalLogin config option (env: PULSE_AUTH_HIDE_LOCAL_LOGIN)
- Exposed hideLocalLogin in /api/security/status endpoint
- Updated Login.tsx to conditionally hide local login form
- Added escape hatch via ?show_local=true URL parameter

This approach avoids the security and upgrade issues that led to
DISABLE_AUTH being removed (see #707, #678), while solving the UX
problem of users being confused by multiple login options.
2025-11-24 17:40:43 +00:00
courtmanr@gmail.com
4168eb41f8 Fix host agent registration verification issues (#746)
- Change default server listen addresses to empty string (listen on all interfaces including IPv6)
- Add short hostname matching fallback in host lookup API to handle FQDN vs short name mismatches
- Implement retry loop (30s) in both Windows and Linux/macOS installers for registration verification
- Fix lint errors: remove unnecessary fmt.Sprintf and nil checks before len()

This resolves the 'Installer could not yet confirm host registration with Pulse' warning
by addressing timing issues, hostname matching, and network connectivity.
2025-11-24 14:28:09 +00:00
courtmanr@gmail.com
0d4406b91f Add mutex protection for config watcher reloads (re #748)
Introduced sync.RWMutex to protect concurrent access to configuration
fields (AuthUser, AuthPass, APITokens) that are modified by the
ConfigWatcher at runtime.

- Added global config.Mu RWMutex in internal/config/config.go
- Protected config updates in ConfigWatcher.reloadConfig() and reloadAPITokens()
- Protected config reads in CheckAuth and all API token handlers
- Protected Router.SetConfig() during full config reloads

This prevents race conditions when .env file changes trigger config
reloads while authentication handlers are reading the same fields.
2025-11-24 07:45:21 +00:00
rcourtman
a0c27e84ae Persist PVE fallback host after portless retry 2025-11-22 17:06:15 +00:00
rcourtman
9fc019b90c Handle PVE portless fallback when default port fails 2025-11-22 17:01:16 +00:00
rcourtman
9c6c8cc0a0 Add OIDC CA bundle support 2025-11-22 09:44:03 +00:00
rcourtman
2207642fa9 Related to #727: normalize persisted Proxmox hosts 2025-11-20 19:58:05 +00:00
rcourtman
17102706ae Related to #727: restore default Proxmox ports 2025-11-20 16:35:08 +00:00
courtmanr@gmail.com
11477546f8 Update config persistence, crypto, and dev script 2025-11-20 11:46:20 +00:00
rcourtman
51b368ddc1 feat: make PVE polling interval configurable (related to #467) 2025-11-18 21:30:04 +00:00
rcourtman
47d5c14aef Improve temperature proxy control-plane flow 2025-11-15 21:49:51 +00:00
rcourtman
2ee693cc63 Add HTTP mode to pulse-sensor-proxy for multi-instance temperature monitoring
This implements HTTP/HTTPS support for pulse-sensor-proxy to enable
temperature monitoring across multiple separate Proxmox instances.

Architecture changes:
- Dual-mode operation: Unix socket (local) + HTTPS (remote)
- Unix socket remains default for security/performance (no breaking change)
- HTTP mode enables temps from external PVE hosts

Backend implementation:
- Add HTTPS server with TLS + Bearer token authentication to sensor-proxy
- Add TemperatureProxyURL and TemperatureProxyToken fields to PVEInstance
- Add HTTP client (internal/tempproxy/http_client.go) for remote proxy calls
- Update temperature collector to prefer HTTP proxy when configured
- Fallback logic: HTTP proxy → Unix socket → direct SSH (if not containerized)

Configuration:
- pulse-sensor-proxy config: http_enabled, http_listen_addr, http_tls_cert/key, http_auth_token
- PVEInstance config: temperature_proxy_url, temperature_proxy_token
- Environment variables: PULSE_SENSOR_PROXY_HTTP_* for all HTTP settings

Security:
- TLS 1.2+ with modern cipher suites
- Constant-time token comparison (timing attack prevention)
- Rate limiting applied to HTTP requests (shared with socket mode)
- Audit logging for all HTTP requests

Next steps:
- Update installer script to support HTTP mode + auto-registration
- Add Pulse API endpoint for proxy registration
- Generate TLS certificates during installation
- Test multi-instance temperature collection

Related to #571 (multi-instance architecture)
2025-11-13 16:13:53 +00:00
rcourtman
accecdb50b Make api_tokens.json authoritative source for API tokens (fixes #685)
This is the proper architectural fix for #685. The previous commit was a
bandaid that prevented unnecessary .env writes. This commit addresses the
root cause: dual-source-of-truth for API tokens (.env vs api_tokens.json).

Changes:

1. Startup Migration (config.go:896-951):
   - When loading config, if API_TOKEN/API_TOKENS exist in .env but not in
     api_tokens.json, automatically migrate them
   - Migrated tokens are named "Migrated from .env (prefix)" for clarity
   - Logs a deprecation warning: API_TOKEN/API_TOKENS in .env are deprecated
   - Leaves .env untouched (safe for existing deployments)

2. Config Watcher Changes (watcher.go:338-424):
   - Only load tokens from .env if api_tokens.json is EMPTY
   - Once api_tokens.json has records, it becomes the authoritative source
   - .env changes no longer trigger token overwrites when api_tokens.json exists
   - Logs debug message when ignoring env tokens

Result:
- Existing deployments: env tokens automatically migrated to api_tokens.json
- UI-created tokens: never overwritten by .env changes
- Dark mode toggle: no longer triggers token reload from .env
- Backward compatible: fresh installs with API_TOKEN in .env still work
- Migration path: users can safely keep API_TOKEN in .env, it will be ignored

Future improvement: Add UI warning when API_TOKEN/API_TOKENS still present
in .env, prompting users to rotate tokens via the UI.
2025-11-11 00:17:40 +00:00
rcourtman
5d99fc2f2d Fix dark mode toggle wiping API tokens (related to #685)
Root cause: SaveSystemSettings calls updateEnvFile which rewrites .env on
any setting change, triggering the config watcher. The watcher sees API_TOKEN
in .env and replaces all UI-created tokens with "Environment token" records,
wiping out host-agent scoped tokens.

Fix: updateEnvFile now compares the new content with existing content and
skips the write if nothing changed. Since dark mode (and other UI settings)
are stored in system.json, not .env, toggling theme no longer triggers
unnecessary .env rewrites.

This prevents the config watcher from being triggered unnecessarily and
preserves UI-created API tokens when changing cosmetic settings.

Future improvement: Deprecate API_TOKEN/API_TOKENS from .env entirely and
make api_tokens.json the single source of truth (requires migration logic).
2025-11-11 00:11:41 +00:00
rcourtman
1b221cca71 feat: Add configurable allowlist for webhook private IP targets (addresses #673)
Allow homelab users to send webhooks to internal services while maintaining security defaults.

Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching

Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved

Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API

Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time

Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)

Related to #673

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 08:31:12 +00:00
rcourtman
8f05fc0a57 Improve backup-age alerts to show VM/CT names in multi-cluster setups (related to #668)
This change fixes backup-age alert notifications to display VM/CT names
instead of just "VMID XXX" in multi-cluster environments where backups
are stored on PBS.

Changes:
- Store all guests per VMID (not just first match) to handle VMID collisions across clusters
- Persist last-known guest names/types in metadata store for deleted VMs
- Enrich backup correlation with persisted metadata when live inventory is empty
- Update CheckBackups to handle multiple VMID matches intelligently

The fix addresses two scenarios:
1. Multiple PVE clusters with same VMID backing up to one PBS
2. VMs deleted from Proxmox but backups still exist on PBS

Backup-age alerts will now show proper VM/CT names when:
- A unique guest exists with that VMID (live or persisted)
- Multiple guests share a VMID (uses first match, consistent with current behavior)

When truly ambiguous (multiple live VMs, same VMID, no way to determine origin),
the alert gracefully falls back to showing "VMID XXX".
2025-11-08 18:24:04 +00:00
rcourtman
3ad35976b2 Clarify Docker agent cycling troubleshooting for cloned VMs/LXCs (related to #648)
Enhanced the "Docker hosts cycling" troubleshooting entry to explicitly
call out VM/LXC cloning as a cause of identical agent IDs. Added specific
remediation steps for regenerating machine IDs on cloned systems.

This addresses the resolution path discovered in discussion #648 where a
user cloned a Proxmox LXC and encountered cycling behavior even with
separate API tokens because the agent IDs were duplicated.
2025-11-07 22:59:19 +00:00
rcourtman
48fabdd827 Improve Docker temperature monitoring documentation for clarity (related to #600)
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:

- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands

These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
2025-11-07 15:09:42 +00:00
rcourtman
431769024f Fix P1: Config Persistence transaction field synchronization
**Problem**: writeConfigFileLocked() accessed c.tx field without synchronization
- Function reads c.tx to check if transaction is active (line 109)
- c.tx modified by begin/endTransaction under lock, but read without lock
- Race condition: c.tx could change between check and use

**Impact**:
- Inconsistent transaction handling
- File could be written directly when it should be staged
- Or staged when it should be written directly
- Data corruption risk during config imports

**Fix** (lines 108-128):
- Added documentation that caller MUST hold c.mu lock
- Read c.tx into local variable tx while lock is held
- Use local copy for transaction check
- Safe because all callers hold c.mu when calling writeConfigFileLocked
- Transaction field only modified while holding c.mu in begin/endTransaction

This maintains the existing contract (callers hold lock) while making the transaction read safe and explicit.
2025-11-07 10:00:31 +00:00
rcourtman
6ca4d9b750 Fix P1/P2 infrastructure issues: panic recovery and optimizations
This commit addresses 4 P1 important issues and 1 P2 optimization in infrastructure components:

**P1-1: Missing Panic Recovery in Discovery Service** (service.go:172-195, 499-542)
- **Problem**: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines
- **Impact**: Silent service death if scan panics, broken discovery with no monitoring
- **Fix**:
  - Wrapped initial scan goroutine with defer/recover (lines 172-182)
  - Wrapped scanLoop goroutine with defer/recover (lines 185-195)
  - Wrapped ForceRefresh scan with defer/recover (lines 499-509)
  - Wrapped SetSubnet scan with defer/recover (lines 532-542)
  - All log panics with stack traces for debugging

**P1-2: Missing Panic Recovery in Config Watcher Callback** (watcher.go:546-556)
- **Problem**: User-provided onMockReload callback could panic and crash watcher
- **Impact**: Panicking callback kills watcher goroutine, no config updates
- **Fix**: Wrapped callback invocation with defer/recover and stack trace logging

**P1-3: Session Store Stop() Using Send Instead of Close** (session_store.go:16-84)
- **Problem**: Stop() used channel send which blocks if nobody reads
- **Impact**: Stop() hangs if backgroundWorker already exited
- **Fix**:
  - Added sync.Once field stopOnce (line 22)
  - Changed Stop() to use close() within stopOnce.Do() (lines 80-84)
  - Prevents double-close panic and ensures all readers are signaled

**P2-1: Backup Cleanup Inefficient O(n²) Sort** (persistence.go:1424-1427)
- **Problem**: Bubble sort used to sort backups by modification time
- **Impact**: Inefficient for large backup counts (>100 files)
- **Fix**:
  - Replaced bubble sort with sort.Slice() using O(n log n) algorithm
  - Added "sort" import (line 9)
  - Maintains same oldest-first ordering for deletion logic

All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even with bugs, while optimization reduces cleanup time for backup-heavy environments.
2025-11-07 09:55:22 +00:00
rcourtman
ba6d934204 Fix critical P0 infrastructure concurrency issues
This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure:

**P0-1: Discovery Service Goroutine Leak** (service.go:468, 488)
- **Problem**: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress
- **Impact**: Rapid API calls create goroutine explosion, resource exhaustion
- **Fix**:
  - ForceRefresh: Check isScanning before spawning goroutine (lines 470-476)
  - SetSubnet: Check isScanning, defer scan if already running (lines 491-504)
  - Both now log when skipping to aid debugging

**P0-2: Config Persistence Unlock/Relock Race** (persistence.go:1177-1206)
- **Problem**: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked
- **Impact**: Another goroutine could modify config between unlock/relock, causing migrated data loss
- **Fix**:
  - Copy instance slices while holding RLock to ensure consistency (lines 1189-1194)
  - Release lock, save copies, then return without relocking (lines 1196-1205)
  - Prevents TOCTOU vulnerability where migrations could be overwritten

**P0-3: Config Watcher Channel Close Race** (watcher.go:19-178)
- **Problem**: Stop() used select-check-close pattern vulnerable to concurrent calls
- **Impact**: Multiple Stop() calls panic on double-close
- **Fix**:
  - Added sync.Once field stopOnce to ConfigWatcher struct (line 26)
  - Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178)
  - Removed racy select-based guard

All fixes maintain backwards compatibility and add defensive logging for operational visibility.
2025-11-07 09:49:55 +00:00
rcourtman
9257071ca1 Add encryption status to notification health endpoint (P2)
Backend:
- Add IsEncryptionEnabled() method to ConfigPersistence
- Include encryption status in /api/notifications/health response
- Allows frontend to warn when credentials are stored in plaintext

Frontend:
- Update NotificationHealth type to include encryption.enabled field
- Frontend can now display warnings when encryption is disabled

This addresses the P2 requirement for encryption visibility, allowing
operators to know when notification credentials are not encrypted at rest.
2025-11-07 08:36:55 +00:00
rcourtman
20854256c3 Fix VM migration issue where custom alert thresholds are lost
Resolves #641

## Problem
When a VM migrates between Proxmox nodes, Pulse was treating it as a new
resource and discarding custom alert threshold overrides. This occurred
because guest IDs included the node name (e.g., `instance-node-VMID`),
causing the ID to change when the VM moved to a different node.

Users reported that after migrating a VM, previously disabled alerts
(e.g., memory threshold set to 0) would resume firing.

## Root Cause
Guest IDs were constructed as:
- Standalone: `node-VMID`
- Cluster: `instance-node-VMID`

When a VM migrated from node1 to node2, the ID changed from
`instance-node1-100` to `instance-node2-100`, causing:
- Alert threshold overrides to be orphaned (keyed by old ID)
- Guest metadata (custom URLs, descriptions) to be orphaned
- Active alerts to reference the wrong resource ID

## Solution
Changed guest ID format to be stable across node migrations:
- New format: `instance-VMID` (for both standalone and cluster)
- Retains uniqueness across instances while being node-independent
- Allows VMs to migrate freely without losing configuration

## Implementation

### Backend Changes
1. **Guest ID Construction** (`monitor_polling.go`):
   - Simplified to always use `instance-VMID` format
   - Removed node from the ID construction logic

2. **Alert Override Migration** (`alerts.go`):
   - Added lazy migration in `getGuestThresholds()`
   - Detects legacy ID formats and migrates to new format
   - Preserves user configurations automatically

3. **Guest Metadata Migration** (`guest_metadata.go`):
   - Added `GetWithLegacyMigration()` helper method
   - Called during VM/container polling to migrate metadata
   - Preserves custom URLs and descriptions

4. **Active Alerts Migration** (`alerts.go`):
   - Added migration logic in `LoadActiveAlerts()`
   - Translates legacy alert resource IDs to new format
   - Preserves alert acknowledgments across restarts

### Frontend Changes
5. **ID Construction Updates**:
   - `ThresholdsTable.tsx`: Updated fallback from `instance-node-vmid` to `instance-vmid`
   - `Dashboard.tsx`: Simplified guest ID construction
   - `GuestRow.tsx`: Updated `buildGuestId()` helper

## Migration Strategy
- **Lazy Migration**: Configs are migrated as guests are discovered
- **Backwards Compatible**: Old IDs are detected and automatically converted
- **Zero Downtime**: No manual intervention required
- **Persisted**: Migrated configs are saved on next config write cycle

## Testing Recommendations
After deployment:
1. Verify existing alert overrides still apply
2. Test VM migration - confirm thresholds persist
3. Check guest metadata (custom URLs) survive migration
4. Verify active alerts maintain acknowledgment state

## Related
- Addresses similar issues with guest metadata and active alert tracking
- Lays groundwork for any future guest-specific configuration features
- Aligns with project philosophy: correctness and UX over implementation complexity
2025-11-06 10:27:15 +00:00
rcourtman
e21a72578f Add configurable SSH port for temperature monitoring
Related to #595

This change adds support for custom SSH ports when collecting temperature
data from Proxmox nodes, resolving issues for users who run SSH on non-standard
ports.

**Why SSH is still needed:**
Temperature monitoring requires reading /sys/class/hwmon sensors on Proxmox
nodes, which is not exposed via the Proxmox API. Even when using API tokens
for authentication, Pulse needs SSH access to collect temperature data.

**Changes:**
- Add `sshPort` configuration to SystemSettings (system.json)
- Add `SSHPort` field to Config with environment variable support (SSH_PORT)
- Add per-node SSH port override capability for PVE, PBS, and PMG instances
- Update TemperatureCollector to accept and use custom SSH port
- Update SSH known_hosts manager to support non-standard ports
- Add NewTemperatureCollectorWithPort() constructor with port parameter
- Maintain backward compatibility with NewTemperatureCollector() (uses port 22)
- Update frontend TypeScript types for SSH port configuration

**Configuration methods:**
1. Environment variable: SSH_PORT=2222
2. system.json: {"sshPort": 2222}
3. Per-node override in nodes.enc (future UI support)

**Default behavior:**
- Defaults to port 22 if not configured
- Maintains full backward compatibility
- No changes required for existing deployments

The implementation includes proper ssh-keyscan port handling and known_hosts
management for non-standard ports using [host]:port notation per SSH standards.
2025-11-05 20:03:29 +00:00
rcourtman
b1831d7b3e Add guest URL support for PVE hosts
Related to discussion #615

Add optional GuestURL field to PVE instances and cluster endpoints,
allowing users to specify a separate guest-accessible URL for web UI
navigation that differs from the internal management URL.

Backend changes:
- Add GuestURL field to PVEInstance and ClusterEndpoint structs
- Add GuestURL field to Node model
- Update cluster auto-discovery to preserve existing GuestURL values
- Update node creation logic to populate GuestURL from config
- Update API handlers to accept and persist GuestURL field

Frontend changes:
- Add GuestURL input field to NodeModal for configuration
- Update NodeGroupHeader and NodeSummaryTable to use GuestURL for navigation
- Add GuestURL to Node and PVENodeConfig TypeScript interfaces

When GuestURL is configured, it will be used for navigation links
instead of the Host URL, allowing users to access PVE hosts through
a reverse proxy or different domain while maintaining internal API
connections.
2025-11-05 19:06:08 +00:00
rcourtman
c93581e1aa Add DNS caching to reduce excessive DNS queries
Related to #608

Implements DNS caching using rs/dnscache to dramatically reduce DNS query
volume for frequently accessed Proxmox hosts. Users were reporting 260,000+
DNS queries in 37 hours for the same hostnames.

Changes:
- Added rs/dnscache dependency for DNS resolution caching
- Created pkg/tlsutil/dnscache.go with DNS cache wrapper
- Updated HTTP client creation to use cached DNS resolver
- Added DNSCacheTimeout configuration option (default: 5 minutes)
- Made DNS cache timeout configurable via:
  - system.json: dnsCacheTimeout field (seconds)
  - Environment variable: DNS_CACHE_TIMEOUT (duration string)
- DNS cache periodically refreshes to prevent stale entries

Benefits:
- Reduces DNS query load on local DNS servers by ~99%
- Reduces network traffic and DNS query log volume
- Maintains fresh DNS entries through periodic refresh
- Configurable timeout for different network environments

Default behavior: 5-minute cache timeout with automatic refresh
2025-11-05 18:25:38 +00:00
rcourtman
27f2038dab Add per-node temperature monitoring and fix critical config update bug
This commit implements per-node temperature monitoring control and fixes a critical
bug where partial node updates were destroying existing configuration.

Backend changes:
- Add TemperatureMonitoringEnabled field (*bool) to PVEInstance, PBSInstance, and PMGInstance
- Update monitor.go to check per-node temperature setting with global fallback
- Convert all NodeConfigRequest boolean fields to *bool pointers
- Add nil checks in HandleUpdateNode to prevent overwriting unmodified fields
- Fix critical bug where partial updates zeroed out MonitorVMs, MonitorContainers, etc.
- Update NodeResponse, NodeFrontend, and StateSnapshot to include temperature setting
- Fix HandleAddNode and test connection handlers to use pointer-based boolean fields

Frontend changes:
- Add temperatureMonitoringEnabled to Node interface and config types
- Create per-node temperature monitoring toggle handler with optimistic updates
- Update NodeModal to wire up per-node temperature toggle
- Add isTemperatureMonitoringEnabled helper to check effective monitoring state
- Update ConfiguredNodeTables to show/hide temperature badge based on monitoring state
- Update NodeSummaryTable to conditionally show temperature column
- Pass globalTemperatureMonitoringEnabled prop through component tree

The critical bug fix ensures that when updating a single field (like temperature
monitoring), the backend only modifies that specific field instead of zeroing out
all other boolean configuration fields.
2025-11-05 14:11:53 +00:00