Commit graph

84 commits

Author SHA1 Message Date
rcourtman
c25b6f4e94 Fix setup-script tokens and proxy registration timing 2025-11-18 10:22:54 +00:00
rcourtman
50f8b76921 Fix auto-registration token parsing and hostname 2025-11-18 09:10:03 +00:00
rcourtman
13daa61d1d Harden turnkey install and proxy auto-registration 2025-11-18 00:24:50 +00:00
rcourtman
f9341ae1fc Improve temperature proxy workflow 2025-11-17 14:25:46 +00:00
rcourtman
48b5bc5489 Auto-deploy proxy for standalone temp monitoring 2025-11-16 09:47:07 +00:00
rcourtman
47d5c14aef Improve temperature proxy control-plane flow 2025-11-15 21:49:51 +00:00
rcourtman
c1f636edb9 Fix critical cleanup implementation issues found by Codex review
**Host Detection**:
- Now detects localhost by hostname and FQDN, not just IP
- Fixes issue where nodes configured as https://hostname:8006 would skip
  localhost cleanup (API tokens, bind mounts, service removal)

**Systemd Sandbox**:
- Added /etc/pve and /etc/systemd/system to ReadWritePaths
- Allows cleanup script to modify Proxmox configs and systemd units

**Uninstaller Improvements**:
- Use UUID for transient unit names (prevents same-second collisions)
- Added --purge flag for complete removal
- Added --wait and --collect flags to capture exit code
- Now fails cleanup if uninstaller exits non-zero

**Path Migration**:
- Fixed all /usr/local references to use /opt/pulse/sensor-proxy
- Updated forced command in SSH authorized_keys
- Updated self-heal script installer path
- Updated Go backend removal helpers (supports both new and legacy paths)

These fixes address Codex findings: hostname detection, sandbox permissions,
transient unit collisions, incomplete purging, and incomplete path migration.

Related to cleanup implementation testing.
2025-11-15 00:33:41 +00:00
rcourtman
c3df013242 Allow dev builds to skip proxy version gate 2025-11-14 21:34:55 +00:00
rcourtman
6bf32f98d6 Fix storage/disk/backup disappearing for clusters with VerifySSL enabled
Related to #670, #657

The fix in v4.26.5 (commit 59a97f2e3) attempted to resolve storage disappearing
by preferring hostnames over IPs when TLS hostname verification is required
(VerifySSL=true and no fingerprint). However, that fix was ineffective because
the cluster discovery code was populating BOTH the Host and IP fields with the
IP address.

**Root Cause:**
In internal/api/config_handlers.go, the detectPVECluster function was setting:
- endpoint.Host = schemePrefix + clusterNode.IP (when IP was available)
- endpoint.IP = clusterNode.IP

This meant both fields contained the same IP address. When the monitoring code
tried to prefer endpoint.Host for TLS validation
(internal/monitoring/monitor.go:361-368), it was still getting an IP, causing
certificate validation to fail with
"certificate is valid for pve01.example.com, not 10.0.0.44".

**Solution:**
Separate the Host and IP fields properly during cluster discovery:
- endpoint.Host = hostname (e.g., "https://pve01:8006") for TLS validation
- endpoint.IP = IP address (e.g., "10.0.0.44") for DNS-free connections

The existing logic in clusterEndpointEffectiveURL() can now correctly choose
between them based on TLS requirements.

**Impact:**
Users with VerifySSL=true who upgraded to v4.26.1-v4.26.5 and lost storage
visibility should now see storage, VM disks, and backups again after this fix.
2025-11-08 23:07:49 +00:00
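The Host/IP split described in commit 6bf32f98d6 can be sketched as follows. This is a minimal illustration, not the project's code: the `ClusterEndpoint` struct is reduced to two fields, `effectiveURL` is a hypothetical stand-in for `clusterEndpointEffectiveURL()`, and the hard-coded `https://` scheme and port 8006 are assumptions.

```go
package main

import "fmt"

// ClusterEndpoint is a simplified, hypothetical stand-in for the real struct.
type ClusterEndpoint struct {
	Host string // hostname URL, e.g. "https://pve01:8006", used for TLS validation
	IP   string // bare IP, e.g. "10.0.0.44", used for DNS-free connections
}

// effectiveURL picks the address to dial. A hostname is only required when
// certificates are verified by name, i.e. VerifySSL is on and no fingerprint
// is pinned. Otherwise the IP is preferred so no DNS lookup is needed.
func effectiveURL(ep ClusterEndpoint, verifySSL bool, fingerprint string) string {
	needsHostname := verifySSL && fingerprint == ""
	if needsHostname && ep.Host != "" {
		return ep.Host
	}
	if ep.IP != "" {
		// Scheme and port are assumed here for illustration only.
		return "https://" + ep.IP + ":8006"
	}
	return ep.Host
}

func main() {
	ep := ClusterEndpoint{Host: "https://pve01:8006", IP: "10.0.0.44"}
	fmt.Println(effectiveURL(ep, true, ""))  // hostname, so the certificate name matches
	fmt.Println(effectiveURL(ep, false, "")) // IP, no hostname verification needed
}
```

If both fields held the IP, as before the fix, the first branch would return an IP as well and name-based certificate validation would fail.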
rcourtman
270840801a Fix setup script fmt.Sprintf argument misalignment (related to #663)
The setup script template had 44 %s placeholders, but the fmt.Sprintf call
arguments were out of order starting at position 15. This caused the Pulse
URL to be inserted where the token name should be, resulting in errors like:

  Token ID: pulse-monitor@pam!http://192.168.0.44:7655

Instead of the correct format:

  Token ID: pulse-monitor@pam!pulse-192-168-0-44-1762545916

Changes:
- Escaped %s in printf helper (line 3949) so it doesn't consume arguments
- Reordered fmt.Sprintf arguments (lines 4727-4732) to match template order
- Removed 2 extra pulseURL arguments that were causing the shift

This fix ensures all 44 placeholders receive the correct values in order.
2025-11-08 07:52:19 +00:00
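The escaping fix in commit 270840801a can be illustrated with a small sketch. The template below is invented for this example and is far shorter than the real setup script; the point is only that `%%s` survives `fmt.Sprintf` as a literal `%s` for the shell's `printf`, so it does not consume a Go argument and shift the ones after it.

```go
package main

import "fmt"

// render fills a tiny, hypothetical shell-script template. The first %s is a
// Go placeholder; the %%s is escaped, so fmt.Sprintf emits a literal %s that
// the shell's printf interprets later.
func render(tokenName string) string {
	const tmpl = "TOKEN_NAME=\"%s\"\nprintf 'Token ID: %%s\\n' \"$TOKEN_NAME\"\n"
	return fmt.Sprintf(tmpl, tokenName)
}

func main() {
	fmt.Print(render("pulse-192-168-0-44-1762545916"))
}
```

With an unescaped `%s` in the printf helper, the template would need one more Go argument, and every later value would land one position too early.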
rcourtman
7ed9203e4b Fix config backup/restore failures (related to #646)
Addresses two issues preventing configuration backup/restore:

1. Export passphrase validation mismatch: the UI validated the 12-character
   minimum only when a custom passphrase was used, but the backend always
   enforced it. Users with shorter login passwords saw unexplained failures.
   - Frontend now validates all passphrases meet 12-char minimum
   - Clear error message suggests custom passphrase if login password too short

2. Import data parsing failed silently: Frontend sent `exportData.data`
   which was undefined for legacy/CLI backups (raw base64 strings).
   Backend rejected these with no logs.
   - Frontend now handles both formats: {status, data} and raw strings
   - Backend logs validation failures for easier troubleshooting

Related to #646 where user reported "error after entering password" with
no container logs. These changes ensure proper validation feedback and
make the backup system resilient to different export formats.
2025-11-06 17:53:54 +00:00
rcourtman
dfe960deb4 Fix container SSH detection and improve troubleshooting for issue #617
Related to #617

This fixes a misconfiguration scenario where Docker containers could
attempt direct SSH connections (producing [preauth] log spam) instead
of using the sensor proxy.

Changes:
- Fix container detection to check PULSE_DOCKER=true in addition to
  system.InContainer() heuristics (both temperature.go and config_handlers.go)
- Upgrade temperature collection log from Error to Warn with actionable
  guidance about mounting the proxy socket
- Add Info log when dev mode override is active so operators understand
  the security posture
- Add troubleshooting section to docs for SSH [preauth] logs from containers

The container detection was inconsistent: monitor.go checked both flags,
but temperature.go and config_handlers.go only checked InContainer().
Now all locations consistently check PULSE_DOCKER || InContainer().
2025-11-06 09:57:53 +00:00
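The unified check from commit dfe960deb4 might look roughly like the sketch below. `inContainerHeuristic` is a hypothetical stand-in for `system.InContainer()` (the real heuristics are not shown in the commit message), and the environment lookup is passed in as a function purely to keep the example testable.

```go
package main

import (
	"fmt"
	"os"
)

// inContainerHeuristic is a hypothetical stand-in for system.InContainer().
// It only checks for /.dockerenv; the real function may use other signals.
func inContainerHeuristic() bool {
	_, err := os.Stat("/.dockerenv")
	return err == nil
}

// isContainerized applies the rule from the commit message:
// PULSE_DOCKER=true OR the heuristic says we are in a container.
func isContainerized(getenv func(string) string, heuristic func() bool) bool {
	return getenv("PULSE_DOCKER") == "true" || heuristic()
}

func main() {
	fmt.Println("containerized:", isContainerized(os.Getenv, inContainerHeuristic))
}
```

Routing every call site through one helper like this is one way to keep the three locations from drifting apart again.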
rcourtman
0647a76c55 Fix temperature monitoring SSH key availability in containerized setup flow
Addresses issue #635 where users encounter "can't find the SSH key" errors
when enabling temperature monitoring during automated PVE setup with Pulse
running in Docker.

Root cause:
- Setup script embeds SSH keys at generation time (when downloaded)
- For containerized Pulse, keys are empty until pulse-sensor-proxy is installed
- Script auto-installs proxy, but didn't refresh keys after installation
- This caused temperature monitoring setup to fail with confusing errors

Changes:
1. After successful proxy installation, immediately fetch and populate the
   proxy's SSH public key (lines 4068-4080)
2. Update bash variables SSH_SENSORS_PUBLIC_KEY and SSH_SENSORS_KEY_ENTRY
   so temperature monitoring setup can proceed in the same script run
3. Improve error messaging when keys aren't available (lines 4424-4453):
   - Clear explanation of containerized Pulse requirements
   - Step-by-step instructions for container restart and verification
   - Separate guidance for bare-metal vs containerized deployments

Flow improvements:
- Initial run: Proxy installs → keys fetched → temp monitoring configures
- Rerun after container restart: Keys fetched at script start → works
- Both scenarios now handled correctly

Related to #635
2025-11-05 23:11:45 +00:00
rcourtman
d28cfed3c7 Improve temperature monitoring setup messaging for containerized deployments
When Pulse is running in a container and the SSH key is not available,
provide clearer guidance about the pulse-sensor-proxy requirement and
include documentation link for Docker deployments.

This helps users understand that containerized Pulse needs the host-side
sensor proxy to access temperature data from Proxmox hosts.
2025-11-05 23:05:47 +00:00
rcourtman
e21a72578f Add configurable SSH port for temperature monitoring
Related to #595

This change adds support for custom SSH ports when collecting temperature
data from Proxmox nodes, resolving issues for users who run SSH on non-standard
ports.

**Why SSH is still needed:**
Temperature monitoring requires reading /sys/class/hwmon sensors on Proxmox
nodes, which is not exposed via the Proxmox API. Even when using API tokens
for authentication, Pulse needs SSH access to collect temperature data.

**Changes:**
- Add `sshPort` configuration to SystemSettings (system.json)
- Add `SSHPort` field to Config with environment variable support (SSH_PORT)
- Add per-node SSH port override capability for PVE, PBS, and PMG instances
- Update TemperatureCollector to accept and use custom SSH port
- Update SSH known_hosts manager to support non-standard ports
- Add NewTemperatureCollectorWithPort() constructor with port parameter
- Maintain backward compatibility with NewTemperatureCollector() (uses port 22)
- Update frontend TypeScript types for SSH port configuration

**Configuration methods:**
1. Environment variable: SSH_PORT=2222
2. system.json: {"sshPort": 2222}
3. Per-node override in nodes.enc (future UI support)

**Default behavior:**
- Defaults to port 22 if not configured
- Maintains full backward compatibility
- No changes required for existing deployments

The implementation includes proper ssh-keyscan port handling and known_hosts
management for non-standard ports using [host]:port notation per SSH standards.
2025-11-05 20:03:29 +00:00
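The port handling in commit e21a72578f can be sketched as below. Both helper names are hypothetical, and the precedence (per-node override, then global setting, then 22) is an assumption inferred from the commit message. The `[host]:port` form is how OpenSSH records non-default ports in known_hosts.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultSSHPort = 22

// effectivePort resolves the port to use. Assumed precedence: a per-node
// override wins, then the global setting, then the default of 22.
func effectivePort(nodePort, globalPort int) int {
	if nodePort > 0 {
		return nodePort
	}
	if globalPort > 0 {
		return globalPort
	}
	return defaultSSHPort
}

// knownHostsPattern returns the host field for a known_hosts entry.
// OpenSSH stores the bare host for port 22 and "[host]:port" otherwise.
func knownHostsPattern(host string, port int) string {
	if port <= 0 || port == defaultSSHPort {
		return host
	}
	return "[" + host + "]:" + strconv.Itoa(port)
}

func main() {
	port := effectivePort(0, 2222)
	fmt.Println(knownHostsPattern("pve01", port))
	fmt.Println(knownHostsPattern("pve01", effectivePort(0, 0)))
}
```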
rcourtman
b1831d7b3e Add guest URL support for PVE hosts
Related to discussion #615

Add optional GuestURL field to PVE instances and cluster endpoints,
allowing users to specify a separate guest-accessible URL for web UI
navigation that differs from the internal management URL.

Backend changes:
- Add GuestURL field to PVEInstance and ClusterEndpoint structs
- Add GuestURL field to Node model
- Update cluster auto-discovery to preserve existing GuestURL values
- Update node creation logic to populate GuestURL from config
- Update API handlers to accept and persist GuestURL field

Frontend changes:
- Add GuestURL input field to NodeModal for configuration
- Update NodeGroupHeader and NodeSummaryTable to use GuestURL for navigation
- Add GuestURL to Node and PVENodeConfig TypeScript interfaces

When GuestURL is configured, it will be used for navigation links
instead of the Host URL, allowing users to access PVE hosts through
a reverse proxy or different domain while maintaining internal API
connections.
2025-11-05 19:06:08 +00:00
rcourtman
b972b7f05f Fix broken documentation links for containerized deployments
Replace non-functional docs.pulseapp.io URLs with direct GitHub repository
links. The containerized deployment security documentation exists in
SECURITY.md and was previously inaccessible via the external link.

Changes:
- Update SECURITY.md documentation reference
- Fix three documentation links in config_handlers.go (SSH verification,
  setup script, and security block error messages)
- All links now point to GitHub repository where docs actually live

Related to #607
2025-11-05 18:46:41 +00:00
rcourtman
449d77504f Improve PMG connection testing to validate metrics endpoints
Related to #551

Enhanced the PMG connection test to actually validate the metrics
endpoints that Pulse uses for monitoring, rather than only checking
the version endpoint. This provides users with immediate feedback if
their PMG credentials lack the necessary permissions to collect metrics.

Backend changes:
- Test mail statistics, cluster status, and quarantine endpoints during
  connection test (internal/api/config_handlers.go:1695-1714)
- Return warnings array in test response when endpoints are unavailable
- Increased timeout from 10s to 15s to accommodate multiple endpoint checks
- Added warning logs for failed endpoint checks

Frontend changes:
- Added showWarning() toast function for warning messages
- Enhanced NodeModal to display warning status with amber styling
- Added warnings list display in test results UI
- Updated Settings.tsx to show warnings from connection tests

This change helps users identify permission issues immediately rather
than discovering later that metrics aren't being collected despite a
"successful" connection.
2025-11-05 18:40:39 +00:00
rcourtman
f434a7b9e7 Fix fmt.Sprintf argument count in setup script after Docker/LXC changes
The previous commit added 4 new %s format specifiers for Docker/LXC
instructions but didn't add the corresponding arguments to fmt.Sprintf.

Added 4 pulseURL arguments to match the new format specifiers in the
'unknown environment' section of the setup script.
2025-11-05 18:18:04 +00:00
rcourtman
a1fb79ae6a Fix temperature proxy documentation and setup script for Docker vs LXC clarity
This addresses confusion around temperature monitoring setup for Docker
deployments where users expected a turnkey experience similar to LXC.

The core issue: The setup script and documentation suggested that
temperature monitoring was "automatically configured" for all containerized
deployments, but in reality only LXC containers have a fully automatic
setup. Docker requires manual steps.

Changes:

**Setup Script (config_handlers.go):**
- Fixed "unknown environment" path to show separate instructions for LXC vs Docker
- Docker instructions now correctly show --standalone flag (was incorrectly showing --ctid)
- Added docker-compose.yml bind mount instructions inline
- Added restart command for Docker deployments

**Documentation (TEMPERATURE_MONITORING.md):**
- Added prominent "Deployment-Specific Setup" callout at the top
- Clarified that LXC is fully automatic, Docker requires manual steps
- Reorganized "Setup (Automatic)" section to clearly distinguish:
  - LXC: Fully turnkey (no manual steps)
  - Docker: Manual proxy installation required
  - Node configuration: Works for both
- Updated "Host-side responsibilities" to specify it's Docker-only
- Fixed architecture benefits to reflect LXC vs Docker differences

Why this matters:
- LXC setup script auto-detects the container and runs install-sensor-proxy.sh --ctid
- Docker deployments can't be auto-detected and require --standalone flag
- Users running Docker were getting incorrect instructions (--ctid instead of --standalone)
- Documentation suggested everything was automatic, leading to confusion

Now the documentation and setup script accurately reflect that:
- LXC = Turnkey (automatic)
- Docker = Manual steps required (but well-documented)
- Native = Direct SSH (no proxy)

Related to GitHub Discussion #605
2025-11-05 18:18:04 +00:00
rcourtman
27f2038dab Add per-node temperature monitoring and fix critical config update bug
This commit implements per-node temperature monitoring control and fixes a critical
bug where partial node updates were destroying existing configuration.

Backend changes:
- Add TemperatureMonitoringEnabled field (*bool) to PVEInstance, PBSInstance, and PMGInstance
- Update monitor.go to check per-node temperature setting with global fallback
- Convert all NodeConfigRequest boolean fields to *bool pointers
- Add nil checks in HandleUpdateNode to prevent overwriting unmodified fields
- Fix critical bug where partial updates zeroed out MonitorVMs, MonitorContainers, etc.
- Update NodeResponse, NodeFrontend, and StateSnapshot to include temperature setting
- Fix HandleAddNode and test connection handlers to use pointer-based boolean fields

Frontend changes:
- Add temperatureMonitoringEnabled to Node interface and config types
- Create per-node temperature monitoring toggle handler with optimistic updates
- Update NodeModal to wire up per-node temperature toggle
- Add isTemperatureMonitoringEnabled helper to check effective monitoring state
- Update ConfiguredNodeTables to show/hide temperature badge based on monitoring state
- Update NodeSummaryTable to conditionally show temperature column
- Pass globalTemperatureMonitoringEnabled prop through component tree

The critical bug fix ensures that when updating a single field (like temperature
monitoring), the backend only modifies that specific field instead of zeroing out
all other boolean configuration fields.
2025-11-05 14:11:53 +00:00
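The partial-update fix in commit 27f2038dab rests on a common Go pattern: a `*bool` distinguishes "field omitted" (nil) from "field set to false". The sketch below is a simplified illustration; the struct shapes, JSON field names, and `applyUpdate` are assumptions and not the project's `HandleUpdateNode`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeConfig is a simplified stand-in for the stored node configuration.
type NodeConfig struct {
	MonitorVMs                   bool
	MonitorContainers            bool
	TemperatureMonitoringEnabled bool
}

// NodeConfigRequest uses pointers so that fields absent from the JSON body
// stay nil instead of decoding to false. JSON names are assumed.
type NodeConfigRequest struct {
	MonitorVMs                   *bool `json:"monitorVMs"`
	MonitorContainers            *bool `json:"monitorContainers"`
	TemperatureMonitoringEnabled *bool `json:"temperatureMonitoringEnabled"`
}

// applyUpdate copies only the fields the client actually sent.
func applyUpdate(cfg *NodeConfig, body []byte) error {
	var req NodeConfigRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return err
	}
	if req.MonitorVMs != nil {
		cfg.MonitorVMs = *req.MonitorVMs
	}
	if req.MonitorContainers != nil {
		cfg.MonitorContainers = *req.MonitorContainers
	}
	if req.TemperatureMonitoringEnabled != nil {
		cfg.TemperatureMonitoringEnabled = *req.TemperatureMonitoringEnabled
	}
	return nil
}

func main() {
	cfg := NodeConfig{MonitorVMs: true, MonitorContainers: true}
	if err := applyUpdate(&cfg, []byte(`{"temperatureMonitoringEnabled":true}`)); err != nil {
		fmt.Println("update failed:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

With plain `bool` request fields, the same body would decode `MonitorVMs` and `MonitorContainers` to false and overwrite the stored values, which is the bug the commit describes.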
rcourtman
10862db4e4 Enhance container detection for temperature SSH safeguards (refs #601) 2025-11-04 22:30:35 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
334ed3aedc Improve setup script auth usability 2025-10-25 19:08:48 +00:00
rcourtman
5a2d808aa1 Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00
rcourtman
d813f2396f Respect custom ports when discovering Proxmox clusters 2025-10-22 17:42:52 +00:00
rcourtman
77108abc65 Propagate config updates to settings nodes (#588) 2025-10-22 13:45:13 +00:00
rcourtman
ff4dc49ae4 Update Pulse install flow and related components 2025-10-21 19:58:53 +00:00
rcourtman
d430efcecb fix: correct fmt.Sprintf argument alignment in PVE setup script
Critical bug fix: The setup script's format string had 33 placeholders
but was only receiving 27 arguments, causing:
- INSTALLER_URL to receive authToken instead of pulseURL
- This made curl try to resolve the token value as a hostname
- Error: 'curl: (6) Could not resolve host: N7AE3P'
- Token ID showed '%!s(MISSING)' in manual setup instructions

Fixed by:
- Added missing tokenName at position 7
- Added literal '%s' strings for version_ge printf placeholders
- Added authToken arguments for Authorization headers (positions 29, 31)
- Ensured all 33 format placeholders have corresponding arguments

Now generates correct URLs:
- INSTALLER_URL: http://192.168.0.160:7655/api/install/install-sensor-proxy.sh
- --pulse-server: http://192.168.0.160:7655
- Token ID: pulse-monitor@pam!pulse-192-168-0-160-[timestamp]
2025-10-20 21:58:37 +00:00
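A guard against the placeholder/argument mismatch fixed in commit d430efcecb could be sketched as follows. This is not something the commit itself adds: `countPlaceholders` and `checkArgs` are hypothetical helpers, and they only understand plain `%s` verbs and the `%%` escape, which is all the commit message mentions.

```go
package main

import (
	"fmt"
	"strings"
)

// countPlaceholders counts %s verbs in a template, ignoring escaped %%.
// It deliberately handles only %s and %%, not widths or other verbs.
func countPlaceholders(tmpl string) int {
	withoutEscapes := strings.ReplaceAll(tmpl, "%%", "")
	return strings.Count(withoutEscapes, "%s")
}

// checkArgs reports an error when the number of arguments does not match
// the number of placeholders, before anything is rendered.
func checkArgs(tmpl string, args []string) error {
	want := countPlaceholders(tmpl)
	if want != len(args) {
		return fmt.Errorf("template has %d placeholders but %d arguments were supplied", want, len(args))
	}
	return nil
}

func main() {
	// Invented three-line template: two Go placeholders and one escaped %%s.
	tmpl := "INSTALLER_URL=\"%s/api/install/install-sensor-proxy.sh\"\n" +
		"printf '%%s\\n' \"$TOKEN\"\n" +
		"AUTH=\"%s\"\n"
	fmt.Println(checkArgs(tmpl, []string{"http://192.168.0.160:7655"}))
	fmt.Println(checkArgs(tmpl, []string{"http://192.168.0.160:7655", "token"}))
}
```

A count check catches missing or extra arguments early, but not arguments supplied in the wrong order, so it complements rather than replaces reviewing the argument list.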
rcourtman
d421f101ba feat: harden temperature proxy installation with better validation and error handling
Setup script improvements (config_handlers.go):
- Remove redundant mount configuration and container restart logic
- Let installer handle all mount/restart operations (single source of truth)
- Eliminate hard-coded mp0 assumption

Installer improvements (install-sensor-proxy.sh):
- Add mount configuration persistence validation via pct config check
- Surface pct set errors instead of silencing with 2>/dev/null
- Capture and display curl download errors with temp files
- Check systemd daemon-reload/enable/restart exit codes
- Show journalctl output when service fails to start
- Make socket verification fatal (was warning)
- Provide clear manual steps when hot-plug fails on running container

This makes the installation fail fast with actionable error messages
instead of silently proceeding with broken configuration.
2025-10-20 21:14:00 +00:00
rcourtman
07f198da63 fix: pass Pulse server URL as argument instead of env var for proxy installer
Changes:
- Replace PULSE_SENSOR_PROXY_FALLBACK_URL env export with --pulse-server argument
- Remove --quiet flag from installer invocation to show download progress
- More reliable than environment variable inheritance in subshells

This ensures the proxy installer can reliably download the binary from the
Pulse server fallback when GitHub is unavailable.
2025-10-20 20:58:25 +00:00
rcourtman
db54233769 fix: show full installer output instead of filtering
The setup script was filtering installer output to only show lines with
✓|⚠️|ERROR, which hid successful download messages like:
'Downloading pulse-sensor-proxy-linux-amd64 from Pulse server...'

This made it appear the installer failed even when the Pulse server
fallback download succeeded. Changed to show all installer output for
better visibility and debugging.

Users will now see the complete installation flow including:
- GitHub download attempt (expected to fail for dev builds)
- Pulse server fallback download (should succeed)
- All setup steps and validations

Improves transparency and reduces confusion during setup
2025-10-20 20:47:41 +00:00
rcourtman
dcad3a3a27 fix: allow dev/main builds to bypass version check
Version check was blocking dev/main builds (e.g., '0.0.0-main-da9da6f')
from using temperature proxy, even though they have the latest code.

Added regex to skip version check for builds matching:
- ^0\.0\.0-main (main branch builds)
- ^dev (dev builds)
- ^main (main version strings)

These builds are assumed to have proxy support since they're built from
the latest codebase.

Fixes testing workflow when installing Pulse with --main flag
2025-10-20 18:19:31 +00:00
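The three patterns listed in commit dcad3a3a27 can be combined into one expression. The sketch below shows that in Go for illustration only; the commit message does not say which language or file the actual check lives in, and `skipsVersionGate` is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// devBuildPattern combines the three prefixes from the commit message:
// ^0\.0\.0-main, ^dev and ^main.
var devBuildPattern = regexp.MustCompile(`^(0\.0\.0-main|dev|main)`)

// skipsVersionGate reports whether a version string is treated as a
// dev/main build and therefore bypasses the minimum-version check.
func skipsVersionGate(version string) bool {
	return devBuildPattern.MatchString(version)
}

func main() {
	for _, v := range []string{"0.0.0-main-da9da6f", "dev", "4.23.0"} {
		fmt.Printf("%-20s skip gate: %v\n", v, skipsVersionGate(v))
	}
}
```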
rcourtman
93a601d7c7 fix: only check Pulse version for containerized deployments
The version check was blocking ALL v4.23.0 users from temperature monitoring,
even non-containerized ones who don't need the proxy.

Changed to only check version when PULSE_IS_CONTAINERIZED=true, since:
- Non-containerized Pulse can use direct SSH on any version
- Containerized Pulse requires v4.24.0+ for proxy support

This ensures non-containerized v4.23.0 users can still use temperature monitoring
via direct SSH while properly blocking proxy setup for containerized v4.23.0.

Fixes regression introduced in commit fbe4ab83a
2025-10-20 18:03:09 +00:00
rcourtman
001d7f5f1c fix: comprehensive temperature proxy setup improvements
Addresses multiple issues that prevented successful temperature monitoring setup:

1. **Missing log directory (install-sensor-proxy.sh)**
   - Added LogsDirectory=pulse/sensor-proxy to both systemd service templates
   - Fixes crash: "open /var/log/pulse/sensor-proxy/audit.log: read-only file system"
   - Uses systemd's LogsDirectory directive for proper permissions

2. **Invalid pct restart command (install-sensor-proxy.sh:822)**
   - Changed from `pct restart` (doesn't exist) to `pct stop && sleep 2 && pct start`
   - Fixes container restart failures during proxy setup

3. **Version compatibility check (config_handlers.go)**
   - Added const minProxyReadyVersion = "4.24.0"
   - Setup script now queries /api/version endpoint
   - Blocks proxy setup on Pulse < v4.24.0 with clear upgrade message
   - Prevents users from attempting proxy setup on incompatible versions

4. **Proxy service health validation (config_handlers.go)**
   - Verifies pulse-sensor-proxy service is actually running
   - Checks socket exists at /run/pulse-sensor-proxy/pulse-sensor-proxy.sock
   - Shows journalctl command for troubleshooting on failure
   - Sets TEMP_MONITORING_AVAILABLE=false to skip remaining steps

5. **Interactive LXC restart prompt (config_handlers.go)**
   - Replaced passive "please restart" message with interactive prompt
   - Default action is "yes" for easy acceptance
   - Actually executes pct stop/start on confirmation
   - Handles non-interactive environments gracefully

6. **Post-restart socket verification (config_handlers.go)**
   - Validates socket is accessible inside container after restart
   - Provides clear error if mount didn't work
   - Prevents claiming success when setup is incomplete

All changes tested with fresh LXC installation. Temperature monitoring now
works end-to-end with proper error handling and user guidance.

Fixes temperature proxy setup flow for v4.24.0+
2025-10-20 18:00:21 +00:00
rcourtman
5ebb32ce10 feat: enhance runtime configuration and system settings management
Improves configuration handling and system settings APIs to support
v4.24.0 features including runtime logging controls, adaptive polling
configuration, and enhanced config export/persistence.

Changes:
- Add config override system for discovery service
- Enhance system settings API with runtime logging controls
- Improve config persistence and export functionality
- Update security setup handling
- Refine monitoring and discovery service integration

These changes provide the backend support for the configuration
features documented in the v4.24.0 release.
2025-10-20 17:41:19 +00:00
rcourtman
c91b7874ac docs: comprehensive v4.24.0 documentation audit and updates
Complete documentation overhaul for Pulse v4.24.0 release covering all new
features and operational procedures.

Documentation Updates (19 files):

P0 Release-Critical:
- Operations: Rewrote ADAPTIVE_POLLING_ROLLOUT.md as GA operations runbook
- Operations: Updated ADAPTIVE_POLLING_MANAGEMENT_ENDPOINTS.md with DEFERRED status
- Operations: Enhanced audit-log-rotation.md with scheduler health checks
- Security: Updated proxy hardening docs with rate limit defaults
- Docker: Added runtime logging and rollback procedures

P1 Deployment & Integration:
- KUBERNETES.md: Runtime logging config, adaptive polling, post-upgrade verification
- PORT_CONFIGURATION.md: Service naming, change tracking via update history
- REVERSE_PROXY.md: Rate limit headers, error pass-through, v4.24.0 verification
- PROXY_AUTH.md, OIDC.md, WEBHOOKS.md: Runtime logging integration
- TROUBLESHOOTING.md, VM_DISK_MONITORING.md, zfs-monitoring.md: Updated workflows

Features Documented:
- X-RateLimit-* headers for all API responses
- Updates rollback workflow (UI & CLI)
- Scheduler health API with rich metadata
- Runtime logging configuration (no restart required)
- Adaptive polling (GA, enabled by default)
- Enhanced audit logging
- Circuit breakers and dead-letter queue

Supporting Changes:
- Discovery service enhancements
- Config handlers updates
- Sensor proxy installer improvements

Total Changes: 1,626 insertions(+), 622 deletions(-)
Files Modified: 24 (19 docs, 5 code)

All documentation is production-ready for v4.24.0 release.
2025-10-20 17:20:13 +00:00
rcourtman
57429900a6 feat: add adaptive polling scheduler infrastructure (Phase 2 Tasks 1-3)
Implements adaptive scheduling foundation for Phase 2:
- Poll cycle metrics: duration, staleness, queue depth, in-flight counters
- Adaptive scheduler with pluggable staleness/interval/enqueue interfaces
- Config support: ADAPTIVE_POLLING_ENABLED flag + min/max/base intervals
- Feature flag defaults to disabled for safe rollout
- Scheduler wiring into Monitor with conditional instantiation

Tasks 1-3 of 10 complete. Ready for staleness tracker implementation.
2025-10-20 15:13:37 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00
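The "hash-chained tamper-evident audit logging" mentioned in commit 524f42cc28 generally works by having each entry commit to the hash of the previous one. The sketch below shows that general technique with SHA-256; the entry layout, separator, and function names are assumptions and do not reflect the proxy's actual log format.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// entry is a hypothetical audit record linked to its predecessor by hash.
type entry struct {
	Message  string
	PrevHash string
	Hash     string
}

// hashEntry binds a message to the hash of the entry before it.
func hashEntry(prevHash, message string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + message))
	return hex.EncodeToString(sum[:])
}

// appendEntry adds a record whose hash covers the previous record's hash.
func appendEntry(chain []entry, message string) []entry {
	prev := ""
	if len(chain) > 0 {
		prev = chain[len(chain)-1].Hash
	}
	return append(chain, entry{Message: message, PrevHash: prev, Hash: hashEntry(prev, message)})
}

// verify recomputes every hash; any edited or removed entry breaks the chain.
func verify(chain []entry) bool {
	prev := ""
	for _, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEntry(prev, e.Message) {
			return false
		}
		prev = e.Hash
	}
	return true
}

func main() {
	var chain []entry
	chain = appendEntry(chain, "uid=1000 requested temperatures")
	chain = appendEntry(chain, "uid=1000 rate limited")
	fmt.Println("intact:", verify(chain))

	chain[0].Message = "nothing happened"
	fmt.Println("after tampering:", verify(chain))
}
```

Forwarding entries to a remote collector, as the commit mentions, matters because a local attacker could otherwise recompute the whole chain after editing it.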
rcourtman
a841a1a6fe fix: show success message instead of warning when using pulse-sensor-proxy
When the setup script detects TEMPERATURE_PROXY_KEY (proxy is available),
it now shows a clear success message instead of attempting SSH verification.

The verification check doesn't work with proxy-based setups since the
container doesn't have SSH keys - all temperature collection happens via
the Unix socket to pulse-sensor-proxy, which handles SSH.

Now shows:
✓ Temperature monitoring configured via pulse-sensor-proxy
  Temperature data will appear in the dashboard within 10 seconds

Instead of the misleading:
⚠️  Unable to verify SSH connectivity.
   Temperature data will appear once SSH connectivity is configured.
2025-10-19 14:06:18 +00:00
rcourtman
557eedb247 fix: detect and use proxy SSH key in setup script for Docker deployments
When pulse-sensor-proxy is available, the setup script now automatically
detects and uses the proxy's SSH public key instead of trying to generate
keys inside the container.

This fixes temperature monitoring setup for Docker deployments where:
- Container has proxy socket mounted at /mnt/pulse-proxy
- Proxy handles SSH connections to nodes
- Setup script needs to distribute the proxy's key, not container's key

The fix queries /api/system/proxy-public-key during setup script generation
and overrides SSH_SENSORS_PUBLIC_KEY if the proxy is available.

Tested with Docker on native Proxmox host (delly) - temperatures collected
successfully via proxy socket.
2025-10-19 13:50:08 +00:00
rcourtman
21712111e7 fix: enable variable expansion in cluster node SSH key heredoc
Changed heredoc delimiter from <<'EOF' to <<EOF to allow bash variable
expansion. Previously $SSH_PUBLIC_KEY and $SSH_RESTRICTED_KEY_ENTRY
were being passed as literal strings instead of their actual values,
so cluster nodes never received the correct SSH keys.

This fixes cluster node ProxyJump setup - now both restricted and
unrestricted keys are properly added to cluster nodes.
2025-10-19 09:08:00 +00:00
rcourtman
c17059ca8e fix: add ProxyJump key to all cluster nodes automatically
The setup script now adds both the restricted and unrestricted SSH keys
to ALL cluster nodes, not just the first one. This makes temperature
monitoring truly turnkey - you say 'yes' to configure cluster nodes and
it automatically sets up both keys on each node.

This ensures:
- All nodes can act as ProxyJump hosts if needed
- All nodes can provide temperature data via sensors
- No manual SSH key configuration required

Fixes turnkey cluster temperature monitoring setup.
2025-10-19 09:02:28 +00:00
rcourtman
bfde490ad4 fix: add unrestricted SSH key for ProxyJump on jump host
When using ProxyJump for cluster temperature monitoring, the jump host
(typically the first cluster node) needs an unrestricted SSH key to allow
connection forwarding. Previously only the restricted key with
command="sensors -j" was added, which blocked ProxyJump.

Now the setup script adds TWO keys:
1. Unrestricted key (for ProxyJump/connection forwarding)
2. Restricted key (for running sensors -j directly)

This allows containerized Pulse to:
- Connect through the jump host to other cluster nodes
- Collect temperature data from all cluster members

Fixes cluster temperature monitoring for Docker/LXC deployments.
2025-10-19 08:56:52 +00:00
rcourtman
78c2228b89 fix: add HostName entries for cluster nodes in SSH config
Added logic to resolve IP addresses for cluster nodes and include them as
HostName entries in the SSH config. Without this, Pulse couldn't connect
to cluster nodes like 'minipc' because the container couldn't resolve
the hostname.

Uses getent to resolve node names to IPs, with fallback to hostname if
resolution fails (for environments where DNS works).
2025-10-19 08:48:25 +00:00
rcourtman
dd70bdee08 feat: switch to Ed25519 SSH keys and add openssh-client to container
- Changed SSH key generation from RSA 2048 to Ed25519 (more secure, faster, smaller)
- Added openssh-client package to Docker image (required for temperature monitoring)
- Updated SSH config template to use id_ed25519
- Removed unused crypto/rsa and crypto/x509 imports

Ed25519 provides better security with shorter keys and faster operations
compared to RSA. The container now has SSH client tools needed to connect
to Proxmox nodes for temperature data collection.
2025-10-19 08:43:20 +00:00
rcourtman
6acfc3f121 fix: use id_rsa in SSH config instead of id_ed25519
The setup script was generating SSH config with IdentityFile ~/.ssh/id_ed25519
but Pulse generates id_rsa keys. Updated SSH config template to use id_rsa
to match the actual key type generated by the monitoring system.
2025-10-19 08:39:55 +00:00
rcourtman
8c51ba727d fix: pass authToken to verify-temperature-ssh endpoint
The setup script was passing pulseURL instead of authToken as the last
parameter, causing 'Authentication required' errors when verifying SSH
connectivity. Fixed parameter order in fmt.Sprintf call.
2025-10-19 08:23:31 +00:00
rcourtman
71abcb2a37 fix: harden SSH config endpoint per Codex security review
Addressed security concerns identified by Codex code review:

1. **Memory exhaustion protection**
   - Added http.MaxBytesReader with 32KB limit
   - Prevents malicious large POST from killing server

2. **Dangerous directive blocking**
   - Reject ProxyCommand, LocalCommand, RemoteCommand
   - Prevents command injection via SSH config

3. **Improved error handling**
   - Check all error returns properly
   - Return 5xx on failures
   - Log file size and path for debugging

4. **Scoped SSH config (critical fix)**
   - Changed from `Host *` to specific cluster nodes
   - Prevents overriding ALL SSH connections
   - Only affects Proxmox nodes for temperature monitoring
   - Preserves other SSH functionality (git, etc.)

Before: Host * broke all SSH connections from Pulse
After: Only Proxmox cluster nodes use ProxyJump

Credit: Codex code review identified these issues
2025-10-18 23:21:59 +00:00
rcourtman
8595b4c001 feat: automatic ProxyJump for turnkey temperature monitoring
Make temperature monitoring truly turnkey by automatically configuring
SSH ProxyJump when running in containers without pulse-sensor-proxy.

How it works:
1. Setup script runs on Proxmox host (e.g., delly)
2. Detects Pulse is containerized but proxy unavailable
3. Automatically configures SSH ProxyJump through the current host
4. Writes SSH config to /home/pulse/.ssh/config in container
5. Temperature monitoring "just works" without manual configuration

Changes:
- Track TEMP_MONITORING_AVAILABLE flag during proxy installation
- Auto-configure ProxyJump if proxy installation fails
- Add /api/system/ssh-config endpoint to write SSH config
- Only prompt for temperature monitoring if it can actually work
- Automatic SSH config: ProxyJump through Proxmox host

Before: User had to manually configure ProxyJump or install proxy
After: Temperature monitoring works automatically after setup script

This makes Docker deployments as turnkey as LXC deployments.
2025-10-18 23:17:38 +00:00