Commit graph

774 commits

Author SHA1 Message Date
rcourtman
966b3a7ebe fix: Setup script fmt.Sprintf argument mismatch causing bash syntax error
Fixed a fmt.Sprintf argument alignment issue in the PVE setup script that
caused a bash syntax error at the end of script execution. The error
manifested as "syntax error near unexpected token EXTRA" followed by the
serverHost URL.

Root cause: 23 arguments were provided for 22 %s placeholders. An extra
tokenName at position 15 pushed all subsequent arguments off by one,
leaving the final serverHost with no placeholder to fill.

Fix: Removed duplicate tokenName at position 15 and ensured serverHost
is correctly positioned at position 22 for the "Host URL" placeholder.
2025-10-13 19:36:37 +00:00
rcourtman
6d83f52763 fix: Add missing tokenName parameter for PVE auto-registration JSON tokenId field
The REGISTER_JSON template at line 3311 was getting storagePerms instead of tokenName
for the tokenId field, causing 'Missing required fields' errors during auto-registration.

Added tokenName parameter before storagePerms to shift all subsequent parameters.

Fixes #<issue-number>
2025-10-13 17:50:12 +00:00
rcourtman
b931e1e126 fix: Setup script UX and auth issues for rc.2
Fixes two issues found in v4.24.0-rc.1:

1. Setup script menu now uses numbered options [1/2/3] instead of
   [I/r/c] for better UX (maintains backward compatibility)

2. Temperature verification endpoint now requires authentication
   (wraps HandleVerifyTemperatureSSH with RequireAuth middleware)

These fixes address user feedback and prepare for v4.24.0-rc.2.
2025-10-13 16:36:25 +00:00
rcourtman
6475ba8374 fix: Update test version fallback and fix lint warnings #64 2025-10-13 15:50:23 +00:00
rcourtman
9362614c66 fix: Address Codex feedback on legacy SSH detection before release
Codex identified critical issues preventing release. All issues resolved:

1. FIXED: LXC container detection reliability
   - Added 4 detection methods (was 2):
     * Method 1: /.dockerenv (Docker)
     * Method 2: /proc/1/cgroup with more patterns (Docker/LXC)
     * Method 3: /run/systemd/container (systemd containers)
     * Method 4: /proc/1/environ container markers
   - Tested on LXC container (debian-go): detection confirmed working

2. FIXED: False positives from proxy outages
   - Now distinguishes "not configured" vs "temporarily down"
   - Checks if /usr/local/bin/pulse-sensor-proxy exists
   - If binary exists but socket missing = transient issue (no banner)
   - If binary missing and SSH keys present = legacy setup (show banner)

3. FIXED: Banner guidance insufficient
   - Added "Go to Nodes →" button that navigates to /settings/nodes
   - Users now have direct path to fix the issue
   - Banner message remains clear and concise

4. ADDED: Telemetry for removal criteria tracking
   - Backend logs: "Legacy SSH configuration detected" (WARN level)
   - Frontend logs: Banner shown/dismissed events to console
   - Enables data-driven removal per criteria: <1% for 30+ days
   - Log format: detection_type=legacy_ssh_migration for easy filtering

Testing:
- Created fake SSH key in /etc/pulse/.ssh/ on LXC container
- Verified detection triggered (legacySSHDetected: true)
- Verified telemetry logged: "Legacy SSH configuration detected"
- Removed fake key, verified detection cleared (null values)
- Container detection working via /run/systemd/container

Ready for release per Codex review.
2025-10-13 15:06:40 +00:00
rcourtman
21714fdf7a refactor: Mark legacy SSH detection as temporary migration scaffolding
Addresses user concern about technical debt: detection code exists only
to handle migration from SSH-in-container to proxy architecture, not to
serve functional purpose of the application.

Changes:
- Add PULSE_LEGACY_DETECTION env var to disable detection without redeployment
- Add explicit removal criteria: v5.0 or <1% detection rate for 30+ days
- Mark all detection code with "MIGRATION SCAFFOLDING" warnings
- Create MIGRATION_SCAFFOLDING.md to track temporary code across codebase
- Document removal instructions for when migration period ends

Backend:
- internal/api/router.go: detectLegacySSH() checks env var and has removal plan
- internal/api/types.go: HealthResponse fields documented as temporary

Frontend:
- src/components/LegacySSHBanner.tsx: Component marked with removal criteria
- src/App.tsx: Banner integration (will be removed with component)

This approach balances user safety during migration (auto-detection catches
rushed admins who skip changelogs) with long-term code cleanliness (explicit
removal plan prevents indefinite technical debt).
2025-10-13 14:54:52 +00:00
rcourtman
6d56917cd9 feat: Add detection for legacy SSH temperature monitoring
Added automatic detection to alert users when they're using the old
SSH-in-container method for temperature monitoring so they can upgrade
to the secure proxy architecture.

**Detection Logic:**
- Checks if Pulse is running in a container (Docker or LXC)
- Checks if SSH keys exist in data directory (/etc/pulse/.ssh)
- Checks if pulse-sensor-proxy socket is NOT available
- Sets legacySSHDetected and recommendProxyUpgrade flags in health endpoint

**API Changes:**
- Added fields to HealthResponse:
  - legacySSHDetected: true when old method detected
  - recommendProxyUpgrade: true when upgrade is recommended
  - proxyInstallScriptAvailable: always true

**Use Case:**
Users who set up temperature monitoring before the proxy feature
won't know they should upgrade. This detection allows the frontend
to show a banner prompting them to re-run the setup script to
migrate to the secure proxy architecture.

**Frontend Integration (to be added):**
Frontend can poll /api/health and show a dismissible banner similar
to UpdateBanner when legacySSHDetected is true, with a button to
view the setup script.

Addresses #123
2025-10-13 14:40:03 +00:00
rcourtman
8d6ab4113d fix: Handle authorized_keys removal when all keys are managed
Codex caught an edge case in the authorized_keys removal logic:

**Problem:**
When authorized_keys contains ONLY pulse-managed keys, `grep -vF` returns
exit code 1 (no lines matched the inverse filter). The previous code only
executed the rewrite on exit 0, leaving managed keys in place when they
should have been removed.

**Solution:**
- Capture grep exit code explicitly
- Treat both exit 0 (lines remain) and exit 1 (all removed) as success
- Only treat exit codes > 1 as actual errors
- Properly handles the "remove all keys" scenario

This ensures complete removal works even when the file contains nothing
but Pulse-managed entries.

Addresses #123
2025-10-13 14:35:06 +00:00
rcourtman
e0d7cc7f58 fix: Address final Codex review findings
Fixed three remaining issues from Codex's final review:

**1. nullglob State Management (line 3124)**
- Replaced shopt -s/u nullglob with compgen -G check
- Prevents changing global shell behavior that could affect later globs
- More explicit and safer pattern matching

**2. authorized_keys Permission Preservation (lines 3116-3117)**
- Now uses chmod/chown --reference to preserve original ownership/perms
- Falls back gracefully if --reference not available
- Proper cleanup on mv failure to prevent temp file leaks
- Aborts atomically if operations fail, leaving original untouched

**3. Multi-Address Container Detection (lines 3750-3761)**
- Iterates over ALL IPs from hostname -I, not just first one
- Handles dual-stack (IPv4 + IPv6) and multi-IP containers
- Uses break 2 to exit both loops when match found
- Prevents false negatives when Pulse IP is not the first address

All operations now handle edge cases properly: non-root accounts,
dual-stack networking, empty directories, and partial failures.

Addresses #123
2025-10-13 14:32:38 +00:00
rcourtman
096801e96a fix: Improve setup script robustness and safety (Codex review)
Applied Codex's security and reliability recommendations:

**SSH Key Safety:**
- Added "pulse-managed-key" comment marker to all SSH keys
- Removal now targets only marked keys (prevents deleting operator keys)
- Uses atomic file replacement via mktemp for authorized_keys edits

**Idempotency Improvements:**
- LXC config glob now uses nullglob to handle empty directories
- pveum token removal handles missing users gracefully (|| printf '')
- All systemctl operations wrapped with || true for non-systemd hosts
- sed operations in loops protected with || true

**Container Detection:**
- Validates container is running before IP check (pct status)
- Confirms container exists with pct config before proceeding
- Uses printf '' instead of || true for command substitution
- Handles IPv6 and multi-IP scenarios more reliably

**Network Operations:**
- curl now uses --fail --show-error --silent --location
- Error messages visible to users instead of silenced
- Better diagnostics when download fails

**Migration Safety:**
- Verifies pulse-sensor-proxy service is active before key removal
- Fallback check for binary existence if systemd unavailable
- Preserves legacy SSH keys if proxy not confirmed healthy
- Clear messaging about deferred cleanup

All cleanup operations are now fully idempotent and safe for
repeated execution, even on partially-configured hosts.

Addresses #123
2025-10-13 14:19:17 +00:00
rcourtman
4fef52ab37 feat: Add install/remove menu to setup script
Added a main menu at the beginning of the PVE setup script that gives users three options:

[I]nstall - Continue with normal setup (default)
[R]emove All - Complete uninstall of all Pulse components
[C]ancel - Exit without changes

The removal option comprehensively cleans up:
- pulse-sensor-proxy service, binary, and systemd unit
- pulse-sensor-proxy system user and data directories
- All SSH keys from authorized_keys (legacy and forced-command variants)
- LXC bind mounts from all container configs
- Pulse monitoring API tokens, user, and custom roles

This addresses user request for a clean removal path for everything
Pulse has installed on the host, including legacy components from
previous versions.
2025-10-13 13:59:20 +00:00
rcourtman
6c7314b86b polish: Clean up setup script output for professional presentation
Made the setup and installation output more concise and reassuring for users. Less verbosity, clearer messaging.

**Setup script improvements:**
- Changed "Container Detection" → "Enhanced Security"
- Simplified prompts: "Enable secure proxy? [Y/n]"
- Cleaned up success messages: "✓ Secure proxy architecture enabled"
- Removed verbose status messages (node-by-node cleanup output)
- Only show essential information users need to see

**install-sensor-proxy.sh improvements:**
- Added --quiet flag to suppress verbose output
- In quiet mode, only shows: "✓ pulse-sensor-proxy installed and running"
- Full output still available when run manually
- Removed redundant "Installation complete!" banners
- Cleaner legacy key cleanup messaging

**Result:**
Users see a clean, professional installation flow that builds confidence. Technical details are hidden unless needed. Messages are clear and reassuring rather than verbose.
2025-10-13 13:51:17 +00:00
rcourtman
fd09af6eee feat: Auto-cleanup legacy SSH keys when migrating to proxy
When pulse-sensor-proxy is installed, automatically remove old SSH keys that were stored in the container for security.

Changes:

**install-sensor-proxy.sh:**
- Checks container for SSH private keys (id_rsa, id_ed25519, etc.)
- Removes any found keys from container
- Warns user that legacy keys were cleaned up
- Explains proxy now handles SSH

**Setup script (config_handlers.go):**
- After successful proxy install, removes old SSH keys from all cluster nodes
- Cleans up authorized_keys entries that match the old container-based key
- Keeps only proxy-managed keys (pulse-sensor-proxy comment)

This provides a clean migration path from the old direct-SSH method to the secure proxy architecture. Users upgrading from pre-v4.24 versions get automatic cleanup of insecure container-stored keys.
2025-10-13 13:47:19 +00:00
rcourtman
0044a18295 feat: Auto-install pulse-sensor-proxy during setup for containerized deployments
The setup script now automatically detects when Pulse is running in an LXC container and offers to install pulse-sensor-proxy on the host for enhanced security.

What happens:
1. After temperature monitoring is configured
2. Script detects Pulse IP and finds matching container
3. Prompts: "Install pulse-sensor-proxy for container X? [Y/n]"
4. Downloads and runs install-sensor-proxy.sh automatically
5. Falls back gracefully if proxy install fails

Benefits:
- One-command setup for users (no manual proxy installation)
- SSH keys stay on host (not in container)
- Containerized Pulse gets the secure architecture automatically
- Native installs unaffected (still use direct SSH)

This solves the UX problem where users had to manually run install-sensor-proxy.sh as a separate step.
2025-10-13 13:41:01 +00:00
rcourtman
b952444837 refactor: Rename pulse-temp-proxy to pulse-sensor-proxy
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.

Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
2025-10-13 13:17:05 +00:00
rcourtman
dd468a9e26 docs: Update temperature monitoring security notice for proxy architecture
Replaced outdated security warnings with accurate information about
the pulse-temp-proxy architecture:

- Removed scary 'legacy feature' and 'compromised container' warnings
- Explains secure proxy architecture for containerized deployments
- Notes that SSH keys are stored on Proxmox host (not in container)
- Clarifies container compromise does not expose credentials
- Includes information for both containerized and native installs
- More factual and less alarmist tone

The old message implied temperature monitoring was insecure for
containers, which is no longer true with pulse-temp-proxy.

Related to #528
2025-10-12 22:11:12 +00:00
rcourtman
5c1dec14e1 fix: Remove duplicate sshPublicKey argument in PVE setup script
The setup script generator was passing sshPublicKey twice but only
using it once, causing a Go fmt.Sprintf formatting error that leaked
into the generated bash script as '%!(EXTRA string=...)'.

This resulted in bash syntax errors when running the setup script.

Fixes #528
2025-10-12 22:01:16 +00:00
rcourtman
c8e3c93516 fix: Add security gates for containerized temperature monitoring
Addresses #528

- Added opt-in confirmation prompt to setup script with security notice
- Added runtime warning when containerized Pulse uses SSH temperature monitoring
- Documented security considerations and hardening recommendations
- Users must explicitly confirm understanding before enabling in containers
2025-10-12 21:01:25 +00:00
rcourtman
bebe5efc3d fix: Setup script now verifies temperature SSH connectivity from Pulse
When Pulse runs in a container (LXC/Docker), the setup script would claim
temperature monitoring was enabled on cluster nodes, but Pulse couldn't
actually SSH to them. The script ran on the Proxmox host which could SSH
fine, but didn't verify connectivity from Pulse itself.

Changes:
- Added /api/system/verify-temperature-ssh endpoint that tests SSH from Pulse
- Setup script now calls this endpoint after configuring cluster nodes
- Detects when Pulse is containerized and provides ProxyJump config instructions
- Shows clear success/failure status for each node

Addresses #528
2025-10-12 20:36:48 +00:00
rcourtman
a1ba3c00c1 fix: Prevent caching of Docker agent install script and binaries
Add no-cache headers to both the install script and agent binary download endpoints to prevent browsers and curl from serving stale cached versions. This ensures users always get the latest install script with URL normalization fixes for trailing slash issues.

Fixes #528
2025-10-12 18:04:57 +00:00
rcourtman
c18cf3d4b8 Fix node config API to preserve fields on partial updates
The PUT /api/config/nodes/{id} endpoint was corrupting node configurations
when making partial updates (e.g., updating just monitorPhysicalDisks):

- Authentication fields (tokenName, tokenValue, password) were being cleared
  when updating unrelated settings
- Name field was being blanked when not included in request
- Monitor* boolean fields were defaulting to false

Changes:
- Only update name field if explicitly provided in request
- Only switch authentication method when auth fields are explicitly provided
- Preserve existing auth credentials on non-auth updates
- Applied fix to all node types (PVE, PBS, PMG)

Also enables physical disk monitoring by default (opt-out instead of opt-in)
and preserves disk data between polling intervals.
2025-10-12 17:50:55 +00:00
rcourtman
18a88cb4cc Improve NVMe temperature handling 2025-10-12 16:06:55 +00:00
rcourtman
a74baed121 feat: capture Proxmox memory snapshots in diagnostics 2025-10-12 10:25:43 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00