Commit graph

184 commits

Author SHA1 Message Date
rcourtman
dd1d222ad0 Improve bootstrap token UX for easier discovery
The bootstrap token security requirement was added proactively but
lacked discoverability, causing user friction during first-run setup.
These improvements make the token easier to find while maintaining
the security benefit.

Improvements:
- Display bootstrap token prominently in startup logs with ASCII box
  (previously: single line log message)
- Add `pulse bootstrap-token` CLI command to display token on demand
  (Docker: docker exec <container> /app/pulse bootstrap-token)
- Improve error messages in quick-setup API to show exact commands
  for retrieving token when missing or invalid
- Error messages now include both Docker and bare metal examples

User experience improvements:
- Token visible in `docker logs` output immediately
- Clear instructions printed with token
- Helpful error messages if token is wrong/missing
- CLI helper for operators who need to retrieve token later

Security unchanged:
- Bootstrap token still required for first-run setup
- Token still auto-deleted after successful setup
- No bypass mechanism added

Related to discussion about bootstrap token UX friction.
2025-11-06 17:29:49 +00:00
rcourtman
f9ca2c0e68 Add hashpw utility for generating password hashes
Simple CLI utility to generate bcrypt password hashes for admin users.

Usage: hashpw <password>

This utility helps administrators generate properly hashed passwords
for use in configuration files or manual user setup.
2025-11-06 16:46:56 +00:00
rcourtman
20099549c6 Add comprehensive release validation to prevent missing artifacts
Adds automated validation script to prevent the pattern of patch
releases caused by missing files/artifacts.

scripts/validate-release.sh validates all 40+ artifacts including:
- Docker image scripts (8 install/uninstall scripts)
- Docker image binaries (17 across all platforms)
- Release tarballs (5 including universal and macOS)
- Standalone binaries (12+)
- Checksums for all distributable assets
- Version embedding in every binary type
- Tarball contents (binaries + scripts + VERSION)
- Binary architectures and file types

The script catches 100% of issues from the last 3 patch releases
(missing scripts, missing install.sh, missing binaries, broken
version embedding).

Updated RELEASE_CHECKLIST.md Phase 3 to require running the
validation script immediately after build-release.sh and before
proceeding to Docker build/publish phases.

Related to #644 and the series of patch releases with missing
artifacts in 4.26.x.
2025-11-06 16:33:49 +00:00
rcourtman
5b89b2371a Make pulse-sensor-proxy resilient to read-only filesystems
Related to #637

The sensor-proxy was failing to start on systems with read-only filesystems
because audit logging required a writable /var/log/pulse/sensor-proxy directory.

Changes:
- Modified newAuditLogger() to automatically fall back to stderr (systemd journal)
  if the audit log file cannot be opened
- Removed error return from newAuditLogger() since it now always succeeds
- Added warning logs when fallback mode is used to alert operators
- Updated tests to handle the new signature
- Added better debugging to audit log tests

This allows the sensor-proxy to run on:
- Immutable/read-only root filesystems
- Hardened systems with restricted /var mounts
- Containerized environments with limited write access

Audit events are still captured via systemd journal when file logging is
unavailable, maintaining the security audit trail.
2025-11-06 00:18:51 +00:00
rcourtman
930ad20921 Add configurable log level for pulse-sensor-proxy
Users can now control logging verbosity through:
- YAML config file: log_level: "debug|info|warn|error"
- Environment variable: PULSE_SENSOR_PROXY_LOG_LEVEL

Default log level is set to "info" instead of debug, reducing verbose output.
Supported levels: trace, debug, info, warn, error, fatal, panic, disabled

Related to #629
2025-11-05 19:48:00 +00:00
rcourtman
3194b10398 Improve Alpine Linux support and agent startup validation
Related to #612

This commit addresses the Alpine Linux installation issues reported where:
1. The OpenRC init system was not properly detected
2. Manual startup instructions were unclear and used placeholder values
3. The agent didn't validate configuration properly at startup

Changes:

Install Script (install-docker-agent.sh):
- Improved OpenRC detection to check for rc-service and rc-update commands
  instead of looking for openrc-run binary in specific paths
- Added specific Alpine Linux detection via /etc/alpine-release and /etc/os-release
- Enhanced manual startup instructions to show actual values instead of placeholders
- Added clearer warnings and guidance when no init system is detected
- Included comprehensive startup command with all required parameters

Agent Startup Validation (pulse-docker-agent):
- Added validation to detect unexpected command-line arguments
- Added helpful note about double-dash flag requirements (--token vs -token)
- Improved error messages to include example usage patterns
- Added warning when defaulting to localhost without explicit URL configuration
- Provide both command-line and environment variable examples in error messages

These improvements ensure that:
- Alpine Linux installations will properly detect and configure OpenRC services
- Users who must start the agent manually get clear, copy-pasteable commands
- Configuration errors are caught early with actionable error messages
- Common mistakes (like missing --url) are clearly explained
2025-11-05 19:01:09 +00:00
rcourtman
fdf0977be2 Add host agent multi-platform binary distribution and improve host details UI
- Build host agent binaries for all platforms (linux/darwin/windows, amd64/arm64/armv7) in Docker
- Add Makefile target for building agent binaries locally
- Add startup validation to check for missing agent binaries
- Improve download endpoint error messages with troubleshooting guidance
- Enhance host details drawer layout with better organization and visual hierarchy
- Update base images to rolling versions (node:20-alpine, golang:1.24-alpine, alpine:3.20)
2025-11-05 17:38:17 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
32392d1212 Add disk metrics, block I/O, and mount details to Docker monitoring
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)

Also fixes config watcher to consistently use production auth config path instead of following PULSE_DATA_DIR when in mock mode.
2025-10-29 12:05:36 +00:00
rcourtman
68ce8e7520 feat: finalize swarm service monitoring (#598) 2025-10-26 09:35:49 +00:00
rcourtman
8e83eaf823 Add container state filtering to Docker agent 2025-10-25 21:40:59 +00:00
rcourtman
6333a445e9 feat: add native Windows service support and expandable host details
Windows Host Agent Enhancements:
- Implement native Windows service support using golang.org/x/sys/windows/svc
- Add Windows Event Log integration for troubleshooting
- Create professional PowerShell installation/uninstallation scripts
- Add process termination and retry logic to handle Windows file locking
- Register uninstall endpoint at /uninstall-host-agent.ps1

Host Agent UI Improvements:
- Add expandable drawer to Hosts page (click row to view details)
- Display system info, network interfaces, disks, and temperatures in cards
- Replace status badges with subtle colored indicators
- Remove redundant master-detail sidebar layout
- Add search filtering for hosts

Technical Details:
- service_windows.go: Windows service lifecycle management with graceful shutdown
- service_stub.go: Cross-platform compatibility for non-Windows builds
- install-host-agent.ps1: Full Windows installation with validation
- uninstall-host-agent.ps1: Clean removal with process termination and retries
- HostsOverview.tsx: Expandable row pattern matching Docker/Proxmox pages

Files Added:
- cmd/pulse-host-agent/service_windows.go
- cmd/pulse-host-agent/service_stub.go
- scripts/install-host-agent.ps1
- scripts/uninstall-host-agent.ps1
- frontend-modern/src/components/Hosts/HostsOverview.tsx
- frontend-modern/src/components/Hosts/HostsFilter.tsx

The Windows service now starts reliably with automatic restart on failure,
and the uninstall script handles file locking gracefully without requiring reboots.
2025-10-23 22:11:56 +00:00
rcourtman
5c54685f04 Add API token scopes and standalone host agent
Introduces granular permission scopes for API tokens (docker:report, docker:manage, host-agent:report, monitoring:read/write, settings:read/write) allowing tokens to be restricted to minimum required access. Legacy tokens default to full access until scopes are explicitly configured.

Adds standalone host agent for monitoring Linux, macOS, and Windows servers outside Proxmox/Docker estates. New Servers workspace in UI displays uptime, OS metadata, and capacity metrics from enrolled agents.

Includes comprehensive token management UI overhaul with scope presets, inline editing, and visual scope indicators.
2025-10-23 11:40:31 +00:00
rcourtman
77108abc65 Propagate config updates to settings nodes (#588) 2025-10-22 13:45:13 +00:00
rcourtman
35adcf104f docs: add guidance for large deployments (30+ nodes) in rate limit config
Update config.example.yaml with:
- Recommendations for very large deployments (30+ nodes)
- Formula for calculating optimal rate limits based on node count
- Example calculation: 30 nodes with 10s polling = 300ms interval
- Security note about minimum safe intervals

This helps admins properly configure the proxy for enterprise
deployments with dozens of nodes.
2025-10-21 11:27:13 +00:00
rcourtman
44d5f91e92 feat: make pulse-sensor-proxy rate limits configurable
Add support for configuring rate limits via config.yaml to allow
administrators to tune the proxy for different deployment sizes.

Changes:
- Add RateLimitConfig struct to config.go with per_peer_interval_ms and per_peer_burst
- Update newRateLimiter() to accept optional RateLimitConfig parameter
- Load rate limit config from YAML and apply overrides to defaults
- Update tests to pass nil for default behavior
- Add comprehensive config.example.yaml with documentation

Configuration examples:
- Small (1-3 nodes): 1000ms interval, burst 5 (default)
- Medium (4-10 nodes): 500ms interval, burst 10
- Large (10+ nodes): 250ms interval, burst 20

Defaults remain conservative (1 req/sec, burst 5) to support most
deployments while allowing customization for larger environments.

Related: #46b8b8d08 (rate limit fix for multi-node support)
2025-10-21 11:25:21 +00:00
rcourtman
d856e75018 fix: increase pulse-sensor-proxy rate limits for multi-node support
- Increase rate limit from 1 req/5sec to 1 req/sec (60/min)
- Increase burst from 2 to 5 requests
- Fixes temperature collection failures when monitoring 3+ nodes
- All requests from containerized Pulse use same UID, causing rate limiting
- New limits support 5-10 node deployments comfortably

Resolves issue where adding standalone nodes broke temperature monitoring
for all nodes due to aggressive rate limiting.
2025-10-21 11:21:12 +00:00
rcourtman
73fb9d986f feat: add PBS/PMG stubs to test harness and implement HTTP config fetch
Resolves two remaining TODOs from codebase audit.

## 1. PBS/PMG Test Harness Stubs

**Location:** internal/monitoring/harness_integration.go:149-151

**Changes:**
- Added PBS client stub registration: `monitor.pbsClients[inst.Name] = &pbs.Client{}`
- Added PMG client stub registration: `monitor.pmgClients[inst.Name] = &pmg.Client{}`
- Added imports for pkg/pbs and pkg/pmg

**Purpose:**
Enables integration test scenarios to include PBS and PMG instance types
alongside existing PVE support. Stubs allow scheduler to register and
execute tasks for these instance types during integration testing.

**Testing:**
 TestAdaptiveSchedulerIntegration passes (55.5s)
 Integration test harness now supports all three instance types

## 2. HTTP Config URL Fetch

**Location:** cmd/pulse/config.go:226-261

**Problem:**
`PULSE_INIT_CONFIG_URL` was recognized but not implemented, returning
"URL import not yet implemented" error.

**Implementation:**
- URL validation (http/https schemes only)
- HTTP client with 15 second timeout
- Status code validation (2xx required)
- Empty response detection
- Base64 decoding with fallback to raw data
- Matches existing env-var behavior for `PULSE_INIT_CONFIG_DATA`

**Security:**
- Both HTTP and HTTPS supported (HTTPS recommended for production)
- URL scheme validation prevents file:// or other protocols
- Timeout prevents hanging on unresponsive servers

**Usage:**
```bash
export PULSE_INIT_CONFIG_URL="https://config-server/encrypted-config"
export PULSE_INIT_CONFIG_PASSPHRASE="secret"
pulse config auto-import
```

**Testing:**
 Code compiles cleanly
 Follows same pattern as existing PULSE_INIT_CONFIG_DATA handling

## Impact

- Completes integration test infrastructure for all instance types
- Enables automated config distribution via HTTP(S) for container deployments
- Removes last TODOs from codebase (no TODO/FIXME remaining in Go files)
2025-10-20 16:05:45 +00:00
rcourtman
7d422d2909 feat: add professional logging with runtime configuration and performance optimization
Implements structured logging package with LOG_LEVEL/LOG_FORMAT env support, debug level guards for hot paths, enriched error messages with actionable context, and stack trace capture for production debugging. Improves observability and reduces log overhead in high-frequency polling loops.
2025-10-20 15:13:38 +00:00
rcourtman
57429900a6 feat: add adaptive polling scheduler infrastructure (Phase 2 Tasks 1-3)
Implements adaptive scheduling foundation for Phase 2:
- Poll cycle metrics: duration, staleness, queue depth, in-flight counters
- Adaptive scheduler with pluggable staleness/interval/enqueue interfaces
- Config support: ADAPTIVE_POLLING_ENABLED flag + min/max/base intervals
- Feature flag defaults to disabled for safe rollout
- Scheduler wiring into Monitor with conditional instantiation

Tasks 1-3 of 10 complete. Ready for staleness tracker implementation.
2025-10-20 15:13:37 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
2025-10-20 15:13:37 +00:00
rcourtman
29f4879cd4 test: add comprehensive security tests and documentation
Implements all remaining Codex recommendations before launch:

1. Privileged Methods Tests:
   - TestPrivilegedMethodsCompleteness ensures all host-side RPCs are protected
   - Will fail if new privileged RPC is added without authorization
   - Verifies read-only methods are NOT in privilegedMethods

2. ID-Mapped Root Detection Tests:
   - TestIDMappedRootDetection covers all boundary conditions
   - Tests UID/GID range detection (both must be in range)
   - Tests multiple ID ranges, edge cases, disabled mode
   - 100% coverage of container identification logic

3. Authorization Tests:
   - TestPrivilegedMethodsBlocked verifies containers can't call privileged RPCs
   - TestIDMappedRootDisabled ensures feature can be disabled
   - Tests both container and host credentials

4. Comprehensive Security Documentation (23 KB):
   - Architecture overview with diagrams
   - Complete authentication & authorization flow
   - Rate limiting details (already implemented: 20/min per peer)
   - SSH security model and forced commands
   - Container isolation mechanisms
   - Monitoring & alerting recommendations
   - Development mode documentation (PULSE_DEV_ALLOW_CONTAINER_SSH)
   - Troubleshooting guide with common issues
   - Incident response procedures

Rate Limiting Status:
- Already implemented in throttle.go (20 req/min, burst 10, max 10 concurrent)
- Per-peer rate limiting at line 328 in main.go
- Per-node concurrency control at line 825 in main.go
- Exceeds Codex's requirements

All tests pass. Documentation covers all security aspects.

Addresses final Codex recommendations for production readiness.
2025-10-19 16:47:13 +00:00
rcourtman
1519390f08 security: enhance logging for denied privileged method calls
Improved security audit trail for attempted container privilege escalation:

- Added detailed logging when containers attempt privileged methods
- Logs UID, GID, PID, correlation ID, and method name
- Marked with "SECURITY:" prefix for easy filtering/alerting
- Helps operators detect and investigate compromise attempts

Example log output:
  SECURITY: Container attempted to call privileged method - access denied
  method=ensure_cluster_keys uid=101000 gid=101000 pid=12345

Addresses Codex recommendation for comprehensive logging of denied
privileged RPCs to enable monitoring and alerting on attempted abuse.
2025-10-19 16:40:42 +00:00
rcourtman
026b9c5b77 security: add method-level authorization for privileged RPC methods
RELEASE BLOCKER FIX - Prevents containers from triggering host-level operations.

Added host-only method restrictions:
- RPCEnsureClusterKeys (SSH key distribution)
- RPCRegisterNodes (node registration)
- RPCRequestCleanup (cleanup operations)

Implementation:
- New privilegedMethods map defines host-only methods
- Request handler checks if method is privileged
- If privileged AND caller is from ID-mapped UID range (container), reject
- Host processes (real root, configured UIDs) can still call privileged methods
- Containers can still call get_temperature and get_status

Security impact:
- Prevents compromised containers from:
  • Triggering unwanted SSH key distribution to cluster nodes
  • Learning about cluster topology via forced registration
  • DOS attacks by repeatedly calling key distribution
  • Other host-level privileged operations

Without this fix, any container with root could call these methods after
authentication, undermining the security isolation between container and host.

Addresses high-severity finding #2 from security audit.
2025-10-19 16:31:50 +00:00
rcourtman
3a6a4fd362 security: fix SSH command injection vulnerabilities in pulse-sensor-proxy
CRITICAL security fixes for pulse-sensor-proxy:

1. Strengthened hostname validation regex:
   - Now requires hostnames to start with alphanumeric character
   - Prevents SSH option injection via hostnames starting with '-'
   - Pattern: ^[a-zA-Z0-9][a-zA-Z0-9._-]{0,63}$ (1-64 chars total)
   - Added IPv4 and IPv6 validation regexes for future use

2. Added validation to vulnerable V1 RPC handlers:
   - handleGetTemperature: Now validates node parameter before SSH
   - handleRegisterNodes: Now validates discovered cluster nodes
   - Previously these handlers passed unsanitized input directly to SSH

3. Defense in depth:
   - V2 handlers already had validation (now using improved regex)
   - Multiple layers of protection against malicious node identifiers
   - Validation prevents container from passing SSH options as hostnames

Without these fixes, a compromised container could potentially inject SSH
options by providing malicious node names, though the 'root@' prefix
provided some mitigation.

Addresses high-severity finding from security audit.
2025-10-19 16:28:38 +00:00
rcourtman
123e0f04ca feat: add comprehensive node cleanup system
Implements automated cleanup workflow when nodes are deleted from Pulse, removing all monitoring footprint from the host. Changes include a new RPC handler in the sensor proxy for cleanup requests, enhanced node deletion modal with detailed cleanup explanations, and improved SSH key management with proper tagging for atomic updates.
2025-10-17 18:53:45 +00:00
rcourtman
f141f7db33 feat: enhance sensor proxy with improved cluster discovery and SSH management
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access

UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client

The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
2025-10-17 11:43:26 +00:00
rcourtman
91fecacfef feat: add docker agent command handling 2025-10-15 19:27:19 +00:00
rcourtman
e4c3b06f14 Automate sensor proxy container mount and auth 2025-10-14 12:41:48 +00:00
rcourtman
b952444837 refactor: Rename pulse-temp-proxy to pulse-sensor-proxy
The name "temp-proxy" implied a temporary or incomplete implementation. The new name better reflects its purpose as a secure sensor data bridge for containerized Pulse deployments.

Changes:
- Renamed cmd/pulse-temp-proxy/ to cmd/pulse-sensor-proxy/
- Updated all path constants and binary references
- Renamed environment variables: PULSE_TEMP_PROXY_* to PULSE_SENSOR_PROXY_*
- Updated systemd service and service account name
- Updated installation, rotation, and build scripts
- Renamed hardening documentation
- Maintained backward compatibility for key removal during upgrades
2025-10-13 13:17:05 +00:00
rcourtman
c7bb76c12e fix: Switch proxy socket to directory-level bind mount for stability
Fixes LXC bind mount issue where socket-level mounts break when the
socket is recreated by systemd. Following Codex's recommendation to
bind mount the directory instead of the file.

Changes:
- Socket path: /run/pulse-temp-proxy/pulse-temp-proxy.sock
- Systemd: RuntimeDirectory=pulse-temp-proxy (auto-creates /run/pulse-temp-proxy)
- Systemd: RuntimeDirectoryMode=0770 for group access
- LXC mount: Bind entire /run/pulse-temp-proxy directory
- Install script: Upgrades old socket-level mounts to directory-level
- Install script: Detects and handles bind mount changes

This survives socket recreations and container restarts. The directory
mount persists even when systemd unlinks/recreates the socket file.

Related to #528
2025-10-12 22:33:53 +00:00
rcourtman
6d4694f019 security: Add SO_PEERCRED authentication to temperature proxy
Addresses security concern raised in code review:
- Socket permissions changed from 0666 to 0660
- Added SO_PEERCRED verification to authenticate connecting processes
- Only allows root (UID 0) or proxy's own user
- Prevents unauthorized processes from triggering SSH key rollout
- Documented passwordless root SSH requirement for clusters

This prevents any process on the host or in other containers from
accessing the proxy RPC endpoints.
2025-10-12 21:42:22 +00:00
rcourtman
e7bc338891 feat: Implement secure temperature proxy for containerized deployments
Addresses #528

Introduces pulse-temp-proxy architecture to eliminate SSH key exposure in containers:

**Architecture:**
- pulse-temp-proxy runs on Proxmox host (outside LXC/Docker)
- SSH keys stored on host filesystem (/var/lib/pulse-temp-proxy/ssh/)
- Pulse communicates via unix socket (bind-mounted into container)
- Proxy handles cluster discovery, key rollout, and temperature fetching

**Components:**
- cmd/pulse-temp-proxy: Standalone Go binary with unix socket RPC server
- internal/tempproxy: Client library for Pulse backend
- scripts/install-temp-proxy.sh: Idempotent installer for existing deployments
- scripts/pulse-temp-proxy.service: Systemd service for proxy

**Integration:**
- Pulse automatically detects and uses proxy when socket exists
- Falls back to direct SSH for native installations
- Installer automatically configures proxy for new LXC deployments
- Existing LXC users can upgrade by running install-temp-proxy.sh

**Security improvements:**
- Container compromise no longer exposes SSH keys
- SSH keys never enter container filesystem
- Maintains forced command restrictions
- Transparent to users - no workflow changes

**Documentation:**
- Updated TEMPERATURE_MONITORING.md with new architecture
- Added verification steps and upgrade instructions
- Preserved legacy documentation for native installs
2025-10-12 21:35:35 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00