Commit graph

190 commits

Author SHA1 Message Date
rcourtman
8f05fc0a57 Improve backup-age alerts to show VM/CT names in multi-cluster setups (related to #668)
This change fixes backup-age alert notifications to display VM/CT names
instead of just "VMID XXX" in multi-cluster environments where backups
are stored on PBS.

Changes:
- Store all guests per VMID (not just first match) to handle VMID collisions across clusters
- Persist last-known guest names/types in metadata store for deleted VMs
- Enrich backup correlation with persisted metadata when live inventory is empty
- Update CheckBackups to handle multiple VMID matches intelligently

The fix addresses two scenarios:
1. Multiple PVE clusters with same VMID backing up to one PBS
2. VMs deleted from Proxmox but backups still exist on PBS

Backup-age alerts will now show proper VM/CT names when:
- A unique guest exists with that VMID (live or persisted)
- Multiple guests share a VMID (uses first match, consistent with current behavior)

When truly ambiguous (multiple live VMs, same VMID, no way to determine origin),
the alert gracefully falls back to showing "VMID XXX".
2025-11-08 18:24:04 +00:00
rcourtman
3ad35976b2 Clarify Docker agent cycling troubleshooting for cloned VMs/LXCs (related to #648)
Enhanced the "Docker hosts cycling" troubleshooting entry to explicitly
call out VM/LXC cloning as a cause of identical agent IDs. Added specific
remediation steps for regenerating machine IDs on cloned systems.

This addresses the resolution path discovered in discussion #648 where a
user cloned a Proxmox LXC and encountered cycling behavior even with
separate API tokens because the agent IDs were duplicated.
2025-11-07 22:59:19 +00:00
rcourtman
48fabdd827 Improve Docker temperature monitoring documentation for clarity (related to #600)
Updated the Quick Start for Docker section in TEMPERATURE_MONITORING.md to be
more user-friendly and address common setup issues:

- Added clear explanation of why the proxy is needed (containers can't access hardware)
- Provided concrete IP example instead of placeholder
- Showed full docker-compose.yml context with proper YAML structure
- Added sudo to commands where needed
- Updated docker-compose commands to v2 syntax with note about v1
- Expanded verification steps with clearer success indicators
- Added reminder to check container name in verification commands

These improvements should help users who encounter blank temperature displays
due to missing proxy installation or bind mount configuration.
2025-11-07 15:09:42 +00:00
rcourtman
431769024f Fix P1: Config Persistence transaction field synchronization
**Problem**: writeConfigFileLocked() accessed c.tx field without synchronization
- Function reads c.tx to check if transaction is active (line 109)
- c.tx modified by begin/endTransaction under lock, but read without lock
- Race condition: c.tx could change between check and use

**Impact**:
- Inconsistent transaction handling
- File could be written directly when it should be staged
- Or staged when it should be written directly
- Data corruption risk during config imports

**Fix** (lines 108-128):
- Added documentation that caller MUST hold c.mu lock
- Read c.tx into local variable tx while lock is held
- Use local copy for transaction check
- Safe because all callers hold c.mu when calling writeConfigFileLocked
- Transaction field only modified while holding c.mu in begin/endTransaction

This maintains the existing contract (callers hold lock) while making the transaction read safe and explicit.
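
The locking contract described in this message can be sketched roughly as follows. This is a minimal illustration, not the project's code: only `c.mu`, `c.tx`, `writeConfigFileLocked`, `beginTransaction`, and `endTransaction` are named in the commit; the `transaction` type, its `staged` map, the `written` map (standing in for the filesystem), and `Save` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// transaction is a hypothetical staging area for pending writes.
type transaction struct {
	staged map[string][]byte
}

// ConfigPersistence is a simplified stand-in; written replaces the real filesystem.
type ConfigPersistence struct {
	mu      sync.Mutex
	tx      *transaction
	written map[string][]byte
}

// beginTransaction installs a new transaction while holding c.mu.
func (c *ConfigPersistence) beginTransaction() *transaction {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tx = &transaction{staged: map[string][]byte{}}
	return c.tx
}

// endTransaction clears the transaction while holding c.mu.
func (c *ConfigPersistence) endTransaction() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tx = nil
}

// writeConfigFileLocked stages or writes data. The caller MUST hold c.mu.
func (c *ConfigPersistence) writeConfigFileLocked(path string, data []byte) {
	tx := c.tx // single read, made while the caller holds the lock
	if tx != nil {
		tx.staged[path] = data
		return
	}
	c.written[path] = data
}

// Save is a hypothetical caller that honors the locking contract.
func (c *ConfigPersistence) Save(path string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.writeConfigFileLocked(path, data)
}

func main() {
	c := &ConfigPersistence{written: map[string][]byte{}}
	c.Save("system.json", []byte("direct"))
	tx := c.beginTransaction()
	c.Save("nodes.enc", []byte("staged"))
	c.endTransaction()
	fmt.Println("written:", len(c.written), "staged:", len(tx.staged))
}
```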
2025-11-07 10:00:31 +00:00
rcourtman
6ca4d9b750 Fix P1/P2 infrastructure issues: panic recovery and optimizations
This commit addresses 4 P1 important issues and 1 P2 optimization in infrastructure components:

**P1-1: Missing Panic Recovery in Discovery Service** (service.go:172-195, 499-542)
- **Problem**: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines
- **Impact**: Silent service death if scan panics, broken discovery with no monitoring
- **Fix**:
  - Wrapped initial scan goroutine with defer/recover (lines 172-182)
  - Wrapped scanLoop goroutine with defer/recover (lines 185-195)
  - Wrapped ForceRefresh scan with defer/recover (lines 499-509)
  - Wrapped SetSubnet scan with defer/recover (lines 532-542)
  - All log panics with stack traces for debugging

**P1-2: Missing Panic Recovery in Config Watcher Callback** (watcher.go:546-556)
- **Problem**: User-provided onMockReload callback could panic and crash watcher
- **Impact**: Panicking callback kills watcher goroutine, no config updates
- **Fix**: Wrapped callback invocation with defer/recover and stack trace logging
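
A defer/recover wrapper of the kind described for the scan goroutines and the reload callback might look like the sketch below. The helper name `runRecovered` is hypothetical, and the project's actual logging calls are not shown in the commit; this version uses the standard `log` package.

```go
package main

import (
	"log"
	"runtime/debug"
	"sync"
)

// runRecovered calls fn and turns a panic into a logged event instead of
// letting it kill the calling goroutine. It returns the recovered value,
// or nil if fn returned normally.
func runRecovered(name string, fn func()) (recovered any) {
	defer func() {
		if r := recover(); r != nil {
			// Log the panic value together with a stack trace for debugging.
			log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
			recovered = r
		}
	}()
	fn()
	return nil
}

func main() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		runRecovered("scan", func() { panic("scan failed") })
	}()
	wg.Wait()
	log.Println("process still running after the panic")
}
```

The same wrapper can be placed around a user-provided callback so that a panicking callback does not take the watcher goroutine down with it.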

**P1-3: Session Store Stop() Using Send Instead of Close** (session_store.go:16-84)
- **Problem**: Stop() used channel send which blocks if nobody reads
- **Impact**: Stop() hangs if backgroundWorker already exited
- **Fix**:
  - Added sync.Once field stopOnce (line 22)
  - Changed Stop() to use close() within stopOnce.Do() (lines 80-84)
  - Prevents double-close panic and ensures all readers are signaled
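
The close-inside-`sync.Once` pattern can be sketched as below. Only `stopOnce` and `Stop()` are named in the commit; the `stopChan` field, the constructor, and the `stopped` helper are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// SessionStore is a reduced stand-in holding only the shutdown machinery.
type SessionStore struct {
	stopChan chan struct{} // hypothetical field name
	stopOnce sync.Once
}

func NewSessionStore() *SessionStore {
	return &SessionStore{stopChan: make(chan struct{})}
}

// Stop never blocks and is safe to call repeatedly: close signals every
// reader at once, and sync.Once guarantees the channel is closed only once.
func (s *SessionStore) Stop() {
	s.stopOnce.Do(func() { close(s.stopChan) })
}

// stopped reports whether Stop has been called.
func (s *SessionStore) stopped() bool {
	select {
	case <-s.stopChan:
		return true
	default:
		return false
	}
}

func main() {
	s := NewSessionStore()
	s.Stop()
	s.Stop() // second call is a no-op rather than a double-close panic
	fmt.Println("stopped:", s.stopped())
}
```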

**P2-1: Backup Cleanup Inefficient O(n²) Sort** (persistence.go:1424-1427)
- **Problem**: Bubble sort used to sort backups by modification time
- **Impact**: Inefficient for large backup counts (>100 files)
- **Fix**:
  - Replaced bubble sort with sort.Slice() using O(n log n) algorithm
  - Added "sort" import (line 9)
  - Maintains same oldest-first ordering for deletion logic
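
An oldest-first ordering with `sort.Slice` could look like this sketch. The `backupFile` type and the sample data are hypothetical; the commit only states that backups are sorted by modification time.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// backupFile is a hypothetical record of a backup and its modification time.
type backupFile struct {
	name    string
	modTime time.Time
}

// sortOldestFirst orders files by modification time, oldest first, so a
// cleanup loop can delete from the front of the slice.
func sortOldestFirst(files []backupFile) {
	sort.Slice(files, func(i, j int) bool {
		return files[i].modTime.Before(files[j].modTime)
	})
}

// sampleBackups returns three made-up backups in unsorted order.
func sampleBackups() []backupFile {
	base := time.Date(2025, 11, 1, 0, 0, 0, 0, time.UTC)
	return []backupFile{
		{"b.bak", base.Add(48 * time.Hour)},
		{"a.bak", base},
		{"c.bak", base.Add(24 * time.Hour)},
	}
}

func main() {
	files := sampleBackups()
	sortOldestFirst(files)
	for _, f := range files {
		fmt.Println(f.name, f.modTime.Format(time.RFC3339))
	}
}
```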

All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even with bugs, while optimization reduces cleanup time for backup-heavy environments.
2025-11-07 09:55:22 +00:00
rcourtman
ba6d934204 Fix critical P0 infrastructure concurrency issues
This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure:

**P0-1: Discovery Service Goroutine Leak** (service.go:468, 488)
- **Problem**: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress
- **Impact**: Rapid API calls create goroutine explosion, resource exhaustion
- **Fix**:
  - ForceRefresh: Check isScanning before spawning goroutine (lines 470-476)
  - SetSubnet: Check isScanning, defer scan if already running (lines 491-504)
  - Both now log when skipping to aid debugging
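
The guard described above, checking `isScanning` before spawning a goroutine, might be sketched as follows. Only `ForceRefresh` and `isScanning` come from the commit; the `Service` layout, the `scan` parameter, the boolean return value, and the `Wait` helper are assumptions for the sake of a runnable example.

```go
package main

import (
	"fmt"
	"sync"
)

// Service is a reduced stand-in for the discovery service.
type Service struct {
	mu         sync.Mutex
	isScanning bool
	wg         sync.WaitGroup
}

// ForceRefresh starts scan in a goroutine unless a scan is already running.
// It reports whether a new scan was started.
func (s *Service) ForceRefresh(scan func()) bool {
	s.mu.Lock()
	if s.isScanning {
		s.mu.Unlock()
		return false // skip: at most one scan goroutine at a time
	}
	s.isScanning = true
	s.wg.Add(1)
	s.mu.Unlock()

	go func() {
		defer s.wg.Done()
		defer func() {
			// Runs before wg.Done, so Wait returns only after the flag is reset.
			s.mu.Lock()
			s.isScanning = false
			s.mu.Unlock()
		}()
		scan()
	}()
	return true
}

// Wait blocks until any running scan has finished.
func (s *Service) Wait() { s.wg.Wait() }

func main() {
	s := &Service{}
	release := make(chan struct{})
	fmt.Println(s.ForceRefresh(func() { <-release })) // starts a scan
	fmt.Println(s.ForceRefresh(func() {}))            // skipped while scanning
	close(release)
	s.Wait()
	fmt.Println(s.ForceRefresh(func() {})) // starts again once idle
	s.Wait()
}
```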

**P0-2: Config Persistence Unlock/Relock Race** (persistence.go:1177-1206)
- **Problem**: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked
- **Impact**: Another goroutine could modify config between unlock and relock, causing loss of migrated data
- **Fix**:
  - Copy instance slices while holding RLock to ensure consistency (lines 1189-1194)
  - Release lock, save copies, then return without relocking (lines 1196-1205)
  - Prevents TOCTOU vulnerability where migrations could be overwritten
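
The copy-then-release approach could be sketched like this. The method names `LoadNodesConfig` and `SaveNodesConfig` come from the commit; the struct fields, the `needsMigration` parameter, and the use of a string slice for instances are simplifications assumed here.

```go
package main

import (
	"fmt"
	"sync"
)

// ConfigPersistence is a reduced stand-in with an in-memory node list.
type ConfigPersistence struct {
	mu    sync.RWMutex
	nodes []string
	saves int
}

// SaveNodesConfig takes the write lock and stores its own copy of nodes.
func (c *ConfigPersistence) SaveNodesConfig(nodes []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes = append([]string(nil), nodes...)
	c.saves++
}

// LoadNodesConfig snapshots the data under RLock, releases the lock, then
// saves the snapshot if a migration is needed. It never re-acquires the
// read lock, so there is no unlock/relock window to race with.
func (c *ConfigPersistence) LoadNodesConfig(needsMigration bool) []string {
	c.mu.RLock()
	snapshot := append([]string(nil), c.nodes...)
	c.mu.RUnlock()

	if needsMigration {
		c.SaveNodesConfig(snapshot)
	}
	return snapshot
}

func main() {
	c := &ConfigPersistence{nodes: []string{"pve1", "pve2"}}
	got := c.LoadNodesConfig(true)
	fmt.Println(got, "saves:", c.saves)
}
```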

**P0-3: Config Watcher Channel Close Race** (watcher.go:19-178)
- **Problem**: Stop() used select-check-close pattern vulnerable to concurrent calls
- **Impact**: Multiple Stop() calls panic on double-close
- **Fix**:
  - Added sync.Once field stopOnce to ConfigWatcher struct (line 26)
  - Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178)
  - Removed racy select-based guard

All fixes maintain backwards compatibility and add defensive logging for operational visibility.
2025-11-07 09:49:55 +00:00
rcourtman
9257071ca1 Add encryption status to notification health endpoint (P2)
Backend:
- Add IsEncryptionEnabled() method to ConfigPersistence
- Include encryption status in /api/notifications/health response
- Allows frontend to warn when credentials are stored in plaintext

Frontend:
- Update NotificationHealth type to include encryption.enabled field
- Frontend can now display warnings when encryption is disabled

This addresses the P2 requirement for encryption visibility, allowing
operators to know when notification credentials are not encrypted at rest.
2025-11-07 08:36:55 +00:00
rcourtman
20854256c3 Fix VM migration issue where custom alert thresholds are lost
Resolves #641

## Problem
When a VM migrates between Proxmox nodes, Pulse was treating it as a new
resource and discarding custom alert threshold overrides. This occurred
because guest IDs included the node name (e.g., `instance-node-VMID`),
causing the ID to change when the VM moved to a different node.

Users reported that after migrating a VM, previously disabled alerts
(e.g., memory threshold set to 0) would resume firing.

## Root Cause
Guest IDs were constructed as:
- Standalone: `node-VMID`
- Cluster: `instance-node-VMID`

When a VM migrated from node1 to node2, the ID changed from
`instance-node1-100` to `instance-node2-100`, causing:
- Alert threshold overrides to be orphaned (keyed by old ID)
- Guest metadata (custom URLs, descriptions) to be orphaned
- Active alerts to reference the wrong resource ID

## Solution
Changed guest ID format to be stable across node migrations:
- New format: `instance-VMID` (for both standalone and cluster)
- Retains uniqueness across instances while being node-independent
- Allows VMs to migrate freely without losing configuration

## Implementation

### Backend Changes
1. **Guest ID Construction** (`monitor_polling.go`):
   - Simplified to always use `instance-VMID` format
   - Removed node from the ID construction logic

2. **Alert Override Migration** (`alerts.go`):
   - Added lazy migration in `getGuestThresholds()`
   - Detects legacy ID formats and migrates to new format
   - Preserves user configurations automatically

3. **Guest Metadata Migration** (`guest_metadata.go`):
   - Added `GetWithLegacyMigration()` helper method
   - Called during VM/container polling to migrate metadata
   - Preserves custom URLs and descriptions

4. **Active Alerts Migration** (`alerts.go`):
   - Added migration logic in `LoadActiveAlerts()`
   - Translates legacy alert resource IDs to new format
   - Preserves alert acknowledgments across restarts

### Frontend Changes
5. **ID Construction Updates**:
   - `ThresholdsTable.tsx`: Updated fallback from `instance-node-vmid` to `instance-vmid`
   - `Dashboard.tsx`: Simplified guest ID construction
   - `GuestRow.tsx`: Updated `buildGuestId()` helper

## Migration Strategy
- **Lazy Migration**: Configs are migrated as guests are discovered
- **Backwards Compatible**: Old IDs are detected and automatically converted
- **Zero Downtime**: No manual intervention required
- **Persisted**: Migrated configs are saved on next config write cycle
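
A lazy migration from the legacy ID formats to `instance-VMID` might look like the sketch below. All function names and the `map[string]float64` override store are hypothetical, and the sketch assumes the legacy key can be rebuilt from the guest's current node; the commit does not say how the real code detects legacy IDs.

```go
package main

import "fmt"

// newGuestID builds the node-independent format: instance-VMID.
func newGuestID(instance string, vmid int) string {
	return fmt.Sprintf("%s-%d", instance, vmid)
}

// legacyGuestIDs returns the IDs that older versions would have used for a
// guest currently seen on the given node.
func legacyGuestIDs(instance, node string, vmid int) []string {
	return []string{
		fmt.Sprintf("%s-%s-%d", instance, node, vmid), // cluster format
		fmt.Sprintf("%s-%d", node, vmid),              // standalone format
	}
}

// lookupWithMigration returns the override for a guest. If it is only found
// under a legacy key, the entry is moved to the new key first.
func lookupWithMigration(overrides map[string]float64, instance, node string, vmid int) (float64, bool) {
	id := newGuestID(instance, vmid)
	if v, ok := overrides[id]; ok {
		return v, true
	}
	for _, old := range legacyGuestIDs(instance, node, vmid) {
		if v, ok := overrides[old]; ok {
			delete(overrides, old)
			overrides[id] = v
			return v, true
		}
	}
	return 0, false
}

func main() {
	// A memory threshold of 0 (alert disabled) stored under the legacy key.
	overrides := map[string]float64{"pve-node1-100": 0}
	fmt.Println(lookupWithMigration(overrides, "pve", "node1", 100))
	// After the guest moves to node2 the migrated key still matches.
	fmt.Println(lookupWithMigration(overrides, "pve", "node2", 100))
	fmt.Println(overrides)
}
```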

## Testing Recommendations
After deployment:
1. Verify existing alert overrides still apply
2. Test VM migration - confirm thresholds persist
3. Check guest metadata (custom URLs) survive migration
4. Verify active alerts maintain acknowledgment state

## Related
- Addresses similar issues with guest metadata and active alert tracking
- Lays groundwork for any future guest-specific configuration features
- Aligns with project philosophy: correctness and UX over implementation complexity
2025-11-06 10:27:15 +00:00
rcourtman
e21a72578f Add configurable SSH port for temperature monitoring
Related to #595

This change adds support for custom SSH ports when collecting temperature
data from Proxmox nodes, resolving issues for users who run SSH on non-standard
ports.

**Why SSH is still needed:**
Temperature monitoring requires reading /sys/class/hwmon sensors on Proxmox
nodes, which is not exposed via the Proxmox API. Even when using API tokens
for authentication, Pulse needs SSH access to collect temperature data.

**Changes:**
- Add `sshPort` configuration to SystemSettings (system.json)
- Add `SSHPort` field to Config with environment variable support (SSH_PORT)
- Add per-node SSH port override capability for PVE, PBS, and PMG instances
- Update TemperatureCollector to accept and use custom SSH port
- Update SSH known_hosts manager to support non-standard ports
- Add NewTemperatureCollectorWithPort() constructor with port parameter
- Maintain backward compatibility with NewTemperatureCollector() (uses port 22)
- Update frontend TypeScript types for SSH port configuration

**Configuration methods:**
1. Environment variable: SSH_PORT=2222
2. system.json: {"sshPort": 2222}
3. Per-node override in nodes.enc (future UI support)

**Default behavior:**
- Defaults to port 22 if not configured
- Maintains full backward compatibility
- No changes required for existing deployments

The implementation includes proper ssh-keyscan port handling and known_hosts
management for non-standard ports using [host]:port notation per SSH standards.
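
As an illustration of the `[host]:port` notation, the sketch below formats a known_hosts host name for a given port. The helper names are hypothetical, and treating a zero or negative port as "use the default" is an assumption of this example.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultSSHPort = 22

// effectivePort falls back to the default when no port is configured.
func effectivePort(port int) int {
	if port <= 0 {
		return defaultSSHPort
	}
	return port
}

// knownHostsName returns the host name as it appears in known_hosts:
// the bare host for port 22, and [host]:port for any other port.
func knownHostsName(host string, port int) string {
	p := effectivePort(port)
	if p == defaultSSHPort {
		return host
	}
	return "[" + host + "]:" + strconv.Itoa(p)
}

func main() {
	fmt.Println(knownHostsName("pve1.example.com", 0))
	fmt.Println(knownHostsName("pve1.example.com", 2222))
}
```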
2025-11-05 20:03:29 +00:00
rcourtman
b1831d7b3e Add guest URL support for PVE hosts
Related to discussion #615

Add optional GuestURL field to PVE instances and cluster endpoints,
allowing users to specify a separate guest-accessible URL for web UI
navigation that differs from the internal management URL.

Backend changes:
- Add GuestURL field to PVEInstance and ClusterEndpoint structs
- Add GuestURL field to Node model
- Update cluster auto-discovery to preserve existing GuestURL values
- Update node creation logic to populate GuestURL from config
- Update API handlers to accept and persist GuestURL field

Frontend changes:
- Add GuestURL input field to NodeModal for configuration
- Update NodeGroupHeader and NodeSummaryTable to use GuestURL for navigation
- Add GuestURL to Node and PVENodeConfig TypeScript interfaces

When GuestURL is configured, it will be used for navigation links
instead of the Host URL, allowing users to access PVE hosts through
a reverse proxy or different domain while maintaining internal API
connections.
2025-11-05 19:06:08 +00:00
rcourtman
c93581e1aa Add DNS caching to reduce excessive DNS queries
Related to #608

Implements DNS caching using rs/dnscache to dramatically reduce DNS query
volume for frequently accessed Proxmox hosts. Users were reporting 260,000+
DNS queries in 37 hours for the same hostnames.

Changes:
- Added rs/dnscache dependency for DNS resolution caching
- Created pkg/tlsutil/dnscache.go with DNS cache wrapper
- Updated HTTP client creation to use cached DNS resolver
- Added DNSCacheTimeout configuration option (default: 5 minutes)
- Made DNS cache timeout configurable via:
  - system.json: dnsCacheTimeout field (seconds)
  - Environment variable: DNS_CACHE_TIMEOUT (duration string)
- DNS cache periodically refreshes to prevent stale entries

Benefits:
- Reduces DNS query load on local DNS servers by ~99%
- Reduces network traffic and DNS query log volume
- Maintains fresh DNS entries through periodic refresh
- Configurable timeout for different network environments

Default behavior: 5-minute cache timeout with automatic refresh
2025-11-05 18:25:38 +00:00
rcourtman
27f2038dab Add per-node temperature monitoring and fix critical config update bug
This commit implements per-node temperature monitoring control and fixes a critical
bug where partial node updates were destroying existing configuration.

Backend changes:
- Add TemperatureMonitoringEnabled field (*bool) to PVEInstance, PBSInstance, and PMGInstance
- Update monitor.go to check per-node temperature setting with global fallback
- Convert all NodeConfigRequest boolean fields to *bool pointers
- Add nil checks in HandleUpdateNode to prevent overwriting unmodified fields
- Fix critical bug where partial updates zeroed out MonitorVMs, MonitorContainers, etc.
- Update NodeResponse, NodeFrontend, and StateSnapshot to include temperature setting
- Fix HandleAddNode and test connection handlers to use pointer-based boolean fields

Frontend changes:
- Add temperatureMonitoringEnabled to Node interface and config types
- Create per-node temperature monitoring toggle handler with optimistic updates
- Update NodeModal to wire up per-node temperature toggle
- Add isTemperatureMonitoringEnabled helper to check effective monitoring state
- Update ConfiguredNodeTables to show/hide temperature badge based on monitoring state
- Update NodeSummaryTable to conditionally show temperature column
- Pass globalTemperatureMonitoringEnabled prop through component tree

The critical bug fix ensures that when updating a single field (like temperature
monitoring), the backend only modifies that specific field instead of zeroing out
all other boolean configuration fields.
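
The pointer-based approach can be sketched as follows: a `*bool` request field is nil when the client omitted it, so the handler can tell "not sent" apart from "false". `NodeConfigRequest` and `TemperatureMonitoringEnabled` are named in the commit; the `nodeConfig` type, `applyUpdate`, and the JSON field names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeConfig is a hypothetical stored configuration.
type nodeConfig struct {
	MonitorVMs                   bool
	MonitorContainers            bool
	TemperatureMonitoringEnabled *bool // nil means "use the global setting"
}

// NodeConfigRequest uses pointers so omitted fields decode to nil.
type NodeConfigRequest struct {
	MonitorVMs                   *bool `json:"monitorVMs"`
	MonitorContainers            *bool `json:"monitorContainers"`
	TemperatureMonitoringEnabled *bool `json:"temperatureMonitoringEnabled"`
}

// applyUpdate changes only the fields present in the request body.
func applyUpdate(cfg *nodeConfig, body []byte) error {
	var req NodeConfigRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return err
	}
	if req.MonitorVMs != nil {
		cfg.MonitorVMs = *req.MonitorVMs
	}
	if req.MonitorContainers != nil {
		cfg.MonitorContainers = *req.MonitorContainers
	}
	if req.TemperatureMonitoringEnabled != nil {
		cfg.TemperatureMonitoringEnabled = req.TemperatureMonitoringEnabled
	}
	return nil
}

func main() {
	cfg := &nodeConfig{MonitorVMs: true, MonitorContainers: true}
	err := applyUpdate(cfg, []byte(`{"temperatureMonitoringEnabled": false}`))
	fmt.Println(err, cfg.MonitorVMs, cfg.MonitorContainers, *cfg.TemperatureMonitoringEnabled)
}
```

With plain `bool` request fields, the same partial update would have decoded the missing fields as `false` and overwritten the stored values.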
2025-11-05 14:11:53 +00:00
rcourtman
d52ac6d8b5 Fix CSRF token validation and improve token management
- Add Access-Control-Expose-Headers to allow frontend to read X-CSRF-Token response header
- Implement proactive CSRF token issuance on GET requests when session exists but CSRF cookie is missing
- Ensures frontend always has valid CSRF token before making POST requests
- Fixes 403 Forbidden errors when toggling system settings

This resolves CSRF validation failures that occurred when CSRF tokens expired or were missing while valid sessions existed.
2025-11-05 09:23:44 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
32392d1212 Add disk metrics, block I/O, and mount details to Docker monitoring
Extends Docker container monitoring with comprehensive disk and storage information:
- Writable layer size and root filesystem usage displayed in new Disk column
- Block I/O statistics (read/write bytes totals) shown in container drawer
- Mount metadata including type, source, destination, mode, and driver details
- Configurable via --collect-disk flag (enabled by default, can be disabled for large fleets)

Also fixes config watcher to consistently use production auth config path instead of following PULSE_DATA_DIR when in mock mode.
2025-10-29 12:05:36 +00:00
rcourtman
b3285c05c8 Consolidate pending changes
- Add Docker metadata test comment
- Update alerts configuration and thresholds
- Enhance config file watcher
- Update documentation
- Refine settings UI
2025-10-28 23:20:44 +00:00
rcourtman
99b11760ac Implement Docker metadata API endpoints
Add backend support for storing and managing Docker resource metadata:

- Create DockerMetadataStore for managing Docker container/service metadata
- Implement DockerMetadataHandler with GET/PUT/DELETE operations
- Register /api/docker/metadata routes with proper authentication
- Store metadata in docker_metadata.json file
- Validate custom URLs (http/https scheme, valid host)
- Support resource IDs in the format {hostId}:container:{containerId}

Enables the frontend Docker URL editing feature to persist data.
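
The URL check listed above (http/https scheme, valid host) could be approximated with the standard `net/url` package as in this sketch. The function names are hypothetical and the real validation rules may be stricter.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateCustomURL accepts only absolute http or https URLs with a host.
func validateCustomURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid URL: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("URL must use the http or https scheme")
	}
	if u.Hostname() == "" {
		return errors.New("URL must include a host")
	}
	return nil
}

// containerResourceID builds an ID in the {hostId}:container:{containerId} format.
func containerResourceID(hostID, containerID string) string {
	return hostID + ":container:" + containerID
}

func main() {
	for _, raw := range []string{"https://app.example.com:8443/ui", "ftp://example.com", "https://"} {
		fmt.Printf("%q -> %v\n", raw, validateCustomURL(raw))
	}
	fmt.Println(containerResourceID("host-1", "abc123"))
}
```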
2025-10-28 22:56:53 +00:00
rcourtman
e07336dd9f refactor: remove legacy DISABLE_AUTH flag and enhance authentication UX
Major authentication system improvements:

- Remove deprecated DISABLE_AUTH environment variable support
- Update all documentation to remove DISABLE_AUTH references
- Add auth recovery instructions to docs (create .auth_recovery file)
- Improve first-run setup and Quick Security wizard flows
- Enhance login page with better error messaging and validation
- Refactor Docker hosts view with new unified table and tree components
- Add useDebouncedValue hook for better search performance
- Improve Settings page with better security configuration UX
- Update mock mode and development scripts for consistency
- Add ScrollableTable persistence and improved responsive design

Backend changes:
- Remove DISABLE_AUTH flag detection and handling
- Improve auth configuration validation and error messages
- Enhance security status endpoint responses
- Update router integration tests

Frontend changes:
- New Docker components: DockerUnifiedTable, DockerTree, DockerSummaryStats
- Better connection status indicator positioning
- Improved authentication state management
- Enhanced CSRF and session handling
- Better loading states and error recovery

This completes the migration away from the insecure DISABLE_AUTH pattern
toward proper authentication with recovery mechanisms.
2025-10-27 19:46:51 +00:00
rcourtman
5a2d808aa1 Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00
rcourtman
d643dcf0bc perf: reduce polling allocations and guest metadata load 2025-10-25 13:12:47 +00:00
rcourtman
6333a445e9 feat: add native Windows service support and expandable host details
Windows Host Agent Enhancements:
- Implement native Windows service support using golang.org/x/sys/windows/svc
- Add Windows Event Log integration for troubleshooting
- Create professional PowerShell installation/uninstallation scripts
- Add process termination and retry logic to handle Windows file locking
- Register uninstall endpoint at /uninstall-host-agent.ps1

Host Agent UI Improvements:
- Add expandable drawer to Hosts page (click row to view details)
- Display system info, network interfaces, disks, and temperatures in cards
- Replace status badges with subtle colored indicators
- Remove redundant master-detail sidebar layout
- Add search filtering for hosts

Technical Details:
- service_windows.go: Windows service lifecycle management with graceful shutdown
- service_stub.go: Cross-platform compatibility for non-Windows builds
- install-host-agent.ps1: Full Windows installation with validation
- uninstall-host-agent.ps1: Clean removal with process termination and retries
- HostsOverview.tsx: Expandable row pattern matching Docker/Proxmox pages

Files Added:
- cmd/pulse-host-agent/service_windows.go
- cmd/pulse-host-agent/service_stub.go
- scripts/install-host-agent.ps1
- scripts/uninstall-host-agent.ps1
- frontend-modern/src/components/Hosts/HostsOverview.tsx
- frontend-modern/src/components/Hosts/HostsFilter.tsx

The Windows service now starts reliably with automatic restart on failure,
and the uninstall script handles file locking gracefully without requiring reboots.
2025-10-23 22:11:56 +00:00
rcourtman
5c54685f04 Add API token scopes and standalone host agent
Introduces granular permission scopes for API tokens (docker:report, docker:manage, host-agent:report, monitoring:read/write, settings:read/write) allowing tokens to be restricted to minimum required access. Legacy tokens default to full access until scopes are explicitly configured.
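
A scope check consistent with this description might be sketched as below. The scope strings are taken from the commit message; the `APIToken` type and `hasScope` method are hypothetical, and representing a legacy token as one with an empty scope list is an assumption.

```go
package main

import "fmt"

// APIToken is a hypothetical token record.
type APIToken struct {
	Name   string
	Scopes []string
}

// hasScope reports whether the token may perform an action that needs the
// required scope. A token with no scopes configured is treated as a legacy
// token with full access.
func (t APIToken) hasScope(required string) bool {
	if len(t.Scopes) == 0 {
		return true
	}
	for _, s := range t.Scopes {
		if s == required {
			return true
		}
	}
	return false
}

func main() {
	agent := APIToken{Name: "docker-agent", Scopes: []string{"docker:report"}}
	legacy := APIToken{Name: "old-token"}
	fmt.Println(agent.hasScope("docker:report"))
	fmt.Println(agent.hasScope("settings:write"))
	fmt.Println(legacy.hasScope("settings:write"))
}
```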

Adds standalone host agent for monitoring Linux, macOS, and Windows servers outside Proxmox/Docker estates. New Servers workspace in UI displays uptime, OS metadata, and capacity metrics from enrolled agents.

Includes comprehensive token management UI overhaul with scope presets, inline editing, and visual scope indicators.
2025-10-23 11:40:31 +00:00
rcourtman
be26f957c0 Add snapshot size alert thresholds (#585) 2025-10-22 13:30:40 +00:00
rcourtman
ff4dc49ae4 Update Pulse install flow and related components 2025-10-21 19:58:53 +00:00
rcourtman
999da6d900 feat: production-ready import/export with API tokens and transactional rollback
Export/import payload bumped to v4.1 to include API tokens alongside existing
config bundle, eliminating blind spots in disaster recovery scenarios.

## Key Features

**API Tokens in Exports (v4.1)**
- Exports now include API token metadata (ID, name, hash, prefix, suffix, timestamps)
- Export format version bumped from 4.0 to 4.1
- Fixes gap where API tokens were lost during config migrations

**Transactional Atomic Imports**
- New importTransaction helper stages all writes before committing
- On failure, automatic rollback restores original configs
- Prevents partial/corrupted imports that could break running systems
- All config writes (nodes, alerts, email, webhooks, apprise, system, OIDC, API tokens, guest metadata) now transaction-aware

**Backward Compatibility**
- Version 4.0 exports (without API tokens) still import successfully
- The system logs a notice but proceeds, leaving existing API tokens untouched
- No breaking changes to existing export/import workflows

## Implementation

**Files Added:**
- internal/config/import_transaction.go - Transaction helper with staging/rollback

**Files Modified:**
- internal/config/export.go - v4.1 export, transactional ImportConfig wrapper
- internal/config/persistence.go - Transaction-aware Save* methods, beginTransaction/endTransaction helpers
- internal/config/persistence_test.go - 4 comprehensive unit tests

**Testing:**
- TestExportConfigIncludesAPITokens - Verifies API tokens in v4.1 exports
- TestImportConfigTransactionalSuccess - Validates atomic import success path
- TestImportConfigRollbackOnFailure - Confirms rollback on mid-import failure
- TestImportAcceptsVersion40Bundle - Ensures backward compatibility with v4.0

All tests passing.

## Migration Notes

- No manual migration required
- Users can re-export to generate v4.1 bundles with API tokens
- Existing 4.0 bundles remain valid for import
- Recommended: Re-run export after upgrade to ensure API tokens are captured

Co-authored-by: Codex (implementation)
Co-authored-by: Claude (coordination and testing)
2025-10-21 14:37:44 +00:00
rcourtman
bd13b966d0 feat: complete API token export/import with version handling
Complete the API token export/import feature with proper version
handling and backward compatibility:

- Bump export format to version 4.1 to indicate API token support
- Import API tokens when loading v4.1 exports
- Handle version compatibility gracefully:
  - v4.1: Full support including API tokens
  - v4.0: Notice that tokens weren't included (backward compatible)
  - Other: Warning but best-effort import
- Initialize empty array instead of nil for cleaner JSON

This ensures API tokens are properly preserved when migrating or
restoring Pulse instances while maintaining backward compatibility
with older exports.
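
The three version cases could be expressed as a small decision function, sketched here. The `importPlan` type and `planImport` are hypothetical, and attempting token import for unrecognized versions is an assumption about what "best-effort" means; the commit does not spell that out.

```go
package main

import "fmt"

// importPlan is a hypothetical summary of how an export bundle is handled.
type importPlan struct {
	importTokens bool
	level        string // "", "notice", or "warning"
	message      string
}

// planImport maps an export format version to an import plan.
func planImport(version string) importPlan {
	switch version {
	case "4.1":
		return importPlan{importTokens: true}
	case "4.0":
		return importPlan{
			importTokens: false,
			level:        "notice",
			message:      "export predates API token support; tokens were not included",
		}
	default:
		// Assumption: best effort means importing whatever the bundle contains.
		return importPlan{
			importTokens: true,
			level:        "warning",
			message:      "unrecognized export version " + version + "; attempting best-effort import",
		}
	}
}

func main() {
	for _, v := range []string{"4.1", "4.0", "3.9"} {
		fmt.Printf("%s -> %+v\n", v, planImport(v))
	}
}
```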
2025-10-21 11:38:23 +00:00
rcourtman
cdbc6057b0 feat: export API tokens in config export
Add API tokens to the export data so they are included when
exporting/backing up configuration. This ensures API tokens are
preserved when migrating or restoring Pulse instances.

Changes:
- Add APITokens field to ExportData struct
- Load API tokens during export process
- Include tokens in exported JSON (omitempty if none exist)
2025-10-21 11:37:25 +00:00
rcourtman
56c6c0cc0c feat: improve discovery with progress tracking, validation, and structured errors
Significantly enhanced the network discovery feature to eliminate false positives,
provide real-time progress updates, and improve error reporting.

Key improvements:
- Require positive Proxmox identification (version data, auth headers, or certificates)
  instead of reporting any service on ports 8006/8007
- Add real-time progress tracking with phase/target counts and completion percentage
- Implement structured error reporting with IP, phase, type, and timestamp details
- Fix TLS timeout handling to prevent hangs on unresponsive hosts
- Expose progress and structured errors via WebSocket for UI consumption
- Reduce log verbosity by moving discovery logs to debug level
- Fix duplicate IP counting to ensure progress reaches 100%

Breaking changes: None (backward compatible with legacy API methods)
2025-10-20 22:29:30 +00:00
rcourtman
5ebb32ce10 feat: enhance runtime configuration and system settings management
Improves configuration handling and system settings APIs to support
v4.24.0 features including runtime logging controls, adaptive polling
configuration, and enhanced config export/persistence.

Changes:
- Add config override system for discovery service
- Enhance system settings API with runtime logging controls
- Improve config persistence and export functionality
- Update security setup handling
- Refine monitoring and discovery service integration

These changes provide the backend support for the configuration
features documented in the v4.24.0 release.
2025-10-20 17:41:19 +00:00
rcourtman
7d422d2909 feat: add professional logging with runtime configuration and performance optimization
Implements structured logging package with LOG_LEVEL/LOG_FORMAT env support, debug level guards for hot paths, enriched error messages with actionable context, and stack trace capture for production debugging. Improves observability and reduces log overhead in high-frequency polling loops.
2025-10-20 15:13:38 +00:00
rcourtman
57429900a6 feat: add adaptive polling scheduler infrastructure (Phase 2 Tasks 1-3)
Implements adaptive scheduling foundation for Phase 2:
- Poll cycle metrics: duration, staleness, queue depth, in-flight counters
- Adaptive scheduler with pluggable staleness/interval/enqueue interfaces
- Config support: ADAPTIVE_POLLING_ENABLED flag + min/max/base intervals
- Feature flag defaults to disabled for safe rollout
- Scheduler wiring into Monitor with conditional instantiation

Tasks 1-3 of 10 complete. Ready for staleness tracker implementation.
2025-10-20 15:13:37 +00:00
rcourtman
524f42cc28 security: complete Phase 1 sensor proxy hardening
Implements comprehensive security hardening for pulse-sensor-proxy:
- Privilege drop from root to unprivileged user (UID 995)
- Hash-chained tamper-evident audit logging with remote forwarding
- Per-UID rate limiting (0.2 QPS, burst 2) with concurrency caps
- Enhanced command validation with 10+ attack pattern tests
- Fuzz testing (7M+ executions, 0 crashes)
- SSH hardening, AppArmor/seccomp profiles, operational runbooks

All 27 Phase 1 tasks complete. Ready for production deployment.
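
To illustrate what hash-chained, tamper-evident logging means in general, here is a minimal sketch in which each entry's hash covers the previous entry's hash. It is not the pulse-sensor-proxy implementation: the entry layout, the use of SHA-256, and all names are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// auditEntry links to its predecessor through PrevHash.
type auditEntry struct {
	Message  string
	PrevHash string
	Hash     string
}

// hashEntry hashes the previous hash together with the message.
func hashEntry(prevHash, message string) string {
	sum := sha256.Sum256([]byte(prevHash + "\n" + message))
	return hex.EncodeToString(sum[:])
}

// appendEntry adds a new entry chained to the last one.
func appendEntry(entries []auditEntry, message string) []auditEntry {
	prev := ""
	if n := len(entries); n > 0 {
		prev = entries[n-1].Hash
	}
	return append(entries, auditEntry{message, prev, hashEntry(prev, message)})
}

// verifyChain returns the index of the first entry that fails verification,
// or -1 if the whole chain is intact.
func verifyChain(entries []auditEntry) int {
	prev := ""
	for i, e := range entries {
		if e.PrevHash != prev || e.Hash != hashEntry(e.PrevHash, e.Message) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var entries []auditEntry
	entries = appendEntry(entries, "uid=1000 get_temperature node1")
	entries = appendEntry(entries, "uid=1000 get_temperature node2")
	entries = appendEntry(entries, "uid=0 register_nodes")
	fmt.Println("intact:", verifyChain(entries))

	entries[1].Message = "edited after the fact"
	fmt.Println("first broken entry:", verifyChain(entries))
}
```

Because every hash depends on the one before it, editing or removing an earlier entry invalidates the chain from that point on, which is what makes the log tamper-evident.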
2025-10-20 15:13:37 +00:00
Pulse Automation Bot
cfdfe896be Adjust backup and snapshot alert handling 2025-10-18 20:11:01 +00:00
Pulse Automation Bot
80b9d0602a Add Apprise notification integration (#570) 2025-10-18 16:39:39 +00:00
Pulse Automation Bot
0b4e4f9c59 Add configurable backup polling interval 2025-10-18 13:06:41 +00:00
rcourtman
4838793677 feat: enhance alerts system with tests and improved thresholds
- Add comprehensive test coverage for alerts package with 285+ new tests
- Implement ThresholdsTable component with metric thresholds display
- Enhance Alerts page UI with improved layout and metric filtering
- Add frontend component tests for Alerts page and ThresholdsTable
- Set up Vitest testing infrastructure for SolidJS components
- Improve config persistence with better validation
- Expand discovery tests with 333+ test cases
- Update API, configuration, and Docker monitoring documentation
2025-10-15 22:25:04 +00:00
rcourtman
261bd7ac74 Adopt multi-token auth across docs, UI, and tooling 2025-10-14 15:47:49 +00:00
rcourtman
5c79d2516d feat: streamline docker agent onboarding 2025-10-14 09:45:32 +00:00
rcourtman
c18cf3d4b8 Fix node config API to preserve fields on partial updates
The PUT /api/config/nodes/{id} endpoint was corrupting node configurations
when making partial updates (e.g., updating just monitorPhysicalDisks):

- Authentication fields (tokenName, tokenValue, password) were being cleared
  when updating unrelated settings
- Name field was being blanked when not included in request
- Monitor* boolean fields were defaulting to false

Changes:
- Only update name field if explicitly provided in request
- Only switch authentication method when auth fields are explicitly provided
- Preserve existing auth credentials on non-auth updates
- Applied fix to all node types (PVE, PBS, PMG)

Also enables physical disk monitoring by default (opt-out instead of opt-in)
and preserves disk data between polling intervals.
2025-10-12 17:50:55 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00