Commit graph

64 commits

Author SHA1 Message Date
rcourtman
706502c22d fix(alerts): default NotifyOnResolve to true and prevent patrol queue spam (#1259, #1258)
Recovery notifications were silently disabled for users with pre-5.1.12
configs because the NotifyOnResolve bool field defaults to false when
absent from JSON. Use a *bool probe to detect missing field and default
to true.

Patrol trigger queue filled with warnings when the patrol loop wasn't
running. Gate TriggerPatrolForAlert on p.running and clear the flag
via defer when the loop exits.
2026-02-20 17:56:41 +00:00
rcourtman
896b5bfc89 Fix: enable backup monitoring for PVE instances via config migration
Adds a config migration that ensures MonitorBackups is enabled for PVE
instances, matching the existing PBS migration from issue #411. This fixes
issue #1139 where local PVE backups weren't appearing in the backup overview
because the MonitorBackups field defaulted to false when not explicitly set.

Fixes #1139
2026-02-03 13:38:41 +00:00
rcourtman
4af5fc4246 refactor(config): rename BackendHost/BackendPort to BindAddress
Simplify server config by consolidating BackendHost and BackendPort into
a single BindAddress field. The port is now solely controlled by FrontendPort.

Changes:
- Replace BackendHost/BackendPort with BindAddress in Config struct
- Add deprecation warning for BACKEND_HOST env var (use BIND_ADDRESS)
- Update connection timeout default from 45s to 60s
- Remove backendPort from SystemSettings and frontend types
- Update server.go to use cfg.BindAddress
- Update all tests to use new config field names
2026-02-01 23:26:32 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
19a67dd4f3 Update core infrastructure components
Config:
- AI configuration improvements
- API tokens handling
- Persistence layer updates

Host Agent:
- Command execution improvements
- Better test coverage

Infrastructure Discovery:
- Service improvements
- Enhanced test coverage

Models:
- State snapshot updates
- Model improvements

Monitoring:
- Polling improvements
- Guest config handling
- Storage config support

WebSocket:
- Hub tenant test updates

Service Discovery:
- New service discovery module
2026-01-28 16:52:35 +00:00
rcourtman
4a8f9827fe feat: add config migration system and multi-tenant support
Migration System:
- Add migration framework for config schema updates
- Add migration tests

Config Enhancements:
- Add multi-tenant configuration support
- Add DeepCopy for tenant isolation
- Enhance AI config options
- Improve API token handling
- Update persistence layer

Documentation:
- Update multi-tenant documentation
2026-01-24 22:43:10 +00:00
rcourtman
d909f319a5 feat: improve AI config and persistence
- Enhance AI configuration options
- Improve persistence layer
- Add AI config tests
2026-01-22 22:31:42 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
633eea83db refactor: remove deprecated config fields
- Remove unused envconfig tags (BackendHost, FrontendHost, etc.)
- Remove APITokenEnabled (infer from token count)
- Remove IframeEmbeddingAllow, Port, Debug, ConcurrentPolling
- Clean up temperature proxy comments from ClusterEndpoint
- Simplify API token diagnostic to use config field directly
2026-01-22 00:43:27 +00:00
rcourtman
cdcd50c8c1 fix: persist full-width layout preference on server. Related to #1130
Full-width mode now syncs to server like dark mode, ensuring the setting
persists across Proxmox helper script updates. Previously only used
localStorage which gets cleared on some update methods.
2026-01-20 23:01:33 +00:00
rcourtman
ecc31730f6 Remove OpenCode references 2026-01-20 16:56:41 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
9cd53814a3 feat(alerts): add per-volume disk thresholds for host agents
Allow users to set custom disk usage thresholds per mounted filesystem
on host agents, rather than applying a single threshold to all volumes.

This addresses NAS/NVR use cases where some volumes (e.g., NVR storage)
intentionally run at 99% while others need strict monitoring.

Backend:
- Check for disk-specific overrides before using HostDefaults.Disk
- Override key format: host:<hostId>/disk:<mountpoint>
- Support both custom thresholds and disable per-disk

Frontend:
- Add 'hostDisk' resource type
- Add "Host Disks" collapsible section in Thresholds → Hosts tab
- Group disks by host for easier navigation

Closes #1103
2026-01-13 23:38:20 +00:00
rcourtman
b2a6cd0fa3 fix(agent): add FreeBSD platform support to agent download and UI (#1051)
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
  so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
  with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go

Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
2026-01-11 23:51:12 +00:00
rcourtman
f527e6ebd0 docs: fix Kubernetes DaemonSet deployment guide
Fixes #1091 - addresses all three documentation issues reported:

1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
   exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64

2. PULSE_AGENT_ID: Added to example and documented why it's required
   for DaemonSets (prevents token conflicts when all pods share one
   API token)

3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
   PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
   of the default behavior (show only problematic resources)

Also added tolerations, resource requests/limits, and ARM64 note.
2026-01-11 21:43:23 +00:00
rcourtman
7db6b3e47d feat: Add AI chat session sync across devices
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.

Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context

Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
2026-01-08 10:47:45 +00:00
rcourtman
ed78509f92 Fix flaky tests and improve coverage across alerts, api, and config packages
- Fix deadlock and race conditions in internal/alerts
- Add comprehensive error path tests for internal/config
- Fix 401 handling in internal/api
- Fix Docker Swarm task filtering test logic
2026-01-03 18:36:17 +00:00
rcourtman
3029cce172 fix(patrol): address multiple issues in patrol service
- Add missing KubernetesChecked field to persistence (data was being lost)
- Fix Duration field to properly convert between ms and nanoseconds
- Add automatic cleanup of stale stream subscribers (memory leak fix)
- Add error tracking for findings persistence with callback support
- Add GetPersistenceStatus() and SetOnSaveError() methods
- Add tests for new error tracking functionality
2026-01-02 12:45:00 +00:00
rcourtman
60220ee161 feat: Add server-wide control to disable Docker update actions
Implements PULSE_DISABLE_DOCKER_UPDATE_ACTIONS environment variable and
Settings UI toggle to hide Docker container update buttons while still
allowing update detection. This addresses requests for a 'read-only' mode
in production environments.

Backend:
- Add DisableDockerUpdateActions to SystemSettings and Config structs
- Add environment variable parsing with EnvOverrides tracking
- Expose setting in GET/POST /api/config/system endpoints
- Block update API with 403 when disabled (defense-in-depth)

Frontend:
- Add disableDockerUpdateActions to SystemConfig type
- Create systemSettings store for reactive access to server config
- Add Docker Settings card in Settings → Agents tab with toggle
- Show env lock badge when set via environment variable

UpdateButton improvements:
- Properly handle loading state (disabled + visual indicator)
- Use Solid.js Show components for proper reactivity
- Show read-only UpdateBadge when updates disabled
- Show interactive button when updates enabled

Closes discussion #982
2026-01-02 10:29:43 +00:00
rcourtman
3fdf753a5b Enhance devcontainer and CI workflows
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 22:29:15 +00:00
rcourtman
6f794753ee fix: Add Public URL setting for email notifications
Docker deployments with custom port mappings would show incorrect URLs
in email alerts because the auto-detection couldn't determine the
external port.

Added a "Public URL" setting in Settings > Network that allows users
to configure the dashboard URL used in email notifications.

- Added publicURL field to SystemSettings (persistence.go)
- Load/save publicURL in system settings handler
- Apply publicURL to notification manager on change
- Added UI input in NetworkSettingsPanel
- Shows env override warning if PULSE_PUBLIC_URL is set

Related to #944
2025-12-28 16:08:22 +00:00
rcourtman
cb3444dd9b fix: Prevent re-migration of deleted env-based API tokens
When a user deletes an API token that was migrated from .env, track
the hash in a suppression list to prevent it from being re-migrated
on the next restart.

Changes:
- Add SuppressedEnvMigrations field to Config
- Add env_token_suppressions.json persistence
- Check suppression list during env token migration
- Record suppressed hash when deleting "Migrated from .env" tokens
- Update RemoveAPIToken to return the removed record

Related to #871
2025-12-23 05:10:47 +00:00
rcourtman
59a4843f20 fix: persist finding dismissal state across restarts
User feedback fields (DismissedReason, UserNote, TimesRaised, Suppressed, Source)
were not being saved to disk, causing 'expected behavior' dismissals to be lost
after Pulse restarted.

- Add missing fields to AIFindingRecord in persistence.go
- Update FindingsPersistenceAdapter to save/load these fields
- Add comprehensive tests for dismissal persistence round-trip

Fixes issue where Frigate storage warning kept reappearing despite being
marked as expected behavior.
2025-12-22 11:18:43 +00:00
rcourtman
07c5880b0a fix: AI settings persistence and UI improvements
Bug Fixes:
- Fix boolean fields with 'omitempty' not persisting false values
  - AlertTriggeredAnalysis, PatrolAnalyzeNodes/Guests/Docker/Storage
  - omitempty causes Go to skip false (zero value) when marshaling JSON
  - On reload, NewDefaultAIConfig() sets true, and missing field stays true

- Fix model dropdown losing selection after save (SolidJS reactivity issue)
  - Added explicit 'selected' attribute to option elements
  - Ensures browser maintains selection with optgroups during re-renders

Improvements:
- Change patrol type label from 'Quick' to 'Patrol' in history table
- Add chat_model and patrol_model to AI settings update log
- Add alert_triggered_analysis to AI config load log for debugging
2025-12-21 21:48:09 +00:00
rcourtman
ae522c9a2b fix: Allow all threshold types (Storage, Temperature, Host Agent) to be set to 0 to disable alerting
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior

Related to #864
2025-12-20 20:42:23 +00:00
rcourtman
db5e79bb37 fix: Allow Host Agent thresholds to be set to 0 to disable alerting. Related to #864 2025-12-20 20:25:20 +00:00
rcourtman
0ee6e50c8b fix(config): avoid deadlock saving empty nodes config 2025-12-17 13:28:06 +00:00
rcourtman
cf44352c83 feat: configurable backup freshness thresholds for dashboard indicator
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).

- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration

Closes #839
2025-12-16 16:36:08 +00:00
rcourtman
e6d07c3294 style: remove emojis from log messages
Replaced emoji icons with plain text for cleaner logs and cross-platform compatibility.
2025-12-13 21:29:11 +00:00
rcourtman
97f2bfa1ed feat: add configurable metrics retention settings
- Add MetricsRetentionRawHours, MetricsRetentionMinuteHours, MetricsRetentionHourlyDays, MetricsRetentionDailyDays to SystemSettings
- Wire settings from system.json through Config to metrics store initialization
- Set sensible defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d
- Log active retention values on startup for transparency

Users can now customize how long metrics are stored at each aggregation tier.
2025-12-13 14:14:07 +00:00
rcourtman
88d419dd5b feat(ai): Add enriched context with historical trends and predictions
Phase 1 of Pulse AI differentiation:

- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions

This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights

All tests passing. Falls back to basic summary if metrics history unavailable.
2025-12-12 09:45:57 +00:00
rcourtman
1e3fdb6f63 feat(ai): Enhanced AI patrol system with alert triggers and history persistence
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
2025-12-10 21:08:22 +00:00
rcourtman
ae7b66ecff refactor(ai): Remove over-engineered URL discovery service
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs

Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button

The AI itself is smart enough to iterate through resources
and discover URLs when asked.
2025-12-10 08:35:24 +00:00
rcourtman
bcd7b550d4 AI Problem Solver implementation and various fixes
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
2025-12-06 23:46:08 +00:00
rcourtman
53d7776d6b wip: AI chat integration with multi-provider support
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
2025-12-04 20:16:53 +00:00
rcourtman
884c85c2ab chore: Remove debug logging that exposed config JSON
Removed two DEBUG log statements that were logging full nodes config
JSON at Info level. This was verbose and potentially exposed sensitive
configuration data (credentials, tokens) in logs.
2025-12-02 15:32:02 +00:00
courtmanr@gmail.com
f4c2bd7c35 Implement UI toggle for Hide Local Login (related to issue #750) 2025-11-25 08:14:19 +00:00
rcourtman
2207642fa9 Related to #727: normalize persisted Proxmox hosts 2025-11-20 19:58:05 +00:00
courtmanr@gmail.com
11477546f8 Update config persistence, crypto, and dev script 2025-11-20 11:46:20 +00:00
rcourtman
51b368ddc1 feat: make PVE polling interval configurable (related to #467) 2025-11-18 21:30:04 +00:00
rcourtman
5d99fc2f2d Fix dark mode toggle wiping API tokens (related to #685)
Root cause: SaveSystemSettings calls updateEnvFile which rewrites .env on
any setting change, triggering the config watcher. The watcher sees API_TOKEN
in .env and replaces all UI-created tokens with "Environment token" records,
wiping out host-agent scoped tokens.

Fix: updateEnvFile now compares the new content with existing content and
skips the write if nothing changed. Since dark mode (and other UI settings)
are stored in system.json, not .env, toggling theme no longer triggers
unnecessary .env rewrites.

This prevents the config watcher from being triggered unnecessarily and
preserves UI-created API tokens when changing cosmetic settings.

Future improvement: Deprecate API_TOKEN/API_TOKENS from .env entirely and
make api_tokens.json the single source of truth (requires migration logic).
2025-11-11 00:11:41 +00:00
rcourtman
1b221cca71 feat: Add configurable allowlist for webhook private IP targets (addresses #673)
Allow homelab users to send webhooks to internal services while maintaining security defaults.

Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching

Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved

Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API

Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time

Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)

Related to #673

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-09 08:31:12 +00:00
rcourtman
431769024f Fix P1: Config Persistence transaction field synchronization
**Problem**: writeConfigFileLocked() accessed c.tx field without synchronization
- Function reads c.tx to check if transaction is active (line 109)
- c.tx modified by begin/endTransaction under lock, but read without lock
- Race condition: c.tx could change between check and use

**Impact**:
- Inconsistent transaction handling
- File could be written directly when it should be staged
- Or staged when it should be written directly
- Data corruption risk during config imports

**Fix** (lines 108-128):
- Added documentation that caller MUST hold c.mu lock
- Read c.tx into local variable tx while lock is held
- Use local copy for transaction check
- Safe because all callers hold c.mu when calling writeConfigFileLocked
- Transaction field only modified while holding c.mu in begin/endTransaction

This maintains the existing contract (callers hold lock) while making the transaction read safe and explicit.
2025-11-07 10:00:31 +00:00
rcourtman
6ca4d9b750 Fix P1/P2 infrastructure issues: panic recovery and optimizations
This commit addresses 4 P1 important issues and 1 P2 optimization in infrastructure components:

**P1-1: Missing Panic Recovery in Discovery Service** (service.go:172-195, 499-542)
- **Problem**: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines
- **Impact**: Silent service death if scan panics, broken discovery with no monitoring
- **Fix**:
  - Wrapped initial scan goroutine with defer/recover (lines 172-182)
  - Wrapped scanLoop goroutine with defer/recover (lines 185-195)
  - Wrapped ForceRefresh scan with defer/recover (lines 499-509)
  - Wrapped SetSubnet scan with defer/recover (lines 532-542)
  - All log panics with stack traces for debugging

**P1-2: Missing Panic Recovery in Config Watcher Callback** (watcher.go:546-556)
- **Problem**: User-provided onMockReload callback could panic and crash watcher
- **Impact**: Panicking callback kills watcher goroutine, no config updates
- **Fix**: Wrapped callback invocation with defer/recover and stack trace logging

**P1-3: Session Store Stop() Using Send Instead of Close** (session_store.go:16-84)
- **Problem**: Stop() used channel send which blocks if nobody reads
- **Impact**: Stop() hangs if backgroundWorker already exited
- **Fix**:
  - Added sync.Once field stopOnce (line 22)
  - Changed Stop() to use close() within stopOnce.Do() (lines 80-84)
  - Prevents double-close panic and ensures all readers are signaled

**P2-1: Backup Cleanup Inefficient O(n²) Sort** (persistence.go:1424-1427)
- **Problem**: Bubble sort used to sort backups by modification time
- **Impact**: Inefficient for large backup counts (>100 files)
- **Fix**:
  - Replaced bubble sort with sort.Slice() using O(n log n) algorithm
  - Added "sort" import (line 9)
  - Maintains same oldest-first ordering for deletion logic

All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even with bugs, while optimization reduces cleanup time for backup-heavy environments.
2025-11-07 09:55:22 +00:00
rcourtman
ba6d934204 Fix critical P0 infrastructure concurrency issues
This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure:

**P0-1: Discovery Service Goroutine Leak** (service.go:468, 488)
- **Problem**: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress
- **Impact**: Rapid API calls create goroutine explosion, resource exhaustion
- **Fix**:
  - ForceRefresh: Check isScanning before spawning goroutine (lines 470-476)
  - SetSubnet: Check isScanning, defer scan if already running (lines 491-504)
  - Both now log when skipping to aid debugging

**P0-2: Config Persistence Unlock/Relock Race** (persistence.go:1177-1206)
- **Problem**: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked
- **Impact**: Another goroutine could modify config between unlock/relock, causing migrated data loss
- **Fix**:
  - Copy instance slices while holding RLock to ensure consistency (lines 1189-1194)
  - Release lock, save copies, then return without relocking (lines 1196-1205)
  - Prevents TOCTOU vulnerability where migrations could be overwritten

**P0-3: Config Watcher Channel Close Race** (watcher.go:19-178)
- **Problem**: Stop() used select-check-close pattern vulnerable to concurrent calls
- **Impact**: Multiple Stop() calls panic on double-close
- **Fix**:
  - Added sync.Once field stopOnce to ConfigWatcher struct (line 26)
  - Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178)
  - Removed racy select-based guard

All fixes maintain backwards compatibility and add defensive logging for operational visibility.
2025-11-07 09:49:55 +00:00
rcourtman
9257071ca1 Add encryption status to notification health endpoint (P2)
Backend:
- Add IsEncryptionEnabled() method to ConfigPersistence
- Include encryption status in /api/notifications/health response
- Allows frontend to warn when credentials are stored in plaintext

Frontend:
- Update NotificationHealth type to include encryption.enabled field
- Frontend can now display warnings when encryption is disabled

This addresses the P2 requirement for encryption visibility, allowing
operators to know when notification credentials are not encrypted at rest.
2025-11-07 08:36:55 +00:00
rcourtman
e21a72578f Add configurable SSH port for temperature monitoring
Related to #595

This change adds support for custom SSH ports when collecting temperature
data from Proxmox nodes, resolving issues for users who run SSH on non-standard
ports.

**Why SSH is still needed:**
Temperature monitoring requires reading /sys/class/hwmon sensors on Proxmox
nodes, which is not exposed via the Proxmox API. Even when using API tokens
for authentication, Pulse needs SSH access to collect temperature data.

**Changes:**
- Add `sshPort` configuration to SystemSettings (system.json)
- Add `SSHPort` field to Config with environment variable support (SSH_PORT)
- Add per-node SSH port override capability for PVE, PBS, and PMG instances
- Update TemperatureCollector to accept and use custom SSH port
- Update SSH known_hosts manager to support non-standard ports
- Add NewTemperatureCollectorWithPort() constructor with port parameter
- Maintain backward compatibility with NewTemperatureCollector() (uses port 22)
- Update frontend TypeScript types for SSH port configuration

**Configuration methods:**
1. Environment variable: SSH_PORT=2222
2. system.json: {"sshPort": 2222}
3. Per-node override in nodes.enc (future UI support)

**Default behavior:**
- Defaults to port 22 if not configured
- Maintains full backward compatibility
- No changes required for existing deployments

The implementation includes proper ssh-keyscan port handling and known_hosts
management for non-standard ports using [host]:port notation per SSH standards.
2025-11-05 20:03:29 +00:00
rcourtman
c93581e1aa Add DNS caching to reduce excessive DNS queries
Related to #608

Implements DNS caching using rs/dnscache to dramatically reduce DNS query
volume for frequently accessed Proxmox hosts. Users were reporting 260,000+
DNS queries in 37 hours for the same hostnames.

Changes:
- Added rs/dnscache dependency for DNS resolution caching
- Created pkg/tlsutil/dnscache.go with DNS cache wrapper
- Updated HTTP client creation to use cached DNS resolver
- Added DNSCacheTimeout configuration option (default: 5 minutes)
- Made DNS cache timeout configurable via:
  - system.json: dnsCacheTimeout field (seconds)
  - Environment variable: DNS_CACHE_TIMEOUT (duration string)
- DNS cache periodically refreshes to prevent stale entries

Benefits:
- Reduces DNS query load on local DNS servers by ~99%
- Reduces network traffic and DNS query log volume
- Maintains fresh DNS entries through periodic refresh
- Configurable timeout for different network environments

Default behavior: 5-minute cache timeout with automatic refresh
2025-11-05 18:25:38 +00:00
rcourtman
d52ac6d8b5 Fix CSRF token validation and improve token management
- Add Access-Control-Expose-Headers to allow frontend to read X-CSRF-Token response header
- Implement proactive CSRF token issuance on GET requests when session exists but CSRF cookie is missing
- Ensures frontend always has valid CSRF token before making POST requests
- Fixes 403 Forbidden errors when toggling system settings

This resolves CSRF validation failures that occurred when CSRF tokens expired or were missing while valid sessions existed.
2025-11-05 09:23:44 +00:00
rcourtman
5a2d808aa1 Harden setup token flow and enforce encrypted persistence 2025-10-25 16:00:37 +00:00