Recovery notifications were silently disabled for users with pre-5.1.12
configs because the NotifyOnResolve bool field defaults to false when
absent from JSON. Use a *bool probe to detect the missing field and default
it to true.
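A minimal sketch of the probe pattern (the struct and JSON key names here are illustrative, not the real config types):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alertConfigProbe mirrors only the field we need to inspect; a *bool is
// nil when the key is absent from the JSON, unlike a plain bool which
// silently becomes false.
type alertConfigProbe struct {
	NotifyOnResolve *bool `json:"notifyOnResolve"`
}

func notifyOnResolveDefault(raw []byte) bool {
	var probe alertConfigProbe
	if err := json.Unmarshal(raw, &probe); err != nil || probe.NotifyOnResolve == nil {
		return true // field missing (pre-5.1.12 config): default to enabled
	}
	return *probe.NotifyOnResolve
}

func main() {
	fmt.Println(notifyOnResolveDefault([]byte(`{}`)))                        // true
	fmt.Println(notifyOnResolveDefault([]byte(`{"notifyOnResolve":false}`))) // false
}
```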
The patrol trigger queue filled up and logged warnings whenever the patrol
loop wasn't running. Gate TriggerPatrolForAlert on p.running and clear the flag
via defer when the loop exits.
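A sketch of the gate, assuming a hypothetical Patrol struct with a mutex-guarded flag:

```go
package patrol

import "sync"

type Patrol struct {
	mu       sync.Mutex
	running  bool
	triggers chan string
}

// TriggerPatrolForAlert drops the trigger when the loop isn't running,
// instead of queueing it and logging queue-full warnings.
func (p *Patrol) TriggerPatrolForAlert(alertID string) {
	p.mu.Lock()
	running := p.running
	p.mu.Unlock()
	if !running {
		return
	}
	select {
	case p.triggers <- alertID:
	default: // queue full; drop rather than block the caller
	}
}

func (p *Patrol) loop(analyze func(string)) {
	defer func() { // clear the flag however the loop exits
		p.mu.Lock()
		p.running = false
		p.mu.Unlock()
	}()
	for id := range p.triggers {
		analyze(id)
	}
}
```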
Adds a config migration that ensures MonitorBackups is enabled for PVE
instances, matching the existing PBS migration from issue #411. This fixes
issue #1139 where local PVE backups weren't appearing in the backup overview
because the MonitorBackups field defaulted to false when not explicitly set.
Fixes #1139
Simplify server config by consolidating BackendHost and BackendPort into
a single BindAddress field. The port is now solely controlled by FrontendPort.
Changes:
- Replace BackendHost/BackendPort with BindAddress in Config struct
- Add deprecation warning for BACKEND_HOST env var (use BIND_ADDRESS)
- Update connection timeout default from 45s to 60s
- Remove backendPort from SystemSettings and frontend types
- Update server.go to use cfg.BindAddress
- Update all tests to use new config field names
Implements Phases 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.
Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components
All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
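The header name and the "default" org fallback come from this change; the context-key shape below is an assumed sketch of the middleware:

```go
package tenant

import (
	"context"
	"net/http"
)

type ctxKey struct{}

const defaultOrg = "default"

// Middleware extracts the X-Pulse-Org-ID header and stores it in the
// request context; single-tenant deployments fall back to "default".
func Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		org := r.Header.Get("X-Pulse-Org-ID")
		if org == "" {
			org = defaultOrg
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), ctxKey{}, org)))
	})
}

// OrgID returns the tenant for ctx, defaulting for legacy callers.
func OrgID(ctx context.Context) string {
	if org, ok := ctx.Value(ctxKey{}).(string); ok {
		return org
	}
	return defaultOrg
}
```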
- Remove unused envconfig tags (BackendHost, FrontendHost, etc.)
- Remove APITokenEnabled (infer from token count)
- Remove IframeEmbeddingAllow, Port, Debug, ConcurrentPolling
- Clean up temperature proxy comments from ClusterEndpoint
- Simplify API token diagnostic to use config field directly
Full-width mode now syncs to server like dark mode, ensuring the setting
persists across Proxmox helper script updates. Previously the setting lived
only in localStorage, which some update methods clear.
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.
Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()
Related to #1063
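A sketch of the locking pattern (the method shapes are hypothetical; only the mutex-around-map idea comes from the fix):

```go
package agent

import "sync"

type Agent struct {
	cpuMu            sync.Mutex // guards prevContainerCPU
	prevContainerCPU map[string]uint64
}

// pruneStaleCPUSamples removes samples for containers that no longer
// exist; cpuMu makes this safe against collectOnce goroutines spawned by
// handleCheckUpdatesCommand running alongside the main loop.
func (a *Agent) pruneStaleCPUSamples(live map[string]bool) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	for id := range a.prevContainerCPU {
		if !live[id] {
			delete(a.prevContainerCPU, id)
		}
	}
}

// prevCPUSample is the locked read used when computing CPU percent.
func (a *Agent) prevCPUSample(id string) (uint64, bool) {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	v, ok := a.prevContainerCPU[id]
	return v, ok
}
```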
Allow users to set custom disk usage thresholds per mounted filesystem
on host agents, rather than applying a single threshold to all volumes.
This addresses NAS/NVR use cases where some volumes (e.g., NVR storage)
intentionally run at 99% while others need strict monitoring.
Backend:
- Check for disk-specific overrides before using HostDefaults.Disk
- Override key format: host:<hostId>/disk:<mountpoint>
- Support both custom thresholds and disable per-disk
Frontend:
- Add 'hostDisk' resource type
- Add "Host Disks" collapsible section in Thresholds → Hosts tab
- Group disks by host for easier navigation
Closes #1103
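A sketch of the override lookup using the key format above (the map-of-floats shape is an assumption; the real config presumably also encodes per-disk disable):

```go
package thresholds

import "fmt"

// diskThresholdFor checks a per-disk override before falling back to the
// host-wide default, using the host:<hostId>/disk:<mountpoint> key format.
func diskThresholdFor(overrides map[string]float64, hostID, mount string, hostDefault float64) float64 {
	key := fmt.Sprintf("host:%s/disk:%s", hostID, mount)
	if v, ok := overrides[key]; ok {
		return v
	}
	return hostDefault
}
```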
- Add freebsd-amd64 and freebsd-arm64 to normalizeUnifiedAgentArch()
so the download endpoint serves FreeBSD binaries when requested
- Add FreeBSD/pfSense/OPNsense platform option to agent setup UI
with note about bash installation requirement
- Add FreeBSD test cases to unified_agent_test.go
Fixes installation on pfSense/OPNsense where users were getting 404
errors because the backend didn't recognize the freebsd-amd64 arch
parameter from install.sh.
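Roughly what the arch normalization looks like; the non-FreeBSD cases are illustrative, not the real list:

```go
package agentdl

// normalizeUnifiedAgentArch maps the arch parameter from install.sh to a
// served binary suffix; unknown values yield "" so the handler can 404.
func normalizeUnifiedAgentArch(arch string) string {
	switch arch {
	case "linux-amd64", "linux-arm64",
		"freebsd-amd64", "freebsd-arm64": // newly recognized
		return arch
	default:
		return ""
	}
}
```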
Fixes #1091 - addresses all three documentation issues reported:
1. Binary path: Changed from /usr/local/bin/pulse-agent (which doesn't
exist in the main image) to /opt/pulse/bin/pulse-agent-linux-amd64
2. PULSE_AGENT_ID: Added to example and documented why it's required
for DaemonSets (prevents token conflicts when all pods share one
API token)
3. Resource visibility flags: Added PULSE_KUBE_INCLUDE_ALL_PODS and
PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS to example, with explanation
of the default behavior (show only problematic resources)
Also added tolerations, resource requests/limits, and ARM64 note.
Implements server-side persistence for AI chat sessions, allowing users
to continue conversations across devices and browser sessions. Related
to #1059.
Backend:
- Add chat session CRUD API endpoints (GET/PUT/DELETE)
- Add persistence layer with per-user session storage
- Support session cleanup for old sessions (90 days)
- Multi-user support via auth context
Frontend:
- Rewrite aiChat store with server sync (debounced)
- Add session management UI (new conversation, switch, delete)
- Local storage as fallback/cache
- Initialize sync on app startup when AI is enabled
- Add missing KubernetesChecked field to persistence (data was being lost)
- Fix Duration field to properly convert between ms and nanoseconds
- Add automatic cleanup of stale stream subscribers (memory leak fix)
- Add error tracking for findings persistence with callback support
- Add GetPersistenceStatus() and SetOnSaveError() methods
- Add tests for new error tracking functionality
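The Duration conversion boundary, sketched under the assumption that the persisted JSON carries milliseconds:

```go
package findings

import "time"

// JSON stores duration in milliseconds, while Go's time.Duration counts
// nanoseconds; converting explicitly at the persistence boundary avoids
// a silent 1e6 skew in either direction.
func durationToMillis(d time.Duration) int64 { return d.Milliseconds() }

func millisToDuration(ms int64) time.Duration { return time.Duration(ms) * time.Millisecond }
```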
Implements PULSE_DISABLE_DOCKER_UPDATE_ACTIONS environment variable and
Settings UI toggle to hide Docker container update buttons while still
allowing update detection. This addresses requests for a 'read-only' mode
in production environments.
Backend:
- Add DisableDockerUpdateActions to SystemSettings and Config structs
- Add environment variable parsing with EnvOverrides tracking
- Expose setting in GET/POST /api/config/system endpoints
- Block update API with 403 when disabled (defense-in-depth)
Frontend:
- Add disableDockerUpdateActions to SystemConfig type
- Create systemSettings store for reactive access to server config
- Add Docker Settings card in Settings → Agents tab with toggle
- Show env lock badge when set via environment variable
UpdateButton improvements:
- Properly handle loading state (disabled + visual indicator)
- Use Solid.js Show components for proper reactivity
- Show read-only UpdateBadge when updates disabled
- Show interactive button when updates enabled
Closes discussion #982
- Add persistent volume mounts for Go/npm caches (faster rebuilds)
- Add shell config with helpful aliases and custom prompt
- Add comprehensive devcontainer documentation
- Add pre-commit hooks for Go formatting and linting
- Use go-version-file in CI workflows instead of hardcoded versions
- Simplify docker compose commands with --wait flag
- Add gitignore entries for devcontainer auth files
Docker deployments with custom port mappings would show incorrect URLs
in email alerts because the auto-detection couldn't determine the
external port.
Added a "Public URL" setting in Settings > Network that allows users
to configure the dashboard URL used in email notifications.
- Added publicURL field to SystemSettings (persistence.go)
- Load/save publicURL in system settings handler
- Apply publicURL to notification manager on change
- Added UI input in NetworkSettingsPanel
- Shows env override warning if PULSE_PUBLIC_URL is set
Related to #944
When a user deletes an API token that was migrated from .env, track
the hash in a suppression list to prevent it from being re-migrated
on the next restart.
Changes:
- Add SuppressedEnvMigrations field to Config
- Add env_token_suppressions.json persistence
- Check suppression list during env token migration
- Record suppressed hash when deleting "Migrated from .env" tokens
- Update RemoveAPIToken to return the removed record
Related to #871
User feedback fields (DismissedReason, UserNote, TimesRaised, Suppressed, Source)
were not being saved to disk, causing 'expected behavior' dismissals to be lost
after Pulse restarted.
- Add missing fields to AIFindingRecord in persistence.go
- Update FindingsPersistenceAdapter to save/load these fields
- Add comprehensive tests for dismissal persistence round-trip
Fixes issue where Frigate storage warning kept reappearing despite being
marked as expected behavior.
Bug Fixes:
- Fix boolean fields with 'omitempty' not persisting false values
- AlertTriggeredAnalysis, PatrolAnalyzeNodes/Guests/Docker/Storage
- omitempty causes Go to skip false (zero value) when marshaling JSON
- On reload, NewDefaultAIConfig() sets true, so the missing field stays true (see the sketch below)
- Fix model dropdown losing selection after save (SolidJS reactivity issue)
- Added explicit 'selected' attribute to option elements
- Ensures browser maintains selection with optgroups during re-renders
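A runnable illustration of the omitempty pitfall, using field names from the list above (the tags are assumed):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type aiConfig struct {
	// Bad: `omitempty` drops false (the zero value), so a user disabling
	// the feature round-trips as a missing key.
	AlertTriggeredAnalysis bool `json:"alertTriggeredAnalysis,omitempty"`
	// Good: always emitted, so false survives save/load.
	PatrolAnalyzeNodes bool `json:"patrolAnalyzeNodes"`
}

func main() {
	out, _ := json.Marshal(aiConfig{})
	fmt.Println(string(out)) // {"patrolAnalyzeNodes":false} — the first field vanished
}
```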
Improvements:
- Change patrol type label from 'Quick' to 'Patrol' in history table
- Add chat_model and patrol_model to AI settings update log
- Add alert_triggered_analysis to AI config load log for debugging
- Fixed normalizeStorageDefaults to allow Trigger=0
- Fixed normalizeNodeDefaults (Temperature) to allow Trigger=0
- Added comprehensive tests for all threshold normalization patterns
- Updated existing test that expected old behavior
Related to #864
Adds FreshHours and StaleHours settings to control when the dashboard
backup indicator shows green (fresh), amber (stale), or red (critical).
- Backend: Added FreshHours/StaleHours to BackupAlertConfig (default 24/72 hours)
- Frontend: getBackupInfo() now accepts optional thresholds parameter
- Dashboard/GuestRow components use thresholds from alert config
- Settings saved/loaded with alert configuration
Closes #839
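A sketch of the indicator logic implied by the thresholds:

```go
package backups

import "time"

// indicatorColor classifies a backup's age against the configurable
// thresholds (defaults 24h fresh / 72h stale).
func indicatorColor(age time.Duration, freshHours, staleHours int) string {
	switch {
	case age <= time.Duration(freshHours)*time.Hour:
		return "green" // fresh
	case age <= time.Duration(staleHours)*time.Hour:
		return "amber" // stale
	default:
		return "red" // critical
	}
}
```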
- Add MetricsRetentionRawHours, MetricsRetentionMinuteHours, MetricsRetentionHourlyDays, MetricsRetentionDailyDays to SystemSettings
- Wire settings from system.json through Config to metrics store initialization
- Set sensible defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d
- Log active retention values on startup for transparency
Users can now customize how long metrics are stored at each aggregation tier.
Phase 1 of Pulse AI differentiation:
- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions
This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights
All tests passing. Falls back to basic summary if metrics history unavailable.
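A sketch of the least-squares trend math described above, simplified to plain slices (the real builder presumably works over timestamped metric samples):

```go
package trend

// slopePerHour fits y = a + b*x by ordinary least squares over paired
// (hour, value) samples and returns b, the growth rate per hour.
// xs and ys must be the same length.
func slopePerHour(xs, ys []float64) float64 {
	n := float64(len(xs))
	var sumX, sumY, sumXY, sumXX float64
	for i := range xs {
		sumX += xs[i]
		sumY += ys[i]
		sumXY += xs[i] * ys[i]
		sumXX += xs[i] * xs[i]
	}
	den := n*sumXX - sumX*sumX
	if den == 0 {
		return 0
	}
	return (n*sumXY - sumX*sumY) / den
}

// daysUntil estimates days until usage (percent) reaches target, e.g.
// 90 or 100; -1 means usage is stable or shrinking.
func daysUntil(current, target, growthPerHour float64) float64 {
	if growthPerHour <= 0 {
		return -1
	}
	if current >= target {
		return 0
	}
	return (target - current) / growthPerHour / 24
}
```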
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
Keep only the simple AI-powered approach:
- set_resource_url tool lets AI save discovered URLs
- Users ask AI directly: 'Find URLs for my containers'
- AI uses its intelligence to discover and set URLs
Removed:
- URLDiscoveryService (rigid port scanning)
- Bulk discovery API endpoints
- Frontend discovery button
The AI itself is smart enough to iterate through resources
and discover URLs when asked.
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
- Add AI service with Anthropic, OpenAI, and Ollama providers
- Add AI chat UI component with streaming responses
- Add AI settings page for configuration
- Add agent exec framework for command execution
- Add API endpoints for AI chat and configuration
Removed two debug log statements that logged the full nodes config
JSON at Info level. This was verbose and potentially exposed sensitive
configuration data (credentials, tokens) in logs.
Root cause: SaveSystemSettings calls updateEnvFile which rewrites .env on
any setting change, triggering the config watcher. The watcher sees API_TOKEN
in .env and replaces all UI-created tokens with "Environment token" records,
wiping out host-agent scoped tokens.
Fix: updateEnvFile now compares the new content with existing content and
skips the write if nothing changed. Since dark mode (and other UI settings)
are stored in system.json, not .env, toggling theme no longer triggers
unnecessary .env rewrites.
This prevents the config watcher from being triggered unnecessarily and
preserves UI-created API tokens when changing cosmetic settings.
Future improvement: Deprecate API_TOKEN/API_TOKENS from .env entirely and
make api_tokens.json the single source of truth (requires migration logic).
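The core of the fix, sketched (the signature and file mode are assumptions):

```go
package config

import (
	"bytes"
	"os"
)

// updateEnvFile skips the write when the rendered .env content is
// byte-identical to what's on disk, so the config watcher never fires
// for cosmetic settings that live in system.json.
func updateEnvFile(path string, content []byte) error {
	if existing, err := os.ReadFile(path); err == nil && bytes.Equal(existing, content) {
		return nil // nothing changed; don't wake the watcher
	}
	return os.WriteFile(path, content, 0o600)
}
```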
Allow homelab users to send webhooks to internal services while maintaining security defaults.
Changes:
- Add webhookAllowedPrivateCIDRs field to SystemSettings (persistent config)
- Implement CIDR parsing and validation in NotificationManager
- Convert ValidateWebhookURL to instance method to access allowlist
- Add UI controls in System Settings for configuring trusted CIDR ranges
- Maintain strict security by default (block all private IPs)
- Keep localhost, link-local, and cloud metadata services blocked regardless of allowlist
- Re-validate on both config save and webhook delivery (DNS rebinding protection)
- Add comprehensive tests for CIDR parsing and IP matching
Backend:
- UpdateAllowedPrivateCIDRs() parses comma-separated CIDRs with validation
- Support for bare IPs (auto-converts to /32 or /128)
- Thread-safe allowlist updates with RWMutex
- Logging when allowlist is updated or used
- Validation errors prevent invalid CIDRs from being saved
Frontend:
- New "Webhook Security" section in System Settings
- Input field with examples and helpful placeholder text
- Real-time unsaved changes tracking
- Loads and saves allowlist via system settings API
Security:
- Default behavior unchanged (all private IPs blocked)
- Explicit opt-in required via configuration
- Localhost (127/8) always blocked
- Link-local (169.254/16) always blocked
- Cloud metadata services always blocked
- DNS resolution checked at both save and send time
Testing:
- Tests for CIDR parsing (valid/invalid inputs)
- Tests for IP allowlist matching
- Tests for bare IP address handling
- Tests for security boundaries (localhost, link-local remain blocked)
Related to #673
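A sketch of the CIDR parsing and bare-IP widening described above (function name assumed):

```go
package webhooks

import (
	"fmt"
	"net"
	"strings"
)

// parseAllowlist turns a comma-separated string into CIDRs, auto-widening
// bare IPs to /32 (IPv4) or /128 (IPv6); any invalid entry rejects the
// whole list so bad config can't be saved.
func parseAllowlist(s string) ([]*net.IPNet, error) {
	var nets []*net.IPNet
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			continue
		}
		if !strings.Contains(part, "/") { // bare IP
			if ip := net.ParseIP(part); ip != nil {
				if ip.To4() != nil {
					part += "/32"
				} else {
					part += "/128"
				}
			}
		}
		_, n, err := net.ParseCIDR(part)
		if err != nil {
			return nil, fmt.Errorf("invalid CIDR %q: %w", part, err)
		}
		nets = append(nets, n)
	}
	return nets, nil
}
```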
**Problem**: writeConfigFileLocked() accessed c.tx field without synchronization
- Function reads c.tx to check if transaction is active (line 109)
- c.tx modified by begin/endTransaction under lock, but read without lock
- Race condition: c.tx could change between check and use
**Impact**:
- Inconsistent transaction handling
- File could be written directly when it should be staged
- Or staged when it should be written directly
- Data corruption risk during config imports
**Fix** (lines 108-128):
- Added documentation that caller MUST hold c.mu lock
- Read c.tx into local variable tx while lock is held
- Use local copy for transaction check
- Safe because all callers hold c.mu when calling writeConfigFileLocked
- Transaction field only modified while holding c.mu in begin/endTransaction
This maintains the existing contract (callers hold lock) while making the transaction read safe and explicit.
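A sketch of the snapshot pattern (the type shapes are hypothetical):

```go
package config

import (
	"os"
	"sync"
)

type transaction struct{ staged map[string][]byte }

func (t *transaction) stage(path string, data []byte) { t.staged[path] = data }

type ConfigPersistence struct {
	mu sync.Mutex
	tx *transaction // only mutated under mu by begin/endTransaction
}

// writeConfigFileLocked requires the caller to hold c.mu. Snapshotting
// c.tx into a local makes the check and the use one coherent read.
func (c *ConfigPersistence) writeConfigFileLocked(path string, data []byte) error {
	tx := c.tx // read once under the caller's lock
	if tx != nil {
		tx.stage(path, data) // staged until the transaction commits
		return nil
	}
	return os.WriteFile(path, data, 0o600)
}
```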
This commit addresses 4 P1 important issues and 1 P2 optimization in infrastructure components:
**P1-1: Missing Panic Recovery in Discovery Service** (service.go:172-195, 499-542)
- **Problem**: No panic recovery in Start(), ForceRefresh(), SetSubnet() goroutines
- **Impact**: Silent service death if scan panics, broken discovery with no monitoring
- **Fix**:
- Wrapped initial scan goroutine with defer/recover (lines 172-182)
- Wrapped scanLoop goroutine with defer/recover (lines 185-195)
- Wrapped ForceRefresh scan with defer/recover (lines 499-509)
- Wrapped SetSubnet scan with defer/recover (lines 532-542)
- All log panics with stack traces for debugging
**P1-2: Missing Panic Recovery in Config Watcher Callback** (watcher.go:546-556)
- **Problem**: User-provided onMockReload callback could panic and crash watcher
- **Impact**: Panicking callback kills watcher goroutine, no config updates
- **Fix**: Wrapped callback invocation with defer/recover and stack trace logging
**P1-3: Session Store Stop() Using Send Instead of Close** (session_store.go:16-84)
- **Problem**: Stop() used channel send which blocks if nobody reads
- **Impact**: Stop() hangs if backgroundWorker already exited
- **Fix**:
- Added sync.Once field stopOnce (line 22)
- Changed Stop() to use close() within stopOnce.Do() (lines 80-84)
- Prevents double-close panic and ensures all readers are signaled
**P2-1: Backup Cleanup Inefficient O(n²) Sort** (persistence.go:1424-1427)
- **Problem**: Bubble sort used to sort backups by modification time
- **Impact**: Inefficient for large backup counts (>100 files)
- **Fix**:
- Replaced bubble sort with sort.Slice() using O(n log n) algorithm
- Added "sort" import (line 9)
- Maintains same oldest-first ordering for deletion logic
All fixes add defensive programming without changing external behavior. Panic recovery ensures services continue operating even with bugs, while optimization reduces cleanup time for backup-heavy environments.
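The commit wraps each goroutine inline with defer/recover; a reusable helper shows the same pattern:

```go
package discovery

import (
	"log"
	"runtime/debug"
)

// safeGo runs fn on its own goroutine and converts a panic into a logged
// stack trace instead of a silently dead service.
func safeGo(name string, fn func()) {
	go func() {
		defer func() {
			if r := recover(); r != nil {
				log.Printf("panic in %s: %v\n%s", name, r, debug.Stack())
			}
		}()
		fn()
	}()
}
```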
This commit addresses 3 critical P0 race conditions and resource leaks in core infrastructure:
**P0-1: Discovery Service Goroutine Leak** (service.go:468, 488)
- **Problem**: ForceRefresh() and SetSubnet() spawned unbounded goroutines without checking if scan already in progress
- **Impact**: Rapid API calls create goroutine explosion, resource exhaustion
- **Fix**:
- ForceRefresh: Check isScanning before spawning goroutine (lines 470-476)
- SetSubnet: Check isScanning, defer scan if already running (lines 491-504)
- Both now log when skipping to aid debugging
**P0-2: Config Persistence Unlock/Relock Race** (persistence.go:1177-1206)
- **Problem**: LoadNodesConfig() unlocked RLock, called SaveNodesConfig (acquires Lock), then relocked
- **Impact**: Another goroutine could modify config between unlock/relock, causing migrated data loss
- **Fix**:
- Copy instance slices while holding RLock to ensure consistency (lines 1189-1194)
- Release lock, save copies, then return without relocking (lines 1196-1205)
- Prevents TOCTOU vulnerability where migrations could be overwritten
**P0-3: Config Watcher Channel Close Race** (watcher.go:19-178)
- **Problem**: Stop() used select-check-close pattern vulnerable to concurrent calls
- **Impact**: Multiple Stop() calls panic on double-close
- **Fix**:
- Added sync.Once field stopOnce to ConfigWatcher struct (line 26)
- Changed Stop() to use stopOnce.Do() ensuring single execution (lines 175-178)
- Removed racy select-based guard
All fixes maintain backwards compatibility and add defensive logging for operational visibility.
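The Stop() pattern from P0-3, sketched:

```go
package watcher

import "sync"

type ConfigWatcher struct {
	stop     chan struct{}
	stopOnce sync.Once
}

// Stop is idempotent and safe under concurrent callers: close() inside
// Once cannot block like a channel send and cannot double-close-panic.
func (w *ConfigWatcher) Stop() {
	w.stopOnce.Do(func() { close(w.stop) })
}
```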
Backend:
- Add IsEncryptionEnabled() method to ConfigPersistence
- Include encryption status in /api/notifications/health response
- Allows frontend to warn when credentials are stored in plaintext
Frontend:
- Update NotificationHealth type to include encryption.enabled field
- Frontend can now display warnings when encryption is disabled
This addresses the P2 requirement for encryption visibility, allowing
operators to know when notification credentials are not encrypted at rest.
Related to #595
This change adds support for custom SSH ports when collecting temperature
data from Proxmox nodes, resolving issues for users who run SSH on non-standard
ports.
**Why SSH is still needed:**
Temperature monitoring requires reading /sys/class/hwmon sensors on Proxmox
nodes, which is not exposed via the Proxmox API. Even when using API tokens
for authentication, Pulse needs SSH access to collect temperature data.
**Changes:**
- Add `sshPort` configuration to SystemSettings (system.json)
- Add `SSHPort` field to Config with environment variable support (SSH_PORT)
- Add per-node SSH port override capability for PVE, PBS, and PMG instances
- Update TemperatureCollector to accept and use custom SSH port
- Update SSH known_hosts manager to support non-standard ports
- Add NewTemperatureCollectorWithPort() constructor with port parameter
- Maintain backward compatibility with NewTemperatureCollector() (uses port 22)
- Update frontend TypeScript types for SSH port configuration
**Configuration methods:**
1. Environment variable: SSH_PORT=2222
2. system.json: {"sshPort": 2222}
3. Per-node override in nodes.enc (future UI support)
**Default behavior:**
- Defaults to port 22 if not configured
- Maintains full backward compatibility
- No changes required for existing deployments
The implementation includes proper ssh-keyscan port handling and known_hosts
management for non-standard ports using [host]:port notation per SSH standards.
Related to #608
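The bracketed-host formatting, sketched (helper name assumed; the [host]:port form itself is standard known_hosts notation):

```go
package sshutil

import "fmt"

// knownHostsName renders a host for known_hosts entries: non-standard
// ports use the bracketed form, e.g. "[pve.local]:2222", per sshd(8).
func knownHostsName(host string, port int) string {
	if port == 0 || port == 22 {
		return host
	}
	return fmt.Sprintf("[%s]:%d", host, port)
}
```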
Implements DNS caching using rs/dnscache to dramatically reduce DNS query
volume for frequently accessed Proxmox hosts. Users were reporting 260,000+
DNS queries in 37 hours for the same hostnames.
Changes:
- Added rs/dnscache dependency for DNS resolution caching
- Created pkg/tlsutil/dnscache.go with DNS cache wrapper
- Updated HTTP client creation to use cached DNS resolver
- Added DNSCacheTimeout configuration option (default: 5 minutes)
- Made DNS cache timeout configurable via:
- system.json: dnsCacheTimeout field (seconds)
- Environment variable: DNS_CACHE_TIMEOUT (duration string)
- DNS cache periodically refreshes to prevent stale entries
Benefits:
- Reduces DNS query load on local DNS servers by ~99%
- Reduces network traffic and DNS query log volume
- Maintains fresh DNS entries through periodic refresh
- Configurable timeout for different network environments
Default behavior: 5-minute cache timeout with automatic refresh
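A sketch of wiring rs/dnscache into an http.Transport, following the library's documented usage pattern (whether Pulse factors it exactly this way is an assumption):

```go
package tlsutil

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"

	"github.com/rs/dnscache"
)

// NewCachedTransport resolves hostnames through an in-process DNS cache
// and refreshes entries on a ticker so they never go stale.
func NewCachedTransport(refresh time.Duration) *http.Transport {
	resolver := &dnscache.Resolver{}
	go func() {
		t := time.NewTicker(refresh)
		defer t.Stop()
		for range t.C {
			resolver.Refresh(true) // true evicts entries unused since last refresh
		}
	}()
	return &http.Transport{
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			host, port, err := net.SplitHostPort(addr)
			if err != nil {
				return nil, err
			}
			ips, err := resolver.LookupHost(ctx, host)
			if err != nil {
				return nil, err
			}
			dialErr := fmt.Errorf("no addresses for %s", host)
			for _, ip := range ips {
				var d net.Dialer
				conn, err := d.DialContext(ctx, network, net.JoinHostPort(ip, port))
				if err == nil {
					return conn, nil
				}
				dialErr = err
			}
			return nil, dialErr
		},
	}
}
```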
- Add Access-Control-Expose-Headers to allow frontend to read X-CSRF-Token response header
- Implement proactive CSRF token issuance on GET requests when session exists but CSRF cookie is missing
- Ensures frontend always has valid CSRF token before making POST requests
- Fixes 403 Forbidden errors when toggling system settings
This resolves CSRF validation failures that occurred when CSRF tokens expired or were missing while valid sessions existed.
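A sketch of the middleware behavior (cookie names and helpers are hypothetical):

```go
package csrf

import (
	"crypto/rand"
	"encoding/hex"
	"net/http"
)

// Middleware exposes the token header to frontend JS and reissues a
// token on GETs when a session exists but the CSRF cookie is missing.
func Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Access-Control-Expose-Headers", "X-CSRF-Token")
		if r.Method == http.MethodGet && hasSession(r) {
			if _, err := r.Cookie("pulse_csrf"); err != nil { // cookie missing
				tok := newToken()
				http.SetCookie(w, &http.Cookie{Name: "pulse_csrf", Value: tok, Path: "/"})
				w.Header().Set("X-CSRF-Token", tok)
			}
		}
		next.ServeHTTP(w, r)
	})
}

func hasSession(r *http.Request) bool {
	_, err := r.Cookie("pulse_session") // hypothetical session cookie name
	return err == nil
}

func newToken() string {
	b := make([]byte, 32)
	_, _ = rand.Read(b) // crypto/rand; error ignored in this sketch
	return hex.EncodeToString(b)
}
```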