Commit graph

33 commits

Author SHA1 Message Date
rcourtman
d310c257a1 Prefer monitor connection state in diagnostics 2026-03-26 22:58:57 +00:00
rcourtman
1a582ccc35 fix(diagnostics): honor PVE fingerprint in diagnostics probe 2026-03-10 22:46:12 +00:00
rcourtman
572520ebc6 Promote guest-agent /proc/meminfo fallback for accurate VM memory (#1270)
Move the guest-agent file-read of /proc/meminfo earlier in the memory
fallback chain so it runs before RRD, giving real-time MemAvailable that
correctly excludes reclaimable buff/cache on Linux VMs. Also add
VM.GuestAgent.FileRead permission for PVE 9 and fix install.sh to use
comma-separated privilege strings.
2026-03-09 10:04:28 +00:00
rcourtman
9072b8eaa8 feat: enhance API router with multi-tenant authorization
Router & Middleware:
- Add auth context middleware for user/token extraction
- Add tenant middleware with authorization checking
- Refactor middleware chain ordering for proper isolation
- Add router helpers for common patterns

Authentication & SSO:
- Enhance auth with tenant-aware context
- Update OIDC, SAML, and SSO handlers for multi-tenant
- Add RBAC handler improvements
- Add security enhancements

New Test Coverage:
- API foundation tests
- Auth and authorization tests
- Router state and general tests
- SSO handler CRUD tests
- WebSocket isolation tests
- Resource handler tests
2026-01-24 22:42:23 +00:00
rcourtman
4c19fa3c1b fix: resolve btrfs disk summing (#1158), podman disable flag (#1151), and diagnostics path (#1155) 2026-01-23 19:24:38 +00:00
rcourtman
8bf31214f5 feat: enhance API handlers and router with new endpoints
- Add new AI handler endpoints
- Enhance diagnostics API
- Improve router configuration
2026-01-22 22:31:24 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
c75972d57c Fix mock metrics history and guest drawer controls 2026-01-22 09:39:53 +00:00
rcourtman
633eea83db refactor: remove deprecated config fields
- Remove unused envconfig tags (BackendHost, FrontendHost, etc.)
- Remove APITokenEnabled (infer from token count)
- Remove IframeEmbeddingAllow, Port, Debug, ConcurrentPolling
- Clean up temperature proxy comments from ClusterEndpoint
- Simplify API token diagnostic to use config field directly
2026-01-22 00:43:27 +00:00
rcourtman
d306e02151 fix: remove unused imports and obsolete tests in API handlers
- diagnostics.go: remove unused path/filepath and syscall imports
- router.go: remove unused errors import
- diagnostics_test.go: remove tests for deleted functions
  (normalizeHostForComparison, matchInstanceNameByHost)

These changes fix build errors after sensor proxy removal.
2026-01-21 11:59:41 +00:00
rcourtman
ecc31730f6 Remove OpenCode references 2026-01-20 16:56:41 +00:00
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00
rcourtman
65829983b5 v5: gate legacy sensor-proxy and prune dev docs 2025-12-18 21:51:25 +00:00
rcourtman
8152197207 fix: mark unused parameters to satisfy unparam linter
Mark intentionally unused parameters with underscore to:
- Silence unparam warnings for legitimate unused parameters
- Keep function signatures intact for API compatibility
- Remove unused req from serveChecksum helper
2025-11-27 10:12:48 +00:00
rcourtman
f9341ae1fc Improve temperature proxy workflow 2025-11-17 14:25:46 +00:00
rcourtman
47d5c14aef Improve temperature proxy control-plane flow 2025-11-15 21:49:51 +00:00
rcourtman
c957ccd9e6 Add CI build workflow and tighten proxy diagnostics 2025-11-14 13:32:29 +00:00
rcourtman
61f011af1d Improve temperature proxy diagnostics and tests 2025-11-13 22:31:53 +00:00
rcourtman
62a9f40cc7 Fix diagnostics incorrectly warning about /run mount in Docker (related to #600)
The diagnostic code was warning ALL deployments using /run/pulse-sensor-proxy
socket path to "remove and re-add" their configuration to use /mnt/pulse-proxy
instead. This was incorrect for Docker deployments where /run is the correct
and documented mount path (see docker-compose.yml line 15).

The warning was only meant for LXC containers where the managed mount at
/mnt/pulse-proxy is preferred over a legacy hand-crafted /run mount.

Fix: Only show the warning in non-Docker environments (check PULSE_DOCKER env).
Docker deployments correctly use /run/pulse-sensor-proxy per compose file.

Impact: Docker users were seeing confusing diagnostic warnings telling them
to reconfigure a correct setup.
2025-11-09 16:49:49 +00:00
rcourtman
1a78dcbba2 Fix guest agent disk data regression on Proxmox 8.3+
Related to #630

Proxmox 8.3+ changed the VM status API to return the `agent` field as an
object ({"enabled":1,"available":1}) instead of an integer (0 or 1). This
caused Pulse to incorrectly treat VMs as having no guest agent, resulting
in missing disk usage data (disk:-1) even when the guest agent was running
and functional.

The issue manifested as:
- VMs showing "Guest details unavailable" or missing disk data
- Pulse logs showing no "Guest agent enabled, querying filesystem info" messages
- `pvesh get /nodes/<node>/qemu/<vmid>/agent/get-fsinfo` working correctly
  from the command line, confirming the agent was functional

Root cause:
The VMStatus struct defined `Agent` as an int field. When Proxmox 8.3+ sent
the new object format, JSON unmarshaling silently left the field at zero,
causing Pulse to skip all guest agent queries.

Changes:
- Created VMAgentField type with custom UnmarshalJSON to handle both formats:
  * Legacy (Proxmox <8.3): integer (0 or 1)
  * Modern (Proxmox 8.3+): object {"enabled":N,"available":N}
- Updated VMStatus.Agent from `int` to `VMAgentField`
- Updated all references to `detailedStatus.Agent` to use `.Agent.Value`
- The unmarshaler prioritizes the "available" field over "enabled" to ensure
  we only query when the agent is actually responding

This fix maintains backward compatibility with older Proxmox versions while
supporting the new format introduced in Proxmox 8.3+.
2025-11-06 18:42:46 +00:00
rcourtman
6eb1a10d9b Refactor: Code cleanup and localStorage consolidation
This commit includes comprehensive codebase cleanup and refactoring:

## Code Cleanup
- Remove dead TypeScript code (types/monitoring.ts - 194 lines duplicate)
- Remove unused Go functions (GetClusterNodes, MigratePassword, GetClusterHealthInfo)
- Clean up commented-out code blocks across multiple files
- Remove unused TypeScript exports (helpTextClass, private tag color helpers)
- Delete obsolete test files and components

## localStorage Consolidation
- Centralize all storage keys into STORAGE_KEYS constant
- Update 5 files to use centralized keys:
  * utils/apiClient.ts (AUTH, LEGACY_TOKEN)
  * components/Dashboard/Dashboard.tsx (GUEST_METADATA)
  * components/Docker/DockerHosts.tsx (DOCKER_METADATA)
  * App.tsx (PLATFORMS_SEEN)
  * stores/updates.ts (UPDATES)
- Benefits: Single source of truth, prevents typos, better maintainability

## Previous Work Committed
- Docker monitoring improvements and disk metrics
- Security enhancements and setup fixes
- API refactoring and cleanup
- Documentation updates
- Build system improvements

## Testing
- All frontend tests pass (29 tests)
- All Go tests pass (15 packages)
- Production build successful
- Zero breaking changes

Total: 186 files changed, 5825 insertions(+), 11602 deletions(-)
2025-11-04 21:50:46 +00:00
rcourtman
5c4be1921c chore: snapshot current changes 2025-11-02 22:47:55 +00:00
rcourtman
e07336dd9f refactor: remove legacy DISABLE_AUTH flag and enhance authentication UX
Major authentication system improvements:

- Remove deprecated DISABLE_AUTH environment variable support
- Update all documentation to remove DISABLE_AUTH references
- Add auth recovery instructions to docs (create .auth_recovery file)
- Improve first-run setup and Quick Security wizard flows
- Enhance login page with better error messaging and validation
- Refactor Docker hosts view with new unified table and tree components
- Add useDebouncedValue hook for better search performance
- Improve Settings page with better security configuration UX
- Update mock mode and development scripts for consistency
- Add ScrollableTable persistence and improved responsive design

Backend changes:
- Remove DISABLE_AUTH flag detection and handling
- Improve auth configuration validation and error messages
- Enhance security status endpoint responses
- Update router integration tests

Frontend changes:
- New Docker components: DockerUnifiedTable, DockerTree, DockerSummaryStats
- Better connection status indicator positioning
- Improved authentication state management
- Enhanced CSRF and session handling
- Better loading states and error recovery

This completes the migration away from the insecure DISABLE_AUTH pattern
toward proper authentication with recovery mechanisms.
2025-10-27 19:46:51 +00:00
rcourtman
cdba742884 Stabilize diagnostics test VM selection 2025-10-22 19:48:56 +00:00
rcourtman
2786afdff0 feat: comprehensive diagnostics and observability improvements
Upgrade diagnostics infrastructure from 5/10 to 8/10 production readiness
with enhanced metrics, logging, and request correlation capabilities.

**Request Correlation**
- Wire request IDs through context in middleware
- Return X-Request-ID header in all API responses
- Enable downstream log correlation across request lifecycle

**HTTP/API Metrics** (18 new Prometheus metrics)
- pulse_http_request_duration_seconds - API latency histogram
- pulse_http_requests_total - request counter by method/route/status
- pulse_http_request_errors_total - error counter by type
- Path normalization to control label cardinality

**Per-Node Poll Metrics**
- pulse_monitor_node_poll_duration_seconds - per-node timing
- pulse_monitor_node_poll_total - success/error counts per node
- pulse_monitor_node_poll_errors_total - error breakdown per node
- pulse_monitor_node_poll_last_success_timestamp - freshness tracking
- pulse_monitor_node_poll_staleness_seconds - age since last success
- Enables multi-node hotspot identification

**Scheduler Health Metrics**
- pulse_scheduler_queue_due_soon - ready queue depth
- pulse_scheduler_queue_depth - by instance type
- pulse_scheduler_queue_wait_seconds - time in queue histogram
- pulse_scheduler_dead_letter_depth - failed task tracking
- pulse_scheduler_breaker_state - circuit breaker state
- pulse_scheduler_breaker_failure_count - consecutive failures
- pulse_scheduler_breaker_retry_seconds - time until retry
- Enable alerting on DLQ spikes, breaker opens, queue backlogs

**Diagnostics Endpoint Caching**
- pulse_diagnostics_cache_hits_total - cache performance
- pulse_diagnostics_cache_misses_total - cache misses
- pulse_diagnostics_refresh_duration_seconds - probe timing
- 45-second TTL prevents thundering herd on /api/diagnostics
- Thread-safe with RWMutex
- X-Diagnostics-Cached-At header shows cache freshness

**Debug Log Performance**
- Gate high-frequency debug logs behind IsLevelEnabled() checks
- Reduces CPU waste in production when debug disabled
- Covers scheduler loops, poll cycles, API handlers

**Persistent Logging**
- File logging with automatic rotation
- LOG_FILE, LOG_MAX_SIZE, LOG_MAX_AGE, LOG_COMPRESS env vars
- MultiWriter sends logs to both stderr and file
- Gzip compression support for rotated logs

Files modified:
- internal/api/diagnostics.go (caching layer)
- internal/api/middleware.go (request IDs, HTTP metrics)
- internal/api/http_metrics.go (NEW - HTTP metric definitions)
- internal/logging/logging.go (file logging with rotation)
- internal/monitoring/metrics.go (node + scheduler metrics)
- internal/monitoring/monitor.go (instrumentation, debug gating)

Impact: Dramatically improved production troubleshooting with per-node
visibility, scheduler health metrics, persistent logs, and cached
diagnostics. Fast incident response now possible for multi-node deployments.
2025-10-21 12:37:39 +00:00
rcourtman
f141f7db33 feat: enhance sensor proxy with improved cluster discovery and SSH management
Improvements to pulse-sensor-proxy:
- Fix cluster discovery to use pvecm status for IP addresses instead of node names
- Add standalone node support for non-clustered Proxmox hosts
- Enhanced SSH key push with detailed logging, success/failure tracking, and error reporting
- Add --pulse-server flag to installer for custom Pulse URLs
- Configure www-data group membership for Proxmox IPC access

UI and API cleanup:
- Remove unused "Ensure cluster keys" button from Settings
- Remove /api/diagnostics/temperature-proxy/ensure-cluster-keys endpoint
- Remove EnsureClusterKeys method from tempproxy client

The setup script already handles SSH key distribution during initial configuration,
making the manual refresh button redundant.
2025-10-17 11:43:26 +00:00
rcourtman
aaae27dc11 Log memory source transitions for diagnostics (#553) 2025-10-15 19:19:11 +00:00
rcourtman
5cf0697157 Document optional host-script upgrade path 2025-10-14 13:19:38 +00:00
rcourtman
61020881c4 Align proxy upgrade messaging with node re-add workflow 2025-10-14 13:17:34 +00:00
rcourtman
982a078753 Include temperature proxy status in diagnostics 2025-10-14 12:49:53 +00:00
rcourtman
156fd34c50 Update Proxmox guest agent permissions docs and tooling (refs #548) 2025-10-14 10:21:52 +00:00
rcourtman
a74baed121 feat: capture Proxmox memory snapshots in diagnostics 2025-10-12 10:25:43 +00:00
rcourtman
f46ff1792b Fix settings security tab navigation 2025-10-11 23:29:47 +00:00