Commit graph

204 commits

Author SHA1 Message Date
rcourtman
ee0e89871d fix: reduce metrics memory 86x by reverting buffer and adding LTTB downsampling
The in-memory metrics buffer was changed from 1000 to 86400 points per
metric to support 30-day sparklines, but this pre-allocated ~18 MiB per
guest (7 slices × 86400 × 32 bytes = 19,353,600 bytes). With 50 guests
that's about 920 MiB, which explains why users needed to double their LXC
memory after upgrading to 5.1.0.

- Revert in-memory buffer to 1000 points / 24h retention
- Remove eager slice pre-allocation (use append growth instead)
- Add LTTB (Largest Triangle Three Buckets) downsampling algorithm
- Chart endpoints now use a two-tier strategy: in-memory for ranges
  ≤ 2h, SQLite persistent store + LTTB for longer ranges
- Reduce frontend ring buffer from 86400 to 2000 points

Related to #1190
2026-02-04 19:49:52 +00:00
rcourtman
9d4d392026 fix: host network sparklines showing cumulative bytes instead of rates
Host network sparklines were displaying wildly incorrect values (e.g., 147 GB/s
for an idle Raspberry Pi) because cumulative byte counters (total bytes since
boot) were being stored directly instead of being converted to rates.

Changes:
- monitor.go: Use RateTracker to calculate network rates for hosts, matching
  the existing pattern used for VMs and containers. Only record network
  metrics when we have enough samples to calculate valid rates.
- router.go: Remove network metrics from live fallback for hosts since we
  can't calculate rates from a single snapshot. Better to show nothing than
  misleading cumulative totals.

The fix follows the established codebase pattern where:
1. Agent reports cumulative RXBytes/TXBytes
2. RateTracker compares consecutive samples to calculate bytes/second
3. Rates are stored in metrics history for sparkline display
2026-02-04 16:11:04 +00:00
rcourtman
5f2990deec Require proxy admin for SSH config endpoints 2026-02-04 15:57:59 +00:00
rcourtman
145e5c46bb Require admin for host config patch and delete 2026-02-04 15:56:07 +00:00
rcourtman
5ede1f6a97 Harden apply-restart auth for proxy/OIDC 2026-02-04 15:48:06 +00:00
rcourtman
34ca427458 Add unified guest intelligence to patrol seed context
Enrich the patrol seed context with service identity (from discovery
store) and network reachability (via ICMP ping through host agents).
The guest metrics table now includes Service and Reachable columns,
and a Service Health Issues section highlights running-but-unreachable
guests. A new SignalGuestUnreachable signal type creates deterministic
findings for unreachable guests.

New files:
- patrol_intelligence.go: GuestProber interface, GuestIntelligence
  type, gatherGuestIntelligence() with concurrent per-node probing
- patrol_prober.go: agentExecProber implementation using batch ping
  commands via connected host agents
2026-02-04 14:08:57 +00:00
rcourtman
5c18748742 Add SMART disk lifecycle monitoring with historical charts
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.

Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
2026-02-04 13:35:40 +00:00
rcourtman
8951b6f7f9 Require monitoring scope for socket.io 2026-02-04 12:41:12 +00:00
rcourtman
5c1487e406 feat: add resource picker and multi-resource report generation
Replace manual resource ID entry with a searchable, filterable resource
picker that uses live WebSocket state. Support selecting multiple
resources (up to 50) for combined fleet reports.

Multi-resource PDFs include a cover page, fleet summary table with
aggregate health status, and condensed per-resource detail pages with
overlaid CPU/memory charts. Multi-resource CSVs include a summary
section followed by interleaved time-series data with resource columns.

New POST /api/admin/reports/generate-multi endpoint handles multi-resource
requests while the existing single-resource GET endpoint remains unchanged.

Also fixes resource ID validation regex to allow colons used in
VM/container IDs (e.g., "instance:node:vmid").
2026-02-04 10:24:23 +00:00
rcourtman
6059759958 feat: Add sparkline support for unified host agents on hosts page
Backend:
- Add HostData field to ChartResponse struct in types.go
- Add host data processing in /api/charts endpoint using 'host:' prefix key
- Include hosts count in debug logging for chart responses

Frontend:
- Add 'host' to MetricResourceKind type in metricsKeys.ts
- Add hostData field to ChartsResponse interface in charts.ts
- Process hostData in seedFromBackend() in metricsHistory.ts
- Pass resourceId to EnhancedCPUBar and StackedMemoryBar in HostsOverview.tsx
- Add '7d' and '30d' to TIME_RANGE_OPTIONS in metricsViewMode.ts

This enables sparkline trend visualization for unified host agents,
consistent with Proxmox guests. Data accumulates over time at 30s intervals.
2026-02-03 22:59:55 +00:00
rcourtman
5a990dd554 Fix sparkline data inconsistency and support 30d range 2026-02-03 22:39:50 +00:00
rcourtman
b7a94bad9f security: fix websocket scope and agent impersonation
1. Enforce monitoring:read scope on WebSocket upgrades
   - Prevents low-privilege tokens (e.g. host-agent:report) from accessing
     full infra state via requestData on the main WebSocket.

2. Enforce agent token binding to prevent impersonation
   - Added Metadata field to APITokenRecord to support bound_agent_id
   - Updated agentexec server to validate token-to-agent binding if present
   - Prevents agent:exec tokens from registering as arbitrary agent IDs
2026-02-03 20:40:08 +00:00
rcourtman
0dfe3d16b3 security: secure socket.io, test-notification, and stats endpoints
1. Secure /socket.io/ endpoint
   - Previously allowed unauthenticated WebSocket upgrades via transport=websocket
   - Now enforces CheckAuth() before upgrade

2. Secure /api/test-notification
   - Previously unauthenticated and allowed broadcasting to all clients
   - Now requires Admin + settings:write scope

3. Secure /simple-stats
   - Added authentication requirement (was public)
2026-02-03 20:08:16 +00:00
rcourtman
dd47cbe5b4 security: fix host token binding, AI findings scope, and DLQ credential exposure
1. Host agent link/unlink/delete now require settings:write scope
   - Prevents compromised host-agent:manage tokens from manipulating
     or deleting unrelated hosts
   - Host tokens scoped to one host can no longer affect other hosts

2. AI investigation endpoints now require ai:execute scope
   - /api/ai/findings/* was only protected by RequireAuth
   - Low-privilege tokens could read investigation details and chat logs

3. Notification DLQ endpoints now require settings:read/write scope
   - DLQ entries contain notification configs (webhooks, SMTP, etc.)
   - Prevents monitoring:read tokens from reading credential data
   - DLQ retry/delete operations require settings:write
2026-02-03 19:59:46 +00:00
rcourtman
fdc99418d6 security: add authentication to /api/security/apply-restart endpoint
CRITICAL FIX: This endpoint previously allowed unauthenticated users to
trigger service restarts, which is a denial-of-service vulnerability.

Now requires:
- Authentication (CheckAuth) when auth is configured
- Admin role for proxy auth users
- settings:write scope for API tokens

Initial setup (no auth configured yet) remains accessible to allow
first-time security configuration to trigger restart.
2026-02-03 19:55:29 +00:00
rcourtman
832fda6c96 security: add scope checks to alerts, AI models, patrol status/stream, and remaining AI endpoints
- /api/alerts/* now requires monitoring:read scope
- /api/ai/models now requires ai:chat scope
- /api/ai/patrol/status and /api/ai/patrol/stream now require ai:execute scope
- /api/ai/patrol/findings now requires ai:execute scope
- /api/ai/remediation/* endpoints now require ai:execute scope
- /api/ai/circuit/status now requires ai:execute scope
- /api/ai/incidents/* now requires ai:execute scope
- /api/ai/question/* now requires ai:chat scope
- /api/ai/agents now requires ai:execute scope
- /api/ai/cost/summary now requires settings:read scope
2026-02-03 19:48:43 +00:00
rcourtman
c295ee277f security: add scope checks to AI endpoints and mitigate CSWSH
- AI Intelligence endpoints (/api/ai/intelligence/*, /api/ai/forecast/*,
  /api/ai/unified/findings, etc.) now require ai:execute scope to prevent
  low-privilege tokens from reading sensitive intelligence data

- AI Knowledge endpoints (/api/ai/knowledge/*) now require ai:chat scope
  to prevent arbitrary guest data access across the fleet

- AI Debug Context (/api/ai/debug/context) now requires settings:read scope
  to prevent system prompt and infrastructure details leakage

- WebSocket origin check now validates peer IP is private when allowing
  private network origins, mitigating CSWSH attacks where a malicious page
  on the same LAN tries to hijack connections using victim's session cookie
2026-02-03 19:40:46 +00:00
rcourtman
2ebe65bbc5 security: add scope checks to AI Patrol and agent profile endpoints
- AI Patrol mutation endpoints (acknowledge, dismiss, suppress, snooze, resolve,
  findings/note, suppressions/*) now require ai:execute scope to prevent
  low-privilege tokens from blinding patrol by hiding/suppressing findings

- Agent profile admin endpoints (/api/admin/profiles/*) now require
  settings:write scope to prevent low-privilege tokens from modifying
  fleet-wide agent behavior
2026-02-03 19:29:56 +00:00
rcourtman
69e3286e5e security: fix AI OAuth scope bypass, approval replay attacks, and approval endpoint scope gating
- OAuth endpoints now require settings:write scope (not just admin)
- Approval endpoints now require ai:execute scope
- Added CommandHash to approvals for replay protection
- Approvals are now single-use (consumed on first use)
- consumeApprovalWithValidation validates command matches approval
2026-02-03 19:15:15 +00:00
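Commit 69e3286e5e combines two ideas: binding an approval to a hash of the approved command, and consuming the approval on first use. The sketch below illustrates both under stated assumptions. `ApprovalStore`, `Approve`, and `Consume` are hypothetical names; the project's `consumeApprovalWithValidation` and `CommandHash` are not reproduced here, and SHA-256 is an assumed choice of hash.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

var (
	ErrUnknownApproval = errors.New("approval not found or already used")
	ErrCommandMismatch = errors.New("command does not match approval")
)

// ApprovalStore keeps pending approvals keyed by ID, each bound to the hash
// of the exact command that was approved.
type ApprovalStore struct {
	mu      sync.Mutex
	pending map[string]string // approval ID -> hex SHA-256 of the command
}

func NewApprovalStore() *ApprovalStore {
	return &ApprovalStore{pending: make(map[string]string)}
}

func hashCommand(command string) string {
	sum := sha256.Sum256([]byte(command))
	return hex.EncodeToString(sum[:])
}

// Approve records that the given command has been approved under id.
func (s *ApprovalStore) Approve(id, command string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[id] = hashCommand(command)
}

// Consume validates that command matches the approval and, on success,
// deletes the approval so it cannot be replayed. A mismatch leaves the
// approval pending; that choice is an assumption of this sketch.
func (s *ApprovalStore) Consume(id, command string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	want, ok := s.pending[id]
	if !ok {
		return ErrUnknownApproval
	}
	if want != hashCommand(command) {
		return ErrCommandMismatch
	}
	delete(s.pending, id)
	return nil
}
```

Holding the mutex across the lookup, comparison, and delete is what makes the check-and-consume step atomic, so two concurrent requests cannot both spend the same approval.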
rcourtman
43c696896f security: fix high severity authz issues (AI chat, patrol autonomy, discovery, host config) 2026-02-03 19:00:56 +00:00
rcourtman
225da6eb39 security: strengthen public URL capture to enforce scope and admin checks 2026-02-03 18:49:42 +00:00
rcourtman
83382ee251 security: enforce scope checks on admin diagnostics endpoint 2026-02-03 18:44:55 +00:00
rcourtman
60f9e6f07f security: fix multiple vulnerabilities (SAML, SSRF, Auth)
Addressed several security findings:
- SAML: Sanitized RelayState to prevent open redirects
- SAML: Fixed logout to properly invalidate server-side sessions
- Auth: Added auth, rate limiting, and logout checks to password change endpoint
- AI: Added admin/scope gating (ai:execute) for command execution
- AI: Blocked private IP ranges in fetch_url to prevent SSRF
- Config: Enforced settings:read/write scopes for export/import
- Agent: Added agent:exec scope requirement for WebSockets
2026-02-03 18:39:15 +00:00
rcourtman
d716bbfdeb fix(security): add proper authorization to sensitive endpoints
- /api/agent-install-command: require admin + settings:write scope
  Previously only RequireAuth, allowing any authenticated user to mint
  high-privilege API tokens (host-agent:manage)

- /api/system/ssh-config: require settings:write scope
  Previously any authenticated token could modify ~/.ssh/config

- /api/system/verify-temperature-ssh: require settings:write scope
  Previously any authenticated token could trigger SSH connection
  attempts to arbitrary nodes (network scanning risk)

- /api/diagnostics: require admin privileges
  Previously exposed API token metadata (IDs, hints, usage mapping)
  to any authenticated token, enabling enumeration attacks
2026-02-03 17:47:40 +00:00
rcourtman
12a5a98117 fix: SSE race conditions, alert user spoofing, and security status oracle
SSE Broadcaster:
- Add per-client mutex to prevent concurrent writes to ResponseWriter
- Fix data race in cleanupLoop reading LastActive without synchronization
- Update LastActive in SendHeartbeat so clients aren't incorrectly pruned
  after 5 minutes of idle heartbeat traffic

Alert Acknowledgements:
- Extract authenticated user from X-Authenticated-User header instead of
  hardcoding 'admin' or trusting request body's User field
- Prevents audit log spoofing and ensures accurate user attribution

Security Status Endpoint:
- Remove ?token= query param validation from public /api/security/status
- Prevents endpoint from acting as a token validity oracle for attackers
- Authentication still works via session cookies and X-API-Token header
2026-02-03 17:40:58 +00:00
rcourtman
beae4c860c fix: address 6 security and reliability issues
Security fixes:
- Auto-register now requires settings:write scope for API tokens
- X-Forwarded-For in auto-register only trusted from verified proxies
- Public URL capture requires authentication (no loopback bypass)
- Lockout reset now uses RequireAdmin for session users

Reliability fixes:
- Docker stop command expiration clears PendingUninstall flag
- Cancelled notifications get completed_at set and are cleaned up
2026-02-03 17:32:44 +00:00
rcourtman
bd030c7c87 security: fix webhook SSRF, rate limit spoofing, metrics retention, and url poisoning
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL.
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies).
- Fix memory leak in metrics history by correctly clearing fully stale data series.
- Fix public URL poisoning by preventing overwrites when explicitly configured.
2026-02-03 16:58:13 +00:00
rcourtman
4f40c3d751 fix: resolve critical stability and auth issues
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
2026-02-03 16:46:27 +00:00
rcourtman
bea3bbe5f6 Fix API token authentication and multi-tenancy logic
- Fix AuthContextMiddleware to use tenant-specific config for token validation

- Resolve data race in token LastUsedAt update

- Fix invalid org IDs returning 501/402 instead of 400

- Prevent unauthenticated organization directory creation (DoS protection)
2026-02-03 16:24:28 +00:00
rcourtman
88d95f40be feat: add Discovery Transparency & Trust features
- Add AI provider indicator showing local (Ollama) vs cloud (Anthropic/OpenAI) analysis
- Add "What Discovery Does" explanation section before first scan
- Show commands preview before scan so users know what will run
- Add scan details section showing raw command outputs for admins
- Filter sensitive Docker labels (passwords, secrets, tokens) before AI analysis
- Add comprehensive tests for label filtering

This improves sysadmin confidence by making discovery transparent about
what it does, what data it collects, and where that data goes.
2026-02-03 14:59:27 +00:00
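The label filtering mentioned in commit 88d95f40be could be approximated as below. The commit only says that labels relating to passwords, secrets, and tokens are filtered; the case-insensitive substring match on the label key and the names `FilterLabels` and `sensitiveFragments` are assumptions made for this sketch.

```go
package main

import "strings"

// sensitiveFragments lists key fragments treated as sensitive in this
// sketch. The real list used by the project is not shown in the log.
var sensitiveFragments = []string{"password", "secret", "token"}

// isSensitiveKey reports whether a label key contains any sensitive
// fragment, ignoring case.
func isSensitiveKey(key string) bool {
	lower := strings.ToLower(key)
	for _, frag := range sensitiveFragments {
		if strings.Contains(lower, frag) {
			return true
		}
	}
	return false
}

// FilterLabels returns a copy of labels with sensitive entries removed.
// The input map is not modified.
func FilterLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if isSensitiveKey(k) {
			continue
		}
		out[k] = v
	}
	return out
}
```

Dropping the whole entry, rather than redacting only the value, keeps the key name itself from reaching the AI provider. A key-based heuristic will not catch secrets stored under innocuous label names.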
rcourtman
c2ed6067f1 Fix: discovery routing, host identification, and UX feedback
- Fix routing for POST/PUT/DELETE on /api/discovery/host/ endpoints
  (Go's http.ServeMux was matching the longer prefix before method-specific routes)
- Add HOST-specific AI prompt that focuses on identifying the host OS
  rather than services/containers running on it
- Add success message UI after discovery completes
- Fix timing so success appears after data is visible (not during refetch)
- Add error handling and display for failed discoveries
2026-02-03 14:10:54 +00:00
rcourtman
eb2d07e48f Chore: enhance core api and metrics testability
Refactor Router to allow HTTP client injection for install script proxying. Add tests for unified agent install mechanism and additional metrics store coverage.
2026-02-02 22:01:36 +00:00
rcourtman
d1f76982ec fix: finding drawer actions (notes persist, acknowledge visual, discuss context)
- Sync UserNote, AcknowledgedAt, SnoozedUntil, DismissedReason, Suppressed,
  and TimesRaised from ai.Finding to unified store in both callback and
  startup sync paths. Mirror note writes to unified store immediately.
- Dim acknowledged findings (opacity-60), add "Acknowledged" badge, hide
  acknowledge button once acknowledged, sort below unacknowledged in
  severity mode.
- Pass finding_id through frontend chat API → backend ChatRequest →
  ExecuteRequest. Look up full finding from unified store (mutex-guarded)
  and prepend structured context to the prompt.
2026-02-02 15:18:51 +00:00
rcourtman
44f9a36d5c feat(license): implement free Patrol / pro Auto-Fix tiering strategy 2026-02-01 16:27:10 +00:00
rcourtman
95a0d7a6bd feat(backend): implement AI Patrol, Investigation, and system-wide refactors 2026-01-30 19:02:14 +00:00
rcourtman
03b5586ac8 refactor(ai): update patrol and service to use chat service adapter
- Update patrol.go to use chat service for AI execution
- Update service.go with chat service provider integration
- Add patrol streaming endpoint to router
2026-01-28 21:24:34 +00:00
rcourtman
9dcd859056 Update API handlers for AI and discovery endpoints
API layer updates:

AI Handlers:
- Better streaming response handling
- Improved error responses
- Session management improvements

Discovery Handlers:
- New discovery endpoint handlers
- Storage config handler
- Better router organization

Removed deprecated aidiscovery handlers in favor of unified approach.
2026-01-28 16:51:35 +00:00
rcourtman
7f7edfceb4 test: expand backend coverage 2026-01-25 21:08:44 +00:00
rcourtman
9072b8eaa8 feat: enhance API router with multi-tenant authorization
Router & Middleware:
- Add auth context middleware for user/token extraction
- Add tenant middleware with authorization checking
- Refactor middleware chain ordering for proper isolation
- Add router helpers for common patterns

Authentication & SSO:
- Enhance auth with tenant-aware context
- Update OIDC, SAML, and SSO handlers for multi-tenant
- Add RBAC handler improvements
- Add security enhancements

New Test Coverage:
- API foundation tests
- Auth and authorization tests
- Router state and general tests
- SSO handler CRUD tests
- WebSocket isolation tests
- Resource handler tests
2026-01-24 22:42:23 +00:00
rcourtman
8bf31214f5 feat: enhance API handlers and router with new endpoints
- Add new AI handler endpoints
- Enhance diagnostics API
- Improve router configuration
2026-01-22 22:31:24 +00:00
rcourtman
f2541b0d6c Refactor: Multi-tenancy support for API and License handlers
- Updated LicenseHandlers and LicenseService to be context/tenant aware
- Refactored API router and middleware to support tenant-scoped license checks
- Updated associated tests for context-aware handlers
2026-01-22 16:42:39 +00:00
rcourtman
289d95374f feat: add multi-tenancy foundation (directory-per-tenant)
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.

Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
  context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components

All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
2026-01-22 13:39:06 +00:00
rcourtman
a55bdb7a3a feat(api): security and metrics history improvements
- Require admin + settings:write scope for setup-script-url endpoint
- Add license enforcement for long-term metrics (30d/90d require Pro)
- Add downsampling step calculation for metrics history queries
- Add isContainerSSHRestricted helper for SSH restriction checks
- Clean up temperature proxy references from config handlers
- Minor OIDC and rate limit improvements
2026-01-22 00:44:12 +00:00
rcourtman
d306e02151 fix: remove unused imports and obsolete tests in API handlers
- diagnostics.go: remove unused path/filepath and syscall imports
- router.go: remove unused errors import
- diagnostics_test.go: remove tests for deleted functions
  (normalizeHostForComparison, matchInstanceNameByHost)

These changes fix build errors after sensor proxy removal.
2026-01-21 11:59:41 +00:00
rcourtman
ecc31730f6 Remove OpenCode references 2026-01-20 16:56:41 +00:00
rcourtman
ffb8928dbf refactor(api): Update handlers for native AI chat service
Adapts API handlers to use the new native chat service:

ai_handler.go:
- Replace opencode.Service with chat.Service
- Add AIService interface for testability
- Add factory function for service creation (mockable)
- Update provider wiring to use tools package types

ai_handlers.go:
- Add Notable field to model list response
- Simplify command approval - execution handled by agentic loop
- Remove inline command execution from approval endpoint

router.go:
- Update imports: mcp -> tools, opencode -> chat
- Add monitor wrapper types for cleaner dependency injection
- Update patrol wiring for new chat service

agent_profiles:
- Rename agent_profiles_mcp.go -> agent_profiles_tools.go
- Update imports for tools package

monitor_wrappers.go:
- New file with wrapper types for alert/notification monitors
- Enables interface-based dependency injection
2026-01-19 19:20:00 +00:00
rcourtman
432f13b6f5 feat(ai): add Docker update management MCP tools
Add three new MCP tools for Docker container update management:
- pulse_list_docker_updates: list containers with pending updates
- pulse_check_docker_updates: trigger update check on a host
- pulse_update_docker_container: apply update with approval workflow

Changes:
- Add UpdatesProvider interface to executor.go
- Add response types to data_types.go
- Add UpdatesMCPAdapter to adapters.go
- Register tools and handlers in tools_infrastructure.go
- Add SetUpdatesProvider() to service.go
- Wire provider in router.go wireOpenCodeProviders()
2026-01-17 15:47:36 +00:00
rcourtman
4cea85ec97 feat(mcp): expand MCP tools and add session management APIs
New API endpoints:
- POST /api/ai/sessions/{id}/summarize - Compress context
- GET /api/ai/sessions/{id}/diff - Get file changes
- POST /api/ai/sessions/{id}/fork - Branch conversation
- POST /api/ai/sessions/{id}/revert - Undo changes
- POST /api/ai/sessions/{id}/unrevert - Restore reverted changes

MCP provider wiring:
- Storage, backup, disk health providers
- Metrics history, baseline, pattern detection
- Findings manager and metadata updater

Tool improvements:
- pulse_get_topology: Unified infrastructure view
- Improved tool descriptions with usage examples
- Better license checking with logging
2026-01-17 14:43:58 +00:00
rcourtman
035436ad6e fix: add mutex to prevent concurrent map writes in Docker agent CPU tracking
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.

Added a.cpuMu mutex to protect all accesses to prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()

Related to #1063
2026-01-15 21:10:55 +00:00
rcourtman
8c7581d32c feat(profiles): add AI-assisted profile suggestions
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.

- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema
2026-01-15 13:24:18 +00:00