The in-memory metrics buffer was changed from 1000 to 86400 points per
metric to support 30-day sparklines, but this pre-allocated ~18 MB per
guest (7 slices × 86400 points × 32 bytes). With 50 guests that is
roughly 920 MB, which explains why users needed to double their LXC
memory after upgrading to 5.1.0.
- Revert in-memory buffer to 1000 points / 24h retention
- Remove eager slice pre-allocation (use append growth instead)
- Add LTTB (Largest Triangle Three Buckets) downsampling algorithm
  (see the sketch after this list)
- Chart endpoints now use a two-tier strategy: in-memory for ranges
≤ 2h, SQLite persistent store + LTTB for longer ranges
- Reduce frontend ring buffer from 86400 to 2000 points
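For reference, a minimal LTTB sketch in Go, assuming (timestamp, value)
points; names here are illustrative, not the actual implementation:

```go
import "math"

// Point is one (timestamp, value) sample.
type Point struct {
	T float64 // unix timestamp
	V float64
}

// Downsample reduces data to roughly threshold points, always keeping the
// first and last. Each output point is the one in its bucket that forms
// the largest triangle with the previous pick and the next bucket's mean
// (the 1/2 area factor is dropped since only relative areas matter).
func Downsample(data []Point, threshold int) []Point {
	if threshold >= len(data) || threshold < 3 {
		return data
	}
	sampled := make([]Point, 0, threshold)
	sampled = append(sampled, data[0])
	bucket := float64(len(data)-2) / float64(threshold-2)
	a := 0 // index of the previously selected point
	for i := 0; i < threshold-2; i++ {
		// Mean of the next bucket acts as the third triangle vertex.
		nextStart := int(float64(i+1)*bucket) + 1
		nextEnd := int(float64(i+2)*bucket) + 1
		if nextEnd > len(data) {
			nextEnd = len(data)
		}
		var avgT, avgV float64
		for _, p := range data[nextStart:nextEnd] {
			avgT += p.T
			avgV += p.V
		}
		n := float64(nextEnd - nextStart)
		avgT, avgV = avgT/n, avgV/n
		// Pick the point in the current bucket with the largest triangle area.
		start := int(float64(i)*bucket) + 1
		best, bestArea := start, -1.0
		for j := start; j < nextStart; j++ {
			area := math.Abs((data[a].T-avgT)*(data[j].V-data[a].V) -
				(data[a].T-data[j].T)*(avgV-data[a].V))
			if area > bestArea {
				bestArea, best = area, j
			}
		}
		sampled = append(sampled, data[best])
		a = best
	}
	return append(sampled, data[len(data)-1])
}
```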
Related to #1190
Host network sparklines were displaying wildly incorrect values (e.g., 147 GB/s
for an idle Raspberry Pi) because cumulative byte counters (total bytes since
boot) were being stored directly instead of being converted to rates.
Changes:
- monitor.go: Use RateTracker to calculate network rates for hosts, matching
the existing pattern used for VMs and containers. Only record network
metrics when we have enough samples to calculate valid rates.
- router.go: Remove network metrics from live fallback for hosts since we
can't calculate rates from a single snapshot. Better to show nothing than
misleading cumulative totals.
The fix follows the established codebase pattern where:
1. Agent reports cumulative RXBytes/TXBytes
2. RateTracker compares consecutive samples to calculate bytes/second
3. Rates are stored in metrics history for sparkline display
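A minimal sketch of that pattern; field and method names are assumptions,
not the actual RateTracker API:

```go
import "time"

type sample struct {
	bytes uint64
	at    time.Time
}

type rateTracker struct {
	prev map[string]sample // keyed by interface name
}

// rate returns bytes/second since the previous sample, and false until a
// second sample exists: a single cumulative snapshot has no valid rate.
func (r *rateTracker) rate(iface string, total uint64, now time.Time) (float64, bool) {
	p, seen := r.prev[iface]
	r.prev[iface] = sample{bytes: total, at: now}
	if !seen || now.Sub(p.at) <= 0 || total < p.bytes { // first sample, clock skew, or counter reset
		return 0, false
	}
	return float64(total-p.bytes) / now.Sub(p.at).Seconds(), true
}
```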
Enrich the patrol seed context with service identity (from discovery
store) and network reachability (via ICMP ping through host agents).
The guest metrics table now includes Service and Reachable columns,
and a Service Health Issues section highlights running-but-unreachable
guests. A new SignalGuestUnreachable signal type creates deterministic
findings for unreachable guests.
New files:
- patrol_intelligence.go: GuestProber interface, GuestIntelligence
type, gatherGuestIntelligence() with concurrent per-node probing
- patrol_prober.go: agentExecProber implementation using batch ping
commands via connected host agents
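A sketch of the probing shape; the interface and helper names are
assumptions based on the file list above:

```go
import (
	"context"
	"sync"
)

// GuestProber batch-pings guest IPs through a node's host agent.
type GuestProber interface {
	Probe(ctx context.Context, node string, ips []string) (map[string]bool, error)
}

// gatherReachability probes each node concurrently and merges results.
func gatherReachability(ctx context.Context, p GuestProber, byNode map[string][]string) map[string]bool {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out = make(map[string]bool)
	)
	for node, ips := range byNode {
		wg.Add(1)
		go func(node string, ips []string) { // one probe per node, concurrently
			defer wg.Done()
			res, err := p.Probe(ctx, node, ips)
			if err != nil {
				return // leave these guests unmarked rather than guessing
			}
			mu.Lock()
			for ip, ok := range res {
				out[ip] = ok
			}
			mu.Unlock()
		}(node, ips)
	}
	wg.Wait()
	return out
}
```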
Expand the smartctl collector to capture detailed SMART attributes (SATA
and NVMe), propagate them through the full data pipeline, persist them
as time-series metrics, and display them in an interactive disk detail
drawer with historical sparkline charts.
Backend: add SMARTAttributes struct, writeSMARTMetrics for persistent
storage, "disk" resource type in metrics API with live fallback.
Frontend: enhanced DiskList with Power-On column and SMART warnings,
new DiskDetail drawer matching NodeDrawer styling patterns, generic
HistoryChart metric support with proper tooltip formatting.
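A hypothetical sketch of the attribute payload; the real SMARTAttributes
fields may differ:

```go
// SMARTAttribute is one SATA attribute row from smartctl output.
type SMARTAttribute struct {
	ID        int    `json:"id"` // e.g. 5 = Reallocated_Sector_Ct
	Name      string `json:"name"`
	Value     int64  `json:"rawValue"`  // plotted as a time series
	Threshold int64  `json:"threshold"` // vendor failure threshold, if any
}

// SMARTAttributes carries both SATA and NVMe health data.
type SMARTAttributes struct {
	PowerOnHours int64            `json:"powerOnHours"`
	Temperature  int64            `json:"temperature"` // degrees C
	SATA         []SMARTAttribute `json:"sata,omitempty"`
	NVMe         map[string]int64 `json:"nvme,omitempty"` // NVMe health log fields
}
```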
Replace manual resource ID entry with a searchable, filterable resource
picker that uses live WebSocket state. Support selecting multiple
resources (up to 50) for combined fleet reports.
Multi-resource PDFs include a cover page, fleet summary table with
aggregate health status, and condensed per-resource detail pages with
overlaid CPU/memory charts. Multi-resource CSVs include a summary
section followed by interleaved time-series data with resource columns.
New POST /api/admin/reports/generate-multi endpoint handles multi-resource
requests while the existing single-resource GET endpoint remains unchanged.
Also fixes resource ID validation regex to allow colons used in
VM/container IDs (e.g., "instance:node:vmid").
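Illustratively, the relaxed pattern might look like this (the actual
regex in the codebase may differ):

```go
import "regexp"

// Colons are now permitted so composite IDs like "instance:node:vmid" validate.
var resourceIDPattern = regexp.MustCompile(`^[A-Za-z0-9._:-]+$`)
```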
Backend:
- Add HostData field to ChartResponse struct in types.go
- Add host data processing in /api/charts endpoint using 'host:' prefix key
- Include hosts count in debug logging for chart responses
Frontend:
- Add 'host' to MetricResourceKind type in metricsKeys.ts
- Add hostData field to ChartsResponse interface in charts.ts
- Process hostData in seedFromBackend() in metricsHistory.ts
- Pass resourceId to EnhancedCPUBar and StackedMemoryBar in HostsOverview.tsx
- Add '7d' and '30d' to TIME_RANGE_OPTIONS in metricsViewMode.ts
This enables sparkline trend visualization for unified host agents,
consistent with Proxmox guests. Data accumulates over time at 30s intervals.
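For orientation, a sketch of the extended response shape; the series type
and JSON keys are assumptions:

```go
// MetricSeries is a stand-in for the existing chart series type.
type MetricSeries struct {
	Timestamps []int64   `json:"timestamps"`
	Values     []float64 `json:"values"`
}

// HostData mirrors guest chart series but is keyed with a "host:" prefix
// so host IDs cannot collide with guest VMIDs.
type ChartResponse struct {
	Data     map[string]MetricSeries `json:"data"`               // guests, keyed by guest ID
	HostData map[string]MetricSeries `json:"hostData,omitempty"` // hosts, keyed by "host:<id>"
}
```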
1. Enforce monitoring:read scope on WebSocket upgrades
- Prevents low-privilege tokens (e.g. host-agent:report) from accessing
full infra state via requestData on the main WebSocket.
2. Enforce agent token binding to prevent impersonation
- Added Metadata field to APITokenRecord to support bound_agent_id
- Updated agentexec server to validate token-to-agent binding if present
- Prevents agent:exec tokens from registering as arbitrary agent IDs
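A sketch of the binding check; the bound_agent_id key comes from the
description above, the surrounding names are assumptions:

```go
import "fmt"

// APITokenRecord is a stand-in; only the Metadata field matters here.
type APITokenRecord struct {
	Metadata map[string]string
}

func validateAgentBinding(tok APITokenRecord, agentID string) error {
	bound := tok.Metadata["bound_agent_id"]
	if bound == "" {
		return nil // unbound tokens keep the previous behavior
	}
	if bound != agentID {
		return fmt.Errorf("token bound to agent %q; refusing registration as %q", bound, agentID)
	}
	return nil
}
```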
1. Host agent link/unlink/delete now require settings:write scope
- Prevents compromised host-agent:manage tokens from manipulating
or deleting unrelated hosts
- Host tokens scoped to one host can no longer affect other hosts
2. AI investigation endpoints now require ai:execute scope
- /api/ai/findings/* was only protected by RequireAuth
- Low-privilege tokens could read investigation details and chat logs
3. Notification DLQ endpoints now require settings:read/write scope
- DLQ entries contain notification configs (webhooks, SMTP, etc.)
- Prevents monitoring:read tokens from reading credential data
- DLQ retry/delete operations require settings:write
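In router terms the new requirements look roughly like this; RequireScope
is an assumed helper name, not necessarily the actual middleware API:

```go
// Handlers elided; each route now refuses tokens lacking the named scope.
mux.Handle("/api/ai/findings/", RequireScope("ai:execute", findingsHandler))
mux.Handle("/api/notifications/dlq", RequireScope("settings:read", dlqListHandler))
mux.Handle("/api/notifications/dlq/retry", RequireScope("settings:write", dlqRetryHandler))
```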
CRITICAL FIX: This endpoint previously allowed unauthenticated users to
trigger service restarts, which is a denial-of-service vulnerability.
Now requires:
- Authentication (CheckAuth) when auth is configured
- Admin role for proxy auth users
- settings:write scope for API tokens
Initial setup (no auth configured yet) remains accessible so that
first-time security configuration can still trigger a restart.
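A sketch of the layered guard; CheckAuth comes from the description
above, everything else here is an assumed shape:

```go
import "net/http"

func handleRestart(w http.ResponseWriter, r *http.Request) {
	if !authConfigured() {
		doRestart(w, r) // first-time setup: allow restart to apply new auth config
		return
	}
	user, ok := CheckAuth(r)
	if !ok {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	switch {
	case user.IsProxyAuth && !user.IsAdmin:
		http.Error(w, "admin required", http.StatusForbidden)
	case user.IsAPIToken && !user.HasScope("settings:write"):
		http.Error(w, "settings:write scope required", http.StatusForbidden)
	default:
		doRestart(w, r)
	}
}
```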
- AI Intelligence endpoints (/api/ai/intelligence/*, /api/ai/forecast/*,
/api/ai/unified/findings, etc.) now require ai:execute scope to prevent
low-privilege tokens from reading sensitive intelligence data
- AI Knowledge endpoints (/api/ai/knowledge/*) now require ai:chat scope
to prevent arbitrary guest data access across the fleet
- AI Debug Context (/api/ai/debug/context) now requires settings:read scope
to prevent system prompt and infrastructure details leakage
- WebSocket origin check now validates that the peer IP is private when
  allowing private network origins, mitigating CSWSH attacks where a
  malicious page on the same LAN tries to hijack connections using the
  victim's session cookie
- OAuth endpoints now require settings:write scope (not just admin)
- Approval endpoints now require ai:execute scope
- Added CommandHash to approvals for replay protection
- Approvals are now single-use (consumed on first use)
- consumeApprovalWithValidation validates command matches approval
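A sketch of the single-use validation; the store structure is assumed
from the description above:

```go
import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

type approval struct {
	CommandHash string // hex SHA-256 of the approved command
}

type Store struct {
	mu        sync.Mutex
	approvals map[string]approval
}

// consumeApprovalWithValidation checks the hash and deletes the approval
// atomically, making approvals single-use and replay-proof.
func (s *Store) consumeApprovalWithValidation(id, command string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	ap, ok := s.approvals[id]
	if !ok {
		return errors.New("approval not found or already consumed")
	}
	sum := sha256.Sum256([]byte(command))
	if ap.CommandHash != hex.EncodeToString(sum[:]) {
		return errors.New("command does not match the approved command")
	}
	delete(s.approvals, id) // single-use
	return nil
}
```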
SSE Broadcaster:
- Add per-client mutex to prevent concurrent writes to ResponseWriter
- Fix data race in cleanupLoop reading LastActive without synchronization
- Update LastActive in SendHeartbeat so clients receiving only heartbeat
  traffic aren't incorrectly pruned as idle after 5 minutes
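The per-client locking looks roughly like this; field names are
illustrative:

```go
import (
	"fmt"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

type sseClient struct {
	mu         sync.Mutex // serializes writes: http.ResponseWriter is not safe for concurrent use
	w          http.ResponseWriter
	lastActive atomic.Int64 // unix seconds; cleanupLoop reads this without taking mu
}

func (c *sseClient) send(event string) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, err := fmt.Fprintf(c.w, "data: %s\n\n", event); err != nil {
		return err
	}
	if f, ok := c.w.(http.Flusher); ok {
		f.Flush()
	}
	c.lastActive.Store(time.Now().Unix()) // heartbeats also land here, so idle clients stay alive
	return nil
}
```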
Alert Acknowledgements:
- Extract authenticated user from X-Authenticated-User header instead of
hardcoding 'admin' or trusting request body's User field
- Prevents audit log spoofing and ensures accurate user attribution
Security Status Endpoint:
- Remove ?token= query param validation from public /api/security/status
- Prevents endpoint from acting as a token validity oracle for attackers
- Authentication still works via session cookies and X-API-Token header
Security fixes:
- Auto-register now requires settings:write scope for API tokens
- X-Forwarded-For in auto-register is now only trusted from verified proxies
- Public URL capture requires authentication (no loopback bypass)
- Lockout reset now uses RequireAdmin for session users
Reliability fixes:
- Docker stop command expiration clears PendingUninstall flag
- Cancelled notifications get completed_at set and are cleaned up
- Fix SSRF and rate limit bypass in SendEnhancedWebhook by validating the rendered URL
- Fix rate limit spoofing in updates API by using secure IP extraction (trusted proxies)
- Fix memory leak in metrics history by correctly clearing fully stale data series
- Fix public URL poisoning by preventing overwrites when explicitly configured
- Fix data race in webhook notifications by removing shared state
- Fix duplicate monitors on config reload by stopping old instances
- Prevent metrics ID deletion on transient startup errors
- Support Bearer auth header for config export/import endpoints
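For the SSRF fix, one way to validate the rendered URL; the actual checks
in SendEnhancedWebhook may differ:

```go
import (
	"errors"
	"fmt"
	"net"
	"net/url"
)

// validateWebhookURL runs after template rendering so template output
// cannot smuggle in an internal target.
func validateWebhookURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil || (u.Scheme != "http" && u.Scheme != "https") {
		return errors.New("invalid webhook URL")
	}
	ips, err := net.LookupIP(u.Hostname())
	if err != nil {
		return err
	}
	for _, ip := range ips {
		if ip.IsLoopback() || ip.IsPrivate() || ip.IsLinkLocalUnicast() {
			return fmt.Errorf("webhook target %s resolves to an internal address", u.Hostname())
		}
	}
	return nil
}
```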
- Add AI provider indicator showing local (Ollama) vs cloud (Anthropic/OpenAI) analysis
- Add "What Discovery Does" explanation section before first scan
- Show commands preview before scan so users know what will run
- Add scan details section showing raw command outputs for admins
- Filter sensitive Docker labels (passwords, secrets, tokens) before AI
  analysis (sketched below)
- Add comprehensive tests for label filtering
This improves sysadmin confidence by making discovery transparent about
what it does, what data it collects, and where that data goes.
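The label filtering is roughly this shape; the exact key list is an
assumption:

```go
import "regexp"

// Label keys that look credential-bearing are dropped before the payload
// is sent to the AI provider.
var sensitiveLabelKey = regexp.MustCompile(`(?i)(password|secret|token|api[_-]?key|credential)`)

func filterLabels(labels map[string]string) map[string]string {
	out := make(map[string]string, len(labels))
	for k, v := range labels {
		if sensitiveLabelKey.MatchString(k) {
			continue
		}
		out[k] = v
	}
	return out
}
```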
- Fix routing for POST/PUT/DELETE on /api/discovery/host/ endpoints
  (Go's http.ServeMux was matching the longer prefix before method-specific
  routes; see the sketch after this list)
- Add HOST-specific AI prompt that focuses on identifying the host OS
rather than services/containers running on it
- Add success message UI after discovery completes
- Fix timing so success appears after data is visible (not during refetch)
- Add error handling and display for failed discoveries
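One way to sidestep that shadowing, sketched here and not necessarily the
exact fix, is to dispatch on method inside the single prefix handler:

```go
import "net/http"

// Handlers elided; one registration avoids prefix/method precedence surprises.
mux.HandleFunc("/api/discovery/host/", func(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		getDiscoveryHost(w, r)
	case http.MethodPost:
		postDiscoveryHost(w, r)
	case http.MethodPut:
		putDiscoveryHost(w, r)
	case http.MethodDelete:
		deleteDiscoveryHost(w, r)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
})
```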
- Refactor Router to allow HTTP client injection for install script proxying
  (sketched below)
- Add tests for the unified agent install mechanism and additional metrics
  store coverage
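A sketch of the injection seam, assuming a functional-options shape; the
real Router constructor may differ:

```go
import "net/http"

type Option func(*Router)

type Router struct {
	httpClient *http.Client // used when proxying the install script
}

// WithHTTPClient lets tests substitute a stub client.
func WithHTTPClient(c *http.Client) Option {
	return func(r *Router) { r.httpClient = c }
}

func NewRouter(opts ...Option) *Router {
	r := &Router{httpClient: http.DefaultClient}
	for _, o := range opts {
		o(r)
	}
	return r
}
```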
- Sync UserNote, AcknowledgedAt, SnoozedUntil, DismissedReason, Suppressed,
and TimesRaised from ai.Finding to unified store in both callback and
startup sync paths. Mirror note writes to unified store immediately.
- Dim acknowledged findings (opacity-60), add "Acknowledged" badge, hide
acknowledge button once acknowledged, sort below unacknowledged in
severity mode.
- Pass finding_id through frontend chat API → backend ChatRequest →
ExecuteRequest. Look up full finding from unified store (mutex-guarded)
and prepend structured context to the prompt.
- Update patrol.go to use chat service for AI execution
- Update service.go with chat service provider integration
- Add patrol streaming endpoint to router
- Updated LicenseHandlers and LicenseService to be context/tenant aware
- Refactored API router and middleware to support tenant-scoped license checks
- Updated associated tests for context-aware handlers
Implements Phase 1-2 of multi-tenancy support using a directory-per-tenant
strategy that preserves existing file-based persistence.
Key changes:
- Add MultiTenantPersistence manager for org-scoped config routing
- Add TenantMiddleware for X-Pulse-Org-ID header extraction and context propagation
- Add MultiTenantMonitor for per-tenant monitor lifecycle management
- Refactor handlers (ConfigHandlers, AlertHandlers, AIHandlers, etc.) to be
context-aware with getConfig(ctx)/getMonitor(ctx) helpers
- Add Organization model for future tenant metadata
- Update server and router to wire multi-tenant components
All handlers maintain backward compatibility via legacy field fallbacks
for single-tenant deployments using the "default" org.
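A minimal sketch of the tenant middleware, assuming the X-Pulse-Org-ID
header and context-key names described above; "default" preserves
single-tenant behavior:

```go
import (
	"context"
	"net/http"
)

type ctxKey string

const orgIDKey ctxKey = "orgID"

func TenantMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		org := r.Header.Get("X-Pulse-Org-ID")
		if org == "" {
			org = "default" // legacy single-tenant deployments
		}
		next.ServeHTTP(w, r.WithContext(context.WithValue(r.Context(), orgIDKey, org)))
	})
}

// OrgID is what getConfig(ctx)/getMonitor(ctx) helpers would consult.
func OrgID(ctx context.Context) string {
	if v, ok := ctx.Value(orgIDKey).(string); ok {
		return v
	}
	return "default"
}
```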
Adapts API handlers to use the new native chat service:
ai_handler.go:
- Replace opencode.Service with chat.Service
- Add AIService interface for testability
- Add factory function for service creation (mockable)
- Update provider wiring to use tools package types
ai_handlers.go:
- Add Notable field to model list response
- Simplify command approval - execution handled by agentic loop
- Remove inline command execution from approval endpoint
router.go:
- Update imports: mcp -> tools, opencode -> chat
- Add monitor wrapper types for cleaner dependency injection
- Update patrol wiring for new chat service
agent_profiles:
- Rename agent_profiles_mcp.go -> agent_profiles_tools.go
- Update imports for tools package
monitor_wrappers.go:
- New file with wrapper types for alert/notification monitors
- Enables interface-based dependency injection
Add three new MCP tools for Docker container update management:
- pulse_list_docker_updates: list containers with pending updates
- pulse_check_docker_updates: trigger update check on a host
- pulse_update_docker_container: apply update with approval workflow
Changes:
- Add UpdatesProvider interface to executor.go
- Add response types to data_types.go
- Add UpdatesMCPAdapter to adapters.go
- Register tools and handlers in tools_infrastructure.go
- Add SetUpdatesProvider() to service.go
- Wire provider in router.go wireOpenCodeProviders()
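A hypothetical shape of the provider seam; method names are guesses based
on the tool names above:

```go
import "context"

// ContainerUpdate is a stand-in for the response type added to data_types.go.
type ContainerUpdate struct {
	HostID       string
	ContainerID  string
	Image        string
	CurrentTag   string
	AvailableTag string
}

// UpdatesProvider is the interface the three tools call through.
type UpdatesProvider interface {
	ListDockerUpdates(ctx context.Context) ([]ContainerUpdate, error)
	CheckDockerUpdates(ctx context.Context, hostID string) error
	UpdateDockerContainer(ctx context.Context, hostID, containerID string) (approvalID string, err error)
}
```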
The agent was crashing with 'fatal error: concurrent map writes' when
handleCheckUpdatesCommand spawned a goroutine that called collectOnce
concurrently with the main collection loop. Both code paths access
a.prevContainerCPU without synchronization.
Added an a.cpuMu mutex (sketched below) to protect all accesses to
prevContainerCPU in:
- pruneStaleCPUSamples()
- collectContainer() delete operation
- calculateContainerCPUPercent()
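A sketch of the guarded access pattern; only the mutex is the fix, the
sample math here is illustrative:

```go
import "sync"

type cpuSample struct {
	container uint64 // cumulative container CPU time
	system    uint64 // cumulative system CPU time
}

type agent struct {
	cpuMu            sync.Mutex
	prevContainerCPU map[string]cpuSample
}

// Every read or write of prevContainerCPU now happens under cpuMu, so the
// command-spawned goroutine and the main collection loop cannot race.
func (a *agent) calculateContainerCPUPercent(id string, cur cpuSample) float64 {
	a.cpuMu.Lock()
	defer a.cpuMu.Unlock()
	prev, ok := a.prevContainerCPU[id]
	a.prevContainerCPU[id] = cur
	if !ok || cur.system <= prev.system {
		return 0 // first sample (or counter reset): no delta to compute yet
	}
	return 100 * float64(cur.container-prev.container) / float64(cur.system-prev.system)
}
```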
Related to #1063
Add ability for users to describe what kind of agent profile they need
in natural language, and have AI generate a suggestion with name,
description, config values, and rationale.
- Add ProfileSuggestionHandler with schema-aware prompting
- Add SuggestProfileModal component with example prompts
- Update AgentProfilesPanel with suggest button and description field
- Streamline ValidConfigKeys to only agent-supported settings
- Update profile validation tests for simplified schema