Backend:
- Enhanced buildEnrichedResourceContext to ALWAYS show learned baselines with
status indicators (normal/elevated/anomaly) instead of only when anomalous
- This makes Pulse Pro's 'moat' visible - users can see the AI understands
their infrastructure's normal behavior patterns
- Added baseline import to service.go
Frontend (user changes):
- Added incident event type filtering with toggle buttons
- Added resource incident panel to view all incidents for a resource
- Added timeline expand/collapse functionality in alert history
- Added incident note saving with proper incidentId tracking
- Added startedAt parameter for proper incident timeline loading
Addresses issue #861 - syslog flooded on docker host
Many routine operational messages were being logged at INFO level,
causing excessive log volume when monitoring multiple VMs/containers.
These messages are now logged at DEBUG level:
- Guest threshold checking (every guest, every poll cycle)
- Storage threshold checking (every storage, every poll cycle)
- Host agent linking messages
- Filesystem inclusion in disk calculation
- Guest agent disk usage replacement
- Polling start/completion messages
- Alert cleanup and save messages
Users can set LOG_LEVEL=debug to see these messages if needed for
troubleshooting. The default INFO level now produces significantly
less log output.
Also updated documentation in CONFIGURATION.md and DOCKER.md to:
- Clarify what each log level includes
- Add tip about using LOG_LEVEL=warn for minimal logging
The issue was a SolidJS reactivity problem in the Dashboard component.
When guestMetadata signal was accessed inside a For loop callback and
assigned to a plain variable, SolidJS lost reactive tracking.
Changed from:
const metadata = guestMetadata()[guestId] || ...
customUrl={metadata?.customUrl}
To:
const getMetadata = () => guestMetadata()[guestId] || ...
customUrl={getMetadata()?.customUrl}
This ensures SolidJS properly tracks the signal dependency when the
getter function is called directly in JSX props.
- Install script now auto-detects Docker, Kubernetes, and Proxmox
- Platform monitoring is enabled automatically when detected
- Users can override with --disable-* or --enable-* flags
- Allow same token to register multiple hosts (one per hostname)
- Update tests to reflect new multi-host token behavior
- Improve CompleteStep and UnifiedAgents UI components
- Update UNIFIED_AGENT.md documentation
- Add cluster-aware guest ID generation (clusterName-VMID instead of instanceName-VMID)
to prevent duplicate VMs/containers when multiple cluster nodes are monitored
- Add cluster deduplication at registration time - when a node is added that belongs
to an already-configured cluster, merge as endpoint instead of creating duplicate
- Add startup consolidation to automatically merge duplicate cluster instances
- Change host agent token binding from agent GUID to hostname, allowing:
- Multiple host agents to share a token (each bound by hostname)
- Agent reinstalls on same host without token conflicts
- Remove 12-character password minimum requirement
- Remove emoji from auto-registration success message
- Fix grouped view node lookup to support both cluster-aware node IDs
(clusterName-nodeName) and legacy guest grouping keys (instance-nodeName)
Fixes duplicate guests appearing when agents are installed on multiple
cluster nodes. Also improves multi-agent UX by allowing shared tokens.
When a host agent registers, it now searches for a PVE node with a
matching hostname and links them together. Similarly, when PVE nodes
are discovered, they check for existing host agents with matching hostnames.
This prevents the confusion of seeing duplicate entries when users install
agents on PVE cluster nodes that were already discovered via the cluster API.
- Added LinkedHostAgentID field to Node struct
- Added LinkedNodeID/LinkedVMID/LinkedContainerID fields to Host struct
- Added findLinkedProxmoxEntity() to match by hostname (with domain stripping)
- Updated UpdateNodesForInstance() to preserve and auto-set links
When no auth is configured (fresh install), CheckAuth allows all requests.
This creates a race condition where existing agents from a previous setup
can report data before the wizard completes security configuration.
This fix clears all host agents and docker hosts when /api/security/quick-setup
is called, ensuring the wizard shows a clean state after security is configured.
Added:
- State.ClearAllHosts() - removes all host agents
- State.ClearAllDockerHosts() - removes all docker hosts
- Monitor.ClearUnauthenticatedAgents() - clears both and resets token bindings
- Call to ClearUnauthenticatedAgents() in handleQuickSecuritySetupFixed()
- Add MetricsRetentionRawHours, MetricsRetentionMinuteHours, MetricsRetentionHourlyDays, MetricsRetentionDailyDays to SystemSettings
- Wire settings from system.json through Config to metrics store initialization
- Set sensible defaults: Raw=2h, Minute=24h, Hourly=7d, Daily=90d
- Log active retention values on startup for transparency
Users can now customize how long metrics are stored at each aggregation tier.
- Add sortable table headers for Pod and Deployment views
- Click column headers to toggle sort direction
- Sort state persists across sessions
- Add namespace dropdown filter for Pods/Deployments views
- Auto-populates from available namespaces
- Include namespace filter in reset and active filters check
Backend:
- Seed OCI classification from previous state so containers never
'downgrade' to LXC if config fetching intermittently fails
- Prevent type regression in recordGuestSnapshot when OCI was previously detected
- Move metrics zeroing before snapshot recording for cleaner flow
Frontend:
- Add isOCIContainer() memo that checks both type and isOci flag
- Use isOCI helper in Dashboard.tsx for AI context building
- Include oci-container type in useResources container conversion
- Preserve isOci and osTemplate fields through legacy conversion
This ensures OCI containers retain their classification even when
Proxmox API permissions or transient errors prevent config reads.
- Refactored enrichContainerMetadata to not return early when container is stopped
- Status API calls are still skipped for stopped containers (as expected)
- Config fetch now runs regardless of status, enabling OCI detection
- Added test for OCI detection on stopped containers
Discovered: Proxmox 9.1 requires VM.Config.Options permission to read
OCI container configs (not just VM.Audit). Document this in setup guides.
- Added isOCIContainerByConfig() to detect OCI containers by:
- Presence of 'entrypoint' field (only OCI containers have this)
- Combination of ostype=unmanaged, cmode=console, and lxc.signal.halt
- This is needed because Proxmox doesn't persist ostemplate after creation
- Now supports detection of already-created OCI containers (like the test alpine container)
- Backend: Add IsOCI and OSTemplate fields to Container model
- Backend: Add extractContainerOSTemplate() and isOCITemplate() detection functions
- Backend: Detect OCI containers via ostemplate config and set type to 'oci'
- Frontend: Add isOci and osTemplate to Container interface
- Frontend: Add 'oci-container' to ResourceType with distinct purple badge
- Frontend: Update Dashboard filters to include OCI containers with LXC
- Tests: Add comprehensive unit tests for OCI detection logic
OCI containers are detected by checking the ostemplate for patterns like:
- oci: prefix (e.g., oci:docker.io/library/alpine:latest)
- docker: prefix (e.g., docker:nginx:latest)
- Known registry URLs (docker.io, ghcr.io, gcr.io, quay.io, etc.)
- Local templates with oci- or oci_ filename patterns
- Add DOMPurify sanitization for AI chat markdown rendering (XSS fix)
- Configure DOMPurify to add target=_blank and rel=noopener to links
- Update system prompt to align with command approval policy
- Clarify safe vs destructive commands in prompt
- Improve patrol auto-fix mode guidance with safe operation list
- Add verification requirements for auto-fix actions
- Update observe-only mode to be clearer about read-only restrictions
Phase 1 of Pulse AI differentiation:
- Create internal/ai/context package with types, trends, builder, formatter
- Implement linear regression for trend computation (growing/declining/stable/volatile)
- Add storage capacity predictions (predicts days until 90% and 100%)
- Wire MetricsHistory from monitor to patrol service
- Update patrol to use buildEnrichedContext instead of basic summary
- Update patrol prompt to reference trend indicators and predictions
This gives the AI awareness of historical patterns, enabling it to:
- Identify resources with concerning growth rates
- Predict capacity exhaustion before it happens
- Distinguish between stable high usage vs growing problems
- Provide more actionable, time-aware insights
All tests passing. Falls back to basic summary if metrics history unavailable.
- Add 'content' type to StreamDisplayEvent for tracking text chunks
- Track content events in streamEvents array for chronological display
- Update render to use Switch/Match for cleaner conditional rendering
- Interleave thinking, tool calls, and content as they stream in
- Add fallback for old messages without streamEvents for backwards compat
Previously, tool/command outputs stayed at top while AI text responses
accumulated at the bottom. Now all events appear in order like a
normal chatbot.
- Add alert-triggered AI analysis for real-time incident response
- Implement patrol history persistence across restarts
- Add patrol schedule configuration UI in AI Settings
- Enhance AIChat with patrol status and manual trigger controls
- Add resource store improvements for AI context building
- Expand Alerts page with AI-powered analysis integration
- Add Vite proxy config for AI API endpoints
- Support both Anthropic and OpenAI providers with streaming
- Add Claude OAuth authentication support with hybrid API key/OAuth flow
- Implement Docker container historical metrics in backend and charts API
- Add CEPH cluster data collection and new Ceph page
- Enhance RAID status display with detailed tooltips and visual indicators
- Fix host deduplication logic with Docker bridge IP filtering
- Fix NVMe temperature collection in host agent
- Add comprehensive test coverage for new features
- Improve frontend sparklines and metrics history handling
- Fix navigation issues and frontend reload loops
Backend:
- Call SetMonitor after router creation to inject resource store
- Add debug logging for resource population and broadcast
Frontend:
- Add resources array to WebSocket store initial state
- Handle resources in WebSocket message processing
- Use reconcile for efficient state updates
The unified resources are now properly:
1. Populated from StateSnapshot on each broadcast cycle
2. Converted to frontend format (ResourceFrontend)
3. Included in WebSocket state messages
4. Received and stored in frontend state
5. Consumed by migrated route components
Console now shows '[DashboardView] Using unified resources: VMs: X'
confirming the migration is working end-to-end.
- Added PopulateFromSnapshot method to resources.Store
- Extended ResourceStoreInterface to include PopulateFromSnapshot
- Monitor now calls updateResourceStore before broadcasts
- This ensures resources are fresh on every WebSocket broadcast
Without this, the store would only be populated when /api/resources or
/api/state endpoints are hit, leaving WebSocket broadcasts empty.
- Extended StateFrontend with Resources field containing unified resource data
- Added ResourceFrontend and related types for frontend-compatible resource data
- Extended ResourceStoreInterface to include GetAll() method
- Monitor now injects resources into WebSocket broadcasts
- Added helper method getResourcesForBroadcast() to convert resources to frontend format
- All existing tests pass
This enables the frontend to access unified resources via WebSocket state.
- Removed /resources page and associated frontend components
- Removed ResourcesOverview.tsx, UnifiedResourceRow.tsx, columns.ts
- Removed frontend types/resource.ts
- Updated unified-resource-architecture.md to mark Phase 4 as ABANDONED
- Removed unified-view-migration-plan.md
- Backend unified resource model remains for AI context
This is a checkpoint before attempting full frontend migration to unified model.
This commit implements the Unified Resource Architecture for AI-first
infrastructure management. Key features:
Phase 1 - Backend Unification:
- New unified Resource type with 9 resource types, 7 platforms, 7 statuses
- Resource store with identity-based deduplication (hostname, machineID, IP)
- 8 converter functions (FromNode, FromVM, FromContainer, etc.)
- REST API endpoints: /api/resources, /api/resources/stats, /api/resources/{id}
- 28 comprehensive unit tests
Phase 2 - AI Context Enhancement:
- Unified context builder for AI system prompts
- Cross-platform query methods: GetTopByCPU, GetTopByMemory, GetTopByDisk
- Resource correlation: GetRelated (parent, children, siblings, cluster)
- Infrastructure summary: GetResourceSummary with health status counts
- AI context now includes top consumers and infrastructure overview
Phase 3 - Agent Preference & Hybrid Mode:
- Polling optimization methods in resource store
- ResourceStoreInterface added to Monitor
- SetResourceStore() and shouldSkipNodeMetrics() helper methods
- Store automatically wired into Monitor via Router.SetMonitor()
- Foundation ready for reduced API polling when agents are active
Files added:
- internal/resources/resource.go - Core Resource type
- internal/resources/store.go - Store with deduplication
- internal/resources/converters.go - Type converters
- internal/resources/platform_data.go - Platform-specific data
- internal/resources/store_test.go - 28 tests
- internal/resources/converters_test.go - Converter tests
- internal/api/resource_handlers.go - REST API handlers
- internal/ai/resource_context.go - AI context builder
- .gemini/docs/unified-resource-architecture.md - Architecture docs
All tests pass.
- Implement 'Show Problems Only' toggle combining degraded status, high CPU/memory alerts, and needs backup filters
- Add 'Investigate with AI' button to filter bar for problematic guests
- Fix dashboard column sizing inconsistencies between bars and sparklines view modes
- Fix PBS backups display and polling
- Refine AI prompt for general-purpose usage
- Fix frontend flickering and reload loops during initial load
- Integrate persistent SQLite metrics store with Monitor
- Fortify AI command routing with improved validation and logging
- Fix CSRF token handling for note deletion
- Debug and fix AI command execution issues
- Various AI reliability improvements and command safety enhancements
- Extract ostype from LXC container config (debian, ubuntu, alpine, etc.)
- Map ostype values to human-readable names (e.g., "debian" -> "Debian")
- Add OSName field to Container model and ContainerFrontend
- Add icons for NixOS, openSUSE, and Gentoo in frontend
- LXC containers now show OS icons alongside VMs in the dashboard
Supported LXC OS types: alpine, archlinux, centos, debian, devuan,
fedora, gentoo, nixos, opensuse, ubuntu, unmanaged
Refactor the test to avoid unreachable code after panic by
checking a flag set before the panic instead of after. This
resolves the go vet warning while maintaining test coverage.
Cache err.Error() result in two locations:
- monitor.go: storage query retry logic (2x calls to 1)
- monitor_polling.go: storage timeout handling (2x calls to 1)
strconv.Itoa is faster than fmt.Sprintf("%d", ...) because it doesn't
need to parse a format string. Changed 4 occurrences in monitoring
package where integers are converted to strings.
Storage that is disabled in Proxmox (Datacenter > Storage > enabled=no)
was incorrectly showing as "available" in Pulse. The issue was that
Enabled/Active fields defaulted to true and were never set to false
from the per-node API response.
Now the model correctly initializes Enabled/Active from the Proxmox
per-node storage API response, and the status determination prioritizes
checking the disabled state first.
Related to #796
- errors.isRetryable: Convert final case to default (all ErrorType values covered)
- scheduler.SelectInterval: Remove bounds checks after mathematical computation
that guarantees target ∈ [min, max]
Both functions now at 100% coverage.
- firstForwardedValue: strings.Split always returns at least one element
- shouldRunBackupPoll: remaining is always >= 1 by math
- convertContainerDiskInfo: lowerLabel is never empty for non-rootfs
All three functions now at 100% coverage.
Add 6 tests for instance descriptor generation:
- No clients returns nil
- PVE only with sorted order
- PBS only
- PMG only
- All types combined
- Nil scheduler/stalenessTracker safety
Coverage: 60.7% → 62.5%