Pulse/ARCHITECTURE.md
2026-03-19 14:47:00 +00:00

186 lines
12 KiB
Markdown

# Pulse Architecture
Pulse is a real-time infrastructure monitoring platform for **Proxmox VE**, **Proxmox Backup Server**, **Proxmox Mail Gateway**, **Docker**, **Host** systems, **Kubernetes**, and **TrueNAS**. It is built with a **Go 1.25+** backend and a **SolidJS / TypeScript** frontend, focusing on low latency, high concurrency, and a premium user experience.
## 🏗 High-Level Overview
The system runs as a single binary that serves both the API and the embedded frontend assets. It connects to infrastructure via platform-specific REST APIs and lightweight push-based agents, normalises everything into a **Unified Resource model**, and delivers real-time updates to clients over WebSocket.
```mermaid
flowchart TD
User[Browser / Mobile] <-->|WebSocket + HTTP| Pulse[Pulse Server]
Mobile[Mobile App] <-->|E2E Encrypted| Relay[Relay Server]
Relay <-->|WebSocket| Pulse
subgraph "Pulse Server (Go)"
API[Decomposed API Router]
WS[WebSocket Hub]
Monitor[Reloadable Monitor]
UR[Unified Resource Registry]
AI[AI Subsystem]
Config[Config Manager]
License[Entitlements Engine]
Audit[Audit Logger]
end
Pulse -->|HTTPS :8006| PVE[Proxmox VE]
Pulse -->|HTTPS :8007| PBS[Proxmox Backup Server]
Pulse -->|HTTPS| PMG[Proxmox Mail Gateway]
Pulse -->|HTTPS| TrueNAS[TrueNAS SCALE/CORE]
DockerAgent[Docker Agent] -->|HTTPS POST| API
HostAgent[Host Agent] -->|HTTPS POST| API
K8sAgent[Kubernetes Agent] -->|HTTPS POST| API
Monitor --> UR
UR --> WS
UR --> API
```
## 🔌 Backend Architecture (Go)
All backend code lives under `cmd/`, `internal/`, and `pkg/`. The binary is assembled from `cmd/pulse/main.go` which delegates to `pkg/server.Run()`.
### Core Components
1. **Entry Point (`cmd/pulse/main.go` → `pkg/server/server.go`)**
- Loads unified configuration via `internal/config.Load()`.
- Initialises the RBAC manager (`pkg/auth`), audit logger (`pkg/audit`), and crypto layer (`internal/crypto`).
- Creates a `ReloadableMonitor` that manages the lifecycle of all monitoring goroutines.
- Starts the HTTP/HTTPS server, WebSocket hub, AI services (Patrol + Chat), and the Relay client.
- Supports graceful hot-reload via `SIGHUP` and `.env` file watching.
2. **Unified Resource Registry (`internal/unifiedresources`)**
- Central data model that normalises resources from **7 data sources** (`proxmox`, `pbs`, `pmg`, `docker`, `agent`, `kubernetes`, `truenas`) into a single `Resource` struct.
- **Canonical v6 resource types**: `agent`, `vm`, `system-container`, `app-container`, `docker-host`, `k8s-cluster`, `k8s-node`, `pod`, `k8s-deployment`, `storage`, `pbs`, `pmg`, `ceph`, `physical_disk`.
- Identity-matching engine: merges resources across sources using machine IDs, DMI UUIDs, hostnames, IPs, and MAC addresses.
- Provides typed **views** (`NodeView`, `K8sClusterView`, etc.) for consumer-specific queries.
- Canonical API endpoint: `GET /api/resources`.
3. **Monitoring Engine (`internal/monitoring`)**
- **Polymorphic monitors**: Each Proxmox VE/PBS/PMG node runs in its own goroutine, polling via the platform REST API.
- **Agent receivers** (`internal/api`): Docker, Host, and Kubernetes agents push metrics via HTTP POST to `/api/agents/{type}/report`.
- **TrueNAS provider** (`internal/truenas`): Polls TrueNAS REST API for system info, pools, datasets, disks, alerts, ZFS snapshots, and replication tasks.
- Multi-tenant aware: when `PULSE_MULTI_TENANT_ENABLED=true`, each organisation gets an isolated monitor instance with its own configuration.
4. **WebSocket Hub (`internal/websocket`)**
- Manages active browser connections with per-message compression (deflate).
- Broadcasts state diffs to all subscribed clients; supports per-tenant broadcasts for multi-org setups.
- Enforces origin validation, org-level authorization, and multi-tenant license gating.
5. **Decomposed API Router (`internal/api`)**
- The router is split into focused registration files for maintainability:
- `router_routes_auth_security.go` — Auth, OIDC, SAML, SSO, security tokens, recovery, agent install scripts.
- `router_routes_monitoring.go` — Unified resources, metrics history, charts, recovery points, alerts, notifications, discovery.
- `router_routes_ai_relay.go` — AI settings, Patrol, Intelligence, Chat sessions, Relay config, and the approval workflow.
- `router_routes_org_license.go` — Organisations, RBAC, audit logs, license/entitlements, billing.
- `router_routes_registration.go` — Config CRUD, TrueNAS connections, update management, agent profiles, system settings.
- `router_routes_hosted.go` — Hosted/SaaS-specific signup and org admin routes.
- Every endpoint enforces **scoped access control** (e.g., `monitoring:read`, `settings:write`, `ai:execute`, `ai:chat`).
6. **Relay Client (`internal/relay`)**
- Maintains a persistent WebSocket tunnel to a managed relay server for **mobile remote access**.
- Uses an ECDH key-exchange for per-channel **end-to-end encryption**.
- Multiplexes multiple mobile sessions over a single connection with back-pressure (data limiter) and per-channel authentication.
- Supports push notifications through the relay.
- Gated by the `relay` license feature.
7. **AI Subsystem (`internal/ai`)**
- **Pulse Assistant (Chat)**: Interactive LLM-powered chat with infrastructure context. Supports bring-your-own-key (BYOK) providers (OpenAI, Anthropic, Ollama, etc.) plus Claude OAuth.
- **Pulse Patrol**: Scheduled background analysis that produces findings, predictions, and remediation plans. Autonomy levels: `monitor`, `approval`, `assisted`, `full`.
- **Intelligence Services**: Patterns, correlations, anomalies, baselines, forecasts, and incident recording. All surfaced via `/api/ai/intelligence/*`.
- **Safety gates**: Command execution disabled by default (`--enable-commands` opt-in); circuit breakers and scoped permissions at every layer.
8. **Entitlements & Licensing (`internal/license`)**
- Capability-key based gating: `ai_autofix`, `rbac`, `multi_tenant`, `relay`, `agent_profiles`, `kubernetes_ai`, `ai_alerts`, etc.
- Three tiers: **Community** (free), **Pro** (license key), **Cloud** (subscription).
- Trial lifecycle with activation, renewal, and expiry. All state exposed via `/api/license/*`.
9. **Recovery Engine (`internal/recovery`)**
- Aggregates backup data (PBS snapshots, ZFS snapshots, replication tasks) into a unified recovery point timeline.
- Provides faceted queries: filter by type, source, status. Rollup summaries for dashboards.
- API: `/api/recovery/points`, `/api/recovery/series`, `/api/recovery/facets`, `/api/recovery/rollups`.
10. **Audit Logging (`pkg/audit`)**
- Defence-in-depth: every mutation is logged to SQLite, optionally signed with per-tenant encryption keys.
- Async logging mode (`PULSE_AUDIT_ASYNC`) for high-throughput environments.
- Tenant-aware logger manager for isolated per-org audit trails.
### Data Flow
1. **Collection**:
- **Proxmox VE / PBS / PMG**: Monitoring engine polls platform REST APIs (configurable interval, default 2 s for PVE).
- **Docker / Host / Kubernetes**: Lightweight agents push metrics via HTTP POST on their configured interval.
- **TrueNAS**: Provider polls the TrueNAS REST API for system, pool, dataset, disk, alert, and replication data.
2. **Normalisation**: Platform-specific responses are mapped into `unifiedresources.Resource` structs by adapters in `internal/unifiedresources/adapters.go`.
3. **Registration**: Resources are inserted into the in-memory registry, which handles deduplication, identity matching, and status computation.
4. **Broadcast**: The latest state snapshot is serialised to JSON and pushed to all connected WebSocket clients by the Hub.
5. **Persistence**: Metrics history is stored in a SQLite-backed metrics store (`pkg/metrics`) with configurable retention.
## 🎨 Frontend Architecture (SolidJS)
The frontend is a modern SPA in `frontend-modern/`, built with **SolidJS** and **TypeScript**. It uses fine-grained reactivity (no Virtual DOM) for maximum performance.
### Key Technologies
- **SolidJS**: Reactive UI framework.
- **TailwindCSS** with a **semantic design token layer**: All structural colours use CSS custom-property tokens (`bg-base`, `bg-surface`, `text-base-content`, `border-border`, etc.) defined in `index.css`— components never hardcode hex values. Light/dark mode switches automatically via class-based theme resolution.
- **Vite**: Build tooling (dev server + production bundling).
- **Lucide Icons**: Icon library (imported as solid components).
### Routing & Navigation
Navigation is organised by **task**, not by platform:
| Route | Page | Purpose |
|---|---|---|
| `/infrastructure` | Infrastructure | Hosts, nodes, clusters across all platforms |
| `/workloads` | Workloads | VMs, LXCs, containers, K8s pods |
| `/storage` | Storage | Proxmox storage, ZFS pools, Ceph |
| `/recovery` | Recovery | Backups, snapshots, replication |
| `/ceph` | Ceph | Detailed Ceph cluster view |
| `/dashboard` | Dashboard | Summary panels and metrics |
| `/alerts/*` | Alerts | Alert rules, active alerts, history |
| `/ai/*` | AI Intelligence | Patrol findings, investigations, forecasts |
| `/settings/*` | Settings | Configuration, security, AI, relay |
| `/operations/*` | Operations | Operational tools |
Legacy route aliases have been removed; canonical v6 routes are the only supported navigation surface.
### State Management
- **WebSocket store** (`stores/websocket.ts`): Manages the live connection, reactive `State` object, reconnection logic, and per-org switching.
- **Metrics collector** (`stores/metricsCollector.ts`): Buffers time-series data for sparklines and historical charts.
- **AI stores** (`stores/aiChat.ts`, `stores/aiIntelligence.ts`): Manage chat sessions and patrol findings.
- **License store** (`stores/license.ts`): Tracks plan tier, feature flags, and multi-tenant state.
- **System settings store** (`stores/systemSettings.ts`): Caches server-side preferences (for example theme and feature toggles).
### Component Design
- **Shared primitives** in `components/shared/`: `Card`, `Button`, `Toggle`, `FilterButtonGroup`, `Table` — all mapped to the semantic design tokens.
- **Lazy-loaded pages**: All top-level pages are loaded via `lazy()` with optional preloading after initial render.
- **Virtual table windowing**: Large resource lists use virtualised rendering for smooth scrolling at scale.
- **Command Palette** (`Cmd/Ctrl+K`): Quick-access command launcher.
- **Keyboard shortcuts**: `g i` → Infrastructure, `g w` → Workloads, `g s` → Storage, `g b` → Recovery, `g a` → Alerts, `g t` → Settings, `/` → Search.
### Mobile Experience
- **MobileNavBar** component: Bottom tab bar for touch navigation.
- **Relay integration**: The mobile app connects through the relay protocol for encrypted remote access.
## 🔒 Security
- **Encryption at Rest**: Sensitive config (passwords, API keys) encrypted with AES-GCM via `internal/crypto`. Each tenant can have its own encryption key.
- **Scoped API Tokens**: Tokens carry explicit scopes (`monitoring:read`, `settings:write`, `ai:chat`, `ai:execute`, `docker:report`, etc.). Endpoints enforce scope checks before processing.
- **RBAC**: Role-based access control via `pkg/auth` with file-backed persistence. Resources: `users`, `settings`, `monitoring`.
- **SSO**: OIDC and SAML providers with per-provider configuration and multi-provider support.
- **Agent Commands**: Disabled by default for security. Operators must opt in with `--enable-commands`.
- **Audit Trail**: Every mutation logged to per-tenant SQLite databases with optional cryptographic signatures.
- **Rate Limiting**: Per-endpoint and per-tenant rate limiting with configurable thresholds.
- **CSRF Protection**: Token-based CSRF prevention on mutation endpoints.
- **Recovery Mode**: Localhost-only endpoint for emergency auth recovery with time-limited tokens.
## 🚀 Deployment
Pulse is distributed as:
1. **Docker Container**: Multi-stage build producing a minimal image with the Go binary + embedded frontend.
2. **Single Binary**: The frontend is compiled into the Go binary using `go:embed` (`internal/api/frontend_embed.go`), enabling single-file deployment.
3. **Kubernetes Helm Chart**: For cluster deployments with configurable replicas and persistence.
4. **Systemd Service**: For bare-metal or VM installations with `.env`-based configuration and `SIGHUP` reload support.
5. **Proxmox LXC Helper Script**: One-line install inside a Proxmox helper-scripts container.