- Use X-Forwarded-Proto/X-Forwarded-Scheme for scheme detection
- Use X-Forwarded-Host for host matching behind reverse proxies
- Update tests with remoteAddr for CSWSH protection validation
- AI Intelligence endpoints (/api/ai/intelligence/*, /api/ai/forecast/*,
/api/ai/unified/findings, etc.) now require ai:execute scope to prevent
low-privilege tokens from reading sensitive intelligence data
- AI Knowledge endpoints (/api/ai/knowledge/*) now require ai:chat scope
to prevent arbitrary guest data access across the fleet
- AI Debug Context (/api/ai/debug/context) now requires settings:read scope
to prevent leakage of the system prompt and infrastructure details
- WebSocket origin check now validates that the peer IP is private when allowing
private network origins, mitigating CSWSH attacks where a malicious page
on the same LAN tries to hijack connections using the victim's session cookie
- Initialize Alert and Notification managers with tenant-specific data directories
- Add panic recovery to WebSocket safeSend for stability
- Record host metrics to history for sparkline support
Implements multi-tenant infrastructure for organization-based data isolation.
The feature is gated behind the PULSE_MULTI_TENANT_ENABLED env var and requires
an Enterprise license; existing users are unaffected.
Core components:
- TenantMiddleware: extracts the org ID, validates access, and returns 501/402 when gated
- AuthorizationChecker: token/user access validation for organizations
- MultiTenantChecker: WebSocket upgrade gating with license check
- Per-tenant audit logging via LogAuditEventForTenant
- Organization model with membership support
Gating behavior:
- Feature flag disabled: 501 Not Implemented for non-default orgs
- Flag enabled, no license: 402 Payment Required
- Default org always works regardless of flag/license
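The gating rules above reduce to a small decision function. The sketch below assumes the default org is identified by the string "default" and that the org ID arrives in a hypothetical X-Org-ID header; the real TenantMiddleware's extraction and its access validation (AuthorizationChecker) are not modeled.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// defaultOrgID is an assumed identifier for the default organization.
const defaultOrgID = "default"

// tenantGate captures the two inputs the gating rules depend on (hypothetical type).
type tenantGate struct {
	flagEnabled bool // e.g. derived from PULSE_MULTI_TENANT_ENABLED
	licensed    bool // whether an Enterprise license is active
}

// statusFor returns 0 if the request may proceed, otherwise the HTTP status to send.
func (g tenantGate) statusFor(orgID string) int {
	switch {
	case orgID == "" || orgID == defaultOrgID:
		return 0 // the default org always works
	case !g.flagEnabled:
		return http.StatusNotImplemented // 501
	case !g.licensed:
		return http.StatusPaymentRequired // 402
	default:
		return 0
	}
}

// middleware reads the org ID from a hypothetical X-Org-ID header.
func (g tenantGate) middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if code := g.statusFor(r.Header.Get("X-Org-ID")); code != 0 {
			http.Error(w, http.StatusText(code), code)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {})
	req := httptest.NewRequest("GET", "/api/state", nil)
	req.Header.Set("X-Org-ID", "acme")
	rec := httptest.NewRecorder()
	tenantGate{flagEnabled: true}.middleware(ok).ServeHTTP(rec, req)
	fmt.Println(rec.Code) // 402
}
```

Checking the flag before the license keeps 501 as the answer whenever the feature is off, which matches the ordering listed above.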
Documentation added: docs/MULTI_TENANT.md
Add an atomic `closed` flag to the Client struct and a `safeSend()` helper method
to prevent a race condition when sending to client channels. The race
occurred when a client disconnected while a goroutine was trying to send
initial state: the channel could be closed between the registration
check and the actual send.
All sends to client.send now go through safeSend() which checks the
closed flag first. The flag is set atomically before closing the channel
in all code paths (unregister, dispatchToClients, broadcast, shutdown).
Related to #1048
- checkOrigin: Remove redundant same-origin check at line 155 that was
already handled at line 116 (origin == requestOrigin)
Function coverage improved from 95.1% to 97.4%.
Test coverage for pure utility functions:
- isValidPrivateOrigin: validates private network origins (security)
- normalizeForwardedProto: normalizes ws/wss to http/https for proxies
- sanitizeValue: handles NaN/Inf values in JSON data
- cloneMetadata/cloneMetadataValue: deep copies metadata maps/slices
- cloneAlert/cloneAlertData: deep copies alert structures
Coverage increased from 20.9% to 37.3% (80 test cases).
Fixed goroutine leaks in the WebSocket hub caused by a missing shutdown mechanism.
Problem:
1. Hub.Run() has an infinite loop with no exit condition
2. runBroadcastSequencer() reads from its channel forever
3. There is no way to cleanly shut down the hub during restarts or tests
Solution:
- Added stopChan chan struct{} field to Hub
- Initialize stopChan in NewHub()
- Added Stop() method that closes stopChan
- Modified Run() main loop to select on stopChan
  - On shutdown: close all client connections and return
- Modified runBroadcastSequencer() from 'for range' to select
  - Changed from: for msg := range h.broadcastSeq
  - Changed to: for { select { case msg := <-h.broadcastSeq: ... case <-h.stopChan: ... }}
  - On shutdown: stop coalesce timer and return
Shutdown sequence:
1. Call hub.Stop() to close stopChan
2. Both Run() and runBroadcastSequencer() exit their loops
3. All client send channels are closed
4. Clients map is cleared
5. Pending coalesce timer is stopped
Impact:
- Enables graceful shutdown during service restarts
- Prevents goroutine leaks in tests
- Allows proper cleanup of WebSocket connections
- No more orphaned broadcast sequencer goroutines
This commit addresses 5 critical P0 bugs that cause security vulnerabilities, crashes, and data corruption:
**P0-1: Recovery Tokens Replay Attack Vulnerability** (recovery_tokens.go:153-159)
- **SECURITY CRITICAL**: Single-use recovery tokens could be replayed
- **Problem**: Lock upgrade race; two concurrent requests both pass the initial Used check
  1. Both acquire RLock, see token.Used = false
  2. Both release RLock
  3. Both acquire Lock (one after the other) and mark token.Used = true
  4. Both return true: TOKEN REUSED
- **Impact**: Attacker with intercepted token can use it multiple times
- **Fix**: Re-check token.Used after acquiring write lock (TOCTOU prevention)
**P0-2: WebSocket Hub Concurrent Map Panic** (hub.go:345-347, 376-378)
- **Problem**: Initial state goroutine reads h.clients map without lock
  - Line 345: `if _, ok := h.clients[client]` (NO LOCK)
  - Main loop writes to h.clients with lock (lines 326, 394)
- **Impact**: "fatal error: concurrent map read and write" crashes hub
- **Fix**: Acquire RLock before all client map reads in goroutine
**P0-3: WebSocket Send on Closed Channel Panic** (hub.go:348, 380)
- **Problem**: The goroutine checks that the client exists, then sends; the channel can be closed in between
- **Impact**: "send on closed channel" panic crashes hub
- **Fix**: Hold RLock during both check and send (defensive select already present)
**P0-4: CSRF Store Shutdown Data Corruption** (csrf_store.go:189-196)
- **Problem**: Stop() calls save() after signaling the worker. Both hold only an RLock, so nothing serializes their disk writes
  - Worker's final save writes to csrf_tokens.json.tmp
  - Stop()'s save writes to the same file concurrently
- **Impact**: Corrupted/truncated csrf_tokens.json on shutdown
- **Fix**: Added saveMu mutex to serialize all disk writes
**P0-5: CSRF Store Deadlock on Double-Stop** (csrf_store.go:103-108)
- **Problem**: stopChan unbuffered, no sync.Once guard, uses send not close
- **Impact**: Second Stop() call blocks forever waiting for receiver
- **Fix**:
  - Added sync.Once field stopOnce
  - Changed to close(stopChan) within stopOnce.Do()
  - Prevents double-close panic and deadlock
All fixes maintain backwards compatibility. The recovery token fix is particularly critical as it closes a security vulnerability allowing replay attacks on password reset flows.
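Replace string(rune(i)) with strconv.Itoa(i) in hub_concurrency_test.go
for generating client IDs. string(rune(i)) yields the Unicode character with
code point i (for example, 65 becomes "A"), not the decimal string "65".
While this is test code and not a production bug, it uses the same incorrect
pattern that caused the PR #575 bug.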
This ensures consistent best practices across the codebase and avoids
confusion for developers who might copy this pattern.
Related: #575