Pulse/internal/monitoring
rcourtman 20854256c3 Fix VM migration issue where custom alert thresholds are lost
Resolves #641

## Problem
When a VM migrated between Proxmox nodes, Pulse treated it as a new
resource and discarded its custom alert threshold overrides. Guest IDs
included the node name (e.g., `instance-node-VMID`), so the ID changed
whenever the VM moved to a different node.

Users reported that after migrating a VM, previously disabled alerts
(e.g., memory threshold set to 0) would resume firing.

## Root Cause
Guest IDs were constructed as:
- Standalone: `node-VMID`
- Cluster: `instance-node-VMID`

When a VM migrated from node1 to node2, the ID changed from
`instance-node1-100` to `instance-node2-100`, causing:
- Alert threshold overrides to be orphaned (keyed by old ID)
- Guest metadata (custom URLs, descriptions) to be orphaned
- Active alerts to reference the wrong resource ID
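
The ID change can be reproduced with a minimal sketch of the legacy scheme. The function name `legacyGuestID` and its parameters are hypothetical and only mirror the two formats listed above; the actual code in `monitor_polling.go` may be structured differently.

```go
package main

import "fmt"

// legacyGuestID is a hypothetical reconstruction of the old ID scheme:
//   standalone: node-VMID
//   cluster:    instance-node-VMID
// Because the node name is part of the ID, the ID changes on migration.
func legacyGuestID(instance, node string, vmid int, clustered bool) string {
	if clustered {
		return fmt.Sprintf("%s-%s-%d", instance, node, vmid)
	}
	return fmt.Sprintf("%s-%d", node, vmid)
}
```

Calling `legacyGuestID("instance", "node1", 100, true)` yields `instance-node1-100`, while the same VM on `node2` yields `instance-node2-100`, so any override stored under the first key is no longer found.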

## Solution
Changed guest ID format to be stable across node migrations:
- New format: `instance-VMID` (for both standalone and cluster)
- Retains uniqueness across instances while being node-independent
- Allows VMs to migrate freely without losing configuration
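
As a rough illustration of the new format, the sketch below builds an ID from the instance name and VMID only. The name `stableGuestID` is an assumption made for this example, not necessarily the helper used in the codebase.

```go
package main

import "fmt"

// stableGuestID is a hypothetical helper for the new format: instance-VMID.
// The node is deliberately not an input, so the result cannot change
// when a guest moves to another node within the same instance.
func stableGuestID(instance string, vmid int) string {
	return fmt.Sprintf("%s-%d", instance, vmid)
}
```

With this shape, `stableGuestID("instance", 100)` returns `instance-100` regardless of which node currently hosts the VM, and two instances that both have a VMID 100 still get distinct IDs because the instance name differs.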

## Implementation

### Backend Changes
1. **Guest ID Construction** (`monitor_polling.go`):
   - Simplified to always use `instance-VMID` format
   - Removed node from the ID construction logic

2. **Alert Override Migration** (`alerts.go`):
   - Added lazy migration in `getGuestThresholds()`
   - Detects legacy ID formats and migrates to new format
   - Preserves user configurations automatically

3. **Guest Metadata Migration** (`guest_metadata.go`):
   - Added `GetWithLegacyMigration()` helper method
   - Called during VM/container polling to migrate metadata
   - Preserves custom URLs and descriptions

4. **Active Alerts Migration** (`alerts.go`):
   - Added migration logic in `LoadActiveAlerts()`
   - Translates legacy alert resource IDs to new format
   - Preserves alert acknowledgments across restarts
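
One way the legacy-to-new ID translation could look is sketched below. This is not the code from `LoadActiveAlerts()`; the function `translateLegacyID` and the assumption that the instance and node names are known for each stored ID are hypothetical. Matching on known prefixes avoids guessing where hyphens inside instance or node names split.

```go
package main

import "strings"

// isDigits reports whether s is a non-empty run of ASCII digits (a VMID).
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// translateLegacyID (hypothetical) rewrites a legacy resource ID to the
// instance-VMID format. IDs that match neither legacy form are returned as-is.
func translateLegacyID(id, instance, node string) string {
	// Cluster form: instance-node-VMID
	clusterPrefix := instance + "-" + node + "-"
	if strings.HasPrefix(id, clusterPrefix) {
		if rest := strings.TrimPrefix(id, clusterPrefix); isDigits(rest) {
			return instance + "-" + rest
		}
	}
	// Standalone form: node-VMID
	standalonePrefix := node + "-"
	if strings.HasPrefix(id, standalonePrefix) {
		if rest := strings.TrimPrefix(id, standalonePrefix); isDigits(rest) {
			return instance + "-" + rest
		}
	}
	return id
}
```

Under these assumptions, both `pve-node1-100` and `node1-100` map to `pve-100` for instance `pve` and node `node1`, and an ID that is already `pve-100` passes through unchanged.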

### Frontend Changes
5. **ID Construction Updates**:
   - `ThresholdsTable.tsx`: Updated fallback from `instance-node-vmid` to `instance-vmid`
   - `Dashboard.tsx`: Simplified guest ID construction
   - `GuestRow.tsx`: Updated `buildGuestId()` helper

## Migration Strategy
- **Lazy Migration**: Configs are migrated as guests are discovered
- **Backwards Compatible**: Old IDs are detected and automatically converted
- **Zero Downtime**: No manual intervention required
- **Persisted**: Migrated configs are saved on next config write cycle
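
The lazy migration described above can be sketched as a lookup that falls back to legacy keys and moves the entry on a hit. The types `thresholds` and `overrideStore`, the `dirty` flag, and the method name `get` are assumptions for illustration; they are not taken from `alerts.go`.

```go
package main

import (
	"fmt"
	"sync"
)

// thresholds is a hypothetical stand-in for a per-guest override.
type thresholds struct {
	Memory float64 // 0 means the memory alert is disabled
}

// overrideStore is a hypothetical in-memory override map keyed by guest ID.
type overrideStore struct {
	mu        sync.Mutex
	overrides map[string]thresholds
	dirty     bool // set when a migration should be persisted on the next write
}

// get looks up overrides under the new instance-VMID key. If none exist,
// it checks both legacy key formats and moves a match to the new key.
func (s *overrideStore) get(instance, node string, vmid int) (thresholds, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()

	newID := fmt.Sprintf("%s-%d", instance, vmid)
	if t, ok := s.overrides[newID]; ok {
		return t, true
	}
	legacyIDs := []string{
		fmt.Sprintf("%s-%s-%d", instance, node, vmid), // cluster form
		fmt.Sprintf("%s-%d", node, vmid),              // standalone form
	}
	for _, legacy := range legacyIDs {
		if t, ok := s.overrides[legacy]; ok {
			s.overrides[newID] = t
			delete(s.overrides, legacy)
			s.dirty = true
			return t, true
		}
	}
	return thresholds{}, false
}
```

Note that this sketch only finds a legacy key that names the node the guest is currently seen on, which matches the idea of migrating configs as guests are discovered during polling.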

## Testing Recommendations
After deployment:
1. Verify existing alert overrides still apply
2. Test VM migration - confirm thresholds persist
3. Check that guest metadata (custom URLs) survives migration
4. Verify active alerts maintain acknowledgment state

## Related
- Addresses similar issues with guest metadata and active alert tracking
- Lays groundwork for any future guest-specific configuration features
- Aligns with project philosophy: correctness and UX over implementation complexity
2025-11-06 10:27:15 +00:00

| File | Last commit | Date |
| --- | --- | --- |
| backoff.go | feat: implement error handling with circuit breakers and backoff (Phase 2 Task 7) | 2025-10-20 15:13:37 +00:00 |
| backoff_test.go | test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) | 2025-10-20 15:13:38 +00:00 |
| backup_guard_test.go | Guard PBS backups from failed polls | 2025-11-05 19:26:20 +00:00 |
| ceph.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| circuit_breaker.go | feat: enhance scheduler health API with rich instance metadata | 2025-10-20 15:13:38 +00:00 |
| circuit_breaker_test.go | test: add comprehensive unit tests for backoff and circuit breaker (Phase 2 Task 9a) | 2025-10-20 15:13:38 +00:00 |
| container_disk_usage.go | feat: add professional logging with runtime configuration and performance optimization | 2025-10-20 15:13:38 +00:00 |
| diagnostic_snapshots.go | Refine Proxmox node memory fallback (#582) | 2025-10-22 15:36:26 +00:00 |
| docker_commands.go | feat: add docker agent command handling | 2025-10-15 19:27:19 +00:00 |
| docker_commands_test.go | chore: snapshot current changes | 2025-11-02 22:47:55 +00:00 |
| fake_executor_integration.go | test: add comprehensive integration test harness for adaptive polling (Phase 2 Task 9c) | 2025-10-20 15:13:38 +00:00 |
| fs_filters.go | Ignore read-only guest filesystems in disk aggregation | 2025-10-14 16:13:53 +00:00 |
| fs_filters_test.go | Ignore read-only guest filesystems in disk aggregation | 2025-10-14 16:13:53 +00:00 |
| harness_integration.go | Surface LXC interface IPs via PVE interfaces API (#596) | 2025-10-23 08:07:32 +00:00 |
| helpers_test.go | Expand monitoring and discovery test coverage | 2025-10-16 08:17:08 +00:00 |
| integration_integration_test.go | test: add soak test with runtime instrumentation (Phase 2 Task 9d) | 2025-10-20 15:13:38 +00:00 |
| main_test.go | Harden setup token flow and enforce encrypted persistence | 2025-10-25 16:00:37 +00:00 |
| metrics.go | perf: reduce polling allocations and guest metadata load | 2025-10-25 13:12:47 +00:00 |
| metrics_history.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| metrics_history_concurrency_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| monitor.go | Fix VM migration issue where custom alert thresholds are lost | 2025-11-06 10:27:15 +00:00 |
| monitor_docker_test.go | Refactor: Code cleanup and localStorage consolidation | 2025-11-04 21:50:46 +00:00 |
| monitor_health_test.go | feat: enhance scheduler health API with rich instance metadata | 2025-10-20 15:13:38 +00:00 |
| monitor_host_agents_test.go | perf: reduce polling allocations and guest metadata load | 2025-10-25 13:12:47 +00:00 |
| monitor_memory_test.go | Fix inflated RAM usage reporting for LXC containers | 2025-11-06 00:16:18 +00:00 |
| monitor_pmg_test.go | Fix PMG API parameter issues causing 400 errors | 2025-11-05 19:28:37 +00:00 |
| monitor_polling.go | Fix VM migration issue where custom alert thresholds are lost | 2025-11-06 10:27:15 +00:00 |
| monitor_snapshots_test.go | Fix inflated RAM usage reporting for LXC containers | 2025-11-06 00:16:18 +00:00 |
| monitor_storage_test.go | Fix inflated RAM usage reporting for LXC containers | 2025-11-06 00:16:18 +00:00 |
| monitor_temperature_toggle_test.go | Add configurable SSH port for temperature monitoring | 2025-11-05 20:03:29 +00:00 |
| poller.go | feat: add professional logging with runtime configuration and performance optimization | 2025-10-20 15:13:38 +00:00 |
| ratetracker.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| ratetracker_concurrency_test.go | Fix settings security tab navigation | 2025-10-11 23:29:47 +00:00 |
| reload.go | Propagate config updates to settings nodes (#588) | 2025-10-22 13:45:13 +00:00 |
| scheduler.go | feat: enhance scheduler health API with rich instance metadata | 2025-10-20 15:13:38 +00:00 |
| staleness_tracker.go | release: prepare v4.25.0 | 2025-10-22 10:46:18 +00:00 |
| staleness_tracker_test.go | test: add comprehensive staleness tracker unit tests (Phase 2 Task 9b) | 2025-10-20 15:13:38 +00:00 |
| task_queue.go | perf: reduce polling allocations and guest metadata load | 2025-10-25 13:12:47 +00:00 |
| temperature.go | Fix container SSH detection and improve troubleshooting for issue #617 | 2025-11-06 09:57:53 +00:00 |
| temperature_service.go | Add configurable SSH port for temperature monitoring | 2025-11-05 20:03:29 +00:00 |
| temperature_test.go | Expand temperature sensor compatibility for SuperIO and AMD CPUs | 2025-11-05 18:47:21 +00:00 |