2198 commits

Author SHA1 Message Date
rcourtman
e992eb43c1 Show Patrol record briefings in Assistant handoff 2026-05-06 17:20:59 +01:00
rcourtman
b84fc2301a Surface paid runtime mismatch in licensing 2026-05-06 17:18:35 +01:00
rcourtman
87ba22b0fc Surface Patrol investigation records in Assistant context 2026-05-06 17:00:05 +01:00
rcourtman
75e3cb76fd Add structured Patrol investigation records 2026-05-06 16:31:51 +01:00
rcourtman
cdd977a1c1 Capture Pulse Intelligence product direction 2026-05-06 15:56:45 +01:00
rcourtman
9569cfc785 Downplay infrastructure policy chips 2026-05-06 15:41:30 +01:00
rcourtman
1531f8cef4 Fix infrastructure resource badge stacking 2026-05-06 15:14:52 +01:00
rcourtman
a22eb2587c refactor: finish alerts manager decomposition
Delete the residual internal/alerts/alerts.go catch-all and move its remaining ownership groups into config_facade, model, constants, metric_hooks, manager, default_config, lifecycle, and escalation files.

Keep compatibility aliases in the alerts package, preserve the type-alias identity test, add a default-config pointer isolation regression test, and record the new ownership split in the alerts subsystem contract.

Proof: git diff --check -- docs/release-control/v6/internal/subsystems/alerts.md internal/alerts

Proof: go test ./internal/alerts/... -run 'TestDefaultAlertConfigUsesIndependentBackupAlertOrphanedPointer|TestTypeAliasIdentity|TestSetMetricHooks|TestSetLicenseCheckerStoresChecker|TestHistoryManager_Stop|TestEscalationDisabledWhenAlertsDisabled|TestEscalationDisabledWhenActivationPending|TestEscalationDisabledWhenActivationSnoozed|TestEscalationSkipsWhenScheduleDisabled|TestEscalationSkipsAcknowledgedAlerts|TestEscalationAdvancesLevels|TestEscalationDoesNotRepeatSameLevel|TestEscalationUsesCallback' -count=1

Proof: go test ./internal/alerts/... -count=1

Proof: go test ./internal/api -run Alert -count=1
2026-05-06 15:00:30 +01:00
rcourtman
0a1df95684 refactor: split active alert cleanup
Move active alert TTL cleanup, auto-ack cleanup, node cleanup, and full active-state reset into internal/alerts/active_cleanup.go.

Record active_cleanup.go in the alerts subsystem contract and broaden ClearActiveAlerts coverage for alias, Docker, recovery, and canonical acknowledgement maps.

Proof: go test ./internal/alerts/... -run 'TestClearActiveAlertsEmptyMaps|TestClearActiveAlertsWithExistingAlerts|TestCleanup|TestCleanupAlertsForNodes' -count=1

Proof: go test ./internal/alerts/... -count=1

Proof: go test ./internal/api -run Alert -count=1
2026-05-06 14:50:10 +01:00
rcourtman
feb2eb2f1b refactor: split active alert lifecycle
Move active alert acknowledgement, manual clear, recovery clear, state preservation, resolved registration, and no-lock removal helpers into internal/alerts/active_lifecycle.go.

Record active_lifecycle.go in the alerts subsystem contract and add a canonical-alias manual-clear characterization test.

Proof: go test ./internal/alerts/... -run 'TestClearAlertMarksResolutionAndReturnsStatus|TestClearAlertByCanonicalAliasRemovesActiveState|TestAddRecentlyResolvedUsesCanonicalStorageKey|TestAcknowledgeAlertNotFound|TestUnacknowledgeAlertSuccess|TestUnacknowledgeAlertByCanonicalAlias' -count=1

Proof: go test ./internal/alerts/... -count=1

Proof: go test ./internal/api -run Alert -count=1
2026-05-06 14:44:04 +01:00
rcourtman
8c3c1e417c refactor: split active alert persistence
Move active alert save/load, secure storage handling, startup restore migration, periodic persistence, and stale tracking cleanup into dedicated alerts package owners.

Record active_persistence.go and tracking_cleanup.go in the alerts subsystem contract and add a load-time file-permission hardening characterization test.

Proof: go test ./internal/alerts/... -run 'TestCleanupStaleMaps|TestLoadActiveAlerts|TestLoadActiveAlertsHardensExistingFilePermissions|TestSaveAndLoadActiveAlerts_UsesManagerDataDirAndSecurePermissions|TestSaveActiveAlertsBackfillsCanonicalIdentityOnDiskWithoutMutatingLiveAlert' -count=1

Proof: go test ./internal/alerts/... -count=1

Proof: go test ./internal/api -run Alert -count=1
2026-05-06 14:38:05 +01:00
rcourtman
4a981fe85d Refine availability probe row presentation
Refs #1460
2026-05-06 14:35:02 +01:00
rcourtman
69f6224b3b refactor: split alert config runtime
Move alert configuration normalization, activation migration, global-disable cleanup, active-alert reevaluation, and threshold override cloning into internal/alerts/config_runtime.go.

Record config_runtime.go as the alert config runtime owner in the alerts subsystem contract and add an activation-state preservation characterization test.

Proof: go test ./internal/alerts/... -count=1

Proof: go test ./internal/api -run Alert -count=1
2026-05-06 14:29:06 +01:00
rcourtman
90fbafbc21 Surface availability probe evidence in infrastructure rows
Refs #1460
2026-05-06 14:22:35 +01:00
rcourtman
7ce2cfaa66 refactor: split Proxmox guest alerts
Move guest metric projection, per-disk guest metric evaluation, powered-off lifecycle alerts, Pulse tag controls, relaxed thresholds, and guest suppression cleanup into internal/alerts/guest.go.

Record guest.go as the Proxmox guest alert owner in the alerts subsystem contract and add a Pulse-tag characterization test.

Proof: go test ./internal/alerts/...
2026-05-06 14:17:14 +01:00
rcourtman
6a1f0df61d refactor: split alert health assessment runtime
Move storage-health reason normalization, ZFS assessment helpers, active health alert value lookup, and canonical health-assessment alert synchronization into internal/alerts/health_assessment.go.

Record health_assessment.go as the shared storage-health assessment owner in the alerts subsystem contract and add a reason-code characterization test.

Proof: go test ./internal/alerts/...
2026-05-06 14:10:41 +01:00
rcourtman
2f5aa20122 Add mock availability endpoint fixtures
Refs #1460
2026-05-06 14:08:03 +01:00
rcourtman
1f3e3ec9cd refactor: split alert metric runtime
Move threshold lookup, per-metric delay resolution, legacy metric alert lifecycle, metric options, and shared metric key helpers into internal/alerts/metric_runtime.go.

Record metric_runtime.go as the shared metric-threshold runtime owner in the alerts subsystem contract and add a metric-options characterization test.

Proof: go test ./internal/alerts/...
2026-05-06 14:06:50 +01:00
rcourtman
8457be3cb6 refactor: split Proxmox disk health alerts
Move Proxmox disk canonical identity, disk health assessment, known-firmware suppression, and SSD wearout alerting into internal/alerts/disk_health.go.

Keep the Manager API unchanged while recording disk_health.go as the Proxmox disk-health checker owner in the alerts subsystem contract.

Proof: go test ./internal/alerts/...
2026-05-06 13:56:36 +01:00
rcourtman
baeef84c69 refactor: split backup snapshot alerts
Move snapshot age and size evaluation, backup rollup age evaluation, inventory readiness, namespace disambiguation, template matching, and backup/snapshot cleanup into internal/alerts/backup_snapshot.go.

Keep the generic async active-alert save helper in the central package because canonical metric migration still shares it, and record backup_snapshot.go as the backup/snapshot owner in the alerts subsystem contract.

Proof: go test ./internal/alerts/...
2026-05-06 13:54:09 +01:00
rcourtman
8a21162f35 refactor: split host alert checker
Move host-agent identity, metric projection, disk/SMART/RAID/Unraid health handling, cleanup, and offline lifecycle into internal/alerts/host.go.

Keep shared health-assessment evaluation package-level for now because storage ZFS and host SMART/RAID still share that bridge, while recording host.go as the host checker owner in the alerts subsystem contract.

Proof: go test ./internal/alerts/...
2026-05-06 13:50:13 +01:00
rcourtman
3d8cb6c8a5 refactor: split node alert checker
Move Proxmox node metric, temperature, offline lifecycle, host-agent deduplication, and node display-name cache support into internal/alerts/node.go.

Keep the Manager API unchanged while recording the node checker owner in the alerts subsystem contract and adding a focused display-name cache key characterization test.

Proof: go test ./internal/alerts/...
2026-05-06 13:47:54 +01:00
rcourtman
a0e8896893 refactor: split PBS and storage alert checkers
Move PBS connectivity and metric evaluation into internal/alerts/pbs.go, and move storage connectivity, usage, and ZFS health evaluation into internal/alerts/storage.go.

Keep the Manager API unchanged while recording PBS and storage as resource-checker owners in the alerts subsystem contract, with focused characterization tests for PBS offline normalization and ZFS device labels.

Proof: go test ./internal/alerts/...
2026-05-06 13:45:16 +01:00
rcourtman
d2ac17fd80 refactor: split Docker alert checker
Move Docker host connectivity, container state and health, metric projection, service gap/update-state checks, image update timing, and Docker tracking cleanup into internal/alerts/docker.go.

Keep the Manager API unchanged while recording Docker as the resource-checker owner and strengthening Docker resource ID normalization proof.

Proof: go test ./internal/alerts/...
2026-05-06 13:31:05 +01:00
rcourtman
8c0261ec43 refactor: split PMG alert checker
Move PMG connectivity, queue, per-node queue, quarantine, and anomaly evaluation into internal/alerts/pmg.go while keeping the Manager API unchanged.

Record PMG as the resource-checker owner in the alerts contract and add a PMG connection-health normalization proof through CheckPMG.

Proof: go test ./internal/alerts/...

Proof: go test ./internal/monitoring
2026-05-06 13:28:19 +01:00
rcourtman
0d642c32ef refactor: split alert read model
Move active-alert projection, sorting, metadata coercion, recently resolved reads, history wrappers, and notify-existing redispatch into internal/alerts/read_model.go.

Keep the Manager API unchanged while recording the read-side alerts contract owner and strengthening resolved-alert clone proof.

Proof: go test ./internal/alerts/...
2026-05-06 13:22:12 +01:00
rcourtman
9d4fabf915 refactor: split alert notification policy
Move alert dispatch, flapping suppression, quiet-hours suppression, monitor-only suppression, cooldown, and rate-limit policy into internal/alerts/notification_policy.go.

Keep Manager behavior and public API unchanged while recording the new alerts contract owner and adding monitor-only dispatch proof.

Proof: go test ./internal/alerts/...; go test -json ./internal/api -count=1; go test ./internal/api ./internal/monitoring ./internal/ai/... ./internal/websocket (the internal/api failure did not reproduce on an isolated rerun; the other packages passed in the broad run).
2026-05-06 13:14:18 +01:00
rcourtman
edae6d1edc refactor: split alert config and callbacks
Extract alert config types, normalization, and identity helpers into internal/alerts/config while preserving the existing alerts package API through aliases and wrappers.

Move Manager callback lifecycle state into a same-package callbackBus, keeping public Set/Subscribe methods unchanged.

Harden metrics SQLite artifacts to owner-only permissions and cover permissive umask behavior.

Proof: go test -json ./internal/api -count=1; go test ./internal/alerts/... ./internal/monitoring ./internal/ai/... ./internal/websocket ./internal/config ./pkg/metrics; go test ./internal/alerts/... ./pkg/metrics
2026-05-06 13:01:32 +01:00
rcourtman
d6ca8b12e6 Add agentless availability targets
Refs #1460
2026-05-06 10:35:34 +01:00
rcourtman
2f8e5184bd Remove navigation guide modal and reopen control
The four-step coachmark over the top tabs was a tour pretending to be
guidance: each step duplicated the tab title in one sentence, and the
Reopen control on /settings/system-general spawned a centered panel with
no spotlight target because the tabs only exist on dashboard routes.

Delete the modal, the localStorage dismissal key, the reopen event, the
Reopen row in General settings, and the matching guardrails so the
shared-primitives tests stop pinning the deleted owner split. Drop the
WhatsNew dismissal helpers and addInitScript bypasses from the
integration suite, and the dedicated tour test in
19-telemetry-disclosure.
2026-05-06 09:49:15 +01:00
rcourtman
01474a18b6 Fail closed on incomplete OpenAI SSE streams
Keep the buffered EOF compatibility path for OpenAI-compatible streams that omit [DONE] but provide a terminal finish_reason, while rejecting truncated tool-call streams before they can produce executable tool calls.

Refs #1411

Refs #1412
2026-05-05 22:10:50 +01:00
rcourtman
d6e96ebeca Fix v6 demo release signing key deployment 2026-05-05 21:40:14 +01:00
rcourtman
4aa91f6af3 Refresh RC4 packet after watcher lifecycle fix 2026-05-05 18:30:06 +01:00
rcourtman
7cebe78859 Fix config watcher stop lifecycle race 2026-05-05 18:26:53 +01:00
rcourtman
868239a648 Stabilize TrueNAS poller enable-disable proof 2026-05-05 16:50:10 +01:00
rcourtman
09c8e75f4d Refresh RC4 packet validation metadata 2026-05-05 16:27:49 +01:00
rcourtman
1a3e5ec27d Fix tenant monitor broadcast nil hub panic 2026-05-05 16:25:00 +01:00
rcourtman
96c2e160c9 Fix RC4 release validation blockers 2026-05-05 15:59:23 +01:00
rcourtman
f149c5d643 Prepare v6.0.0-rc.4 release packet 2026-05-05 15:32:32 +01:00
rcourtman
cd2abe879e Fix mock mode legacy sidecar drift 2026-05-05 15:12:31 +01:00
rcourtman
d7225a45a0 Fix Proxmox guest memory fallbacks
Also fixes Ceph pool threshold resource identity.

Refs #1341
2026-05-05 14:59:29 +01:00
rcourtman
35b2deebfb Harden Proxmox guest snapshot polling
Refs #1437
2026-05-05 14:51:28 +01:00
rcourtman
ce7b459aa7 Harden runtime Proxmox token ACLs 2026-05-05 14:42:05 +01:00
rcourtman
30180727ad Harden Proxmox setup token ACLs 2026-05-05 14:19:50 +01:00
rcourtman
c61ea4947a Make Proxmox onboarding API-first 2026-05-05 13:25:17 +01:00
rcourtman
cf103ca9fe Harden root agent service defaults 2026-05-05 13:03:13 +01:00
rcourtman
81b31e4d3b Remove monitored-system volume caps
Retire runtime/API/UI monitored-system volume enforcement now that infrastructure monitoring is no longer capped.

Keep only legacy metadata scrubbing and purchase-start compatibility for old max_monitored_systems references.

Rename the remaining preview surface to monitored-system impact and make previews explanatory rather than save-blocking.

Update subsystem contracts and RA7 evidence for the caps-retired invariant.
2026-05-05 12:59:59 +01:00
rcourtman
aa5472553f Fix Workloads empty state source detection
Refs #1456
2026-05-05 09:42:31 +01:00
rcourtman
632f0af7f3 Keep uncapped continuity from writing raw caps 2026-05-05 09:33:44 +01:00
rcourtman
641660dced Fix mdadm RAID fallback discovery
Refs #1455
2026-05-05 09:29:34 +01:00