Mirror of https://github.com/rcourtman/Pulse.git (synced 2026-04-28 03:20:11 +00:00)
docs(release): finalize hotfix 5.1.3 checklist and version bump
parent f253ed2778
commit 839ed5cc1e
3 changed files with 290 additions and 1 deletions
2
VERSION
@@ -1 +1 @@
-5.1.2
+5.1.3
134
docs/releases/HOTFIX_5_1_3_CHECKLIST.md
Normal file
@@ -0,0 +1,134 @@
# Hotfix 5.1.3 Execution Checklist

Last updated: 2026-02-07
Owner: Codex + maintainer
Branch: `pulse/hotfix-5.1.3`
Base tag: `v5.1.2` (`c949e9c9`)

## 1) Branch Start Verification
- [x] `git status` checked
- [x] `git log --oneline -n 3` checked
- [x] `git describe --tags --exact-match` equals `v5.1.2`

## 2) P0 Scope (Must Ship)

### 2.1 Proxmox stale/offline reliability (`#1094`, `#1204`, `#1192`, `#1199`)
- [x] Reproducer documented
- [x] Acceptance criteria defined
- [x] Fix implemented
- [x] Automated tests added/updated
- [x] Manual validation evidence captured
- [x] Release note entry prepared (factual only)

Acceptance criteria:
- [x] Fresh data does not become stale/false-offline during normal polling window
- [x] No stale-state carryover after temporary offline transition

Evidence links/notes:
- `internal/monitoring/monitor.go`: empty-node fallback now preserves recent nodes within grace window.
- `internal/monitoring/monitor_memory_test.go`:
  - `TestPollPVEInstancePreservesRecentNodesWhenGetNodesReturnsEmpty`
  - `TestPollPVEInstanceMarksStaleNodesOfflineWhenGetNodesReturnsEmpty`
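
The grace-window behavior described in the evidence notes can be sketched roughly as follows. This is an illustration only, not the code in `monitor.go`: the `Node` type, the `reconcileEmptyPoll` function, and the one-minute grace value are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Node is a hypothetical, simplified stand-in for a monitored Proxmox node.
type Node struct {
	Name     string
	Status   string    // "online" or "offline"
	LastSeen time.Time // last time a poll returned this node
}

// reconcileEmptyPoll decides what to do with previously known nodes when a
// poll returns no nodes at all. Nodes seen within the grace window are kept
// unchanged; nodes older than the grace window are marked offline.
func reconcileEmptyPoll(prev []Node, now time.Time, grace time.Duration) []Node {
	out := make([]Node, 0, len(prev))
	for _, n := range prev { // n is a copy, so prev is not mutated
		if now.Sub(n.LastSeen) > grace {
			n.Status = "offline"
		}
		out = append(out, n)
	}
	return out
}

func main() {
	now := time.Date(2026, 2, 7, 12, 0, 0, 0, time.UTC)
	prev := []Node{
		{Name: "pve1", Status: "online", LastSeen: now.Add(-20 * time.Second)},
		{Name: "pve2", Status: "online", LastSeen: now.Add(-5 * time.Minute)},
	}
	for _, n := range reconcileEmptyPoll(prev, now, time.Minute) {
		fmt.Printf("%s: %s\n", n.Name, n.Status)
	}
}
```

In this sketch a single empty poll cannot flip a recently seen node to offline, while a node that has been missing for longer than the grace window still transitions, which mirrors the two test names listed above.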

### 2.2 Alerting stale evaluator / loop reliability (`#1096`, `#1179`, `#1159`, `#1043`)
- [x] Reproducer documented
- [x] Acceptance criteria defined
- [x] Fix implemented
- [x] Automated tests added/updated
- [x] Manual validation evidence captured
- [x] Release note entry prepared (factual only)

Acceptance criteria:
- [x] Evaluator resumes after offline -> online transitions
- [x] No deadlock/freeze under sustained alert checks

Evidence links/notes:
- `internal/alerts/alerts.go`: `checkMetric` re-notify path now dispatches asynchronously to reduce evaluator loop blocking risk.
- Covered by existing dispatch/checkMetric tests in `internal/alerts/alerts_test.go`.
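
The idea of dispatching re-notifications asynchronously, so that a slow notification channel cannot stall the evaluator loop, can be illustrated with a small sketch. The `Alert` and `Dispatcher` types and the `DispatchAsync` method are hypothetical names chosen for this example; they are not taken from `alerts.go`.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Alert is a hypothetical, minimal alert record.
type Alert struct {
	ID string
}

// Dispatcher sends notifications without blocking the caller.
type Dispatcher struct {
	send func(Alert) // potentially slow: webhook, email, and so on
	wg   sync.WaitGroup
}

// DispatchAsync hands the alert to a goroutine and returns immediately, so a
// slow notification channel cannot stall the evaluation loop.
func (d *Dispatcher) DispatchAsync(a Alert) {
	d.wg.Add(1)
	go func() {
		defer d.wg.Done()
		d.send(a)
	}()
}

// Wait blocks until all in-flight notifications finish (shutdown or tests).
func (d *Dispatcher) Wait() { d.wg.Wait() }

func main() {
	d := &Dispatcher{send: func(a Alert) {
		time.Sleep(50 * time.Millisecond) // simulate a slow channel
		fmt.Println("notified:", a.ID)
	}}
	for _, id := range []string{"cpu-high", "mem-high"} {
		d.DispatchAsync(Alert{ID: id}) // returns immediately
	}
	fmt.Println("evaluation loop continues while notifications are in flight")
	d.Wait()
}
```

One trade-off of this simple form is that it starts one goroutine per notification with no upper bound; a real implementation might prefer a bounded worker queue.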

### 2.3 Swarm alert correctness (`#1202` + support thread symptoms)
- [x] Reproducer documented
- [x] Acceptance criteria defined
- [x] Fix implemented
- [x] Automated tests added/updated
- [x] Manual validation evidence captured
- [x] Release note entry prepared (factual only)

Acceptance criteria:
- [x] Healthy services do not trigger false warning spam
- [x] Alert messaging matches observed service state

Evidence links/notes:
- `internal/alerts/alerts.go`: Docker service alerts now notify on new alert and warning->critical escalation only; unchanged degraded state updates in-place without poll-cycle re-notify spam; rate-limit check added.
- `internal/alerts/alerts_test.go`:
  - `TestDockerServiceAlertDoesNotRenotifyWhenUnchanged`
  - `TestDockerServiceAlertRenotifiesOnEscalationToCritical`
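
The notify rule described above (notify on a new alert or on warning->critical escalation, stay quiet otherwise) reduces to a small decision function. The sketch below uses a hypothetical `Level` type and `shouldNotify` function and leaves out the rate-limit check mentioned in the notes.

```go
package main

import "fmt"

// Level is a hypothetical alert severity.
type Level int

const (
	Warning Level = iota + 1
	Critical
)

// shouldNotify reports whether a poll result should send a notification.
// existing is nil when no alert is active for the service yet.
func shouldNotify(existing *Level, current Level) bool {
	if existing == nil {
		return true // new alert
	}
	// Only a warning -> critical escalation re-notifies; an unchanged or
	// de-escalated state is updated in place without a new notification.
	return *existing == Warning && current == Critical
}

func main() {
	warn, crit := Warning, Critical
	fmt.Println(shouldNotify(nil, Warning))    // new alert: true
	fmt.Println(shouldNotify(&warn, Warning))  // unchanged: false
	fmt.Println(shouldNotify(&warn, Critical)) // escalation: true
	fmt.Println(shouldNotify(&crit, Warning))  // de-escalation: false
}
```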

### 2.4 License gate hardening (key/config mismatch regressions)
- [x] Reproducer documented
- [x] Acceptance criteria defined
- [x] Startup/assertion logging for active license verification key fingerprint
- [x] CI/release guard against wrong-key build silently passing
- [x] Automated tests added/updated
- [x] Manual validation evidence captured
- [x] Release note entry prepared (factual only)

Acceptance criteria:
- [x] Valid Pro key consistently unlocks Pro features after restart/update
- [x] Wrong-key/config mismatch is visible and blocks release path

Evidence links/notes:
- `internal/license/pubkey.go`: startup logs now include key source and `SHA256` fingerprint of active verification key.
- `scripts/build-release.sh`: release build now fails if `PULSE_LICENSE_PUBLIC_KEY` missing (unless explicit local bypass) and can assert expected fingerprint via `PULSE_LICENSE_PUBLIC_KEY_FINGERPRINT`.
- `internal/license/pubkey_test.go`: added `TestPublicKeyFingerprint`.
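
As a rough illustration of the fingerprint logging and release guard, the sketch below hashes raw key bytes with SHA-256 and compares the result against an expected value. The function names, the `SHA256:<hex>` output format, and the placeholder key bytes are assumptions for this example; the actual format used by `pubkey.go` and `build-release.sh` is not shown in this commit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// publicKeyFingerprint returns a "SHA256:<hex>" fingerprint of the raw key
// bytes. The output format is an assumption made for this sketch.
func publicKeyFingerprint(key []byte) string {
	sum := sha256.Sum256(key)
	return "SHA256:" + hex.EncodeToString(sum[:])
}

// checkFingerprint returns an error when the active key does not match the
// expected fingerprint, so a wrong-key build fails loudly instead of silently.
func checkFingerprint(key []byte, expected string) error {
	got := publicKeyFingerprint(key)
	if got != expected {
		return fmt.Errorf("license public key fingerprint mismatch: got %s, want %s", got, expected)
	}
	return nil
}

func main() {
	key := []byte("example-public-key-bytes") // placeholder, not a real key
	fp := publicKeyFingerprint(key)
	fmt.Println("active license key fingerprint:", fp)
	if err := checkFingerprint(key, fp); err != nil {
		fmt.Println("release guard failed:", err)
	}
}
```

Logging the fingerprint at startup and asserting it at build time gives two independent places where a key/config mismatch becomes visible.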

## 3) P1 Scope (Ship Only If Low Risk)

### 3.1 Host URL edit regression (`#1197`)
- [ ] Triaged
- [ ] Fixed (if low risk)
- [ ] Validated

### 3.2 Release notes link (`#1195`)
- [ ] Triaged
- [ ] Fixed (if low risk)
- [ ] Validated

### 3.3 Rootless Docker detection (`#1200`)
- [ ] Triaged
- [ ] Fixed (if low risk)
- [ ] Validated

### 3.4 Backup attribution duplicate VMID edge case (`#1177`)
- [ ] Triaged
- [ ] Fixed (if low risk)
- [ ] Validated

### 3.5 VM disk totalBytes inflation edge case (`#1158`)
- [ ] Triaged
- [ ] Fixed (if low risk)
- [ ] Validated

## 4) Verification Gate (Required Before Tag)
- [x] `make test`
- [x] `make lint-frontend`
- [x] `make frontend`
- [x] `make build`
- [ ] Manual smoke: Proxmox freshness over extended run
- [ ] Manual smoke: alerts survive offline -> online transitions
- [ ] Manual smoke: Swarm false warnings absent for healthy services
- [ ] Manual smoke: Pro license survives restart/update
- [ ] Manual smoke: support bundle captures diagnostic evidence

## 5) Release Steps
- [ ] Release notes updated with verified fixes only
- [ ] Version bumped to `5.1.3`
- [ ] Tag and publish release from `pulse/hotfix-5.1.3`
- [ ] Fixed issues updated with exact version + validation notes
- [ ] Hotfix commits back-merged/cherry-picked to forward branch

## 6) Execution Log
- 2026-02-07: Initialized checklist and validated branch starts from `v5.1.2`.
- 2026-02-07: Implemented P0 stabilization patches for Proxmox empty-node grace handling, alert loop async re-notify, Swarm service re-notify dedupe/escalation behavior, and license key fingerprint + release guard hardening.
- 2026-02-07: Addressed pre-ship findings: preserved `LastNotified` for rebuilt service alerts and added explicit escalation logging for Docker service alert escalations.
- 2026-02-07: Validation rerun complete: targeted monitoring/alerts/license tests passed, plus `make test`, `make lint-frontend`, `make frontend`, and `make build`.
155
docs/releases/HOTFIX_5_1_3_START_HERE.md
Normal file
@@ -0,0 +1,155 @@
# Hotfix 5.1.3 Start Here

Last updated: 2026-02-07
Branch: `pulse/hotfix-5.1.3`
Base: `v5.1.2` (`c949e9c9`)

## Why This Exists
`5.1.3` is a stabilization release.
Goal: restore trust and reliability quickly without mixing in large architectural changes.

This branch is intentionally isolated from the forward/unified-resource work.

## Guardrails (Non-Negotiable)
- Do not merge any unified-resource/navigation overhaul work into this branch.
- Keep fixes minimal, targeted, and low-risk.
- Every fix must have either:
  - a reproducer and a test, or
  - a reproducer and explicit manual validation evidence.
- Do not send customer follow-ups until behavior is verified locally or in known-good diagnostics.

## Known Customer Context (Cosmin)
Recent thread context (Feb 6-7, 2026):
- License appeared valid but Pro areas were locked (reported on 5.1.2).
- Docker/Swarm alert behavior looked incorrect to customer.
- Customer explicitly challenged prior explanation ("services are up, why 0.0 of 0?").
- Prior thread included accidental/incorrect outbound messages; trust is currently fragile.

Implication for 5.1.3:
- Prioritize correctness and confidence over breadth.
- Release should avoid speculative claims and include clear, verified behavior notes.

## Priority Scope

## P0 (Must Ship in 5.1.3)
1. Proxmox data freshness / false offline / stale state reliability
   Issues: `#1094`, `#1204`, `#1192`, `#1199`

2. Alerting loop reliability and stale-evaluator behavior
   Issues: `#1096`, `#1179`, `#1159`, `#1043`

3. Swarm service alert correctness (false warning patterns)
   Related customer complaint + issues: `#1202` (metrics gap), alert symptoms seen in support thread

4. License gate hardening against key/config mismatch regressions
   Not a clean open issue for this exact latest incident, but high business impact from support thread.
   At minimum:
   - add startup/assertion logging around active license verification key fingerprint
   - add test/guard so wrong-key build cannot silently pass CI/release path

## P1 (Ship If Low Risk, Else Defer)
1. Host URL edit discoverability/regression
   Issue: `#1197`

2. Release notes "View details" broken link
   Issue: `#1195`

3. Rootless Docker detection
   Issue: `#1200`

4. Backup attribution correctness (duplicate VMID edge cases)
   Issue: `#1177`

5. VM disk totalBytes inflation edge cases
   Issue: `#1158`

## P2 (Explicitly Defer Unless Free/Fast)
- Mobile rendering regressions (`#1196`)
- Reporting engine initialization (`#1186`)
- Broader enhancement requests (for example partition exclusion)

## Start Checklist (Do This First)
1. Confirm branch and base:
   - `git status`
   - `git log --oneline -n 3`
   - `git describe --tags --exact-match` should be `v5.1.2` at branch start

2. Create a tracking checklist issue or local checklist from this doc.

3. Reproduce P0 items one by one with minimal fixtures/diagnostics.

4. Define acceptance criteria before coding each fix.

5. Implement smallest safe patch per item, with tests where possible.

## Suggested Execution Order
1. Proxmox stale/offline reliability (`#1094` cluster)
   Reason: highest customer pain + long-lived issue + high comment volume.

2. Alerting deadlock/stale evaluations (`#1096` cluster)
   Reason: can cause monitoring trust collapse across features.

3. Swarm alert correctness and messaging
   Reason: directly tied to active customer thread and confusion.

4. License verification hardening
   Reason: low frequency, high severity business impact.

5. Quick P1 regressions (`#1195`, `#1197`) if near-zero risk.

## Engineering Standards for This Hotfix
- One logical fix per commit.
- Commit message format:
  - `fix(<area>): <what changed> (#issue)`
- Add/adjust tests close to fix location.
- Prefer surgical changes over refactors.
- Keep public behavior notes precise (no guessing).
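
The commit message format from the standards above could be checked mechanically, for example in a pre-commit hook. The sketch below is one possible check, not part of the repository: the `validCommitMessage` helper and the set of characters allowed in `<area>` are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// commitRe matches "fix(<area>): <what changed> (#<issue>)".
// The characters allowed in <area> are an assumption for this sketch.
var commitRe = regexp.MustCompile(`^fix\(([a-z0-9_/-]+)\): (.+) \(#(\d+)\)$`)

// validCommitMessage reports whether a commit subject line follows the
// hotfix commit message format.
func validCommitMessage(subject string) bool {
	return commitRe.MatchString(subject)
}

func main() {
	subjects := []string{
		"fix(alerts): dispatch re-notify asynchronously (#1096)",
		"fix(alerts): dispatch re-notify asynchronously",
	}
	for _, s := range subjects {
		fmt.Printf("%q valid=%v\n", s, validCommitMessage(s))
	}
}
```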

## Verification Matrix (Release Gate)
All items below must pass before tagging `v5.1.3`.

1. Backend tests:
   - `make test`

2. Frontend lint/build sanity:
   - `make lint-frontend`
   - `make frontend`

3. Full build:
   - `make build`

4. Manual smoke checks:
   - Proxmox: nodes remain fresh/online over extended run window
   - Alerts: no freeze/stale evaluator after offline->online transitions
   - Swarm: no false warning spam for healthy services
   - License: valid Pro key unlocks Pro features consistently after restart/update
   - "View details" link works (if patched)
   - Host URL editing path is clear and functional (if patched)

5. Support bundle check:
   - Confirm diagnostics/export contains enough evidence for future triage.

## Release Steps (End State)
1. Update release notes with only confirmed fixes.
2. Bump version to `5.1.3` where applicable.
3. Tag and publish release from this branch.
4. Post-release:
   - comment on fixed issues with exact version and validation notes
   - close only issues that are truly verified
5. Back-merge/cherry-pick hotfix commits into forward branch:
   - `pulse/unified-resource-pre-hotfix-2026-02-07` (or newer forward branch)

## Definition of Done
- `v5.1.3` shipped from hotfix branch.
- P0 reliability regressions fixed and validated.
- Release notes are factual and test-backed.
- Hotfix commits are propagated back to forward branch.
- Customer follow-up (including Cosmin) can be sent with confidence and concrete fixes.

## Notes for Customer Comms (When Ready)
- Lead with verified outcomes, not hypotheses.
- For each reported symptom:
  - what was wrong
  - what changed in `5.1.3`
  - what the customer should expect now
  - what to send if still reproducible (diagnostics bundle path)