4826 commits

Author SHA1 Message Date
rcourtman
29c1e504d7 fix: Docker table resource column overlap regression
The max-width:0 CSS trick from d3f22f06 caused the resource column
content to overlap with the TYPE badge. Use a proper max-w-[300px]
class instead to constrain long container names while maintaining
proper column spacing.

Related to #810, #789
2025-12-04 13:29:08 +00:00
rcourtman
51667c0916 docs: Update Home Assistant addon link to new repo
The maintainer consolidated both agent and server addons into a single
repository at homeassistant-addons.
2025-12-04 03:04:52 +00:00
rcourtman
da51449392 fix: Exclude TrueNAS Docker overlay mounts from disk stats
Host agent was including Docker overlay2 mounts from TrueNAS SCALE's
.ix-apps directory in disk totals. These mounts inherit the ZFS pool's
AVAIL space, causing massively inflated storage numbers (e.g., 173 TB
per container overlay instead of actual usage).

Changes:
- Add /mnt/.ix-apps/docker/ to container overlay path exclusions
- Use ShouldSkipFilesystem() in host agent disk collection (was only
  using ShouldIgnoreReadOnlyFilesystem() which missed container paths)
- Add test cases for TrueNAS overlay paths

Related to #718
2025-12-04 03:03:04 +00:00
rcourtman
6aa51fefb5 chore: bump version to 4.36.2
2025-12-03 22:26:52 +00:00
rcourtman
610be6914c fix: TrueNAS SCALE 24.04+ has read-only /usr/local/bin
On TrueNAS SCALE 24.04+, the root filesystem including /usr/local/bin
is read-only. The installer now tries multiple locations for the
runtime binary:

1. Execute directly from /data (if no noexec mount)
2. /usr/local/bin (older TrueNAS versions)
3. /root/bin (TrueNAS SCALE 24.04+)
4. /var/tmp (last resort)

The bootstrap script is also updated to use the determined runtime
location rather than hardcoding /usr/local/bin.

Related to #801
2025-12-03 21:02:55 +00:00
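
The fallback order listed in this commit can be sketched in Go as a "first usable directory wins" loop. This is only an illustration: `candidateDirs`, `pickRuntimeDir`, and the `usable` probe are hypothetical names, and the real installer is a shell script whose checks may differ.

```go
package main

import "fmt"

// candidateDirs mirrors the order given in the commit message.
var candidateDirs = []string{"/data", "/usr/local/bin", "/root/bin", "/var/tmp"}

// pickRuntimeDir returns the first directory for which usable reports true.
// usable is a hypothetical probe, e.g. "can a binary be written and executed here?".
func pickRuntimeDir(dirs []string, usable func(string) bool) (string, error) {
	for _, d := range dirs {
		if usable(d) {
			return d, nil
		}
	}
	return "", fmt.Errorf("no usable runtime directory among %v", dirs)
}

func main() {
	// Simulate TrueNAS SCALE 24.04+: /data is noexec and /usr/local/bin is read-only.
	blocked := map[string]bool{"/data": true, "/usr/local/bin": true}
	dir, err := pickRuntimeDir(candidateDirs, func(d string) bool { return !blocked[d] })
	fmt.Println(dir, err)
}
```

Passing the probe in as a function keeps the ordering logic testable without touching the real filesystem.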
rcourtman
7d733db3a8 fix: Default sensor-proxy HTTP to 0.0.0.0:8443 for IPv4 binding
On systems with net.ipv6.bindv6only=1 (including some Proxmox 8
configurations), using ":8443" results in IPv6-only binding. Users
reported curl to 127.0.0.1:8443 hanging while [::1]:8443 worked.

Changed default from ":8443" to "0.0.0.0:8443" to explicitly bind IPv4.

Related to #805
2025-12-03 20:25:08 +00:00
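
As a rough illustration of the difference between ":8443" and "0.0.0.0:8443", the Go sketch below rewrites a host-less listen address into an explicit IPv4 wildcard. The helper name `explicitIPv4` is hypothetical; the commit itself only changed the default value.

```go
package main

import (
	"fmt"
	"net"
)

// explicitIPv4 rewrites a host-less listen address such as ":8443" to
// "0.0.0.0:8443". Addresses that already name a host are returned unchanged.
func explicitIPv4(addr string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", fmt.Errorf("invalid listen address %q: %w", addr, err)
	}
	if host == "" {
		host = "0.0.0.0"
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	for _, a := range []string{":8443", "0.0.0.0:8443", "[::1]:8443"} {
		got, err := explicitIPv4(a)
		fmt.Println(a, "->", got, err)
	}
}
```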
rcourtman
0d3c9eb2a4 docs: Add Community Integrations section with Home Assistant addon
Added link to Kosztyk's community-maintained Home Assistant addon for
running the Pulse Docker Agent.

Related to discussion #807
2025-12-03 20:18:49 +00:00
rcourtman
4f23cddcae fix: Handle --http-addr with bind address in sensor-proxy installer
When using --http-addr 0.0.0.0:8443 (to bind to IPv4 only), the URL
construction was broken, producing URLs like https://192.168.31.110.0.0.0:8443.

Now correctly extracts the port number from both ":8443" and "0.0.0.0:8443"
formats using ${HTTP_ADDR##*:} instead of ${HTTP_ADDR#:}.

Related to #805
2025-12-03 20:16:30 +00:00
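
The two shell expansions behave differently: `${HTTP_ADDR#:}` strips a single leading colon, while `${HTTP_ADDR##*:}` keeps only what follows the last colon. The Go sketch below mimics both to show why the first form breaks once a bind address is present. The function names and the example host 192.0.2.10 are illustrative assumptions, not taken from the installer.

```go
package main

import (
	"fmt"
	"strings"
)

// stripLeadingColon mimics ${HTTP_ADDR#:}: it removes one leading ":" only.
func stripLeadingColon(addr string) string {
	return strings.TrimPrefix(addr, ":")
}

// portOf mimics ${HTTP_ADDR##*:}: it keeps everything after the last ":".
// If addr contains no ":", the whole string is returned, as in the shell.
func portOf(addr string) string {
	return addr[strings.LastIndex(addr, ":")+1:]
}

// baseURL builds an HTTPS URL from a host and a listen address.
func baseURL(host, addr string) string {
	return fmt.Sprintf("https://%s:%s", host, portOf(addr))
}

func main() {
	for _, a := range []string{":8443", "0.0.0.0:8443"} {
		fmt.Printf("%-14s old=%-14s new=%s\n", a, stripLeadingColon(a), portOf(a))
	}
	fmt.Println(baseURL("192.0.2.10", "0.0.0.0:8443"))
}
```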
rcourtman
a11e1c1df3 fix: TrueNAS agent binary now runs from /usr/local/bin to avoid noexec
TrueNAS SCALE's /data partition may have exec=off, preventing binaries
from executing. The installer now:
- Stores the binary in /data/pulse-agent/ for persistence
- Copies it to /usr/local/bin (tmpfs, allows exec) for runtime
- Updates the bootstrap script to copy on each boot

Related to #801
2025-12-03 20:14:48 +00:00
rcourtman
774fac9edd fix: Improve TrueNAS detection for immutable filesystem installs
Added fallback detection for TrueNAS systems that may not have
/etc/truenas-version or other standard markers:

1. Check if hostname contains "truenas" (common default hostname)
2. Test if /usr/local/bin is actually writable - if not and /data
   exists, use TrueNAS installation paths

This fixes installations on TrueNAS systems where the standard
detection files are missing but the filesystem is still immutable.

Related to #801
2025-12-03 18:04:10 +00:00
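
The two fallback heuristics can be sketched as follows. This is a Go illustration under stated assumptions: the real installer is a shell script, `looksLikeTrueNAS` and `checkWritable` are hypothetical names, and the case-insensitive hostname match is a guess rather than confirmed behavior.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// checkWritable tries to create and remove a temporary file in dir.
func checkWritable(dir string) error {
	f, err := os.CreateTemp(dir, ".write-test-*")
	if err != nil {
		return fmt.Errorf("%s is not writable: %w", dir, err)
	}
	name := f.Name()
	f.Close()
	return os.Remove(name)
}

// looksLikeTrueNAS applies the fallbacks: a hostname containing "truenas",
// or a non-writable /usr/local/bin combined with an existing /data.
func looksLikeTrueNAS(hostname string, usrLocalBinWritable, dataExists bool) bool {
	if strings.Contains(strings.ToLower(hostname), "truenas") {
		return true
	}
	return !usrLocalBinWritable && dataExists
}

func main() {
	host, _ := os.Hostname()
	_, statErr := os.Stat("/data")
	writable := checkWritable("/usr/local/bin") == nil
	fmt.Println("use TrueNAS paths:", looksLikeTrueNAS(host, writable, statErr == nil))
}
```

Actually attempting a write is more reliable than inspecting permission bits, because a read-only mount can still report writable modes.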
rcourtman
2e7913b258 fix: Use specific checkbox selector in UnifiedAgents test
The test was using getByRole('checkbox') which now matches multiple
elements after adding the "Skip certificate verification" checkbox.
Use name matcher to select the specific Docker monitoring checkbox.
2025-12-03 14:32:05 +00:00
rcourtman
7d10e97888 feat: Add "Skip certificate verification" option for agent install commands
Adds a checkbox in Settings → Host Agents that enables insecure mode for
users running Pulse behind self-signed HTTPS certificates.

When enabled:
- Adds -k flag to curl commands for downloading the install script
- Adds --insecure flag to the agent for connecting back to Pulse

Related to #806
2025-12-03 14:15:17 +00:00
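
For context, an `--insecure` style flag in a Go client usually maps to `InsecureSkipVerify` in the TLS configuration. The sketch below shows that general pattern; `newHTTPClient` and `skipsVerification` are hypothetical helpers, and how the Pulse agent wires its flag internally is not stated in the commit.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// newHTTPClient returns a client that skips certificate verification when
// insecure is true. Only use this for self-signed setups you trust.
func newHTTPClient(insecure bool) *http.Client {
	transport := &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: insecure},
	}
	return &http.Client{Transport: transport, Timeout: 10 * time.Second}
}

// skipsVerification reports whether c was built with verification disabled.
func skipsVerification(c *http.Client) bool {
	t, ok := c.Transport.(*http.Transport)
	return ok && t.TLSClientConfig != nil && t.TLSClientConfig.InsecureSkipVerify
}

func main() {
	fmt.Println(skipsVerification(newHTTPClient(true)))
	fmt.Println(skipsVerification(newHTTPClient(false)))
}
```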
rcourtman
5f747e8e28 fix: Truncate very long container names in Docker table
Add max-width constraint to the resource column in the Docker table to
ensure very long container names (like Kubernetes UUID-based names) are
properly truncated instead of extending the table width.

Related to #789
2025-12-03 14:09:16 +00:00
rcourtman
4c98933175 fix: Filter container overlay mounts in non-standard locations
Detect container overlay filesystem paths from various container runtimes
(Docker, Podman, LXC, EnhanceCP, etc.) that may not be in standard
/var/lib/docker or /var/lib/containers locations.

Paths containing /containers/ with overlay patterns (/overlay2/, /overlay/,
/diff/, /merged) are now filtered from disk usage aggregation.

Related to #790
2025-12-03 14:06:15 +00:00
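
The rule described in this commit (a path containing /containers/ plus one of the overlay patterns) can be sketched like this. The function and variable names are hypothetical, and the actual matching logic in the agent may be stricter.

```go
package main

import (
	"fmt"
	"strings"
)

// overlayMarkers are the overlay patterns named in the commit message.
var overlayMarkers = []string{"/overlay2/", "/overlay/", "/diff/", "/merged"}

// isContainerOverlayPath reports whether mount looks like a container
// overlay filesystem outside the standard runtime directories.
func isContainerOverlayPath(mount string) bool {
	if !strings.Contains(mount, "/containers/") {
		return false
	}
	for _, marker := range overlayMarkers {
		if strings.Contains(mount, marker) {
			return true
		}
	}
	return false
}

func main() {
	for _, m := range []string{
		"/srv/containers/storage/overlay/abc123/merged",
		"/srv/containers/config",
		"/mnt/tank/media",
	} {
		fmt.Println(m, isContainerOverlayPath(m))
	}
}
```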
rcourtman
f3727d8047 ci: Add retry logic for Docker Hub transient failures
2025-12-03 09:39:31 +00:00
rcourtman
b41ab8e39c chore: bump version to 4.36.1
2025-12-03 09:05:48 +00:00
rcourtman
87eb88dd98 fix: sensor-proxy installer fails silently on containers without snapshots
The SNAPSHOT_START extraction used grep in a pipeline with pipefail
enabled. When a container config has no snapshot sections (no lines
starting with '['), grep returns exit code 1, causing set -e to
terminate the script without any error message.

This affected newly created containers that hadn't been snapshotted yet,
which is the common case for fresh Pulse installations via community
scripts.

Related to #780
2025-12-03 09:04:45 +00:00
rcourtman
4b8fbe6ae2 fix: --disable-host flag now correctly disables host monitoring
The install script was not passing the --enable-host=false flag to the
agent when --disable-host was specified. Because the agent enables host
monitoring by default, --disable-host was effectively ignored.

Also adds TrueNAS SCALE support to the unified agent installer:
- Detects TrueNAS SCALE via /etc/truenas-version and other markers
- Installs to /data/pulse-agent (persists across TrueNAS upgrades)
- Creates Init/Shutdown task to restore service after TrueNAS updates
- Adds uninstall support for TrueNAS SCALE

Related to #800, #801
2025-12-03 03:04:03 +00:00
rcourtman
4dd335e74c docs: Fix corrupted section in TEMPERATURE_MONITORING.md
Removed garbled Terraform code that was accidentally merged into the
"How It Works" section, likely from a copy-paste error or merge conflict.
2025-12-02 23:48:16 +00:00
rcourtman
eed045bd15 docs: Remove reference to non-existent security script
The SECURITY.md referenced /opt/pulse/testing-tools/security-verification.sh
which does not exist. Replaced with a manual verification checklist.
2025-12-02 23:47:21 +00:00
rcourtman
5d165fc055 docs: Fix CONFIGURATION.md - logFormat not in system.json
The logFormat setting is only available via LOG_FORMAT environment
variable, not in system.json. Updated the example and added a note
clarifying this. Also added LOG_FORMAT to the environment variables
table.
2025-12-02 23:43:45 +00:00
rcourtman
9de0c1cdb1 docs: Fix rollback instructions in INSTALL.md
The doc claimed that a "Restore previous version" button exists in the
Settings UI, but no such button exists. The rollback API endpoint exists in backend code
but has no UI. Updated to reflect actual behavior: backups are created
during systemd updates and can be restored manually.
2025-12-02 23:42:05 +00:00
rcourtman
8ad02ce048 docs: Remove non-existent LXC installation references
- FAQ.md: Replace LXC installer one-liner with Docker quick start
- MIGRATION.md: Replace LXC mention with Kubernetes
- README.md: Remove "Proxmox LXC" from installation methods list

The install.sh script is a unified agent installer, not an LXC
container creator. Pulse server installation is via Docker,
Kubernetes helm, or manual systemd setup.
2025-12-02 23:40:00 +00:00
rcourtman
96da426a63 docs: Fix DOCKER.md - Alpine-based image with shell access
2025-12-02 23:38:37 +00:00
rcourtman
bf619b9628 docs: Fix /api/storage endpoint path in API.md
2025-12-02 23:37:59 +00:00
rcourtman
8a54156632 docs: Remove LXC references from CONFIGURATION.md
2025-12-02 23:37:11 +00:00
rcourtman
aa2023c533 Fix INSTALL.md inaccuracies
- Remove non-existent Proxmox LXC installer section (install.sh is actually
  the unified agent installer, not an LXC container creator)
- Fix Helm install command to use GitHub Pages repo instead of non-existent
  OCI registry
- Add proper systemd installation instructions with actual commands
- Remove non-existent CLI commands (pulse config rollback, pulse-update.timer)
- Add Kubernetes update/uninstall commands
- Add sudo where needed for systemd commands
2025-12-02 23:36:32 +00:00
rcourtman
a40d5e0f5e Fix inaccurate architecture documentation
- Correct connection methods: Pulse uses REST APIs for PVE/PBS (not SSH)
- Update diagram to show HTTPS API connections on ports 8006/8007
- Add agent push model for Docker/Host metrics collection
- Remove incorrect SSH connection pooling references
- Update data flow to reflect API polling and agent push
2025-12-02 23:32:14 +00:00
rcourtman
d0d989289a Refactor alert system: fix race conditions, memory leaks, and improve code quality
- Rename checkFlapping to checkFlappingLocked to clarify lock contract
- Replace goto statements with structured control flow
- Wire up unused recordAlertFired/recordAlertResolved metric hooks
- Add trackingMapCleanup goroutine to prevent memory leaks from stale entries
- Tighten alert ID validation to alphanumeric + safe punctuation
- Fix history save error handling to properly manage backup lifecycle
- Add auto-migration for deprecated GroupingWindow field
- Refactor 300+ line UpdateConfig into focused helper functions
- Unify duplicate evaluateVMCondition/evaluateContainerCondition
- Add constants for magic numbers (thresholds, timing, flapping)
- Update tests to match new backup behavior
2025-12-02 23:31:36 +00:00
rcourtman
da43588189 Update docs and helm chart for agent health endpoints
- Add health-addr config option to UNIFIED_AGENT.md
- Document /healthz, /readyz, /metrics endpoints
- Add Kubernetes probe examples to docs
- Add liveness/readiness probes to helm chart agent template
- Add healthPort, livenessProbe, readinessProbe to values.yaml
- Update values.schema.json with new agent probe options
2025-12-02 22:45:24 +00:00
rcourtman
7fc15417e4 Add health/metrics server and proper cleanup to unified agent
- Add /healthz (liveness) and /readyz (readiness) endpoints
- Add /metrics endpoint with Prometheus metrics (pulse_agent_info, pulse_agent_up)
- Properly call dockerAgent.Close() on shutdown
- New config: -health-addr flag and PULSE_HEALTH_ADDR env (default :9191)
- Set to empty string to disable health server
2025-12-02 22:42:05 +00:00
rcourtman
b4a33c4f2d Fix offline buffering: add tests, remove unused config, fix flaky test
- Add unit tests for internal/buffer package
- Fix misleading "ring buffer" comment (it's a bounded FIFO queue)
- Remove unused BufferCapacity config field from both agents
- Rewrite flaky integration test to use polling instead of fixed sleeps
2025-12-02 22:31:44 +00:00
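
To illustrate the "bounded FIFO queue" wording, here is a generic sketch in Go. It assumes the oldest item is dropped when the queue is full, which the commit does not confirm. The type and method names are hypothetical and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// Queue is a bounded FIFO. When full, Push drops the oldest item
// (an assumption about the eviction policy).
type Queue[T any] struct {
	items    []T
	capacity int
}

// NewQueue returns an empty queue holding at most capacity items.
func NewQueue[T any](capacity int) *Queue[T] {
	return &Queue[T]{capacity: capacity}
}

// Push appends item, evicting the oldest entry if the queue is full.
func (q *Queue[T]) Push(item T) {
	if q.capacity <= 0 {
		return
	}
	if len(q.items) == q.capacity {
		q.items = q.items[1:]
	}
	q.items = append(q.items, item)
}

// Pop removes and returns the oldest item, or false if the queue is empty.
func (q *Queue[T]) Pop() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

func main() {
	q := NewQueue[string](2)
	for _, report := range []string{"r1", "r2", "r3"} {
		q.Push(report)
	}
	for {
		report, ok := q.Pop()
		if !ok {
			break
		}
		fmt.Println(report)
	}
}
```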
courtmanr@gmail.com
caf0c10206 feat: Implement offline buffering for host and docker agents
- Add internal/buffer package with generic ring buffer
- Add buffering logic to host agent for failed reports
- Add buffering logic to docker agent for failed reports
- Add BufferCapacity configuration option
- Add integration tests for buffering logic
2025-12-02 22:12:47 +00:00
rcourtman
bda8056e48 Add refresh-cluster button to detect new Proxmox cluster members
When new nodes are added to a Proxmox cluster after Pulse was
initially configured, they weren't showing up in Settings. The
existing "Refresh" button only triggered network discovery, not
cluster membership re-detection.

Changes:
- Add POST /api/config/nodes/{id}/refresh-cluster endpoint
- Add "Refresh" button in cluster node panel in Settings
- Re-detect cluster membership and update stored endpoints

Related to #799
2025-12-02 22:01:00 +00:00
courtmanr@gmail.com
19c00feced Link ARCHITECTURE.md in SECURITY and DEV-QUICK-START guides
2025-12-02 20:51:37 +00:00
courtmanr@gmail.com
0d2f035292 Update docs: Unified Agent, Migration checklist, and cleanup
2025-12-02 20:49:34 +00:00
courtmanr@gmail.com
3c92c38b27 Update docs with missing config, API endpoints, and Docker Compose
2025-12-02 20:46:21 +00:00
courtmanr@gmail.com
4e0d971fa9 Link ARCHITECTURE.md in documentation
2025-12-02 20:41:39 +00:00
courtmanr@gmail.com
afcc1267bb Add ARCHITECTURE.md system design documentation
2025-12-02 20:40:31 +00:00
rcourtman
d9833cf6b0 fix: Resolve TypeScript errors in StackedMemoryBar and Settings
StackedMemoryBar.tsx:
- Fixed 'props.balloon' possibly undefined error by adding fallback
  to second comparison in Show condition

Settings.tsx:
- Fixed 'systemSettings' scope error by using updateChannel() signal
  instead of referencing out-of-scope variable from previous try block

Both files now pass strict TypeScript checks.
2025-12-02 20:37:44 +00:00
rcourtman
4c5b515cba fix: Update Mail Gateway disconnect state for consistency
Changed from warning (amber) to danger (red) tone and added:
- Dynamic description based on reconnecting status
- Manual "Reconnect now" button when not auto-reconnecting
- Consistent "Connection lost" title

All 7 major pages now have unified connection lost UX:
Dashboard, Storage, Backups, Replication, Hosts, Docker, Mail Gateway
2025-12-02 20:34:32 +00:00
rcourtman
1af0740de2 fix: Add connection lost indicator to Docker page
Docker/Containers page now shows a clear error state when WebSocket
connection is lost, with a manual "Reconnect now" button. This
matches the pattern established across all other major pages.

Connection lost UX is now consistent across: Dashboard, Storage,
Backups, Replication, Hosts, and Docker.
2025-12-02 20:31:59 +00:00
rcourtman
f0ff21ca1b fix: Add connection lost indicator to Hosts page
Hosts page now shows a clear error state when WebSocket connection
is lost, with a manual "Reconnect now" button. Also improved loading
state logic to differentiate between initial loading and connection
loss after having received data.

This completes the connection lost UX consistency across all major
pages: Dashboard, Storage, Backups, Replication, and now Hosts.
2025-12-02 20:29:15 +00:00
rcourtman
272d582262 fix: Add reconnect button to Backups and Replication pages
Both pages now show a consistent disconnect state with:
- Dynamic description based on reconnecting status
- Manual "Reconnect now" button when not auto-reconnecting

This matches the Dashboard and Storage page behavior, providing a
consistent UX across all main pages when connection is lost.
2025-12-02 20:24:34 +00:00
rcourtman
39f8a9f42c fix: Add connection lost indicator to Storage page
Storage page now shows a clear error state when WebSocket connection
is lost, matching the Dashboard's behavior. Users see the issue and
can manually reconnect instead of wondering why data isn't updating.
2025-12-02 20:20:39 +00:00
rcourtman
7ad4ccba49 fix: Correct lastBackup TypeScript type from string to number
The backend sends lastBackup as Unix milliseconds (int64), not as an
ISO string. Update VM and Container interfaces to match the actual
JSON payload.

The getBackupInfo() function already handles both string and number
types, so this is a type-safety fix that aligns types with reality.
2025-12-02 20:15:35 +00:00
rcourtman
a3e60cdd85 chore: Remove unused usePersistentSignal import
Cleanup from Settings sidebar change - the import was left behind
when switching from usePersistentSignal to createSignal.
2025-12-02 20:09:54 +00:00
rcourtman
d620de147a fix: Settings sidebar always starts expanded for discoverability
The sidebar no longer persists its collapsed state to localStorage.
Each visit to Settings starts with the sidebar expanded, showing
all menu labels for better discoverability by new users.

Users can still collapse the sidebar during their session if they
want more space, but it will reset to expanded on page reload.

Related to #764
2025-12-02 20:03:54 +00:00
rcourtman
9e38e4a6f0 feat: Add 'Needs Backup' filter to Dashboard
Add a new filter button that shows only guests with stale, critical,
or missing backups. This makes it easy to identify which VMs and
containers need attention for backup scheduling.

- Adds backupMode state with 'all' and 'needs-backup' options
- Filters out templates (they don't need backups)
- Uses existing getBackupInfo() thresholds (>24h stale, >72h critical)
- Integrates with Reset button and Escape key handling
- Persists filter state in localStorage

Related to #762
2025-12-02 20:00:33 +00:00
rcourtman
abf64c4ed3 fix: Constrain Docker drawer width to force card wrapping
The previous fix (2078421d) added overflow-hidden but didn't address
the root cause: the drawer div inside overflow-x-auto context had no
width constraint, so flex-wrap saw infinite space and didn't wrap.

Adding w-0 min-w-full forces the div to take exactly 100% of parent
width, which properly constrains flex-wrap to wrap cards within the
visible viewport.

Related to #789
2025-12-02 18:02:28 +00:00