Pulse/docs/KUBERNETES.md
rcourtman ab62b46c1f Fix helm chart agent.enabled by routing through main pulse image
The chart's agent.image.repository defaulted to ghcr.io/rcourtman/pulse-agent,
an image that has never been published. publish-docker.yml only pushes
rcourtman/pulse; the Dockerfile defines an agent_runtime stage that
*could* be published but it isn't, and commit da7969fb4 from earlier in
this session removed the corresponding pulse-agent attestation
expectations — a clear signal the separate agent image was intentionally
dropped without updating the chart. Customers running
`helm install pulse pulse/pulse --set agent.enabled=true` were silently
hitting ImagePullBackOff on the agent DaemonSet.

Route the chart through the main rcourtman/pulse image instead. To make
that work without per-arch chart overrides, the runtime stage in the
Dockerfile now creates an arch-resolved /usr/local/bin/pulse-agent
symlink to the right /opt/pulse/bin/pulse-agent-linux-{amd64,arm64,armv7}
binary. The chart's agent.command default is /usr/local/bin/pulse-agent,
which overrides the server ENTRYPOINT and runs the pod as a unified
agent on whichever arch the node provides. agent.yaml renders the
command via toYaml so list values pass through cleanly.

KUBERNETES.md's DaemonSet example switches from the arch-hardcoded
/opt/pulse/bin/pulse-agent-linux-amd64 to the new arch-resolved path,
restoring multi-arch portability of the docs example.
validate-release.sh asserts the symlink exists, points at one of the
three supported Linux arch binaries, and is executable in the published
image. A new TestHelmAgentRuntimePointsAtRealImage pins the chart
defaults, the template wiring, the Dockerfile symlink, and the
validate-release.sh guard so the regression class can't quietly
resurface.

Governance: extend the helm-chart-release-runtime verification policy's
exact_files to include scripts/installtests/build_release_assets_test.go
(matching its existing pin set for related deployment-installability
policies); update the subsystem_lookup_test.py fixture that pins the
exact_files list; document the agent-image and pulse-agent symlink
contract in deployment-installability.md Extension Point 7.

Verified locally: `helm lint` passes; `helm template --set agent.enabled=true`
renders a DaemonSet with image rcourtman/pulse:6.0.0,
command ["/usr/local/bin/pulse-agent"], args ["--enable-docker", "--enable-host=false"].
End-to-end image build + agent DaemonSet smoke will run via helm_smoke
on the next release once rcourtman/pulse:6.0.0 is published.
2026-05-12 16:11:56 +01:00


# Pulse on Kubernetes
This guide explains how to deploy the Pulse Server (Hub) and Pulse Agents on Kubernetes clusters, including immutable distributions like Talos Linux.
> **Navigation note (v6):** Kubernetes cluster and node resources appear on the **Infrastructure** page, while pods appear on the **Workloads** page. The legacy `/kubernetes` URL redirects to `/workloads?type=k8s`.
## Prerequisites
- A Kubernetes cluster (v1.19+)
- `helm` (v3+) installed locally
- `kubectl` configured to talk to your cluster
## 1. Deploying the Pulse Server
The Pulse Server is the central hub that collects metrics and manages agents.
### Option A: Using Helm (Recommended)
1. Add the Pulse Helm repository:
```bash
helm repo add pulse https://rcourtman.github.io/Pulse
helm repo update
```
2. Install the chart:
```bash
helm upgrade --install pulse pulse/pulse \
  --namespace pulse \
  --create-namespace \
  --set persistence.enabled=true \
  --set persistence.size=10Gi
```
> **Note**: For production, set an appropriate `persistence.storageClass`. If the volume is ReadWriteOnce (RWO), also set `strategy.type=Recreate`: the chart's default `strategy.type` is `RollingUpdate`, which can hit Multi-Attach errors with RWO PVCs during upgrades.
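
The production settings from the note can be kept in a values file instead of a growing list of `--set` flags. This is a minimal sketch, not the chart's documented defaults: the file name `values-prod.yaml` and the StorageClass name `fast-ssd` are hypothetical placeholders, and only the keys already named in this section are used.

```yaml
# values-prod.yaml (hypothetical file name)
persistence:
  enabled: true
  size: 10Gi
  # Assumption: replace with a StorageClass that exists in your cluster.
  storageClass: "fast-ssd"
strategy:
  # Recreate stops the old pod before starting the new one, so an RWO
  # volume is never attached to two pods at the same time.
  type: Recreate
```

Pass the file to the install command with `-f values-prod.yaml` in place of the individual `--set persistence.*` flags.
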
### Option B: Generating Static Manifests (For Talos / GitOps)
If you cannot use Helm directly on the cluster (e.g., restricted Talos environment), you can generate standard Kubernetes YAML manifests:
```bash
helm repo add pulse https://rcourtman.github.io/Pulse
helm repo update
helm template pulse pulse/pulse \
  --namespace pulse \
  --set persistence.enabled=true \
  > pulse-server.yaml
```
You can then apply this file:
```bash
kubectl apply -f pulse-server.yaml
```
## 2. Deploying the Pulse Agent
### Helm Chart Agent Mode
The Helm chart includes an optional `agent` section that deploys the unified `pulse-agent`.
By default, this workload runs in container-monitoring mode (`--enable-docker --enable-host=false`).
For Kubernetes monitoring, use a custom DaemonSet as shown below.
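
Enabling the chart's optional agent comes down to a single value. The sketch below assumes the `agent.enabled` key and a hypothetical file name `values-agent.yaml`; it leaves the agent's command and arguments at the chart defaults, which give the container-monitoring mode described above.

```yaml
# values-agent.yaml (hypothetical file name)
agent:
  # Turns on the chart-managed pulse-agent workload. With the default
  # arguments it runs in container-monitoring mode, not Kubernetes mode.
  enabled: true
```

This is equivalent to passing `--set agent.enabled=true` on the Helm command line.
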
### Unified Agent on Kubernetes (DaemonSet)
To monitor Kubernetes resources, run the unified agent as a DaemonSet and enable the Kubernetes module.
**Recommended options:**
- **Kubernetes-only monitoring**: `PULSE_ENABLE_KUBERNETES=true` and `PULSE_ENABLE_HOST=false` (no host mounts required).
- **Kubernetes + node metrics**: `PULSE_ENABLE_KUBERNETES=true` and `PULSE_ENABLE_HOST=true` (requires host mounts and privileged mode).
#### Minimal DaemonSet Example
This uses the main `rcourtman/pulse` image but runs the `pulse-agent` binary directly.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: pulse-agent
  namespace: pulse
spec:
  selector:
    matchLabels:
      app: pulse-agent
  template:
    metadata:
      labels:
        app: pulse-agent
    spec:
      serviceAccountName: pulse-agent
      containers:
        - name: pulse-agent
          image: rcourtman/pulse:latest
          # /usr/local/bin/pulse-agent is an arch-resolved symlink in the
          # main Pulse image, so this manifest works on both amd64 and
          # arm64 nodes without changes.
          command: ["/usr/local/bin/pulse-agent"]
          args:
            - --enable-kubernetes
          env:
            - name: PULSE_URL
              value: "http://pulse-server.pulse.svc.cluster.local:7655"
            - name: PULSE_TOKEN
              value: "YOUR_API_TOKEN_HERE"
            - name: PULSE_AGENT_ID
              value: "my-k8s-cluster"
            - name: PULSE_ENABLE_HOST
              value: "false"
            - name: PULSE_KUBE_INCLUDE_ALL_PODS
              value: "true"
            - name: PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS
              value: "true"
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              memory: 512Mi
      tolerations:
        - operator: Exists
```
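> **Note for ARM64 clusters**: No changes are needed. `/usr/local/bin/pulse-agent` is a symlink that resolves to the agent binary for the image's architecture, so the manifest contains no `pulse-agent-linux-amd64` path to replace.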
Use a token scoped for the agent:
- `kubernetes:report` for Kubernetes reporting
- `agent:report` if you enable host metrics
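
The minimal example puts the token directly in the manifest. One common alternative is to keep it in a Kubernetes Secret and reference it from the DaemonSet. This is a sketch under assumptions: the Secret name `pulse-agent-token` and the key `token` are hypothetical and not defined anywhere in Pulse.

```yaml
# Hypothetical Secret holding the agent token.
apiVersion: v1
kind: Secret
metadata:
  name: pulse-agent-token
  namespace: pulse
type: Opaque
stringData:
  token: "YOUR_API_TOKEN_HERE"
```

The `PULSE_TOKEN` entry in the DaemonSet `env` list would then read the value from that Secret:

```yaml
- name: PULSE_TOKEN
  valueFrom:
    secretKeyRef:
      name: pulse-agent-token   # hypothetical Secret name from above
      key: token
```
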
#### Important DaemonSet Configuration
##### PULSE_AGENT_ID (Required for DaemonSets)
When running as a DaemonSet, all pods share the same API token but need a unified identity. Without `PULSE_AGENT_ID`, each pod auto-generates a unique ID (e.g., `mac-xxxxx`), causing token conflicts:
```text
API token is already in use by agent "mac-aa5496fed726". Each Kubernetes agent must use a unique API token.
```
Set `PULSE_AGENT_ID` to a shared cluster name so all pods report as one logical agent:
```yaml
- name: PULSE_AGENT_ID
  value: "my-k8s-cluster"
```
##### Resource Visibility Flags
By default, Pulse only shows resources with problems (unhealthy pods, failing deployments). To see all resources:
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `PULSE_KUBE_INCLUDE_ALL_PODS` | Show all non-succeeded pods, not just problematic ones | `false` |
| `PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS` | Show all deployments, not just those with issues | `false` |
For most monitoring use cases, set both to `true`:
```yaml
- name: PULSE_KUBE_INCLUDE_ALL_PODS
  value: "true"
- name: PULSE_KUBE_INCLUDE_ALL_DEPLOYMENTS
  value: "true"
```
See [UNIFIED_AGENT.md](UNIFIED_AGENT.md) for all available configuration options.
#### Add Host Metrics (Optional)
If you want node CPU/memory/disk metrics, add privileged mode plus host mounts:
```yaml
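# Container-level settings: merge into the pulse-agent container.
env:
  - name: PULSE_ENABLE_HOST
    value: "true"
  - name: HOST_PROC
    value: "/host/proc"
  - name: HOST_SYS
    value: "/host/sys"
  - name: HOST_ETC
    # The host's /etc is reached through the host-root mount below.
    value: "/host/root/etc"
securityContext:
  # Kubernetes rejects privileged: true combined with
  # allowPrivilegeEscalation: false, so remove that setting from the
  # minimal example when enabling privileged mode.
  privileged: true
volumeMounts:
  - name: host-proc
    mountPath: /host/proc
    readOnly: true
  - name: host-sys
    mountPath: /host/sys
    readOnly: true
  - name: host-root
    mountPath: /host/root
    readOnly: true
# Pod-level setting: place next to `containers` in the pod spec.
volumes:
  - name: host-proc
    hostPath:
      path: /proc
  - name: host-sys
    hostPath:
      path: /sys
  - name: host-root
    hostPath:
      path: /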
```
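
Privileged containers and `hostPath` volumes are rejected by the `baseline` and `restricted` Pod Security Standards. If your cluster enforces one of those levels through the built-in Pod Security admission controller (an assumption about your cluster, not something Pulse requires), the namespace needs to allow privileged pods before the host-metrics DaemonSet can start. A minimal sketch:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: pulse
  labels:
    # Allows privileged pods and hostPath volumes in this namespace only.
    pod-security.kubernetes.io/enforce: privileged
```

This is only needed for the host-metrics variant; the Kubernetes-only agent runs without privileged mode or host mounts.
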
#### RBAC
The Kubernetes agent uses the in-cluster API and needs read access to cluster resources (nodes, pods, deployments, etc.). Create a read-only `ClusterRole` and bind it to the `pulse-agent` service account.
```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pulse-agent
  namespace: pulse
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: pulse-agent-read
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
  # Optional (Recovery): VolumeSnapshots and Velero backups.
  # These rules are safe to include even if the APIs are not installed; the agent will
  # feature-detect and ignore 404/403 responses.
  - apiGroups: ["snapshot.storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["velero.io"]
    resources: ["backups"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pulse-agent-read
subjects:
  - kind: ServiceAccount
    name: pulse-agent
    namespace: pulse
roleRef:
  kind: ClusterRole
  name: pulse-agent-read
  apiGroup: rbac.authorization.k8s.io
```
## 3. Talos Linux Specifics
Talos Linux is immutable, so you cannot install the agent via the shell script. Use the DaemonSet approach above.
### Agent Configuration for Talos
- **Storage**: Talos mounts the ephemeral OS on `/`, and persistent data usually lives under `/var`. The Pulse agent generally doesn't store state; if you do need it to keep data, map that data to a persistent path.
- **Network**: The agent reports the Pod IP by default. To report the Node IP instead, set `PULSE_REPORT_IP` from the Downward API by adding this to the DaemonSet `env` section:
```yaml
- name: PULSE_REPORT_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
```
## 4. Troubleshooting
- **Agent not showing in UI**: Check logs for the DaemonSet pods, for example: `kubectl logs -l app=pulse-agent -n pulse`.
- **"Permission Denied" on metrics**: Host metrics need elevated access. Ensure `securityContext.privileged: true` is set on the agent container, or that the capabilities it needs are added.
- **Connection Refused**: Ensure `PULSE_URL` is correct and reachable from the agent pods.
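
To check the "Connection Refused" case from inside the cluster, a throwaway pod can request the same URL the agents use. This is a sketch under assumptions: the pod name is hypothetical, `curlimages/curl` stands in for any image that ships `curl`, and the URL is the `PULSE_URL` value from the DaemonSet example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pulse-connectivity-check   # hypothetical name
  namespace: pulse
spec:
  restartPolicy: Never
  containers:
    - name: curl
      image: curlimages/curl:latest   # assumption: any image with curl works
      command: ["curl"]
      args:
        - --silent
        - --show-error
        - --max-time
        - "10"
        - --output
        - /dev/null
        - --write-out
        - "%{http_code}\n"
        # Same value as PULSE_URL in the DaemonSet example.
        - http://pulse-server.pulse.svc.cluster.local:7655
```

After applying it, `kubectl logs pulse-connectivity-check -n pulse` shows either an HTTP status code, which means the service is reachable from the pod network, or a `curl` error that points at DNS or network problems. Delete the pod once you have the result.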