Monitoring with Prometheus
This guide walks you through configuring monitoring for the Flux control plane.
Flux uses kube-prometheus-stack to provide a monitoring stack consisting of:
- Prometheus Operator - manages Prometheus clusters atop Kubernetes
- Prometheus - collects metrics from the Flux controllers and Kubernetes API
- Grafana dashboards - display the Flux control plane resource usage and reconciliation stats
- kube-state-metrics - generates metrics about the state of the Kubernetes objects
Install the kube-prometheus-stack
To install the monitoring stack with Flux, first register the toolkit Git repository on your cluster:
flux create source git monitoring \
--interval=30m \
--url=https://github.com/fluxcd/flux2 \
--branch=main
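If you prefer to keep the source definition in Git instead of creating it imperatively, the same command can print the manifest rather than apply it. The sketch below relies on the `--export` flag of the Flux CLI; the helper name `export_monitoring_source` and the output file name in the usage note are illustrative assumptions.

```shell
# Hypothetical helper: prints the GitRepository manifest for the
# "monitoring" source instead of creating it on the cluster.
export_monitoring_source() {
  flux create source git monitoring \
    --interval=30m \
    --url=https://github.com/fluxcd/flux2 \
    --branch=main \
    --export
}
```

For example, `export_monitoring_source > monitoring-source.yaml` writes the manifest to a file that you can commit to your own repository.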
Then apply the manifests/monitoring/kube-prometheus-stack kustomization:
flux create kustomization monitoring-stack \
--interval=1h \
--prune=true \
--source=monitoring \
--path="./manifests/monitoring/kube-prometheus-stack" \
--health-check="Deployment/kube-prometheus-stack-operator.monitoring" \
--health-check="Deployment/kube-prometheus-stack-grafana.monitoring"
The above Kustomization will install the kube-prometheus-stack in the monitoring namespace.
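To confirm that the stack was reconciled, you can check the Kustomization and wait for the two Deployments named in the health checks. This is a minimal sketch: the helper name `check_monitoring_stack` and the two-minute timeout are assumptions, not part of Flux.

```shell
# Hypothetical helper: verifies that the monitoring stack is up.
check_monitoring_stack() {
  # Show the Ready condition of the Kustomization created above.
  flux get kustomizations monitoring-stack || return 1
  # Wait for the Deployments used as health checks to finish rolling out.
  kubectl -n monitoring rollout status \
    deployment/kube-prometheus-stack-operator --timeout=2m || return 1
  kubectl -n monitoring rollout status \
    deployment/kube-prometheus-stack-grafana --timeout=2m || return 1
}
```

Running `check_monitoring_stack` returns a non-zero exit code as soon as one of the checks fails.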
Prometheus Configuration
Note that the above configuration is not suitable for production. To configure long-term storage for metrics and high availability for Prometheus, consult the Helm chart documentation.
Install Flux Grafana dashboards
Note that the Flux controllers expose the /metrics endpoint on port 8080.
When using Prometheus Operator you need a PodMonitor object to configure scraping for the controllers.
Apply the manifests/monitoring/monitoring-config kustomization, which contains the PodMonitor and the ConfigMap with Flux’s Grafana dashboards:
flux create kustomization monitoring-config \
--interval=1h \
--prune=true \
--source=monitoring \
--path="./manifests/monitoring/monitoring-config"
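For readers running their own Prometheus Operator, a PodMonitor of the kind mentioned above might look roughly like the following. Treat it as a sketch under stated assumptions, not the manifest shipped in the repository: the `flux-system` namespace, the `app` pod label, the list of controllers, and the `http-prom` port name are assumptions that you should check against your installation. The helper name `print_flux_podmonitor` is hypothetical.

```shell
# Hypothetical helper: prints an illustrative PodMonitor for the Flux controllers.
print_flux_podmonitor() {
  cat <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: flux-system
  namespace: flux-system
spec:
  namespaceSelector:
    matchNames:
      - flux-system        # assumption: Flux runs in the flux-system namespace
  selector:
    matchExpressions:
      - key: app           # assumption: pods are labelled app=<controller name>
        operator: In
        values:
          - source-controller
          - kustomize-controller
          - helm-controller
          - notification-controller
  podMetricsEndpoints:
    - port: http-prom      # assumption: container port name serving /metrics on 8080
EOF
}
```

You can inspect the result with `print_flux_podmonitor` or redirect it to a file. Whether Prometheus picks the object up also depends on the PodMonitor selectors configured on your Prometheus instance.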
You can access Grafana using port forwarding:
kubectl -n monitoring port-forward svc/kube-prometheus-stack-grafana 3000:80
To log in to the Grafana dashboard, you can use the default credentials from the kube-prometheus-stack chart:
username: admin
password: prom-operator
Flux dashboards
Control plane dashboard http://localhost:3000/d/flux-control-plane:


Cluster reconciliation dashboard http://localhost:3000/d/flux-cluster:

If you wish to use your own Prometheus and Grafana instances, then you can import the dashboards from GitHub.
Metrics
For each toolkit.fluxcd.io kind, the controllers expose a gauge metric to track the Ready condition status, and a histogram with the reconciliation duration in seconds.
Ready status metrics:
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="True"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="False"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Unknown"}
gotk_reconcile_condition{kind, name, namespace, type="Ready", status="Deleted"}
Suspend status metrics:
gotk_suspend_status{kind, name, namespace}
Time spent reconciling:
gotk_reconcile_duration_seconds_bucket{kind, name, namespace, le}
gotk_reconcile_duration_seconds_sum{kind, name, namespace}
gotk_reconcile_duration_seconds_count{kind, name, namespace}
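To inspect these metrics without Prometheus, you can read a controller's /metrics endpoint directly and filter the output. The sketch below assumes the plain Prometheus text format with the sample value as the last field on each line; `failing_objects` is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and prints
# the gotk_reconcile_condition series with type="Ready", status="False" whose
# value is 1, that is, objects currently failing reconciliation.
failing_objects() {
  grep '^gotk_reconcile_condition{' \
    | grep 'type="Ready"' \
    | grep 'status="False"' \
    | awk '$NF == 1 { print $1 }'
}
```

Assuming Flux runs in the `flux-system` namespace and the Deployment is named `source-controller`, you could run `kubectl -n flux-system port-forward deploy/source-controller 8080:8080` in one terminal and `curl -s http://localhost:8080/metrics | failing_objects` in another.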
Alertmanager example (Prometheus alerting rule):
groups:
- name: GitOpsToolkit
  rules:
  - alert: ReconciliationFailure
    expr: max(gotk_reconcile_condition{status="False",type="Ready"}) by (namespace, name, kind) + on(namespace, name, kind) (max(gotk_reconcile_condition{status="Deleted"}) by (namespace, name, kind)) * 2 == 1
    for: 10m
    labels:
      severity: page
    annotations:
      summary: '{{ $labels.kind }} {{ $labels.namespace }}/{{ $labels.name }} reconciliation has been failing for more than ten minutes.'
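The arithmetic in the rule's expression can be read as follows: the value of the status="False" series is added to twice the value of the status="Deleted" series, and the alert matches only when the sum equals 1. In other words, it fires for objects that are failing and have not been deleted. The sketch below replays that arithmetic for all four combinations; `alert_fires` is a hypothetical helper, not something Prometheus provides.

```shell
# Hypothetical helper mirroring the expression's arithmetic.
# $1: value of the status="False" series (0 or 1)
# $2: value of the status="Deleted" series (0 or 1)
alert_fires() {
  # False + Deleted * 2 equals 1 only when False=1 and Deleted=0.
  [ $(( $1 + $2 * 2 )) -eq 1 ]
}

for failing in 0 1; do
  for deleted in 0 1; do
    if alert_fires "$failing" "$deleted"; then result=fires; else result=silent; fi
    echo "False=$failing Deleted=$deleted -> $result"
  done
done
```

Only the combination False=1, Deleted=0 prints `fires`; a deleted object yields a sum of 2 or 3 and stays silent.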