This is the eleventh article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. A cluster you cannot see is a cluster you cannot trust. This article gives you eyes on the system with Prometheus for collecting metrics and Grafana for visualizing them.
What to monitor and why
For a ClickHouse cluster, the metrics that matter most are query throughput and latency, memory usage, disk space, the number of active data parts (a high count signals merge pressure), and replication health (whether replicas are keeping up through Keeper). Watching these lets you catch a full disk, a memory-hungry query, or a lagging replica before it becomes an outage.
How metrics get out of ClickHouse
Two sources feed your monitoring. First, the operator runs a metrics exporter that scrapes every cluster it manages and republishes the data in Prometheus format. It is exposed by a Service called clickhouse-operator-metrics in the kube-system namespace on port 8888. Second, the operator annotates the ClickHouse pods so that a Prometheus instance configured for Kubernetes service discovery will find and scrape them automatically.
You can see the operator metrics directly. Port-forward the service and open the endpoint:
kubectl -n kube-system port-forward service/clickhouse-operator-metrics 8888
Then visit http://localhost:8888/metrics in a browser. The wall of text you see is the raw metric data that Prometheus will collect.
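If you want to poke at that text programmatically, the following is a minimal Python sketch that parses lines in the Prometheus text exposition format into (name, labels, value) tuples. The metric names and labels in the sample are illustrative assumptions, not a guaranteed list of what the exporter emits, and the parser is deliberately simplified: it ignores timestamps and does not handle escaped quotes inside label values.

```python
import re

# One sample line looks like: name{label="value",...} 12.5
# Simplified pattern: no timestamps, no escaped quotes inside label values.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the # HELP / # TYPE comment lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore anything this simplified pattern cannot read
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# Hypothetical excerpt; the real metric names and labels may differ.
EXAMPLE = """\
# HELP chi_clickhouse_metric_Query Number of executing queries
# TYPE chi_clickhouse_metric_Query gauge
chi_clickhouse_metric_Query{chi="demo",hostname="chi-demo-0-0"} 3
chi_clickhouse_metric_MemoryTracking{chi="demo",hostname="chi-demo-0-0"} 1.5e+09
"""

if __name__ == "__main__":
    for name, labels, value in parse_metrics(EXAMPLE):
        print(name, labels.get("hostname"), value)
```

In practice you would feed it the body returned by http://localhost:8888/metrics while the port-forward is running; Prometheus itself does this parsing for you, so the sketch is only for exploration.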
Step 1: Install Prometheus and Grafana
The simplest way to get both at once is the community kube-prometheus-stack Helm chart, which bundles Prometheus, Grafana, and Alertmanager and wires them together. Install it into a dedicated namespace:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm install monitoring prometheus-community/kube-prometheus-stack -n monitoring
Give it a minute, then confirm the pods are running:
kubectl get pods -n monitoring
You will see Prometheus, Grafana, and Alertmanager pods come up. If you prefer, Altinity also publishes ready-made Prometheus and Grafana manifests in the operator repository, but the Helm stack is the gentlest starting point.
Step 2: Tell Prometheus to scrape the operator
Prometheus needs to know about the operator's metrics endpoint. With the Prometheus Operator that the stack installs, you express a scrape target as a small ServiceMonitor resource pointing at the operator's metrics Service. Save this as operator-monitor.yaml:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: clickhouse-operator
  namespace: monitoring
  labels:
    release: monitoring
spec:
  namespaceSelector:
    matchNames:
      - kube-system
  selector:
    matchLabels:
      app: clickhouse-operator
  endpoints:
    - port: metrics
      path: /metrics
Then apply it:
kubectl apply -f operator-monitor.yaml
The release: monitoring label is what lets the stack's Prometheus pick up this ServiceMonitor. The port value must match the name of the port on the operator's metrics Service; if the target never appears, check the name with kubectl -n kube-system get service clickhouse-operator-metrics -o yaml and adjust it. Once applied, Prometheus begins scraping ClickHouse metrics through the operator. You can confirm by port-forwarding Prometheus and checking its targets page:
kubectl -n monitoring port-forward svc/monitoring-kube-prometheus-prometheus 9090
Open http://localhost:9090/targets and look for the ClickHouse operator target in the up state.
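The same check can be scripted against the Prometheus HTTP API, which serves the target list as JSON at /api/v1/targets. The sketch below assumes the port-forward above is still running on localhost:9090; the sample payload is made up to show the shape of the response, and the job name in it is an assumption about how your target will be labeled.

```python
import json
import urllib.request


def unhealthy_targets(payload):
    """Return (job, scrape_url, last_error) for every active target not in the up state."""
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "")
            problems.append((job, target.get("scrapeUrl", ""), target.get("lastError", "")))
    return problems


def fetch_targets(base_url="http://localhost:9090"):
    """Fetch the targets document; needs the Prometheus port-forward to be active."""
    with urllib.request.urlopen(base_url + "/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Made-up payload for illustration; job names and URLs are hypothetical.
SAMPLE = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "clickhouse-operator-metrics"},
                "scrapeUrl": "http://10.0.0.12:8888/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "labels": {"job": "example-broken-job"},
                "scrapeUrl": "http://10.0.0.13:9999/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ]
    },
}

if __name__ == "__main__":
    # Swap SAMPLE for fetch_targets() to query the live Prometheus instead.
    for job, url, error in unhealthy_targets(SAMPLE):
        print(f"{job} at {url} is not up: {error}")
```

An empty result means every active target is being scraped successfully, which is the scripted equivalent of seeing all green on the targets page.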
Step 3: Open Grafana and add the data source
Grafana is where the numbers become charts. Get its admin password and port-forward it:
kubectl -n monitoring get secret monitoring-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d; echo
kubectl -n monitoring port-forward svc/monitoring-grafana 3000:80
Open http://localhost:3000 and log in as admin with the password you just printed. The kube-prometheus-stack already configures Prometheus as the default data source, so you can go straight to dashboards. If you ever add Grafana separately, point its Prometheus data source at the in-cluster address http://<prometheus-service>.<namespace>.svc.cluster.local:9090 using proxy access.
Step 4: Import the Altinity dashboard
Altinity publishes a ready-made Grafana dashboard for the operator and the ClickHouse clusters it manages. In Grafana, go to Dashboards, choose Import, and upload the Altinity ClickHouse Operator dashboard JSON from the operator repository (or paste its dashboard ID), then select your Prometheus data source. You immediately get panels for query rates, memory, parts, replication, and more, without building anything by hand.
Step 5: Alerts
Collecting metrics is only useful if something tells you when they go wrong. The operator repository ships a set of Prometheus alert rules for ClickHouse, covering conditions like a replica falling behind, too many parts, or a server becoming unreachable. Apply those rules to your Prometheus, and route them through Alertmanager to email or a chat channel so a human hears about trouble. Start with the provided rules and tune the thresholds to your workload over time.
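The rules in the operator repository remain the place to start. Purely to show the shape of a rule, here is a hedged Python sketch that builds a PrometheusRule manifest with a single alert on Prometheus's built-in up metric and prints it as JSON, which kubectl apply accepts just like YAML. The rule and alert names, the job label value, the five-minute window, and the release: monitoring label (mirroring the ServiceMonitor from Step 2) are all assumptions to adapt to your setup.

```python
import json


def build_exporter_down_rule(job, for_duration="5m", release="monitoring"):
    """Build a PrometheusRule that fires when the given scrape job stays down."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "PrometheusRule",
        "metadata": {
            "name": "clickhouse-exporter-down",  # hypothetical rule name
            "namespace": "monitoring",
            # Assumption: the stack selects rules by this label, as it does ServiceMonitors.
            "labels": {"release": release},
        },
        "spec": {
            "groups": [
                {
                    "name": "clickhouse-sketch",
                    "rules": [
                        {
                            "alert": "ClickHouseExporterDown",
                            # up is 1 while a scrape succeeds and 0 when it fails.
                            "expr": 'up{job="%s"} == 0' % job,
                            "for": for_duration,
                            "labels": {"severity": "critical"},
                            "annotations": {
                                "summary": "ClickHouse metrics exporter is not being scraped"
                            },
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # The job value is an assumption; check the job label on your targets page.
    rule = build_exporter_down_rule("clickhouse-operator-metrics")
    print(json.dumps(rule, indent=2))
```

You could redirect the output to a file and apply it with kubectl apply -f, then watch for the rule on the Prometheus alerts page. Treat it as a template for understanding the provided rules, not a replacement for them.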
A quick health check from the database itself
Dashboards aside, ClickHouse exposes its own state through system tables, which are handy for a quick look or a custom panel:
SELECT metric, value FROM system.metrics WHERE metric LIKE '%Query%';
SELECT database, table, count() AS parts FROM system.parts WHERE active GROUP BY database, table ORDER BY parts DESC LIMIT 10;
The first shows live query activity; the second shows part counts per table, a key signal of merge health.
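If you want to turn the part-count query into a scripted check, one option is to run it with FORMAT JSONEachRow appended and filter the rows. The Python sketch below does only the filtering step on text you have already fetched; the threshold of 300 parts and the sample rows are illustrative assumptions, not a recommended limit.

```python
import json


def tables_over_threshold(json_each_row, threshold=300):
    """Return (database, table, parts) for tables whose active part count exceeds threshold."""
    flagged = []
    for line in json_each_row.splitlines():
        if not line.strip():
            continue  # skip blank lines between or after rows
        row = json.loads(line)
        # The count may arrive as a quoted string or a number depending on
        # server settings; int() accepts both.
        parts = int(row["parts"])
        if parts > threshold:
            flagged.append((row["database"], row["table"], parts))
    return flagged


# Hypothetical output of the part-count query with FORMAT JSONEachRow.
SAMPLE_ROWS = """\
{"database":"default","table":"events","parts":"412"}
{"database":"default","table":"users","parts":"12"}
"""

if __name__ == "__main__":
    for database, table, parts in tables_over_threshold(SAMPLE_ROWS):
        print(f"{database}.{table} has {parts} active parts")
```

A table that shows up here repeatedly is worth a look at its insert pattern, since many small inserts are a common cause of part buildup.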
What is next
You can now see your cluster's health and get alerted when it degrades. In the next article we harden it: encrypting connections with TLS, managing certificates, locking down users and networks, and keeping every secret in Kubernetes rather than in your manifests.



