This is the tenth article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. A running cluster is not a finished cluster. New ClickHouse versions ship monthly, and the operator itself gets updates. This article shows how to apply those changes without taking your database offline.
Two different upgrades
There are two things you might upgrade, and they are independent. One is the ClickHouse version, the database image your pods run. The other is the operator version, the controller in kube-system that manages your clusters. We cover both, starting with the more common one.
How a rolling ClickHouse upgrade works
To change the ClickHouse version you change the image in your CHI and reapply. The operator does not restart everything at once. It performs a rolling update: it upgrades one host at a time, optionally removing that host from the cluster's routing first, waiting for it to come back healthy, and only then moving to the next. Because your data is replicated, the other replicas keep serving queries while one pod restarts. Done this way, a version change causes no downtime for a properly replicated cluster.
To upgrade, edit the image tag in your pod template, for example from 26.3 to a newer release, and reapply:
spec:
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.4

kubectl apply -n ch -f cluster.yaml
kubectl get pods -n ch -w

You will see pods restart one by one, each picking up the new version, while the cluster stays available throughout.
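Watching pods restart shows that something is happening, but not which pods are still on the old image. The sketch below is one way to check that, assuming the CHI is named ch in namespace ch (as the pod names later in this article suggest) and that the operator's clickhouse.altinity.com/chi pod label is present. The helper names list_pod_images and pods_not_on_image are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# Sketch: find pods that have not yet picked up the target image.
# Assumptions: CHI "ch" in namespace "ch"; helper names are hypothetical.

# Print "<pod-name><TAB><image of the first container>" for each pod of a CHI.
# The first container is assumed to be the ClickHouse server container.
list_pod_images() {
  local ns="$1" chi="$2"
  kubectl get pods -n "$ns" -l "clickhouse.altinity.com/chi=$chi" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\n"}{end}'
}

# Read that listing on stdin and print the pods NOT on the target image.
# Exit status is 0 only when every listed pod runs the target image.
pods_not_on_image() {
  local target="$1"
  awk -F'\t' -v target="$target" '
    NF >= 2 && $2 != target { print $1; stale = 1 }
    END { exit stale ? 1 : 0 }
  '
}
```

A typical call would be list_pod_images ch ch | pods_not_on_image clickhouse/clickhouse-server:26.4, which prints nothing and exits 0 once the rollout has reached every pod.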
Canary first: test a new version on one replica
Moving every node onto a brand-new version in a single rollout is risky, even when the pods restart one at a time. The safer pattern is a canary: run the new version on a single replica, verify it behaves, and only then roll it out everywhere. The operator supports this by letting you override the pod template for one specific replica while the rest stay on the current version.
You define two pod templates, the current version and the candidate, then point one replica of one shard at the candidate template:
spec:
  configuration:
    zookeeper:
      keeper:
        name: keeper
    clusters:
      - name: "main"
        templates:
          podTemplate: ch-current
        layout:
          shardsCount: 2
          replicasCount: 2
          shards:
            - name: "0"
              replicas:
                - name: "1"
                  templates:
                    podTemplate: ch-candidate
  templates:
    podTemplates:
      - name: ch-current
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.3
      - name: ch-candidate
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.4

When you apply this, only that one replica restarts onto the candidate version. The rest of the cluster keeps running the current version. You now have a real node on the new version, taking real replicated traffic, that you can observe.
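What "observe" means will depend on your workload, but two basic checks are easy to script: the canary reports the expected release, and none of its replicated tables are read-only or have lost their Keeper session. The following is a minimal sketch under some assumptions: the canary pod is named chi-ch-main-0-1-0 (shard 0, replica 1 of a CHI called ch), and the function names are hypothetical. The KUBECTL variable exists only so the command can be swapped out for testing.

```shell
#!/usr/bin/env bash
# Sketch: basic health checks for a canary replica.
# Assumptions: namespace "ch"; pod name passed in by the caller; helper names are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"

# Run one SQL statement on a pod through clickhouse-client.
ch_query() {
  local pod="$1" sql="$2"
  "$KUBECTL" exec -n ch "$pod" -- clickhouse-client -q "$sql"
}

# Succeed when a version string such as "26.4.2.10" belongs to release "26.4".
version_is_release() {
  local version="$1" release="$2"
  case "$version" in
    "$release".*) return 0 ;;
    *) return 1 ;;
  esac
}

# Check the canary's version and the state of its replicated tables.
check_canary() {
  local pod="$1" release="$2" version unhealthy
  version="$(ch_query "$pod" "SELECT version()")" || return 1
  version_is_release "$version" "$release" || {
    echo "unexpected version on $pod: $version" >&2
    return 1
  }
  # system.replicas has one row per replicated table on this server.
  unhealthy="$(ch_query "$pod" \
    "SELECT count() FROM system.replicas WHERE is_readonly OR is_session_expired")" || return 1
  [ "$unhealthy" = "0" ] || {
    echo "$unhealthy replicated tables are read-only or disconnected on $pod" >&2
    return 1
  }
  echo "canary $pod OK on $version"
}
```

Calling check_canary chi-ch-main-0-1-0 26.4 prints a one-line confirmation or exits non-zero with a reason. It is a starting point only; query latency, error logs, and replication lag deserve a look before you trust a new version.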
Propagating the update
Once the canary has proven itself, you propagate the new version to the whole cluster by making it the default and removing the override. Set the cluster's default pod template to the new image and delete the per-replica override, then reapply:
spec:
  configuration:
    clusters:
      - name: "main"
        templates:
          podTemplate: ch-new
        layout:
          shardsCount: 2
          replicasCount: 2
  templates:
    podTemplates:
      - name: ch-new
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.4

The operator rolls the remaining nodes onto the new version one at a time, exactly as before, with no downtime. Canary, observe, propagate is the pattern to internalize for every version bump.
A real upgrade gotcha to know about
Operational honesty matters, so here is a concrete caution from the operator's own release notes. On some newer ClickHouse versions running on Kubernetes, an upstream regression could cause DDL statements, such as CREATE TABLE, to fail on freshly created pods. The documented workaround is simply to restart the affected ClickHouse pods, and the underlying issue is fixed in current patch releases. The lesson is general: before upgrading a production cluster, read the release notes for both ClickHouse and the operator, and test on a canary first. Surprises are far cheaper to discover on one node than on all of them.
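Since the caution above concerns DDL, one inexpensive check on a canary or a freshly restarted pod is to create and drop a throwaway table. This is a sketch rather than a documented procedure: the table name default.upgrade_ddl_smoke_test and the function name are made up for illustration, and the statements run on a single pod only, without ON CLUSTER.

```shell
#!/usr/bin/env bash
# Sketch: DDL smoke test on one pod.
# Assumptions: namespace "ch"; the table and function names are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"

ddl_smoke_test() {
  local pod="$1" table="default.upgrade_ddl_smoke_test"
  # Create a small local table; a failure here is the symptom to look for.
  "$KUBECTL" exec -n ch "$pod" -- clickhouse-client -q \
    "CREATE TABLE IF NOT EXISTS $table (id UInt64) ENGINE = MergeTree ORDER BY id" || return 1
  # Clean up so the test leaves nothing behind.
  "$KUBECTL" exec -n ch "$pod" -- clickhouse-client -q \
    "DROP TABLE IF EXISTS $table" || return 1
  echo "DDL OK on $pod"
}
```

For example, ddl_smoke_test chi-ch-main-0-1-0 would exercise the canary replica from the earlier section, assuming that pod name matches your cluster.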
Upgrading the operator itself
The operator is upgraded separately from your clusters. If you installed it with kubectl, apply the newer bundle:
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/master/deploy/operator/clickhouse-operator-install-bundle.yaml

If you installed it with Helm, update the repository and upgrade the release. Note that the Helm chart asks you to apply the updated Custom Resource Definitions separately during an upgrade, so follow the chart's upgrade instructions:

helm repo update clickhouse-operator
helm upgrade clickhouse-operator clickhouse-operator/altinity-clickhouse-operator

Upgrading the operator can trigger a rolling restart of managed clusters if pod templates or labels changed between versions, so treat an operator upgrade with the same care as a database upgrade: read its release notes, do it during a quiet window, and watch the rollout.
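It helps to record which operator image is running before the upgrade and to confirm the new one afterwards. The sketch below assumes the operator runs as a Deployment named clickhouse-operator in kube-system, which matches the kubectl bundle install; a Helm install may use a different name or namespace, so adjust accordingly. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: inspect the operator deployment before and after an upgrade.
# Assumptions: Deployment "clickhouse-operator" in "kube-system"; helper names are hypothetical.
KUBECTL="${KUBECTL:-kubectl}"

# Print the image of every container in the operator deployment, one per line.
operator_images() {
  local ns="${1:-kube-system}"
  "$KUBECTL" get deployment clickhouse-operator -n "$ns" \
    -o jsonpath='{range .spec.template.spec.containers[*]}{.image}{"\n"}{end}'
}

# Block until the upgraded operator deployment is fully rolled out, or time out.
wait_for_operator() {
  local ns="${1:-kube-system}"
  "$KUBECTL" rollout status deployment/clickhouse-operator -n "$ns" --timeout=120s
}
```

Running operator_images before and after the upgrade gives a simple record of what changed, and wait_for_operator tells you when the new controller is ready, which is a good moment to start watching your CHI pods for restarts.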
Verifying after any upgrade
After an upgrade, confirm the version is what you expect and the cluster is healthy:
kubectl exec -n ch chi-ch-main-0-0-0 -- clickhouse-client -q "SELECT version()"
kubectl get chi -n ch

A Completed status on the CHI and the expected version string mean the rollout finished cleanly.
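Checking a single pod can miss a replica that was left behind, so it can be worth asking every pod for its version. This is a rough sketch assuming the CHI is named ch and that pods carry the operator's clickhouse.altinity.com/chi label; print_versions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the ClickHouse version reported by every pod of a CHI.
# Assumptions: pods are labelled clickhouse.altinity.com/chi=<chi>; helper name is hypothetical.
KUBECTL="${KUBECTL:-kubectl}"

print_versions() {
  local ns="$1" chi="$2" pod
  # "-o name" yields entries such as "pod/chi-ch-main-0-0-0", which exec accepts.
  "$KUBECTL" get pods -n "$ns" -l "clickhouse.altinity.com/chi=$chi" -o name |
  while read -r pod; do
    # Redirect stdin so the exec call cannot consume the pod list.
    printf '%s\t%s\n' "$pod" \
      "$("$KUBECTL" exec -n "$ns" "$pod" -- clickhouse-client -q 'SELECT version()' </dev/null)"
  done
}
```

With the names used in this article, print_versions ch ch lists one line per pod; every line should show the same version once the rollout is complete.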
What is next
You can now evolve the cluster safely over time. In the next article we add eyes to the system: monitoring ClickHouse on Kubernetes with Prometheus and Grafana, so you can see health, performance, and problems before your users do.



