ClickHouse® Keeper on Kubernetes: The Foundation for Replication

May 19, 2026 | 5 min read | Reshma M

This is the seventh article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. So far every cluster has been a single copy of the data. To make ClickHouse fault tolerant we need replication, and replication needs a coordination service. That service is ClickHouse Keeper, and this article deploys it.

What is ClickHouse Keeper, and why do you need it

When you run more than one copy of a table, the copies must agree on what data exists and in what order it was written. Something has to be the single source of truth that all replicas consult. Historically ClickHouse used Apache ZooKeeper for this. ClickHouse Keeper is a built-in replacement that speaks the same client protocol, uses less memory, and is simpler to operate. It stores only replication metadata, the bookkeeping that lets replicas stay consistent. It does not store your table data.

You do not query Keeper and you do not store rows in it. It runs quietly in the background, and your replicated tables depend on it. No Keeper, no replication.

Quorums: why you run an odd number of nodes

Keeper stays correct by majority vote. The majority it needs is called a quorum: the ensemble can make progress only while more than half of its nodes are healthy. This is why you run an odd number. Adding a fourth node to a three-node ensemble raises the quorum from two to three, so the ensemble still tolerates only one failure. A single node has no fault tolerance: if it dies, replication coordination stops. That is acceptable for learning on a laptop. Three nodes need a quorum of two, so the ensemble keeps working if any one node fails, which makes three the standard choice for production. Five nodes tolerate two failures. Most deployments never need more than three.
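The arithmetic behind these numbers is easy to check. The sketch below is only an illustration, not something Keeper ships, and the function names quorum_size and tolerated_failures are our own. It prints the quorum and the number of failures tolerated for ensembles of one to five nodes.

```shell
#!/bin/sh
# Quorum (majority) size for an ensemble of n nodes: floor(n/2) + 1.
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Failures the ensemble survives: the nodes left over once a quorum is set aside.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Two nodes tolerate no more failures than one, and four tolerate no more than three, so an even-sized ensemble adds cost without adding fault tolerance.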

For this article we deploy a single-node Keeper so it runs comfortably on minikube, and we show the three-node version you would use in production.

Deploying Keeper with a ClickHouseKeeperInstallation

The operator manages Keeper through its own custom resource, the ClickHouseKeeperInstallation, or CHK. It looks much like a CHI but describes a Keeper ensemble. Save this single-node version as keeper.yaml:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: keeper
        layout:
          # Use 1 for a laptop, 3 for production fault tolerance.
          replicasCount: 1
    settings:
      keeper_server/tcp_port: "2181"
  defaults:
    templates:
      podTemplate: keeper-pod
      volumeClaimTemplate: keeper-data
  templates:
    podTemplates:
      - name: keeper-pod
        spec:
          containers:
            - name: clickhouse-keeper
              image: clickhouse/clickhouse-keeper:26.3
    volumeClaimTemplates:
      - name: keeper-data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi

A few notes. The image is clickhouse/clickhouse-keeper:26.3, matching the ClickHouse server version we use elsewhere. Keeper listens on the standard port 2181. It gets its own small persistent volume to store its logs and snapshots, because Keeper's metadata must also survive restarts. Apply it:

kubectl create namespace ch
kubectl apply -n ch -f keeper.yaml

Watch it come up:

kubectl get chk -n ch -w
kubectl get pods -n ch

The chk shorthand lists your Keeper installations. Once the pod is Running and reports Ready, Keeper is available. The operator also creates a Service that ClickHouse will use to reach it.
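If you would rather block until Keeper is up than watch the output, a small helper like the one below can do it. This is a sketch under assumptions: the function name wait_for_keeper and the 300s default timeout are our own choices, and it waits on every pod in the namespace, which suits a namespace that holds only this tutorial. Note that kubectl wait fails immediately if no pods exist yet, so give the operator a few seconds after applying the manifest.

```shell
# Block until every pod in the namespace is Ready, then list what the operator created.
# The namespace defaults to "ch" (used in this article); the timeout is an assumed value.
wait_for_keeper() {
  ns="${1:-ch}"
  timeout="${2:-300s}"
  kubectl wait -n "$ns" --for=condition=Ready pod --all --timeout="$timeout" || return 1
  # Show the Keeper installation together with its pods, Service, and volume claim.
  kubectl get chk,pods,svc,pvc -n "$ns"
}
```

Call it as wait_for_keeper ch once keeper.yaml has been applied. The final listing should include the Service and the persistent volume claim described above.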

The production version: three nodes

For production you change one number and give it real storage by raising the storage request in the volumeClaimTemplate. The relevant part of the same resource, with replicasCount: 3, looks like this:

spec:
  configuration:
    clusters:
      - name: keeper
        layout:
          replicasCount: 3
    settings:
      keeper_server/tcp_port: "2181"

The operator deploys three Keeper pods and configures them as an ensemble. The pods then elect a leader and maintain a quorum automatically. You do not have to configure peer addresses or Raft IDs by hand; the operator does it for you.

Connecting ClickHouse to Keeper

Now we tell a ClickHouse cluster to use this Keeper. The cleanest way with the operator is to reference the CHK by name. The operator discovers the Keeper endpoints for you, so you never hard-code addresses. Save this as ch-with-keeper.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch"
spec:
  configuration:
    zookeeper:
      keeper:
        name: keeper          # the CHK we just deployed
        # serviceType: replicas  # default; one entry per Keeper replica
    clusters:
      - name: "main"
        layout:
          shardsCount: 1
          replicasCount: 2
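  defaults:
    templates:
      podTemplate: clickhouse-pod   # without this reference the pod template below is never applied
  templates: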
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.3

The zookeeper.keeper.name field points at our CHK named keeper. The cluster asks for two replicas, which only becomes meaningful because Keeper is now present to coordinate them. Apply it:

kubectl apply -n ch -f ch-with-keeper.yaml
kubectl get chi -n ch -w

When it reports Completed, you have two ClickHouse replicas wired to Keeper. They are not yet replicating any data, because replication happens at the table level, which is the subject of the next article. But the foundation is in place.

If you ever need to point at an external ZooKeeper or Keeper instead, you can list addresses explicitly under zookeeper.nodes with a host and port 2181, but the keeper.name reference is simpler and is what we use throughout this series.
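For comparison, the explicit form might look like the following. This is a minimal sketch: the file name ch-external-keeper.yaml and the host keeper.example.internal are placeholders we made up, so substitute the address of your own ensemble.

```shell
# Write a CHI manifest that lists Keeper/ZooKeeper nodes explicitly.
# The host name is a placeholder, not a real endpoint.
cat > ch-external-keeper.yaml <<'EOF'
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch"
spec:
  configuration:
    zookeeper:
      nodes:
        - host: keeper.example.internal   # hypothetical external endpoint
          port: 2181
    clusters:
      - name: "main"
        layout:
          shardsCount: 1
          replicasCount: 2
EOF
```

For a three-node external ensemble you would add one host and port entry per node under nodes.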

Verifying Keeper is reachable

You can confirm ClickHouse sees Keeper by querying a system table from inside a pod:

kubectl exec -it -n ch chi-ch-main-0-0-0 -- \
  clickhouse-client -q "SELECT * FROM system.zookeeper WHERE path='/' FORMAT Vertical"

A list of nodes under the root path means ClickHouse is talking to Keeper successfully.
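To run the same check against both replicas, a loop such as this one may help. Treat it as a sketch: the function name check_keeper_from_replicas is ours, and the second pod name, chi-ch-main-0-1-0, is assumed from the naming pattern of the first. Confirm the real names with kubectl get pods -n ch.

```shell
# Count the znodes each replica sees under Keeper's root path.
# The second pod name is an assumption based on the naming pattern of the first.
check_keeper_from_replicas() {
  ns="${1:-ch}"
  for pod in chi-ch-main-0-0-0 chi-ch-main-0-1-0; do
    # A failed query (non-zero exit) means this replica could not reach Keeper.
    count="$(kubectl exec -n "$ns" "$pod" -- \
      clickhouse-client -q "SELECT count() FROM system.zookeeper WHERE path='/'")" || return 1
    echo "$pod sees $count znodes under /"
  done
}
```

Call it as check_keeper_from_replicas ch. A count from each pod means both replicas reach Keeper; an error from either one points at that replica's connection.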

Clean up

kubectl delete namespace ch

This removes the ClickHouse cluster and the Keeper ensemble together.

What is next

Keeper is running and your cluster is connected to it. In the next article we finally build a replicated cluster: create a ReplicatedMergeTree table, insert into one replica, watch it appear on the other, and add a Distributed table to query across the cluster.
