A Production-Grade ClickHouse® Cluster on Kubernetes with the Altinity® Operator

This is the fourteenth article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. We have built up every piece on its own. Now we assemble them into one production-grade cluster, so you can see how the parts fit and use it as a template for your own deployment.

This article is a capstone. Each block below was explained in its own earlier article, linked as we go, so treat this as the integrated reference.

What we are building

A fault-tolerant analytical cluster: two shards for capacity, two replicas per shard for safety, a three-node Keeper ensemble for coordination, durable and separated storage, credentials and certificates in Secrets, TLS everywhere, deliberate pod placement, sensible resources, and monitoring hooks. This is meant for a real multi-node cluster, not a single-node laptop, because placement rules need several nodes.

Prerequisites

Before applying the manifests, you need a few things in place: the operator installed (the introduction article), a StorageClass backed by fast disks, nodes spread across availability zones, a Kubernetes Secret holding your user credentials, and a TLS Secret holding your certificate and key. The credential value must be the hex-encoded SHA-256 digest of the password, not the password itself. Create the namespace and both Secrets like this:

kubectl create namespace prod
kubectl create secret generic ch-credentials -n prod \
  --from-literal=analyst_hash='<sha256-of-your-password>'
kubectl create secret tls clickhouse-tls -n prod \
  --cert=server.crt --key=server.key
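
The analyst_hash value above has to be produced somehow. A minimal sketch, assuming GNU coreutils sha256sum is available (on macOS, shasum -a 256 is the usual substitute); hash_password is a hypothetical helper name, not part of kubectl or the operator:

```shell
# Hypothetical helper: print the lowercase hex SHA-256 digest of a password.
hash_password() {
  # printf '%s' hashes the password without the trailing newline echo would add.
  printf '%s' "$1" | sha256sum | cut -d' ' -f1
}
```

It could then feed the Secret directly, for example with --from-literal=analyst_hash="$(hash_password "$PASSWORD")", where PASSWORD is a variable you set yourself.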

Step 1: The Keeper ensemble

Replication needs Keeper, and production needs three Keeper nodes so the ensemble keeps its quorum when one node fails, as covered in the Keeper article. Save this as keeper-prod.yaml:

apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: keeper
        layout:
          replicasCount: 3
    settings:
      keeper_server/tcp_port: "2181"
  defaults:
    templates:
      podTemplate: keeper-pod
      volumeClaimTemplate: keeper-data
  templates:
    podTemplates:
      - name: keeper-pod
        podDistribution:
          - type: ClickHouseAntiAffinity
            scope: ClickHouseInstallation
        spec:
          containers:
            - name: clickhouse-keeper
              image: clickhouse/clickhouse-keeper:26.3
              resources:
                requests:
                  cpu: "1"
                  memory: 1Gi
                limits:
                  cpu: "2"
                  memory: 2Gi
    volumeClaimTemplates:
      - name: keeper-data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

Three Keeper pods, kept on separate nodes by anti-affinity, each with its own durable volume and bounded resources.
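
Why three and not two? Keeper needs a majority of its nodes to agree, so the number of failures it tolerates follows from simple arithmetic. The sketch below only illustrates that arithmetic; keeper_quorum is a hypothetical helper name:

```shell
# Hypothetical helper: majority size and tolerated failures for an ensemble of N nodes.
keeper_quorum() {
  n=$1
  majority=$(( n / 2 + 1 ))      # smallest group that is more than half
  tolerated=$(( n - majority ))  # nodes that can fail while a majority remains
  echo "nodes=$n majority=$majority tolerated_failures=$tolerated"
}
```

Calling keeper_quorum 2 reports zero tolerated failures, the same as a single node, which is why the ensemble here uses three.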

Step 2: The ClickHouse cluster

This is the full CHI, combining storage, users, TLS, placement, and resources. Save it as clickhouse-prod.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "ch"
spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
      logVolumeClaimTemplate: log-volume
      podTemplate: clickhouse-pod
  configuration:
    # Coordination: reference the Keeper ensemble by name.
    zookeeper:
      keeper:
        name: keeper
    # Security: named user from a Secret, default user removed.
    users:
      analyst/password_sha256_hex:
        valueFrom:
          secretKeyRef:
            name: ch-credentials
            key: analyst_hash
      analyst/networks/ip:
        - 10.0.0.0/8
    files:
      users.d/remove_default.xml: |
        <clickhouse>
          <users>
            <default remove="1"/>
          </users>
        </clickhouse>
    # Topology: 2 shards x 2 replicas, TLS enabled.
    clusters:
      - name: "main"
        secure: "yes"
        security:
          clickhouse:
            tls:
              verify: Strict
              minVersion: "1.3"
        layout:
          shardsCount: 2
          replicasCount: 2
  templates:
    podTemplates:
      - name: clickhouse-pod
        metadata:
          annotations:
            prometheus.io/scrape: "true"
        podDistribution:
          - type: ClickHouseAntiAffinity
            scope: ClickHouseInstallation
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.3
              resources:
                requests:
                  cpu: "2"
                  memory: 8Gi
                limits:
                  cpu: "4"
                  memory: 16Gi
    volumeClaimTemplates:
      - name: data-volume
        spec:
          storageClassName: fast-ssd     # your production StorageClass
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 200Gi
      - name: log-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

Reading the manifest

Every block here maps to an article you have already read. The zookeeper.keeper reference wires the cluster to the three-node ensemble. The users block pulls the analyst's hashed password from a Secret and restricts the networks it can connect from, while the files block removes the empty-password default user; both practices come from the security article. The cluster's secure flag and tls policy encrypt traffic. The layout defines two shards with two replicas each, and the pod template's anti-affinity keeps the copies on separate nodes, as in the scaling article. Resource requests and limits size each pod, as in the resources article. The data volume uses the fast StorageClass from the storage article; the log volume sets no storageClassName, so it falls back to the cluster's default class. The Prometheus annotation ties into the monitoring you set up. Nothing here is new; it is the sum of the series.
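
Before deploying, it can help to add up what the two manifests ask the cluster for. The sketch below is only that arithmetic, with the per-pod requests hard-coded from the manifests above (2 CPU, 8Gi memory, and 200Gi + 10Gi of volumes per ClickHouse pod; 1 CPU, 1Gi, and 10Gi per Keeper pod). cluster_footprint is a hypothetical helper name, and the node minimum assumes the anti-affinity rules place each installation's pods on distinct nodes:

```shell
# Hypothetical helper: total requests for SHARDS x REPLICAS ClickHouse pods plus KEEPERS Keeper pods.
cluster_footprint() {
  shards=$1; replicas=$2; keepers=$3
  ch_pods=$(( shards * replicas ))
  # Anti-affinity within each installation: need at least as many nodes as its largest pod set.
  min_nodes=$ch_pods
  if [ "$keepers" -gt "$min_nodes" ]; then min_nodes=$keepers; fi
  echo "pods=$(( ch_pods + keepers ))"
  echo "min_nodes=$min_nodes"
  echo "cpu_requests=$(( ch_pods * 2 + keepers * 1 ))"
  echo "memory_requests_gi=$(( ch_pods * 8 + keepers * 1 ))"
  echo "storage_gi=$(( ch_pods * (200 + 10) + keepers * 10 ))"
}
```

For this article's layout the call is cluster_footprint 2 2 3. Limits are higher than requests, so leave headroom beyond these totals.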

Step 3: Deploy and verify

Apply the Keeper manifest first and wait for it to become ready, then apply the cluster. The -w flag keeps watching until you press Ctrl-C:

kubectl apply -n prod -f keeper-prod.yaml
kubectl get chk -n prod -w
kubectl apply -n prod -f clickhouse-prod.yaml
kubectl get chi -n prod -w
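
Watching with -w works well interactively. In a script, a polling loop may be more convenient. This is a sketch under assumptions: it assumes the operator exposes the state at .status.status on the CHI resource, and wait_for_chi is a hypothetical helper name, so check the field path against your operator version:

```shell
# Hypothetical helper: poll a CHI until it reports Completed.
# Usage: wait_for_chi NAME NAMESPACE [TRIES] [INTERVAL_SECONDS]
wait_for_chi() {
  name=$1; ns=$2; tries=${3:-60}; interval=${4:-10}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # Assumption: the operator reports its state in .status.status.
    status=$(kubectl get chi "$name" -n "$ns" -o jsonpath='{.status.status}' 2>/dev/null)
    if [ "$status" = "Completed" ]; then
      echo "ready"
      return 0
    fi
    i=$(( i + 1 ))
    sleep "$interval"
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

For the cluster in this article the call would be wait_for_chi ch prod.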

When the CHI reports Completed, run through a quick production checklist. Confirm all pods are running and spread across nodes:

kubectl get pods -n prod -o wide

You should see three Keeper pods and four ClickHouse pods on different nodes. Then create a replicated and distributed table as in the replication article, insert data, and confirm it appears across shards and replicas. Finally, check that monitoring is scraping the cluster and that your Grafana dashboard shows the new nodes.
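
The node spread can also be checked mechanically rather than by eye. A minimal sketch: nodes_with_multiple_pods is a hypothetical helper that reads "pod node" pairs on standard input and prints any node hosting more than one of them, so empty output means the pods are fully spread:

```shell
# Hypothetical helper: list nodes that host more than one pod.
# Input: one "pod-name node-name" pair per line on stdin.
nodes_with_multiple_pods() {
  awk '{ count[$2]++ } END { for (n in count) if (count[n] > 1) print n }' | sort
}

# Example invocation (the label selector is an assumption about the operator's pod labels):
# kubectl get pods -n prod -l clickhouse.altinity.com/chi=ch \
#   -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers \
#   | nodes_with_multiple_pods
```

Run it per installation, because the anti-affinity rules here separate ClickHouse pods from each other and Keeper pods from each other, not ClickHouse pods from Keeper pods.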

Production readiness checklist

Before calling a cluster production-ready, verify each of these: coordination on a three-node Keeper, at least two replicas per shard, durable storage on a fast class with a safe reclaim policy, the default user removed and real users sourced from Secrets, TLS enabled, anti-affinity spreading copies across nodes (and ideally zones), resource requests and limits set, a PodDisruptionBudget in place (the operator adds it), and monitoring and alerts wired up. The manifests above address most of these directly; the reclaim policy, zone spreading, and alert rules are not set in them, so check those separately. When all of those are true, you have something you can run with confidence.

Clean up

kubectl delete namespace prod

What is next

You have a complete production cluster. The remaining articles cover advanced and operational topics that build on this foundation. In the next article we lower storage cost with tiered storage, keeping hot data on fast local disks and moving cold data to object storage like Amazon S3.
