Run a Single-Node ClickHouse® on Kubernetes (the Manual Way)

May 5, 2026 | 6 min read | Mohamed Hussain S
This is the third article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. So far we have learned the concepts and built a local cluster. Now we deploy a single ClickHouse node onto it, writing every piece of Kubernetes YAML ourselves.

We do this the manual way on purpose. It is the best way to understand what a ClickHouse deployment is actually made of, and by the end you will feel the friction that the operator removes.

What we are building

A single ClickHouse server running as a Kubernetes StatefulSet, with its data on a PersistentVolumeClaim so it survives pod restarts, reachable through a Service. We will connect, create a table, run an analytical query, and then talk honestly about why this approach does not scale to a real cluster.

You should have a running local cluster from the previous article. Confirm it with kubectl get nodes.

Step 1: Create a namespace

Keeping our work in its own namespace keeps the cluster tidy:

kubectl create namespace clickhouse-manual

Step 2: Write the manifest

Recall from the first article that a database needs three things from Kubernetes: a stable identity (StatefulSet), storage that survives restarts (a PersistentVolumeClaim), and a stable address (a Service). Our manifest provides all three. Save this as clickhouse-single.yaml:

# A headless Service gives the StatefulSet pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: clickhouse
  namespace: clickhouse-manual
spec:
  clusterIP: None
  selector:
    app: clickhouse
  ports:
    - name: http
      port: 8123
    - name: native
      port: 9000
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: clickhouse
  namespace: clickhouse-manual
spec:
  serviceName: clickhouse
  replicas: 1
  selector:
    matchLabels:
      app: clickhouse
  template:
    metadata:
      labels:
        app: clickhouse
    spec:
      containers:
        - name: clickhouse
          image: clickhouse/clickhouse-server:26.3
          ports:
            - name: http
              containerPort: 8123
            - name: native
              containerPort: 9000
          env:
            - name: CLICKHOUSE_USER
              value: demo
            - name: CLICKHOUSE_PASSWORD
              value: demo_password
            - name: CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT
              value: "1"
          volumeMounts:
            - name: data
              mountPath: /var/lib/clickhouse
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi

A few things to notice. We pin the image to clickhouse/clickhouse-server:26.3, the current Long Term Support release, rather than latest, so the deployment is reproducible. The volumeClaimTemplates block is the StatefulSet feature that gives the pod its own persistent disk, mounted at /var/lib/clickhouse, which is ClickHouse's data directory. The official image reads the CLICKHOUSE_USER and CLICKHOUSE_PASSWORD environment variables to create a user on first start, and setting CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT to "1" lets that user manage users and grants with SQL. Together they save us writing a configuration file for now.

Step 3: Apply it and watch it start

kubectl apply -f clickhouse-single.yaml
kubectl get pods -n clickhouse-manual -w

The -w flag watches the pod. It moves from Pending to ContainerCreating to Running over a minute or so, as Kubernetes provisions the disk and pulls the image. Press Ctrl+C once it is Running. Notice the pod is named clickhouse-0: the StatefulSet gave it that stable, predictable identity, and if it ever restarts it will come back as clickhouse-0 with the same disk.

Confirm the storage was provisioned automatically:

kubectl get pvc -n clickhouse-manual

You will see a bound claim named data-clickhouse-0, the disk tied to this pod.
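
The names we have seen so far are not arbitrary: they follow fixed StatefulSet naming conventions, which is what makes them predictable. As a minimal sketch, the helpers below rebuild those names from their parts. The function names are hypothetical, and cluster.local is assumed as the cluster domain (the Kubernetes default, though a cluster can be configured differently). They only build strings and never talk to the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that reproduce StatefulSet naming conventions.

# Pod name: <statefulset>-<ordinal>
pod_name() {
  printf '%s-%s\n' "$1" "$2"
}

# PVC name: <volumeClaimTemplate>-<statefulset>-<ordinal>
pvc_name() {
  printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Pod DNS name behind a headless Service:
# <pod>.<service>.<namespace>.svc.<cluster-domain>
pod_fqdn() {
  local pod="$1" service="$2" namespace="$3" domain="${4:-cluster.local}"
  printf '%s.%s.%s.svc.%s\n' "$pod" "$service" "$namespace" "$domain"
}

pod_name clickhouse 0        # clickhouse-0
pvc_name data clickhouse 0   # data-clickhouse-0
pod_fqdn clickhouse-0 clickhouse clickhouse-manual
```

The last call prints the address other pods in the cluster would use to reach this server through the headless Service from our manifest.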

Step 4: Connect and run a query

Open a ClickHouse client inside the pod:

kubectl exec -it clickhouse-0 -n clickhouse-manual -- \
  clickhouse-client -u demo --password demo_password

You are now at a ClickHouse prompt. Create a small table and load demo data generated on the fly, so this works even with no internet access from the pod:

CREATE TABLE trips
(
    id    UInt64,
    city  String,
    fare  Float64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO trips
SELECT number,
       ['Chennai', 'Paris', 'Tokyo', 'New York'][number % 4 + 1],
       round(10 + rand() % 90 + rand() / 4294967295, 2)
FROM numbers(100000);

Now run the kind of analytical query ClickHouse is built for:

SELECT city, count() AS trips, round(avg(fare), 2) AS avg_fare
FROM trips
GROUP BY city
ORDER BY trips DESC;

You scanned a hundred thousand rows and aggregated them in a blink. Type exit to leave the client. Your single-node ClickHouse is working.

Step 5: Reach it from your machine

To query from your laptop rather than from inside the pod, forward the HTTP port:

kubectl port-forward -n clickhouse-manual svc/clickhouse 8123:8123

Then in another terminal:

curl 'http://localhost:8123/?user=demo&password=demo_password' \
  --data-binary 'SELECT count() FROM trips'

It returns the row count. You now have a real, queryable ClickHouse server on Kubernetes.
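
Putting the password in the URL is convenient, but it ends up in your shell history. The ClickHouse HTTP interface also accepts credentials in the X-ClickHouse-User and X-ClickHouse-Key headers. Here is a small sketch of a wrapper that uses them, assuming the port-forward above is still running. The function name ch_query and the CH_URL, CH_USER, and CH_PASSWORD variables are our own inventions for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one SQL statement to the ClickHouse HTTP interface.
# Assumes the port-forward from this step is still listening on localhost:8123.
CH_URL="${CH_URL:-http://localhost:8123/}"
CH_USER="${CH_USER:-demo}"
CH_PASSWORD="${CH_PASSWORD:-demo_password}"

ch_query() {
  # --fail makes curl exit non-zero on an HTTP error status, at the cost of
  # hiding the response body that carries ClickHouse's error message.
  curl --silent --show-error --fail \
    -H "X-ClickHouse-User: ${CH_USER}" \
    -H "X-ClickHouse-Key: ${CH_PASSWORD}" \
    --data-binary "$1" \
    "${CH_URL}"
}

# Example call (needs the port-forward to be active):
# ch_query 'SELECT count() FROM trips'
```

While debugging a failing query, drop --fail so the error text from the server is printed.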

Step 6: Prove the data survives a restart

Delete the pod and watch Kubernetes rebuild it:

kubectl delete pod clickhouse-0 -n clickhouse-manual
kubectl get pods -n clickhouse-manual -w

A new clickhouse-0 appears. Connect again and run SELECT count() FROM trips: your rows are still there, because the StatefulSet reattached the same PersistentVolumeClaim. This is the payoff of doing storage properly.
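
If you would rather script this check than run it by hand, the steps can be sketched as below. The function names are hypothetical. Because our manifest defines no readiness probe, the sketch does not trust the pod's Running status and instead polls until a query actually succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical check: compare the row count before and after a pod restart.
NS="clickhouse-manual"
POD="clickhouse-0"

# Run one query through clickhouse-client inside the pod and print the result.
count_rows() {
  kubectl exec "$POD" -n "$NS" -- \
    clickhouse-client -u demo --password demo_password \
    --query 'SELECT count() FROM trips'
}

check_persistence() {
  local before after="" attempt=0
  before="$(count_rows)" || return 1

  # By default, kubectl delete returns only once the old pod is gone.
  kubectl delete pod "$POD" -n "$NS" >/dev/null || return 1

  # Poll until the replacement pod answers a query (up to 60 tries, 3 s apart).
  while [ "$attempt" -lt 60 ]; do
    if after="$(count_rows 2>/dev/null)"; then break; fi
    after=""
    attempt=$((attempt + 1))
    sleep 3
  done

  if [ -n "$after" ] && [ "$before" = "$after" ]; then
    echo "OK: ${after} rows survived the restart"
  else
    echo "MISMATCH: ${before} rows before, '${after}' after" >&2
    return 1
  fi
}

# Example call (needs the running cluster with the trips table loaded):
# check_persistence
```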

Why this does not scale: the case for an operator

Everything above is fine for a single node. Now imagine what a real production deployment needs, and how much of it you would have to do by hand.

For replication, ClickHouse needs a coordination service called ClickHouse Keeper, plus carefully written configuration that tells each server about its peers, plus ReplicatedMergeTree tables wired with the correct cluster macros. None of that is in our manifest.

For sharding, you would hand-write a remote_servers configuration describing every shard and replica, and keep it in sync every time the topology changes.

For users, settings, profiles, and quotas, you would maintain a pile of XML configuration files mounted into the pods.

For scaling, you would edit the StatefulSet and the cluster configuration together and hope you kept them consistent.

For upgrades, you would manage a careful rolling restart yourself to avoid downtime.
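
To make the sharding point concrete, here is a rough sketch of generating a remote_servers section by hand. Everything about the layout is an assumption for illustration: one StatefulSet and headless Service per shard, both named with the hypothetical prefix clickhouse-shard, all in one namespace, with the default cluster.local domain. It is not what the operator emits, and it leaves out settings a real replicated cluster would want, such as internal_replication.

```shell
#!/usr/bin/env bash
# Hypothetical generator for a ClickHouse remote_servers section.
# Assumes one StatefulSet plus headless Service per shard, both named
# "<prefix>-<shard>", so replica r of shard s is the pod "<prefix>-<s>-<r>".
render_remote_servers() {
  local cluster="$1" shards="$2" replicas="$3"
  local prefix="${4:-clickhouse-shard}" namespace="${5:-clickhouse-manual}"
  local s=0 r

  echo "<clickhouse>"
  echo "  <remote_servers>"
  echo "    <${cluster}>"
  while [ "$s" -lt "$shards" ]; do
    echo "      <shard>"
    r=0
    while [ "$r" -lt "$replicas" ]; do
      echo "        <replica>"
      echo "          <host>${prefix}-${s}-${r}.${prefix}-${s}.${namespace}.svc.cluster.local</host>"
      echo "          <port>9000</port>"
      echo "        </replica>"
      r=$((r + 1))
    done
    echo "      </shard>"
    s=$((s + 1))
  done
  echo "    </${cluster}>"
  echo "  </remote_servers>"
  echo "</clickhouse>"
}

# Two shards with two replicas each: four <replica> entries to keep in sync.
render_remote_servers demo_cluster 2 2
```

Every time a shard or replica is added, this file has to be regenerated and delivered to every server, in step with the StatefulSets it describes.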

Doing all of this correctly, and keeping it correct as the cluster grows and changes, is a real job. This is precisely the problem the Altinity Kubernetes Operator solves. You describe the cluster you want in one concise resource, and the operator generates and maintains all of the StatefulSets, Services, volumes, configuration, and coordination for you.

Clean up

kubectl delete namespace clickhouse-manual

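This removes the StatefulSet, pod, Service, and PersistentVolumeClaim in one step. Whether the underlying volume goes too depends on its reclaim policy; dynamically provisioned volumes default to Delete, so it is normally removed as well.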

What is next

You have run ClickHouse on Kubernetes the hard way, and you have seen its limits. In the next article we meet the Altinity Kubernetes Operator, learn why it exists and where it came from, install it, and deploy our first operator-managed ClickHouse cluster with a fraction of the YAML.
