ClickHouse® Persistent Storage on Kubernetes: Volumes, StorageClasses, and Resizing

May 12, 2026 | 5 min read | Ravivarman R

This is the fifth article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. So far we have installed the operator and run a first cluster. That cluster declared no storage of its own and relied on the operator's defaults, which is fine for a demo and dangerous for real data. This article makes storage explicit and durable.

Why storage is the part you must get right

A ClickHouse pod can be deleted and recreated at any time. If its data lived inside the pod, it would vanish on every restart. As we saw earlier, Kubernetes solves this by keeping data on a PersistentVolume that outlives the pod, requested through a PersistentVolumeClaim. The operator lets you describe these claims inside the ClickHouseInstallation (CHI) resource, so you never write a StatefulSet by hand.

StorageClasses: where volumes come from

A StorageClass describes a kind of storage your cluster can create on demand. When a claim asks for storage and names a StorageClass, the cluster provisions a matching volume automatically. Most clusters have a default StorageClass, so a claim that names none still gets a disk. List what your cluster offers:

kubectl get storageclass

On minikube you will see a class named standard marked (default). On a cloud cluster you will see classes backed by the provider's disks. You can let ClickHouse use the default, or name a specific class for faster or encrypted disks.
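If you script your setup, you can find the default class without reading the table by eye. kubectl prints (default) next to a class that carries the storageclass.kubernetes.io/is-default-class annotation, so the Python sketch below reads the output of kubectl get storageclass -o json and returns the names of classes annotated that way. It is an illustration under those assumptions: the function name and the class names in the hand-written sample are hypothetical, not captured from a real cluster.

```python
import json
from typing import Any

# Annotations that mark a StorageClass as the cluster default.
# The beta spelling is legacy but may still appear on older clusters.
DEFAULT_CLASS_ANNOTATIONS = (
    "storageclass.kubernetes.io/is-default-class",
    "storageclass.beta.kubernetes.io/is-default-class",
)


def default_storage_classes(storageclass_list_json: str) -> list[str]:
    """Return the names of StorageClasses annotated as the default.

    Expects the JSON printed by `kubectl get storageclass -o json`.
    More than one name is worth investigating before you rely on the default.
    """
    doc: dict[str, Any] = json.loads(storageclass_list_json)
    names = []
    for item in doc.get("items", []):
        metadata = item.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if any(annotations.get(key) == "true" for key in DEFAULT_CLASS_ANNOTATIONS):
            names.append(metadata.get("name", "<unnamed>"))
    return names


# Hand-written sample in the shape of kubectl's output (hypothetical class names).
sample = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "standard",
                      "annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}},
        {"metadata": {"name": "fast-ssd"}},
    ],
})
print(default_storage_classes(sample))  # ['standard']
```

Against a live cluster, save the output with kubectl get storageclass -o json > classes.json and pass the file's contents to default_storage_classes. An empty result means a claim that omits storageClassName may stay Pending.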

Adding a data volume to a CHI

The operator uses two pieces that work together: a volumeClaimTemplate under templates that describes the disk, and a reference to it under defaults.templates.dataVolumeClaimTemplate. Here is a single-node installation with an explicit 10 GiB data volume. Save it as ch-storage.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "storage-demo"
spec:
  defaults:
    templates:
      podTemplate: clickhouse-pod
      dataVolumeClaimTemplate: data-volume
  configuration:
    clusters:
      - name: "main"
        layout:
          shardsCount: 1
          replicasCount: 1
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.3
    volumeClaimTemplates:
      - name: data-volume
        spec:
          # Omit storageClassName to use the cluster default,
          # or set it explicitly, for example: storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

The data-volume template requests a 10 GiB (10Gi) ReadWriteOnce disk. ReadWriteOnce means the volume can be mounted read-write by a single node at a time, which is exactly what a ClickHouse server needs. Because we did not set storageClassName, the cluster's default class provisions it. Apply it:

kubectl create namespace ch
kubectl apply -n ch -f ch-storage.yaml

Then confirm the claim was bound to a real volume:

kubectl get pvc -n ch
NAME                              STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS
data-volume-chi-storage-demo...   Bound    pvc-...   10Gi       RWO            standard

The operator created the claim, the cluster provisioned the disk, and ClickHouse now stores its data on storage that survives pod restarts.
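The same check can be automated, for example in a CI job that should wait for storage before running tests. The Python sketch below is an illustration, not part of the operator: it parses the output of kubectl get pvc -n ch -o json and reports each claim's status.phase (Pending, Bound or Lost) and status.capacity.storage. The claim names in the hand-written sample are hypothetical.

```python
import json


def claim_status(pvc_list_json: str) -> list[tuple[str, str, str]]:
    """Summarize claims from `kubectl get pvc -n ch -o json` as (name, phase, capacity)."""
    doc = json.loads(pvc_list_json)
    rows = []
    for item in doc.get("items", []):
        status = item.get("status", {})
        rows.append((
            item["metadata"]["name"],
            status.get("phase", "Unknown"),                  # Pending, Bound or Lost
            status.get("capacity", {}).get("storage", "-"),  # set once the claim is bound
        ))
    return rows


def all_bound(rows: list[tuple[str, str, str]]) -> bool:
    """True only if there is at least one claim and every claim is Bound."""
    return bool(rows) and all(phase == "Bound" for _, phase, _ in rows)


# Hand-written sample; the claim names are hypothetical.
sample = json.dumps({
    "items": [
        {"metadata": {"name": "claim-a"},
         "status": {"phase": "Bound", "capacity": {"storage": "10Gi"}}},
        {"metadata": {"name": "claim-b"}, "status": {"phase": "Pending"}},
    ]
})
rows = claim_status(sample)
for name, phase, capacity in rows:
    print(f"{name:10} {phase:8} {capacity}")
print("all bound:", all_bound(rows))  # all bound: False
```

A claim stuck in Pending usually points back to the StorageClass: either none is named and the cluster has no default, or the named class does not exist.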

Separating data and log volumes

ClickHouse writes its table data to /var/lib/clickhouse and its logs to /var/log/clickhouse-server. On a busy server it is good practice to keep logs on their own smaller volume so that log growth can never fill the data disk. The operator supports this with a second template and a second reference. Only the storage-related parts of the spec are shown here; merge them into the full CHI from above:

spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
      logVolumeClaimTemplate: log-volume
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
      - name: log-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi

Now data and logs live on independent disks, each sized for its job.
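Because the references under defaults.templates are plain strings, a typo such as logs-volume for log-volume is easy to miss. The sketch below is our own pre-apply check, not an operator feature: it takes a CHI as a Python dict mirroring the YAML (in practice you might load the file with a YAML library such as PyYAML) and lists every data, log or pod template reference that names no defined template. The function and constant names are hypothetical.

```python
from typing import Any

# Keys under spec.defaults.templates, mapped to the spec.templates list they point into.
REFERENCE_KEYS = {
    "dataVolumeClaimTemplate": "volumeClaimTemplates",
    "logVolumeClaimTemplate": "volumeClaimTemplates",
    "podTemplate": "podTemplates",
}


def dangling_template_refs(chi: dict[str, Any]) -> list[str]:
    """List references in spec.defaults.templates that name no defined template."""
    spec = chi.get("spec", {})
    refs = spec.get("defaults", {}).get("templates", {})
    templates = spec.get("templates", {})
    problems = []
    for key, section in REFERENCE_KEYS.items():
        wanted = refs.get(key)
        if wanted is None:
            continue  # this CHI does not use the reference at all
        defined = {t.get("name") for t in templates.get(section, [])}
        if wanted not in defined:
            problems.append(f"{key} -> '{wanted}' is not defined in {section}")
    return problems


# The data/log fragment above as a Python dict, with a deliberate typo.
chi = {
    "spec": {
        "defaults": {"templates": {
            "dataVolumeClaimTemplate": "data-volume",
            "logVolumeClaimTemplate": "logs-volume",  # typo: the template is "log-volume"
        }},
        "templates": {"volumeClaimTemplates": [
            {"name": "data-volume"},
            {"name": "log-volume"},
        ]},
    }
}
print(dangling_template_refs(chi))
# ["logVolumeClaimTemplate -> 'logs-volume' is not defined in volumeClaimTemplates"]
```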

Reclaim policy: what happens when you delete the cluster

Every StorageClass has a reclaim policy, and each volume it provisions inherits that policy, which decides the volume's fate when its claim is removed. The Kubernetes default is Delete: when the claim goes away, for example because you deleted the CHI or its namespace, the underlying disk is destroyed with it. That is convenient for learning and risky for production, where you may prefer Retain so the data survives even if the cluster object is deleted by mistake. Check your class with kubectl get storageclass <name> -o yaml and look at reclaimPolicy. Choose deliberately; this setting is what protects you from accidental data loss.
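Keep in mind that a dynamically provisioned volume copies the class's policy into its own spec.persistentVolumeReclaimPolicy field, and that per-volume value is what Kubernetes acts on. The Python sketch below is an illustration with hypothetical function and volume names: it reads a class's policy (falling back to Delete, the Kubernetes default, if the field is absent) and, from kubectl get pv -o json, lists the volumes behind claims in a given namespace that would be deleted along with their claims.

```python
import json


def storage_class_reclaim_policy(storageclass_json: str) -> str:
    """Reclaim policy from `kubectl get storageclass NAME -o json`.

    Kubernetes defaults the field to Delete; the fallback mirrors that.
    """
    return json.loads(storageclass_json).get("reclaimPolicy", "Delete")


def volumes_that_delete(pv_list_json: str, namespace: str) -> list[str]:
    """Names of PersistentVolumes bound to claims in `namespace` whose policy is Delete.

    Expects `kubectl get pv -o json`; each PV records its own policy in
    spec.persistentVolumeReclaimPolicy and its claim in spec.claimRef.
    """
    doc = json.loads(pv_list_json)
    risky = []
    for pv in doc.get("items", []):
        spec = pv.get("spec", {})
        claim = spec.get("claimRef") or {}
        if (claim.get("namespace") == namespace
                and spec.get("persistentVolumeReclaimPolicy") == "Delete"):
            risky.append(pv["metadata"]["name"])
    return risky


print(storage_class_reclaim_policy('{"metadata": {"name": "example"}}'))  # Delete

# Hand-written sample; the volume and claim names are hypothetical.
pvs = json.dumps({"items": [
    {"metadata": {"name": "pv-a"},
     "spec": {"persistentVolumeReclaimPolicy": "Delete",
              "claimRef": {"namespace": "ch", "name": "claim-a"}}},
    {"metadata": {"name": "pv-b"},
     "spec": {"persistentVolumeReclaimPolicy": "Retain",
              "claimRef": {"namespace": "ch", "name": "claim-b"}}},
]})
print(volumes_that_delete(pvs, "ch"))  # ['pv-a']
```

A volume that shows up in that list can be switched to Retain on its own, without touching the class, with kubectl patch pv <name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'.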

Growing a volume without downtime

Data grows, and eventually 10 GiB is not enough. If your StorageClass allows it, you can expand a volume in place. This takes two steps. First, the StorageClass must permit expansion, which is set with allowVolumeExpansion: true on the class. Second, you increase the requested size in your CHI and reapply it.

Here is a StorageClass that allows expansion, as an example of the property to look for:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: kubernetes.io/your-provisioner  # placeholder: use your cluster's provisioner
allowVolumeExpansion: true
reclaimPolicy: Retain

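With an expandable class in use, raise the storage request in your volumeClaimTemplates, for example from 10Gi to 50Gi, and reapply the CHI. The operator updates the claim, and if your storage driver supports online expansion, the volume grows without recreating the pod, so the database keeps serving queries throughout. Confirm this for your provider before relying on it: some drivers finish resizing the filesystem only when the pod restarts, and Kubernetes never lets a claim shrink, so the new size must be larger than the old one.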
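Two mistakes are common here: editing the size while the class does not allow expansion, and asking for a smaller size, which Kubernetes rejects. The Python sketch below checks both before you reapply. It assumes integer quantities only (10Gi, 50G and the like, not 1.5Gi) and takes the StorageClass as a dict; the function names are hypothetical. Note that Gi is a binary unit, so 10Gi is 10 × 2^30 bytes, slightly more than 10G.

```python
import re

# Integer Kubernetes quantities only; fractional values and exponents are out of scope.
UNITS = {
    "": 1,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}
QUANTITY = re.compile(r"(\d+)(Ki|Mi|Gi|Ti|Pi|Ei|k|M|G|T|P|E)?")


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '10Gi' or '50G' to bytes."""
    match = QUANTITY.fullmatch(quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * UNITS[match.group(2) or ""]


def resize_problems(storage_class: dict, current: str, requested: str) -> list[str]:
    """Reasons a claim cannot grow from `current` to `requested` (empty list means OK)."""
    problems = []
    if storage_class.get("allowVolumeExpansion") is not True:
        name = storage_class.get("metadata", {}).get("name", "?")
        problems.append(f"StorageClass {name} does not set allowVolumeExpansion: true")
    old, new = to_bytes(current), to_bytes(requested)
    if new < old:
        problems.append("requested size is smaller; Kubernetes does not shrink claims")
    elif new == old:
        problems.append("requested size equals the current size; nothing to expand")
    return problems


# The "expandable" class from above, as a dict.
expandable = {"metadata": {"name": "expandable"},
              "allowVolumeExpansion": True, "reclaimPolicy": "Retain"}
print(resize_problems(expandable, "10Gi", "50Gi"))  # []
print(to_bytes("10Gi"), to_bytes("10G"))            # 10737418240 10000000000
```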

A note on choosing sizes and classes

For learning on minikube, the default standard class and small sizes are perfect. For production, pick a StorageClass backed by fast SSD or NVMe disks; size the data volume for your expected dataset plus headroom for background merges, which write each new merged part before the parts it replaces are removed; give logs a modest separate volume; and set the reclaim policy to protect your data. We return to production storage choices, including tiered storage on object stores like S3, later in the series.

Clean up

kubectl delete namespace ch

What is next

Your ClickHouse data is now durable. In the next article we configure the database itself through the operator: users, profiles, quotas, server settings, and storing passwords safely in Kubernetes Secrets instead of plain text.

