ClickHouse® Persistent Storage on Kubernetes: Volumes, StorageClasses, and Resizing

This is the fifth article in our series on running the ClickHouse® database on Kubernetes with the Altinity® Kubernetes Operator. We have installed the operator and run a first cluster. That cluster declared no persistent volumes, so its data lived only as long as the pod did: fine for a demo, dangerous for real data. This article makes storage explicit and durable.

Why storage is the part you must get right

A ClickHouse pod can be deleted and recreated at any time. If its data lived on the pod, it would vanish on every restart. As we saw earlier, Kubernetes solves this by keeping data on a PersistentVolume that outlives the pod, requested through a PersistentVolumeClaim. The operator gives you a clean way to describe these claims inside the CHI, so you never write a StatefulSet by hand.

StorageClasses: where volumes come from

A StorageClass describes a kind of storage your cluster can create on demand. When a claim asks for storage and names a StorageClass, the cluster provisions a matching volume automatically. Most clusters have a default StorageClass, so a claim that names none still gets a disk. List what your cluster offers:

kubectl get storageclass

On minikube you will see a class named standard marked (default). On a cloud cluster you will see classes backed by the provider's disks. You can let ClickHouse use the default, or name a specific class for faster or encrypted disks.
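If you script your setup, you may want the default class by name rather than by eye. The sketch below assumes a POSIX shell with kubectl pointed at your cluster. It relies on kubectl's table output marking the default class with (default) after its name. The function name default_storage_class is our own invention, not part of kubectl or the operator.

```shell
# default_storage_class: print the name of the StorageClass that kubectl
# marks as "(default)". Hypothetical helper; it parses kubectl's table output.
default_storage_class() {
  # --no-headers drops the column titles. The NAME column of the default
  # class reads e.g. "standard (default)", so field 1 is the bare name.
  kubectl get storageclass --no-headers |
    awk 'index($0, "(default)") { print $1; exit }'
}
```

On minikube, running default_storage_class should print standard. Parsing the table keeps the sketch short; a sturdier variant would read the storageclass.kubernetes.io/is-default-class annotation with -o jsonpath.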

Adding a data volume to a CHI

The operator uses two pieces that work together: a volumeClaimTemplate under templates that describes the disk, and a reference to it under defaults.templates.dataVolumeClaimTemplate. Here is a single node with an explicit 10 gigabyte data volume. Save it as ch-storage.yaml:

apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "storage-demo"
spec:
  defaults:
    templates:
      podTemplate: clickhouse-pod
      dataVolumeClaimTemplate: data-volume
  configuration:
    clusters:
      - name: "main"
        layout:
          shardsCount: 1
          replicasCount: 1
  templates:
    podTemplates:
      - name: clickhouse-pod
        spec:
          containers:
            - name: clickhouse
              image: clickhouse/clickhouse-server:26.3
    volumeClaimTemplates:
      - name: data-volume
        spec:
          # Omit storageClassName to use the cluster default,
          # or set it explicitly, for example: storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi

The data-volume template requests a 10 gigabyte ReadWriteOnce disk, which means the volume is mounted read-write by a single node, exactly what a ClickHouse server needs. Because we did not set storageClassName, the cluster's default class provisions it. Apply it:

kubectl create namespace ch
kubectl apply -n ch -f ch-storage.yaml

Then confirm the claim was bound to a real volume:

kubectl get pvc -n ch

NAME                              STATUS   VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS
data-volume-chi-storage-demo...   Bound    pvc-...   10Gi       RWO            standard

The operator created the claim, the cluster provisioned the disk, and ClickHouse now stores its data on storage that survives pod restarts.
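In a script or CI job you may prefer to wait for this state rather than read the table. A minimal sketch, assuming a POSIX shell and kubectl access to the ch namespace. all_pvcs_bound and wait_for_pvcs are hypothetical helper names, and the two-minute default timeout is an arbitrary choice.

```shell
# all_pvcs_bound NAMESPACE: succeed only if the namespace has at least one
# PersistentVolumeClaim and every claim reports STATUS "Bound".
# Hypothetical helper; it reads the second column of kubectl's table output.
all_pvcs_bound() {
  kubectl get pvc -n "$1" --no-headers 2>/dev/null |
    awk 'BEGIN { n = 0; bad = 0 }
         { n++; if ($2 != "Bound") bad++ }
         END { if (n > 0 && bad == 0) exit 0; exit 1 }'
}

# wait_for_pvcs NAMESPACE [TRIES]: poll every 5 seconds until all claims are
# bound, giving up after TRIES attempts (default 24, about two minutes).
wait_for_pvcs() {
  local tries=${2:-24}
  while [ "$tries" -gt 0 ]; do
    all_pvcs_bound "$1" && return 0
    tries=$((tries - 1))
    sleep 5
  done
  return 1
}
```

For example, wait_for_pvcs ch && echo storage ready. An empty namespace counts as not ready, so the check cannot pass before the operator has created any claims.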

Separating data and log volumes

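ClickHouse writes its table data to /var/lib/clickhouse and its logs to /var/log/clickhouse-server. On a busy server it is good practice to keep logs on their own smaller volume so log growth can never fill the data disk. The operator supports this with a second template and a second reference, logVolumeClaimTemplate, whose volume it mounts at /var/log/clickhouse-server. The fragment below shows only the relevant parts of the spec; the rest of the CHI stays as before: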

spec:
  defaults:
    templates:
      dataVolumeClaimTemplate: data-volume
      logVolumeClaimTemplate: log-volume
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi
      - name: log-volume
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 1Gi

Now data and logs live on independent disks, each sized for its job.
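To confirm the split took effect, you can ask df inside the pod which mount backs each directory. This is a sketch under assumptions: a POSIX shell, a ClickHouse image that includes df, and a pod name you look up with kubectl get pods -n ch. For the storage-demo CHI the operator's naming pattern usually yields chi-storage-demo-main-0-0-0, but treat that as an assumption. The helper name volumes_are_separate is hypothetical.

```shell
# volumes_are_separate NAMESPACE POD: succeed if ClickHouse's data and log
# directories sit on different mount points inside the pod.
# Hypothetical helper; assumes the image ships df and that the log volume
# is mounted at /var/log/clickhouse-server.
volumes_are_separate() {
  # df -P prints a header, then one line per path; the last field of each
  # line is the mount point backing that path.
  kubectl exec -n "$1" "$2" -- df -P /var/lib/clickhouse /var/log/clickhouse-server |
    awk 'NR == 2 { data = $NF }
         NR == 3 { logs = $NF }
         END { if (data != "" && logs != "" && data != logs) exit 0; exit 1 }'
}
```

For example, volumes_are_separate ch chi-storage-demo-main-0-0-0 && echo separate. If both directories report the same mount point, the log template is not being applied.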

Reclaim policy: what happens when you delete the cluster

Every StorageClass has a reclaim policy, and each volume it provisions inherits that policy, which decides the fate of the volume when its claim is removed. The common default is Delete: when you delete the CHI and its claims go with it, the underlying disk is destroyed too. That is convenient for learning and risky for production, where you may prefer Retain so the data survives even if the cluster object is deleted by mistake. Check your class with kubectl get storageclass <name> -o yaml and look at reclaimPolicy. Choose deliberately; this setting protects you from accidental data loss.
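A dynamically provisioned volume takes its reclaim policy from its StorageClass when it is created, and the value is then stored on the PersistentVolume itself. The Kubernetes documentation describes changing it there with kubectl patch, which is useful when the class says Delete but a particular cluster's data must be kept. A minimal sketch, assuming a POSIX shell and kubectl permission to patch PersistentVolumes (a cluster-scoped resource); retain_namespace_volumes is a hypothetical helper name.

```shell
# retain_namespace_volumes NAMESPACE: switch every PersistentVolume bound to
# a claim in NAMESPACE to the Retain reclaim policy.
# Hypothetical helper built on kubectl get/patch; run it once claims are Bound.
retain_namespace_volumes() {
  local pv
  # spec.volumeName on each claim names the PersistentVolume bound to it.
  for pv in $(kubectl get pvc -n "$1" -o jsonpath='{.items[*].spec.volumeName}'); do
    kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}' || return 1
  done
}
```

After retain_namespace_volumes ch, kubectl get pv should show Retain in the RECLAIM POLICY column for those volumes.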

Growing a volume without downtime

Data grows, and eventually 10 gigabytes is not enough. If your StorageClass allows it, you can expand a volume in place. Two steps are involved. First, the StorageClass must permit expansion, which is set with allowVolumeExpansion: true on the class. Second, you increase the requested size in your CHI and reapply it.

Here is a StorageClass that allows expansion, as an example of the property to look for:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: kubernetes.io/your-provisioner  # placeholder: use the provisioner your cluster provides
allowVolumeExpansion: true
reclaimPolicy: Retain
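Before you raise a size, it is worth checking the class that actually backs your claim. A sketch assuming a POSIX shell and kubectl; storage_class_expandable and pvc_can_expand are hypothetical names. An unset allowVolumeExpansion field is treated as not expandable, which matches its default of false.

```shell
# storage_class_expandable CLASS: succeed if the StorageClass sets
# allowVolumeExpansion: true. An unset field prints nothing and fails.
storage_class_expandable() {
  [ "$(kubectl get storageclass "$1" -o jsonpath='{.allowVolumeExpansion}')" = "true" ]
}

# pvc_can_expand NAMESPACE CLAIM: look up the claim's StorageClass, then
# check whether that class allows expansion. Hypothetical helper.
pvc_can_expand() {
  local class
  class=$(kubectl get pvc -n "$1" "$2" -o jsonpath='{.spec.storageClassName}') || return 1
  [ -n "$class" ] && storage_class_expandable "$class"
}
```

Pass the full claim name as shown by kubectl get pvc -n ch, for example pvc_can_expand ch <claim-name>.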

With an expandable class in use, raise the storage request in the data-volume entry of your volumeClaimTemplates, for example from 10Gi to 50Gi, and reapply the CHI. The operator updates the claim and the volume grows without recreating the pod, so the database keeps serving queries throughout. Kubernetes can only grow a claim, never shrink it, so pick the new size deliberately. Always confirm your provider supports online expansion before relying on this.
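Because a claim can only grow, a small guard in a deployment script can catch a typo such as 5Gi where 50Gi was meant before the CHI is reapplied. A minimal sketch, assuming a POSIX shell and whole-number sizes with the Mi, Gi, or Ti suffixes only; to_mebibytes and is_growth are hypothetical helpers, not kubectl features.

```shell
# to_mebibytes QUANTITY: convert a whole-number binary Kubernetes quantity
# such as 512Mi, 10Gi or 1Ti into MiB. Only Mi/Gi/Ti are handled here;
# that restriction is an assumption of this sketch.
to_mebibytes() {
  case "$1" in
    *Mi) echo "${1%Mi}" ;;
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Ti) echo $(( ${1%Ti} * 1024 * 1024 )) ;;
    *) echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# is_growth OLD NEW: succeed only if NEW is strictly larger than OLD,
# since Kubernetes can expand a claim but never shrink it.
is_growth() {
  local old new
  old=$(to_mebibytes "$1") || return 1
  new=$(to_mebibytes "$2") || return 1
  [ "$new" -gt "$old" ]
}
```

is_growth 10Gi 50Gi succeeds, while is_growth 50Gi 10Gi fails. After reapplying, kubectl get pvc -n ch reports the new CAPACITY once the provider has finished the resize.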

A note on choosing sizes and classes

For learning on minikube, the default standard class and small sizes are perfect. For production, pick a StorageClass backed by fast SSD or NVMe disks, size the data volume for your expected dataset plus headroom for background merges, give logs a modest separate volume, and set the reclaim policy to protect your data. We return to production storage choices, including tiered storage to object stores like S3, later in the series.
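The sizing advice reduces to simple arithmetic: expected data plus a margin, rounded up to a whole quantity. The sketch assumes you already know the expected on-disk size in GiB. The headroom percentage is a parameter you choose, not a ClickHouse or operator default, and suggest_data_size is a hypothetical helper.

```shell
# suggest_data_size EXPECTED_GI HEADROOM_PCT: add a percentage of headroom
# to the expected on-disk dataset size (GiB) and round up to a whole Gi
# quantity suitable for a volumeClaimTemplate storage request.
suggest_data_size() {
  local expected=$1 pct=$2
  # Integer ceiling of expected * (100 + pct) / 100.
  echo "$(( (expected * (100 + pct) + 99) / 100 ))Gi"
}
```

For example, suggest_data_size 200 40 prints 280Gi: 200 GiB of expected data plus 40 percent headroom, a value you would place in the storage request of the data-volume template.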

Clean up

kubectl delete namespace ch
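If your class uses Retain, deleting the namespace removes the claims but leaves their PersistentVolumes behind in the Released state, still holding the data. A sketch for finding them, assuming a POSIX shell and kubectl's default table output for kubectl get pv; released_volumes is a hypothetical helper name.

```shell
# released_volumes: list PersistentVolumes whose claim is gone (STATUS
# "Released"). Hypothetical helper; it reads the STATUS column, the fifth
# field of kubectl's table (NAME CAPACITY ACCESS-MODES RECLAIM-POLICY STATUS).
released_volumes() {
  kubectl get pv --no-headers | awk '$5 == "Released" { print $1 }'
}
```

Review each name before deleting it with kubectl delete pv <name>. With Retain, removing the PersistentVolume object does not delete the backing disk; per the Kubernetes documentation, that asset must be cleaned up separately in your storage provider.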

What is next

Your ClickHouse data is now durable. In the next article we configure the database itself through the operator: users, profiles, quotas, server settings, and storing passwords safely in Kubernetes Secrets instead of plain text.
