All posts
Merges and Mutations in ClickHouse®

Merges and Mutations in ClickHouse®

June 24, 20266 min readKanishga S
Share:

How to Monitor, Troubleshoot, and Optimize Background Operations

Introduction

One of the reasons ClickHouse® delivers exceptional analytical performance is its storage engine. Instead of updating data in place, ClickHouse continuously performs background merges and mutations to keep data organized and queries efficient.

While these operations happen automatically, they can sometimes become difficult to monitor. A growing number of table parts can lead to "Too many parts" errors, while long-running mutations may delay updates and consume significant system resources.

Unlike query execution, merges and mutations do not always provide clear visibility into their progress. Administrators often rely on manually checking system tables, making it difficult to identify bottlenecks before they impact production workloads.

In this article, you'll learn how merges and mutations work, why monitoring them matters, and how to troubleshoot common issues.


Understanding Background Merges

ClickHouse stores data in immutable parts.

Whenever new data is inserted, a new part is created. Over time, hundreds or even thousands of parts can accumulate.

To improve query performance and reduce storage overhead, ClickHouse automatically merges smaller parts into larger ones.

A typical merge process looks like this:

Insert Data


Part A   Part B   Part C
   │       │       │
   └───────┴───────┘

     Background Merge

      Larger Part

Benefits include:

  • Fewer files on disk
  • Faster query execution
  • Better compression
  • Lower metadata overhead

What Are Mutations?

A mutation modifies existing data.

Operations such as:

  • UPDATE
  • DELETE
  • MATERIALIZE COLUMN
  • MATERIALIZE INDEX

are executed as background mutations rather than immediate row updates.

For example:

ALTER TABLE events
DELETE WHERE event_date < '2024-01-01';

Instead of deleting rows instantly, ClickHouse schedules a mutation that rewrites affected parts in the background.

Large mutations can take considerable time depending on:

  • Table size
  • Number of parts
  • Available CPU
  • Disk throughput

Why Monitoring Merges and Mutations Matters

When background tasks fall behind, overall database performance can degrade.

Common symptoms include:

  • "Too many parts" errors
  • Slow inserts
  • High disk usage
  • Long-running ALTER operations
  • Increased CPU utilization
  • Delayed DELETE or UPDATE operations

These issues often appear gradually before becoming production incidents.


Monitoring Active Merges

ClickHouse exposes active merge operations through the system.merges table.

SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts
FROM system.merges;

Example output:

DatabaseTableProgressElapsed
analyticsevents0.7235 sec
logsaccess_logs0.4118 sec

Useful columns include:

  • progress
  • elapsed
  • num_parts
  • bytes_read_uncompressed
  • bytes_written_uncompressed

These values help determine whether merges are progressing normally.


Monitoring Mutations

Use system.mutations to inspect mutation status.

SELECT
    database,
    table,
    mutation_id,
    command,
    is_done,
    parts_to_do
FROM system.mutations;

Example:

MutationStatusParts Remaining
mutation_25.txtRunning48
mutation_26.txtCompleted0

Important fields include:

  • is_done
  • parts_to_do
  • latest_failed_part
  • latest_fail_reason

These columns quickly reveal stalled or failed mutations.


Detecting Too Many Parts

Having thousands of small parts negatively affects performance.

Check part counts using:

SELECT
    database,
    table,
    count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY parts DESC;

If a table contains an unusually high number of parts, background merges may not be keeping up with incoming inserts.

Common causes include:

  • Very small insert batches
  • Heavy concurrent ingestion
  • Slow storage
  • Limited background threads

Viewing Merge Queue Activity

You can also monitor merge-related metrics through the system metrics tables.

Useful metrics include:

  • Active merges
  • Background pool utilization
  • Pending merge tasks
  • Mutation queue length

These metrics are valuable when building dashboards in monitoring tools like Grafana.


Common Reasons Merges Fall Behind

Several factors can slow background merges:

Small Insert Sizes

Frequent tiny inserts create excessive parts.

Instead of:

1 row
1 row
1 row

Prefer batching:

10,000 rows

Larger inserts significantly reduce merge pressure.


Heavy Disk I/O

Merges read and rewrite large amounts of data.

Slow disks become a bottleneck.

SSD or NVMe storage dramatically improves merge throughput.


Large Mutations

Executing large DELETE or UPDATE operations rewrites many data parts.

Instead of one massive mutation, consider processing data in smaller batches when possible.


Background Thread Saturation

ClickHouse uses background threads for merges.

If all workers remain busy, new merge tasks accumulate.

Monitoring thread utilization helps detect this condition early.


Best Practices

To keep merges and mutations healthy:

  • Batch inserts instead of inserting individual rows.
  • Monitor active part counts regularly.
  • Watch long-running mutations.
  • Use fast storage for production workloads.
  • Avoid frequent large DELETE operations.
  • Schedule heavy maintenance during off-peak hours.
  • Build dashboards around system tables for continuous monitoring.

Example Monitoring Queries

Largest tables by part count:

SELECT
    table,
    count()
FROM system.parts
WHERE active
GROUP BY table
ORDER BY count() DESC;

Running merges:

SELECT *
FROM system.merges;

Pending mutations:

SELECT *
FROM system.mutations
WHERE is_done = 0;

Building Dashboards

Many production teams collect data from:

  • system.merges
  • system.mutations
  • system.parts
  • system.metrics
  • system.events

Visualizing these metrics helps identify merge backlogs before they impact users.

A simple dashboard can display:

  • Active merges
  • Pending mutations
  • Part count per table
  • Merge throughput
  • Background thread utilization

This provides continuous visibility into ClickHouse background activity.


Conclusion

Background merges and mutations are essential to maintaining ClickHouse performance, but they often operate behind the scenes. Without proper monitoring, issues such as excessive part counts, stalled mutations, and merge backlogs can quietly grow until they affect ingestion and query performance.

By regularly inspecting system.merges, system.mutations, and system.parts, batching inserts efficiently, and monitoring background activity with dashboards, you can detect problems early and keep your ClickHouse cluster running smoothly.

Understanding these internal processes is a key step toward operating ClickHouse reliably at scale.

Share: