Introduction
As organizations collect more analytical data, generating reports from raw datasets becomes increasingly expensive. Dashboards that calculate daily sales, website traffic, user activity, or business metrics often execute the same aggregation queries repeatedly. Although ClickHouse® is designed for high-speed analytics, repeatedly scanning millions or billions of rows still consumes CPU, memory, and disk resources.
This is where AggregatingMergeTree becomes valuable.
Instead of recalculating aggregates every time a query runs, AggregatingMergeTree stores intermediate aggregate states that are merged incrementally in the background. As a result, reports and dashboards can retrieve summarized data much faster while reducing the computational overhead on the database.
In this article, we'll explore how AggregatingMergeTree works, understand aggregate states, build a practical implementation using Materialized Views, compare it with MergeTree, and review best practices for designing scalable analytical applications.
1. What is AggregatingMergeTree?
AggregatingMergeTree is a specialized table engine in ClickHouse designed for storing pre-aggregated data.
Unlike a standard MergeTree table, which stores every inserted row, AggregatingMergeTree stores aggregate function states. During background merge operations, ClickHouse combines these states automatically, allowing analytical queries to read much smaller datasets.
This engine is particularly useful when the same aggregation queries are executed repeatedly, such as those powering dashboards, reports, and business intelligence applications.
Typical use cases include:
- Daily sales reports
- Website analytics
- IoT sensor summaries
- Financial reporting
- Business KPIs
- Monitoring dashboards
2. Why Do We Need AggregatingMergeTree?
Imagine an e-commerce platform that records millions of orders every day.
A reporting dashboard needs to display total sales for each product.
A typical query might look like:
SELECT
product,
sum(amount) AS total_sales
FROM sales
GROUP BY product;This query works well on small datasets. However, when the sales table grows to billions of rows, ClickHouse must scan and aggregate all matching records every time the dashboard refreshes.
Even though ClickHouse is highly optimized, repeatedly processing massive datasets is inefficient.
AggregatingMergeTree solves this challenge by storing aggregated information as data is inserted. Instead of recalculating totals from scratch, queries simply read the pre-aggregated results, dramatically reducing execution time.
3. How Does AggregatingMergeTree Work?
The workflow typically follows these steps:
- Raw records are inserted into a MergeTree table.
- A Materialized View computes aggregate states.
- AggregatingMergeTree stores those aggregate states.
- ClickHouse merges compatible states automatically in the background.
- Queries finalize the results using merge functions.
Application
│
▼
Raw MergeTree Table
│
▼
Materialized View
│
▼
Aggregate States
(sumState(), avgState())
│
▼
AggregatingMergeTree
│
▼
Background Merge
│
▼
sumMerge()
│
▼
Dashboard / ReportThe advantage is that the expensive aggregation work happens once during data ingestion rather than every time a report is generated.
4. Aggregate Functions vs. AggregatingMergeTree
These two concepts are related but serve different purposes.
| Aggregate Functions | AggregatingMergeTree |
|---|---|
| SQL functions | Table engine |
| Calculate results during query execution | Stores aggregate states |
Used in SELECT queries | Used for pre-aggregated summary tables |
Examples: sum(), count(), avg() | Uses AggregateFunction columns |
Think of it this way:
- Aggregate functions perform calculations.
- AggregatingMergeTree stores the results of those calculations in an intermediate form.
5. Understanding Aggregate States
This is the most important concept behind AggregatingMergeTree.
Normally, when you execute:
SELECT sum(amount)
FROM sales;ClickHouse returns the final total.
However, AggregatingMergeTree does not store the final value.
Instead, it stores an aggregate state.
For example,
sumState(amount)creates a partial aggregation.
Later,
sumMerge(total_sales)combines all stored states to produce the final result.
You can think of an aggregate state as a partially completed calculation that ClickHouse can continue merging efficiently over time.
6. Creating the Tables
Let's build a simple website analytics example that tracks daily page views.
Step 1: Create the Raw Events Table
This table stores every page visit.
CREATE TABLE page_views
(
event_date Date,
page String,
views UInt64
)
ENGINE = MergeTree
ORDER BY (event_date, page);Example raw data:
| Date | Page | Views |
|---|---|---|
| 2026-07-01 | Home | 1 |
| 2026-07-01 | Home | 1 |
| 2026-07-01 | Products | 1 |
| 2026-07-01 | Home | 1 |
| 2026-07-01 | Contact | 1 |
Every page visit is stored as an individual row.
Step 2: Create the Summary Table
Instead of storing final totals, AggregatingMergeTree stores aggregate function states.
CREATE TABLE page_views_summary
(
event_date Date,
page String,
total_views AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (event_date, page);Notice that the column type is:
AggregateFunction(sum, UInt64)instead of:
UInt64This is because the table stores aggregate states, not completed values.
7. Automatically Aggregating Data
Rather than calculating totals manually, create a Materialized View that automatically updates the summary table whenever new events are inserted.
CREATE MATERIALIZED VIEW mv_page_views
TO page_views_summary
AS
SELECT
event_date,
page,
sumState(views) AS total_views
FROM page_views
GROUP BY
event_date,
page;Whenever a new page view is inserted into the page_views table, the materialized view computes the aggregate state and stores it in the summary table automatically.
This eliminates the need to run aggregation queries repeatedly on the raw data.
8. Querying the Summary Table
Since the summary table stores aggregate states, use sumMerge() to retrieve the final aggregated values.
SELECT
event_date,
page,
sumMerge(total_views) AS total_views
FROM page_views_summary
GROUP BY
event_date,
page
ORDER BY
event_date,
page;Example Output
| Date | Page | Total Views |
|---|---|---|
| 2026-07-01 | Home | 35,842 |
| 2026-07-01 | Products | 12,614 |
| 2026-07-01 | Contact | 4,238 |
Instead of scanning millions of individual page-view events, ClickHouse reads a much smaller summary table containing pre-aggregated data. This significantly reduces query execution time and improves dashboard performance, especially for high-traffic analytics workloads.
10. When Should You Use AggregatingMergeTree?
AggregatingMergeTree is an excellent choice for analytical workloads where the same aggregation queries are executed repeatedly. By storing pre-aggregated data, it significantly reduces the amount of data scanned during query execution, resulting in faster response times and lower resource consumption.
Common use cases include:
- Interactive dashboards with frequent refreshes
- Daily, hourly, or monthly reporting
- Business KPIs and executive dashboards
- Website and application analytics
- Monitoring and observability platforms
- Large-scale event or log analytics
However, AggregatingMergeTree is not recommended for:
- Transactional (OLTP) workloads
- Applications requiring row-level updates or lookups
- Frequently modified individual records
11. Common Mistakes to Avoid
When working with AggregatingMergeTree, avoid these common pitfalls:
- Using
sum()instead ofsumState()when inserting aggregate data. - Querying aggregate states without using the corresponding merge functions such as
sumMerge(). - Defining aggregate columns as regular numeric types instead of
AggregateFunctiondata types. - Using
AggregatingMergeTreefor transactional or row-level processing. - Expecting aggregate states to merge immediately after every insert, as background merges occur asynchronously.
Understanding these common mistakes can help you design more efficient aggregation pipelines and avoid unexpected query results.
Conclusion
AggregatingMergeTree is one of the most powerful table engines in ClickHouse® for optimizing analytical workloads. By storing aggregate function states instead of recalculating values from raw data, it dramatically improves query performance while reducing the computational cost of repeated aggregations.
When combined with Materialized Views and aggregate state functions such as sumState() and sumMerge(), it enables scalable, high-performance reporting for dashboards, business intelligence platforms, and real-time analytics applications.
If your ClickHouse deployment frequently executes the same aggregation queries over large datasets, adopting AggregatingMergeTree can significantly improve query performance, reduce resource utilization, and simplify the design of production-ready analytical systems.



