ClickHouse® Query Profiling is the process of analyzing query execution to identify performance bottlenecks before applying optimizations. Rather than relying on assumptions, profiling helps determine whether a query is limited by CPU, memory, disk I/O, network communication, or an inefficient execution plan. ClickHouse® provides built-in profiling tools that expose detailed execution statistics, enabling engineers to make evidence-based performance improvements.

This article explains how to profile queries in ClickHouse®, interpret the collected metrics, and identify common performance bottlenecks.

Why Query Profiling Matters

A slow query does not always indicate a slow database. The actual bottleneck may be:

Reading more data than necessary
Poor primary key filtering
Excessive sorting or aggregation
Memory-intensive joins
Network overhead in distributed queries
CPU-heavy expressions
Large data scans caused by ineffective pruning

Without profiling, these issues are difficult to distinguish because they often produce similar symptoms—high latency.

Query profiling provides visibility into:

Query execution time
Rows and bytes processed
Memory consumption
CPU utilization
Disk reads
Thread-level execution
Execution plan details
Low-level performance counters

The objective is to determine why a query is slow before attempting to optimize it.

Understanding the Query Profiling Workflow

A practical profiling workflow in ClickHouse® typically follows these steps:

Identify slow queries using system.query_log.
Examine the execution plan with EXPLAIN.
Inspect resource usage and execution statistics.
Analyze thread-level behavior if required.
Review ProfileEvents to identify resource bottlenecks.
Modify the query or schema.
Measure the results again.

ClickHouse® engineering documentation recommends measuring performance before and after every optimization instead of applying multiple changes simultaneously.

Step 1: Finding Slow Queries with `system.query_log`

The system.query_log table records metadata for executed queries, including execution duration, rows read, bytes processed, memory usage, exceptions, and ProfileEvents. It stores execution statistics rather than query results. Query logging can be configured through server settings, and the logs can be flushed immediately using SYSTEM FLUSH LOGS when necessary.

A common query for identifying expensive queries is:

SELECT
    event_time,
    query_duration_ms,
    read_rows,
    read_bytes,
    memory_usage,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10;

The most useful columns include:

Column	What it Indicates
`query_duration_ms`	Total execution time
`read_rows`	Number of rows scanned
`read_bytes`	Amount of data read
`memory_usage`	Peak memory consumption
`query`	Executed SQL statement
`normalized_query_hash`	Groups structurally identical queries

The normalized_query_hash column is particularly useful because it groups similar queries together, helping identify expensive query patterns that execute repeatedly rather than isolated slow queries.

Step 2: Analyze the Execution Plan with `EXPLAIN`

Execution time alone does not explain why a query is slow.

The EXPLAIN statement reveals how ClickHouse® plans to execute a query.

Example:

EXPLAIN indexes = 1
SELECT *
FROM sales
WHERE order_date >= '2025-01-01';

The output shows information such as:

Index usage
Parts selected
Granules scanned
Predicate pushdown
Read operations

One of the most important indicators is the number of granules read.

If the query scans nearly every granule in a table, the primary key is providing little benefit, resulting in an expensive table scan. Conversely, reading only a small fraction of granules indicates effective data pruning.

Step 3: Inspect Query Resource Usage

Execution time alone rarely tells the complete story.

Consider the following metrics together.

Rows Read

read_rows

A query returning only a few hundred rows but scanning hundreds of millions usually indicates poor filtering or an ineffective primary key.

Bytes Read

read_bytes

Large values indicate heavy disk or storage activity.

This often suggests:

Full table scans
Reading unnecessary columns
Inefficient filtering

Peak Memory Usage

memory_usage

High memory consumption commonly occurs during:

Large joins
GROUP BY operations
ORDER BY
DISTINCT

Memory-intensive queries may eventually spill to disk depending on server settings.

Step 4: Use `system.query_thread_log`

Some slow queries are not uniformly slow.

Different execution threads may spend time performing different tasks.

The system.query_thread_log table records execution statistics for individual query threads, making it possible to determine:

CPU time per thread
Memory usage
Read statistics
Execution duration

This level of detail is especially valuable when diagnosing parallel execution or identifying thread imbalance.

Step 5: Understand ProfileEvents

One of the most valuable profiling features in ClickHouse® is the ProfileEvents map available in system.query_log.

ProfileEvents contains low-level execution counters collected while a query runs. These metrics can reveal whether a query is constrained by CPU, storage, or network activity.

Examples include:

CPU time
Disk reads
Network bytes sent
Selected rows
Selected bytes
Cache hits
File operations

Example:

SELECT
    ProfileEvents
FROM system.query_log
WHERE query_id = 'your-query-id';

Some commonly observed events include:

Profile Event	Interpretation
`SelectedRows`	Rows processed
`SelectedBytes`	Data processed
`NetworkSendBytes`	Network traffic generated
`ReadCompressedBytes`	Compressed data read from storage
`UserTimeMicroseconds`	CPU time spent in user space
`SystemTimeMicroseconds`	CPU time spent in kernel space

Looking at ProfileEvents often makes it clear whether the dominant cost is computation, I/O, or communication rather than simply observing a long execution time.

Identifying Common Bottlenecks

1. Excessive Data Scanning

Symptoms:

High read_rows
High read_bytes
Long execution time

Typical causes:

Missing filters
Poor primary key design
Ineffective partition pruning

2. CPU Bottlenecks

Symptoms:

High CPU ProfileEvents
Moderate data reads
Complex expressions

Common causes:

Expensive functions
Large aggregations
Complex JOIN conditions

3. Memory Bottlenecks

Symptoms:

High memory_usage
Slow aggregations
Slow joins

Often caused by:

Large hash tables
Large GROUP BY operations
Wide intermediate datasets

4. Disk I/O Bottlenecks

Symptoms:

High compressed bytes read
Large storage reads
Long execution time despite moderate CPU usage

Possible reasons include:

Reading excessive columns
Large table scans
Poor data locality

5. Network Bottlenecks

Distributed queries introduce additional overhead.

Indicators include:

High NetworkSendBytes
Large data exchange
Slow distributed aggregations

Reducing intermediate data movement can significantly improve distributed query performance.

Practical Example

Suppose a query requires 20 seconds to complete.

Profiling reveals:

Metric	Value
Duration	20 s
Rows Read	450 million
Result Rows	150
Memory	180 MB
CPU	Moderate
Read Bytes	Very High

This profile suggests that CPU is not the limiting factor.

Instead, the query spends most of its time scanning data.

The appropriate optimization is to improve data pruning through better filtering, schema design, or primary key selection—not to increase CPU resources.

Best Practices for Query Profiling

Enable query logging in production environments.
Analyze execution plans before modifying queries.
Review ProfileEvents alongside execution time.
Track recurring slow queries using normalized_query_hash.
Measure the impact of every optimization.
Test changes with realistic datasets rather than synthetic examples.
Avoid optimizing based on a single metric.

ClickHouse® also recommends testing optimizations iteratively and, when evaluating storage-related changes, measuring both cold-cache and warm-cache performance to understand real-world behavior.

Conclusion

Query optimization is fundamentally an exercise in identifying bottlenecks rather than making isolated performance tweaks.

ClickHouse® provides comprehensive profiling capabilities through system.query_log, system.query_thread_log, EXPLAIN, and ProfileEvents, enabling engineers to understand how a query is executed and which resources it consumes.

By combining execution plans with runtime metrics, it becomes possible to determine whether a query is constrained by CPU, memory, storage, or network activity. This evidence-driven approach leads to targeted optimizations that improve performance while avoiding unnecessary changes to infrastructure or configuration.

References

clickhouse-docs

ClickHouse® Query Profiling: Identifying Bottlenecks

Why Query Profiling Matters

Understanding the Query Profiling Workflow

Step 1: Finding Slow Queries with `system.query_log`

Step 2: Analyze the Execution Plan with `EXPLAIN`

Step 3: Inspect Query Resource Usage

Rows Read

Bytes Read

Peak Memory Usage

Step 4: Use `system.query_thread_log`

Step 5: Understand ProfileEvents

Identifying Common Bottlenecks

1. Excessive Data Scanning

2. CPU Bottlenecks

3. Memory Bottlenecks

4. Disk I/O Bottlenecks

5. Network Bottlenecks

Practical Example

Best Practices for Query Profiling

Conclusion

References

Expert ClickHouse services

Manage ClickHouse with CHOps

Related articles

High Availability and Disaster Recovery in ClickHouse®

ClickHouse® Data Sampling: Querying Billions of Rows Fast

Using ClickHouse® for Log Analysis and Observability

ClickHouse® Query Profiling: Identifying Bottlenecks

Why Query Profiling Matters

Understanding the Query Profiling Workflow

Step 1: Finding Slow Queries with system.query_log

Step 2: Analyze the Execution Plan with EXPLAIN

Step 3: Inspect Query Resource Usage

Rows Read

Bytes Read

Peak Memory Usage

Step 4: Use system.query_thread_log

Step 5: Understand ProfileEvents

Identifying Common Bottlenecks

1. Excessive Data Scanning

2. CPU Bottlenecks

3. Memory Bottlenecks

4. Disk I/O Bottlenecks

5. Network Bottlenecks

Practical Example

Best Practices for Query Profiling

Conclusion

References

Expert ClickHouse services

Manage ClickHouse with CHOps

Related articles

High Availability and Disaster Recovery in ClickHouse®

ClickHouse® Data Sampling: Querying Billions of Rows Fast

Using ClickHouse® for Log Analysis and Observability

Step 1: Finding Slow Queries with `system.query_log`

Step 2: Analyze the Execution Plan with `EXPLAIN`

Step 4: Use `system.query_thread_log`