Understanding ClickHouse® Partitions and Sorting Keys

When people first start working with ClickHouse®, they often focus on queries, functions, and performance benchmarks. However, one of the most important factors influencing query performance is how data is physically organized on disk.

Two concepts play a central role in this process:

Partitions
Sorting Keys

Understanding how these work can help you design more efficient tables, reduce query execution times, and avoid common performance issues as your datasets grow.

In this article, we’ll explore what ClickHouse® partitions and sorting keys are, how they differ, and when to use them.

Why Data Organization Matters

ClickHouse® is a column-oriented OLAP database designed to process large volumes of analytical data.

Unlike traditional databases that often rely heavily on indexes, ClickHouse® achieves performance through:

Columnar storage
Data compression
Data skipping
Parallel processing
Efficient data organization

This is where partitions and sorting keys become important.

They help ClickHouse® locate relevant data faster and avoid scanning unnecessary portions of a table.

What Are Partitions in ClickHouse®?

A partition is a logical way of dividing data into separate groups.

When data is inserted, ClickHouse® evaluates the partition expression and stores rows belonging to the same partition together.

For example:

CREATE TABLE events
(
    timestamp DateTime,
    user_id UInt64,
    event_type String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);

In this example:

PARTITION BY toYYYYMM(timestamp)

creates a separate partition for every month.

Example partitions:

202601
202602
202603

Each partition contains only data from its corresponding month.

Why Use Partitions?

Partitions provide several operational benefits.

Faster Data Management

Instead of deleting individual rows, entire partitions can be removed.

Example:

ALTER TABLE events
DROP PARTITION 202501;

This operation is significantly faster than deleting billions of rows individually.

Easier Data Retention

Many organizations keep data for a fixed period.

For example:

Keep logs for 90 days
Keep metrics for 12 months
Archive old events

Partitions make these operations simple and efficient.

Reduced Data Scanning

If a query filters on the partition key, ClickHouse® can ignore partitions that do not match the query conditions.

Example:

SELECT count()
FROM events
WHERE timestamp >= '2026-06-01'
  AND timestamp < '2026-07-01';

The engine may only need to examine the relevant monthly partition.

What Partitions Are Not

One common misconception is that partitions are the primary mechanism for query acceleration.

They are not.

Many users create excessively granular partitions expecting dramatic speed improvements.

For example:

PARTITION BY user_id

PARTITION BY session_id

This usually creates too many partitions and can negatively impact performance.

A good partitioning strategy typically produces a manageable number of partitions, often based on time.

Examples include:

toYYYYMM(timestamp)
toYYYYMMDD(timestamp)
toYear(timestamp)

depending on data volume and retention requirements.

What Is a Sorting Key?

The sorting key defines how rows are physically ordered within each partition.

Example:

ORDER BY (customer_id, timestamp)

This tells ClickHouse® to store data sorted by:

customer_id
timestamp

The sorting key is one of the most important design decisions when creating a MergeTree table.

Why Sorting Keys Matter

ClickHouse® stores metadata that allows it to skip ranges of data that cannot satisfy a query.

Consider the following sorting key:

ORDER BY (customer_id, timestamp)

And the query:

SELECT *
FROM orders
WHERE customer_id = 12345;

Because the data is physically ordered by customer_id, ClickHouse® can quickly locate the relevant ranges rather than scanning the entire dataset.

This mechanism is often responsible for significant performance improvements.

Sorting Keys vs Primary Keys

In ClickHouse®, these concepts are closely related.

Most tables use:

ORDER BY (...)

which automatically becomes the primary key.

Example:

ENGINE = MergeTree
ORDER BY (user_id, timestamp)

Internally, ClickHouse® creates sparse indexing information based on this ordering.

Unlike traditional databases, ClickHouse® does not create a B-tree index for every row.

Instead, it uses lightweight index marks that help identify relevant data ranges efficiently.

How Partitions and Sorting Keys Work Together

Partitions and sorting keys serve different purposes.

Partitions determine:

How data is grouped
How data is managed
How retention policies are applied

Sorting keys determine:

How data is ordered
How data is skipped
How efficiently queries execute

Think of it this way:

Partitioning decides which storage box to look in.

Sorting determines how neatly the contents inside that box are arranged.

Both contribute to performance, but sorting keys usually have a larger impact on query speed.

Example of a Well-Designed Table

CREATE TABLE metrics
(
    timestamp DateTime,
    host String,
    metric_name String,
    value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (host, metric_name, timestamp);

Why this works:

Monthly partitions simplify retention management.
Data is grouped by host.
Metrics for the same host stay close together.
Time-series queries remain efficient.
Data skipping becomes more effective.

This design is common in observability and monitoring workloads.

Common Mistakes

Creating Too Many Partitions

Avoid:

PARTITION BY user_id

This can create millions of partitions and increase metadata overhead.

Using Random Sorting Keys

Avoid:

ORDER BY rand()

or sorting columns that rarely appear in filters.

The sorting key should align with common query patterns.

Designing for Inserts Only

Many teams focus on ingestion speed while ignoring query behavior.

A sorting key should primarily optimize how data will be queried.

Best Practices

When designing ClickHouse® tables:

Partition primarily for data management.
Sort primarily for query performance.
Use time-based partitions in most workloads.
Choose sorting keys that match filter conditions.
Avoid high-cardinality partition keys.
Monitor query patterns before finalizing table design.
Test with realistic production workloads.

Final Thoughts

Understanding ClickHouse® partitions and sorting keys is essential for building efficient analytical systems.

Partitions help organize and manage data, while sorting keys enable ClickHouse® to skip large portions of irrelevant data during query execution.

A common mistake is overestimating the importance of partitions and underestimating the impact of sorting keys. In practice, sorting keys often have a much greater influence on query performance.

When designing MergeTree tables, think carefully about how your data will be queried. The right partitioning strategy combined with a well-designed sorting key can significantly improve performance, reduce resource consumption, and help ClickHouse® scale effectively as data volumes grow.

Exploring ClickHouse® for Your Analytics?

At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.

We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:

Success Story: Quantrail Bare-Metal ClickHouse® Deployment
If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.

Contact
Quantrail Data

References

ClickHouse® Documentation – Choosing a Primary Key
Introduction to ClickHouse® MergeTree Engines

Understanding ClickHouse® Partitions and Sorting Keys

Why Data Organization Matters

What Are Partitions in ClickHouse®?

Why Use Partitions?

Faster Data Management

Easier Data Retention

Reduced Data Scanning

What Partitions Are Not

What Is a Sorting Key?

Why Sorting Keys Matter

Sorting Keys vs Primary Keys

How Partitions and Sorting Keys Work Together

Example of a Well-Designed Table

Common Mistakes

Creating Too Many Partitions

Using Random Sorting Keys

Designing for Inserts Only

Best Practices

Final Thoughts

Exploring ClickHouse® for Your Analytics?

References

Related articles

Deep Dive into the ReplacingMergeTree Engine

Working with JSON in ClickHouse®

Filtering Data in ClickHouse®: Best Practices