When people first start working with ClickHouse®, they often focus on queries, functions, and performance benchmarks. However, one of the most important factors influencing query performance is how data is physically organized on disk.
Two concepts play a central role in this process:
- Partitions
- Sorting Keys
Understanding how these work can help you design more efficient tables, reduce query execution times, and avoid common performance issues as your datasets grow.
In this article, we’ll explore what ClickHouse® partitions and sorting keys are, how they differ, and when to use them.
Why Data Organization Matters
ClickHouse® is a column-oriented OLAP database designed to process large volumes of analytical data.
Unlike traditional databases that often rely heavily on indexes, ClickHouse® achieves performance through:
- Columnar storage
- Data compression
- Data skipping
- Parallel processing
- Efficient data organization
This is where partitions and sorting keys become important.
They help ClickHouse® locate relevant data faster and avoid scanning unnecessary portions of a table.
What Are Partitions in ClickHouse®?
A partition is a logical way of dividing data into separate groups.
When data is inserted, ClickHouse® evaluates the partition expression and stores rows belonging to the same partition together.
For example:
CREATE TABLE events
(
timestamp DateTime,
user_id UInt64,
event_type String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id);In this example:
PARTITION BY toYYYYMM(timestamp)creates a separate partition for every month.
Example partitions:
202601
202602
202603Each partition contains only data from its corresponding month.
Why Use Partitions?
Partitions provide several operational benefits.
Faster Data Management
Instead of deleting individual rows, entire partitions can be removed.
Example:
ALTER TABLE events
DROP PARTITION 202501;This operation is significantly faster than deleting billions of rows individually.
Easier Data Retention
Many organizations keep data for a fixed period.
For example:
- Keep logs for 90 days
- Keep metrics for 12 months
- Archive old events
Partitions make these operations simple and efficient.
Reduced Data Scanning
If a query filters on the partition key, ClickHouse® can ignore partitions that do not match the query conditions.
Example:
SELECT count()
FROM events
WHERE timestamp >= '2026-06-01'
AND timestamp < '2026-07-01';The engine may only need to examine the relevant monthly partition.
What Partitions Are Not
One common misconception is that partitions are the primary mechanism for query acceleration.
They are not.
Many users create excessively granular partitions expecting dramatic speed improvements.
For example:
PARTITION BY user_idor
PARTITION BY session_idThis usually creates too many partitions and can negatively impact performance.
A good partitioning strategy typically produces a manageable number of partitions, often based on time.
Examples include:
toYYYYMM(timestamp)
toYYYYMMDD(timestamp)
toYear(timestamp)depending on data volume and retention requirements.
What Is a Sorting Key?
The sorting key defines how rows are physically ordered within each partition.
Example:
ORDER BY (customer_id, timestamp)This tells ClickHouse® to store data sorted by:
- customer_id
- timestamp
The sorting key is one of the most important design decisions when creating a MergeTree table.
Why Sorting Keys Matter
ClickHouse® stores metadata that allows it to skip ranges of data that cannot satisfy a query.
Consider the following sorting key:
ORDER BY (customer_id, timestamp)And the query:
SELECT *
FROM orders
WHERE customer_id = 12345;Because the data is physically ordered by customer_id, ClickHouse® can quickly locate the relevant ranges rather than scanning the entire dataset.
This mechanism is often responsible for significant performance improvements.
Sorting Keys vs Primary Keys
In ClickHouse®, these concepts are closely related.
Most tables use:
ORDER BY (...)which automatically becomes the primary key.
Example:
ENGINE = MergeTree
ORDER BY (user_id, timestamp)Internally, ClickHouse® creates sparse indexing information based on this ordering.
Unlike traditional databases, ClickHouse® does not create a B-tree index for every row.
Instead, it uses lightweight index marks that help identify relevant data ranges efficiently.
How Partitions and Sorting Keys Work Together
Partitions and sorting keys serve different purposes.
Partitions determine:
- How data is grouped
- How data is managed
- How retention policies are applied
Sorting keys determine:
- How data is ordered
- How data is skipped
- How efficiently queries execute
Think of it this way:
Partitioning decides which storage box to look in.
Sorting determines how neatly the contents inside that box are arranged.
Both contribute to performance, but sorting keys usually have a larger impact on query speed.
Example of a Well-Designed Table
CREATE TABLE metrics
(
timestamp DateTime,
host String,
metric_name String,
value Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (host, metric_name, timestamp);Why this works:
- Monthly partitions simplify retention management.
- Data is grouped by host.
- Metrics for the same host stay close together.
- Time-series queries remain efficient.
- Data skipping becomes more effective.
This design is common in observability and monitoring workloads.
Common Mistakes
Creating Too Many Partitions
Avoid:
PARTITION BY user_idThis can create millions of partitions and increase metadata overhead.
Using Random Sorting Keys
Avoid:
ORDER BY rand()or sorting columns that rarely appear in filters.
The sorting key should align with common query patterns.
Designing for Inserts Only
Many teams focus on ingestion speed while ignoring query behavior.
A sorting key should primarily optimize how data will be queried.
Best Practices
When designing ClickHouse® tables:
- Partition primarily for data management.
- Sort primarily for query performance.
- Use time-based partitions in most workloads.
- Choose sorting keys that match filter conditions.
- Avoid high-cardinality partition keys.
- Monitor query patterns before finalizing table design.
- Test with realistic production workloads.
Final Thoughts
Understanding ClickHouse® partitions and sorting keys is essential for building efficient analytical systems.
Partitions help organize and manage data, while sorting keys enable ClickHouse® to skip large portions of irrelevant data during query execution.
A common mistake is overestimating the importance of partitions and underestimating the impact of sorting keys. In practice, sorting keys often have a much greater influence on query performance.
When designing MergeTree tables, think carefully about how your data will be queried. The right partitioning strategy combined with a well-designed sorting key can significantly improve performance, reduce resource consumption, and help ClickHouse® scale effectively as data volumes grow.
Exploring ClickHouse® for Your Analytics?
At Quantrail Data, we help teams run ClickHouse® reliably for real-time analytics – from Kubernetes deployments and migrations to performance tuning in production.
We see these challenges firsthand while supporting demanding analytics workloads. In one recent engagement, a customer achieved near bare-metal performance with ClickHouse® in production – a story we’ve shared here:
Success Story: Quantrail Bare-Metal ClickHouse® Deployment
If you’re evaluating ClickHouse® or trying to get more out of an existing setup, we’re happy to share practical lessons from real-world deployments.
Contact
Quantrail Data
References
ClickHouse® Documentation – Choosing a Primary Key
Introduction to ClickHouse® MergeTree Engines



