All posts
Using UUID as a Primary Key in ClickHouse: Benefits, Trade-offs, and Real Benchmarks

Using UUID as a Primary Key in ClickHouse: Benefits, Trade-offs, and Real Benchmarks

October 16, 20254 min readMohamed Hussain S
Share:

Primary key design plays a crucial role in ClickHouse performance. It determines how data is stored, merged, and queried – directly impacting speed, compression, and scalability.

One commonly discussed choice is whether to use a UUID as a primary key instead of a numeric ID.
In this blog, we’ll explore what UUID is, why engineering teams use it, its trade-offs, and – most importantly – how it performs in practice with a real benchmark.

What Is UUID in ClickHouse

A UUID (Universally Unique Identifier) is a 128-bit (16-byte) value that guarantees uniqueness across systems.
In ClickHouse, it’s stored internally as a FixedString(16) – compact and efficient compared to regular strings.

Example table definitions:

CREATE TABLE test.events_uint
(
    id UInt64,
    user_id UInt64,
    event_time DateTime,
    event_type String
)
ENGINE = MergeTree()
ORDER BY id;
 
CREATE TABLE test.events_uuid
(
    id UUID,
    user_id UInt64,
    event_time DateTime,
    event_type String
)
ENGINE = MergeTree()
ORDER BY id;

Both tables are identical except for their primary key data type.

Why Engineering Teams Use UUID as a Primary Key

  1. Global Uniqueness Across Sources
    UUIDs ensure globally unique identifiers across distributed pipelines or multi-node ingestion systems.
  2. No Centralized ID Generation
    No need for a sequence generator or MAX(id) handling – every record can be generated independently.
  3. Simplified Data Merging
    When consolidating data from multiple environments, UUIDs make deduplication and reconciliation easy.
  4. Concurrency-Friendly for Writes
    Multiple writers can safely insert data without worrying about ID collisions.

Trade-offs of Using UUID

  1. Poor Sorting Locality
    UUIDs are random, which means inserts are not sequential. This affects how efficiently ClickHouse merges and compresses parts.
  2. Higher Storage Overhead
    UUIDs (16 bytes) are larger than typical numeric IDs (8 bytes), resulting in slightly higher index and part sizes.
  3. Reduced Compression Efficiency
    Since data isn’t naturally ordered, compression ratios can be lower than for sequential numeric keys.
  4. Not Ideal as a Sorting Key
    For analytical tables, it’s better to order by time or another sequential field for efficient merges.

Benchmark Setup

To test the real impact of UUID vs UInt64, we created two tables (events_uint and events_uuid) and inserted 5 million rows into each.

Insert Statements

We then measured:

  • Insert speed (rows per second)
  • Table size on disk
  • Compression ratio
  • Query performance

Benchmark Results

MetricUInt64 KeyUUID Key
Insert Speed2.3M rows/s1.9M rows/s
Table Size58.13 MiB128.40 MiB
Compression Ratio0.4350.748
Query Time (count())0.038 sec.0.046 sec.

Results are based on a single-node ClickHouse instance; actual performance may vary.

Interpretation

  • UUIDs simplify ID management in distributed systems but add minor overhead in storage and insert performance.
  • Numeric IDs (UInt64) remain faster for sequential workloads or analytics tables with ordered inserts.
  • The optimal choice depends on your data model – for event pipelines or decentralized ingestion, UUIDs are worth it.

Best Practices

  • Use UUID as a primary key, not necessarily as a sorting key.
  • Consider composite sorting keys like (event_time, event_id) for better compression.
  • Benchmark using your actual data scale before finalizing schema decisions.

Example:

CREATE TABLE events
(
    event_id UUID,
    event_time DateTime,
    user_id UInt64,
    event_type String
)
ENGINE = MergeTree()
ORDER BY (event_time, event_id);

Final Thoughts

Choosing the right primary key in ClickHouse isn’t about following trends – it’s about understanding trade-offs.
UUIDs bring flexibility and scalability, especially in distributed systems, but sequential numeric keys still win in pure performance and compression.

Balancing both gives you the best of both worlds.

Looking for ClickHouse Expertise?

At Quantrail Data, we work closely with teams to make ClickHouse adoption seamless – from smooth deployments and guided migrations to performance benchmarking and scalable data pipelines.

We’ve seen these improvements in action, helping teams unlock bare-metal-level performance for real-time analytics – as shared in our Success Story: Quantrail Baremetal.

Whether you’re optimizing an existing setup or planning a migration, our goal is to make your ClickHouse journey efficient, reliable, and ready for production.

👉 Get in touch to see how we can help you make the most of ClickHouse.

References

ClickHouse Official Documentation – UUID Data Type
ClickHouse MergeTree Engine Overview

Share: