Optimizing ClickHouse® for High-Throughput Inserts

July 4, 2026 · 6 min read · Gayathri

Introduction

One of the biggest reasons organizations choose ClickHouse® is its ability to ingest millions of rows per second while maintaining lightning-fast query performance. Whether you're collecting application logs, IoT sensor data, financial transactions, or user analytics, efficient data ingestion is essential for building a high-performance analytical platform.

However, achieving maximum insert performance isn't just about using powerful hardware. Table design, batching strategy, partitioning, compression, and background merge operations all play an important role in determining how efficiently ClickHouse handles incoming data.

In this post, we'll walk through best practices for high-throughput inserts in ClickHouse that help you reduce ingestion latency, minimize system overhead, and maximize overall performance.

Why Insert Performance Matters

Efficient data ingestion is critical for analytics workloads where data is continuously generated from multiple sources.

Poor insert performance can lead to:

  • Increased ingestion latency
  • Higher CPU and disk utilization
  • Excessive background merge operations
  • Large numbers of small data parts
  • Reduced query performance

By optimizing insert operations, you can reduce system overhead while ensuring that newly ingested data becomes available for analysis as quickly as possible.

How ClickHouse Handles Inserts

Unlike traditional row-oriented databases, ClickHouse is optimized for batch inserts.

When data is inserted into a MergeTree table, ClickHouse performs the following steps:

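  1. Splits the incoming block by partition (if the table has a PARTITION BY clause).
  2. Sorts the rows according to the sorting key (ORDER BY).
  3. Compresses each column separately.
  4. Writes the result to disk as a new, immutable data part for each partition.
  5. Background merge threads later combine smaller parts of the same partition into larger ones.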

This architecture enables ClickHouse to efficiently handle large-scale data ingestion while maintaining excellent query performance.
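The steps above can be sketched with a toy Python model. This is not how ClickHouse is implemented. ToyMergeTree and its methods are hypothetical, partitioning and compression are left out, and a "part" is just a sorted tuple of (event_date, user_id) rows. The model shows the two properties that matter for inserts. First, every insert produces a new sorted part. Second, a merge can combine already-sorted parts with a cheap k-way merge instead of a full re-sort.

```python
import heapq
from typing import List, Tuple

# Hypothetical row shape mirroring ORDER BY (event_date, user_id).
Row = Tuple[str, int]


class ToyMergeTree:
    """Toy model only: every insert becomes one sorted, immutable part."""

    def __init__(self) -> None:
        self.parts: List[Tuple[Row, ...]] = []

    def insert(self, rows: List[Row]) -> None:
        # Sort the incoming block by the sorting key and freeze it as a new part.
        self.parts.append(tuple(sorted(rows)))

    def merge_all(self) -> None:
        # Parts are already sorted, so a k-way merge keeps the output sorted
        # without re-sorting everything from scratch.
        self.parts = [tuple(heapq.merge(*self.parts))]


table = ToyMergeTree()
table.insert([("2026-07-02", 7), ("2026-07-01", 3)])
table.insert([("2026-07-01", 1)])
table.insert([("2026-07-03", 2), ("2026-07-01", 9)])
print(len(table.parts))  # 3 parts, one per insert
table.merge_all()
print(table.parts[0])
```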

Best Practices for High-Throughput Inserts

1. Insert Data in Large Batches

Batch inserts are one of the most effective ways to improve ingestion performance.

Small inserts generate many tiny data parts, increasing merge operations and placing unnecessary load on the system.

Less Efficient

INSERT INTO events VALUES (...);

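Executing this statement repeatedly for individual rows creates excessive overhead.

More Efficient

INSERT INTO events FORMAT CSV

Send thousands or even millions of rows as the payload of a single insert operation. For example, you can stream a file through clickhouse-client (events.csv is a placeholder name):

clickhouse-client --query="INSERT INTO events FORMAT CSV" < events.csv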

Rows per insert | Performance
Fewer than 1,000 | Poor
1,000–10,000 | Good
10,000–100,000+ | Excellent

Larger batches reduce the number of generated parts and improve overall throughput.
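On the client side, batching usually means buffering rows and sending them in one INSERT once the buffer is large enough or old enough. The sketch below assumes a hypothetical flush_fn that performs the actual insert, such as one INSERT ... FORMAT CSV request. The class name and the default thresholds are illustrative and are not values taken from ClickHouse.

```python
import time
from typing import Any, Callable, List, Optional


class InsertBatcher:
    """Buffers rows client-side and hands them to `flush_fn` in large batches.

    `flush_fn` is a hypothetical callback that sends one INSERT; it is not a
    ClickHouse API. Defaults below are illustrative only.
    """

    def __init__(
        self,
        flush_fn: Callable[[List[Any]], None],
        max_rows: int = 100_000,
        max_wait_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_wait_s = max_wait_s
        self.clock = clock
        self.buffer: List[Any] = []
        self.first_row_at: Optional[float] = None

    def add(self, row: Any) -> None:
        if not self.buffer:
            self.first_row_at = self.clock()
        self.buffer.append(row)
        # Flush when the batch is big enough or the oldest row has waited too long.
        too_old = self.clock() - self.first_row_at >= self.max_wait_s
        if len(self.buffer) >= self.max_rows or too_old:
            self.flush()

    def flush(self) -> None:
        # One call to flush_fn == one INSERT == (at least) one new data part.
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.first_row_at = None


sent_batches: List[int] = []
batcher = InsertBatcher(
    lambda rows: sent_batches.append(len(rows)),
    max_rows=10_000,
    clock=lambda: 0.0,  # frozen clock so only the row limit triggers flushes
)
for i in range(25_000):
    batcher.add((i, "click"))
batcher.flush()  # send the remainder on shutdown
print(sent_batches)
```

In this example, 25,000 rows become three inserts of 10,000, 10,000, and 5,000 rows instead of 25,000 single-row inserts. The age check only runs when a new row arrives, so a production version would also need a timer to flush an idle buffer.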

2. Choose an Efficient Primary Key

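The sorting key (ORDER BY) determines how data is physically sorted on disk. It also serves as the primary key unless a separate PRIMARY KEY is declared.

A well-designed primary key should:

  • Match common query patterns, starting with the columns you filter on most often
  • Contain only the columns you need, since every insert must sort by the full key
  • Order columns from lower to higher cardinality
  • Support efficient filtering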

Example:

ORDER BY (event_date, user_id)

A carefully selected primary key improves both insert efficiency and query performance.
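To see why the sort order matters, the toy sketch below mimics ClickHouse's sparse primary index:

  1. Rows are sorted by (event_date, user_id).
  2. The sorted rows are split into fixed-size granules.
  3. Only the first key of each granule is kept as a "mark".

A filter on event_date then only needs the granules whose key range could contain that date. The granule size of 4 is purely illustrative; ClickHouse's default index_granularity is 8192 rows. The function names are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[str, int]  # (event_date, user_id), as in ORDER BY (event_date, user_id)


def build_marks(rows: List[Row], granule_size: int) -> Tuple[List[Row], List[Row]]:
    """Sort rows by the sorting key and keep the first key of every granule."""
    ordered = sorted(rows)
    marks = [ordered[i] for i in range(0, len(ordered), granule_size)]
    return ordered, marks


def granules_for_date(marks: List[Row], date: str) -> List[int]:
    """Return indexes of granules that may contain rows with event_date == date.

    Granule i spans keys from marks[i] up to marks[i + 1], so it can only hold
    `date` if it starts on or before that date and the next granule does not
    start after it.
    """
    hits = []
    for i, (start_date, _) in enumerate(marks):
        next_date = marks[i + 1][0] if i + 1 < len(marks) else None
        if start_date <= date and (next_date is None or next_date >= date):
            hits.append(i)
    return hits


rows = [(f"2026-07-0{d}", user) for d in (4, 2, 3, 1) for user in (40, 10, 30, 20)]
ordered, marks = build_marks(rows, granule_size=4)
print(marks)
print(granules_for_date(marks, "2026-07-02"))
```

The query for 2026-07-02 reads 2 of 4 granules. Granule 0 is kept because, judging from the marks alone, it could still hold 2026-07-02 rows with a user_id below 10. If the rows were not sorted by event_date, every granule could match and all of them would have to be read.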

3. Partition Data Effectively

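Partitioning organizes data into logical groups, such as one per month. Its main purpose is to make data management and maintenance more efficient.

Example:

PARTITION BY toYYYYMM(event_date)

Benefits include:

  • Efficient partition pruning
  • Simplified data retention
  • Easier maintenance operations

Avoid creating too many partitions. Each insert writes at least one new part for every partition its rows fall into, and parts are only merged within a partition. As a result, excessive partition counts increase the number of parts, metadata, and merge overhead.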
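The partition key decides how many parts a single insert creates. The sketch below is a Python stand-in for PARTITION BY toYYYYMM(event_date). to_yyyymm and split_by_partition are hypothetical helpers, and each returned group corresponds to at least one new data part.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

Row = Tuple[date, int]  # (event_date, user_id)


def to_yyyymm(d: date) -> int:
    """Python equivalent of ClickHouse's toYYYYMM(): 2026-07-04 -> 202607."""
    return d.year * 100 + d.month


def split_by_partition(rows: List[Row]) -> Dict[int, List[Row]]:
    """Group one insert batch by PARTITION BY toYYYYMM(event_date).

    Each group becomes at least one new data part, so the number of groups is
    a lower bound on the parts a single INSERT of `rows` creates.
    """
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[to_yyyymm(row[0])].append(row)
    return dict(groups)


batch = [(date(2026, 5, 30), 1), (date(2026, 6, 2), 2), (date(2026, 7, 4), 3), (date(2026, 7, 5), 4)]
parts = split_by_partition(batch)
print(sorted(parts))  # [202605, 202606, 202607]
```

A batch of recent events usually touches one or two monthly partitions. A backfill spanning several years can touch dozens of partitions, which multiplies the parts created per insert.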

4. Minimize Small Data Parts

Every synchronous insert creates at least one new data part. Many small inserts therefore lead to a large number of parts, which increases background merge activity.

Common issues include:

  • High CPU utilization
  • Increased disk I/O
  • Longer merge times
  • Slower query performance

Monitor active parts using:

SELECT
    database,
    table,
    count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table;

If part counts continue growing, consider increasing insert batch sizes.
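A quick estimate shows why batch size dominates part counts. The sketch makes these assumptions:

  • Inserts are synchronous.
  • Each insert writes one part per partition it touches.
  • Merges are ignored.

parts_per_hour is a hypothetical helper, and 50,000 rows per second is only an example rate.

```python
def parts_per_hour(rows_per_second: float, rows_per_insert: int, partitions_per_insert: int = 1) -> float:
    """Estimate how many new parts synchronous inserts create per hour.

    Assumes one part per partition touched by each INSERT and ignores the
    merges that later shrink the count again.
    """
    inserts_per_second = rows_per_second / rows_per_insert
    return inserts_per_second * partitions_per_insert * 3600


for batch in (100, 10_000, 100_000):
    print(batch, parts_per_hour(50_000, batch))
```

At that rate, 100-row inserts would create 1.8 million new parts per hour, while 100,000-row inserts create 1,800.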

5. Use High-Performance Data Formats

ClickHouse supports multiple input formats for data ingestion.

Common formats include:

  • Native
  • Parquet
  • CSV
  • JSONEachRow

The Native format provides the best performance because it minimizes parsing overhead and is optimized specifically for ClickHouse.

Whenever possible, use the Native format for large-scale data ingestion.
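The text formats differ in how much work the server must do per row. The sketch below builds the same three rows as JSONEachRow and as CSV; the column names are hypothetical. Native is a binary format normally produced by ClickHouse client libraries, so it is not reproduced here.

```python
import csv
import io
import json
from typing import Dict, List


def to_json_each_row(rows: List[Dict[str, object]]) -> str:
    """JSONEachRow: one JSON object per line, column names repeated on every row."""
    return "".join(json.dumps(row, separators=(",", ":")) + "\n" for row in rows)


def to_csv(rows: List[Dict[str, object]], columns: List[str]) -> str:
    """CSV: values only, in the column order expected by the INSERT."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


rows = [{"event_date": "2026-07-04", "user_id": i, "event": "click"} for i in range(3)]
json_payload = to_json_each_row(rows)
csv_payload = to_csv(rows, ["event_date", "user_id", "event"])
print(json_payload)
print(csv_payload)
print(len(json_payload), len(csv_payload))
```

JSONEachRow repeats every column name on every row, so here the JSON payload is roughly three times the size of the CSV one, and the server has to parse all of it.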

6. Optimize Compression Codecs

Compression reduces storage requirements and improves disk efficiency.

ClickHouse supports multiple compression codecs, including:

  • LZ4 (default)
  • ZSTD

Example:

value String CODEC(ZSTD)

  • LZ4 provides faster compression and decompression.
  • ZSTD offers higher compression ratios, reducing storage usage.

Choose the codec that best fits your workload and storage requirements.
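Because the best codec depends on the data, it is worth measuring on a sample of a real column. Python's standard library has no LZ4 or ZSTD. This sketch therefore uses zlib at level 1 as a stand-in for a fast codec and lzma as a stand-in for a high-ratio one. Its ratios will not match what ClickHouse reports, but the comparison method is the same. The sample column is synthetic and compression_ratios is a hypothetical helper.

```python
import lzma
import zlib
from typing import Dict


def compression_ratios(sample: bytes) -> Dict[str, float]:
    """Compare a fast and a high-ratio compressor on a sample of column data.

    zlib level 1 and lzma are stdlib stand-ins for the LZ4 vs ZSTD trade-off;
    they are not the codecs ClickHouse uses.
    """
    fast = zlib.compress(sample, 1)
    strong = lzma.compress(sample)
    return {
        "fast (zlib-1)": len(sample) / len(fast),
        "strong (lzma)": len(sample) / len(strong),
    }


# Synthetic values for one column, stored contiguously as a columnar engine would.
column = "\n".join(f"https://example.com/page/{i % 50}" for i in range(20_000)).encode()
for name, ratio in compression_ratios(column).items():
    print(f"{name}: {ratio:.1f}x")
```

For data already in ClickHouse, the per-column compressed and uncompressed sizes are available in system.columns (data_compressed_bytes and data_uncompressed_bytes).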

7. Optimize Background Merge Operations

Background merges combine smaller data parts into larger ones, improving query performance and reducing storage fragmentation.

Monitor merge activity using:

SELECT * FROM system.merges;

Frequent merge activity may indicate:

  • Small insert batches
  • Excessive concurrent inserts
  • High ingestion rates

Using larger batches reduces merge pressure and improves system efficiency.
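The cost of small parts can be illustrated with a toy merge policy: repeatedly merge the adjacent pair of parts with the smallest combined size. ClickHouse's real merge selector is more elaborate, and merge_until is a hypothetical helper. The sketch only counts how many rows get rewritten before the parts settle.

```python
from typing import List, Tuple


def merge_until(parts: List[int], max_parts: int) -> Tuple[List[int], int]:
    """Toy merge policy over part sizes (row counts).

    Merges the adjacent pair with the smallest combined size until at most
    `max_parts` remain. Returns the final sizes and the total rows rewritten,
    since each merge rewrites every row it combines.
    """
    parts = list(parts)
    rewritten = 0
    while len(parts) > max(max_parts, 1):
        # Pick the adjacent pair with the smallest combined size (leftmost on ties).
        i = min(range(len(parts) - 1), key=lambda j: parts[j] + parts[j + 1])
        merged = parts[i] + parts[i + 1]
        parts[i:i + 2] = [merged]
        rewritten += merged
    return parts, rewritten


# The same 8,000 rows arriving as 8 small inserts vs. 1 large insert.
print(merge_until([1_000] * 8, max_parts=1))
print(merge_until([8_000], max_parts=1))
```

Under this policy, eight 1,000-row parts cause 24,000 rows of merge writes, with each row rewritten three times, before ending as one part. The single 8,000-row insert needs no merge at all.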

8. Monitor Insert Performance

Monitoring helps identify bottlenecks before they affect production workloads.

The system.query_log table records finished queries, including inserts. Filter on query_kind to review only recent insert operations.

Example:

SELECT
    query,
    query_duration_ms,
    written_rows,
    written_bytes
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query_kind = 'Insert'
ORDER BY event_time DESC
LIMIT 10;

Regular monitoring allows you to identify slow inserts and optimize ingestion performance.
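To turn query_log output into a throughput figure, divide written_rows by the duration. The sketch assumes the rows have already been fetched into dictionaries keyed by the column names from the query above. insert_throughput is a hypothetical helper and the sample values are made up.

```python
from typing import Dict, List


def insert_throughput(log_rows: List[Dict[str, int]]) -> List[float]:
    """Rows per second for each insert, from query_log-style records.

    Each record carries the written_rows and query_duration_ms columns;
    inserts that finished in 0 ms are reported as inf.
    """
    rates = []
    for rec in log_rows:
        ms = rec["query_duration_ms"]
        rates.append(rec["written_rows"] * 1000 / ms if ms else float("inf"))
    return rates


sample = [
    {"written_rows": 100_000, "query_duration_ms": 500},
    {"written_rows": 50, "query_duration_ms": 20},
]
print(insert_throughput(sample))
```

With real data, a long tail of inserts that write few rows at a low rate can point to batches that are too small.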

9. Enable Asynchronous Inserts

Applications that generate frequent small inserts can benefit from asynchronous inserts.

Example:

SET async_insert = 1;
SET wait_for_async_insert = 1;

Advantages include:

  • Automatic batching of small inserts
  • Fewer generated data parts
  • Reduced merge overhead
  • Improved ingestion throughput

Asynchronous inserts are particularly useful for streaming applications and telemetry workloads.
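Asynchronous inserts can also be requested per insert instead of per session. The ClickHouse HTTP interface accepts settings as URL parameters, so async_insert and wait_for_async_insert can travel alongside the query while the rows go in the POST body. This is a minimal sketch that assumes a server at localhost on the default HTTP port 8123 without authentication. build_async_insert is a hypothetical helper, and the request is built but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: local server on the default HTTP port, no authentication.
CLICKHOUSE_URL = "http://localhost:8123/"


def build_async_insert(table: str, rows: list) -> Request:
    """Build (but do not send) an HTTP INSERT that asks the server to buffer it."""
    params = {
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        "async_insert": 1,           # let the server batch small inserts
        "wait_for_async_insert": 1,  # return only after the buffer is flushed
    }
    body = "".join(json.dumps(r) + "\n" for r in rows).encode()
    return Request(CLICKHOUSE_URL + "?" + urlencode(params), data=body, method="POST")


req = build_async_insert("events", [{"user_id": 1, "event": "click"}])
print(req.full_url)
# To send it to a running server: urllib.request.urlopen(req)
```

With wait_for_async_insert=1, the request returns only after the buffered data has been flushed to the table. Setting it to 0 returns sooner, but the client is no longer told whether the flush succeeded.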

10. Optimize Insert Settings

ClickHouse provides several settings that can improve insert performance depending on your workload.

Some commonly used settings include:

  • async_insert – Buffers small inserts before writing them to disk.
  • wait_for_async_insert – Makes an asynchronous INSERT return only after its buffered data has been flushed to the table.
  • max_insert_block_size – Sets the number of rows in each block the server forms when parsing inserted data.
  • max_partitions_per_insert_block – Limits how many partitions a single inserted block may touch; by default, inserts that exceed it are rejected.

Carefully tuning these settings can improve throughput, reduce merge overhead, and make ingestion more efficient, especially in streaming or high-volume environments.
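max_partitions_per_insert_block typically matters during backfills, when one batch spans many months. A block that exceeds the limit is rejected by default. Assuming you control batching on the client, one option is to split such a batch before sending it. chunk_by_partition_limit is a hypothetical helper, and partition_of must mirror the table's PARTITION BY expression.

```python
from typing import Callable, Hashable, List, Set, TypeVar

T = TypeVar("T")


def chunk_by_partition_limit(
    rows: List[T], partition_of: Callable[[T], Hashable], max_partitions: int
) -> List[List[T]]:
    """Split one batch into consecutive sub-batches that each touch at most
    `max_partitions` distinct partitions.

    Sending each sub-batch as its own INSERT keeps every insert under a limit
    such as max_partitions_per_insert_block.
    """
    if max_partitions < 1:
        raise ValueError("max_partitions must be at least 1")
    chunks: List[List[T]] = []
    current: List[T] = []
    seen: Set[Hashable] = set()
    for row in rows:
        key = partition_of(row)
        # Start a new sub-batch when this row would add one partition too many.
        if key not in seen and len(seen) == max_partitions:
            chunks.append(current)
            current, seen = [], set()
        seen.add(key)
        current.append(row)
    if current:
        chunks.append(current)
    return chunks


batch = [(202601, "a"), (202602, "b"), (202601, "c"), (202603, "d"), (202604, "e")]
print(chunk_by_partition_limit(batch, lambda r: r[0], max_partitions=2))
```

Sorting the batch by partition key before chunking keeps the number of resulting inserts to a minimum.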

Best Practices Checklist

  • Insert data in large batches.
  • Choose an efficient primary key.
  • Partition data appropriately.
  • Minimize the creation of small data parts.
  • Use the Native format whenever possible.
  • Select appropriate compression codecs.
  • Monitor system.parts and system.merges.
  • Review insert performance using system.query_log.
  • Enable asynchronous inserts for frequent small writes.
  • Balance concurrent insert workloads.

Conclusion

ClickHouse is capable of ingesting millions of rows per second, but achieving consistent high-throughput performance requires thoughtful configuration and efficient data-loading strategies.

By using larger batch inserts, selecting an appropriate primary key, partitioning data effectively, optimizing compression, and monitoring merge activity, you can significantly improve ingestion performance while maintaining fast analytical queries.

Whether you're building real-time dashboards, processing application logs, handling IoT telemetry, or analyzing business events, these best practices will help you maximize ClickHouse's ingestion capabilities and build scalable, high-performance analytics systems.

