All posts
Writing Custom ClickHouse® User-Defined Functions (UDFs)

Writing Custom ClickHouse® User-Defined Functions (UDFs)

July 3, 20267 min readMohamed Hussain S
Share:

When working with ClickHouse® User-Defined Functions (UDFs), you'll often encounter situations where the same expressions are repeated across multiple queries. Whether you're cleaning strings, standardizing calculations, or simplifying business logic, repeating complex expressions can make queries harder to maintain and understand.

ClickHouse® User-Defined Functions allow you to encapsulate reusable logic into a named function that can be called throughout your queries. Instead of copying and pasting lengthy expressions, you can define them once and reuse them wherever needed.

In this article, we'll explore how UDFs work in ClickHouse®, how to create them, common use cases, limitations, and practical examples that can help make your analytics workflows cleaner and easier to manage.

What Are User-Defined Functions?

A User-Defined Function (UDF) is a custom function that you create and register within ClickHouse®. Once created, it behaves similarly to a built-in function and can be referenced in SQL queries.

For example, imagine you frequently need to calculate the profit margin percentage:

(round((revenue - cost) / revenue * 100, 2))

Repeating this expression across dozens of dashboards and reports can become difficult to maintain. By creating a UDF, you can encapsulate the logic:

CREATE FUNCTION profit_margin AS
(revenue, cost) ->
round((revenue - cost) / revenue * 100, 2);

Now the same calculation becomes:

SELECT profit_margin(revenue, cost)
FROM sales;

This improves readability while ensuring consistent business logic across teams.

Why Use UDFs in ClickHouse®?

There are several reasons why UDFs can be useful in production environments.

Improved Query Readability

Complex calculations can quickly clutter SQL statements. Wrapping logic inside a descriptive function makes queries easier to understand.

Instead of:

SELECT
    round((revenue - cost) / revenue * 100, 2)
FROM sales;

You can write:

SELECT
    profit_margin(revenue, cost)
FROM sales;

The second version clearly communicates intent.

Reusable Business Logic

Organizations often have calculations that appear throughout dashboards, reports, and ETL pipelines.

Examples include:

  • Profit calculations
  • Currency conversions
  • Data normalization
  • String formatting
  • Customer segmentation rules

Creating a UDF helps ensure everyone uses the same implementation.

Easier Maintenance

If business requirements change, you only need to update the function definition instead of modifying dozens of separate queries.

This reduces the risk of inconsistencies and human error.

Creating Your First ClickHouse® UDF

Let's create a simple function that converts a string to title case.

CREATE FUNCTION format_name AS
(name) -> initcap(lower(name));

Usage:

SELECT format_name('CLICKHOUSE');

Output:

Clickhouse

You can verify that the function exists using:

SHOW FUNCTIONS;

Or query the system catalog:

SELECT *
FROM system.functions
WHERE name = 'format_name';

This allows administrators and developers to inspect available functions within the environment.

Practical Examples

Example 1: Standardizing Email Domains

Suppose user records contain inconsistent capitalization.

CREATE FUNCTION normalize_email AS
(email) ->
lower(email);

Usage:

SELECT normalize_email('Quantrail@Example.COM');

Result:

quantrail@example.com

Example 2: Revenue Classification

Business teams frequently classify customers based on revenue.

CREATE FUNCTION revenue_tier AS
(revenue) ->
multiIf(
    revenue >= 100000, 'Enterprise',
    revenue >= 10000, 'Growth',
    'Starter'
);

Usage:

SELECT
    customer_name,
    revenue_tier(revenue)
FROM customers;

This ensures consistent categorization across reports.

Example 3: Geographical Formatting

CREATE FUNCTION country_code AS
(country) ->
upper(left(country, 2));

Usage:

SELECT country_code('india');

Result:

IN

Understanding UDF Types and Limitations

ClickHouse® actually supports a few different types of UDFs depending on how complex your logic is. Understanding which one to use will save you a lot of architectural headaches.

1. SQL User-Defined Functions (Lambda Expressions)

These are the functions we've focused on so far. They are purely SQL-expression based.

  • Best For: Lightweight transformations, math formulas, and string formatting.
  • Limitation: They cannot execute complex procedural logic (like loops) or pull in external libraries. They must be deterministic and stateless.

2. Executable User-Defined Functions (Scripts)

If you need Python, JavaScript, or Bash, ClickHouse allows you to define "Executable UDFs." You register a script in the ClickHouse server configuration, and ClickHouse streams data into it via stdin and reads the result from stdout.

  • Best For: Heavy-lifting tasks like running data through a pre-trained machine learning model, tokenizing text with advanced NLP libraries, or hitting a local cache.
  • Stateful Exception: Unlike SQL UDFs, executable UDFs run as long-lived external processes. This means they can maintain a form of state—like keeping a lookup table or ML model loaded in memory across data batches.

3. WebAssembly (WASM) UDFs

For compiled languages like Rust or C++, ClickHouse supports running WASM binaries. This gives you the power of external code libraries but executes inside the ClickHouse runtime with near-native performance and strict sandboxing.

Rule of Thumb: Keep it simple. Start with standard SQL UDFs for 90% of your analytics workflows. Only upgrade to Executable or WASM UDFs if you genuinely need external programming languages or complex state management.

Performance Considerations

One common question is whether UDFs introduce performance overhead.

In most cases, ClickHouse® expands the function definition during query execution. Since UDFs are essentially wrappers around expressions, performance is generally similar to writing the expression directly.

However, it's still important to:

  • Row-Count Constraints: Standard SQL UDFs are scalar-they intake a row and output a single value for that row. If your logic requires changing the number of rows (like exploding an array or filtering data out entirely), you should use ClickHouse Table Functions instead of a UDF.
  • Avoid unnecessarily complex nested functions.
  • Benchmark critical workloads.
  • Review execution plans when optimizing large analytical queries.
  • Use built-in functions when they already satisfy the requirement.

You can inspect query behavior using:

EXPLAIN
SELECT profit_margin(revenue, cost)
FROM sales;

For deeper performance analysis, consider reviewing query execution metrics and profiling tools available in ClickHouse®.

For additional ClickHouse® optimization techniques, you may also find our guide on query performance tuning useful:

ClickHouse® Query Optimization

Best Practices for Production Environments

When introducing UDFs into production systems, follow a few simple guidelines.

Use Descriptive Names

Choose names that clearly describe the business purpose.

Good:

customer_lifetime_value()
profit_margin()
revenue_tier()

Less clear:

calc1()
func2()
helper()

Document Function Behavior

Maintain documentation for:

  • Inputs
  • Outputs
  • Assumptions
  • Example usage

This makes onboarding easier for new team members.

Version Carefully

If business logic changes significantly, consider creating a new function version instead of silently replacing existing behavior.

For example:

profit_margin_v2()

This prevents unexpected reporting differences.

Review Function Usage

Periodically review existing UDFs to identify:

  • Duplicates
  • Deprecated functions
  • Unused functions
  • Opportunities for simplification

A smaller, well-maintained function catalog is easier to manage.

Final Thoughts

ClickHouse® User-Defined Functions provide a convenient way to encapsulate reusable SQL logic, improve query readability, and standardize business calculations across your organization. While they are intentionally lightweight compared to procedural functions found in other databases, they are extremely useful for reducing repetition and maintaining consistent analytics logic.

Whether you're building reporting dashboards, ETL pipelines, or analytical workloads, UDFs can help keep your queries cleaner and easier to maintain. Start with simple transformations, follow clear naming conventions, and document your functions well. Over time, you'll build a reusable library of logic that improves both developer productivity and query maintainability.

By understanding when and how to use ClickHouse® UDFs effectively, you can write cleaner SQL and create more maintainable analytical systems.

References

ClickHouse® - UDFs User Defined Functions

Work with Quantrail

Expert ClickHouse services

We design, migrate, tune, and run ClickHouse for teams that own their data, from first architecture through day-two operations. Tell us what you are building and we will help.

Talk to an expert

Manage ClickHouse with CHOps

CHOps is our free, open-source ClickHouse admin tool: monitoring, query profiling, backups, visual access control, and alerting in one self-hosted interface, with zero agents on your servers.

Explore CHOps
Share: