All posts
Bringing LLMs to SQL: Exploring ClickHouse’s New AI Functions

Bringing LLMs to SQL: Exploring ClickHouse’s New AI Functions

June 3, 20267 min readReshma M
Share:

Introduction

Modern data systems are evolving fast and the line between analytics and artificial intelligence is starting to blur. Traditionally, if you wanted to apply AI models like Large Language Models (LLMs) to your data, you had to build separate pipelines, integrate APIs, and manage complex workflows.

However, with the latest updates in ClickHouse , that’s changing.

ClickHouse introduces native AI functions that allow you to directly interact with LLMs using SQL. As a result, you can now run AI-powered queries on your data in real time without leaving your database.

In this blog, we will walk through what those functions are, how to set them up, and how to use them correctly.

A Quick Note: These Are Experimental

These AI Functions were officially introduced in experimental releases around **ClickHouse version 25.**x+, with availability depending on environment ( Cloud or Specific builds). Additionally Since they are part of a rapidly evolving AI suite, the documentation focuses primarily on the shared infrastructure like retries and token management rather than deep per-function guides.

If you find these functions aren’t enabled in your version, run this first:

SET allow_experimental_ai_functions = 1;

NOTE: This flag enables the feature only if your build includes AI support.

What’s New: AI Functions in ClickHouse

With the recent release, ClickHouse integrates directly with leading AI providers like OpenAI and Anthropic.

Key AI Functions

ai_chat – Used for text generation, summarising, or answering questions based on your data.

ai_embedding – Converts text into vector numbers to power semantic search or “find similar” features.

ai_classify – Automatically tags or labels your data based on the categories you define.

ai_extract – Pulls specific facts (like prices, names, or dates) out of unstructured text.

They behave like SQL functions, but internally rely on external LLM APIs.

Setting Up a Named Collection

Every AI function in ClickHouse requires a Named Collection as its first argument. This is where you store your provider credentials and model configuration. You set it up once and reference it in every query.

-- Create this config once
CREATE NAMED COLLECTION ai_config AS
    provider = 'openai',
    endpoint = 'https://api.openai.com/v1/chat/completions',
    model = 'gpt-4o-mini',
    api_key = 'your-key-here',
    max_tokens = 1024;

Do not store API keys directly in queries in production. Use secure configuration methods.

Any OpenAI-compatible endpoint Ollama, vLLM, LiteLLM can also be used by setting provider = 'openai' and pointing the endpoint to your local service.

Shared Infrastructure: What You Get for Free

All four functions share the same underlying infrastructure, which is why the ClickHouse documentation groups them together:

FeatureDetail
Quota enforcementPer-query token and API call limits via ai_function_max_input_tokens_per_query
Retry with backoffTransient failures are automatically retried with exponential backoff.
ObservabilityAll calls tracked in system.query_log via AIAPICalls, AIInputTokens, AIOutputTokens.
Provider flexibilityOpenAI, Anthropic, or any OpenAI-compatible endpoint.

You can monitor your AI function usage directly in SQL:

SELECTProfileEvents['AIAPICalls'] AS api_calls,ProfileEvents['AIInputTokens'] AS input_tokens,ProfileEvents['AIOutputTokens'] AS output_tokensFROM system.query_logWHERE type = 'QueryFinish'ORDER BY event_time DESCLIMIT 10;

How It Works Behind the Scenes

Instead of exporting your data, to an external AI service, you can now call AI models directly within your query.

While the syntax looks simple, there’s powerful orchestration happening internally:

  1. The SQL query is parsed by ClickHouse.
  2. AI function calls are identified.
  3. ClickHouse sends requests to configured LLM providers.
  4. Responses are returned and integrated into query results.

This process is optimized for:

  • Low latency
  • Parallel execution
  • Scalability

As a result, it is a seamless blend of database performance + AI intelligence.

On the other hand, this approach eliminates the need for:

  • Separate ML pipelines.
  • Data movement between systems.
  • Complex API integrations.

Key Components of ClickHouse AI

  • Model Provider Configuration:
    • AI functions allow you to specify the provider (openai, anthropic), model (gpt-4o-mini, text-embedding-3-small), and API keys, supporting flexible integration.
  • Observability:
    • Built-in tracking of AI-related metrics, such as AI APICalls, AIInputTokens, and AIRowsProcessed.
  • Performance:
    • ClickHouse serves as a unified platform for both storing data and managing AI workloads, eliminating the need for separate specialized databases for vectors or ML features.

These functions are wrappers and that is exactly the point. The quality of the output depends 90% on the LLM you choose and the prompt you write. What ClickHouse adds on top is the ability to apply that LLM across millions of rows, inside joins and aggregations, with native batching, retries, and observability all without leaving SQL.

The old workflow was: export → Python → API → write back. While, 
the new workflow is: one SELECT statement.

Advantages of AI Functions

  • Unified Data + AI Workflow

Everything happens inside ClickHouse, no extra infrastructure needed.

  • Real-Time Processing

Combine analytics and AI in a single query for instant results.

  • Simpler Architecture

Fewer moving parts → easier to maintain and scale.

  • Developer Productivity

Write simple SQL queries instead of complex ML pipelines.

Traditional Approach vs ClickHouse AI

Traditional WorkflowClickHouse AI Approach
Data Exportation neededQuery directly
Call external APIBuilt-in AI functions
Process resultsInstant output
Complex pipelinesSingle SQL query

Challenges and Considerations

While the benefits are significant, there are a few things to keep in mind:

  • Cost Management

LLM API calls may incur costs depending on usage.

  • Data Privacy

Sensitive data must be handled carefully when sent to external providers.

  • Accuracy

LLM outputs may vary and should be validated for critical applications.

  • Experimental Nature

Moreover, Some features are still be evolving.

The Future : AI-Native Databases

The integration of LLMs into databases signals a major shift toward AI-native data platforms. Instead of treating AI as an external tool, systems like ClickHouse are embedding intelligence directly into the data layer.

The introduction of AI functions in ClickHouse is just the beginning.

We are moving toward a future where:

  • Databases can understand natural language
  • Queries become intelligent conversations
  • Insights are automatically generated

This evolution will redefine roles across multiple fields, including data engineering, backend development, and AI system design.”

Conclusion

The integration of LLMs into SQL through ClickHouse represents a major leap in modern data systems. By embedding AI directly into the query layer, ClickHouse simplifies workflows, enhances performance, and unlocks new possibilities for real-time intelligence.

ClickHouse’s native AI functions ( ai_chat, ai_embedding, ai_classify, and ai_extract ) are new, experimental, and still evolving in terms of documentation and availability. But the foundation is solid: a single Named Collection for credentials, a consistent calling pattern, and shared infrastructure that handles the hard parts automatically.

In addition, ClickHouse is also exploring AI-powered SQL generation, where natural language can be converted into SQL queries. This represents another step toward conversational analytics and will be covered in a future blog.

Hence, The future is not just about querying data, it’s about interacting with data intelligently.

References

Share: