Introduction
Modern data systems are evolving fast and the line between analytics and artificial intelligence is starting to blur. Traditionally, if you wanted to apply AI models like Large Language Models (LLMs) to your data, you had to build separate pipelines, integrate APIs, and manage complex workflows.
However, with the latest updates in ClickHouse , that’s changing.
ClickHouse introduces native AI functions that allow you to directly interact with LLMs using SQL. As a result, you can now run AI-powered queries on your data in real time without leaving your database.
In this blog, we will walk through what those functions are, how to set them up, and how to use them correctly.
A Quick Note: These Are Experimental
These AI Functions were officially introduced in experimental releases around **ClickHouse version 25.**x+, with availability depending on environment ( Cloud or Specific builds). Additionally Since they are part of a rapidly evolving AI suite, the documentation focuses primarily on the shared infrastructure like retries and token management rather than deep per-function guides.
If you find these functions aren’t enabled in your version, run this first:
SET allow_experimental_ai_functions = 1;NOTE: This flag enables the feature only if your build includes AI support.
What’s New: AI Functions in ClickHouse
With the recent release, ClickHouse integrates directly with leading AI providers like OpenAI and Anthropic.
Key AI Functions
ai_chat – Used for text generation, summarising, or answering questions based on your data.
ai_embedding – Converts text into vector numbers to power semantic search or “find similar” features.
ai_classify – Automatically tags or labels your data based on the categories you define.
ai_extract – Pulls specific facts (like prices, names, or dates) out of unstructured text.
They behave like SQL functions, but internally rely on external LLM APIs.
Setting Up a Named Collection
Every AI function in ClickHouse requires a Named Collection as its first argument. This is where you store your provider credentials and model configuration. You set it up once and reference it in every query.
-- Create this config once
CREATE NAMED COLLECTION ai_config AS
provider = 'openai',
endpoint = 'https://api.openai.com/v1/chat/completions',
model = 'gpt-4o-mini',
api_key = 'your-key-here',
max_tokens = 1024;Do not store API keys directly in queries in production. Use secure configuration methods.
Any OpenAI-compatible endpoint Ollama, vLLM, LiteLLM can also be used by setting provider = 'openai' and pointing the endpoint to your local service.
Shared Infrastructure: What You Get for Free
All four functions share the same underlying infrastructure, which is why the ClickHouse documentation groups them together:
| Feature | Detail |
|---|---|
| Quota enforcement | Per-query token and API call limits via ai_function_max_input_tokens_per_query |
| Retry with backoff | Transient failures are automatically retried with exponential backoff. |
| Observability | All calls tracked in system.query_log via AIAPICalls, AIInputTokens, AIOutputTokens. |
| Provider flexibility | OpenAI, Anthropic, or any OpenAI-compatible endpoint. |
You can monitor your AI function usage directly in SQL:
SELECTProfileEvents['AIAPICalls'] AS api_calls,ProfileEvents['AIInputTokens'] AS input_tokens,ProfileEvents['AIOutputTokens'] AS output_tokensFROM system.query_logWHERE type = 'QueryFinish'ORDER BY event_time DESCLIMIT 10;How It Works Behind the Scenes
Instead of exporting your data, to an external AI service, you can now call AI models directly within your query.
While the syntax looks simple, there’s powerful orchestration happening internally:
- The SQL query is parsed by ClickHouse.
- AI function calls are identified.
- ClickHouse sends requests to configured LLM providers.
- Responses are returned and integrated into query results.
This process is optimized for:
- Low latency
- Parallel execution
- Scalability
As a result, it is a seamless blend of database performance + AI intelligence.
On the other hand, this approach eliminates the need for:
- Separate ML pipelines.
- Data movement between systems.
- Complex API integrations.
Key Components of ClickHouse AI
- Model Provider Configuration:
- AI functions allow you to specify the provider (
openai,anthropic), model (gpt-4o-mini,text-embedding-3-small), and API keys, supporting flexible integration.
- AI functions allow you to specify the provider (
- Observability:
- Built-in tracking of AI-related metrics, such as
AI APICalls,AIInputTokens, andAIRowsProcessed.
- Built-in tracking of AI-related metrics, such as
- Performance:
- ClickHouse serves as a unified platform for both storing data and managing AI workloads, eliminating the need for separate specialized databases for vectors or ML features.
These functions are wrappers and that is exactly the point. The quality of the output depends 90% on the LLM you choose and the prompt you write. What ClickHouse adds on top is the ability to apply that LLM across millions of rows, inside joins and aggregations, with native batching, retries, and observability all without leaving SQL.
The old workflow was: export → Python → API → write back. While,
the new workflow is: one SELECT statement.Advantages of AI Functions
- Unified Data + AI Workflow
Everything happens inside ClickHouse, no extra infrastructure needed.
- Real-Time Processing
Combine analytics and AI in a single query for instant results.
- Simpler Architecture
Fewer moving parts → easier to maintain and scale.
- Developer Productivity
Write simple SQL queries instead of complex ML pipelines.
Traditional Approach vs ClickHouse AI
| Traditional Workflow | ClickHouse AI Approach |
|---|---|
| Data Exportation needed | Query directly |
| Call external API | Built-in AI functions |
| Process results | Instant output |
| Complex pipelines | Single SQL query |
Challenges and Considerations
While the benefits are significant, there are a few things to keep in mind:
- Cost Management
LLM API calls may incur costs depending on usage.
- Data Privacy
Sensitive data must be handled carefully when sent to external providers.
- Accuracy
LLM outputs may vary and should be validated for critical applications.
- Experimental Nature
Moreover, Some features are still be evolving.
The Future : AI-Native Databases
The integration of LLMs into databases signals a major shift toward AI-native data platforms. Instead of treating AI as an external tool, systems like ClickHouse are embedding intelligence directly into the data layer.
The introduction of AI functions in ClickHouse is just the beginning.
We are moving toward a future where:
- Databases can understand natural language
- Queries become intelligent conversations
- Insights are automatically generated
This evolution will redefine roles across multiple fields, including data engineering, backend development, and AI system design.”
Conclusion
The integration of LLMs into SQL through ClickHouse represents a major leap in modern data systems. By embedding AI directly into the query layer, ClickHouse simplifies workflows, enhances performance, and unlocks new possibilities for real-time intelligence.
ClickHouse’s native AI functions ( ai_chat, ai_embedding, ai_classify, and ai_extract ) are new, experimental, and still evolving in terms of documentation and availability. But the foundation is solid: a single Named Collection for credentials, a consistent calling pattern, and shared infrastructure that handles the hard parts automatically.
In addition, ClickHouse is also exploring AI-powered SQL generation, where natural language can be converted into SQL queries. This represents another step toward conversational analytics and will be covered in a future blog.
Hence, The future is not just about querying data, it’s about interacting with data intelligently.



