This guide shows you how to stream every insert, update, and delete from Microsoft SQL Server into the ClickHouse® database in near real time, using Debezium, Apache Kafka, and Docker. SQL Server has its own built-in Change Data Capture feature that you must switch on first, and we walk through every step of that.
This article is self-contained. If Debezium and ClickHouse are brand new to you, the short overview in What is Debezium and how to offload analytics to ClickHouse is a good primer first.
No prior experience with Debezium, Kafka, or ClickHouse is assumed.
What you will build
A pipeline where SQL Server records changes in its transaction log and, with CDC enabled, into change tables, Debezium reads those changes and turns each into an event, Apache Kafka stores the events durably, and the ClickHouse Kafka Connect Sink writes them into a ClickHouse table. Change a row in SQL Server, and the same change appears in ClickHouse a moment later. Everything runs locally in Docker.
What is Change Data Capture, in plain English
Your application stores data in SQL Server, and your analytics team wants to run heavy reports. Running them on production would slow it down for real users, and copying the whole database nightly is stale and wasteful. Change Data Capture copies only what changed, as it happens, by reading the database's own change records. It adds almost no load.
How SQL Server CDC is different
The relational databases earlier in this series expose their change log to Debezium more or less directly. SQL Server is different in one important way: it has its own feature literally called Change Data Capture, and Debezium builds on top of it rather than reading the transaction log itself.
This means there is a setup step you cannot skip. You must enable CDC at the database level, and then enable it on each table you want to capture. When you do, SQL Server starts a background job, run by the SQL Server Agent, that reads the transaction log and copies changes into special change tables. Debezium then reads those change tables. Two consequences follow: the SQL Server Agent must be running, and you must explicitly turn CDC on. We handle both below.
Why SQL Server and ClickHouse are a great pair
SQL Server is a capable transactional database that runs many enterprise applications. Large analytical queries on it are expensive, both in performance and often in licensing. Streaming changes into ClickHouse, a columnar database built for analytics, lets you keep SQL Server as the system of record while doing heavy reporting on a fast, cost-effective analytical database.
The tools and the exact versions
Pinning specific, compatible versions is what makes this run. Do not assume a different tag behaves the same way.
| Component | Role | Image and version |
|---|---|---|
| SQL Server | Source database | mcr.microsoft.com/mssql/server:2022-latest |
| Apache Kafka | Event log / transport | apache/kafka:4.1.0 (KRaft mode, no ZooKeeper) |
| Debezium | SQL Server source connector | quay.io/debezium/connect:3.5 |
| ClickHouse Kafka Connect Sink | Loads events into ClickHouse | v1.3.7 |
| ClickHouse | Analytics database | clickhouse/clickhouse-server:26.3 (LTS) |
Debezium 3.5 (specifically 3.5.2.Final, released 2026-06-02) is built and tested against Kafka 4.1 and supports SQL Server 2017, 2019, and 2022. CDC is available in SQL Server Standard, Enterprise, and Developer editions; the container uses the free Developer edition by default. ClickHouse 26.3 is the current Long Term Support release. Kafka 4 uses KRaft and has no ZooKeeper.
The dataset
We will use the OpenFlights airports dataset, published under the Open Database License, so it is free to use. We load a small slice of it into SQL Server with an ingest script and then watch our edits flow to ClickHouse.
Prerequisites
You need Docker and Docker Compose, plus roughly 6 GB of free memory, since SQL Server is memory-hungry.
Step 1: Prepare a project folder
mkdir sqlserver-to-clickhouse-cdc
cd sqlserver-to-clickhouse-cdc
mkdir -p connect-pluginsStep 2: Download the ClickHouse Kafka Connect Sink
cd connect-plugins
curl -L -o clickhouse-kafka-connect.zip \
https://github.com/ClickHouse/clickhouse-kafka-connect/releases/download/v1.3.7/clickhouse-kafka-connect-v1.3.7.zip
unzip clickhouse-kafka-connect.zip
rm clickhouse-kafka-connect.zip
cd ..Step 3: The ingest script
This is the script that creates our database, enables CDC, creates the airports table, loads real rows from the OpenFlights dataset, and finally enables CDC on the table. Create a file named init.sql:
-- Create the database and enable Change Data Capture on it.
IF DB_ID('store') IS NULL CREATE DATABASE store;
GO
USE store;
GO
EXEC sys.sp_cdc_enable_db;
GO
-- Create the airports table.
CREATE TABLE dbo.airports (
airport_id INT PRIMARY KEY,
name NVARCHAR(255) NOT NULL,
city NVARCHAR(255),
country NVARCHAR(100),
iata NVARCHAR(3),
altitude INT
);
GO
-- Load a slice of the OpenFlights airports dataset.
INSERT INTO dbo.airports (airport_id, name, city, country, iata, altitude) VALUES
(1, 'Goroka Airport', 'Goroka', 'Papua New Guinea', 'GKA', 5282),
(2, 'Madang Airport', 'Madang', 'Papua New Guinea', 'MAG', 20),
(3, 'Mount Hagen Airport', 'Mount Hagen', 'Papua New Guinea', 'HGU', 5388),
(4, 'Nadzab Airport', 'Nadzab', 'Papua New Guinea', 'LAE', 239),
(5, 'Port Moresby Jacksons Intl', 'Port Moresby', 'Papua New Guinea', 'POM', 146);
GO
-- Enable CDC on the table. The SQL Server Agent must be running for this to work.
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'airports',
@role_name = NULL,
@supports_net_changes = 0;
GOThe order matters. CDC must be enabled on the database before you can enable it on a table, and the table must exist before you enable CDC on it. The @role_name = NULL means any user with select rights can read the change data, which is fine for a local tutorial.
Step 4: The Docker Compose file
Create docker-compose.yml. Note the MSSQL_AGENT_ENABLED setting, which turns on the SQL Server Agent that CDC depends on:
services:
# SQL Server with the Agent enabled, which CDC requires.
sqlserver:
image: mcr.microsoft.com/mssql/server:2022-latest
environment:
ACCEPT_EULA: "Y"
MSSQL_PID: "Developer"
MSSQL_SA_PASSWORD: "Str0ng!Passw0rd"
MSSQL_AGENT_ENABLED: "true"
ports:
- "1433:1433"
# A single-node Kafka 4 broker in KRaft mode (no ZooKeeper).
kafka:
image: apache/kafka:4.1.0
ports:
- "9092:9092"
environment:
KAFKA_NODE_ID: 1
KAFKA_PROCESS_ROLES: broker,controller
KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
# Kafka Connect (Debezium image) with the ClickHouse sink mounted in.
connect:
image: quay.io/debezium/connect:3.5
depends_on:
- kafka
- sqlserver
ports:
- "8083:8083"
environment:
BOOTSTRAP_SERVERS: kafka:9092
GROUP_ID: cdc-connect
CONFIG_STORAGE_TOPIC: connect_configs
OFFSET_STORAGE_TOPIC: connect_offsets
STATUS_STORAGE_TOPIC: connect_statuses
KEY_CONVERTER: org.apache.kafka.connect.json.JsonConverter
VALUE_CONVERTER: org.apache.kafka.connect.json.JsonConverter
CONNECT_KEY_CONVERTER_SCHEMAS_ENABLE: "false"
CONNECT_VALUE_CONVERTER_SCHEMAS_ENABLE: "false"
volumes:
- ./connect-plugins/clickhouse-kafka-connect-v1.3.7:/kafka/connect/clickhouse-kafka-connect
# The analytics database.
clickhouse:
image: clickhouse/clickhouse-server:26.3
ports:
- "8123:8123"
- "9000:9000"
environment:
CLICKHOUSE_USER: default
CLICKHOUSE_PASSWORD: clickhouse
CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT: "1"
ulimits:
nofile:
soft: 262144
hard: 262144Step 5: Start the stack and run the ingest script
docker compose up -dSQL Server takes a minute to become ready on first start. Once it is up, run the ingest script. The 2022 image ships the command-line client at /opt/mssql-tools18/bin/sqlcmd, and the -C flag tells it to trust the container's self-signed certificate:
cat init.sql | docker compose exec -T sqlserver \
/opt/mssql-tools18/bin/sqlcmd -U sa -P 'Str0ng!Passw0rd' -CIf you see an error that the Agent is not running, wait a few more seconds and run it again; the Agent can take a moment to start.
Step 6: Create the target table in ClickHouse
ClickHouse is append-only at heart. To reflect updates and deletes we use a ReplacingMergeTree, which keeps versions of a row and returns the newest per key when asked, plus a version column and a deleted flag.
docker compose exec clickhouse clickhouse-client --password clickhouseCREATE DATABASE IF NOT EXISTS store;
CREATE TABLE store.airports
(
airport_id Int32,
name String,
city String,
country String,
iata String,
altitude Int32,
_version UInt64,
_deleted UInt8
)
ENGINE = ReplacingMergeTree(_version, _deleted)
ORDER BY airport_id;Step 7: Register the Debezium SQL Server source connector
Create a file named sqlserver-source.json:
{
"name": "sqlserver-source",
"config": {
"connector.class": "io.debezium.connector.sqlserver.SqlServerConnector",
"tasks.max": "1",
"database.hostname": "sqlserver",
"database.port": "1433",
"database.user": "sa",
"database.password": "Str0ng!Passw0rd",
"database.names": "store",
"database.encrypt": "false",
"topic.prefix": "store",
"table.include.list": "dbo.airports",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-history.store",
"transforms": "unwrap",
"transforms.unwrap.type": "io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.delete.tombstone.handling.mode": "rewrite",
"transforms.unwrap.add.fields": "op,source.ts_ms"
}
}Send it to Kafka Connect:
curl -X POST -H "Content-Type: application/json" \
--data @sqlserver-source.json \
http://localhost:8083/connectorsA few SQL Server specifics. database.names lists the databases to capture (Debezium 3.5 uses this rather than the older singular property). database.encrypt is set to false so the connector does not insist on a trusted TLS certificate, which keeps the local setup simple; in production you would use proper certificates instead. table.include.list uses the schema and table name, dbo.airports. Events land in a Kafka topic named store.dbo.airports, which is the topic prefix, the schema, and the table.
The transforms block is the same idea as the other relational guides. ExtractNewRecordState flattens Debezium's nested event into a plain row, delete.tombstone.handling.mode set to rewrite turns a delete into a row with an added __deleted field, and add.fields of op,source.ts_ms attaches the operation type and the event timestamp. We use the timestamp as our version. SQL Server orders changes by a log sequence number that is a multi-part value rather than a single number, so the event timestamp is the simplest version to use here, with the usual caveat that two changes in the same millisecond cannot be told apart.
Step 8: Register the ClickHouse sink connector
Create a file named clickhouse-sink.json:
{
"name": "clickhouse-sink",
"config": {
"connector.class": "com.clickhouse.kafka.connect.ClickHouseSinkConnector",
"tasks.max": "1",
"topics": "store.dbo.airports",
"hostname": "clickhouse",
"port": "8123",
"database": "store",
"username": "default",
"password": "clickhouse",
"ssl": "false",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms": "renameFields",
"transforms.renameFields.type": "org.apache.kafka.connect.transforms.ReplaceField$Value",
"transforms.renameFields.renames": "__source_ts_ms:_version,__deleted:_deleted"
}
}Send it:
curl -X POST -H "Content-Type: application/json" \
--data @clickhouse-sink.json \
http://localhost:8083/connectorsStep 9: Verify the CDC pipeline
This is the part most tutorials skip, so we cover it carefully. There are four checks, from source to destination.
First, confirm both connectors are running, not failed:
curl -s http://localhost:8083/connectors/sqlserver-source/status
curl -s http://localhost:8083/connectors/clickhouse-sink/statusBoth should report a state of RUNNING. If the source connector failed, the message almost always points at CDC not being enabled or the Agent not running.
Second, confirm SQL Server is actually capturing changes. CDC writes to change tables; this query shows the captured tables:
docker compose exec -T sqlserver /opt/mssql-tools18/bin/sqlcmd \
-U sa -P 'Str0ng!Passw0rd' -C -Q "USE store; EXEC sys.sp_cdc_help_change_data_capture;"You should see a row for dbo.airports. An empty result means CDC was not enabled on the table.
Third, confirm events reached Kafka. Read one message from the topic:
docker compose exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
--bootstrap-server kafka:9092 --topic store.dbo.airports \
--from-beginning --max-messages 1You should see a JSON message describing an airport. If the topic does not exist or has no messages, the source connector is not producing events.
Fourth, confirm the data landed in ClickHouse. The FINAL keyword collapses each row to its newest version, which you should always use with a ReplacingMergeTree:
docker compose exec clickhouse clickhouse-client --password clickhouse \
--query "SELECT count() FROM store.airports FINAL"The count should match the five rows you loaded. If all four checks pass, your pipeline is working end to end.
Step 10: See changes flow
Make some changes in SQL Server:
docker compose exec -T sqlserver /opt/mssql-tools18/bin/sqlcmd -U sa -P 'Str0ng!Passw0rd' -C -Q "
USE store;
UPDATE dbo.airports SET altitude = 5300 WHERE airport_id = 1;
INSERT INTO dbo.airports (airport_id, name, city, country, iata, altitude)
VALUES (6, 'Wewak Intl', 'Wewak', 'Papua New Guinea', 'WWK', 19);
DELETE FROM dbo.airports WHERE airport_id = 2;
"Wait a couple of seconds, then query ClickHouse again, hiding deleted rows:
docker compose exec clickhouse clickhouse-client --password clickhouse \
--query "SELECT airport_id, name, altitude FROM store.airports FINAL WHERE _deleted = 0 ORDER BY airport_id"You should see airport 1 with its updated altitude, the new airport 6, and airport 2 gone.
How updates and deletes really work
When you updated airport 1, SQL Server recorded the change in its CDC change table, Debezium emitted a flattened row with the new altitude and a newer timestamp, and the sink inserted it as a new row in ClickHouse. For a moment ClickHouse held two rows with airport_id = 1. Because the table is a ReplacingMergeTree keyed on airport_id with _version as the version, a FINAL query returns only the newest. The delete works the same way: Debezium emitted a row marked deleted, that row wins, and your query filters it out. The mental model is always: never change a row in place, append a newer version, and let FINAL give you the current state.
Production considerations
This tutorial runs a single node of everything. In production you would create a dedicated database user rather than using sa, use proper TLS certificates instead of disabling encryption, run one connector task per table, keep all events for a row in the same Kafka partition, run at least three Kafka brokers with replication factor three, and enable the ClickHouse sink's exactly-once mode when correctness is critical. Keep an eye on the CDC change tables, which SQL Server cleans up on a schedule; tune the retention if your connector ever falls behind.
Troubleshooting
If the source connector fails with a message about no maximum LSN, the SQL Server Agent is not running. Confirm MSSQL_AGENT_ENABLED is true and that CDC is enabled on both the database and the table.
If the connector fails with a TLS or certificate error, the encryption settings do not match. The database.encrypt of false above avoids this for local use.
If rows never reach ClickHouse, work through the four verification checks above in order to find where the chain breaks.
If updates appear duplicated, you forgot FINAL, or your ClickHouse ORDER BY key does not match the SQL Server primary key.
Cleaning up
docker compose down -vReferences
- Debezium SQL Server connector documentation
- Debezium New Record State Extraction (event flattening) SMT
- ClickHouse Kafka Connect Sink
- OpenFlights dataset (Open Database License)
What is next
You have now built this pipeline for a database with its own CDC feature. The Kafka and ClickHouse half is the same as every other source. Compare with the PostgreSQL, MySQL, MariaDB, Oracle, and MongoDB guides.
If you would like help designing a production-grade CDC pipeline into ClickHouse, the engineers at Quantrail Data do exactly this. Reach out through our services page and we will be glad to help.



