Kafka
Apache Kafka is a distributed data streaming platform that allows real-time publishing, subscribing, storing, and processing of data streams.
This article explains how to add a Kafka data source in ClickPipes.
Supported Versions and Architectures
- Versions: Kafka 2.0 ~ 2.5 (built on Scala 2.12)
- Architectures: Single-node or cluster
Supported Data Types
| Category | Data Types |
| --- | --- |
| Boolean | BOOLEAN |
| Integer | SHORT, INTEGER, LONG |
| Floating Point | FLOAT, DOUBLE |
| Numeric | NUMBER |
| String | CHAR (supported as a source), VARCHAR, STRING, TEXT |
| Binary | BINARY |
| Composite | ARRAY, MAP, OBJECT (supported as a source) |
| Date/Time | TIME, DATE, DATETIME, TIMESTAMP |
| UUID | UUID (supported as a source) |
Structural Modes and Sync Details
When configuring the Kafka connection, you can select from the following two structure modes based on your business needs:
- Standard Structure (Default)
- Original Structure
**Standard Structure (Default)**
Description: Supports synchronization of complete DML operations (INSERT, UPDATE, DELETE). As a source, it parses and restores DML and DDL events for downstream processing; as a target, it stores these events in a standardized format, making them easier to parse in subsequent tasks.
Typical Use Case: In a CDC Log Queue scenario, use the Standard Structure mode to write relational data change events from MySQL into Kafka, and then consume those events to write them into other databases.
Sample Data:
{
  "ts": 1727097087513,
  "op": "DML:UPDATE",
  "opTs": 1727097087512,
  "namespaces": [],
  "table": "table_name",
  "before": {},
  "after": {}
}
- ts: The timestamp when the event was parsed, recording the time the event was processed.
- op: Event type, indicating the specific operation, such as `DML:INSERT`, `DML:UPDATE`, or `DML:DELETE`.
- opTs: Event timestamp, indicating when the data change actually occurred.
- namespaces: A collection of schema names for multi-schema scenarios.
- table: The table name indicating where the data change occurred.
- before: Data content before the change, available only for `UPDATE` and `DELETE` operations.
- after: Data content after the change, applicable to `INSERT` and `UPDATE` operations.
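For reference, below is a minimal sketch of how a downstream consumer might parse Standard Structure events and branch on the operation type, assuming the `kafka-python` client; the broker address, topic name, and consumer group are hypothetical.

```python
import json

from kafka import KafkaConsumer

# Hypothetical broker address, topic, and consumer group; replace with your own values.
consumer = KafkaConsumer(
    "cdc_events",
    bootstrap_servers="127.0.0.1:9092",
    group_id="cdc-demo",
    auto_offset_reset="earliest",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for record in consumer:
    event = record.value
    op = event.get("op")
    if op == "DML:INSERT":
        print(f'{event["table"]}: insert {event["after"]}')
    elif op == "DML:UPDATE":
        print(f'{event["table"]}: update {event["before"]} -> {event["after"]}')
    elif op == "DML:DELETE":
        print(f'{event["table"]}: delete {event["before"]}')
```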
**Original Structure**
Description: Uses Kafka's native data synchronization method, supporting append-only operations similar to `INSERT`. As a source, it handles complex, unstructured data and passes it downstream; as a target, it allows flexible control over partitions, headers, keys, and values, enabling custom data insertion.
Typical Use Case: Used for homogeneous data migration or unstructured data transformation, enabling data filtering and transformation through a Kafka -> JS Processing Node -> Kafka/MySQL data pipeline.
Sample Data:
{
  "offset": 12345,
  "timestampType": "LogAppendTime",
  "partition": 3,
  "timestamp": 1638349200000,
  "headers": {
    "headerKey1": "headerValue1"
  },
  "key": "user123",
  "value": {
    "id": 1,
    "name": "John Doe",
    "action": "login",
    "timestamp": "2021-12-01T10:00:00Z"
  }
}
- offset: Offset marking the message position, not included in the target message body.
- timestampType: Type of timestamp, used for metadata purposes, not included in the message body.
- partition: The partition number for the message; if provided, the message is written to the specified partition.
- timestamp: Message creation time; if provided, the specified time is used, otherwise the system time is used.
- headers: Message header information, written to the header if present, carrying additional metadata.
- key: Message key, used for partitioning strategies or to identify the message source.
- value: Message content, containing the actual business data.
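As an illustration of the target-side control described above, here is a minimal sketch that writes a message with an explicit partition, headers, key, value, and timestamp, assuming the `kafka-python` client; the broker address, topic name, and payload are hypothetical.

```python
import json
import time

from kafka import KafkaProducer

# Hypothetical broker address; replace with your own value.
producer = KafkaProducer(
    bootstrap_servers="127.0.0.1:9092",
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send(
    "user_actions",                             # hypothetical topic
    key="user123",                              # message key; used for partitioning when no partition is given
    value={"id": 1, "name": "John Doe", "action": "login"},
    partition=3,                                # write to a specific partition
    headers=[("headerKey1", b"headerValue1")],  # extra metadata carried in the header
    timestamp_ms=int(time.time() * 1000),       # omit to fall back to the system time
)
producer.flush()
```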
Consumption Details
When configuring data replication or transformation tasks later, you can specify the data synchronization method through task settings in the upper-right corner. The corresponding consumption details are as follows:
- Full Only: Reads from the first message and stops the task after reaching the recorded incremental position.
- Full + Incremental: Reads from the first message to the recorded position and then continuously syncs incremental data.
- Incremental Only: Choose the starting point for incremental collection: Now, meaning sync starts from the current position, or Select Time, meaning sync starts from the position calculated from the specified time (see the offset-lookup sketch after this list).
Since Kafka, as a message queue, only supports append operations, take care to avoid writing duplicate data to the target system as a result of repeated consumption from the source.
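To make the Select Time option concrete, the sketch below shows how a starting timestamp can be resolved to per-partition offsets using Kafka's offsets-for-times lookup, assuming the `kafka-python` client; the broker address, topic name, and timestamp are hypothetical.

```python
from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers="127.0.0.1:9092")  # hypothetical broker
topic = "cdc_events"                                          # hypothetical topic
start_ms = 1727097087000                                      # the specified time, in milliseconds

partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
offsets = consumer.offsets_for_times({tp: start_ms for tp in partitions})

consumer.assign(partitions)
for tp, found in offsets.items():
    if found is not None:
        # Seek to the first message whose timestamp is at or after the specified time.
        consumer.seek(tp, found.offset)
    else:
        # No message at or after that time; start from the end of the partition.
        consumer.seek_to_end(tp)
```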
Limitations
- Data Type Adaptation: As a source, Kafka's data types need to be adjusted according to the target data source's requirements, or corresponding table structures should be manually created on the target side to ensure compatibility.
- Message Delivery Guarantee: Due to Kafka's `At least once` delivery semantics and append-only behavior, duplicate consumption may occur. Idempotency must be ensured on the target side to avoid duplicate data resulting from repeated consumption (see the sketch after this list).
- Consumption Mode Limitation: Consumption threads use different consumer group numbers, so be aware of the impact on consumption concurrency.
- Security Authentication Limitation: Currently, only authentication-free Kafka instances are supported.
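One way to provide the target-side idempotency mentioned above is to key writes on the record's primary key so that re-delivered events overwrite rather than duplicate rows. The sketch below uses SQLite purely for illustration; the table and payload are hypothetical, and any target with upsert semantics works the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, action TEXT)")

def apply_event(after: dict) -> None:
    # INSERT OR REPLACE keeps the write idempotent under at-least-once delivery.
    conn.execute(
        "INSERT OR REPLACE INTO users (id, name, action) VALUES (?, ?, ?)",
        (after["id"], after["name"], after["action"]),
    )

event_after = {"id": 1, "name": "John Doe", "action": "login"}
apply_event(event_after)
apply_event(event_after)  # duplicate delivery: still exactly one row for id = 1
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # prints 1
```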
Source Config
* **Connection Settings**
* **Name**: Enter a meaningful and unique name.
* **Type**: Supports using Kafka as a source database.
* **Connection Address**: Kafka connection address, including address and port, separated by a colon (`:`), for example, `113.222.22.***:9092`.
* **Structure Mode**: Choose based on business needs:
    * **Standard Structure (Default)**: Supports synchronization of complete DML operations (INSERT, UPDATE, DELETE). As a source, it parses and restores DML and DDL events for downstream processing; as a target, it stores these events in a standardized format, making them easier to parse in subsequent tasks.
* **Original Structure**: Uses Kafka's native data synchronization method, supporting append-only operations similar to `INSERT`. As a source, it handles complex, unstructured data and passes it downstream; as a target, it allows flexible control over partitions, headers, keys, and values, enabling custom data insertion.
* **Key Serializer**, **Value Serializer**: Choose the serialization method for keys and values, such as Binary (default).
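If you want to sanity-check the connection address before saving the configuration, a minimal sketch like the one below lists the topics visible on the broker, assuming the `kafka-python` client and an authentication-free instance; the address is hypothetical.

```python
from kafka import KafkaConsumer

# Hypothetical connection address in address:port form.
consumer = KafkaConsumer(bootstrap_servers="127.0.0.1:9092")
print(sorted(consumer.topics()))  # topics visible on the broker
consumer.close()
```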