Kafka for Data Migration

· 3 min read
Amit Joshi
Engineer

Data Connections

If you search the internet for data migration tools, Kafka will not feature as a top data migration tool. In this article we will look at how Kafka is an equally capable and more versatile tool for all types of data migration. Let's consider some of the key migration requirements:

  • Ability to read from and write to different types of data sources
  • Transform data, as the source and target data sources will have different data structures
  • Process millions of transactions within an SLA
  • Process both historic and real-time data
  • Enterprise features like data governance, reliability, and security, to name a few

Ability to read and write to different types of data sources

Kafka Connect is a low-code mechanism to connect to 150+ data sources. Many connectors are available in the open-source ecosystem, and commercial connectors are available from vendors like Confluent and others. Cloud provider services also have options to connect to Kafka; for example, AWS EventBridge Pipes supports a Kafka broker endpoint. Open-source ESBs like Apache Camel have a connect framework that makes all Camel components run in a Kafka Connect cluster (see Apache Kafka Connectors). I will cover Kafka Connect in detail in a future blog post.
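
As a minimal sketch of how a source gets wired up (the connector class and property names follow Confluent's JDBC source connector; the host, database, table, and column names are hypothetical placeholders), a connector can be registered by POSTing its JSON config to the Kafka Connect REST API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnector {
    public static void main(String[] args) throws Exception {
        // Hypothetical JDBC source connector config; connection URL, table
        // and column names are placeholders for your own environment.
        String config = """
            {
              "name": "orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:postgresql://db-host:5432/sales",
                "table.whitelist": "orders",
                "mode": "timestamp+incrementing",
                "timestamp.column.name": "updated_at",
                "incrementing.column.name": "id",
                "topic.prefix": "migration-"
              }
            }
            """;

        // Register the connector via the Connect REST API (default port 8083).
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(config))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Once registered, the Connect workers start polling the table and publishing rows to the migration- topics; no custom reader code is needed.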

Data Transformations

Since the source and target data sources will have different data structures, the data must be transformed along the way. Kafka can transform data in a couple of ways. If it is a single message transform, Kafka Connect has a good data transformation layer, and you can even write a custom Java/Scala transformation (see Kafka Connect Transformations). Kafka Streams is the option when joining multiple data streams (sources) or data aggregation is required; either Kafka Streams or KSQL can be used to transform data (see Kafka Streams Transformations). Most of the time the source and target data sources are of different types; for example, the source can be an RDBMS and the destination can be Elasticsearch or MongoDB. Transformations help map the data to the target's requirements.
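
As a minimal Kafka Streams sketch (the topic names and the trivial value transformation are hypothetical; a real migration would reshape records to the target schema), a topology can read from the source topic, transform each record, and write to the topic the sink connector consumes:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class MigrationTransform {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "migration-transform");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read rows captured from the RDBMS source (hypothetical topic name).
        KStream<String, String> source = builder.stream("migration-orders");
        source
            // Placeholder transformation: normalize values into the shape
            // the target store expects.
            .mapValues(value -> value.trim().toUpperCase())
            // Write the reshaped records to the topic the sink connector reads.
            .to("orders-for-elasticsearch");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```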

Scalability

  • Process millions of transactions within an SLA. Kafka has such robust throughput that even a small Kafka cluster can easily process millions of records in a minute; a medium cluster of 5 brokers can process a million transactions per second. With the data safely stored in a topic, consumers can process it at their own pace without interfering with the producer.
  • Process both historic and real-time data. A data migration will need historic data up to some point in time, and near real-time data from that day onward. A single Kafka data pipeline can work for both cases by simply changing one line of the source connector configuration: the JDBC source connector comes with a "bulk" mode used for one-time data transfers and "timestamp" or "incrementing" modes used for real-time transfers (see the first sketch after this list). The database CDC connectors have a similar feature.
  • Enterprise features like data governance, reliability, and security. Most security, reliability, and data governance needs are met by the core Kafka platform, and any additional features can be added as external tools. For example, calendar-based scheduling is not possible in Kafka Connect, but an external scheduler can pause and resume a connector within a schedule window (see the second sketch after this list). Kafka's open architecture helps make things more flexible.
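
To make the one-line mode switch concrete, here is a hedged sketch (property names follow Confluent's JDBC source connector; the connection URL, table, and column names are hypothetical) that builds the same connector config for either the one-time historic load or the ongoing real-time pipeline:

```java
import java.util.HashMap;
import java.util.Map;

public class SourceModes {
    // Builds the JDBC source connector config; only the "mode" entries differ
    // between the historic load and the real-time pipeline.
    // Names ("db-host", "orders", "updated_at", "id") are placeholders.
    static Map<String, String> jdbcConfig(boolean historicLoad) {
        Map<String, String> config = new HashMap<>();
        config.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        config.put("connection.url", "jdbc:postgresql://db-host:5432/sales");
        config.put("table.whitelist", "orders");
        config.put("topic.prefix", "migration-");
        if (historicLoad) {
            // One-time transfer: reads the whole table.
            config.put("mode", "bulk");
        } else {
            // Near real-time: captures only new and updated rows.
            config.put("mode", "timestamp+incrementing");
            config.put("timestamp.column.name", "updated_at");
            config.put("incrementing.column.name", "id");
        }
        return config;
    }

    public static void main(String[] args) {
        System.out.println(jdbcConfig(true));   // historic load
        System.out.println(jdbcConfig(false));  // real-time pipeline
    }
}
```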
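And for the scheduling point, a minimal sketch (the connector name orders-source and the worker URL are hypothetical) using Kafka Connect's pause and resume REST endpoints, which an external scheduler such as cron could invoke at the edges of a schedule window:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorScheduler {
    private static final HttpClient CLIENT = HttpClient.newHttpClient();
    // Hypothetical Connect worker URL and connector name.
    private static final String CONNECTOR_URL = "http://localhost:8083/connectors/orders-source";

    // Kafka Connect exposes PUT /connectors/{name}/pause and /resume;
    // an external scheduler calls these at the start and end of a window.
    static void setRunning(boolean running) throws Exception {
        String action = running ? "resume" : "pause";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(CONNECTOR_URL + "/" + action))
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> response = CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
        System.out.println(action + " -> HTTP " + response.statusCode());
    }

    public static void main(String[] args) throws Exception {
        setRunning(false); // pause outside the schedule window
        setRunning(true);  // resume inside the window
    }
}
```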