Kafka Data Source
Kafka Data Source allows Spark SQL (and Spark Structured Streaming) to load data from and write data to topics in Apache Kafka.
Kafka Data Source is available as the kafka format.
The entry point is KafkaSourceProvider.
Note: Apache Kafka stores streams of records in a fault-tolerant, durable way. Learn more about Apache Kafka in the official documentation or in Mastering Apache Kafka.
Kafka Data Source supports options to fine-tune structured queries.
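For instance, a streaming query can fine-tune how the source reads a topic with options such as `startingOffsets` and `maxOffsetsPerTrigger`. A minimal sketch (the broker address and topic name are placeholders):

```scala
// Streaming read from Kafka with a few tuning options.
val stream = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // required; placeholder address
  .option("subscribe", "topic1")                       // or subscribePattern / assign
  .option("startingOffsets", "earliest")               // where to start a new query
  .option("maxOffsetsPerTrigger", "10000")             // rate-limit each micro-batch
  .load
```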
Reading Data from Kafka Topics
In order to load Kafka records, use kafka as the input data source format. The source requires the kafka.bootstrap.servers option and one of the topic-subscription options (subscribe, subscribePattern or assign).

```scala
val records = spark
  .read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1")
  .load
```
Alternatively, use the fully-qualified class name org.apache.spark.sql.kafka010.KafkaSourceProvider.

```scala
val records = spark
  .read
  .format("org.apache.spark.sql.kafka010.KafkaSourceProvider")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "topic1")
  .load
```
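Records loaded from Kafka arrive with a fixed schema (key, value, topic, partition, offset, timestamp, timestampType), where key and value are binary. A common follow-up step, sketched below, casts them to strings:

```scala
// key and value are binary columns; cast them for human-readable processing.
val kv = records.selectExpr(
  "CAST(key AS STRING)",
  "CAST(value AS STRING)",
  "topic", "partition", "offset", "timestamp")
```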
Writing Data to Kafka Topics
In order to save a DataFrame to Kafka topics, use kafka as the output data source format. The DataFrame must have a value column (key and topic columns are optional), and the writer requires the kafka.bootstrap.servers option and, unless every row carries a topic column, the topic option.

```scala
import org.apache.spark.sql.DataFrame
val records: DataFrame = ...
records
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "topic1")
  .save
```
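When rows should land in different topics, a topic column on the DataFrame can replace the writer-level topic option. A minimal sketch (broker address, topic names and values are placeholders):

```scala
// Each row names its own target topic via the "topic" column,
// so no "topic" option is set on the writer.
import spark.implicits._
Seq(
  ("k1", "v1", "topicA"),
  ("k2", "v2", "topicB"))
  .toDF("key", "value", "topic")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .save
```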