KafkaSource is a streaming source that loads data from Apache Kafka.

Creating Instance

KafkaSource takes the following to be created:

KafkaSource is created when:

Metadata Log Directory

KafkaSource uses the metadata log directory to persist offsets. The directory is named after the source ID and is located under the sources directory in the checkpointRoot (of the StreamExecution).
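As a minimal sketch of the layout described above, the path can be put together from the checkpoint root, the sources directory, and the source ID. The `MetadataLogDir` object and its `metadataPath` helper are hypothetical names used for illustration only (they are not part of Spark), and treating the source ID as a number is an assumption.

```scala
import java.nio.file.{Path, Paths}

object MetadataLogDir {
  // Hypothetical helper, not a Spark API: joins the checkpoint root,
  // the "sources" directory, and the source ID into a single path.
  // Assumption: the source ID is numeric.
  def metadataPath(checkpointRoot: String, sourceId: Long): Path =
    Paths.get(checkpointRoot, "sources", sourceId.toString)
}
```

For example, `MetadataLogDir.metadataPath("/tmp/checkpoint", 0L)` points at the `sources/0` directory under `/tmp/checkpoint`.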


The checkpointRoot directory is one of the following:


Enable the ALL logging level for the org.apache.spark.sql.kafka010.KafkaSource logger to see what happens inside.

conf/

Refer to Logging.