KafkaSource is a streaming source that loads data from Apache Kafka.
KafkaSource takes the following to be created:
- Executor Parameters for Kafka
- Source Options
- Metadata Log Directory
- KafkaOffsetRangeLimit for Starting Offsets
KafkaSource is created when:
- KafkaSourceProvider is requested to create a streaming source
Metadata Log Directory
KafkaSource uses the metadata log directory to persist offsets. The directory is named after the source ID and is located under the
sources directory in the checkpointRoot (of the StreamExecution).
The checkpointRoot directory is one of the following:
- spark.sql.streaming.checkpointLocation configuration property
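The layout described above (source ID under the sources directory in the checkpointRoot) can be illustrated with a minimal sketch. The `MetadataLogPath` object and its `metadataLogDir` method are hypothetical names introduced only for illustration and are not part of Spark; the sketch assumes the checkpoint root is given as a plain string path and the source ID is a number.

```scala
object MetadataLogPath {
  // Hypothetical helper (not a Spark API): joins the checkpoint root,
  // the "sources" directory and the source ID into a single path string.
  def metadataLogDir(checkpointRoot: String, sourceId: Long): String = {
    // Drop one trailing slash (if any) so the result has no double separator
    val root = checkpointRoot.stripSuffix("/")
    s"$root/sources/$sourceId"
  }
}
```

Under these assumptions, `MetadataLogPath.metadataLogDir("/tmp/checkpoint", 0)` gives `/tmp/checkpoint/sources/0`, which is where the first source of a query with that checkpoint root would keep its metadata log.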
Enable ALL logging level for
org.apache.spark.sql.kafka010.KafkaSource logger to see what happens inside.
Add the following line to
Refer to Logging.