KafkaSource¶
KafkaSource is a streaming source that loads data from Apache Kafka.
Creating Instance¶
KafkaSource takes the following to be created:
-
SQLContext - KafkaOffsetReader
- Executor Parameters for Kafka
- Source Options
- Metadata Log Directory
- KafkaOffsetRangeLimit for Starting Offsets
-
failOnDataLossflag
KafkaSource is created when:
KafkaSourceProvideris requested to create a streaming source
Metadata Log Directory¶
KafkaSource uses the metadata log directory to persist offsets. The directory is the source ID under the sources directory in the checkpointRoot (of the StreamExecution).
Note
The checkpointRoot directory is one of the following:
checkpointLocationoption- spark.sql.streaming.checkpointLocation configuration property
Logging¶
Enable ALL logging level for org.apache.spark.sql.kafka010.KafkaSource logger to see what happens inside.
Add the following line to conf/log4j.properties:
log4j.logger.org.apache.spark.sql.kafka010.KafkaSource=ALL
Refer to Logging.