KafkaSource¶
KafkaSource
is a streaming source that loads data from Apache Kafka.
Creating Instance¶
KafkaSource
takes the following to be created:
-
SQLContext
- KafkaOffsetReader
- Executor Parameters for Kafka
- Source Options
- Metadata Log Directory
- KafkaOffsetRangeLimit for Starting Offsets
-
failOnDataLoss
flag
KafkaSource
is created when:
KafkaSourceProvider
is requested to create a streaming source
Metadata Log Directory¶
KafkaSource
uses the metadata log directory to persist offsets. The directory is the source ID under the sources
directory in the checkpointRoot (of the StreamExecution).
Note
The checkpointRoot directory is one of the following:
checkpointLocation
option- spark.sql.streaming.checkpointLocation configuration property
Logging¶
Enable ALL
logging level for org.apache.spark.sql.kafka010.KafkaSource
logger to see what happens inside.
Add the following line to conf/log4j.properties
:
log4j.logger.org.apache.spark.sql.kafka010.KafkaSource=ALL
Refer to Logging.