# KafkaScan
KafkaScan is a logical Scan (Spark SQL). Head over to The Internals of Spark SQL to learn more about the Scan abstraction.
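As a rough picture of that contract, the following is a minimal sketch of the Scan methods this page covers (the class name and method bodies are illustrative, not the connector's actual code):

```scala
import org.apache.spark.sql.connector.metric.CustomMetric
import org.apache.spark.sql.connector.read.Scan
import org.apache.spark.sql.connector.read.streaming.MicroBatchStream
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.util.CaseInsensitiveStringMap

// Illustrative only: the parts of the Scan contract that matter here.
class KafkaLikeScan(options: CaseInsensitiveStringMap) extends Scan {
  // Schema of the rows this scan produces
  override def readSchema(): StructType = ???
  // Streaming (micro-batch) entry point, described below
  override def toMicroBatchStream(checkpointLocation: String): MicroBatchStream = ???
  // Custom metrics, described below
  override def supportedCustomMetrics(): Array[CustomMetric] = Array.empty
}
```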
## Creating Instance
KafkaScan takes the following to be created:

* Options (CaseInsensitiveStringMap)
KafkaScan is created when:

* KafkaTable is requested for a ScanBuilder
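For context, a Kafka streaming query like the one below ends up with a KafkaScan during query planning (this uses the standard public API; an active SparkSession named `spark` is assumed):

```scala
// Assumes an active SparkSession named `spark`.
val records = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // a kafka.-prefixed option
  .option("subscribe", "topic1")
  .load()
```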
## toMicroBatchStream
```scala
toMicroBatchStream(
  checkpointLocation: String): MicroBatchStream
```
toMicroBatchStream is part of the Scan (Spark SQL) abstraction.
toMicroBatchStream validates the streaming part of the options.

toMicroBatchStream generates a unique group ID.

toMicroBatchStream removes the kafka. prefix from the keys in the options.
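As an illustration of the prefix handling (plain Scala, not the connector's actual helper):

```scala
// Illustrative only: strip the "kafka." prefix so the remaining keys can be
// passed to the Kafka consumer as-is (e.g. "bootstrap.servers").
val options = Map(
  "kafka.bootstrap.servers" -> "localhost:9092",
  "subscribe" -> "topic1")
val kafkaParams = options.collect {
  case (key, value) if key.startsWith("kafka.") =>
    key.stripPrefix("kafka.") -> value
}
assert(kafkaParams == Map("bootstrap.servers" -> "localhost:9092"))
```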
In the end, toMicroBatchStream creates a KafkaMicroBatchStream with the following (put together in the sketch after this list):

* A new KafkaOffsetReader with the strategy (from the options) and the kafkaParamsForDriver
* kafkaParamsForExecutors (the options with the kafka. prefix removed) and the unique group ID
* The given checkpointLocation
* Starting offsets based on the options (startingtimestamp, startingoffsetsbytimestamp and startingoffsets); LatestOffsetRangeLimit is the default
* failOnDataLoss option
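Put together, toMicroBatchStream can be pictured as follows. This is a condensed sketch only: the helper names follow the text above, while the exact signatures in the Kafka connector may differ.

```scala
// Condensed sketch of the steps above; not the connector's verbatim code.
override def toMicroBatchStream(checkpointLocation: String): MicroBatchStream = {
  validateStreamOptions(options)                   // streaming-side validation
  val uniqueGroupId = streamingUniqueGroupId(options, checkpointLocation)
  val specifiedKafkaParams = convertToSpecifiedParams(options) // "kafka." prefix removed
  val startingStreamOffsets = getKafkaOffsetRangeLimit(
    options, "startingtimestamp", "startingoffsetsbytimestamp", "startingoffsets",
    LatestOffsetRangeLimit)                        // LatestOffsetRangeLimit is the default
  val kafkaOffsetReader = new KafkaOffsetReader(
    strategy(options),
    kafkaParamsForDriver(specifiedKafkaParams),
    options,
    driverGroupIdPrefix = s"$uniqueGroupId-driver")
  new KafkaMicroBatchStream(
    kafkaOffsetReader,
    kafkaParamsForExecutors(specifiedKafkaParams, uniqueGroupId),
    options,
    checkpointLocation,
    startingStreamOffsets,
    failOnDataLoss(options))
}
```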
## supportedCustomMetrics
```scala
supportedCustomMetrics(): Array[CustomMetric]
```
supportedCustomMetrics is part of the Scan (Spark SQL) abstraction.
supportedCustomMetrics returns the following metrics:

* dataLoss
* offsetOutOfRange
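A sketch of what this can look like (the metric class bodies and descriptions below are illustrative; CustomSumMetric is Spark SQL's summing base class for custom metrics):

```scala
import org.apache.spark.sql.connector.metric.{CustomMetric, CustomSumMetric}

// Illustrative metric classes named after the two metrics above.
class DataLossMetric extends CustomSumMetric {
  override def name(): String = "dataLoss"
  override def description(): String = "number of data loss errors"
}

class OffsetOutOfRangeMetric extends CustomSumMetric {
  override def name(): String = "offsetOutOfRange"
  override def description(): String = "number of offsets out of range"
}

// What supportedCustomMetrics returns, per the list above.
def supportedCustomMetrics(): Array[CustomMetric] =
  Array(new OffsetOutOfRangeMetric, new DataLossMetric)
```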