KafkaScan¶
Head over to The Internals of Spark SQL to learn more.
KafkaScan is a logical Scan (Spark SQL).
Creating Instance¶
KafkaScan takes the following to be created:

- Options
KafkaScan is created when:
MicroBatchStream¶
```scala
toMicroBatchStream(
  checkpointLocation: String): MicroBatchStream
```
toMicroBatchStream is part of the Scan (Spark SQL) abstraction.
toMicroBatchStream validates the streaming part of the options.
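What "validating the streaming part of the options" could look like, as a minimal sketch: a streaming query has no natural end, so an ending-offset option makes no sense for a micro-batch stream. The option key and the exception message below are assumptions for illustration, not Spark's actual code.

```scala
// Hedged sketch (not Spark's actual implementation): reject options that
// only make sense for batch reads, e.g. an assumed "endingoffsets" key.
def validateStreamOptions(options: Map[String, String]): Unit =
  if (options.contains("endingoffsets")) {
    throw new IllegalArgumentException(
      "ending offsets are not valid in streaming queries")
  }
```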
toMicroBatchStream generates a unique group ID.
toMicroBatchStream removes the `kafka.` prefix from the keys in the options.
In the end, toMicroBatchStream creates a KafkaMicroBatchStream with the following:

- A new KafkaOffsetReader with the strategy (from the options) and the kafkaParamsForDriver
- kafkaParamsForExecutors (the options with the `kafka.` prefix removed and a unique group ID)
- The given checkpointLocation
- Starting offsets based on the options (startingtimestamp, startingoffsetsbytimestamp and startingOffsets); LatestOffsetRangeLimit is the default
- failOnDataLoss option
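The steps above can be sketched end to end. This is a simplified illustration with stub case classes standing in for Spark's KafkaOffsetReader and KafkaMicroBatchStream; the constructor arguments, option keys (e.g. `strategy`, `failondataloss`) and defaults are assumptions based on this description, not Spark's actual API.

```scala
import java.util.UUID

// Stand-ins for Spark's own classes, for illustration only.
case class KafkaOffsetReader(strategy: String, driverKafkaParams: Map[String, String])
case class KafkaMicroBatchStream(
    offsetReader: KafkaOffsetReader,
    executorKafkaParams: Map[String, String],
    checkpointLocation: String,
    startingOffsets: String,
    failOnDataLoss: Boolean)

def toMicroBatchStream(
    options: Map[String, String],
    checkpointLocation: String): KafkaMicroBatchStream = {
  // A unique group ID so executor consumers do not clash with other groups
  val uniqueGroupId = s"spark-kafka-source-${UUID.randomUUID()}"
  // Keep only the "kafka."-prefixed options, with the prefix stripped
  val kafkaParams = options.collect {
    case (key, value) if key.startsWith("kafka.") =>
      key.stripPrefix("kafka.") -> value
  }
  // Starting offsets default to the latest available offsets
  val startingOffsets = options.getOrElse("startingoffsets", "latest")
  KafkaMicroBatchStream(
    KafkaOffsetReader(options.getOrElse("strategy", "subscribe"), kafkaParams),
    kafkaParams + ("group.id" -> uniqueGroupId),
    checkpointLocation,
    startingOffsets,
    failOnDataLoss = options.get("failondataloss").forall(_.toBoolean))
}
```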
supportedCustomMetrics¶
```scala
supportedCustomMetrics(): Array[CustomMetric]
```
supportedCustomMetrics is part of the Scan (Spark SQL) abstraction.
supportedCustomMetrics returns the following metrics:

- dataLoss
- offsetOutOfRange
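A sketch of what these two metrics could look like as DataSource V2 CustomMetric implementations. CustomSumMetric (`org.apache.spark.sql.connector.metric`) sums per-task values on the driver; the class names and descriptions below are assumptions for illustration.

```scala
import org.apache.spark.sql.connector.metric.CustomSumMetric

// Hedged sketch, not Spark's actual classes: each metric has a stable name
// (matching the list above) and a human-readable description for the UI.
class DataLossMetric extends CustomSumMetric {
  override def name(): String = "dataLoss"
  override def description(): String = "number of data loss errors"
}

class OffsetOutOfRangeMetric extends CustomSumMetric {
  override def name(): String = "offsetOutOfRange"
  override def description(): String = "number of offsets out of range"
}

def supportedCustomMetrics(): Array[CustomSumMetric] =
  Array(new OffsetOutOfRangeMetric, new DataLossMetric)
```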