Skip to content

KafkaScan

The Internals of Spark SQL

Head over to The Internals of Spark SQL to learn more.

KafkaScan is a logical Scan (Spark SQL).

Creating Instance

KafkaScan takes the following to be created:

  • Options

KafkaScan is created when:

MicroBatchStream

toMicroBatchStream(
  checkpointLocation: String): MicroBatchStream

toMicroBatchStream is part of the Scan (Spark SQL) abstraction.


toMicroBatchStream validates the streaming part of the options.

toMicroBatchStream generate a unique group ID.

toMicroBatchStream removes kafka prefix from the keys in the options.

In the end, toMicroBatchStream creates a KafkaMicroBatchStream with the following:

supportedCustomMetrics

supportedCustomMetrics(): Array[CustomMetric]

supportedCustomMetrics is part of the Scan (Spark SQL) abstraction.


supportedCustomMetrics is the following metrics:

  • dataLoss
  • offsetOutOfRange