KafkaContinuousReader is a ContinuousReader for Kafka Data Source in Continuous Stream Processing.

KafkaContinuousReader is created exclusively when KafkaSourceProvider is requested to create a ContinuousReader.

[[pollTimeoutMs]] [[kafkaConsumer.pollTimeoutMs]] KafkaContinuousReader uses the kafkaConsumer.pollTimeoutMs configuration parameter (default: 512) for KafkaContinuousInputPartitions when requested to <<planInputPartitions, plan input partitions>>.
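How that timeout could be resolved from the source options can be sketched in plain Scala. Only the option key and the 512 default come from the text above; the function name and everything else is illustrative, not Spark's actual code:

```scala
// Hypothetical sketch: resolve the poll timeout from the source options,
// falling back to the 512 ms default mentioned in the text.
def resolvePollTimeoutMs(sourceOptions: Map[String, String]): Long =
  sourceOptions
    .get("kafkaConsumer.pollTimeoutMs")
    .map(_.toLong)
    .getOrElse(512L)
```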

[[logging]]
[TIP]
====
Enable INFO or WARN logging levels for org.apache.spark.sql.kafka010.KafkaContinuousReader logger to see what happens inside.

Add the following line to conf/

Refer to [Logging].
====
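The line in question is a standard log4j logger entry for the logger named above (the exact logging level is an assumption here; use whichever level you need):

```properties
log4j.logger.org.apache.spark.sql.kafka010.KafkaContinuousReader=INFO
```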

=== [[creating-instance]] Creating Instance

KafkaContinuousReader takes the following to be created:

  • [[offsetReader]] KafkaOffsetReader
  • [[kafkaParams]] Kafka parameters (as java.util.Map[String, Object])
  • [[sourceOptions]] Source options (as Map[String, String])
  • [[metadataPath]] Metadata path
  • [[initialOffsets]] Initial offsets
  • [[failOnDataLoss]] failOnDataLoss flag
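The constructor arguments above can be summarized as a plain Scala shape. This is an illustrative stand-in only, not the actual org.apache.spark.sql.kafka010.KafkaContinuousReader class; the two AnyRef fields stand in for Spark's KafkaOffsetReader and initial-offsets types:

```scala
import java.{util => ju}

// Illustrative shape of the constructor arguments listed above
// (not the real Spark class; offsetReader and initialOffsets are simplified).
case class KafkaContinuousReaderArgs(
    offsetReader: AnyRef,                // KafkaOffsetReader in Spark
    kafkaParams: ju.Map[String, Object],
    sourceOptions: Map[String, String],
    metadataPath: String,
    initialOffsets: AnyRef,              // initial offsets to start from
    failOnDataLoss: Boolean)
```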

=== [[planInputPartitions]] Plan Input Partitions -- planInputPartitions Method

[source, scala]
----
planInputPartitions(): java.util.List[InputPartition[InternalRow]]
----

NOTE: planInputPartitions is part of the DataSourceReader contract in Spark SQL for the number of InputPartitions to use as RDD partitions (when DataSourceV2ScanExec physical operator is requested for the partitions of the input RDD).
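The general shape of the planning step, one input partition per Kafka topic partition, can be sketched as follows. All names here are illustrative stand-ins, not Kafka's TopicPartition or Spark's KafkaContinuousInputPartition:

```scala
// Illustrative stand-ins for a Kafka topic partition and a continuous input partition.
case class TopicPartitionSketch(topic: String, partition: Int)
case class InputPartitionSketch(
    topicPartition: TopicPartitionSketch,
    startOffset: Long,
    pollTimeoutMs: Long)

// Sketch: create one input partition per topic partition,
// each starting at that partition's known offset.
def planInputPartitionsSketch(
    startOffsets: Map[TopicPartitionSketch, Long],
    pollTimeoutMs: Long): Seq[InputPartitionSketch] =
  startOffsets.toSeq.map { case (tp, offset) =>
    InputPartitionSketch(tp, offset, pollTimeoutMs)
  }
```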


=== [[setStartOffset]] setStartOffset Method

[source, scala]
----
setStartOffset(
  start: Optional[Offset]): Unit
----

setStartOffset is part of the ContinuousReader abstraction.
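Since start is an Optional, the reader has to fall back to its initial offsets when no start offset is given. A minimal sketch of that decision, with offset types simplified to strings (not Spark's actual code):

```scala
import java.util.Optional

// Sketch: use the provided start offset when present, otherwise fall back
// to the reader's initial offsets (offsets simplified to String here).
def resolveStartOffsetSketch(start: Optional[String], initialOffsets: String): String =
  if (start.isPresent) start.get else initialOffsets
```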


=== [[deserializeOffset]] deserializeOffset Method

[source, scala]
----
deserializeOffset(
  json: String): Offset
----

deserializeOffset is part of the ContinuousReader abstraction.
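The idea is to turn a JSON string back into per-partition offsets. A real implementation would use a JSON library; this hedged sketch only handles a flat illustrative format like {"topic-0":42} to show the shape of the conversion:

```scala
// Minimal illustrative parser for a flat JSON offset like {"topic-0":42}.
// Not Spark's actual deserializeOffset; shown only to illustrate the idea.
def deserializeOffsetSketch(json: String): Map[String, Long] =
  json.stripPrefix("{").stripSuffix("}")
    .split(",")
    .filter(_.nonEmpty)
    .map { entry =>
      val idx = entry.lastIndexOf(":")
      val key = entry.substring(0, idx).trim.stripPrefix("\"").stripSuffix("\"")
      val value = entry.substring(idx + 1).trim.toLong
      key -> value
    }
    .toMap
```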


=== [[mergeOffsets]] mergeOffsets Method

[source, scala]
----
mergeOffsets(
  offsets: Array[PartitionOffset]): Offset
----

mergeOffsets is part of the ContinuousReader abstraction.
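In continuous processing, each task reports its own PartitionOffset, and merging them back into one global offset amounts to building a map keyed by topic partition. A hedged sketch with illustrative names (not Spark's KafkaSourcePartitionOffset):

```scala
// Illustrative stand-in for a per-partition offset reported by a continuous task.
case class PartitionOffsetSketch(topic: String, partition: Int, offset: Long)

// Sketch: merge per-partition offsets into one global offset,
// keyed by (topic, partition).
def mergeOffsetsSketch(offsets: Array[PartitionOffsetSketch]): Map[(String, Int), Long] =
  offsets.map(po => ((po.topic, po.partition), po.offset)).toMap
```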