ContinuousDataSourceRDD
ContinuousDataSourceRDD is a specialized RDD (RDD[InternalRow]) that is used exclusively as the only input RDD (with the input rows) of the DataSourceV2ScanExec leaf physical operator with a ContinuousReader.
ContinuousDataSourceRDD is created when the DataSourceV2ScanExec leaf physical operator is requested for the input RDDs (there is actually only one).
[[spark.sql.streaming.continuous.executorQueueSize]] ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorQueueSize configuration property for the size of the data queue.
[[spark.sql.streaming.continuous.executorPollIntervalMs]] ContinuousDataSourceRDD uses spark.sql.streaming.continuous.executorPollIntervalMs configuration property for the epochPollIntervalMs.
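As a rough illustration, the two configuration properties can be set when a SparkSession is built. This is a minimal sketch that assumes Spark SQL is on the classpath. The application name and the values are hypothetical and chosen only for demonstration, not as recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; the app name is a hypothetical placeholder.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("continuous-rdd-config-sketch")
  // Size of the data queue
  .config("spark.sql.streaming.continuous.executorQueueSize", "2048")
  // Value used for epochPollIntervalMs
  .config("spark.sql.streaming.continuous.executorPollIntervalMs", "50")
  .getOrCreate()
```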
[[creating-instance]] ContinuousDataSourceRDD takes the following to be created:
- [[sc]] SparkContext
- [[dataQueueSize]] Size of the data queue
- [[epochPollIntervalMs]] epochPollIntervalMs
- [[readerInputPartitions]] InputPartition[InternalRow]s
[[getPreferredLocations]] ContinuousDataSourceRDD uses InputPartition (of a ContinuousDataSourceRDDPartition) for preferred host locations (where the input partition reader can run faster).
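The delegation described above can be sketched without Spark on the classpath. The traits and the case class below are simplified stand-ins for Spark's Partition, InputPartition and ContinuousDataSourceRDDPartition (an assumption made so the example is self-contained), and PreferredLocationsSketch is a hypothetical name. The sketch is not the actual Spark source.

```scala
// Minimal stand-in for Spark's Partition (assumption for this sketch).
trait Partition { def index: Int }

// Stand-in for InputPartition: only the host-location hint is modelled.
trait InputPartition[T] {
  def preferredLocations(): Array[String] = Array.empty
}

// Stand-in for ContinuousDataSourceRDDPartition: a partition index
// paired with the input partition it reads from.
final case class ContinuousDataSourceRDDPartition[T](
    index: Int,
    inputPartition: InputPartition[T]) extends Partition

object PreferredLocationsSketch {
  // The RDD defers to the input partition for preferred host locations.
  def getPreferredLocations(split: Partition): Seq[String] = split match {
    case p: ContinuousDataSourceRDDPartition[_] =>
      p.inputPartition.preferredLocations().toSeq
    case _ => Seq.empty
  }

  def main(args: Array[String]): Unit = {
    val onHostA = new InputPartition[String] {
      override def preferredLocations(): Array[String] = Array("host-a")
    }
    val split = ContinuousDataSourceRDDPartition(0, onHostA)
    println(getPreferredLocations(split).mkString(",")) // prints "host-a"
  }
}
```

An input partition that does not override preferredLocations yields an empty sequence here, which corresponds to expressing no placement preference.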
=== [[compute]] Computing Partition -- compute Method
[source, scala]
compute(split: Partition, context: TaskContext): Iterator[InternalRow]
NOTE: compute is part of the RDD Contract to compute a given partition.
compute...FIXME
=== [[getPartitions]] getPartitions Method
[source, scala]
getPartitions: Array[Partition]
NOTE: getPartitions is part of the RDD Contract to specify the partitions to compute.
getPartitions...FIXME