ContinuousDataSourceRDD
ContinuousDataSourceRDD is a specialized RDD (RDD[InternalRow]) that is used exclusively as the one and only input RDD (with the input rows) of the DataSourceV2ScanExec leaf physical operator with a ContinuousReader.
ContinuousDataSourceRDD is created exclusively when the DataSourceV2ScanExec leaf physical operator is requested for the input RDDs (of which there is only one).
[[spark.sql.streaming.continuous.executorQueueSize]] ContinuousDataSourceRDD uses the spark.sql.streaming.continuous.executorQueueSize configuration property for the <<dataQueueSize, size of the data queue>>.
[[spark.sql.streaming.continuous.executorPollIntervalMs]] ContinuousDataSourceRDD uses the spark.sql.streaming.continuous.executorPollIntervalMs configuration property for the <<epochPollIntervalMs, epochPollIntervalMs>>.
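The two configuration properties can be sketched as plain key-value lookups. This is not Spark's SQLConf code; the object name, the Map-based conf, and the fallback defaults (1024 entries, 100 ms) are assumptions for illustration and should be verified against your Spark version.

```scala
// Minimal sketch (plain Scala, no Spark dependency): resolving the two
// properties from a key-value map the way a session conf would.
// Defaults (1024, 100 ms) are assumed here, not taken from Spark sources.
object ContinuousConfSketch {
  val ExecutorQueueSize = "spark.sql.streaming.continuous.executorQueueSize"
  val ExecutorPollIntervalMs = "spark.sql.streaming.continuous.executorPollIntervalMs"

  def dataQueueSize(conf: Map[String, String]): Int =
    conf.get(ExecutorQueueSize).map(_.toInt).getOrElse(1024)

  def epochPollIntervalMs(conf: Map[String, String]): Long =
    conf.get(ExecutorPollIntervalMs).map(_.toLong).getOrElse(100L)
}
```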
[[creating-instance]] ContinuousDataSourceRDD
takes the following to be created:
- [[sc]]
SparkContext
- [[dataQueueSize]] Size of the data queue
- [[epochPollIntervalMs]]
epochPollIntervalMs
- [[readerInputPartitions]]
InputPartition[InternalRow]
s
[[getPreferredLocations]] ContinuousDataSourceRDD uses the InputPartition (of a ContinuousDataSourceRDDPartition) for the preferred host locations (where the input partition reader can run faster).
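The delegation can be sketched with stand-in types (InputPartitionLike and ContinuousPartitionStandIn are hypothetical names, not Spark's classes): the RDD simply asks the partition's InputPartition for its preferred executor locations.

```scala
// Stand-in for Spark's InputPartition (hypothetical trait, for illustration).
trait InputPartitionLike {
  def preferredLocations(): Array[String]
}

// Stand-in for ContinuousDataSourceRDDPartition.
final case class ContinuousPartitionStandIn(index: Int, inputPartition: InputPartitionLike)

object PreferredLocationsSketch {
  // getPreferredLocations just delegates to the wrapped InputPartition.
  def getPreferredLocations(split: ContinuousPartitionStandIn): Seq[String] =
    split.inputPartition.preferredLocations().toSeq
}
```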
=== [[compute]] Computing Partition -- compute Method
[source, scala]
----
compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]
----
NOTE: compute is part of the RDD Contract to compute a given partition.
compute ...FIXME
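The queue-based reading that compute sets up can be sketched in plain Scala. This is a simplification with assumed names: the real implementation (a queued data reader) runs a background thread that fills a bounded queue (sized by dataQueueSize) while the task's iterator drains it, and it also handles epoch markers and failure propagation. The poison-pill termination below exists only to make the sketch finite.

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical sketch of compute's queue-based row hand-off:
// a feeder thread pushes rows from the partition reader into a bounded
// queue and the returned iterator drains the queue on the task thread.
object ComputeSketch {
  def compute(rows: Iterator[String], dataQueueSize: Int): Iterator[String] = {
    val queue = new ArrayBlockingQueue[String](dataQueueSize)
    val Poison = "<<EOF>>" // sketch-only terminator; continuous mode never ends
    val feeder = new Thread(() => {
      rows.foreach(r => queue.put(r)) // put blocks when the queue is full
      queue.put(Poison)
    })
    feeder.setDaemon(true)
    feeder.start()
    Iterator.continually(queue.take()).takeWhile(_ != Poison)
  }
}
```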
=== [[getPartitions]] getPartitions Method
[source, scala]
----
getPartitions: Array[Partition]
----
NOTE: getPartitions is part of the RDD Contract to specify the partitions to compute.
getPartitions ...FIXME
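The shape of getPartitions can be sketched with stand-in types (the case-class names below are hypothetical, not Spark's): each InputPartition from <<readerInputPartitions, readerInputPartitions>> is wrapped in an RDD partition carrying its positional index.

```scala
// Stand-ins for InputPartition and ContinuousDataSourceRDDPartition
// (illustrative names, not Spark's classes).
final case class InputPartitionStandIn(name: String)
final case class PartitionStandIn(index: Int, inputPartition: InputPartitionStandIn)

object GetPartitionsSketch {
  // One RDD partition per input partition, indexed by position.
  def getPartitions(inputs: Seq[InputPartitionStandIn]): Array[PartitionStandIn] =
    inputs.zipWithIndex.map { case (ip, i) => PartitionStandIn(i, ip) }.toArray
}
```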