CollectLimitExec Physical Operator¶
CollectLimitExec
is a unary physical operator that represents GlobalLimit unary logical operator at execution time.
Creating Instance¶
CollectLimitExec
takes the following to be created:
- Number of rows (to collect from the child operator)
- Physical operator
CollectLimitExec
is created when SpecialLimits execution planning strategy is executed (and plans a GlobalLimit unary logical operator).
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute
requests the child operator to execute and (maps over every partition to) takes the given number of rows from every partition. That gives a RDD[InternalRow]
.
doExecute
prepares a ShuffleDependency (for the RDD[InternalRow]
and SinglePartition
partitioning) and creates a ShuffledRowRDD.
In the end, doExecute
(maps over every partition to) takes the given number of rows from the single partition.
doExecute
is part of the SparkPlan abstraction.