CollectLimitExec Physical Operator¶
CollectLimitExec is a unary physical operator that represents GlobalLimit unary logical operator at execution time.
Creating Instance¶
CollectLimitExec takes the following to be created:
- Number of rows (to collect from the child operator)
- Physical operator
CollectLimitExec is created when SpecialLimits execution planning strategy is executed (and plans a GlobalLimit unary logical operator).
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute requests the child operator to execute and (maps over every partition to) takes the given number of rows from every partition. That gives a RDD[InternalRow].
doExecute prepares a ShuffleDependency (for the RDD[InternalRow] and SinglePartition partitioning) and creates a ShuffledRowRDD.
In the end, doExecute (maps over every partition to) takes the given number of rows from the single partition.
doExecute is part of the SparkPlan abstraction.