CollectMetricsExec Physical Operator¶
CollectMetricsExec
is a unary physical operator.
Creating Instance¶
CollectMetricsExec
takes the following to be created:
- Name
- Metric NamedExpressions
- Child physical operator
CollectMetricsExec
is created when BasicOperators execution planning strategy is executed (and plans a CollectMetrics logical operator).
Collected metrics Accumulator¶
CollectMetricsExec
registers an AggregatingAccumulator under the name Collected metrics.
AggregatingAccumulator
is created with the metric expressions and the output attributes of the child physical operator.
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute
is part of the SparkPlan abstraction.
doExecute
resets the Collected metrics Accumulator.
doExecute
requests the child physical operator to execute and uses RDD.mapPartitions
operator for the following:
- A new per-partition AggregatingAccumulator (called
updater
) is requested to copyAndReset - The value of the accumulator is published only when a task is completed
- For every row, the per-partition
AggregatingAccumulator
is requested to add it (that updates ImperativeAggregates and TypedImperativeAggregates)