CollectMetricsExec Physical Operator¶
CollectMetricsExec is a unary physical operator.
Creating Instance¶
CollectMetricsExec takes the following to be created:
- Name
- Metric NamedExpressions
- Child physical operator
CollectMetricsExec is created when BasicOperators execution planning strategy is executed (and plans a CollectMetrics logical operator).
Collected metrics Accumulator¶
CollectMetricsExec registers an AggregatingAccumulator under the name Collected metrics.
AggregatingAccumulator is created with the metric expressions and the output attributes of the child physical operator.
Executing Physical Operator¶
doExecute(): RDD[InternalRow]
doExecute is part of the SparkPlan abstraction.
doExecute resets the Collected metrics Accumulator.
doExecute requests the child physical operator to execute and uses RDD.mapPartitions operator for the following:
- A new per-partition AggregatingAccumulator (called
updater) is requested to copyAndReset - The value of the accumulator is published only when a task is completed
- For every row, the per-partition
AggregatingAccumulatoris requested to add it (that updates ImperativeAggregates and TypedImperativeAggregates)