Skip to content

ExpandExec Physical Operator

ExpandExec is a unary physical operator with support for Whole-Stage Java Code Generation.

Creating Instance

ExpandExec takes the following to be created:

ExpandExec is created when:

Performance Metrics

number of output rows

Generating Java Source Code for Consume Path

CodegenSupport
doConsume(
  ctx: CodegenContext,
  input: Seq[ExprCode],
  row: ExprCode): String

doConsume is part of the CodegenSupport abstraction.

doConsume...FIXME

Generating Java Source Code for Produce Path

CodegenSupport
doProduce(
  ctx: CodegenContext): String

doProduce is part of the CodegenSupport abstraction.

doProduce requests the child operator (that is supposed to be a CodegenSupport physical operator) to generate a Java source code for produce code path.

Executing Operator

SparkPlan
doExecute(): RDD[InternalRow]

doExecute is part of the SparkPlan abstraction.

doExecute requests the child operator to execute (that creates a RDD[InternalRow]).

doExecute uses RDD.mapPartitions operator to apply a function to each partition of the RDD[InternalRow].

doExecute...FIXME

needCopyResult

CodegenSupport
needCopyResult: Boolean

needCopyResult is part of the CodegenSupport abstraction.

needCopyResult is always enabled (true).

canPassThrough

ExpandExec canPassThrough in RemoveRedundantProjects physical optimization.

Demo

FIXME

  1. Create an plan with ExpandExec
  2. Access the operator
  3. Request it to produce the consume path code