SparkPlanner — Spark Query Planner¶
SparkPlanner
is a concrete Catalyst Query Planner that converts a logical plan to one or more physical plans using execution planning strategies with support for extension points:
- extra strategies (by means of ExperimentalMethods)
- extraPlanningStrategies
SparkPlanner
is expected to plan (create) at least one physical plan for a given logical plan.
SparkPlanner
extends SparkStrategies abstract class.
Execution Planning Strategies¶
- extraStrategies of the ExperimentalMethods
- extraPlanningStrategies
- LogicalQueryStageStrategy
- PythonEvals
- DataSourceV2Strategy
- FileSourceStrategy
- DataSourceStrategy
- SpecialLimits
- Aggregation
- Window
- JoinSelection
- InMemoryScans
- SparkScripts
- WithCTEStrategy
- BasicOperators
Creating Instance¶
SparkPlanner
takes the following to be created:
SparkPlanner
is created when the following are requested for one:
- BaseSessionStateBuilder
- HiveSessionStateBuilder
- Structured Streaming's
IncrementalExecution
Accessing SparkPlanner¶
SparkPlanner
is available as planner of a SessionState
.
val spark: SparkSession = ...
scala> :type spark.sessionState.planner
org.apache.spark.sql.execution.SparkPlanner
Extra Planning Strategies¶
extraPlanningStrategies: Seq[Strategy] = Nil
extraPlanningStrategies
is an extension point to register extra planning strategies.
extraPlanningStrategies
is used when SparkPlanner
is requested for planning strategies.
extraPlanningStrategies
is overriden in the SessionState
builders (BaseSessionStateBuilder and HiveSessionStateBuilder).
Collecting PlanLater Physical Operators¶
collectPlaceholders(
plan: SparkPlan): Seq[(SparkPlan, LogicalPlan)]
collectPlaceholders
collects all PlanLater physical operators in the given physical plan.
collectPlaceholders
is part of QueryPlanner abstraction.
Filter and Project Pruning¶
pruneFilterProject(
projectList: Seq[NamedExpression],
filterPredicates: Seq[Expression],
prunePushedDownFilters: Seq[Expression] => Seq[Expression],
scanBuilder: Seq[Attribute] => SparkPlan): SparkPlan
Note
pruneFilterProject
is almost like DataSourceStrategy.pruneFilterProjectRaw.
pruneFilterProject
branches off per whether it is possible to use a column pruning only (to get the right projection) and the input projectList
columns of this projection are enough to evaluate all input filterPredicates
filter conditions.
If so, pruneFilterProject
does the following:
-
Applies the input
scanBuilder
function to the inputprojectList
columns that creates a new physical operator -
If there are Catalyst predicate expressions in the input
prunePushedDownFilters
that cannot be pushed down,pruneFilterProject
creates a FilterExec unary physical operator (with the unhandled predicate expressions) -
Otherwise,
pruneFilterProject
simply returns the physical operator
In this case no extra ProjectExec unary physical operator is created.
If not (i.e. it is neither possible to use a column pruning only nor evaluate filter conditions), pruneFilterProject
does the following:
-
Applies the input
scanBuilder
function to the projection and filtering columns that creates a new physical operator -
Creates a FilterExec unary physical operator (with the unhandled predicate expressions if available)
-
Creates a ProjectExec unary physical operator with the optional
FilterExec
operator (with the scan physical operator) or simply the scan physical operator alone
pruneFilterProject
is used when HiveTableScans and InMemoryScans execution planning strategies are executed.