EnsureRequirements Physical Optimization¶
EnsureRequirements is a physical query optimization.
EnsureRequirements is a Catalyst rule for transforming physical query plans (Rule[SparkPlan]).
Creating Instance¶
EnsureRequirements takes the following to be created:
- optimizeOutRepartition flag (default: true)
- Required Distribution
EnsureRequirements is created when:
- QueryExecution is requested for the preparations rules
- AdaptiveSparkPlanExec physical operator is created
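The two constructor arguments can be tried out by creating the rule by hand. This is a minimal sketch, assuming a Spark version in which EnsureRequirements is a case class with exactly the two parameters listed above; the variable names are made up for this example.

```scala
import org.apache.spark.sql.execution.exchange.EnsureRequirements

// Both arguments left at their defaults
val defaultRule = EnsureRequirements()

// Same rule with the optimizeOutRepartition flag turned off
// (assumes the parameter name matches the flag described above)
val keepRepartitions = EnsureRequirements(optimizeOutRepartition = false)

// Inspect the values the rule was created with
println(defaultRule.optimizeOutRepartition)
println(defaultRule.requiredDistribution)
println(keepRepartitions.optimizeOutRepartition)
```

Spark itself creates the rule internally, so creating it directly like this is mostly useful for experiments and tests.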
Required Distribution¶
requiredDistribution: Option[Distribution] = None
EnsureRequirements can be given a Distribution when created.
The Distribution is undefined (None):
- By default
- When QueryExecution is requested for the preparations rules
A Distribution is specified for EnsureRequirements only in Adaptive Query Execution (for the QueryStage Physical Preparation Rules), and only when a distribution requirement is specified.
Executing Rule¶
apply transforms the following physical operators in the query plan (up the plan tree):
- ShuffleExchangeExec
- SparkPlan
With the required distribution not specified, apply gives the new transformed plan.
Otherwise, with the required distribution specified, apply...FIXME
ShuffleExchangeExec¶
apply transforms ShuffleExchangeExec with HashPartitioning only when all of the following requirements are met:
- optimizeOutRepartition is enabled (the default)
- shuffleOrigin is either REPARTITION_BY_COL or REPARTITION_BY_NUM
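One way to observe this case is to compare the physical plan before and after the preparation rules for a query that repartitions by a column. The following is a sketch for experimentation only. It assumes a Spark version that has the REPARTITION_BY_COL shuffle origin, and it assumes (this page does not state it) that the rule drops the user-requested shuffle when the child is already hash-partitioned in a semantically equal way. Printing the shuffle origins lets you check what happens on your version.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.ShuffleExchangeExec
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ensure-requirements-repartition")
  // Turn AQE off so that executedPlan is the plain prepared physical plan
  .config("spark.sql.adaptive.enabled", "false")
  .getOrCreate()
import spark.implicits._

// The aggregation needs its input clustered by id
val counts: DataFrame = Seq(1, 2, 2, 3).toDF("id").groupBy("id").count()

// repartition(col) requests a hash-partitioning shuffle on the same column
val repartitioned: DataFrame = counts.repartition(col("id"))

// Helper (hypothetical name): shuffle origins of all ShuffleExchangeExecs in a plan
def shuffleOrigins(plan: SparkPlan): Seq[String] =
  plan.collect { case s: ShuffleExchangeExec => s.shuffleOrigin.toString }

// Plan before the preparation rules (EnsureRequirements among them)
println(shuffleOrigins(repartitioned.queryExecution.sparkPlan))

// Plan after the preparation rules
println(shuffleOrigins(repartitioned.queryExecution.executedPlan))
```

Setting spark.sql.adaptive.enabled to false is only there to keep the printed plan simple; it is not required by the rule.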
SparkPlan¶
For any other SparkPlan, apply uses ensureDistributionAndOrdering on the children, based on the operator's requiredChildDistribution and requiredChildOrdering.
While transforming the query plan, apply may also reorderJoinPredicates of ShuffledHashJoinExec and SortMergeJoinExec physical operators, if found.
ensureDistributionAndOrdering can introduce BroadcastExchangeExec or ShuffleExchangeExec physical operators in the query plan.
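The effect on an ordinary operator can be seen by applying the rule by hand to the physical plan of a shuffle-based join. This is a sketch under a few assumptions: EnsureRequirements can be created with its default arguments (as described under Creating Instance), broadcast joins are disabled so that the planner chooses a shuffle-based join, and the helper name countShuffles is made up for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExchangeExec}

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ensure-requirements-join")
  // Disable broadcast joins so the planner picks a shuffle-based join
  .config("spark.sql.autoBroadcastJoinThreshold", "-1")
  .getOrCreate()
import spark.implicits._

val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
val right = Seq((1, "x"), (3, "y")).toDF("id", "r")
val joined = left.join(right, "id")

// Physical plan before any preparation rule has been applied
val before: SparkPlan = joined.queryExecution.sparkPlan

// Apply the rule directly
val after: SparkPlan = EnsureRequirements().apply(before)

// Helper (hypothetical name): number of ShuffleExchangeExecs in a plan
def countShuffles(plan: SparkPlan): Int =
  plan.collect { case s: ShuffleExchangeExec => s }.size

println(s"shuffles before: ${countShuffles(before)}")
println(s"shuffles after:  ${countShuffles(after)}")
println(after.treeString)
```

Comparing the two counts and reading the printed tree shows which exchanges were added below the join to satisfy its required child distributions.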
ensureDistributionAndOrdering¶
ensureDistributionAndOrdering(
  parent: Option[SparkPlan],
  originalChildren: Seq[SparkPlan],
  requiredChildDistributions: Seq[Distribution],
  requiredChildOrderings: Seq[Seq[SortOrder]],
  shuffleOrigin: ShuffleOrigin): Seq[SparkPlan]
ensureDistributionAndOrdering is...FIXME
checkKeyGroupCompatible¶
checkKeyGroupCompatible(
  left: SparkPlan,
  right: SparkPlan,
  joinType: JoinType,
  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]]

checkKeyGroupCompatible(
  parent: SparkPlan,
  left: SparkPlan,
  right: SparkPlan,
  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]] // (1)

(1) Uses the JoinType of either SortMergeJoinExec or ShuffledHashJoinExec physical operator
Note
Only SortMergeJoinExec and ShuffledHashJoinExec physical operators are considered.
checkKeyGroupCompatible...FIXME
OptimizeSkewedJoin¶
EnsureRequirements is used to create an OptimizeSkewedJoin physical optimization.