EnsureRequirements Physical Optimization¶
EnsureRequirements is a physical query optimization.
EnsureRequirements is a Catalyst rule for transforming physical query plans (Rule[SparkPlan]).
Creating Instance¶
EnsureRequirements takes the following to be created:
- optimizeOutRepartition flag (default: true)
- Required Distribution
EnsureRequirements is created when:
- QueryExecution is requested for the preparations rules
- AdaptiveSparkPlanExec physical operator is created
Required Distribution¶
requiredDistribution: Option[Distribution] = None
EnsureRequirements can be given a Distribution when created.
The Distribution is undefined (None):
- By default
- When QueryExecution is requested for the preparations rules
A Distribution can only be specified when EnsureRequirements is created in Adaptive Query Execution (for the QueryStage Physical Preparation Rules) and a distribution requirement is specified.
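The two constructor arguments described above can be exercised with a minimal sketch. It assumes a Spark release whose EnsureRequirements constructor matches this page (the optimizeOutRepartition flag and the optional requiredDistribution) and that spark-sql is on the classpath, e.g. in spark-shell. The value names are illustrative only.

```scala
import org.apache.spark.sql.execution.exchange.EnsureRequirements

// Default instance: optimizeOutRepartition is on, no required distribution
val defaults = EnsureRequirements()

// Illustrative variant with the repartition optimization switched off
val keepRepartitions = EnsureRequirements(optimizeOutRepartition = false)

println(defaults.optimizeOutRepartition)
println(defaults.requiredDistribution)
println(keepRepartitions.optimizeOutRepartition)
```

Because EnsureRequirements is a Rule[SparkPlan], either instance can then be applied to a physical plan with apply.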
Executing Rule¶
apply transforms the following physical operators in the query plan (up the plan tree):
- ShuffleExchangeExec
- SparkPlan
With the required distribution not specified, apply gives the new transformed plan.
Otherwise, with the required distribution specified, apply...FIXME
ShuffleExchangeExec¶
apply transforms a ShuffleExchangeExec with HashPartitioning only when both of the following requirements are met:
- optimizeOutRepartition is enabled (the default)
- shuffleOrigin is either REPARTITION_BY_COL or REPARTITION_BY_NUM
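The two requirements can be written down as a small predicate. This is only a sketch: meetsRepartitionRequirements is a hypothetical helper, not part of Spark, and it deliberately ignores the HashPartitioning check on the exchange itself. It assumes a Spark release that defines the REPARTITION_BY_COL and REPARTITION_BY_NUM shuffle origins, and a REPL such as spark-shell.

```scala
import org.apache.spark.sql.execution.exchange.{ENSURE_REQUIREMENTS, REPARTITION_BY_COL, REPARTITION_BY_NUM, ShuffleOrigin}

// Hypothetical helper (not a Spark API): mirrors the two listed requirements
def meetsRepartitionRequirements(
    optimizeOutRepartition: Boolean,
    shuffleOrigin: ShuffleOrigin): Boolean =
  optimizeOutRepartition &&
    (shuffleOrigin == REPARTITION_BY_COL || shuffleOrigin == REPARTITION_BY_NUM)

// A user-requested repartition with the flag enabled qualifies
println(meetsRepartitionRequirements(optimizeOutRepartition = true, REPARTITION_BY_COL))
// A shuffle with a different origin does not
println(meetsRepartitionRequirements(optimizeOutRepartition = true, ENSURE_REQUIREMENTS))
// Neither does any shuffle when the flag is disabled
println(meetsRepartitionRequirements(optimizeOutRepartition = false, REPARTITION_BY_NUM))
```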
SparkPlan¶
apply transforms any other SparkPlan by calling ensureDistributionAndOrdering on its children, based on the operator's requiredChildDistribution and requiredChildOrdering.
While transforming the query plan, apply may also reorderJoinPredicates of ShuffledHashJoinExec and SortMergeJoinExec physical operators, if found.
ensureDistributionAndOrdering can introduce BroadcastExchangeExecs or ShuffleExchangeExecs physical operators in the query plan.
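To see the effect on a concrete plan, the rule can be applied by hand to the physical plan of a join. The following is a sketch under a few assumptions: it runs in spark-shell (or a Scala REPL with spark-sql on the classpath), broadcast joins are disabled so that a shuffle-based join is planned, and the DataFrames and column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.exchange.{EnsureRequirements, ShuffleExchangeExec}

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ensure-requirements-sketch")
  .getOrCreate()
import spark.implicits._

// Assumption: turn off broadcast joins so the planner picks a shuffle-based join
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

// Illustrative in-memory inputs
val left = Seq((1, "a"), (2, "b")).toDF("id", "l")
val right = Seq((1, "x"), (3, "y")).toDF("id", "r")
val query = left.join(right, "id")

// Physical plan as produced by the planner, before the preparation rules
val before = query.queryExecution.sparkPlan

// Apply the rule directly with its default arguments
val after = EnsureRequirements().apply(before)

// Count shuffle exchanges in both plans
val shufflesBefore = before.collect { case s: ShuffleExchangeExec => s }
val shufflesAfter = after.collect { case s: ShuffleExchangeExec => s }

println(s"shuffles before: ${shufflesBefore.size}, after: ${shufflesAfter.size}")
println(after.treeString)
```

Comparing the two plans shows which exchange (and sort) operators were added to satisfy the join's required child distribution and ordering.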
ensureDistributionAndOrdering¶
ensureDistributionAndOrdering(
parent: Option[SparkPlan],
originalChildren: Seq[SparkPlan],
requiredChildDistributions: Seq[Distribution],
requiredChildOrderings: Seq[Seq[SortOrder]],
shuffleOrigin: ShuffleOrigin): Seq[SparkPlan]
ensureDistributionAndOrdering is...FIXME
checkKeyGroupCompatible¶
checkKeyGroupCompatible(
left: SparkPlan,
right: SparkPlan,
joinType: JoinType,
requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]]
checkKeyGroupCompatible(
parent: SparkPlan,
left: SparkPlan,
right: SparkPlan,
requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]] // (1)!
- Uses the JoinType of either SortMergeJoinExec or ShuffledHashJoinExec physical operator
Note
Only SortMergeJoinExec and ShuffledHashJoinExec physical operators are considered.
checkKeyGroupCompatible...FIXME
OptimizeSkewedJoin¶
EnsureRequirements is used to create an OptimizeSkewedJoin physical optimization.