Skip to content

EnsureRequirements Physical Optimization

EnsureRequirements is a physical query optimization.

EnsureRequirements is a Catalyst rule for transforming physical query plans (Rule[SparkPlan]).

Creating Instance

EnsureRequirements takes the following to be created:

EnsureRequirements is created when:

Required Distribution

requiredDistribution: Option[Distribution] = None

EnsureRequirements can be given a Distribution when created.

The Distribution is undefined (None):

The Distribution can only be specified for EnsureRequirements in Adaptive Query Execution (for QueryStage Physical Preparation Rules) with distribution requirement specified.

Executing Rule

Rule
apply(
  plan: SparkPlan): SparkPlan

apply is part of the Rule abstraction.

apply transforms the following physical operators in the query plan (up the plan tree):

With the required distribution not specified, apply gives the new transformed plan.

Otherwise, with the required distribution specified, apply...FIXME

ShuffleExchangeExec

apply transforms ShuffleExchangeExec with HashPartitioning only beside the following requirements:

SparkPlan

apply transforms other SparkPlans to ensureDistributionAndOrdering of the children based on requiredChildDistribution and requiredChildOrdering.

While transforming the query plan, apply may also reorderJoinPredicates of ShuffledHashJoinExec and SortMergeJoinExec physical operators, if found.

ensureDistributionAndOrdering can introduce BroadcastExchangeExecs or ShuffleExchangeExecs physical operators in the query plan.

ensureDistributionAndOrdering

ensureDistributionAndOrdering(
  parent: Option[SparkPlan],
  originalChildren: Seq[SparkPlan],
  requiredChildDistributions: Seq[Distribution],
  requiredChildOrderings: Seq[Seq[SortOrder]],
  shuffleOrigin: ShuffleOrigin): Seq[SparkPlan]

ensureDistributionAndOrdering is...FIXME

checkKeyGroupCompatible

checkKeyGroupCompatible(
  left: SparkPlan,
  right: SparkPlan,
  joinType: JoinType,
  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]]
checkKeyGroupCompatible(
  parent: SparkPlan,
  left: SparkPlan,
  right: SparkPlan,
  requiredChildDistribution: Seq[Distribution]): Option[Seq[SparkPlan]] // (1)!
  1. Uses JoinType of either SortMergeJoinExec or ShuffledHashJoinExec physical operator

Note

Only SortMergeJoinExec and ShuffledHashJoinExec physical operators are considered.

checkKeyGroupCompatible...FIXME

OptimizeSkewedJoin

EnsureRequirements is used to create a OptimizeSkewedJoin physical optimization.