
ReplaceHashWithSortAgg Physical Optimization

ReplaceHashWithSortAgg is a physical optimization (Rule[SparkPlan]) that replaces Hash Aggregate operators (with grouping keys) with the corresponding SortAggregateExec operators when the child's output ordering already satisfies the sort order that the SortAggregateExec operator requires.

ReplaceHashWithSortAgg can be enabled using the spark.sql.execution.replaceHashWithSortAgg configuration property.
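As a minimal sketch, the property can be set on a session at runtime. The local master and the application name below are assumptions made for illustration only; the snippet is meant for a Scala REPL or script with Spark on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session created only for this illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("replace-hash-with-sort-agg-demo")
  .getOrCreate()

// Enable the rule for this session.
spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", true)

// Read the value back to confirm the setting took effect.
println(spark.conf.get("spark.sql.execution.replaceHashWithSortAgg"))
```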

ReplaceHashWithSortAgg is part of the following optimizations:

Executing Rule

Rule
apply(
  plan: SparkPlan): SparkPlan

apply is part of the Rule abstraction.

Noop when spark.sql.execution.replaceHashWithSortAgg disabled

apply does nothing when spark.sql.execution.replaceHashWithSortAgg is disabled.

Otherwise, apply calls replaceHashAgg with the given plan.

replaceHashAgg

replaceHashAgg(
  plan: SparkPlan): SparkPlan

replaceHashAgg finds BaseAggregateExec physical operators that are Hash Aggregate operators with grouping keys and converts them to SortAggregateExec when either of the following conditions is met:

  1. The child operator is another Hash Aggregate operator with grouping keys for which isPartialAgg holds, and the ordering is satisfied
  2. For any other child operator, the child's output ordering satisfies the ordering required by the SortAggregateExec
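The two conditions above can be sketched with a toy plan model. This is not Spark code: Plan, Scan, HashAgg, SortAgg, the partial flag (a stand-in for isPartialAgg) and the prefix-based orderingSatisfies are all hypothetical simplifications. The sketch also assumes that, in the first case, the ordering of the partial aggregate's child is what gets checked and that the partial/final pair collapses into a single sort aggregate.

```scala
object ReplaceHashAggSketch {
  // Toy stand-ins for physical operators; these are NOT Spark classes.
  sealed trait Plan { def outputOrdering: Seq[String] }
  final case class Scan(outputOrdering: Seq[String]) extends Plan
  final case class HashAgg(keys: Seq[String], partial: Boolean, child: Plan) extends Plan {
    // Hash aggregation gives no ordering guarantee in this model.
    val outputOrdering: Seq[String] = Nil
  }
  final case class SortAgg(keys: Seq[String], child: Plan) extends Plan {
    val outputOrdering: Seq[String] = keys
  }

  // Simplification: required keys must be a prefix of the actual ordering.
  def orderingSatisfies(actual: Seq[String], required: Seq[String]): Boolean =
    actual.startsWith(required)

  def replaceHashAgg(plan: Plan): Plan = plan match {
    // Case 1 (assumed behavior): final hash aggregate over a partial one.
    case HashAgg(keys, false, HashAgg(pKeys, true, grandChild))
        if keys.nonEmpty && pKeys.nonEmpty &&
          orderingSatisfies(grandChild.outputOrdering, keys) =>
      SortAgg(keys, replaceHashAgg(grandChild))
    // Case 2: any other child whose ordering already matches the keys.
    case HashAgg(keys, _, child)
        if keys.nonEmpty && orderingSatisfies(child.outputOrdering, keys) =>
      SortAgg(keys, replaceHashAgg(child))
    // Otherwise keep the operator and continue with its child.
    case HashAgg(keys, partial, child) => HashAgg(keys, partial, replaceHashAgg(child))
    case SortAgg(keys, child) => SortAgg(keys, replaceHashAgg(child))
    case scan: Scan => scan
  }
}
```

In this model, a hash aggregate on key a over a scan ordered by (a, b) becomes a SortAgg, while the same aggregate over an unordered scan is left unchanged.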

isHashBasedAggWithKeys

isHashBasedAggWithKeys(
  agg: BaseAggregateExec): Boolean

isHashBasedAggWithKeys is positive (true) when the given BaseAggregateExec meets both of the following conditions:

  1. It is either HashAggregateExec or ObjectHashAggregateExec
  2. It has grouping keys
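The two-part check above can be illustrated with a self-contained sketch. The types BaseAgg, HashAgg, ObjectHashAgg and SortAgg are hypothetical stand-ins named after the operators, and groupingKeys is an assumed field; none of them are the Spark classes themselves.

```scala
object HashBasedAggCheck {
  // Toy stand-ins for the aggregate operators; these are NOT Spark classes.
  sealed trait BaseAgg { def groupingKeys: Seq[String] }
  final case class HashAgg(groupingKeys: Seq[String]) extends BaseAgg
  final case class ObjectHashAgg(groupingKeys: Seq[String]) extends BaseAgg
  final case class SortAgg(groupingKeys: Seq[String]) extends BaseAgg

  def isHashBasedAggWithKeys(agg: BaseAgg): Boolean = {
    // Condition 1: the operator is one of the two hash-based variants.
    val isHashBased = agg match {
      case _: HashAgg | _: ObjectHashAgg => true
      case _ => false
    }
    // Condition 2: there is at least one grouping key.
    isHashBased && agg.groupingKeys.nonEmpty
  }
}
```

A hash-based aggregate without grouping keys (a global aggregation) fails the second condition, and a sort-based aggregate fails the first.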

isPartialAgg

isPartialAgg(
  partialAgg: BaseAggregateExec,
  finalAgg: BaseAggregateExec): Boolean

isPartialAgg is positive (true) when...FIXME