
ReplaceHashWithSortAgg Physical Optimization

ReplaceHashWithSortAgg is a physical optimization (Rule[SparkPlan]) that replaces Hash Aggregate operators that have grouping keys with SortAggregateExec operators.

ReplaceHashWithSortAgg is enabled using the spark.sql.execution.replaceHashWithSortAgg configuration property.
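
As a minimal sketch, the property can be set per session through the runtime configuration. The local master and the application name below are assumptions made only to keep the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local session, used only for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("replace-hash-with-sort-agg-demo")
  .getOrCreate()

// Enable the optimization for the current session.
spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", true)

// Runtime configuration values are read back as strings.
println(spark.conf.get("spark.sql.execution.replaceHashWithSortAgg"))
```

Whether a SortAggregateExec then shows up in a query plan (for example in the output of explain) depends on the conditions checked by the rule.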

ReplaceHashWithSortAgg is part of the following optimizations:

Executing Rule

Signature
apply(
  plan: SparkPlan): SparkPlan

apply is part of the Rule abstraction.

Noop when spark.sql.execution.replaceHashWithSortAgg disabled

apply does nothing when spark.sql.execution.replaceHashWithSortAgg is disabled.

Otherwise, apply calls replaceHashAgg on the given physical plan.

replaceHashAgg

replaceHashAgg(
  plan: SparkPlan): SparkPlan

replaceHashAgg finds BaseAggregateExec physical operators that are Hash Aggregate operators with grouping keys and converts them to SortAggregateExec when either of the following holds:

  1. The child operator is itself a Hash Aggregate operator with grouping keys, isPartialAgg holds for the pair, and the ordering is satisfied
  2. For any other child operator, the ordering is satisfied
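
The two cases can be sketched with a simplified, self-contained model. None of the types below are Spark's own classes: Plan, Source, HashAgg, SortAgg and the Mode objects are hypothetical stand-ins, "ordering is satisfied" is modelled as the required grouping keys being a prefix of the input ordering, and collapsing the partial and final pair into a single aggregate is an assumption of this sketch.

```scala
// A toy model of the rewrite; not Spark's implementation.
object ReplaceHashAggSketch {
  sealed trait Plan { def outputOrdering: Seq[String] }

  // Stand-in for any operator with a known output ordering.
  final case class Source(outputOrdering: Seq[String]) extends Plan

  sealed trait Mode
  case object Partial extends Mode
  case object Final extends Mode
  case object Complete extends Mode

  final case class HashAgg(keys: Seq[String], mode: Mode, child: Plan) extends Plan {
    val outputOrdering: Seq[String] = Nil // hash aggregation guarantees no ordering
  }
  final case class SortAgg(keys: Seq[String], mode: Mode, child: Plan) extends Plan {
    val outputOrdering: Seq[String] = keys
  }

  // Assumption: the input ordering must start with the required keys.
  private def orderingSatisfies(ordering: Seq[String], required: Seq[String]): Boolean =
    ordering.startsWith(required)

  def replaceHashAgg(plan: Plan): Plan = plan match {
    // Case 1: a final hash aggregate over a partial one whose input is already sorted.
    case HashAgg(keys, Final, HashAgg(partialKeys, Partial, input))
        if keys.nonEmpty && partialKeys.nonEmpty &&
          orderingSatisfies(input.outputOrdering, keys) =>
      SortAgg(keys, Complete, replaceHashAgg(input))
    // Case 2: any other child whose output ordering satisfies the grouping keys.
    case HashAgg(keys, mode, child)
        if keys.nonEmpty && orderingSatisfies(child.outputOrdering, keys) =>
      SortAgg(keys, mode, replaceHashAgg(child))
    // Otherwise keep the operator and continue with its child.
    case HashAgg(keys, mode, child) => HashAgg(keys, mode, replaceHashAgg(child))
    case SortAgg(keys, mode, child) => SortAgg(keys, mode, replaceHashAgg(child))
    case leaf: Source => leaf
  }
}
```

In this model, a hash aggregate without grouping keys or over unsorted input is left unchanged, which mirrors the conditions listed above.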

isHashBasedAggWithKeys

isHashBasedAggWithKeys(
  agg: BaseAggregateExec): Boolean

isHashBasedAggWithKeys is positive (true) when the given BaseAggregateExec meets both of the following:

  1. It is either HashAggregateExec or ObjectHashAggregateExec
  2. It has grouping keys
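
The check could be written roughly as follows. This is a sketch based on the two conditions above, not a copy of the rule's source; it assumes Spark SQL is on the classpath and that grouping keys are exposed as groupingExpressions on BaseAggregateExec. The enclosing object name is hypothetical.

```scala
import org.apache.spark.sql.execution.aggregate.{BaseAggregateExec, HashAggregateExec, ObjectHashAggregateExec}

object HashAggChecks {
  def isHashBasedAggWithKeys(agg: BaseAggregateExec): Boolean = {
    // Condition 1: the operator is one of the two hash-based aggregates.
    val isHashBased = agg match {
      case _: HashAggregateExec | _: ObjectHashAggregateExec => true
      case _ => false
    }
    // Condition 2: there is at least one grouping key.
    isHashBased && agg.groupingExpressions.nonEmpty
  }
}
```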

isPartialAgg

isPartialAgg(
  partialAgg: BaseAggregateExec,
  finalAgg: BaseAggregateExec): Boolean

isPartialAgg is positive (true) when...FIXME