ReplaceHashWithSortAgg Physical Optimization¶
ReplaceHashWithSortAgg is a physical optimization (Rule[SparkPlan]) that replaces Hash Aggregate operators that have grouping keys with SortAggregateExec operators.
ReplaceHashWithSortAgg can be enabled using the spark.sql.execution.replaceHashWithSortAgg configuration property.
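As a minimal sketch of turning the property on for a session (the application name, the local master and the sample query are illustrative assumptions, not part of the rule itself):

```scala
import org.apache.spark.sql.SparkSession

object EnableReplaceHashWithSortAgg {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumed setup)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("replace-hash-with-sort-agg-demo")
      .getOrCreate()

    // Enable the optimization for this session
    spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", "true")

    // An aggregation with grouping keys over input that is already sorted by the key.
    // Whether a sort-based aggregate shows up in the plan depends on the Spark
    // version and the other planning rules in effect.
    val q = spark.range(100).sort("id").groupBy("id").count()
    q.explain()

    spark.stop()
  }
}
```

Comparing the output of explain with the property set to true and to false is a simple way to see whether the rule affected a given query.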
ReplaceHashWithSortAgg is part of the following optimizations:
Executing Rule¶
Noop when spark.sql.execution.replaceHashWithSortAgg disabled
apply does nothing (returns the given physical plan unchanged) when spark.sql.execution.replaceHashWithSortAgg is disabled.
Otherwise, apply executes replaceHashAgg on the given physical plan.
replaceHashAgg¶
replaceHashAgg(
plan: SparkPlan): SparkPlan
replaceHashAgg finds BaseAggregateExec physical operators that are Hash Aggregate operators with grouping keys and converts them to SortAggregateExec when either of the following holds:
- The child operator is also a Hash Aggregate operator with grouping keys, isPartialAgg holds for the two operators, and the ordering is satisfied
- For any other child operator, the ordering is satisfied
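The two conditions can be illustrated with a toy model that does not depend on Spark. Everything in it is a hypothetical stand-in: Scan, HashAgg, SortAgg and Mode are simplified types, "ordering is satisfied" is approximated as a prefix check on column names, the isPartialAgg helper only compares aggregation modes, and collapsing a partial and final pair into one complete sort aggregate is an assumption about the rule's outcome. The sketch looks at the top operator only and does not traverse the whole plan.

```scala
// Toy model only: none of these types are Spark classes.
object ReplaceHashAggSketch {
  sealed trait Mode
  case object Partial extends Mode
  case object Final extends Mode
  case object Complete extends Mode

  sealed trait Plan { def ordering: Seq[String] }
  // A leaf that reports the columns its output is sorted by
  final case class Scan(ordering: Seq[String]) extends Plan
  final case class HashAgg(keys: Seq[String], mode: Mode, child: Plan) extends Plan {
    def ordering: Seq[String] = Nil // hash aggregation guarantees no ordering
  }
  final case class SortAgg(keys: Seq[String], mode: Mode, child: Plan) extends Plan {
    def ordering: Seq[String] = keys
  }

  // Assumption: "ordering is satisfied" means the required columns
  // are a prefix of the columns the input is actually sorted by.
  def satisfies(actual: Seq[String], required: Seq[String]): Boolean =
    actual.startsWith(required)

  // Hypothetical stand-in for isPartialAgg: compares modes only.
  def isPartialAgg(partial: HashAgg, fin: HashAgg): Boolean =
    partial.mode == Partial && fin.mode == Final

  def replaceHashAgg(plan: Plan): Plan = plan match {
    case agg: HashAgg if agg.keys.nonEmpty =>
      agg.child match {
        // Condition 1: partial + final hash aggregates over suitably sorted input
        case partial: HashAgg if partial.keys.nonEmpty && isPartialAgg(partial, agg) =>
          if (satisfies(partial.child.ordering, agg.keys))
            SortAgg(agg.keys, Complete, partial.child)
          else agg
        // Condition 2: any other child whose output is suitably sorted
        case other =>
          if (satisfies(other.ordering, agg.keys)) SortAgg(agg.keys, agg.mode, other)
          else agg
      }
    // Aggregates without grouping keys and non-aggregates are left alone
    case other => other
  }
}
```

In this model a hash aggregate is only replaced when the sort-based variant needs no extra sort, which is the reason the ordering check appears in both conditions.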
isHashBasedAggWithKeys¶
isHashBasedAggWithKeys(
agg: BaseAggregateExec): Boolean
isHashBasedAggWithKeys is positive (true) when the given BaseAggregateExec meets both of the following:
- It is either HashAggregateExec or ObjectHashAggregateExec
- It has grouping keys
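The check above amounts to a type test combined with a non-empty test on the grouping keys. A rough, Spark-free sketch follows; Agg, HashAgg, ObjectHashAgg and SortAgg are hypothetical simplified types standing in for the real operators, and grouping keys are reduced to column names.

```scala
// Toy model only: none of these types are Spark classes.
object HashAggCheckSketch {
  sealed trait Agg { def groupingKeys: Seq[String] }
  final case class HashAgg(groupingKeys: Seq[String]) extends Agg
  final case class ObjectHashAgg(groupingKeys: Seq[String]) extends Agg
  final case class SortAgg(groupingKeys: Seq[String]) extends Agg

  def isHashBasedAggWithKeys(agg: Agg): Boolean = {
    // First condition: the operator is one of the two hash-based aggregates
    val isHashBased = agg match {
      case _: HashAgg | _: ObjectHashAgg => true
      case _ => false
    }
    // Second condition: there is at least one grouping key
    isHashBased && agg.groupingKeys.nonEmpty
  }
}
```

For example, HashAgg(Seq("id")) passes the check in this model, while HashAgg(Nil) (a global aggregate) and SortAgg(Seq("id")) do not.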
isPartialAgg¶
isPartialAgg(
partialAgg: BaseAggregateExec,
finalAgg: BaseAggregateExec): Boolean
isPartialAgg is positive (true) when...FIXME