ReplaceHashWithSortAgg Physical Optimization
ReplaceHashWithSortAgg is a physical optimization (Rule[SparkPlan]) that replaces Hash Aggregate operators (with grouping keys) with corresponding SortAggregateExec operators when the child operator already satisfies the sort order that the corresponding SortAggregateExec operator requires of its child.
ReplaceHashWithSortAgg can be enabled using the spark.sql.execution.replaceHashWithSortAgg configuration property.
ReplaceHashWithSortAgg is part of the following optimizations:
Executing Rule
Noop when spark.sql.execution.replaceHashWithSortAgg is disabled
apply does nothing (it returns the given plan unchanged) when spark.sql.execution.replaceHashWithSortAgg is disabled.
Otherwise, apply executes replaceHashAgg.
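Here is a minimal sketch of the gating in apply. It is a self-contained Scala model, not code written against Spark's classes. The plan is a plain string, the configuration is a Map, and replaceHashAgg is passed in as a function. The names Plan, ReplaceHashWithSortAggKey and applyRule are hypothetical. Treating a missing key as "disabled" is also an assumption of this sketch.

```scala
// Hypothetical model of the rule's entry point (not Spark's Rule[SparkPlan] API).
type Plan = String

val ReplaceHashWithSortAggKey = "spark.sql.execution.replaceHashWithSortAgg"

// Returns the plan untouched unless the property parses as true.
// A missing key is treated as disabled (an assumption of this sketch).
def applyRule(conf: Map[String, String], plan: Plan, replaceHashAgg: Plan => Plan): Plan =
  if (conf.getOrElse(ReplaceHashWithSortAggKey, "false").toBoolean) replaceHashAgg(plan)
  else plan
```

In this model, a disabled rule costs a single configuration lookup and never calls replaceHashAgg.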
replaceHashAgg
replaceHashAgg(
plan: SparkPlan): SparkPlan
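replaceHashAgg finds BaseAggregateExec physical operators that are Hash Aggregate operators with grouping keys (see isHashBasedAggWithKeys) and converts them to SortAggregateExec when one of the following holds:
- The child operator is another Hash Aggregate operator with grouping keys, isPartialAgg holds for the pair (the child as the partial aggregate, the operator as the final one), and the partial aggregate's child satisfies the required ordering
- For any other child operator, the child's output ordering satisfies the ordering required by SortAggregateExec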
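The two conditions can be modeled with a small, self-contained plan tree. Everything in this sketch is hypothetical and simplified:

- HashAgg, SortAgg and Scan stand in for the physical operators.
- Orderings are lists of column names.
- "Ordering is satisfied" is approximated as a prefix check.
- isPartialPair is a hypothetical stand-in for isPartialAgg.

The sketch also assumes that, in the partial/final case, the pair collapses into a single sort-based aggregate over the partial aggregate's child. Verify this against the ReplaceHashWithSortAgg source for your Spark version.

```scala
// Hypothetical, simplified plan model (not Spark's SparkPlan hierarchy).
sealed trait Node { def outputOrdering: Seq[String] }

// Leaf whose output is (optionally) sorted by the given columns.
final case class Scan(outputOrdering: Seq[String]) extends Node

// Hash-based aggregate: produces no output ordering in this model.
final case class HashAgg(keys: Seq[String], partial: Boolean, child: Node) extends Node {
  val outputOrdering: Seq[String] = Seq.empty
}

// Sort-based aggregate: needs its child sorted by keys and keeps that order.
final case class SortAgg(keys: Seq[String], child: Node) extends Node {
  val outputOrdering: Seq[String] = keys
}

// Approximation: a provided ordering satisfies a required one if it starts with it.
def orderingSatisfied(provided: Seq[String], required: Seq[String]): Boolean =
  provided.startsWith(required)

// Hypothetical stand-in for isPartialAgg.
def isPartialPair(partialAgg: HashAgg, finalAgg: HashAgg): Boolean =
  partialAgg.partial && !finalAgg.partial && partialAgg.keys == finalAgg.keys

// Top-down rewrite: decide for the parent first, then recurse into children.
def replaceHashAgg(plan: Node): Node = plan match {
  case finalAgg @ HashAgg(keys, _, child) if keys.nonEmpty =>
    child match {
      case partialAgg @ HashAgg(pKeys, _, grandChild)
          if pKeys.nonEmpty && isPartialPair(partialAgg, finalAgg) =>
        if (orderingSatisfied(grandChild.outputOrdering, keys))
          SortAgg(keys, replaceHashAgg(grandChild)) // partial + final collapse into one
        else finalAgg.copy(child = replaceHashAgg(child))
      case other =>
        if (orderingSatisfied(other.outputOrdering, keys)) SortAgg(keys, replaceHashAgg(other))
        else finalAgg.copy(child = replaceHashAgg(other))
    }
  case h: HashAgg => h.copy(child = replaceHashAgg(h.child)) // no grouping keys: kept as is
  case s: SortAgg => s.copy(child = replaceHashAgg(s.child))
  case leaf: Scan => leaf
}
```

The decision for a parent is made before its children are rewritten. As a result, the ordering check always looks at the child as it appeared in the input plan.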
isHashBasedAggWithKeys
isHashBasedAggWithKeys(
agg: BaseAggregateExec): Boolean
isHashBasedAggWithKeys is positive (true) when the given BaseAggregateExec meets both of the following conditions:
- It is either HashAggregateExec or ObjectHashAggregateExec
- It has grouping keys
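The two conditions can be sketched as a plain Scala predicate. The operator classes below are hypothetical stand-ins for HashAggregateExec, ObjectHashAggregateExec and SortAggregateExec, reduced to their grouping keys.

```scala
// Hypothetical stand-ins for Spark's aggregate operators (not the real classes).
sealed trait AggOp { def groupingKeys: Seq[String] }
final case class HashAggregateOp(groupingKeys: Seq[String]) extends AggOp
final case class ObjectHashAggregateOp(groupingKeys: Seq[String]) extends AggOp
final case class SortAggregateOp(groupingKeys: Seq[String]) extends AggOp

// True only for hash-based aggregates that also have grouping keys.
def isHashBasedAggWithKeys(agg: AggOp): Boolean = {
  val isHashBased = agg match {
    case _: HashAggregateOp | _: ObjectHashAggregateOp => true
    case _ => false
  }
  isHashBased && agg.groupingKeys.nonEmpty
}
```

A global aggregation (no grouping keys) never qualifies, even when it is hash-based.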
isPartialAgg
isPartialAgg(
partialAgg: BaseAggregateExec,
finalAgg: BaseAggregateExec): Boolean
isPartialAgg is positive (true) when...FIXME