ReplaceHashWithSortAgg Physical Optimization¶
ReplaceHashWithSortAgg is a physical optimization (Rule[SparkPlan]) that replaces Hash Aggregate operators (with grouping keys) with the corresponding SortAggregateExec operators when the child operator already satisfies the sort order that the SortAggregateExec requires.
ReplaceHashWithSortAgg can be enabled using the spark.sql.execution.replaceHashWithSortAgg configuration property.
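As a minimal sketch of how the property could be turned on, the following uses the standard SparkSession builder and runtime configuration APIs. The object name, application name and local master are assumptions made only for the demo.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical demo application; only the property name comes from this page
object EnableReplaceHashWithSortAgg {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run for experimentation
      .appName("replace-hash-with-sort-agg-demo")
      // Enable the rule when the session is created
      .config("spark.sql.execution.replaceHashWithSortAgg", "true")
      .getOrCreate()

    // Alternatively, toggle the property on an existing session
    spark.conf.set("spark.sql.execution.replaceHashWithSortAgg", "true")

    spark.stop()
  }
}
```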
ReplaceHashWithSortAgg is part of the following optimizations:
Executing Rule¶
Noop when spark.sql.execution.replaceHashWithSortAgg is disabled
apply does nothing (and returns the given physical plan unchanged) when spark.sql.execution.replaceHashWithSortAgg is disabled.
Otherwise, apply executes replaceHashAgg on the physical plan.
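The guard described above can be illustrated with a small, Spark-free sketch. The `Plan` type alias, the `applyRule` name and the `enabled` flag are hypothetical stand-ins (for SparkPlan, the rule's apply method and the configuration property value, respectively), not the actual Spark code.

```scala
object ApplySketch {
  // Hypothetical stand-in for SparkPlan; a real plan is a tree of physical operators
  type Plan = String

  // `enabled` stands for the value of spark.sql.execution.replaceHashWithSortAgg
  def applyRule(plan: Plan, enabled: Boolean)(replaceHashAgg: Plan => Plan): Plan =
    if (!enabled) plan        // noop: the plan is returned as is
    else replaceHashAgg(plan) // otherwise delegate to replaceHashAgg
}
```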
replaceHashAgg¶
replaceHashAgg(
plan: SparkPlan): SparkPlan
replaceHashAgg finds BaseAggregateExec physical operators that are Hash Aggregate operators with grouping keys and converts them to SortAggregateExec when either of the following conditions is met:
- The child operator is another Hash Aggregate operator with grouping keys, it is the partial aggregation of the operator (isPartialAgg), and the ordering is satisfied
- The child operator satisfies the sort order of the corresponding SortAggregateExec
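The two conditions can be sketched as a decision function over a simplified plan model. Everything below is hypothetical and is not Spark's API: `Plan`, `Sorted`, `HashAgg` and `shouldReplace` are made-up names, orderings are modeled as plain column-name sequences, and "ordering is satisfied" is assumed to mean that the required ordering (here, the grouping keys) is a prefix of the available ordering. The sketch also assumes that, in the partial-aggregation case, the ordering that matters is the one produced by the partial aggregate's own child.

```scala
object ReplaceHashAggSketch {
  // Hypothetical, simplified plan model (not Spark's SparkPlan)
  sealed trait Plan { def outputOrdering: Seq[String] }

  // A leaf whose output is already sorted by the given columns
  final case class Sorted(outputOrdering: Seq[String]) extends Plan

  // A hash-based aggregate; assumed to guarantee no output ordering
  final case class HashAgg(keys: Seq[String], isPartial: Boolean, child: Plan) extends Plan {
    val outputOrdering: Seq[String] = Nil
  }

  // Assumption: the required ordering must be a prefix of the actual ordering
  def orderingSatisfied(actual: Seq[String], required: Seq[String]): Boolean =
    actual.startsWith(required)

  // Decides whether a hash aggregate with grouping keys could become a sort aggregate
  def shouldReplace(agg: HashAgg): Boolean =
    agg.keys.nonEmpty && (agg.child match {
      // Condition 1: the child is a partial hash aggregate with grouping keys
      case partial: HashAgg if partial.keys.nonEmpty && partial.isPartial =>
        orderingSatisfied(partial.child.outputOrdering, agg.keys)
      // Condition 2: any other child that already provides the ordering
      case other =>
        orderingSatisfied(other.outputOrdering, agg.keys)
    })
}
```

The sketch only answers whether a replacement would happen. How the replacement SortAggregateExec is built is left out on purpose.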
isHashBasedAggWithKeys¶
isHashBasedAggWithKeys(
agg: BaseAggregateExec): Boolean
isHashBasedAggWithKeys is positive (true) when the given BaseAggregateExec meets both of the following conditions:
- It is either HashAggregateExec or ObjectHashAggregateExec
- It has grouping keys
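A minimal, Spark-free sketch of this predicate follows. The `AggKind` hierarchy and the `Agg` case class are hypothetical stand-ins for the aggregate physical operators, used only to make the two conditions concrete.

```scala
object IsHashBasedAggWithKeysSketch {
  // Hypothetical tags standing in for the aggregate operator classes
  sealed trait AggKind
  case object Hash extends AggKind       // stands for HashAggregateExec
  case object ObjectHash extends AggKind // stands for ObjectHashAggregateExec
  case object Sort extends AggKind       // stands for SortAggregateExec

  // Hypothetical stand-in for BaseAggregateExec
  final case class Agg(kind: AggKind, groupingKeys: Seq[String])

  def isHashBasedAggWithKeys(agg: Agg): Boolean = {
    // Condition 1: the operator is one of the two hash-based aggregates
    val hashBased = agg.kind == Hash || agg.kind == ObjectHash
    // Condition 2: there is at least one grouping key
    hashBased && agg.groupingKeys.nonEmpty
  }
}
```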
isPartialAgg¶
isPartialAgg(
partialAgg: BaseAggregateExec,
finalAgg: BaseAggregateExec): Boolean
isPartialAgg is positive (true) when...FIXME