Aggregate Logical Operator¶

Aggregate is a unary logical operator that represents the following high-level operators in a logical query plan:

- AstBuilder is requested to visitCommonSelectQueryClausePlan (a HAVING clause without GROUP BY) and to parse a GROUP BY clause
- KeyValueGroupedDataset is requested to agg (and aggUntyped)
- RelationalGroupedDataset is requested to toDF
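The shape of the operator can be sketched with plain Scala case classes. This is a hedged model for illustration only, not Spark's actual Catalyst classes: an Aggregate keeps grouping expressions, aggregate expressions, and exactly one child.

```scala
// Hedged sketch (not Spark's Catalyst classes): Aggregate as a unary node
// holding grouping expressions, aggregate expressions, and a single child.
sealed trait LogicalPlan { def children: Seq[LogicalPlan] }

case class Relation(name: String) extends LogicalPlan {
  val children: Seq[LogicalPlan] = Nil
}

case class Aggregate(
    groupingExpressions: Seq[String],  // modeled as strings for brevity
    aggregateExpressions: Seq[String],
    child: LogicalPlan) extends LogicalPlan {
  val children: Seq[LogicalPlan] = Seq(child)  // unary: exactly one child
}

val plan = Aggregate(Seq("city"), Seq("city", "count(1)"), Relation("visits"))
```

Being unary means the operator transforms the output of a single child plan, which is why it composes naturally under Project and Filter nodes in the plans shown later on this page.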
Creating Instance¶

Aggregate takes the following to be created:

- Grouping Expressions
- Aggregate NamedExpressions
- Child LogicalPlan

Aggregate is created when:

- AstBuilder is requested to withSelectQuerySpecification and withAggregationClause
- DslLogicalPlan is used to groupBy
- KeyValueGroupedDataset is requested to aggUntyped
- RelationalGroupedDataset is requested to toDF
- AnalyzeColumnCommand logical command (when CommandUtils is used to computeColumnStats and computePercentiles)
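As a rough illustration of one creation path, a groupBy-style API conceptually prepends the grouping expressions to the aggregate expressions before constructing the operator, so grouping columns appear in the output. A hedged sketch (expressions modeled as strings, not Catalyst trees; toDF here is a stand-in, not Spark's actual method body):

```scala
// Hedged sketch of how a groupBy-style API could assemble an Aggregate:
// grouping expressions become part of the output, followed by the aggregates.
case class Aggregate(
    groupingExpressions: Seq[String],
    aggregateExpressions: Seq[String],
    child: String)

def toDF(groupingExprs: Seq[String], aggExprs: Seq[String], child: String): Aggregate =
  Aggregate(groupingExprs, groupingExprs ++ aggExprs, child)

val agg = toDF(Seq("city"), Seq("count(1)"), "visits")
```

This matches what the analyzed plans below show: an Aggregate's output lists the grouping columns first, then the aggregate expressions (e.g. `Aggregate [city#8], [city#8, count(1) ...]`).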
Checking Requirements for HashAggregateExec¶

supportsHashAggregate(
  aggregateBufferAttributes: Seq[Attribute]): Boolean

supportsHashAggregate builds a StructType for the given aggregateBufferAttributes.

In the end, supportsHashAggregate is isAggregateBufferMutable with the StructType.

supportsHashAggregate is used when:

- MergeScalarSubqueries is requested to supportedAggregateMerge
- AggUtils is requested to create a physical operator for aggregation
- HashAggregateExec physical operator is created (to assert that the aggregateBufferAttributes are supported)
isAggregateBufferMutable¶

isAggregateBufferMutable(
  schema: StructType): Boolean

isAggregateBufferMutable is enabled (true) when the types of all the fields (in the given schema) are mutable.

isAggregateBufferMutable is used when:

- Aggregate is requested to check the requirements for HashAggregateExec
- UnsafeFixedWidthAggregationMap is requested to supportsAggregationBufferSchema
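The intuition behind the check can be sketched in plain Scala. This is a hedged simplification, not Spark's actual `UnsafeRow` mutability test: an aggregation buffer is mutable when every field is a fixed-width type that can be updated in place, while variable-length types (e.g. strings) make in-place updates impossible.

```scala
// Hedged sketch (not Spark's actual check): a buffer field is "mutable"
// when its type is fixed-width, so UnsafeRow-style storage can overwrite
// it in place during aggregation.
val fixedWidthTypes = Set(
  "BooleanType", "ByteType", "ShortType", "IntegerType",
  "LongType", "FloatType", "DoubleType", "DateType", "TimestampType")

def isAggregateBufferMutable(fieldTypes: Seq[String]): Boolean =
  fieldTypes.forall(fixedWidthTypes.contains)
```

Aggregates like count and sum keep fixed-width buffers (longs, doubles) and so qualify for hash-based aggregation; an aggregate whose buffer holds a string or array does not.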
Query Planning¶

Aggregate logical operator is planned to one of the physical operators in Aggregation execution planning strategy (using PhysicalAggregation utility):

- HashAggregateExec
- ObjectHashAggregateExec
- SortAggregateExec
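The selection among the physical operators can be sketched as follows. This is a hedged, simplified model of the decision, not the actual planner code (the real logic also inspects the aggregate functions and configuration): hash-based aggregation is preferred when the buffer is mutable, object-hash aggregation is a fallback when it is supported and enabled, and sort-based aggregation is the last resort.

```scala
// Hedged sketch: simplified selection of the aggregation physical operator.
// Spark prefers hash-based aggregation when the aggregation buffer is
// mutable, may fall back to object-hash aggregation, and otherwise sorts.
def chooseAggregateExec(
    bufferMutable: Boolean,
    objectHashSupported: Boolean): String =
  if (bufferMutable) "HashAggregateExec"
  else if (objectHashSupported) "ObjectHashAggregateExec"
  else "SortAggregateExec"
```

This ties the planning step back to supportsHashAggregate above: the mutability of the aggregation buffer is exactly what decides whether the fast hash-based operator is eligible.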
Logical Optimization¶

PushDownPredicate logical plan optimization applies the so-called filter pushdown to a Pivot operator when it is under a Filter operator and all the expressions are deterministic.
import org.apache.spark.sql.catalyst.optimizer.PushDownPredicate

// Sample dataset (assumed shape) with columns id, city, year
val visits = Seq((0, "Warsaw", 2015), (1, "Boston", 2016), (2, "Boston", 2017))
  .toDF("id", "city", "year")

val q = visits
  .groupBy("city")
  .pivot("year")
  .count()
  .where($"city" === "Boston")
val pivotPlanAnalyzed = q.queryExecution.analyzed
scala> println(pivotPlanAnalyzed.numberedTreeString)
00 Filter (city#8 = Boston)
01 +- Project [city#8, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[0] AS 2015#143L, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[1] AS 2016#144L, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[2] AS 2017#145L]
02 +- Aggregate [city#8], [city#8, pivotfirst(year#9, count(1) AS `count`#134L, 2015, 2016, 2017, 0, 0) AS __pivot_count(1) AS `count` AS `count(1) AS ``count```#142]
03 +- Aggregate [city#8, year#9], [city#8, year#9, count(1) AS count(1) AS `count`#134L]
04 +- Project [_1#3 AS id#7, _2#4 AS city#8, _3#5 AS year#9]
05 +- LocalRelation [_1#3, _2#4, _3#5]
val afterPushDown = PushDownPredicate(pivotPlanAnalyzed)
scala> println(afterPushDown.numberedTreeString)
00 Project [city#8, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[0] AS 2015#143L, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[1] AS 2016#144L, __pivot_count(1) AS `count` AS `count(1) AS ``count```#142[2] AS 2017#145L]
01 +- Aggregate [city#8], [city#8, pivotfirst(year#9, count(1) AS `count`#134L, 2015, 2016, 2017, 0, 0) AS __pivot_count(1) AS `count` AS `count(1) AS ``count```#142]
02 +- Aggregate [city#8, year#9], [city#8, year#9, count(1) AS count(1) AS `count`#134L]
03 +- Project [_1#3 AS id#7, _2#4 AS city#8, _3#5 AS year#9]
04 +- Filter (_2#4 = Boston)
05 +- LocalRelation [_1#3, _2#4, _3#5]
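The condition the example demonstrates can be sketched as a simple predicate check. This is a hedged simplification (Spark reasons over Catalyst expressions and attributes, not column-name strings): a deterministic filter can move below an aggregation only when it references grouping columns exclusively, because only then does it leave the membership of each group, and hence the aggregates, unchanged.

```scala
// Hedged sketch: when can a Filter be pushed below an Aggregate?
// Only a deterministic predicate over grouping columns preserves
// per-group membership, so the aggregate results are unaffected.
def canPushDown(
    predicateRefs: Set[String],
    groupingCols: Set[String],
    deterministic: Boolean): Boolean =
  deterministic && predicateRefs.subsetOf(groupingCols)
```

In the plans above, `city#8 = Boston` references only the grouping column city, so the Filter moves from above the Project all the way below both Aggregate operators, ending up as `Filter (_2#4 = Boston)` right above the LocalRelation.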