Skip to content

SortBasedAggregationIterator

SortBasedAggregationIterator is an AggregationIterator that is used by SortAggregateExec physical operator to process rows in a partition.

Creating Instance

SortBasedAggregationIterator takes the following to be created:

SortBasedAggregationIterator initializes immediately.

SortBasedAggregationIterator is created when:

  • SortAggregateExec physical operator is requested to doExecute

Initialization

initialize(): Unit
Procedure

initialize is a procedure (returns Unit) so what happens inside stays inside (paraphrasing the former advertising slogan of Las Vegas, Nevada).

initialize...FIXME

Performance Metrics

SortBasedAggregationIterator is given the performance metrics of the owning SortAggregateExec aggregate physical operator when created.

The metrics are displayed as part of SortAggregateExec aggregate physical operator (e.g. in web UI in Details for Query).

SortAggregateExec in web UI (Details for Query)

number of output rows

SortBasedAggregationIterator is given number of output rows performance metric when created.

The metric is number of output rows metric of the owning SortAggregateExec physical operator.

The metric is incremented every time SortBasedAggregationIterator is requested for next element (and there's one).

Aggregation Buffer

sortBasedAggregationBuffer: InternalRow

SortBasedAggregationIterator creates a new buffer for aggregation (called sortBasedAggregationBuffer) when created.

sortBasedAggregationBuffer is an InternalRow used when SortBasedAggregationIterator is requested for the following:

Creating New Buffer

newBuffer: InternalRow

newBuffer creates a new aggregation buffer (an InternalRow) and initializes buffer values for all imperative aggregate functions (using their aggBufferAttributes)

Checking for Next Row Available

Iterator
hasNext: Boolean

hasNext is part of the Iterator (Scala) abstraction.

hasNext is the sortedInputHasNewGroup.

sortedInputHasNewGroup

var sortedInputHasNewGroup: Boolean = false

SortBasedAggregationIterator defines sortedInputHasNewGroup flag for hasNext.

sortedInputHasNewGroup indicates that there are no input rows or the current group is the last in the inputIterator.

sortedInputHasNewGroup flag is enabled (true) when the inputIterator has rows (is not empty) when initialize.

sortedInputHasNewGroup flag is disabled (false) when SortBasedAggregationIterator is requested for the following:

Next Row

Iterator
next(): UnsafeRow

next is part of the Iterator (Scala) abstraction.

next if there is an input row available or throws an NoSuchElementException.


If there is an input row available, next does the following:

  1. Processes the current (sorted) group
  2. Generates output aggregate result for the current group using the Generate Output Function for the currentGroupingKey and the aggregation buffer
  3. Initializes the buffer for values of the next group (with the aggregation buffer)
  4. Increments the number of output rows metric

In the end, next returns the generated output row.

Processing Current Sorted Group

processCurrentSortedGroup(): Unit

processCurrentSortedGroup...FIXME

Current Grouping Key

var currentGroupingKey: UnsafeRow

currentGroupingKey is an UnsafeRow that is the value of the grouping key of the aggregation being processed.

In other words, SortBasedAggregationIterator uses the currentGroupingKey to process the current sorted group fully (using the Process Row Function) until the groupingProjection generates a next grouping key for an input row.

currentGroupingKey is initialized as the next grouping key at the beginning of processing the current sorted group (and remains so until there are more rows for the current aggregation group).

Next Grouping Key

var nextGroupingKey: UnsafeRow

nextGroupingKey is an UnsafeRow that is the value of the next grouping key (based on the groupingProjection).

nextGroupingKey is the result of executing the groupingProjection on the current row, if available, that happens when SortBasedAggregationIterator is requested for the following:

As long as nextGroupingKey is within the same group (based on the groupingProjection) while processing current sorted group, SortBasedAggregationIterator keeps processesing input rows (using the Process Row Function) that are assumed to be within the same aggregate group.

nextGroupingKey becomes the Current Grouping Key at the beginning of processing the current sorted group.