SortBasedAggregationIterator¶
SortBasedAggregationIterator
is an AggregationIterator that is used by SortAggregateExec physical operator to process rows in a partition.
Creating Instance¶
SortBasedAggregationIterator
takes the following to be created:
- Partition ID
- Grouping NamedExpressions
- Value Attributes
- Input Iterator (InternalRows)
- AggregateExpressions
- Aggregate Attributes
- Initial input buffer offset
- Result NamedExpressions
- Function to create a new
MutableProjection
given expressions and attributes ((Seq[Expression], Seq[Attribute]) => MutableProjection
) - number of output rows metric
SortBasedAggregationIterator
initializes immediately.
SortBasedAggregationIterator
is created when:
SortAggregateExec
physical operator is requested to doExecute
Initialization¶
initialize(): Unit
Procedure
initialize
is a procedure (returns Unit
) so what happens inside stays inside (paraphrasing the former advertising slogan of Las Vegas, Nevada).
initialize
...FIXME
Performance Metrics¶
SortBasedAggregationIterator
is given the performance metrics of the owning SortAggregateExec aggregate physical operator when created.
The metrics are displayed as part of SortAggregateExec aggregate physical operator (e.g. in web UI in Details for Query).
number of output rows¶
SortBasedAggregationIterator
is given number of output rows performance metric when created.
The metric is number of output rows metric of the owning SortAggregateExec physical operator.
The metric is incremented every time SortBasedAggregationIterator
is requested for next element (and there's one).
Aggregation Buffer¶
sortBasedAggregationBuffer: InternalRow
SortBasedAggregationIterator
creates a new buffer for aggregation (called sortBasedAggregationBuffer
) when created.
sortBasedAggregationBuffer
is an InternalRow used when SortBasedAggregationIterator
is requested for the following:
- Initializing to initialize the buffer when there are input rows in the input iterator
- processCurrentSortedGroup using the Process Row Function (with the firstRowInNextGroup and then all the rows in a group per currentGroupingKey)
- next (after processCurrentSortedGroup) to generate next element (a sort aggregate result) using the Generate Output Function followed by initializing the buffer
- outputForEmptyGroupingKeyWithoutInput (to initialize the buffer followed by generating the only sort aggregate result using the Generate Output Function)
Creating New Buffer¶
newBuffer: InternalRow
newBuffer
creates a new aggregation buffer (an InternalRow) and initializes buffer values for all imperative aggregate functions (using their aggBufferAttributes)
Checking for Next Row Available¶
hasNext
is the sortedInputHasNewGroup.
sortedInputHasNewGroup¶
var sortedInputHasNewGroup: Boolean = false
SortBasedAggregationIterator
defines sortedInputHasNewGroup
flag for hasNext.
sortedInputHasNewGroup
indicates that there are no input rows or the current group is the last in the inputIterator.
sortedInputHasNewGroup
flag is enabled (true
) when the inputIterator has rows (is not empty) when initialize.
sortedInputHasNewGroup
flag is disabled (false
) when SortBasedAggregationIterator
is requested for the following:
- initialize and there are no rows in the inputIterator
- There are no more groups in the inputIterator while processCurrentSortedGroup
Next Row¶
next
if there is an input row available or throws an NoSuchElementException
.
If there is an input row available, next
does the following:
- Processes the current (sorted) group
- Generates output aggregate result for the current group using the Generate Output Function for the currentGroupingKey and the aggregation buffer
- Initializes the buffer for values of the next group (with the aggregation buffer)
- Increments the number of output rows metric
In the end, next
returns the generated output row.
Processing Current Sorted Group¶
processCurrentSortedGroup(): Unit
processCurrentSortedGroup
...FIXME
Current Grouping Key¶
var currentGroupingKey: UnsafeRow
currentGroupingKey
is an UnsafeRow that is the value of the grouping key of the aggregation being processed.
In other words, SortBasedAggregationIterator
uses the currentGroupingKey
to process the current sorted group fully (using the Process Row Function) until the groupingProjection generates a next grouping key for an input row.
currentGroupingKey
is initialized as the next grouping key at the beginning of processing the current sorted group (and remains so until there are more rows for the current aggregation group).
Next Grouping Key¶
var nextGroupingKey: UnsafeRow
nextGroupingKey
is an UnsafeRow that is the value of the next grouping key (based on the groupingProjection).
nextGroupingKey
is the result of executing the groupingProjection on the current row, if available, that happens when SortBasedAggregationIterator
is requested for the following:
As long as nextGroupingKey
is within the same group (based on the groupingProjection) while processing current sorted group, SortBasedAggregationIterator
keeps processesing input rows (using the Process Row Function) that are assumed to be within the same aggregate group.
nextGroupingKey
becomes the Current Grouping Key at the beginning of processing the current sorted group.