Skip to content

AggregationIterators

AggregationIterator is an abstraction of aggregation iterators of UnsafeRows.

abstract class AggregationIterator(...)
extends Iterator[UnsafeRow]

From scala.collection.Iterator:

Iterators are data structures that allow to iterate over a sequence of elements. They have a hasNext method for checking if there is a next element available, and a next method which returns the next element and discards it from the iterator.

Implementations

Creating Instance

AggregationIterator takes the following to be created:

Abstract Class

AggregationIterator is an abstract class and cannot be created directly. It is created indirectly for the concrete AggregationIterators.

AggregateModes

When created, AggregationIterator makes sure that there are at most 2 distinct AggregateModes of the AggregateExpressions.

The AggregateModes have to be a subset of the following mode pairs:

  • Partial and PartialMerge
  • Final and Complete

AggregateFunctions

aggregateFunctions: Array[AggregateFunction]

When created, AggregationIterator initializes AggregateFunctions in the aggregateExpressions (with initialInputBufferOffset).

initializeAggregateFunctions

initializeAggregateFunctions(
  expressions: Seq[AggregateExpression],
  startingInputBufferOffset: Int): Array[AggregateFunction]

initializeAggregateFunctions...FIXME

initializeAggregateFunctions is used when:

Generating Process Row Function

generateProcessRow(
  expressions: Seq[AggregateExpression],
  functions: Seq[AggregateFunction],
  inputAttributes: Seq[Attribute]): (InternalRow, InternalRow) => Unit

generateProcessRow is a procedure

generateProcessRow creates a Scala function (procedure) that takes two InternalRows and produces no output.

def generateProcessRow(currentBuffer: InternalRow, row: InternalRow): Unit = {
  ...
}

generateProcessRow creates a mutable JoinedRow (of two InternalRows).

generateProcessRow branches off based on the given AggregateExpressions (expressions).

With no AggregateExpressions (expressions), generateProcessRow creates a function that does nothing (and "swallows" the input).

functions Argument

generateProcessRow works differently based on the type of the given AggregateFunctions:

Otherwise, with some AggregateExpressions (expressions), generateProcessRow...FIXME


generateProcessRow is used when:

generateOutput

generateOutput: (UnsafeRow, InternalRow) => UnsafeRow

When created, AggregationIterator creates a ResultProjection function.

generateOutput is used when:

generateResultProjection

generateResultProjection(): (UnsafeRow, InternalRow) => UnsafeRow

generateResultProjection...FIXME

initializeBuffer

initializeBuffer(
  buffer: InternalRow): Unit

initializeBuffer requests the expressionAggInitialProjection to store an execution result of an empty row in the given InternalRow (buffer).

initializeBuffer requests all the ImperativeAggregate functions to initialize with the buffer internal row.


initializeBuffer is used when: