CollectSet Expression
CollectSet is a Collect expression (with a mutable.HashSet[Any] aggregation buffer).
CollectSet and Scala's HashSet
It's fair to say that CollectSet is merely a Spark SQL-enabled Scala mutable.HashSet[Any].
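The analogy can be tried out in plain Scala, with no Spark involved. The sketch below only exercises a mutable.HashSet[Any] the way an aggregation buffer would be filled: repeated values collapse into one, and because the iteration order of a hash set is unspecified, the values are sorted before printing.

```scala
import scala.collection.mutable

// The same kind of collection CollectSet uses as its aggregation buffer
val buffer = mutable.HashSet.empty[Any]

// Feed in values one by one, as an aggregate would while scanning rows
Seq[Any](1, 2, 2, 3, 1).foreach(v => buffer += v)

// Duplicates are collapsed; iteration order is unspecified, so sort for display
val distinctValues = buffer.toList.map(_.asInstanceOf[Int]).sorted
println(distinctValues) // List(1, 2, 3)
```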
Creating Instance
CollectSet takes the following to be created:
- Child Expression
- mutableAggBufferOffset (default: 0)
- inputAggBufferOffset (default: 0)
CollectSet is created when:
- Catalyst DSL's collectSet is used
- collect_set standard function is used
- collect_set SQL function is used
Pretty Name
prettyName: String
prettyName is part of the Expression abstraction.
prettyName is collect_set.
Creating Aggregation Buffer
createAggregationBuffer(): mutable.HashSet[Any]
createAggregationBuffer is part of the TypedImperativeAggregate abstraction.
createAggregationBuffer creates an empty Scala mutable.HashSet.
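The following is a minimal, hypothetical model of how such a buffer might be used over the aggregation lifecycle. CollectSetSketch and its update and merge methods are illustrative names only: their signatures are much simpler than Spark's (which work on InternalRows and Catalyst expressions), and the sketch simply assumes one buffer per partition, merged at the end.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of a set-collecting typed imperative aggregate.
// Not Spark's API: only the idea of a mutable.HashSet buffer is taken from the text.
object CollectSetSketch {
  type Buffer = mutable.HashSet[Any]

  // Like createAggregationBuffer: start from an empty mutable.HashSet
  def createAggregationBuffer(): Buffer = mutable.HashSet.empty[Any]

  // Add one input value to a (per-partition) buffer
  def update(buffer: Buffer, value: Any): Buffer = { buffer += value; buffer }

  // Combine two partial buffers, e.g. from different partitions
  def merge(buffer: Buffer, other: Buffer): Buffer = { buffer ++= other; buffer }

  // Produce the final result; a plain List stands in for GenericArrayData
  def eval(buffer: Buffer): List[Any] = buffer.toList
}

import CollectSetSketch._

// Two "partitions" aggregated independently, then merged
val p1 = Seq[Any]("a", "b", "a").foldLeft(createAggregationBuffer())(update)
val p2 = Seq[Any]("b", "c").foldLeft(createAggregationBuffer())(update)
val result = eval(merge(p1, p2))
println(result.map(_.toString).sorted) // List(a, b, c)
```

Note that merge mutates and returns its first argument, which mirrors the imperative (in-place) style the TypedImperativeAggregate name suggests.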
Interpreted Execution
eval(buffer: mutable.HashSet[Any]): Any
eval is part of the TypedImperativeAggregate abstraction.
eval creates a GenericArrayData with an array based on the DataType of the child expression:
- For BinaryType, eval ...FIXME
- Otherwise, eval ...FIXME
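Why would BinaryType need a separate branch? A likely reason is that Scala/Java byte arrays compare by reference and use identity hash codes, so a hash set cannot deduplicate them by content. The plain Scala sketch below demonstrates that, plus one hypothetical workaround (buffering Vector[Byte] and converting back to Array[Byte] in an eval-like step). It does not claim to show the representation Spark actually uses.

```scala
import scala.collection.mutable

// Byte arrays use reference equality and identity hash codes,
// so two arrays with the same contents are both kept.
val raw = mutable.HashSet.empty[Any]
raw += Array[Byte](1, 2)
raw += Array[Byte](1, 2)
println(raw.size) // 2

// Hypothetical workaround for this sketch only: buffer an immutable
// Vector[Byte] (which has value equality) instead of the raw array...
val wrapped = mutable.HashSet.empty[Any]
wrapped += Array[Byte](1, 2).toVector
wrapped += Array[Byte](1, 2).toVector
println(wrapped.size) // 1

// ...and convert each element back to Array[Byte] in an eval-like step
def evalBinary(buffer: mutable.HashSet[Any]): Array[Array[Byte]] =
  buffer.iterator.map(_.asInstanceOf[Vector[Byte]].toArray).toArray

val result = evalBinary(wrapped)
println(result.map(_.mkString("[", ",", "]")).mkString(", ")) // [1,2]
```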
EliminateDistinct Logical Optimization
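CollectSet is duplicate-agnostic (isDuplicateAgnostic) as far as the EliminateDistinct logical optimization is concerned, so the optimization can drop a redundant DISTINCT from an aggregate like collect_set(DISTINCT x).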
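Being duplicate-agnostic means the result does not change if the input is deduplicated first. A plain Scala illustration, where collectSet is a hypothetical helper standing in for the aggregate (not a Spark function):

```scala
// Hypothetical helper standing in for a set-collecting aggregate over one group
def collectSet[A](values: Seq[A]): Set[A] = values.toSet

val input = Seq(3, 1, 3, 2, 1)

// collect_set(x) vs. collect_set(DISTINCT x): deduplicating the input first
// cannot change a set, so the DISTINCT is redundant.
val plain = collectSet(input)
val distinctFirst = collectSet(input.distinct)
println(plain == distinctFirst) // true

// A duplicate-sensitive aggregate such as a count does change under DISTINCT.
println(input.size)          // 5
println(input.distinct.size) // 3
```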