CollectSet Expression¶
CollectSet is a Collect expression (with a mutable.HashSet[Any] aggregation buffer).
CollectSet and Scala's HashSet
It's fair to say that CollectSet is merely a Spark SQL-enabled Scala mutable.HashSet[Any].
Creating Instance¶
CollectSet takes the following to be created:
- Child Expression
- mutableAggBufferOffset (default: 0)
- inputAggBufferOffset (default: 0)
CollectSet is created when:
- Catalyst DSL's collectSet is used
- collect_set standard function is used
- collect_set SQL function is used
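A minimal sketch of two of the entry points above, the collect_set standard function and the collect_set SQL function, assuming a local SparkSession is available. The object name CollectSetUsage, the sample data and the view name t are hypothetical and used for illustration only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_set

object CollectSetUsage {
  // Returns the per-id value sets computed with the standard function
  // and with the SQL function (in that order).
  def run(spark: SparkSession): (Map[Int, Set[String]], Map[Int, Set[String]]) = {
    import spark.implicits._

    // Hypothetical sample data with a duplicated (1, "a") row
    val df = Seq((1, "a"), (1, "a"), (1, "b"), (2, "c")).toDF("id", "value")

    // collect_set standard function
    val viaFunction = df
      .groupBy("id")
      .agg(collect_set("value").as("values"))
      .as[(Int, Seq[String])]
      .collect()
      .map { case (id, values) => id -> values.toSet }
      .toMap

    // collect_set SQL function
    df.createOrReplaceTempView("t")
    val viaSql = spark
      .sql("SELECT id, collect_set(value) AS values FROM t GROUP BY id")
      .as[(Int, Seq[String])]
      .collect()
      .map { case (id, values) => id -> values.toSet }
      .toMap

    (viaFunction, viaSql)
  }
}
```

The results are converted to Scala sets before being returned because the order of elements in the array produced by collect_set is not guaranteed.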
Pretty Name¶
prettyName: String
prettyName is part of the Expression abstraction.
prettyName is collect_set.
Creating Aggregation Buffer¶
createAggregationBuffer(): mutable.HashSet[Any]
createAggregationBuffer is part of the TypedImperativeAggregate abstraction.
createAggregationBuffer creates an empty mutable.HashSet (Scala).
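The role of the empty mutable.HashSet as an aggregation buffer can be sketched in plain Scala, without Spark. This is a simplified model and not Spark's code: the object SetBufferSketch and the update, merge and aggregate helpers are hypothetical names that only mimic how an imperative aggregate might fill per-partition buffers and combine them.

```scala
import scala.collection.mutable

object SetBufferSketch {
  // Start from an empty buffer, as createAggregationBuffer does
  def createBuffer(): mutable.HashSet[Any] = mutable.HashSet.empty[Any]

  // Hypothetical update step: add one input value to the buffer
  def update(buffer: mutable.HashSet[Any], value: Any): mutable.HashSet[Any] = {
    buffer += value // adding a value that is already present is a no-op
    buffer
  }

  // Hypothetical merge step: fold another partial buffer into this one
  def merge(
      buffer: mutable.HashSet[Any],
      other: mutable.HashSet[Any]): mutable.HashSet[Any] = {
    buffer ++= other
    buffer
  }

  // Build one buffer per (simulated) partition, then merge the partial buffers
  def aggregate(partitions: Seq[Seq[Any]]): mutable.HashSet[Any] = {
    val partials = partitions.map { values =>
      values.foldLeft(createBuffer())((b, v) => update(b, v))
    }
    partials.foldLeft(createBuffer())((acc, p) => merge(acc, p))
  }
}
```

For example, SetBufferSketch.aggregate(Seq(Seq(1, 2, 2), Seq(2, 3))) yields a buffer holding exactly the three distinct values 1, 2 and 3.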
Interpreted Execution¶
eval(
  buffer: mutable.HashSet[Any]): Any
eval is part of the TypedImperativeAggregate abstraction.
eval creates a GenericArrayData with an array based on the DataType of the child expression:
- For BinaryType, eval...FIXME
- Otherwise, eval...FIXME
EliminateDistinct Logical Optimization¶
CollectSet is considered duplicate-agnostic (isDuplicateAgnostic) by the EliminateDistinct logical optimization, since a set discards duplicate input values anyway.
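Why a set-collecting aggregate does not care about duplicates can be illustrated with plain Scala collections. This sketch is not the optimizer rule itself: the object DuplicateAgnosticSketch and its methods are hypothetical stand-ins for an aggregate evaluated with and without DISTINCT.

```scala
object DuplicateAgnosticSketch {
  // Stand-in for collect_set(x): gather the input into a set
  def collectSet(values: Seq[Any]): Set[Any] = values.toSet

  // Stand-in for collect_set(DISTINCT x): deduplicate first, then gather
  def collectSetDistinct(values: Seq[Any]): Set[Any] = values.distinct.toSet

  // Counter-example: count(x) and count(DISTINCT x) can differ,
  // so a counting aggregate is not duplicate-agnostic
  def count(values: Seq[Any]): Int = values.size
  def countDistinct(values: Seq[Any]): Int = values.distinct.size
}
```

For any input, collectSet and collectSetDistinct return the same set, which is why dropping DISTINCT is safe for this kind of aggregate, whereas count and countDistinct differ as soon as the input contains a duplicate.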