BloomFilterAggregate Expression¶
BloomFilterAggregate is a TypedImperativeAggregate expression that uses a BloomFilter as its aggregation buffer.
Creating Instance¶
BloomFilterAggregate takes the following to be created:
- Child Expression
- Estimated Number of Items
- Number of Bits
- Mutable Agg Buffer Offset (default: 0)
- Input Agg Buffer Offset (default: 0)
BloomFilterAggregate is created when:
- InjectRuntimeFilter logical optimization is requested to inject a BloomFilter
Estimated Number of Items Expression¶
BloomFilterAggregate can be given Estimated Number of Items (as an Expression) when created.
Unless given, BloomFilterAggregate uses the spark.sql.optimizer.runtime.bloomFilter.expectedNumItems configuration property.
Number of Bits Expression¶
BloomFilterAggregate can be given Number of Bits (as an Expression) when created.
The number of bits expression must be a constant literal (i.e., foldable) that evaluates to a long value.
The maximum value for the number of bits is given by the spark.sql.optimizer.runtime.bloomFilter.maxNumBits configuration property.
The number of bits expression is the third expression (in this TernaryLike tree node).
Number of Bits¶
numBits: Long
Lazy Value
numBits is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
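The once-only initialization can be seen in a minimal sketch. LazyValDemo, initCount and the concrete value are hypothetical names and numbers used for illustration only; they are not part of BloomFilterAggregate.

```scala
object LazyValDemo {
  // Counts how many times the initializer body of numBits runs.
  var initCount = 0

  // The right-hand side runs on first access only; the result is cached afterwards.
  lazy val numBits: Long = {
    initCount += 1
    8L * 1024 * 1024 // hypothetical value, for illustration only
  }
}
```

Reading LazyValDemo.numBits any number of times leaves initCount at 1, since the initializer runs on the first access only.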
BloomFilterAggregate defines the numBits value to be either the value of the numBitsExpression (after evaluating it to a number) or spark.sql.optimizer.runtime.bloomFilter.maxNumBits, whichever is smaller.
The numBits value must be a positive value.
numBits is used to create an aggregation buffer.
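The capping rule above can be sketched in plain Scala. NumBitsSketch and resolveNumBits are hypothetical names, and the positivity check and its message are assumptions made for this illustration, not Spark's own code.

```scala
object NumBitsSketch {
  // Hypothetical helper: `requested` stands for the evaluated numBitsExpression,
  // `maxNumBits` for the value of spark.sql.optimizer.runtime.bloomFilter.maxNumBits.
  def resolveNumBits(requested: Long, maxNumBits: Long): Long = {
    // Assumption: a non-positive request is rejected up front.
    require(requested > 0L, "The number of bits must be a positive value")
    // The smaller of the two values wins.
    math.min(requested, maxNumBits)
  }
}
```

With a hypothetical limit of 4096 bits, a request for 1000 bits is kept as is, while a request for 10000 bits is capped at 4096.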
Creating Aggregation Buffer¶
TypedImperativeAggregate
createAggregationBuffer(): BloomFilter
createAggregationBuffer is part of the TypedImperativeAggregate abstraction.
createAggregationBuffer creates a BloomFilter (with the estimated number of items and the number of bits).
Interpreted Execution¶
TypedImperativeAggregate
eval(
buffer: BloomFilter): Any
eval is part of the TypedImperativeAggregate abstraction.
eval serializes the given buffer, unless the cardinality of the BloomFilter is 0, in which case eval returns null.
FIXME Why does eval return null?
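The branching described for eval can be sketched with a toy buffer. ToyBloomFilter and EvalSketch are hypothetical stand-ins written for this illustration; they are not Spark's BloomFilter or BloomFilterAggregate, and the toy filter sets a single bit per item where a real Bloom filter would use several hash functions.

```scala
import java.util.BitSet

// Hypothetical stand-in for a Bloom filter; NOT Spark's BloomFilter class.
final class ToyBloomFilter(numBits: Int) {
  private val bits = new BitSet(numBits)

  // Sets one bit per item (a real Bloom filter uses several hash functions).
  def put(item: Long): Unit =
    bits.set(java.lang.Math.floorMod(java.lang.Long.hashCode(item), numBits))

  // Number of bits that are currently set.
  def cardinality: Int = bits.cardinality()

  // Byte representation of the underlying bit set.
  def toBytes: Array[Byte] = bits.toByteArray
}

object EvalSketch {
  // Mirrors the described rule: empty filter => null, otherwise the serialized bytes.
  def eval(buffer: ToyBloomFilter): Any =
    if (buffer.cardinality == 0) null else buffer.toBytes
}
```

In this sketch a cardinality of 0 can only mean that no item was ever put into the buffer, so there is nothing useful to serialize. Whether that is the actual motivation in BloomFilterAggregate is an assumption.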
Serializing Aggregate Buffer¶
TypedImperativeAggregate
serialize(
obj: BloomFilter): Array[Byte]
serialize is part of the TypedImperativeAggregate abstraction.
Two serializes
There is another serialize (in the BloomFilterAggregate companion object) that just makes unit testing easier.
serialize ...FIXME