BloomFilterAggregate Expression¶
BloomFilterAggregate is a TypedImperativeAggregate expression that uses BloomFilter for an aggregation buffer.
Creating Instance¶
BloomFilterAggregate takes the following to be created:
- Child Expression
- Estimated Number of Items
- Number of Bits
- Mutable Agg Buffer Offset (default:
0) - Input Agg Buffer Offset (default:
0)
BloomFilterAggregate is created when:
InjectRuntimeFilterlogical optimization is requested to inject a BloomFilter
Estimated Number of Items Expression¶
BloomFilterAggregate can be given Estimated Number of Items (as an Expression) when created.
Unless given, BloomFilterAggregate uses spark.sql.optimizer.runtime.bloomFilter.expectedNumItems configuration property.
Number of Bits Expression¶
BloomFilterAggregate can be given Number of Bits (as an Expression) when created.
The number of bits expression must be a constant literal (i.e., foldable) that evaluates to a long value.
The maximum value for the number of bits is spark.sql.optimizer.runtime.bloomFilter.maxNumBits configuration property.
The number of bits expression is the third expression (in this TernaryLike tree node).
Number of Bits¶
numBits: Long
Lazy Value
numBits is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
BloomFilterAggregate defines numBits value to be either the value of the numBitsExpression (after evaluating it to a number) or spark.sql.optimizer.runtime.bloomFilter.maxNumBits, whatever smaller.
The numBits value must be a positive value.
numBits is used to create an aggregation buffer.
Creating Aggregation Buffer¶
TypedImperativeAggregate
createAggregationBuffer(): BloomFilter
createAggregationBuffer is part of the TypedImperativeAggregate abstraction.
createAggregationBuffer creates a BloomFilter (with the estimated number of items and the number of bits).
Interpreted Execution¶
TypedImperativeAggregate
eval(
buffer: BloomFilter): Any
eval is part of the TypedImperativeAggregate abstraction.
eval serializes the given buffer (unless the cardinality of this BloomFilter is 0 and eval returns null).
FIXME Why does eval return null?
Serializing Aggregate Buffer¶
TypedImperativeAggregate
serialize(
obj: BloomFilter): Array[Byte]
serialize is part of the TypedImperativeAggregate abstraction.
Two serializes
There is another serialize (in BloomFilterAggregate companion object) that just makes unit testing easier.
serialize...FIXME