DeletionVectorSet
Creating Instance
DeletionVectorSet takes the following to be created:
- SparkSession
- Target DataFrame
- DeltaLog of the delta table
- OptimisticTransaction
DeletionVectorSet is created when:
- DeletionVectorBitmapGenerator is requested to build deletion vectors
Building Deletion Vectors
computeResult(): Seq[DeletionVectorResult]
Very Spark SQL-dependent
computeResult relies heavily on Spark SQL and uses the following high-level Dataset operators:
- Dataset.groupBy
- Dataset.mapPartitions
- Dataset.select
- Dataset.collect
computeResult groups the records of the target DataFrame by the filePath and deletionVectorId columns and computes the aggColumns aggregation for every group.
computeResult selects the outputColumns.
In the end, computeResult applies bitmapStorageMapper to every partition (using Dataset.mapPartitions) and collects the results (using Dataset.collect).
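The group-aggregate-map-collect flow described above can be mimicked without Spark. The following is a minimal sketch in plain Scala collections, not Delta Lake's implementation. It assumes hypothetical stand-ins: a `TargetRow` type, simplified `DeletionVectorData`/`DeletionVectorResult` case classes (the real ones carry more fields), a `SortedSet[Long]` of row indexes in place of a real bitmap, and a mapper that only counts rows instead of storing a deletion vector.

```scala
import scala.collection.immutable.SortedSet

// Hypothetical, simplified stand-ins; NOT the real Delta Lake classes.
final case class TargetRow(filePath: String, deletionVectorId: Option[String], rowIndex: Long)
final case class DeletionVectorData(filePath: String, deletionVectorId: Option[String], rowIndexes: SortedSet[Long])
final case class DeletionVectorResult(filePath: String, cardinality: Long)

object ComputeResultSketch {
  // Mimics groupBy(filePath, deletionVectorId) + aggregation:
  // one set of row indexes per (filePath, deletionVectorId) group.
  def groupAndAggregate(rows: Seq[TargetRow]): Seq[DeletionVectorData] =
    rows
      .groupBy(r => (r.filePath, r.deletionVectorId))
      .toSeq
      .map { case ((path, dvId), group) =>
        DeletionVectorData(path, dvId, SortedSet.from(group.map(_.rowIndex)))
      }
      .sortBy(d => (d.filePath, d.deletionVectorId.getOrElse(""))) // deterministic order

  // Stand-in for the per-partition function (a real one would also persist the bitmap).
  val mapper: Iterator[DeletionVectorData] => Iterator[DeletionVectorResult] =
    _.map(d => DeletionVectorResult(d.filePath, d.rowIndexes.size.toLong))

  // Mimics mapPartitions + collect: split into partitions, apply the mapper to each
  // partition's iterator, then concatenate everything into one local Seq.
  def computeResult(rows: Seq[TargetRow], numPartitions: Int): Seq[DeletionVectorResult] = {
    val grouped = groupAndAggregate(rows)
    val chunkSize = math.max(1, (grouped.size + numPartitions - 1) / numPartitions)
    val partitions = grouped.grouped(chunkSize).toSeq
    partitions.flatMap(p => mapper(p.iterator).toSeq)
  }
}
```

Note that the mapper sees one partition at a time as an `Iterator`, which is why per-partition setup (see bitmapStorageMapper) can happen once per partition rather than once per row.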
computeResult is used when:
- DeletionVectorBitmapGenerator is requested to build the deletion vectors
bitmapStorageMapper
bitmapStorageMapper(): Iterator[DeletionVectorData] => Iterator[DeletionVectorResult]
Dataset.mapPartitions
The function that bitmapStorageMapper creates is executed on every partition using the Dataset.mapPartitions operator.
bitmapStorageMapper uses getRandomPrefixLength to determine the random prefix length from the Metadata of the OptimisticTransaction.
In the end, bitmapStorageMapper uses createMapperToStoreDeletionVectors to create the mapper function for the data directory of the target delta table.
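bitmapStorageMapper returns a function instead of processing data itself: table-level inputs (the prefix length and the data directory) are resolved once and captured by the closure that later runs on every partition. The following is a minimal sketch of that factory pattern in plain Scala, not Delta Lake's code. It assumes a hypothetical `TableMetadata`, a made-up configuration key for the prefix length (the actual getRandomPrefixLength rule is not covered here), a made-up `dv-<uuid>.bin` file naming scheme, and it writes nothing to storage.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical, simplified table metadata (configuration only).
final case class TableMetadata(configuration: Map[String, String])
// Hypothetical result: where the deletion vector would be stored and how many rows it marks.
final case class StoredDeletionVector(filePath: String, dvPath: String, cardinality: Int)

object BitmapStorageMapperSketch {
  // Made-up configuration key; NOT a real Delta Lake table property.
  val PrefixLengthKey = "example.randomPrefixLength"

  // Hypothetical rule: read the prefix length from the configuration, default to 0.
  def getRandomPrefixLength(metadata: TableMetadata): Int =
    metadata.configuration.get(PrefixLengthKey).map(_.toInt).getOrElse(0)

  // Creates the per-partition function. dataDir and prefixLength are captured by the closure.
  def createMapper(dataDir: String, prefixLength: Int)
      : Iterator[(String, Seq[Long])] => Iterator[StoredDeletionVector] = { rows =>
    val rnd = new Random() // one generator per partition
    rows.map { case (filePath, rowIndexes) =>
      val prefix = rnd.alphanumeric.take(prefixLength).mkString
      val dir = if (prefix.isEmpty) dataDir else s"$dataDir/$prefix"
      StoredDeletionVector(filePath, s"$dir/dv-${UUID.randomUUID()}.bin", rowIndexes.distinct.size)
    }
  }

  // Resolves table-level inputs once, then hands back the function to run per partition.
  def bitmapStorageMapper(metadata: TableMetadata, dataDir: String)
      : Iterator[(String, Seq[Long])] => Iterator[StoredDeletionVector] =
    createMapper(dataDir, getRandomPrefixLength(metadata))
}
```

In Spark, such a closure would be serialized and shipped to executors, so keeping it small (plain values rather than the whole transaction) is the usual design choice.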
getRandomPrefixLength
getRandomPrefixLength(
metadata: Metadata): Int
getRandomPrefixLength...FIXME