DeletionVectorSet¶
Creating Instance¶
DeletionVectorSet takes the following to be created:
- SparkSession
- Target DataFrame
- DeltaLog of the delta table
- OptimisticTransaction
DeletionVectorSet is created when:
- DeletionVectorBitmapGenerator is requested to build deletion vectors
Building Deletion Vectors¶
computeResult(): Seq[DeletionVectorResult]
Very Spark SQL-dependent
computeResult relies heavily on Spark SQL and uses the following high-level operators for its job:
- Dataset.groupBy
- Dataset.mapPartitions
- Dataset.select
- Dataset.collect
computeResult groups the records in the given target DataFrame by the filePath and deletionVectorId columns and executes the aggColumns aggregation.
computeResult then selects the outputColumns.
In the end, computeResult applies bitmapStorageMapper (on every partition) and collects the result.
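The groupBy, aggregate, select, mapPartitions and collect steps described above can be sketched with plain Spark SQL as follows. This is a minimal illustration, not the Delta Lake implementation: the case classes RowsToDelete and StoredVector, the collect_list aggregation, the rowIndexes column and the sample data are all assumptions that stand in for DeletionVectorData, DeletionVectorResult, aggColumns and outputColumns. It needs spark-sql on the classpath.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list}

// Hypothetical stand-ins for DeletionVectorData and DeletionVectorResult
case class RowsToDelete(filePath: String, deletionVectorId: String, rowIndexes: Seq[Long])
case class StoredVector(filePath: String, cardinality: Long)

object ComputeResultSketch {
  // Hypothetical mapper: a function over a whole partition (an Iterator),
  // which is the shape Dataset.mapPartitions expects
  def mapper(): Iterator[RowsToDelete] => Iterator[StoredVector] =
    rows => rows.map(r => StoredVector(r.filePath, r.rowIndexes.size.toLong))

  def run(spark: SparkSession): Seq[StoredVector] = {
    import spark.implicits._

    // Assumed sample input: one record per row to be marked as deleted
    val target = Seq(
      ("part-0.parquet", "dv-0", 1L),
      ("part-0.parquet", "dv-0", 5L),
      ("part-1.parquet", "dv-1", 7L)
    ).toDF("filePath", "deletionVectorId", "rowIndex")

    val aggregated: Dataset[RowsToDelete] = target
      .groupBy(col("filePath"), col("deletionVectorId"))   // Dataset.groupBy
      .agg(collect_list(col("rowIndex")).as("rowIndexes")) // assumed aggregation
      .select(col("filePath"), col("deletionVectorId"), col("rowIndexes")) // Dataset.select
      .as[RowsToDelete]

    aggregated
      .mapPartitions(mapper()) // Dataset.mapPartitions (once per partition)
      .collect()               // Dataset.collect (results come back to the driver)
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    try run(spark).foreach(println) finally spark.stop()
  }
}
```

Since collect brings every result to the driver, this pattern is a good fit only when the aggregated output is small (here, one record per data file).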
computeResult is used when:
- DeletionVectorBitmapGenerator is requested to build the deletion vectors
bitmapStorageMapper¶
bitmapStorageMapper(): Iterator[DeletionVectorData] => Iterator[DeletionVectorResult]
Dataset.mapPartitions
bitmapStorageMapper is executed on every partition using the Dataset.mapPartitions operator.
bitmapStorageMapper calls getRandomPrefixLength (with the metadata of this OptimisticTransaction).
In the end, bitmapStorageMapper calls createMapperToStoreDeletionVectors (for the data directory of this target delta table).
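The shape of such a mapper (a function that captures its configuration once and is then applied to the iterator of every partition) can be sketched in plain Scala. Everything below is hypothetical and only illustrates the pattern: BitmapData, BitmapResult, createMapper, the random-prefix directory layout and the file name are assumptions, not the Delta Lake API or its storage format.

```scala
import java.nio.file.{Path, Paths}
import scala.util.Random

// Hypothetical simplified stand-ins for DeletionVectorData and DeletionVectorResult
case class BitmapData(filePath: String, rowIndexes: Seq[Long])
case class BitmapResult(filePath: String, storedAt: Path, cardinality: Long)

object MapperSketch {
  // Captures the data directory and prefix length up front and returns
  // a function with the signature that Dataset.mapPartitions expects
  def createMapper(
      dataDir: Path,
      prefixLength: Int,
      rng: Random): Iterator[BitmapData] => Iterator[BitmapResult] = {
    partition =>
      partition.map { data =>
        // Assumed layout: an optional random prefix directory under the data directory
        val prefix = rng.alphanumeric.take(prefixLength).mkString
        val dir = if (prefix.isEmpty) dataDir else dataDir.resolve(prefix)
        // Hypothetical file name; nothing is written to disk in this sketch
        val target = dir.resolve("deletion_vector.bin")
        BitmapResult(data.filePath, target, data.rowIndexes.size.toLong)
      }
  }

  def main(args: Array[String]): Unit = {
    val mapper = createMapper(Paths.get("/tmp/table"), prefixLength = 2, new Random(42))
    mapper(Iterator(BitmapData("part-0.parquet", Seq(1L, 5L)))).foreach(println)
  }
}
```

Returning a function rather than doing the work directly keeps the driver-side setup (reading the prefix length, resolving the data directory) separate from the per-partition work that runs on executors.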
getRandomPrefixLength¶
getRandomPrefixLength(
metadata: Metadata): Int
getRandomPrefixLength
...FIXME