
DeletionVectorSet

Creating Instance

DeletionVectorSet takes the following to be created:

DeletionVectorSet is created when:

Building Deletion Vectors

computeResult(): Seq[DeletionVectorResult]
computeResult relies heavily on Spark SQL and uses the following high-level operators to do its job:

  • Dataset.groupBy
  • Dataset.mapPartitions
  • Dataset.select
  • Dataset.collect

computeResult groups the rows of the given target DataFrame by the filePath and deletionVectorId columns and runs the aggColumns aggregation on every group.
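The grouping step can be sketched without Spark, on a local Scala Seq. The sketch assumes, for illustration only, that every row of the target DataFrame carries the index of the row within its data file, and that the aggregation collects those indexes into a set along with their count. MatchedRow, FileAggregate and GroupBySketch are hypothetical names, and a sorted set stands in for whatever bitmap representation aggColumns actually produces.

```scala
import scala.collection.immutable.SortedSet

// Hypothetical input row: a matched row of the target table with its row index in the data file.
final case class MatchedRow(filePath: String, deletionVectorId: Option[String], rowIndex: Long)

// Hypothetical per-group result, standing in for the output of the aggColumns aggregation.
final case class FileAggregate(
    filePath: String,
    deletionVectorId: Option[String],
    rowIndexes: SortedSet[Long],
    cardinality: Long)

object GroupBySketch {
  // Same idea as groupBy(filePath, deletionVectorId) followed by an aggregation, on a local Seq.
  def aggregate(rows: Seq[MatchedRow]): Seq[FileAggregate] =
    rows
      .groupBy(r => (r.filePath, r.deletionVectorId))
      .toSeq
      .map { case ((path, dvId), group) =>
        // A set (not a list) because a row index is marked as deleted at most once.
        val indexes = SortedSet.from(group.map(_.rowIndex))
        FileAggregate(path, dvId, indexes, indexes.size.toLong)
      }
      // groupBy returns an unordered Map, so sort for a deterministic result.
      .sortBy(a => (a.filePath, a.deletionVectorId.getOrElse("")))
}
```

Duplicate row indexes within a group collapse into one entry, which is why the count is taken from the set rather than from the number of input rows.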

computeResult selects the outputColumns.

In the end, computeResult executes the function returned by bitmapStorageMapper on every partition (using Dataset.mapPartitions) and collects the result.
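The last two steps (a function applied once per partition, then collect) can be sketched with a hypothetical LocalDataset that models a Dataset as one Seq of rows per partition. DvRow, StoredDv, ComputeResultSketch and the per-partition file naming are all made up for illustration; they only show the kind of per-partition setup that mapPartitions allows, not what Delta Lake actually writes.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for a partitioned Dataset: one Seq of rows per partition.
final case class LocalDataset[A](partitions: Seq[Seq[A]]) {
  // Like Dataset.mapPartitions: f is applied once per partition to an iterator over its rows.
  def mapPartitions[B](f: Iterator[A] => Iterator[B]): LocalDataset[B] =
    LocalDataset(partitions.map(p => f(p.iterator).toList))

  // Like Dataset.collect: the rows of all partitions end up in one local collection.
  def collect(): Seq[A] = partitions.flatten
}

// Hypothetical row types (not Delta Lake's DeletionVectorData / DeletionVectorResult).
final case class DvRow(filePath: String, deletedRowCount: Long)
final case class StoredDv(filePath: String, storedIn: String, deletedRowCount: Long)

object ComputeResultSketch {
  // Returns the per-partition function; each partition "writes" to its own hypothetical file.
  def storeMapper(): Iterator[DvRow] => Iterator[StoredDv] = {
    val partitionCounter = new AtomicInteger(0)
    rows => {
      // Evaluated once per partition, before any row of that partition is processed.
      val file = s"deletion_vector_${partitionCounter.getAndIncrement()}.bin"
      rows.map(r => StoredDv(r.filePath, file, r.deletedRowCount))
    }
  }

  def computeResult(ds: LocalDataset[DvRow]): Seq[StoredDv] =
    ds.mapPartitions(storeMapper()).collect()
}
```

Rows from the same partition share one output location, while rows from different partitions do not.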


computeResult is used when:

bitmapStorageMapper

bitmapStorageMapper(): Iterator[DeletionVectorData] => Iterator[DeletionVectorResult]
The function that bitmapStorageMapper returns is executed on every partition using the Dataset.mapPartitions operator.

bitmapStorageMapper uses getRandomPrefixLength to determine the random prefix length from the Metadata of this OptimisticTransaction.

In the end, bitmapStorageMapper calls createMapperToStoreDeletionVectors for the data directory of the target delta table (along with the random prefix length) to create the mapper function.
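The shape of bitmapStorageMapper (look up the random prefix length once, then pass it to a factory that returns the per-partition function) can be sketched in plain Scala. All names are hypothetical: the prefix length is read from a made-up randomPrefixLength property, and the file layout (the prefix as a subdirectory of the data directory, one file per partition) is an assumption for illustration, not necessarily what createMapperToStoreDeletionVectors does.

```scala
import java.util.UUID
import scala.util.Random

// Hypothetical stand-ins; only what the sketch needs is modelled.
final case class SketchMetadata(configuration: Map[String, String])
final case class DvData(filePath: String, deletedRowIndexCount: Long)
final case class DvResult(filePath: String, dvFile: String, matchedRowCount: Long)

object BitmapStorageMapperSketch {
  // Hypothetical: the prefix length comes from a made-up table property (0 = no prefix).
  def prefixLength(metadata: SketchMetadata): Int =
    metadata.configuration.get("randomPrefixLength").map(_.toInt).getOrElse(0)

  // Hypothetical stand-in for createMapperToStoreDeletionVectors:
  // returns the function that runs on every partition.
  def createMapper(dataPath: String, prefixLen: Int): Iterator[DvData] => Iterator[DvResult] =
    rows => {
      // One file per partition; a random alphanumeric prefix (if any) becomes a subdirectory.
      val dir =
        if (prefixLen > 0) s"$dataPath/${Random.alphanumeric.take(prefixLen).mkString}"
        else dataPath
      val file = s"$dir/deletion_vector_${UUID.randomUUID()}.bin"
      rows.map(d => DvResult(d.filePath, file, d.deletedRowIndexCount))
    }

  // Mirrors the structure described above: compute the prefix length once,
  // then hand it (with the data directory) to the mapper factory.
  def bitmapStorageMapper(metadata: SketchMetadata, dataPath: String): Iterator[DvData] => Iterator[DvResult] = {
    val len = prefixLength(metadata)
    createMapper(dataPath, len)
  }
}
```

In a Spark job, this split means the table-level lookup happens once on the driver, and only the resulting Int and path are captured by the closure that runs on the executors.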

getRandomPrefixLength

getRandomPrefixLength(
  metadata: Metadata): Int

getRandomPrefixLength...FIXME
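A minimal sketch of what getRandomPrefixLength could compute, assuming it follows the delta.randomizeFilePrefixes and delta.randomPrefixLength table properties (with a default length of 2 when randomization is enabled but no length is set). TableMetadata and RandomPrefixSketch are hypothetical, and the sketch works on a plain Map of table properties rather than on Delta Lake's Metadata.

```scala
// Hypothetical stand-in for the table Metadata: only the table properties are modelled.
final case class TableMetadata(configuration: Map[String, String])

object RandomPrefixSketch {
  // Assumed property keys and defaults.
  val RandomizeFilePrefixes = "delta.randomizeFilePrefixes" // assumed default: false
  val RandomPrefixLength = "delta.randomPrefixLength"       // assumed default: 2

  def getRandomPrefixLength(metadata: TableMetadata): Int = {
    val conf = metadata.configuration
    val randomize = conf.get(RandomizeFilePrefixes).exists(_.trim.toBoolean)
    // No random prefix at all unless randomization is switched on.
    if (randomize) conf.get(RandomPrefixLength).map(_.trim.toInt).getOrElse(2) else 0
  }
}
```

Under these assumptions, a length property on its own has no effect; it only matters once file prefix randomization is enabled.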