
DeletionVectorBitmapGenerator

buildRowIndexSetsForFilesMatchingCondition

buildRowIndexSetsForFilesMatchingCondition(
  sparkSession: SparkSession,
  txn: OptimisticTransaction,
  tableHasDVs: Boolean,
  targetDf: DataFrame,
  candidateFiles: Seq[AddFile],
  condition: Expression,
  fileNameColumnOpt: Option[Column] = None,
  rowIndexColumnOpt: Option[Column] = None): Seq[DeletionVectorResult]

buildRowIndexSetsForFilesMatchingCondition adds the following columns to the input targetDf DataFrame:

Column Name | Column
filePath | The given fileNameColumnOpt if specified or _metadata.file_path
rowIndexCol | The given rowIndexColumnOpt if specified or one of the following based on spark.databricks.delta.deletionVectors.useMetadataRowIndex:
deletionVectorId | Depends on the given tableHasDVs flag:
  • For a table with deletion vectors, buildRowIndexSetsForFilesMatchingCondition...FIXME...the DeletionVectorDescriptors of the given candidateFiles
  • Otherwise, null (undefined)

In the end, buildRowIndexSetsForFilesMatchingCondition builds the deletion vectors (for the modified targetDf DataFrame).
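
The overall idea (tag every row with its file path and row index, keep the rows that match the condition, and collect the matching row indexes per file) can be modelled in plain Scala. This is only a conceptual sketch, not Delta Lake's implementation: ordinary collections stand in for the targetDf DataFrame, and TargetRow, RowIndexSet, RowIndexSetsSketch and the existingDvIds lookup are hypothetical names introduced for illustration. How the deletion vector of a candidate file is actually attached is not described above, so the sketch simply assumes a lookup keyed by file path.

```scala
// Conceptual model only: plain Scala collections stand in for the Spark DataFrame.
// All names below (TargetRow, RowIndexSet, RowIndexSetsSketch) are hypothetical.
final case class TargetRow(filePath: String, rowIndex: Long, value: Int)

final case class RowIndexSet(
    filePath: String,
    deletionVectorId: Option[String], // None models the null (undefined) case
    rowIndexes: Set[Long])

object RowIndexSetsSketch {
  def build(
      rows: Seq[TargetRow],
      tableHasDVs: Boolean,
      existingDvIds: Map[String, String], // assumed: file path -> id of the file's current deletion vector
      condition: TargetRow => Boolean): Seq[RowIndexSet] = {
    rows
      .filter(condition)   // keep only the rows matching the condition
      .groupBy(_.filePath) // one group per data file
      .map { case (path, matched) =>
        // Without deletion vectors on the table, the id stays undefined.
        val dvId = if (tableHasDVs) existingDvIds.get(path) else None
        RowIndexSet(path, dvId, matched.map(_.rowIndex).toSet)
      }
      .toSeq
      .sortBy(_.filePath) // deterministic order for the example
  }
}
```

Calling RowIndexSetsSketch.build with a predicate such as _.value > 20 gives one RowIndexSet per file that has at least one matching row. Files without matches do not appear in the result.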


buildRowIndexSetsForFilesMatchingCondition is used when:

Building Deletion Vectors

buildDeletionVectors(
  spark: SparkSession,
  target: DataFrame,
  targetDeltaLog: DeltaLog,
  deltaTxn: OptimisticTransaction): Seq[DeletionVectorResult]

buildDeletionVectors creates a new DeletionVectorSet to build the deletion vectors.