AutoCompactUtils

prepareAutoCompactRequest

prepareAutoCompactRequest(
  spark: SparkSession,
  txn: OptimisticTransactionImpl,
  postCommitSnapshot: Snapshot,
  partitionsAddedToOpt: Option[PartitionKeySet],
  opType: String,
  maxDeletedRowsRatio: Option[Double]): AutoCompactRequest

prepareAutoCompactRequest creates an AutoCompactRequest from the partitions returned by reserveTablePartitions and a partition predicate (created for the given postCommitSnapshot and the reserved partitions).


prepareAutoCompactRequest is used when:

createPartitionPredicate

createPartitionPredicate(
  postCommitSnapshot: Snapshot,
  partitions: PartitionKeySet): Seq[Expression]

createPartitionPredicate...FIXME

reserveTablePartitions

reserveTablePartitions(
  spark: SparkSession,
  deltaLog: DeltaLog,
  postCommitSnapshot: Snapshot,
  partitionsAddedToOpt: Option[PartitionKeySet],
  opType: String,
  maxDeletedRowsRatio: Option[Double]): (Boolean, PartitionKeySet)

maxDeletedRowsRatio always undefined (None)

maxDeletedRowsRatio is always None, since that is the value prepareAutoCompactRequest is called with (when compacting if necessary).

opType always delta.commit.hooks.autoOptimize

opType is always delta.commit.hooks.autoOptimize.

partitionsAddedToOpt

partitionsAddedToOpt is the set of distinct partitions that contain files added by the current transaction.
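
As a rough illustration of "distinct partitions that contain added files", the following sketch collects partition keys from a list of added files. `AddedFile`, `partitionsAddedTo` and the alias `PartitionKey = Map[String, String]` are assumptions made for this example only and are not taken from the Delta Lake sources.

```scala
object PartitionsAddedToSketch {
  // Assumption: a partition key maps partition column names to their values.
  type PartitionKey = Map[String, String]

  // Hypothetical, reduced stand-in for a file added by a transaction.
  final case class AddedFile(path: String, partitionValues: PartitionKey)

  // Collects the distinct partitions the added files belong to.
  def partitionsAddedTo(added: Seq[AddedFile]): Set[PartitionKey] =
    added.map(_.partitionValues).toSet
}
```

Two files added to the same partition contribute a single entry, so the size of the result is the number of touched partitions, not the number of added files.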

Noop when the given partitionsAddedToOpt is empty

reserveTablePartitions does nothing and exits early (noop) when the given partitionsAddedToOpt is empty.

reserveTablePartitions returns (false, Set.empty[PartitionKey]).

reserveTablePartitions finds free partitions to perform auto compaction on based on the two internal flags:

When both flags are enabled, reserveTablePartitions uses filterFreePartitions to find the free partitions. Otherwise, the given partitionsAddedToOpt is used as-is.

reserveTablePartitions does nothing (noop) when there is no free partition. reserveTablePartitions returns (false, Set.empty[PartitionKey]).

reserveTablePartitions runs choosePartitionsBasedOnMinNumSmallFiles with the free partitions.

With shouldCompactBasedOnNumFiles enabled and chosenPartitionsBasedOnNumFiles empty, reserveTablePartitions does nothing more and returns (true, Set.empty[PartitionKey]).
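
The early exits described so far can be sketched with plain Scala collections. This is a simplified model under stated assumptions, not the Delta Lake implementation: `earlyExitOrContinue`, `Chosen`, `bothFlagsEnabled`, `filterFree` and `chooseByNumFiles` are hypothetical names, an "empty" partitionsAddedToOpt is modelled as `None`, and `PartitionKey` is assumed to be `Map[String, String]`. A `Left` carries the returned pair of an early exit, while a `Right` carries the free partitions for the steps that follow.

```scala
object ReservePartitionsSketch {
  // Assumption: a partition key maps partition column names to their values.
  type PartitionKey = Map[String, String]
  type PartitionKeySet = Set[PartitionKey]

  // Hypothetical, reduced form of ChosenPartitionsResult.
  final case class Chosen(
      shouldCompactBasedOnNumFiles: Boolean,
      chosenPartitions: PartitionKeySet)

  def earlyExitOrContinue(
      partitionsAddedToOpt: Option[PartitionKeySet],
      bothFlagsEnabled: Boolean, // stands in for the two internal flags
      filterFree: PartitionKeySet => PartitionKeySet,
      chooseByNumFiles: PartitionKeySet => Chosen
  ): Either[(Boolean, PartitionKeySet), PartitionKeySet] = {
    val noop = (false, Set.empty[PartitionKey])
    partitionsAddedToOpt match {
      // Nothing was added: noop.
      case None => Left(noop)
      case Some(added) =>
        // Filter only when both flags are enabled; otherwise use the input as-is.
        val free = if (bothFlagsEnabled) filterFree(added) else added
        if (free.isEmpty) {
          // No free partition: noop.
          Left(noop)
        } else {
          val chosen = chooseByNumFiles(free)
          if (chosen.shouldCompactBasedOnNumFiles && chosen.chosenPartitions.isEmpty) {
            Left((true, Set.empty[PartitionKey]))
          } else {
            // The remaining steps are not modelled here.
            Right(free)
          }
        }
    }
  }
}
```

Returning an `Either` keeps the two documented result pairs separate from the part of the flow this sketch leaves out.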

reserveTablePartitions runs choosePartitionsBasedOnDVs with the free partitions.

reserveTablePartitions...FIXME

choosePartitionsBasedOnMinNumSmallFiles

choosePartitionsBasedOnMinNumSmallFiles(
  spark: SparkSession,
  deltaLog: DeltaLog,
  postCommitSnapshot: Snapshot,
  freePartitionsAddedTo: PartitionKeySet): ChosenPartitionsResult

choosePartitionsBasedOnMinNumSmallFiles...FIXME

isQualifiedForAutoCompact

isQualifiedForAutoCompact(
  spark: SparkSession,
  txn: OptimisticTransactionImpl): Boolean

isQualifiedForAutoCompact is disabled (false) when there is no transaction commit (i.e., no txnExecutionTimeMs in the given OptimisticTransactionImpl).

isQualifiedForAutoCompact is enabled (true) when isModifiedPartitionsOnlyAutoCompactEnabled is disabled.

Otherwise, isQualifiedForAutoCompact is enabled (true) if either of the following holds:

  1. isNonBlindAppendAutoCompactEnabled is disabled
  2. The given OptimisticTransactionImpl is not blind-append
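
The rules above can be read as a small decision function. The sketch below is a simplified model, assuming the checks apply in the order they are listed. `TxnView` and `isQualified` are hypothetical names, the `Option[Long]` type of `txnExecutionTimeMs` is an assumption, and the two Boolean parameters stand in for the values of isModifiedPartitionsOnlyAutoCompactEnabled and isNonBlindAppendAutoCompactEnabled.

```scala
object QualifiedForAutoCompactSketch {
  // Hypothetical, reduced view of what is read from OptimisticTransactionImpl.
  final case class TxnView(
      txnExecutionTimeMs: Option[Long], // None models "no transaction commit"
      isBlindAppend: Boolean)

  def isQualified(
      txn: TxnView,
      modifiedPartitionsOnlyEnabled: Boolean,
      nonBlindAppendEnabled: Boolean): Boolean = {
    if (txn.txnExecutionTimeMs.isEmpty) {
      // No transaction commit: disabled.
      false
    } else if (!modifiedPartitionsOnlyEnabled) {
      // Modified-partitions-only mode is off: enabled, no further checks.
      true
    } else {
      // Enabled if the non-blind-append flag is off or the transaction is not a blind append.
      !nonBlindAppendEnabled || !txn.isBlindAppend
    }
  }
}
```

Under this reading, the only committed transaction that does not qualify is a blind append with both flags enabled.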

isQualifiedForAutoCompact is used when:

isNonBlindAppendAutoCompactEnabled

isNonBlindAppendAutoCompactEnabled(
  spark: SparkSession): Boolean

isNonBlindAppendAutoCompactEnabled is the value of spark.databricks.delta.autoCompact.nonBlindAppend.enabled configuration property (in the given SparkSession).

isModifiedPartitionsOnlyAutoCompactEnabled

isModifiedPartitionsOnlyAutoCompactEnabled(
  spark: SparkSession): Boolean

isModifiedPartitionsOnlyAutoCompactEnabled says whether Auto Compaction should run on modified partitions only.


isModifiedPartitionsOnlyAutoCompactEnabled is the value of spark.databricks.delta.autoCompact.modifiedPartitionsOnly.enabled configuration property (in the given SparkSession).


isModifiedPartitionsOnlyAutoCompactEnabled is used when: