
AutoCompactBase

AutoCompactBase is an extension of the PostCommitHook abstraction for post-commit hooks that perform auto compaction.

Implementations

Name

PostCommitHook
name: String

name is part of the PostCommitHook abstraction.

name is Auto Compact.

Executing Post-Commit Hook

PostCommitHook
run(
  spark: SparkSession,
  txn: OptimisticTransactionImpl,
  committedVersion: Long,
  postCommitSnapshot: Snapshot,
  actions: Seq[Action]): Unit

run is part of the PostCommitHook abstraction.

run determines whether Auto Compaction is enabled or not.

run returns immediately (and hence skips auto compaction) when shouldSkipAutoCompact is true.

In the end, run executes compactIfNecessary with the following:

  • delta.commit.hooks.autoOptimize operation name
  • maxDeletedRowsRatio unspecified (None)
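The control flow of run can be sketched as follows. This is a minimal model, not the actual implementation: RunSketch, the stand-in AutoCompactType and the function parameters shouldSkip and compactIfNecessary are hypothetical simplifications that replace SparkSession, OptimisticTransactionImpl and Snapshot.

```scala
object RunSketch {
  // Hypothetical stand-in for the real AutoCompactType
  sealed trait AutoCompactType
  case object Enabled extends AutoCompactType

  // Operation name that run passes on, as listed above
  val OpType = "delta.commit.hooks.autoOptimize"

  /** Returns true when compactIfNecessary was invoked, false when skipped. */
  def run(
      autoCompactTypeOpt: Option[AutoCompactType],
      shouldSkip: Option[AutoCompactType] => Boolean,
      compactIfNecessary: (String, Option[Double]) => Unit): Boolean = {
    if (shouldSkip(autoCompactTypeOpt)) {
      // Nothing to do: auto compaction is skipped
      false
    } else {
      // maxDeletedRowsRatio is left unspecified (None)
      compactIfNecessary(OpType, None)
      true
    }
  }
}
```

Passing the two steps in as functions keeps the sketch independent of Spark and makes the early return easy to observe.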

Compacting If Necessary

compactIfNecessary(
  spark: SparkSession,
  txn: OptimisticTransactionImpl,
  postCommitSnapshot: Snapshot,
  opType: String,
  maxDeletedRowsRatio: Option[Double]): Seq[OptimizeMetrics]

maxDeletedRowsRatio is always undefined (None).

compactIfNecessary prepares an AutoCompactRequest to determine whether to perform auto compaction or not (based on shouldCompact flag of the AutoCompactRequest).

With shouldCompact flag enabled, compactIfNecessary performs auto compaction. Otherwise, compactIfNecessary returns no OptimizeMetrics.
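The decision above boils down to a single branch on the shouldCompact flag. The following sketch assumes simplified, hypothetical versions of AutoCompactRequest and OptimizeMetrics (the real classes carry more fields) and takes the compaction step as a function.

```scala
object CompactIfNecessarySketch {
  // Hypothetical stand-ins; only the shouldCompact flag matters here
  final case class AutoCompactRequest(shouldCompact: Boolean)
  final case class OptimizeMetrics(numFilesAdded: Long, numFilesRemoved: Long)

  def compactIfNecessary(
      request: AutoCompactRequest,
      compact: () => Seq[OptimizeMetrics]): Seq[OptimizeMetrics] =
    if (request.shouldCompact) compact() // perform auto compaction
    else Seq.empty                       // no OptimizeMetrics
}
```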

getAutoCompactType

getAutoCompactType(
  conf: SQLConf,
  metadata: Metadata): Option[AutoCompactType]
Return Type

Option[AutoCompactType] is the return type but it's a fancy way to say "enabled" or "not".

When getAutoCompactType returns Some[AutoCompactType] it means "enabled" while None is "disabled".

getAutoCompactType returns Some[AutoCompactType] (enabled) when any of the following is true (in the order of precedence):

  1. spark.databricks.delta.autoCompact.enabled
  2. (deprecated) delta.autoOptimize table property
  3. delta.autoOptimize.autoCompact table property

Auto compaction is disabled by default (getAutoCompactType returns None).
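The precedence rule can be modelled with a plain Map of settings. This is a sketch under stated assumptions: AutoCompactTypeSketch and its stand-in AutoCompactType are hypothetical, session configuration and table properties are merged into one Map for brevity, and "precedence" is read as "the first source that is explicitly set decides".

```scala
object AutoCompactTypeSketch {
  // Hypothetical stand-in for the real AutoCompactType
  sealed trait AutoCompactType
  case object Enabled extends AutoCompactType

  // Property names as listed above, in the order of precedence
  val Sources: Seq[String] = Seq(
    "spark.databricks.delta.autoCompact.enabled",
    "delta.autoOptimize",
    "delta.autoOptimize.autoCompact")

  def getAutoCompactType(settings: Map[String, String]): Option[AutoCompactType] = {
    // Assumption: the first explicitly-set source wins, even if it says "false"
    val firstSet: Option[String] =
      Sources.flatMap(key => settings.get(key)).headOption
    // Nothing set at all means disabled (None)
    firstSet.filter(_.trim.equalsIgnoreCase("true")).map(_ => Enabled)
  }
}
```

Returning an Option rather than a Boolean mirrors the signature above, so callers can treat None as "disabled".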

shouldSkipAutoCompact

shouldSkipAutoCompact(
  autoCompactTypeOpt: Option[AutoCompactType],
  spark: SparkSession,
  txn: OptimisticTransactionImpl): Boolean

shouldSkipAutoCompact is true when either of the following holds:

  1. The given autoCompactTypeOpt is empty (None)
  2. isQualifiedForAutoCompact is false
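The two conditions combine into a single predicate. In this sketch, ShouldSkipSketch is a hypothetical name and the result of isQualifiedForAutoCompact is passed in as a by-name Boolean instead of being computed from SparkSession and OptimisticTransactionImpl.

```scala
object ShouldSkipSketch {
  // The by-name parameter is evaluated only when autoCompactTypeOpt is defined
  def shouldSkipAutoCompact[T](
      autoCompactTypeOpt: Option[T],
      isQualifiedForAutoCompact: => Boolean): Boolean =
    autoCompactTypeOpt.isEmpty || !isQualifiedForAutoCompact
}
```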

Executing Auto Compaction

compact(
  spark: SparkSession,
  deltaLog: DeltaLog,
  catalogTable: Option[CatalogTable],
  partitionPredicates: Seq[Expression] = Nil,
  opType: String = OP_TYPE,
  maxDeletedRowsRatio: Option[Double] = None): Seq[OptimizeMetrics]

compact starts a transaction on the delta table and performs optimization.

compact requests the given DeltaLog to start a transaction.

compact creates a DeltaOptimizeContext with the value of the following configuration properties:

compact requests a new OptimizeExecutor (with no zOrderByColumns and the isAutoCompact flag enabled) to optimize.

Note

The delta table to run optimize on is passed indirectly, as the DeltaLog via the OptimisticTransaction.

In the end, compact returns the OptimizeMetrics (from the optimize stats).