
PrepareDeltaScanBase Logical Optimizations

PrepareDeltaScanBase is an extension of the Rule[LogicalPlan] (Spark SQL) abstraction for logical optimizations that run prepareDeltaScan.

Implementations

PredicateHelper

PrepareDeltaScanBase is a PredicateHelper (Spark SQL).

Executing Rule

apply(
  _plan: LogicalPlan): LogicalPlan

With the spark.databricks.delta.stats.skipping configuration property enabled, apply makes sure that the given LogicalPlan (Spark SQL) is neither a subquery (Subquery or SupportsSubquery) nor a V2WriteCommand (Spark SQL) and, if so, runs prepareDeltaScan on it.


apply is part of the Rule (Spark SQL) abstraction.
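
The guard in apply can be illustrated with a minimal sketch. It uses toy stand-ins rather than the real Spark SQL classes: Plan, Scan, SubqueryPlan, WriteCommand, and the statsSkippingEnabled parameter are all hypothetical names introduced for illustration. Returning the plan unchanged when the guard does not pass is an assumption of the sketch.

```scala
object ApplySketch {
  // Toy stand-ins for Spark SQL plan nodes; none of these are the real classes.
  sealed trait Plan
  case class Scan(table: String) extends Plan
  case class SubqueryPlan(child: Plan) extends Plan // stands in for Subquery / SupportsSubquery
  case class WriteCommand(query: Plan) extends Plan // stands in for V2WriteCommand

  // Hypothetical stand-in for prepareDeltaScan: tags the scan so the effect is visible.
  def prepareDeltaScan(plan: Plan): Plan = plan match {
    case Scan(t) => Scan(s"prepared($t)")
    case other   => other
  }

  // Rewrites the plan only when skipping is enabled and the plan is
  // neither a subquery nor a write command (assumed: otherwise unchanged).
  def apply(plan: Plan, statsSkippingEnabled: Boolean): Plan =
    if (!statsSkippingEnabled) plan
    else plan match {
      case _: SubqueryPlan | _: WriteCommand => plan
      case _                                 => prepareDeltaScan(plan)
    }
}
```

In this sketch only a plain Scan with the flag enabled is rewritten; every other combination passes through untouched.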

prepareDeltaScan

prepareDeltaScan(
  plan: LogicalPlan): LogicalPlan

prepareDeltaScan finds delta table scans (i.e. DeltaTables with TahoeLogFileIndex).

For a delta table scan, prepareDeltaScan finds a DeltaScanGenerator for the TahoeLogFileIndex.

prepareDeltaScan uses an internal deltaScans registry (of canonicalized logical scans and their Snapshots and DeltaScans) to look up the delta table scan or create a new entry.
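
The look-up-or-create behaviour of such a registry can be sketched with a mutable map and getOrElseUpdate. This is only an illustration under stated assumptions: the Snapshot and DeltaScan case classes are simplified stand-ins, scans are modeled as strings, and canonicalization is modeled as whitespace and case normalization, which is not how Spark canonicalizes plans.

```scala
import scala.collection.mutable

object RegistrySketch {
  // Hypothetical, simplified stand-ins for Snapshot and DeltaScan.
  case class Snapshot(version: Long)
  case class DeltaScan(files: Seq[String])

  // Keyed by a canonical form of the scan so that equivalent scans share one entry.
  private val deltaScans = mutable.HashMap.empty[String, (Snapshot, DeltaScan)]

  // Counts how many times an entry actually had to be computed.
  var computations = 0

  // Assumption: canonicalization is modeled as whitespace and case normalization.
  private def canonicalize(scan: String): String =
    scan.trim.toLowerCase.replaceAll("\\s+", " ")

  // Returns the cached entry for an equivalent scan, or computes and stores a new one.
  def lookupOrCreate(scan: String)(compute: => (Snapshot, DeltaScan)): (Snapshot, DeltaScan) =
    deltaScans.getOrElseUpdate(canonicalize(scan), { computations += 1; compute })
}
```

Because the second argument of getOrElseUpdate is passed by name, the potentially expensive computation runs only on a cache miss.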

prepareDeltaScan creates a PreparedDeltaFileIndex.

In the end, prepareDeltaScan runs optimizeGeneratedColumns.

getDeltaScanGenerator

getDeltaScanGenerator(
  index: TahoeLogFileIndex): DeltaScanGenerator

getDeltaScanGenerator...FIXME

getPreparedIndex

getPreparedIndex(
  preparedScan: DeltaScan,
  fileIndex: TahoeLogFileIndex): PreparedDeltaFileIndex

getPreparedIndex creates a new PreparedDeltaFileIndex (for the DeltaScan and the TahoeLogFileIndex).

getPreparedIndex requires that the partitionFilters (of the TahoeLogFileIndex) are empty or throws an AssertionError:

assertion failed: Partition filters should have been extracted by DeltaAnalysis.
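
The precondition can be reproduced in isolation with Scala's Predef.assert, which prefixes the message with "assertion failed: ". The case classes below are hypothetical, heavily simplified stand-ins for the real TahoeLogFileIndex, DeltaScan, and PreparedDeltaFileIndex, and partition filters are modeled as plain strings.

```scala
object PreparedIndexSketch {
  // Hypothetical, simplified stand-ins; the real classes carry much more state.
  case class TahoeLogFileIndex(path: String, partitionFilters: Seq[String])
  case class DeltaScan(files: Seq[String])
  case class PreparedDeltaFileIndex(scan: DeltaScan, path: String)

  def getPreparedIndex(
      preparedScan: DeltaScan,
      fileIndex: TahoeLogFileIndex): PreparedDeltaFileIndex = {
    // Throws java.lang.AssertionError when any partition filters remain.
    assert(
      fileIndex.partitionFilters.isEmpty,
      "Partition filters should have been extracted by DeltaAnalysis.")
    PreparedDeltaFileIndex(preparedScan, fileIndex.path)
  }
}
```

Calling getPreparedIndex with a non-empty partitionFilters sequence raises the AssertionError; with an empty one it returns the new index.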

filesForScan

filesForScan(
  scanGenerator: DeltaScanGenerator,
  limitOpt: Option[Int],
  projection: Seq[Attribute],
  filters: Seq[Expression],
  delta: LogicalRelation): (Snapshot, DeltaScan)

Note

The given limitOpt argument is not used.

filesForScan prints out the following INFO message to the logs:

DELTA: Filtering files for query

filesForScan determines the filters for a scan based on the generatedColumn.partitionFilterOptimization.enabled configuration property.

filesForScan requests the given DeltaScanGenerator for the Snapshot to scan and a DeltaScan (which together form the returned pair).

In the end, filesForScan prints out the following INFO message to the logs:

DELTA: Done
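
The overall shape of filesForScan (log, ask the generator for the pair, log again) can be sketched as follows. The ScanGenerator trait and its member names are assumptions made for illustration, not the real DeltaScanGenerator API. Filters are modeled as strings, the projection and delta parameters are left out, and the step that chooses the filters is omitted. Log messages are collected in a buffer instead of going to a logger so that their order can be inspected.

```scala
import scala.collection.mutable

object FilesForScanSketch {
  // Hypothetical, simplified stand-ins for Snapshot and DeltaScan.
  case class Snapshot(version: Long)
  case class DeltaScan(files: Seq[String])

  // Hypothetical generator interface; the member names are assumptions.
  trait ScanGenerator {
    def snapshotToScan: Snapshot
    def filesForScan(filters: Seq[String]): DeltaScan
  }

  // Collected instead of printed so the message order can be checked.
  val logged = mutable.ArrayBuffer.empty[String]
  private def logInfo(msg: String): Unit = { logged += msg; () }

  def filesForScan(
      scanGenerator: ScanGenerator,
      limitOpt: Option[Int], // accepted but not used
      filters: Seq[String]): (Snapshot, DeltaScan) = {
    logInfo("DELTA: Filtering files for query")
    val result = (scanGenerator.snapshotToScan, scanGenerator.filesForScan(filters))
    logInfo("DELTA: Done")
    result
  }
}
```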

optimizeGeneratedColumns

optimizeGeneratedColumns(
  scannedSnapshot: Snapshot,
  scan: LogicalPlan,
  preparedIndex: PreparedDeltaFileIndex,
  filters: Seq[Expression],
  limit: Option[Int],
  delta: LogicalRelation): LogicalPlan

optimizeGeneratedColumns...FIXME

Logging

PrepareDeltaScanBase is an abstract class and logging is configured using the logger of the implementations.