PrepareDeltaScanBase Logical Optimizations
PrepareDeltaScanBase is an extension of the Rule[LogicalPlan] (Spark SQL) abstraction for logical optimizations that prepare Delta table scans (prepareDeltaScan).
Implementations
PredicateHelper
PrepareDeltaScanBase is a PredicateHelper (Spark SQL).
Executing Rule
```scala
apply(
  _plan: LogicalPlan): LogicalPlan
```
With the spark.databricks.delta.stats.skipping configuration property enabled, apply makes sure that the given LogicalPlan (Spark SQL) is neither a subquery (Subquery or SupportsSubquery) nor a V2WriteCommand (Spark SQL) and, if so, prepares the Delta table scans (prepareDeltaScan).
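The guard described above can be sketched over a toy plan model. This is a minimal sketch, not Delta's code: Plan, Scan, Filter, Subquery and V2Write are hypothetical stand-ins for Spark's logical operators, the statsSkippingEnabled flag stands in for the configuration property, and returning the plan unchanged whenever the guard fails is an assumption of the sketch.

```scala
object ApplySketch {
  // Hypothetical, heavily simplified plan model (not Spark's LogicalPlan).
  sealed trait Plan
  final case class Scan(table: String, prepared: Boolean = false) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Subquery(child: Plan) extends Plan
  final case class V2Write(table: String, query: Plan) extends Plan

  // Stand-in for prepareDeltaScan: marks every reachable scan as prepared.
  def prepareDeltaScan(plan: Plan): Plan = plan match {
    case s: Scan             => s.copy(prepared = true)
    case Filter(cond, child) => Filter(cond, prepareDeltaScan(child))
    case other               => other
  }

  // The guard: stats skipping enabled, root is neither a subquery nor a V2 write.
  // Returning the plan unchanged otherwise is an assumption of this sketch.
  def apply(plan: Plan, statsSkippingEnabled: Boolean): Plan = {
    val isSubquery = plan.isInstanceOf[Subquery]
    val isV2Write = plan.isInstanceOf[V2Write]
    if (statsSkippingEnabled && !isSubquery && !isV2Write) prepareDeltaScan(plan)
    else plan
  }
}
```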
apply is part of the Rule (Spark SQL) abstraction.
prepareDeltaScan
```scala
prepareDeltaScan(
  plan: LogicalPlan): LogicalPlan
```
prepareDeltaScan finds Delta table scans (i.e., DeltaTables with a TahoeLogFileIndex).
For every Delta table scan, prepareDeltaScan finds a DeltaScanGenerator for the TahoeLogFileIndex (getDeltaScanGenerator).
prepareDeltaScan looks up the Delta table scan in an internal deltaScans registry (of canonicalized logical scans and their Snapshots and DeltaScans) and creates a new entry if there is none.
prepareDeltaScan creates a PreparedDeltaFileIndex (getPreparedIndex).
In the end, prepareDeltaScan calls optimizeGeneratedColumns.
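The lookup-or-create behaviour of the deltaScans registry can be sketched as a map keyed by canonicalized scans. This is a minimal sketch under stated assumptions: LogicalScan, Snapshot, DeltaScan and ScanPreparer are hypothetical stand-ins, and in place of Spark's plan canonicalization the sketch simply sorts the filter strings so that equivalent scans share one entry.

```scala
import scala.collection.mutable

object DeltaScansRegistrySketch {
  // Hypothetical stand-ins for Delta's Snapshot and DeltaScan.
  final case class Snapshot(version: Long)
  final case class DeltaScan(files: Seq[String])

  // Hypothetical logical scan: a table path plus the filters pushed into it.
  final case class LogicalScan(path: String, filters: Seq[String]) {
    // Simplified canonicalization: the order of filters does not matter.
    def canonicalized: LogicalScan = copy(filters = filters.sorted)
  }

  final class ScanPreparer(computeScan: LogicalScan => (Snapshot, DeltaScan)) {
    // Plays the role of the deltaScans registry:
    // canonicalized scan -> (Snapshot, DeltaScan).
    private val deltaScans = mutable.HashMap.empty[LogicalScan, (Snapshot, DeltaScan)]
    private var computed = 0

    // Number of times a scan actually had to be computed (registry misses).
    def computations: Int = computed

    // Look up the scan; compute and register a new entry only on a miss.
    def prepare(scan: LogicalScan): (Snapshot, DeltaScan) =
      deltaScans.getOrElseUpdate(scan.canonicalized, {
        computed += 1
        computeScan(scan)
      })
  }
}
```

Two scans that differ only in filter order hit the same entry, so the (potentially expensive) file listing runs once per distinct canonicalized scan.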
getDeltaScanGenerator
```scala
getDeltaScanGenerator(
  index: TahoeLogFileIndex): DeltaScanGenerator
```
getDeltaScanGenerator...FIXME
getPreparedIndex
```scala
getPreparedIndex(
  preparedScan: DeltaScan,
  fileIndex: TahoeLogFileIndex): PreparedDeltaFileIndex
```
getPreparedIndex creates a new PreparedDeltaFileIndex (for the DeltaScan and the TahoeLogFileIndex).
getPreparedIndex requires that the partitionFilters (of the TahoeLogFileIndex) be empty and throws an AssertionError otherwise:
```text
assertion failed: Partition filters should have been extracted by DeltaAnalysis.
```
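Scala's Predef.assert produces messages of exactly this form (it prefixes the given message with "assertion failed: "), so the requirement can be sketched as follows. The case classes are hypothetical, simplified stand-ins for DeltaScan, TahoeLogFileIndex and PreparedDeltaFileIndex, not Delta's real classes.

```scala
object PreparedIndexSketch {
  // Hypothetical, simplified stand-ins for the Delta classes named above.
  final case class DeltaScan(files: Seq[String])
  final case class TahoeLogFileIndex(path: String, partitionFilters: Seq[String])
  final case class PreparedDeltaFileIndex(scan: DeltaScan, path: String)

  def getPreparedIndex(
      preparedScan: DeltaScan,
      fileIndex: TahoeLogFileIndex): PreparedDeltaFileIndex = {
    // Predef.assert throws java.lang.AssertionError("assertion failed: " + message).
    assert(fileIndex.partitionFilters.isEmpty,
      "Partition filters should have been extracted by DeltaAnalysis.")
    PreparedDeltaFileIndex(preparedScan, fileIndex.path)
  }
}
```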
filesForScan
```scala
filesForScan(
  scanGenerator: DeltaScanGenerator,
  limitOpt: Option[Int],
  projection: Seq[Attribute],
  filters: Seq[Expression],
  delta: LogicalRelation): (Snapshot, DeltaScan)
```
Note: The given limitOpt argument is not used.
filesForScan prints out the following INFO message to the logs:
```text
DELTA: Filtering files for query
```
filesForScan determines the filters for a scan based on the generatedColumn.partitionFilterOptimization.enabled configuration property:
- If disabled, filesForScan uses the given filters expressions unchanged
- If enabled, filesForScan generates partition filters that are used alongside the given filters expressions
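The two cases above can be sketched with a toy filter model. This is a minimal sketch, not Delta's partition filter generation: the Filter ADT, the generatePartitionFilters helper and the generated partition column eventDate = CAST(eventTime AS DATE) are all hypothetical. It only shows that, with the optimization enabled, derived partition filters are appended to the given filters rather than replacing them.

```scala
object ScanFiltersSketch {
  // Hypothetical filter model: only `column >= literal` and `column = literal`.
  sealed trait Filter
  final case class GreaterOrEqual(column: String, literal: String) extends Filter
  final case class EqualTo(column: String, literal: String) extends Filter

  // Hypothetical generated partition column: eventDate = CAST(eventTime AS DATE).
  // A filter on eventTime implies a coarser filter on eventDate; taking the
  // first 10 characters of a "yyyy-MM-dd HH:mm:ss" literal yields its date.
  def generatePartitionFilters(filters: Seq[Filter]): Seq[Filter] =
    filters.collect {
      case GreaterOrEqual("eventTime", ts) => GreaterOrEqual("eventDate", ts.take(10))
      case EqualTo("eventTime", ts)        => EqualTo("eventDate", ts.take(10))
    }

  // Disabled: the given filters as-is. Enabled: the given filters plus
  // the generated partition filters.
  def filtersForScan(filters: Seq[Filter], partitionFilterOptimizationEnabled: Boolean): Seq[Filter] =
    if (partitionFilterOptimizationEnabled) filters ++ generatePartitionFilters(filters)
    else filters
}
```

Keeping the original filters is what keeps the result correct: a generated partition filter is only an implied, coarser condition that prunes partitions, while the original filter still decides which rows match.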
filesForScan requests the given DeltaScanGenerator for the Snapshot to scan and a DeltaScan (which together make up the returned pair).
In the end, filesForScan prints out the following INFO message to the logs:
```text
DELTA: Done
```
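The overall flow of filesForScan (log, ask the DeltaScanGenerator for the snapshot and the scan, log again, return the pair) can be sketched as follows. This is a sketch under stated assumptions: the DeltaScanGenerator trait here, with its snapshotToScan and filesForScan members, is a simplified hypothetical shape rather than Delta's exact API, and infoLog is a hypothetical stand-in for logInfo that records messages so they can be inspected.

```scala
import scala.collection.mutable.ArrayBuffer

object FilesForScanSketch {
  // Hypothetical stand-ins for Delta's Snapshot and DeltaScan.
  final case class Snapshot(version: Long)
  final case class DeltaScan(files: Seq[String], filters: Seq[String])

  // Simplified, hypothetical shape of a DeltaScanGenerator.
  trait DeltaScanGenerator {
    def snapshotToScan: Snapshot
    def filesForScan(filters: Seq[String]): DeltaScan
  }

  // Stand-in for logInfo that records messages so they can be inspected.
  val infoLog: ArrayBuffer[String] = ArrayBuffer.empty[String]
  private def logInfo(msg: String): Unit = infoLog += msg

  def filesForScan(
      scanGenerator: DeltaScanGenerator,
      limitOpt: Option[Int], // accepted but not used, like the original parameter
      filters: Seq[String]): (Snapshot, DeltaScan) = {
    logInfo("DELTA: Filtering files for query")
    val snapshot = scanGenerator.snapshotToScan
    val scan = scanGenerator.filesForScan(filters)
    logInfo("DELTA: Done")
    (snapshot, scan)
  }
}
```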
optimizeGeneratedColumns
```scala
optimizeGeneratedColumns(
  scannedSnapshot: Snapshot,
  scan: LogicalPlan,
  preparedIndex: PreparedDeltaFileIndex,
  filters: Seq[Expression],
  limit: Option[Int],
  delta: LogicalRelation): LogicalPlan
```
optimizeGeneratedColumns...FIXME
Logging
PrepareDeltaScanBase is an abstract class, so logging is configured using the loggers of its implementations.
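Spark's Logging trait names a logger after the runtime class (this.getClass.getName with a trailing $ stripped), which is why an abstract base class ends up logging under the name of whichever implementation is instantiated. The sketch below assumes that naming rule; Logging here is a simplified re-creation, and ScanPreparerBase and ScanPreparer are hypothetical classes.

```scala
object LoggerNameSketch {
  // Simplified re-creation of the naming rule: the logger name comes from
  // the runtime class, not from the class where the code is declared.
  trait Logging {
    protected def logName: String = this.getClass.getName.stripSuffix("$")
  }

  // Hypothetical abstract base class, in the role of PrepareDeltaScanBase.
  abstract class ScanPreparerBase extends Logging {
    def loggerName: String = logName
  }

  // Hypothetical concrete implementation.
  class ScanPreparer extends ScanPreparerBase
}
```

To see the INFO messages of such a rule, the log level has to be set for the implementation's logger, since no logger is ever named after the abstract class.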