PrepareDeltaScanBase Logical Optimizations
PrepareDeltaScanBase is an extension of the Rule[LogicalPlan] (Spark SQL) abstraction for logical optimizations that prepare Delta table scans (using prepareDeltaScan).
Implementations
PredicateHelper
PrepareDeltaScanBase is a PredicateHelper (Spark SQL).
Executing Rule
apply(
  _plan: LogicalPlan): LogicalPlan
With the spark.databricks.delta.stats.skipping configuration property enabled, apply makes sure that the given LogicalPlan (Spark SQL) is neither a subquery (Subquery or SupportsSubquery) nor a V2WriteCommand (Spark SQL). Only then does apply execute prepareDeltaScan on the plan.
apply is part of the Rule (Spark SQL) abstraction.
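The gating logic of apply can be sketched as follows. Plan, Scan, SubqueryRoot, V2Write and PreparedScan are hypothetical stand-ins for Spark SQL's LogicalPlan, Subquery/SupportsSubquery and V2WriteCommand, and the statsSkippingEnabled parameter models the spark.databricks.delta.stats.skipping property. Returning the plan unchanged in every other case is an assumption of this sketch, not a statement about the Delta Lake code.

```scala
// Hypothetical stand-ins for Spark SQL plan nodes (not the real classes).
sealed trait Plan
final case class Scan(table: String) extends Plan
// Stands in for Subquery / SupportsSubquery.
final case class SubqueryRoot(child: Plan) extends Plan
// Stands in for V2WriteCommand.
final case class V2Write(table: String, query: Plan) extends Plan
// Marks a scan that has been "prepared".
final case class PreparedScan(table: String) extends Plan

object ApplySketch {
  // Hypothetical stand-in for prepareDeltaScan: only marks a top-level Scan as prepared.
  def prepareDeltaScan(plan: Plan): Plan = plan match {
    case Scan(table) => PreparedScan(table)
    case other       => other
  }

  // statsSkippingEnabled models spark.databricks.delta.stats.skipping.
  def apply(plan: Plan, statsSkippingEnabled: Boolean): Plan = {
    val isSubquery = plan.isInstanceOf[SubqueryRoot]
    val isV2Write = plan.isInstanceOf[V2Write]
    // Assumption of this sketch: any other case returns the plan unchanged.
    if (statsSkippingEnabled && !isSubquery && !isV2Write) prepareDeltaScan(plan)
    else plan
  }
}
```

Subqueries and V2 write commands pass through untouched even with the property enabled.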
prepareDeltaScan
prepareDeltaScan(
  plan: LogicalPlan): LogicalPlan
prepareDeltaScan finds Delta table scans (that is, DeltaTables with a TahoeLogFileIndex).
For every Delta table scan, prepareDeltaScan finds a DeltaScanGenerator for the TahoeLogFileIndex (using getDeltaScanGenerator).
prepareDeltaScan uses an internal deltaScans registry (of canonicalized logical scans and their Snapshots and DeltaScans) to look up the Delta table scan or, if it is not registered yet, to create a new entry (using filesForScan).
prepareDeltaScan then creates a PreparedDeltaFileIndex (using getPreparedIndex).
In the end, prepareDeltaScan executes optimizeGeneratedColumns.
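A minimal sketch of the deltaScans registry idea, assuming a toy canonicalization (ignore the alias, sort the filters) in place of Spark's plan canonicalization. LogicalScan, SnapshotInfo, ScanResult, DeltaScansRegistry and planFiles are hypothetical names. The sketch relies on mutable.Map.getOrElseUpdate evaluating its by-name default only when the canonicalized key is missing.

```scala
import scala.collection.mutable

// Hypothetical stand-ins (not Delta Lake classes).
final case class LogicalScan(table: String, alias: String, filters: Seq[String])
final case class SnapshotInfo(version: Long)
final case class ScanResult(files: Seq[String])

// planFiles is a hypothetical function that does the (expensive) file listing.
final class DeltaScansRegistry(planFiles: LogicalScan => (SnapshotInfo, ScanResult)) {
  // Canonicalized scan -> (snapshot, scan), like the deltaScans registry.
  private val deltaScans = mutable.HashMap.empty[LogicalScan, (SnapshotInfo, ScanResult)]
  private var calls = 0

  def planningCalls: Int = calls

  // Toy canonicalization: ignore the alias and the order of filters.
  private def canonicalize(scan: LogicalScan): LogicalScan =
    scan.copy(alias = "", filters = scan.filters.sorted)

  // Look up the scan, or plan it and register a new entry.
  def lookupOrCreate(scan: LogicalScan): (SnapshotInfo, ScanResult) =
    deltaScans.getOrElseUpdate(canonicalize(scan), {
      calls += 1
      planFiles(scan)
    })
}
```

In this sketch, two scans of the same table that differ only in alias and filter order share one entry, so planFiles runs once for both.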
getDeltaScanGenerator
getDeltaScanGenerator(
  index: TahoeLogFileIndex): DeltaScanGenerator
getDeltaScanGenerator ...FIXME
getPreparedIndex
getPreparedIndex(
  preparedScan: DeltaScan,
  fileIndex: TahoeLogFileIndex): PreparedDeltaFileIndex
getPreparedIndex creates a new PreparedDeltaFileIndex (for the DeltaScan and the TahoeLogFileIndex).
getPreparedIndex requires that the partitionFilters (of the TahoeLogFileIndex) are empty; otherwise, it throws an AssertionError:
assertion failed: Partition filters should have been extracted by DeltaAnalysis.
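The precondition can be sketched with Scala's Predef.assert, which throws a java.lang.AssertionError whose message is prefixed with "assertion failed: ", matching the message above. LogFileIndex, PreparedIndex and PreparedIndexSketch are hypothetical stand-ins for TahoeLogFileIndex, PreparedDeltaFileIndex and the rule itself.

```scala
// Hypothetical stand-ins (not Delta Lake classes).
final case class LogFileIndex(path: String, partitionFilters: Seq[String])
final case class PreparedIndex(scanFiles: Seq[String], path: String)

object PreparedIndexSketch {
  def getPreparedIndex(scanFiles: Seq[String], fileIndex: LogFileIndex): PreparedIndex = {
    // Predef.assert throws java.lang.AssertionError("assertion failed: " + message)
    assert(fileIndex.partitionFilters.isEmpty,
      "Partition filters should have been extracted by DeltaAnalysis.")
    PreparedIndex(scanFiles, fileIndex.path)
  }
}
```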
filesForScan
filesForScan(
  scanGenerator: DeltaScanGenerator,
  limitOpt: Option[Int],
  projection: Seq[Attribute],
  filters: Seq[Expression],
  delta: LogicalRelation): (Snapshot, DeltaScan)
Note
The given limitOpt argument is not used.
filesForScan prints out the following INFO message to the logs:
DELTA: Filtering files for query
filesForScan determines the filters for a scan based on the generatedColumn.partitionFilterOptimization.enabled configuration property:
- If disabled, filesForScan uses the given filters expressions unchanged.
- If enabled, filesForScan generates partition filters that are used alongside the given filters expressions.
filesForScan requests the given DeltaScanGenerator for the Snapshot to scan and for a DeltaScan (with the filters determined above). The two are returned as a pair.
In the end, filesForScan prints out the following INFO message to the logs:
DELTA: Done
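How filesForScan chooses its filters can be sketched as follows, assuming a hypothetical generated partition column bucket = id / 10 over non-negative ids. Filter, Snap, PlannedScan, ScanGenerator (with its snapshotToScan and filesForScan members), generatePartitionFilters and the partitionFilterOptimizationEnabled flag are illustrative names, not the Delta Lake API. The logInfo function stands in for INFO logging.

```scala
// Hypothetical stand-ins (not Delta Lake or Spark SQL classes).
final case class Filter(column: String, op: String, value: Int)
final case class Snap(version: Long)
final case class PlannedScan(filters: Seq[Filter])

// Illustrative stand-in for DeltaScanGenerator.
trait ScanGenerator {
  def snapshotToScan: Snap
  def filesForScan(filters: Seq[Filter]): PlannedScan
}

object FilesForScanSketch {
  // Hypothetical generated partition column: bucket = id / 10 (ids assumed non-negative).
  // A filter on the source column id implies a filter on bucket.
  def generatePartitionFilters(filters: Seq[Filter]): Seq[Filter] =
    filters.collect {
      case Filter("id", op @ ("=" | ">=" | "<="), v) => Filter("bucket", op, v / 10)
    }

  def filesForScan(
      scanGenerator: ScanGenerator,
      filters: Seq[Filter],
      partitionFilterOptimizationEnabled: Boolean,
      logInfo: String => Unit): (Snap, PlannedScan) = {
    logInfo("DELTA: Filtering files for query")
    // Disabled: the given filters unchanged; enabled: add the generated partition filters.
    val filtersForScan =
      if (partitionFilterOptimizationEnabled) filters ++ generatePartitionFilters(filters)
      else filters
    val result = (scanGenerator.snapshotToScan, scanGenerator.filesForScan(filtersForScan))
    logInfo("DELTA: Done")
    result
  }
}
```

With the flag enabled, a filter id >= 42 is kept and the derived bucket >= 4 is added, so partitions that cannot match can be pruned.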
optimizeGeneratedColumns
optimizeGeneratedColumns(
  scannedSnapshot: Snapshot,
  scan: LogicalPlan,
  preparedIndex: PreparedDeltaFileIndex,
  filters: Seq[Expression],
  limit: Option[Int],
  delta: LogicalRelation): LogicalPlan
optimizeGeneratedColumns ...FIXME
Logging
PrepareDeltaScanBase is an abstract class and logging is configured using the logger of the implementations.