Skip to content

V2ScanRelationPushDown Logical Optimization

V2ScanRelationPushDown is a logical optimization (Rule[LogicalPlan]).

V2ScanRelationPushDown is a non-excludable optimization and is part of earlyScanPushDownRules.

Executing Rule

apply(plan: LogicalPlan): LogicalPlan

apply is part of the Rule abstraction.


apply runs (applies) the following optimizations on the given LogicalPlan:

  1. Creating ScanBuilders
  2. pushDownSample
  3. pushDownFilters
  4. pushDownAggregates
  5. pushDownLimits
  6. pruneColumns

Creating ScanBuilder (for DataSourceV2Relation)

createScanBuilder(
  plan: LogicalPlan): LogicalPlan

createScanBuilder transforms DataSourceV2Relations in the given LogicalPlan.

For every DataSourceV2Relation, createScanBuilder creates a ScanBuilderHolder with the following:

pushDownSample

pushDownSample(
  plan: LogicalPlan): LogicalPlan

pushDownSample transforms Sample operators in the given LogicalPlan.

pushDownFilters

pushDownFilters(
  plan: LogicalPlan): LogicalPlan

pushDownFilters transforms Filter operators over ScanBuilderHolder in the given LogicalPlan.

pushDownFilters prints out the following INFO message to the logs:

Pushing operators to [name]
Pushed Filters: [pushedFilters]
Post-Scan Filters: [postScanFilters]

pushDownAggregates

pushDownAggregates(
  plan: LogicalPlan): LogicalPlan

pushDownAggregates transforms Aggregate operators in the given LogicalPlan.

pushDownAggregates prints out the following INFO message to the logs:

Pushing operators to [name]
Pushed Aggregate Functions: [aggregateExpressions]
Pushed Group by: [groupByExpressions]
Output: [output]

pushDownLimits

pushDownLimits(
  plan: LogicalPlan): LogicalPlan

pushDownLimits transforms GlobalLimit operators in the given LogicalPlan.

pruneColumns

pruneColumns(
  plan: LogicalPlan): LogicalPlan

pruneColumns transforms Project and Filter operators over ScanBuilderHolder in the given LogicalPlan and creates DataSourceV2ScanRelation.

pruneColumns prints out the following INFO message to the logs:

Output: [output]

Logging

Enable ALL logging level for org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown logger to see what happens inside.

Add the following line to conf/log4j2.properties:

log4j.logger.org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown=ALL

Refer to Logging.