V2ScanRelationPushDown Logical Optimization¶
V2ScanRelationPushDown
is a logical optimization (Rule[LogicalPlan]
).
V2ScanRelationPushDown
is a non-excludable optimization and is part of earlyScanPushDownRules.
Executing Rule¶
apply(plan: LogicalPlan): LogicalPlan
apply
is part of the Rule abstraction.
apply
runs (applies) the following optimizations on the given LogicalPlan:
Creating ScanBuilder (for DataSourceV2Relation)¶
createScanBuilder(
plan: LogicalPlan): LogicalPlan
createScanBuilder
transforms DataSourceV2Relations in the given LogicalPlan.
For every DataSourceV2Relation
, createScanBuilder
creates a ScanBuilderHolder
with the following:
- output schema of the
DataSourceV2Relation
- The
DataSourceV2Relation
- A ScanBuilder (with the options of the
DataSourceV2Relation
) of the SupportsRead of the Table of theDataSourceV2Relation
pushDownSample¶
pushDownSample(
plan: LogicalPlan): LogicalPlan
pushDownSample
transforms Sample
operators in the given LogicalPlan.
pushDownFilters¶
pushDownFilters(
plan: LogicalPlan): LogicalPlan
pushDownFilters
transforms Filter
operators over ScanBuilderHolder
in the given LogicalPlan.
pushDownFilters
prints out the following INFO message to the logs:
Pushing operators to [name]
Pushed Filters: [pushedFilters]
Post-Scan Filters: [postScanFilters]
pushDownAggregates¶
pushDownAggregates(
plan: LogicalPlan): LogicalPlan
pushDownAggregates
transforms Aggregate operators in the given LogicalPlan.
pushDownAggregates
prints out the following INFO message to the logs:
Pushing operators to [name]
Pushed Aggregate Functions: [aggregateExpressions]
Pushed Group by: [groupByExpressions]
Output: [output]
pushDownLimits¶
pushDownLimits(
plan: LogicalPlan): LogicalPlan
pushDownLimits
transforms GlobalLimit operators in the given LogicalPlan.
pruneColumns¶
pruneColumns(
plan: LogicalPlan): LogicalPlan
pruneColumns
transforms Project and Filter
operators over ScanBuilderHolder
in the given LogicalPlan and creates DataSourceV2ScanRelation.
pruneColumns
prints out the following INFO message to the logs:
Output: [output]
Logging¶
Enable ALL
logging level for org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown
logger to see what happens inside.
Add the following line to conf/log4j2.properties
:
log4j.logger.org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown=ALL
Refer to Logging.