V2ScanRelationPushDown Logical Optimization¶
V2ScanRelationPushDown is a logical optimization (Rule[LogicalPlan]).
V2ScanRelationPushDown is a non-excludable optimization and is part of earlyScanPushDownRules.
Executing Rule¶
apply(plan: LogicalPlan): LogicalPlan
apply is part of the Rule abstraction.
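apply runs (applies) the following optimizations on the given LogicalPlan:
- createScanBuilder
- pushDownSample
- pushDownFilters
- pushDownAggregates
- pushDownLimits
- pruneColumns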
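The way the steps described in the sections of this page could be chained can be pictured with a small, self-contained sketch. It does not use Spark at all: `Plan`, `step` and `run` are hypothetical stand-ins, and the order of the steps simply follows the order of the sections on this page, which is an assumption about how apply composes them.

```scala
object PushDownPipelineSketch {
  // Hypothetical stand-in for LogicalPlan: it only records which steps touched it.
  final case class Plan(steps: Vector[String])

  // Hypothetical helper: builds a "rule" that appends its own name to the plan.
  def step(name: String): Plan => Plan =
    plan => plan.copy(steps = plan.steps :+ name)

  // Assumed order: the order in which the steps are described on this page.
  val rules: Seq[Plan => Plan] = Seq(
    step("createScanBuilder"),
    step("pushDownSample"),
    step("pushDownFilters"),
    step("pushDownAggregates"),
    step("pushDownLimits"),
    step("pruneColumns"))

  // Feeds the output of every step into the next one.
  def run(plan: Plan): Plan =
    rules.foldLeft(plan)((current, rule) => rule(current))
}
```

Calling `PushDownPipelineSketch.run(PushDownPipelineSketch.Plan(Vector.empty))` gives a plan whose `steps` list the six names in the order above, which illustrates that every step sees the result of the previous one.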
Creating ScanBuilder (for DataSourceV2Relation)¶
createScanBuilder(
plan: LogicalPlan): LogicalPlan
createScanBuilder transforms DataSourceV2Relations in the given LogicalPlan.
For every DataSourceV2Relation, createScanBuilder creates a ScanBuilderHolder with the following:
- output schema of the DataSourceV2Relation
- The DataSourceV2Relation
- A ScanBuilder (with the options of the DataSourceV2Relation) of the SupportsRead of the Table of the DataSourceV2Relation
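As a rough illustration of the transformation described above, the following toy model replaces every relation node in a tiny plan tree with a holder node. None of the types are Spark's: `Node`, `Relation`, `Builder`, `Holder` and `Filter` are hypothetical stand-ins that only mirror the three pieces of information a ScanBuilderHolder is created with.

```scala
object ScanBuilderHolderSketch {
  // Hypothetical, heavily simplified plan nodes (not Spark classes).
  sealed trait Node
  final case class Relation(
      table: String,
      output: Seq[String],
      options: Map[String, String]) extends Node
  final case class Filter(condition: String, child: Node) extends Node

  // Stand-in for a ScanBuilder created by a table with the relation's options.
  final case class Builder(table: String, options: Map[String, String])

  // Stand-in for ScanBuilderHolder: output schema, the relation, and a builder.
  final case class Holder(
      output: Seq[String],
      relation: Relation,
      builder: Builder) extends Node

  // Walks the tree and swaps every Relation for a Holder.
  def createScanBuilder(plan: Node): Node = plan match {
    case r: Relation          => Holder(r.output, r, Builder(r.table, r.options))
    case Filter(cond, child)  => Filter(cond, createScanBuilder(child))
    case h: Holder            => h
  }
}
```

In this sketch, operators above the relation (here only `Filter`) are left in place, and only the leaf is rewritten, so later steps have a builder to push operators into.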
pushDownSample¶
pushDownSample(
plan: LogicalPlan): LogicalPlan
pushDownSample transforms Sample operators in the given LogicalPlan.
pushDownFilters¶
pushDownFilters(
plan: LogicalPlan): LogicalPlan
pushDownFilters transforms Filter operators over ScanBuilderHolder in the given LogicalPlan.
pushDownFilters prints out the following INFO message to the logs:
Pushing operators to [name]
Pushed Filters: [pushedFilters]
Post-Scan Filters: [postScanFilters]
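The two lists in the message above (pushed filters and post-scan filters) can be pictured with a small sketch that splits predicates by what a source is able to evaluate. This is a toy model under stated assumptions, not Spark's code: the `Predicate` types and the `supports` check are hypothetical, and the two lists are kept disjoint for simplicity, whereas a real source may ask for a pushed filter to be evaluated again after the scan.

```scala
object FilterPushDownSketch {
  // Hypothetical predicate types (not Spark's Filter or Expression classes).
  sealed trait Predicate
  final case class EqualTo(column: String, value: Any) extends Predicate
  final case class GreaterThan(column: String, value: Int) extends Predicate
  final case class Like(column: String, pattern: String) extends Predicate

  // Hypothetical capability check: this imaginary source cannot evaluate Like.
  def supports(predicate: Predicate): Boolean = predicate match {
    case _: EqualTo | _: GreaterThan => true
    case _: Like                     => false
  }

  // Returns (pushed filters, post-scan filters).
  def split(filters: Seq[Predicate]): (Seq[Predicate], Seq[Predicate]) =
    filters.partition(supports)
}
```

With `EqualTo("id", 1)` and `Like("name", "a%")` as input, `split` places the first predicate in the pushed list and the second in the post-scan list, which then has to stay in a Filter operator above the scan.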
pushDownAggregates¶
pushDownAggregates(
plan: LogicalPlan): LogicalPlan
pushDownAggregates transforms Aggregate operators in the given LogicalPlan.
pushDownAggregates prints out the following INFO message to the logs:
Pushing operators to [name]
Pushed Aggregate Functions: [aggregateExpressions]
Pushed Group by: [groupByExpressions]
Output: [output]
pushDownLimits¶
pushDownLimits(
plan: LogicalPlan): LogicalPlan
pushDownLimits transforms GlobalLimit operators in the given LogicalPlan.
pruneColumns¶
pruneColumns(
plan: LogicalPlan): LogicalPlan
pruneColumns transforms Project and Filter operators over ScanBuilderHolder in the given LogicalPlan and creates DataSourceV2ScanRelation.
pruneColumns prints out the following INFO message to the logs:
Output: [output]
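The idea behind column pruning can be sketched as keeping only the columns that the Project and Filter operators above the scan refer to. The function below is a minimal sketch under that assumption. It is not Spark's implementation, and `prune` together with its parameters is a hypothetical name introduced for illustration.

```scala
object ColumnPruningSketch {
  // relationOutput: all columns the relation can produce, in relation order.
  // projected:      columns referenced by the Project operator.
  // filterRefs:     columns referenced by Filter conditions.
  def prune(
      relationOutput: Seq[String],
      projected: Set[String],
      filterRefs: Set[String]): Seq[String] = {
    val required = projected ++ filterRefs
    // Keep the relation's own column order and drop everything not required.
    relationOutput.filter(required.contains)
  }
}
```

For a relation with columns `id`, `name`, `age` and `city`, a projection of `name` and a filter on `age`, `prune` returns `name` and `age`. The filter column is kept even though it is not projected, because the condition still has to be evaluated.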
Logging¶
Enable ALL logging level for org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown logger to see what happens inside.
Add the following lines to conf/log4j2.properties:
logger.V2ScanRelationPushDown.name = org.apache.spark.sql.execution.datasources.v2.V2ScanRelationPushDown
logger.V2ScanRelationPushDown.level = all
Refer to Logging.