DataSourceV2Strategy Execution Planning Strategy

DataSourceV2Strategy is an execution planning strategy.

| Logical Operator | Physical Operator |
|---|---|
| DataSourceV2ScanRelation with V1Scan | RowDataSourceScanExec |
| DataSourceV2ScanRelation | BatchScanExec |
| StreamingDataSourceV2Relation | |
| WriteToDataSourceV2 (Spark Structured Streaming) | WriteToDataSourceV2Exec (Spark Structured Streaming) |
| CreateTableAsSelect | AtomicCreateTableAsSelectExec or CreateTableAsSelectExec |
| RefreshTable | RefreshTableExec |
| ReplaceTable | AtomicReplaceTableExec or ReplaceTableExec |
| ReplaceTableAsSelect | AtomicReplaceTableAsSelectExec or ReplaceTableAsSelectExec |
| AppendData | AppendDataExecV1 or AppendDataExec |
| OverwriteByExpression with a DataSourceV2Relation | OverwriteByExpressionExecV1 or OverwriteByExpressionExec |
| OverwritePartitionsDynamic | OverwritePartitionsDynamicExec |
| DeleteFromTable with DataSourceV2ScanRelation | DeleteFromTableExec |
| WriteToContinuousDataSource | WriteToContinuousDataSourceExec |
| DescribeNamespace | DescribeNamespaceExec |
| DescribeRelation | DescribeTableExec |
| DropTable | DropTableExec |
| NoopDropTable | LocalTableScanExec |
| AlterTable | AlterTableExec |
| others | |

Creating Instance

DataSourceV2Strategy takes the following to be created:

DataSourceV2Strategy is created when:

Executing Rule

apply(
  plan: LogicalPlan): Seq[SparkPlan]

apply is part of the GenericStrategy abstraction.

apply branches off per the type of the given logical operator.
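
The branching can be pictured as a single pattern match that returns a sequence of candidate physical operators. The following is a minimal, self-contained sketch of that shape only. It does not use Spark's classes: every `Toy*` name is a hypothetical stand-in introduced for illustration, and returning `Nil` for unmatched operators is an assumption about how a strategy signals "not handled".

```scala
// Hypothetical stand-ins for logical operators (NOT Spark classes)
sealed trait ToyLogicalPlan
case class ToyScanRelation(table: String) extends ToyLogicalPlan
case class ToyDropTable(table: String) extends ToyLogicalPlan
case class ToyOther(name: String) extends ToyLogicalPlan

// Hypothetical stand-ins for physical operators (NOT Spark classes)
sealed trait ToyPhysicalPlan
case class ToyBatchScanExec(table: String) extends ToyPhysicalPlan
case class ToyDropTableExec(table: String) extends ToyPhysicalPlan

object ToyStrategy {
  // Mirrors the shape of apply(plan: LogicalPlan): Seq[SparkPlan]:
  // one case per supported logical operator type, each producing
  // the matching physical operator.
  def apply(plan: ToyLogicalPlan): Seq[ToyPhysicalPlan] = plan match {
    case ToyScanRelation(table) => ToyBatchScanExec(table) :: Nil
    case ToyDropTable(table)    => ToyDropTableExec(table) :: Nil
    // Assumption: an empty sequence means this strategy does not
    // handle the operator, leaving it to other strategies.
    case _                      => Nil
  }
}
```

With these toy types, `ToyStrategy(ToyScanRelation("t"))` yields a one-element sequence holding `ToyBatchScanExec("t")`, while `ToyStrategy(ToyOther("x"))` yields an empty sequence.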

Logging

Enable ALL logging level for org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy logger to see what happens inside.

Add the following lines to conf/log4j2.properties:

logger.DataSourceV2Strategy.name = org.apache.spark.sql.execution.datasources.v2.DataSourceV2Strategy
logger.DataSourceV2Strategy.level = all

Refer to Logging.