= WriteToContinuousDataSourceExec Physical Operator
`WriteToContinuousDataSourceExec` is a unary physical operator (Spark SQL) that creates and executes a `ContinuousWriteRDD` when <<doExecute, executed>>.

`WriteToContinuousDataSourceExec` is <<creating-instance, created>> exclusively when the `DataSourceV2Strategy` (Spark SQL) execution planning strategy is requested to plan a `WriteToContinuousDataSource` unary logical operator.
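For reference, that planning step could be sketched as follows (a minimal sketch assuming the Spark 2.4-era `DataSourceV2Strategy` pattern-matching style; `planLater` is the standard Catalyst helper that defers planning of the child logical plan):

[source, scala]
----
// DataSourceV2Strategy (sketch, not the verbatim Spark source): a WriteToContinuousDataSource
// logical operator is planned as a WriteToContinuousDataSourceExec over the planned child
case WriteToContinuousDataSource(writer, query) =>
  WriteToContinuousDataSourceExec(writer, planLater(query)) :: Nil
----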
[[creating-instance]]
`WriteToContinuousDataSourceExec` takes the following to be created:

- [[writer]] `StreamWriter`
- [[query]][[child]] Child physical operator (`SparkPlan`)
[[output]]
`WriteToContinuousDataSourceExec` uses an empty output schema (which is exactly to say that no output is expected whatsoever).
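What an empty output schema means in code could be sketched as follows (a sketch only, assuming the operator simply overrides `SparkPlan.output`):

[source, scala]
----
import org.apache.spark.sql.catalyst.expressions.Attribute

// No output attributes -- this sink-side operator produces no rows for downstream operators (sketch)
override def output: Seq[Attribute] = Nil
----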
[[logging]]
[TIP]
====
Enable `ALL` logging level for the `org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec` logger to see what happens inside.

Add the following line to `conf/log4j.properties`:

----
log4j.logger.org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec=ALL
----

Refer to <<spark-sql-streaming-logging.adoc#, Logging>>.
====
=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method
[source, scala]
----
doExecute(): RDD[InternalRow]
----
NOTE: `doExecute` is part of the `SparkPlan` contract to generate the runtime representation of a physical operator as a distributed computation over internal binary rows on Apache Spark (i.e. `RDD[InternalRow]`).
`doExecute` requests the <<writer, StreamWriter>> to create a `DataWriterFactory`.
`doExecute` then requests the <<child, child physical operator>> to execute (which gives an `RDD[InternalRow]`) and uses the `RDD[InternalRow]` and the `DataWriterFactory` to create a `ContinuousWriteRDD`.
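These first two steps could be sketched as follows (a sketch that assumes the Spark 2.4-era internal APIs: `StreamWriter.createWriterFactory` and a `ContinuousWriteRDD` constructor taking the child's `RDD[InternalRow]` and the factory):

[source, scala]
----
// Inside doExecute (sketch, not the verbatim Spark source)
val writerFactory = writer.createWriterFactory() // DataWriterFactory[InternalRow]
val queryRDD = query.execute()                   // RDD[InternalRow] of the child operator
val rdd = new ContinuousWriteRDD(queryRDD, writerFactory)
----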
`doExecute` prints out the following INFO message to the logs:

----
Start processing data source writer: [writer]. The input RDD has [partitions] partitions.
----
`doExecute` requests the `EpochCoordinatorRef` helper for a remote reference to the `EpochCoordinator` RPC endpoint (using the `__epoch_coordinator_id` local property).

NOTE: The `EpochCoordinator` RPC endpoint runs on the driver and coordinates epochs across the partition tasks of a continuous streaming query.
`doExecute` requests the `EpochCoordinator` RPC endpoint reference to send out a `SetWriterPartitions` message synchronously (with the number of partitions of the `ContinuousWriteRDD`).
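This coordination step could be sketched as follows (a sketch; `EpochCoordinatorRef.get`, the `ContinuousExecution.EPOCH_COORDINATOR_ID_KEY` local property key and the `SetWriterPartitions` message are internal continuous-processing APIs assumed to be available here):

[source, scala]
----
// Look up the driver-side EpochCoordinator RPC endpoint and tell it
// how many writer partitions to expect per epoch (sketch)
val epochCoordinator = EpochCoordinatorRef.get(
  sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
  sparkContext.env)
epochCoordinator.askSync[Unit](SetWriterPartitions(rdd.getNumPartitions))
----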
In the end, `doExecute` requests the `ContinuousWriteRDD` to collect (which simply runs a Spark job on all partitions in an RDD and returns the results in an array).
NOTE: Requesting the `ContinuousWriteRDD` to collect is how a Spark job is run that in turn runs tasks (one per partition) that are described by the `ContinuousWriteRDD.compute` method. Since `collect` is meant to run a Spark job (with tasks on executors), it is at the discretion of the tasks themselves to decide when to finish (so if they want to run indefinitely, so be it). What a clever trick!
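The final step could be sketched as follows (a sketch: `collect` blocks until the long-running continuous-processing tasks finish, and `doExecute` is assumed to return an empty `RDD[InternalRow]` since the operator has an <<output, empty output schema>>):

[source, scala]
----
// Run a Spark job with one long-running writer task per partition (sketch)
rdd.collect()

// No output rows come back to the driver; an empty RDD satisfies the doExecute contract
sparkContext.emptyRDD[InternalRow]
----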