WriteToContinuousDataSourceExec Physical Operator
WriteToContinuousDataSourceExec is a unary physical operator (Spark SQL) that writes the result of a continuous streaming query out to a data source (by means of a ContinuousWriteRDD when executed).

WriteToContinuousDataSourceExec is created exclusively when the DataSourceV2Strategy (Spark SQL) execution planning strategy is requested to plan a WriteToContinuousDataSource unary logical operator.
[[creating-instance]] WriteToContinuousDataSourceExec takes the following to be created:

- [[writer]] StreamWriter
- [[query]][[child]] Child physical operator (SparkPlan)
[[output]] WriteToContinuousDataSourceExec uses an empty output schema (which is exactly to say that no output is expected whatsoever).
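For reference, the following is a minimal sketch of the operator's declaration, modelled after the Apache Spark 2.4 sources (a sketch, not the complete definition; doExecute is stubbed out here and covered in the section below):

[source, scala]
----
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.sources.v2.writer.streaming.StreamWriter

// A sketch of the operator's shape (after the Spark 2.4 sources)
case class WriteToContinuousDataSourceExec(
    writer: StreamWriter, // writes rows out to the continuous data source
    query: SparkPlan)     // the one and only child physical operator
  extends SparkPlan {

  override def children: Seq[SparkPlan] = Seq(query)

  // Empty output schema -- no output rows are ever expected
  override def output: Seq[Attribute] = Nil

  // Described in the doExecute section below
  override protected def doExecute(): RDD[InternalRow] = ???
}
----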
[[logging]]
[TIP]
====
Enable ALL logging level for org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec to see what happens inside.

Add the following line to conf/log4j.properties:

----
log4j.logger.org.apache.spark.sql.execution.streaming.continuous.WriteToContinuousDataSourceExec=ALL
----

Refer to Logging.
====
=== [[doExecute]] Executing Physical Operator (Generating RDD[InternalRow]) -- doExecute Method
[source, scala]
----
doExecute(): RDD[InternalRow]
----
NOTE: doExecute is part of the SparkPlan Contract to generate the runtime representation of a physical operator as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).
doExecute requests the StreamWriter to create a DataWriterFactory.
doExecute then requests the child physical operator to execute (which generates an RDD[InternalRow]) and uses the RDD[InternalRow] and the DataWriterFactory to create a ContinuousWriteRDD.
doExecute prints out the following INFO message to the logs:
Start processing data source writer: [writer]. The input RDD has [partitions] partitions.
doExecute requests the EpochCoordinatorRef helper for a remote reference to the EpochCoordinator RPC endpoint (using the __epoch_coordinator_id local property).
NOTE: The EpochCoordinator RPC endpoint runs on the driver as the single point to coordinate epochs across partition tasks.
doExecute requests the EpochCoordinator RPC endpoint reference to send out a SetWriterPartitions message synchronously (with the number of partitions of the ContinuousWriteRDD).
In the end, doExecute requests the ContinuousWriteRDD to collect (which simply runs a Spark job on all partitions of the RDD and returns the results in an array).
NOTE: Requesting the ContinuousWriteRDD to collect is how a Spark job is run that in turn runs tasks (one per partition) that are described by the ContinuousWriteRDD.compute method. Since collect is meant to run a Spark job (with tasks on executors), it is at the discretion of the tasks themselves to decide when to finish (so if they want to run indefinitely, so be it). What a clever trick!
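Putting the steps together, the following is a condensed sketch of doExecute, modelled after the Apache Spark 2.4 sources (a sketch, not the verbatim implementation):

[source, scala]
----
override protected def doExecute(): RDD[InternalRow] = {
  // Request the StreamWriter to create a DataWriterFactory
  val writerFactory = writer.createWriterFactory()

  // Execute the child physical operator and wrap the resulting
  // RDD[InternalRow] (with the DataWriterFactory) in a ContinuousWriteRDD
  val rdd = new ContinuousWriteRDD(query.execute(), writerFactory)

  logInfo(s"Start processing data source writer: $writer. " +
    s"The input RDD has ${rdd.partitions.length} partitions.")

  // Look up the EpochCoordinator RPC endpoint (on the driver) and send it
  // a SetWriterPartitions message synchronously
  EpochCoordinatorRef.get(
      sparkContext.getLocalProperty(ContinuousExecution.EPOCH_COORDINATOR_ID_KEY),
      sparkContext.env)
    .askSync[Unit](SetWriterPartitions(rdd.getNumPartitions))

  try {
    // Run a Spark job that forces the ContinuousWriteRDD to compute;
    // the tasks themselves decide when (and whether) to finish
    rdd.collect()
  } catch {
    case _: InterruptedException =>
      // Interruption is how continuous queries are stopped
  }

  // No output rows are ever produced
  sparkContext.emptyRDD
}
----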