SQLExecutionRDD¶
SQLExecutionRDD is an RDD[InternalRow] that wraps the parent RDD and makes sure that the SQL configuration properties are always propagated to executors (even when rdd or QueryExecution.toRdd are used).
Tip
Review SPARK-28939 to learn when and why SQLExecutionRDD would be used outside a tracked SQL operation (and with no spark.sql.execution.id defined).
Creating Instance¶
SQLExecutionRDD takes the following to be created:
SQL RDD (RDD[InternalRow])
SQLConf
While being created, SQLExecutionRDD initializes a sqlConfigs internal registry.
SQLExecutionRDD is created when QueryExecution is requested for toRdd.
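To see where SQLExecutionRDD comes from, the following minimal sketch requests toRdd from the QueryExecution of a small Dataset and reports the class of the returned RDD. It assumes Spark 3.x with spark-sql on the classpath and a local master; ToRddDemo and toRddClassName are hypothetical names used for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper (not part of Spark) that inspects QueryExecution.toRdd.
object ToRddDemo {
  // Returns the simple class name of the RDD behind QueryExecution.toRdd.
  def toRddClassName(): String = {
    // Assumption: a local, single-threaded session is enough for the demo.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("ToRddDemo")
      .getOrCreate()
    try {
      // toRdd is the RDD[InternalRow] of the query.
      val rdd = spark.range(5).queryExecution.toRdd
      rdd.getClass.getSimpleName
    } finally {
      spark.stop()
    }
  }
}
```

Printing the result of ToRddDemo.toRddClassName() is a quick way to check which RDD a given Spark version hands back.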
SQL RDD¶
SQLExecutionRDD is given an RDD[InternalRow] when created.
The RDD[InternalRow] is the result of requesting the executedPlan to execute.
sqlConfigs¶
When created, SQLExecutionRDD requests the given SQLConf for all the configuration properties that have been set.
Lazy Value
sqlConfigs is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
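The once-only initialization of a lazy value can be illustrated with plain Scala. This is a generic sketch and not Spark code: LazyValueDemo, initCount and configs are hypothetical names, and the map content is made up for the example.

```scala
// Generic illustration of Scala lazy values (no Spark involved).
object LazyValueDemo {
  // Counts how many times the initializer of `configs` runs.
  var initCount = 0

  // Hypothetical registry; the body runs on first access only.
  lazy val configs: Map[String, String] = {
    initCount += 1
    Map("spark.sql.shuffle.partitions" -> "8")
  }
}
```

Reading LazyValueDemo.configs twice returns the same Map instance and leaves initCount at 1.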
Computing Partition¶
compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]
compute looks up the spark.sql.execution.id local property in the given TaskContext (Apache Spark).
If the property is not defined (null), compute sets the sqlConfigs as thread-local properties and requests the sqlRDD for an iterator (which executes the sqlRDD). Otherwise, in the context of a tracked SQL operation (when spark.sql.execution.id is defined), compute simply requests the parent sqlRDD for an iterator.
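The two branches can be modeled without Spark. The sketch below is a simplified stand-in, not the actual implementation: ComputeSketch, activeConfs, localProperties and parentIterator are hypothetical names, a plain Map replaces the local properties of a TaskContext, and a DynamicVariable plays the role of the thread-local configuration.

```scala
import scala.util.DynamicVariable

// Simplified model of the branching described above (not Spark code).
object ComputeSketch {
  // The local property that marks a tracked SQL operation.
  val ExecutionIdKey = "spark.sql.execution.id"

  // Hypothetical stand-in for thread-local SQL configuration.
  val activeConfs = new DynamicVariable[Map[String, String]](Map.empty)

  // localProperties: stand-in for the local properties of a TaskContext.
  // sqlConfigs: the properties captured when the RDD was created.
  // parentIterator: stands in for requesting the parent sqlRDD for an iterator.
  def compute[T](
      localProperties: Map[String, String],
      sqlConfigs: Map[String, String])(
      parentIterator: => Iterator[T]): Iterator[T] =
    if (!localProperties.contains(ExecutionIdKey)) {
      // Untracked: expose the captured properties while the parent is requested.
      activeConfs.withValue(sqlConfigs)(parentIterator)
    } else {
      // Tracked SQL operation: simply delegate to the parent.
      parentIterator
    }
}
```

In this model the captured properties are visible only while the parent iterator is being requested, and activeConfs falls back to its previous value afterwards.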
compute is part of the RDD (Apache Spark) abstraction.