SQLExecutionRDD
SQLExecutionRDD is an RDD[InternalRow] that wraps the parent RDD and makes sure that the SQL configuration properties are always propagated to executors (even when rdd or QueryExecution.toRdd are used).
Tip
Review SPARK-28939 to learn when and why SQLExecutionRDD would be used outside a tracked SQL operation (and with no spark.sql.execution.id defined).
Creating Instance
SQLExecutionRDD takes the following to be created:
SQL RDD (RDD[InternalRow])
SQLConf
While being created, SQLExecutionRDD initializes a sqlConfigs internal registry.
SQLExecutionRDD is created when:
QueryExecution is requested for toRdd
SQL RDD
SQLExecutionRDD is given an RDD[InternalRow] when created.
The RDD[InternalRow] is the result of requesting the executedPlan to execute.
sqlConfigs
When created, SQLExecutionRDD requests the given SQLConf for all the configuration properties that have been set.
Lazy Value
sqlConfigs is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
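The once-only initialization of a lazy value can be illustrated with a minimal plain-Scala sketch. FakeConf, ConfigHolder, allSettings and the reads counter are hypothetical stand-ins invented for this illustration; they are not Spark classes and do not reproduce how SQLExecutionRDD is implemented.

```scala
object LazyValSketch {
  // Hypothetical stand-in for a configuration source (not Spark's SQLConf).
  final class FakeConf(settings: Map[String, String]) {
    var reads: Int = 0 // counts how often the settings were requested

    def allSettings: Map[String, String] = {
      reads += 1
      settings
    }
  }

  // Hypothetical holder that only demonstrates the lazy-value behavior.
  final class ConfigHolder(conf: FakeConf) {
    // The right-hand side runs on first access; the result is cached afterwards.
    lazy val sqlConfigs: Map[String, String] = conf.allSettings
  }

  def main(args: Array[String]): Unit = {
    val conf = new FakeConf(Map("spark.sql.shuffle.partitions" -> "8"))
    val holder = new ConfigHolder(conf)

    println(s"reads before first access: ${conf.reads}")
    println(holder.sqlConfigs) // first access triggers the initializer
    println(holder.sqlConfigs) // second access reuses the cached value
    println(s"reads after two accesses: ${conf.reads}")
  }
}
```

In this sketch the counter stays at zero until sqlConfigs is first read and does not grow on later reads, which is the behavior the note above describes.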
Computing Partition
compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]
compute looks up the spark.sql.execution.id local property in the given TaskContext (Apache Spark).
If not defined (null), compute sets the sqlConfigs as thread-local properties and then requests the sqlRDD for an iterator (which executes the sqlRDD). Otherwise, in the context of a tracked SQL operation (when spark.sql.execution.id is defined), compute simply requests the parent sqlRDD for an iterator.
compute is part of the RDD (Apache Spark) abstraction.
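The two branches of compute can be modeled with a simplified plain-Scala sketch. Everything here is an assumption made for illustration: FakeTaskContext, withConfigs, currentConfigs and this generic compute are hypothetical stand-ins for the Spark types, and the thread-local map only approximates how configuration properties might be made visible to the running task.

```scala
object ComputeSketch {
  // Property name taken from the text above; the constant name is hypothetical.
  val ExecutionIdKey = "spark.sql.execution.id"

  // Simplified stand-in for a task context that only exposes local properties.
  final case class FakeTaskContext(localProperties: Map[String, String]) {
    // Returns null when the property is absent, matching the null check described above.
    def getLocalProperty(key: String): String = localProperties.getOrElse(key, null)
  }

  // Thread-local holder standing in for "thread-local properties".
  private val activeConfigs = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  def currentConfigs: Map[String, String] = activeConfigs.get()

  // Installs `configs` for the current thread while `body` runs, then restores the old value.
  def withConfigs[T](configs: Map[String, String])(body: => T): T = {
    val previous = activeConfigs.get()
    activeConfigs.set(configs)
    try body finally activeConfigs.set(previous)
  }

  // Applies the captured configs only when no execution id is set on the context.
  def compute[T](context: FakeTaskContext, sqlConfigs: Map[String, String])(
      parentIterator: => Iterator[T]): Iterator[T] = {
    if (context.getLocalProperty(ExecutionIdKey) == null) {
      withConfigs(sqlConfigs)(parentIterator)
    } else {
      parentIterator
    }
  }

  def main(args: Array[String]): Unit = {
    val captured = Map("spark.sql.session.timeZone" -> "UTC")

    // Reads the active configuration at the moment the iterator is created.
    def readTimeZone: Iterator[Option[String]] =
      Iterator.single(currentConfigs.get("spark.sql.session.timeZone"))

    val untracked = FakeTaskContext(Map.empty)
    val tracked = FakeTaskContext(Map(ExecutionIdKey -> "1"))

    println(compute(untracked, captured)(readTimeZone).toList)
    println(compute(tracked, captured)(readTimeZone).toList)
  }
}
```

In this model the untracked call sees the captured time zone because the configs are installed before the parent iterator is requested, while the tracked call leaves the thread-local state untouched and so sees no value.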