SQLExecutionRDD¶

SQLExecutionRDD is an RDD[InternalRow] that wraps the parent RDD to make sure that the SQL configuration properties are always propagated to executors (even when rdd or QueryExecution.toRdd are used).

Tip

Review SPARK-28939 to learn when and why SQLExecutionRDD would be used outside a tracked SQL operation (and with no spark.sql.execution.id defined).

Creating Instance¶

SQLExecutionRDD takes the following to be created:

RDD[InternalRow]
SQLConf

While being created, SQLExecutionRDD initializes a sqlConfigs internal registry.

SQLExecutionRDD is created when:

QueryExecution is requested to toRdd

SQL RDD¶

SQLExecutionRDD is given an RDD[InternalRow] when created.

The RDD[InternalRow] is the result of requesting the executedPlan to execute.

sqlConfigs¶

SQLExecutionRDD requests the given SQLConf for all the configuration properties that have been set when created.

Lazy Value

sqlConfigs is a Scala lazy value, which guarantees that the code to initialize it is executed only once (when accessed for the first time) and that the computed value never changes afterwards.

Learn more in the Scala Language Specification.
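The once-only behaviour of a lazy value can be illustrated with a minimal, Spark-free sketch. ConfHolder, settings, and initCount are hypothetical names used for illustration only; they are not part of SQLExecutionRDD or of Spark.

```scala
object LazyValDemo {
  // Counts how many times the lazy initializer runs (illustration only).
  var initCount = 0

  // Hypothetical stand-in for a class that holds captured configuration properties.
  class ConfHolder(settings: Map[String, String]) {
    // Evaluated on first access only; the result is cached afterwards.
    lazy val sqlConfigs: Map[String, String] = {
      initCount += 1
      settings
    }
  }

  def main(args: Array[String]): Unit = {
    val holder = new ConfHolder(Map("spark.sql.shuffle.partitions" -> "8"))
    println(initCount) // 0: nothing has been initialized yet
    holder.sqlConfigs  // first access runs the initializer
    holder.sqlConfigs  // second access reuses the cached value
    println(initCount) // 1: the initializer ran exactly once
  }
}
```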

Computing Partition¶

compute(
  split: Partition,
  context: TaskContext): Iterator[InternalRow]

compute looks up the spark.sql.execution.id local property in the given TaskContext (Apache Spark).

If not defined (null), compute sets the sqlConfigs as thread-local properties and requests the sqlRDD for an iterator (i.e. executes the sqlRDD). Otherwise, in the context of a tracked SQL operation (with spark.sql.execution.id defined), compute simply requests the parent sqlRDD for an iterator.

compute is part of the RDD (Apache Spark) abstraction.
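The branching in compute can be sketched as a simplified model in plain Scala. This is not Spark's implementation: the local properties are modelled as a Map, the thread-local configuration as a ThreadLocal, and ComputeSketch, withConf, currentConf, and the parentIterator parameter are hypothetical names introduced for illustration.

```scala
object ComputeSketch {
  val ExecutionIdKey = "spark.sql.execution.id"

  // Thread-local holder standing in for the SQL configuration visible to a task.
  private val activeConf = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }

  def currentConf: Map[String, String] = activeConf.get()

  // Installs `conf` for the duration of `body`, then restores the previous value.
  def withConf[T](conf: Map[String, String])(body: => T): T = {
    val previous = activeConf.get()
    activeConf.set(conf)
    try body finally activeConf.set(previous)
  }

  // No execution id: install the captured sqlConfigs before asking the parent for an iterator.
  // Execution id defined: ask the parent directly (in this model the ambient conf is left as-is).
  def compute[T](localProps: Map[String, String], sqlConfigs: Map[String, String])(
      parentIterator: => Iterator[T]): Iterator[T] =
    if (!localProps.contains(ExecutionIdKey)) withConf(sqlConfigs)(parentIterator)
    else parentIterator

  def main(args: Array[String]): Unit = {
    val key = "spark.sql.shuffle.partitions"
    val captured = Map(key -> "8")
    // The parent reads the configuration at the moment its iterator is created.
    def parent: Iterator[String] = Iterator.single(currentConf.getOrElse(key, "default"))

    println(compute(Map.empty, captured)(parent).next())                  // 8
    println(compute(Map(ExecutionIdKey -> "0"), captured)(parent).next()) // default
  }
}
```

In this model the captured properties are visible only while the parent iterator is being requested; the previous thread-local value is restored right after.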