SubqueryExec Unary Physical Operator¶
SubqueryExec
is a unary physical operator.
SubqueryExec
uses <SubqueryExec
is first requested to <SubqueryExec
waiting indefinitely for the child operator to be finished).
CAUTION: FIXME When is doPrepare
executed?
SubqueryExec
is <ScalarSubquery
expressions in a physical plan).
[source, scala]¶
val q = sql("select (select max(id) from t1) tt from t1") scala> q.explain == Physical Plan == *Project [Subquery subquery32 AS tt#33L] : +- Subquery subquery32 : +- *HashAggregate(keys=[], functions=[max(id#20L)]) : +- Exchange SinglePartition : +- *HashAggregate(keys=[], functions=[partial_max(id#20L)]) : +- *FileScan parquet default.t1[id#20L] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/Users/jacek/dev/oss/spark/spark-warehouse/t1], PartitionFilters: [], PushedFilters: [], ReadSchema: struct
NOTE: SubqueryExec
physical operator is almost an exact copy of BroadcastExchangeExec.md[BroadcastExchangeExec] physical operator.
=== [[doPrepare]] Executing Child Operator Asynchronously -- doPrepare
Method
[source, scala]¶
doPrepare(): Unit¶
NOTE: doPrepare
is part of SparkPlan.md#doPrepare[SparkPlan Contract] to prepare a physical operator for execution.
doPrepare
simply triggers initialization of the internal lazily-once-initialized <
=== [[relationFuture]] relationFuture
Internal Lazily-Once-Initialized Property
[source, scala]¶
relationFuture: Future[Array[InternalRow]]¶
When "materialized" (aka executed), relationFuture
spawns a new thread of execution that requests SQLExecution
to execute an action (with the current execution id) on subquery <
NOTE: relationFuture
uses Scala's https://docs.scala-lang.org/overviews/core/futures.html[scala.concurrent.Future] that spawns a new thread of execution once instantiated.
The action tracks execution of the <
In the end, relationFuture
posts metric updates and returns the internal rows.
[[executionContext]] NOTE: relationFuture
is executed on a separate thread from a custom https://www.scala-lang.org/api/2.11.8/index.html#scala.concurrent.ExecutionContext[scala.concurrent.ExecutionContext] (built from a cached https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ThreadPoolExecutor.html[java.util.concurrent.ThreadPoolExecutor] with the prefix subquery and up to 16 threads).
NOTE: relationFuture
is used when SubqueryExec
is requested to <
=== [[creating-instance]] Creating SubqueryExec Instance
SubqueryExec
takes the following when created:
- [[name]] Name of the subquery
- [[child]] Child SparkPlan.md[physical plan]
=== [[executeCollect]] Collecting Internal Rows of Executing SubqueryExec Operator -- executeCollect
Method
[source, scala]¶
executeCollect(): Array[InternalRow]¶
NOTE: executeCollect
is part of SparkPlan.md#executeCollect[SparkPlan Contract] to execute a physical operator and collect the results as collection of internal rows.
executeCollect
waits till <Array[InternalRow]
).
Performance Metrics¶
Key | Name (in web UI) | Description |
---|---|---|
collectTime | time to collect (ms) | |
dataSize | data size (bytes) |