ExplainUtils¶

ExplainUtils is a utility that processes a query plan when QueryExecution is requested for a simple (basic) text representation in formatted explain mode.

Demo¶

val q = spark.range(5).join(spark.range(10), Seq("id"), "inner")

scala> q.explain(mode = "formatted")
== Physical Plan ==
AdaptiveSparkPlan (6)
+- Project (5)
   +- BroadcastHashJoin Inner BuildLeft (4)
      :- BroadcastExchange (2)
      :  +- Range (1)
      +- Range (3)


(1) Range
Output [1]: [id#0L]
Arguments: Range (0, 5, step=1, splits=Some(16))

(2) BroadcastExchange
Input [1]: [id#0L]
Arguments: HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#16]

(3) Range
Output [1]: [id#2L]
Arguments: Range (0, 10, step=1, splits=Some(16))

(4) BroadcastHashJoin
Left keys [1]: [id#0L]
Right keys [1]: [id#2L]
Join condition: None

(5) Project
Output [1]: [id#0L]
Input [2]: [id#0L, id#2L]

(6) AdaptiveSparkPlan
Output [1]: [id#0L]
Arguments: isFinalPlan=false

Note that the AdaptiveSparkPlan physical operator (6) has the isFinalPlan flag set to false, as the last line of the output shows.

Execute the query to trigger Adaptive Query Execution optimization.

q.take(0)

The isFinalPlan flag should now be true.

scala> q.explain(mode = "formatted")
== Physical Plan ==
AdaptiveSparkPlan (10)
+- == Final Plan ==
   * Project (6)
   +- * BroadcastHashJoin Inner BuildLeft (5)
      :- BroadcastQueryStage (3)
      :  +- BroadcastExchange (2)
      :     +- * Range (1)
      +- * Range (4)
+- == Initial Plan ==
   Project (9)
   +- BroadcastHashJoin Inner BuildLeft (8)
      :- BroadcastExchange (7)
      :  +- Range (1)
      +- Range (4)


(1) Range [codegen id : 1]
Output [1]: [id#0L]
Arguments: Range (0, 5, step=1, splits=Some(16))

(2) BroadcastExchange
Input [1]: [id#0L]
Arguments: HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#72]

(3) BroadcastQueryStage
Output [1]: [id#0L]
Arguments: 0

(4) Range
Output [1]: [id#2L]
Arguments: Range (0, 10, step=1, splits=Some(16))

(5) BroadcastHashJoin [codegen id : 2]
Left keys [1]: [id#0L]
Right keys [1]: [id#2L]
Join condition: None

(6) Project [codegen id : 2]
Output [1]: [id#0L]
Input [2]: [id#0L, id#2L]

(7) BroadcastExchange
Input [1]: [id#0L]
Arguments: HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#16]

(8) BroadcastHashJoin
Left keys [1]: [id#0L]
Right keys [1]: [id#2L]
Join condition: None

(9) Project
Output [1]: [id#0L]
Input [2]: [id#0L, id#2L]

(10) AdaptiveSparkPlan
Output [1]: [id#0L]
Arguments: isFinalPlan=true

Processing Query Plan¶

processPlan[T <: QueryPlan[T]](
  plan: T,
  append: String => Unit): Unit

processPlan...FIXME

processPlan is used when:

QueryExecution is requested to simpleString
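
The append parameter in the signature above is a plain String => Unit callback, so the caller decides where the generated text goes. The following is a minimal sketch of that callback shape only, not of processPlan itself: renderPlan is a hypothetical stand-in, and the two input lines are made up for illustration.

```scala
// `renderPlan` is a hypothetical stand-in with the same callback shape as processPlan;
// it is NOT part of Spark and does not inspect a real query plan.
def renderPlan(lines: Seq[String], append: String => Unit): Unit =
  lines.foreach(line => append(line + "\n"))

// The caller owns the sink: here the text is collected into a StringBuilder.
val buffer = new StringBuilder
renderPlan(
  Seq("== Physical Plan ==", "Range (1)"),
  s => { buffer.append(s); () })

val rendered: String = buffer.toString
```

Passing a different function (for example one that writes to a log) would redirect the same text without changing renderPlan.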

processPlanSkippingSubqueries¶

processPlanSkippingSubqueries[T <: QueryPlan[T]](
  plan: T,
  append: String => Unit,
  collectedOperators: BitSet): Unit

processPlanSkippingSubqueries...FIXME

collectOperatorsWithID¶

collectOperatorsWithID(
  plan: QueryPlan[_],
  operators: ArrayBuffer[QueryPlan[_]],
  collectedOperators: BitSet): Unit

collectOperatorsWithID...FIXME
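
In the second demo output above, Range (1) and Range (4) appear in both the final and the initial plan trees, yet each is described only once in the operator details. The signature suggests that the collectedOperators BitSet is what tracks the IDs already seen. The following toy sketch illustrates that idea under stated assumptions: Op and collectOnce are hypothetical names, IDs are assigned by hand, and the BitSet is assumed to be scala.collection.mutable.BitSet. It is not Spark's implementation.

```scala
import scala.collection.mutable.{ArrayBuffer, BitSet}

// Toy operator with a pre-assigned ID; `Op` is a hypothetical type, not a Spark class.
final case class Op(id: Int, name: String, children: Seq[Op] = Nil)

// Visits children first, then records the operator the first time its ID is seen.
// BitSet.add returns true only when the ID was not already in the set.
def collectOnce(op: Op, operators: ArrayBuffer[Op], seen: BitSet): Unit = {
  op.children.foreach(collectOnce(_, operators, seen))
  if (seen.add(op.id)) operators += op
}

// Two plans that share the same Range operators, as in the demo.
val range1 = Op(1, "Range")
val range4 = Op(4, "Range")
val finalPlan = Op(6, "Project", Seq(
  Op(5, "BroadcastHashJoin", Seq(
    Op(3, "BroadcastQueryStage", Seq(Op(2, "BroadcastExchange", Seq(range1)))),
    range4))))
val initialPlan = Op(9, "Project", Seq(
  Op(8, "BroadcastHashJoin", Seq(
    Op(7, "BroadcastExchange", Seq(range1)),
    range4))))

val operators = ArrayBuffer.empty[Op]
val seen = BitSet.empty
collectOnce(finalPlan, operators, seen)
collectOnce(initialPlan, operators, seen)

val collectedIds: List[Int] = operators.map(_.id).toList
```

Sharing one buffer and one BitSet across both calls is what keeps the shared Range operators from being listed twice.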

removeTags¶

removeTags(
  plan: QueryPlan[_]): Unit

removeTags...FIXME

generateOperatorIDs¶

generateOperatorIDs(
  plan: QueryPlan[_],
  startOperatorID: Int): Int

generateOperatorIDs...FIXME
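
Judging by the first demo output above, operator IDs look like they are handed out children-first (post-order): Range (1), BroadcastExchange (2), Range (3), BroadcastHashJoin (4), Project (5), AdaptiveSparkPlan (6). The following is a toy sketch of such a numbering that mirrors the "start ID in, last ID out" shape of the signature. Node and assignIds are hypothetical names, and the sketch makes no claim about how Spark stores the IDs.

```scala
// Toy plan node; `Node` and `assignIds` are hypothetical names, not Spark APIs.
final class Node(val name: String, val children: Seq[Node] = Nil) {
  var id: Option[Int] = None
}

// Numbers the tree in post-order (children before their parent), continuing after `startId`.
// Returns the last ID handed out so a caller can keep numbering another tree.
def assignIds(node: Node, startId: Int): Int =
  if (node.id.isDefined) startId // already numbered, e.g. a node shared by two trees
  else {
    val lastChildId =
      node.children.foldLeft(startId)((current, child) => assignIds(child, current))
    node.id = Some(lastChildId + 1)
    lastChildId + 1
  }

// The plan tree from the first demo.
val leftRange = new Node("Range")
val exchange = new Node("BroadcastExchange", Seq(leftRange))
val rightRange = new Node("Range")
val join = new Node("BroadcastHashJoin", Seq(exchange, rightRange))
val project = new Node("Project", Seq(join))
val root = new Node("AdaptiveSparkPlan", Seq(project))

val lastId: Int = assignIds(root, 0)
```

With a start ID of 0 the numbering reproduces the IDs shown in the demo, and the returned value is the ID of the root operator.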