
ExtractJoinWithBuckets Scala Extractor

Destructuring BaseJoinExec

unapply(
  plan: SparkPlan): Option[(BaseJoinExec, Int, Int)]

unapply makes sure that the given SparkPlan is a BaseJoinExec and that the join is applicable (isApplicable).

If so, unapply uses getBucketSpec to look up the bucket specs of the left and right child operators of the join.
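The extractor pattern described above can be sketched in plain Scala without any Spark dependency. Everything in this sketch is a hypothetical stand-in: Plan, Scan, Join, the applicable flag, bucketsOf and describe are illustration-only names, and treating the two Int values as the bucket counts of the left and right sides is an assumption, not something this page states.

```scala
object ExtractorSketch {
  // Hypothetical stand-ins for physical operators (not Spark's classes).
  sealed trait Plan
  case class Scan(numBuckets: Option[Int]) extends Plan
  case class Join(left: Plan, right: Plan, applicable: Boolean) extends Plan

  // Simplified lookup: only inspects a Scan directly (assumption).
  def bucketsOf(plan: Plan): Option[Int] = plan match {
    case Scan(n) => n
    case _       => None
  }

  object ExtractJoin {
    // Some(...) only for an applicable join with bucket info on both sides.
    def unapply(plan: Plan): Option[(Join, Int, Int)] = plan match {
      case j: Join if j.applicable =>
        for (l <- bucketsOf(j.left); r <- bucketsOf(j.right)) yield (j, l, r)
      case _ => None
    }
  }

  // An extractor object is used directly as a pattern.
  def describe(plan: Plan): String = plan match {
    case ExtractJoin(_, l, r) => s"bucketed join: $l vs $r buckets"
    case _                    => "not applicable"
  }
}
```

Returning an Option from unapply lets callers destructure a matching plan and fall through to the next case otherwise, all in a single match expression.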


unapply is used when:


isApplicable(
  j: BaseJoinExec): Boolean

isApplicable is true when the following all hold:

  1. The given BaseJoinExec physical operator is either a SortMergeJoinExec or a ShuffledHashJoinExec

  2. The left side of the join has a FileSourceScanExec operator (hasScanOperation)

  3. The right side of the join has a FileSourceScanExec operator (hasScanOperation)

  4. satisfiesOutputPartitioning holds for the leftKeys and the outputPartitioning of the left child operator of the join

  5. satisfiesOutputPartitioning holds for the rightKeys and the outputPartitioning of the right child operator of the join
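The five conditions above amount to a short-circuiting conjunction, which can be sketched as follows. This is a simplified model rather than Spark's code: the JoinKind values, the outputKeys field (standing in for a hash-based outputPartitioning) and the helpers hasScan and satisfies are all hypothetical names introduced for illustration.

```scala
object IsApplicableSketch {
  // Hypothetical model: outputKeys stands in for a hash outputPartitioning.
  sealed trait Plan { def outputKeys: Seq[String] }
  case class Scan(outputKeys: Seq[String]) extends Plan
  case class Filter(child: Plan) extends Plan {
    def outputKeys: Seq[String] = child.outputKeys
  }
  case class Other(outputKeys: Seq[String]) extends Plan

  sealed trait JoinKind
  case object SortMerge extends JoinKind
  case object ShuffledHash extends JoinKind
  case object BroadcastHash extends JoinKind

  case class Join(kind: JoinKind, left: Plan, right: Plan,
                  leftKeys: Seq[String], rightKeys: Seq[String])

  // Simplified scan check: a Scan, possibly under Filter operators.
  def hasScan(p: Plan): Boolean = p match {
    case _: Scan       => true
    case Filter(child) => hasScan(child)
    case _             => false
  }

  // Simplified key check: same count, every partition key among the join keys.
  def satisfies(keys: Seq[String], partitionKeys: Seq[String]): Boolean =
    keys.length == partitionKeys.length && partitionKeys.forall(k => keys.contains(k))

  // All five conditions must hold; && stops at the first failure.
  def isApplicable(j: Join): Boolean =
    (j.kind == SortMerge || j.kind == ShuffledHash) &&
      hasScan(j.left) &&
      hasScan(j.right) &&
      satisfies(j.leftKeys, j.left.outputKeys) &&
      satisfies(j.rightKeys, j.right.outputKeys)
}
```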


hasScanOperation(
  plan: SparkPlan): Boolean

hasScanOperation holds true for a SparkPlan physical operator that is a FileSourceScanExec, possibly wrapped in FilterExec and ProjectExec operators.
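A check of this shape is naturally written as a tail-recursive pattern match. The sketch below uses hypothetical stand-in classes (Scan, Filter, Project, Sort) in place of Spark's operators, so it only illustrates the idea of looking through filters and projections and stopping at anything else.

```scala
import scala.annotation.tailrec

object HasScanSketch {
  // Hypothetical stand-ins for physical operators (not Spark's classes).
  sealed trait Plan
  case class Scan(table: String) extends Plan   // role of FileSourceScanExec
  case class Filter(child: Plan) extends Plan   // role of FilterExec
  case class Project(child: Plan) extends Plan  // role of ProjectExec
  case class Sort(child: Plan) extends Plan     // any other operator

  // Look through Filter and Project only; any other operator ends the search.
  @tailrec
  def hasScanOperation(plan: Plan): Boolean = plan match {
    case Filter(child)  => hasScanOperation(child)
    case Project(child) => hasScanOperation(child)
    case _: Scan        => true
    case _              => false
  }
}
```

In this model a scan hidden under a Sort does not count, since only Filter and Project are treated as transparent.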


satisfiesOutputPartitioning(
  keys: Seq[Expression],
  partitioning: Partitioning): Boolean

satisfiesOutputPartitioning holds true for a HashPartitioning whose partitioning expressions match the given join keys: there are as many expressions as keys, and every expression is equivalent to one of the keys.
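The "number and equivalence" check can be sketched as below. This is a minimal model under two loud assumptions: expressions are reduced to plain column names, and equivalence is plain string equality, whereas real expressions would need a semantic comparison. The RoundRobin case is a hypothetical placeholder for any non-hash partitioning.

```scala
object PartitioningSketch {
  // Assumption: an expression is just a column name, compared by equality.
  type Expr = String

  sealed trait Partitioning
  case class HashPartitioning(exprs: Seq[Expr], numPartitions: Int) extends Partitioning
  case class RoundRobin(numPartitions: Int) extends Partitioning // any non-hash scheme

  def satisfiesOutputPartitioning(keys: Seq[Expr], partitioning: Partitioning): Boolean =
    partitioning match {
      // Same number of expressions, and each one appears among the join keys.
      case HashPartitioning(exprs, _) if exprs.length == keys.length =>
        exprs.forall(e => keys.contains(e))
      case _ => false
    }
}
```

Note that the order of the keys does not matter in this sketch, only their count and membership.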

Bucket Spec of FileSourceScanExec Operator

getBucketSpec(
  plan: SparkPlan): Option[BucketSpec]

getBucketSpec finds the FileSourceScanExec operator (in the given SparkPlan) with a non-empty bucket spec but an empty optionalNumCoalescedBuckets. When found, getBucketSpec returns the non-empty bucket spec.
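The lookup described above can be sketched as a depth-first search that returns the first qualifying scan. The classes here are hypothetical simplifications: this BucketSpec is a local case class rather than Spark's, and Scan carries only the two fields the description mentions.

```scala
object BucketSpecSketch {
  // Hypothetical, simplified stand-ins (not Spark's classes).
  case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

  sealed trait Plan { def children: Seq[Plan] }
  case class Scan(bucketSpec: Option[BucketSpec],
                  optionalNumCoalescedBuckets: Option[Int]) extends Plan {
    def children: Seq[Plan] = Nil
  }
  case class Filter(child: Plan) extends Plan {
    def children: Seq[Plan] = Seq(child)
  }

  // First scan (pre-order) with a bucket spec and no coalesced bucket count.
  def getBucketSpec(plan: Plan): Option[BucketSpec] = plan match {
    case Scan(Some(spec), None) => Some(spec)
    case other =>
      // The iterator keeps the search lazy: it stops at the first hit.
      other.children.iterator.map(getBucketSpec).collectFirst { case Some(s) => s }
  }
}
```

A scan that already has optionalNumCoalescedBuckets set is skipped in this model, which mirrors the "empty optionalNumCoalescedBuckets" requirement.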