Skip to content

PushPredicateThroughJoin Logical Optimization

PushPredicateThroughJoin is a catalyst/[Catalyst rule] for transforming[logical plans] (i.e. Rule[LogicalPlan]).

When <>, PushPredicateThroughJoin...FIXME

PushPredicateThroughJoin is a part of the Operator Optimization before Inferring Filters and Operator Optimization after Inferring Filters fixed-point rule batches of the base Logical Optimizer.

[[demo]] .Demo: PushPredicateThroughJoin

import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._

// Using hacks to disable two Catalyst DSL implicits
implicit def symbolToColumn(ack: ThatWasABadIdea) = ack
implicit class StringToColumn(val sc: StringContext) {}

import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val t1 = LocalRelation(', '
val t2 = LocalRelation(', ''C > 10)

val plan = t1.join(t2)...FIXME

import org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin
val optimizedPlan = PushPredicateThroughJoin(plan)
scala> println(optimizedPlan.numberedTreeString)

=== [[apply]] Executing Rule -- apply Method

[source, scala]

apply( plan: LogicalPlan): LogicalPlan


apply is part of the Rule abstraction.

=== [[split]] split Internal Method

[source, scala]

split( condition: Seq[Expression], left: LogicalPlan, right: LogicalPlan): (Seq[Expression], Seq[Expression], Seq[Expression])

split splits (partitions) the given condition expressions into expressions/[deterministic] or not.

split further splits (partitions) the deterministic expressions (pushDownCandidates) into expressions that reference the catalyst/[output expressions] of the left logical operator (leftEvaluateCondition) or not (rest).

split further splits (partitions) the expressions that do not reference left output expressions into expressions that reference the catalyst/[output expressions] of the right logical operator (rightEvaluateCondition) or not (commonCondition).

In the end, split returns the leftEvaluateCondition, rightEvaluateCondition, and commonCondition with the non-deterministic condition expressions.

NOTE: split is used when PushPredicateThroughJoin is <>.