Skip to content

ExtractSingleColumnNullAwareAntiJoin Scala Extractor

ExtractSingleColumnNullAwareAntiJoin is a Scala extractor to destructure a Join logical operator into a tuple of the following elements:

  1. Streamed Side Keys (Catalyst expressions)
  2. Build Side Keys (Catalyst expressions)

ExtractSingleColumnNullAwareAntiJoin is used to support single-column NULL-aware anti joins (described in the VLDB paper) that will almost certainly be planned as a very time-consuming Broadcast Nested Loop join (O(M*N) calculation). If it's a single column case this expensive calculation could be optimized into O(M) using hash lookup instead of loop lookup. Refer to SPARK-32290.

spark.sql.optimizeNullAwareAntiJoin

ExtractSingleColumnNullAwareAntiJoin uses the spark.sql.optimizeNullAwareAntiJoin configuration property.

Destructuring Join Logical Plan

type ReturnType =
// streamedSideKeys, buildSideKeys
  (Seq[Expression], Seq[Expression])

unapply(
  join: Join): Option[ReturnType]

unapply matches Join logical operators with LeftAnti join type and the following condition:

Or(EqualTo(a=b), IsNull(EqualTo(a=b)))

unapply...FIXME

unapply is used when:

  • AQEPropagateEmptyRelation adaptive logical optimization is executed
  • JoinSelection execution planning strategy is executed
  • LogicalQueryStageStrategy execution planning strategy is executed