ExtractSingleColumnNullAwareAntiJoin Scala Extractor¶
ExtractSingleColumnNullAwareAntiJoin is a Scala extractor to destructure a Join logical operator into a tuple of the following elements:
- Streamed Side Keys (Catalyst expressions)
- Build Side Keys (Catalyst expressions)
ExtractSingleColumnNullAwareAntiJoin is used to support single-column NULL-aware anti joins (described in the VLDB paper) that will almost certainly be planned as a very time-consuming Broadcast Nested Loop join (O(M*N) calculation). If it's a single column case this expensive calculation could be optimized into O(M) using hash lookup instead of loop lookup. Refer to SPARK-32290.
spark.sql.optimizeNullAwareAntiJoin¶
ExtractSingleColumnNullAwareAntiJoin uses the spark.sql.optimizeNullAwareAntiJoin configuration property.
Destructuring Join Logical Plan¶
type ReturnType =
// streamedSideKeys, buildSideKeys
(Seq[Expression], Seq[Expression])
unapply(
join: Join): Option[ReturnType]
unapply matches Join logical operators with LeftAnti join type and the following condition:
Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
unapply...FIXME
unapply is used when: