ExtractSingleColumnNullAwareAntiJoin Scala Extractor¶
ExtractSingleColumnNullAwareAntiJoin
is a Scala extractor to destructure a Join logical operator into a tuple of the following elements:
- Streamed Side Keys (Catalyst expressions)
- Build Side Keys (Catalyst expressions)
ExtractSingleColumnNullAwareAntiJoin
is used to support single-column NULL-aware anti joins (described in the VLDB paper) that will almost certainly be planned as a very time-consuming Broadcast Nested Loop join (O(M*N)
calculation). If it's a single column case this expensive calculation could be optimized into O(M)
using hash lookup instead of loop lookup. Refer to SPARK-32290.
spark.sql.optimizeNullAwareAntiJoin¶
ExtractSingleColumnNullAwareAntiJoin
uses the spark.sql.optimizeNullAwareAntiJoin configuration property.
Destructuring Join Logical Plan¶
type ReturnType =
// streamedSideKeys, buildSideKeys
(Seq[Expression], Seq[Expression])
unapply(
join: Join): Option[ReturnType]
unapply
matches Join logical operators with LeftAnti join type and the following condition:
Or(EqualTo(a=b), IsNull(EqualTo(a=b)))
unapply
...FIXME
unapply
is used when: