InSubquery Expression¶
InSubquery is a Predicate that represents the following IN SQL predicate in a logical query plan:
NOT? IN '(' query ')'
InSubquery can also be used internally for other use cases (e.g., Runtime Filtering, Dynamic Partition Pruning).
InSubquery is an Unevaluable expression.
Creating Instance¶
InSubquery takes the following to be created:
- Values (Expressions)
- ListQuery
InSubquery is created when:
- InjectRuntimeFilter logical optimization is executed (and injectInSubqueryFilter)
AstBuilderis requested to withPredicate (forNOT? IN '(' query ')'SQL predicate)- PlanDynamicPruningFilters physical optimization is executed (with spark.sql.optimizer.dynamicPartitionPruning.enabled enabled)
RowLevelOperationRuntimeGroupFilteringlogical optimization is executed
Unevaluable¶
InSubquery is an Unevaluable expression.
InSubquery can be converted to a Join operator at logical optimization using RewritePredicateSubquery:
- Left-Semi Join unless it is a
NOT INthat becomes a Left-Anti Join (among the other less important cases)
InSubquery can also be converted to InSubqueryExec expression (over a SubqueryExec) in PlanSubqueries physical optimization.
Logical Analysis¶
InSubquery is resolved using the following logical analysis rules:
- ResolveSubquery
InConversion
Logical Optimization¶
InSubquery is optimized using the following logical optimizations:
- NullPropagation (so
nullvalues givenullresults) - InjectRuntimeFilter
- RewritePredicateSubquery
Physical Optimization¶
InSubquery is optimized using the following physical optimizations:
Catalyst DSL¶
InSubquery can be created using in operator using Catalyst DSL (via ImplicitOperators).
nodePatterns¶
nodePatterns is IN_SUBQUERY.