RewritePredicateSubquery Logical Optimization
`RewritePredicateSubquery` is a base logical optimization that transforms `Filter` operators with the following subquery predicates into `Join` operators:

- `Filter` operators with `Exists` and `In` with `ListQuery` expressions give left-semi joins
- `Filter` operators with `Not` with `Exists` and `In` with `ListQuery` expressions give left-anti joins
NOTE: Prefer `EXISTS` (over `Not` with `In` with `ListQuery` subquery expression) if performance matters since https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala?utf8=%E2%9C%93#L110[they say] "that will almost certainly be planned as a Broadcast Nested Loop join".
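The mapping from subquery predicates to join types can be checked by inspecting optimized logical plans. The following is a minimal sketch, assuming Spark SQL is on the classpath; the object name, the `joinTypes` helper and the `t1` and `t2` demo views are hypothetical and not part of the original text. The exact shape of the plans can differ between Spark versions, so the sketch only collects the join types.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.JoinType
import org.apache.spark.sql.catalyst.plans.logical.Join

object PredicateSubqueryJoinTypes {
  // Collects the join types found in the optimized logical plan of a SQL query
  // (hypothetical helper, for illustration only)
  def joinTypes(spark: SparkSession, sqlText: String): Seq[JoinType] =
    spark.sql(sqlText).queryExecution.optimizedPlan.collect {
      case j: Join => j.joinType
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("RewritePredicateSubquery demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical demo views (assumptions, not from the original text)
    Seq(1, 2, 3).toDF("id").createOrReplaceTempView("t1")
    Seq(2, 3, 4).toDF("id").createOrReplaceTempView("t2")

    val queries = Seq(
      "EXISTS"     -> "SELECT * FROM t1 WHERE EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)",
      "IN"         -> "SELECT * FROM t1 WHERE id IN (SELECT id FROM t2)",
      "NOT EXISTS" -> "SELECT * FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.id = t1.id)",
      "NOT IN"     -> "SELECT * FROM t1 WHERE id NOT IN (SELECT id FROM t2)")

    // EXISTS and IN are expected to give left-semi joins,
    // their negations left-anti joins
    queries.foreach { case (label, sqlText) =>
      println(s"$label: ${joinTypes(spark, sqlText).mkString(", ")}")
    }

    spark.stop()
  }
}
```

To compare how `NOT IN` and `NOT EXISTS` are physically planned, call `explain` on the corresponding `spark.sql(...)` results and look at the join operator in each physical plan.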
`RewritePredicateSubquery` is part of the RewriteSubquery once-executed batch in the standard batches of the Logical Optimizer.
`RewritePredicateSubquery` is simply a Catalyst rule for transforming logical plans, i.e. `Rule[LogicalPlan]`.
[source, scala]
// FIXME Examples of RewritePredicateSubquery
// 1. Filters with Exists and In (with ListQuery) expressions
// 2. NOTs
// Based on RewriteSubquerySuite
// FIXME Contribute back to RewriteSubquerySuite
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.RuleExecutor
// Catalyst DSL that brings the analyze operator (used below) into scope
import org.apache.spark.sql.catalyst.dsl.plans._
// A custom rule executor with just the batches under test
object Optimize extends RuleExecutor[LogicalPlan] {
  import org.apache.spark.sql.catalyst.optimizer._
  val batches = Seq(
    Batch("Column Pruning", FixedPoint(100), ColumnPruning),
    Batch("Rewrite Subquery", Once,
      RewritePredicateSubquery,
      ColumnPruning,
      CollapseProject,
      RemoveRedundantProject))
}
val q = ...
val optimized = Optimize.execute(q.analyze)
=== [[rewriteExistentialExpr]] `rewriteExistentialExpr` Internal Method

[source, scala]
rewriteExistentialExpr(
  exprs: Seq[Expression],
  plan: LogicalPlan): (Option[Expression], LogicalPlan)

`rewriteExistentialExpr`...FIXME

NOTE: `rewriteExistentialExpr` is used when...FIXME
=== [[dedupJoin]] `dedupJoin` Internal Method

[source, scala]
dedupJoin(joinPlan: LogicalPlan): LogicalPlan

`dedupJoin`...FIXME

NOTE: `dedupJoin` is used when...FIXME
=== [[getValueExpression]] `getValueExpression` Internal Method

[source, scala]
getValueExpression(e: Expression): Seq[Expression]

`getValueExpression`...FIXME

NOTE: `getValueExpression` is used when...FIXME
=== [[apply]] Executing Rule -- `apply` Method

[source, scala]
apply(plan: LogicalPlan): LogicalPlan
`apply` transforms Filter.md[Filter] unary operators in the input spark-sql-LogicalPlan.md[logical plan].

`apply` splits conjunctive predicates in the Filter.md#condition[condition expression] (i.e. expressions separated by `And` expressions) and then partitions them into two collections of expressions spark-sql-Expression-SubqueryExpression.md#hasInOrExistsSubquery[with and without In or Exists subquery expressions].

`apply` creates a Filter.md#creating-instance[Filter] operator for condition (sub)expressions without subqueries (combined with `And` expression) if available or takes the Filter.md#child[child] operator (of the input `Filter` unary operator).
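The splitting and partitioning steps above can be sketched with a toy expression tree. This is a simplified model in plain Scala, not Catalyst code: the `Expr` hierarchy and the `splitConjuncts`, `hasInOrExists` and `partition` names are hypothetical stand-ins for the corresponding Catalyst expressions and helpers.

```scala
object SplitAndPartitionSketch {
  // Toy expression tree (an assumption for illustration, NOT Catalyst's Expression)
  sealed trait Expr
  final case class And(left: Expr, right: Expr) extends Expr
  final case class Not(child: Expr) extends Expr
  final case class Exists(subquery: String) extends Expr
  final case class InListQuery(value: String, subquery: String) extends Expr
  final case class Pred(sql: String) extends Expr // any subquery-free predicate

  // Flattens nested And expressions into the list of conjuncts
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // True when the expression contains an Exists or an In (with ListQuery) anywhere
  def hasInOrExists(e: Expr): Boolean = e match {
    case _: Exists | _: InListQuery => true
    case And(l, r)                  => hasInOrExists(l) || hasInOrExists(r)
    case Not(c)                     => hasInOrExists(c)
    case _: Pred                    => false
  }

  // Returns the conjuncts with subqueries and, if there are any conjuncts
  // without subqueries, their And-combination (the condition of the new Filter)
  def partition(condition: Expr): (Seq[Expr], Option[Expr]) = {
    val (withSubquery, withoutSubquery) =
      splitConjuncts(condition).partition(hasInOrExists)
    (withSubquery, withoutSubquery.reduceOption[Expr]((l, r) => And(l, r)))
  }

  def main(args: Array[String]): Unit = {
    val condition =
      And(Pred("a > 1"),
        And(Exists("SELECT ..."), Not(InListQuery("b", "SELECT ..."))))
    val (withSubquery, residual) = partition(condition)
    println(s"With subqueries: $withSubquery")
    println(s"Residual filter condition: $residual")
  }
}
```

When `residual` is empty, the sketch corresponds to the case where `apply` takes the child operator of the input `Filter` directly instead of creating a new `Filter`.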
In the end, `apply` creates a new logical plan with Join.md[Join] operators for spark-sql-Expression-Exists.md[Exists] and spark-sql-Expression-In.md[In] expressions (and their negations) as follows:

- For spark-sql-Expression-Exists.md[Exists] predicate expressions, `apply` runs <<rewriteExistentialExpr, rewriteExistentialExpr>> and creates a Join.md#creating-instance[Join] operator with `LeftSemi` join type. In the end, `apply` runs <<dedupJoin, dedupJoin>>.
- For `Not` expressions with a spark-sql-Expression-Exists.md[Exists] predicate expression, `apply` runs <<rewriteExistentialExpr, rewriteExistentialExpr>> and creates a Join.md#creating-instance[Join] operator with `LeftAnti` join type. In the end, `apply` runs <<dedupJoin, dedupJoin>>.
- For spark-sql-Expression-In.md[In] predicate expressions with a spark-sql-Expression-ListQuery.md[ListQuery] subquery expression, `apply` runs <<getValueExpression, getValueExpression>> followed by <<rewriteExistentialExpr, rewriteExistentialExpr>> and creates a Join.md#creating-instance[Join] operator with `LeftSemi` join type. In the end, `apply` runs <<dedupJoin, dedupJoin>>.
- For `Not` expressions with a spark-sql-Expression-In.md[In] predicate expression with a spark-sql-Expression-ListQuery.md[ListQuery] subquery expression, `apply` runs <<getValueExpression, getValueExpression>> and <<rewriteExistentialExpr, rewriteExistentialExpr>> followed by splitting conjunctive predicates, and creates a Join.md#creating-instance[Join] operator with `LeftAnti` join type. In the end, `apply` runs <<dedupJoin, dedupJoin>>.
- For other predicate expressions, `apply` runs <<rewriteExistentialExpr, rewriteExistentialExpr>> and creates a Project.md#creating-instance[Project] unary operator with a Filter.md#creating-instance[Filter] operator.
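The case analysis in the list above amounts to a pattern match on the shape of each predicate. The following plain-Scala sketch models only the choice of the resulting operator; the `Expr` and `Rewrite` types and the `rewriteFor` name are hypothetical, and the join conditions built by the internal methods are left out.

```scala
object JoinTypeDispatchSketch {
  // Toy predicate shapes (assumptions for illustration, NOT Catalyst expressions)
  sealed trait Expr
  final case class Not(child: Expr) extends Expr
  final case class Exists(subquery: String) extends Expr
  final case class InListQuery(value: String, subquery: String) extends Expr
  final case class Other(sql: String) extends Expr

  // What a predicate is rewritten to
  sealed trait Rewrite
  final case class AsJoin(joinType: String) extends Rewrite
  case object AsProjectOverFilter extends Rewrite

  // Mirrors the five cases of the list: semi joins for Exists and In,
  // anti joins for their negations, Project over Filter for everything else
  def rewriteFor(predicate: Expr): Rewrite = predicate match {
    case _: Exists           => AsJoin("LeftSemi")
    case Not(_: Exists)      => AsJoin("LeftAnti")
    case _: InListQuery      => AsJoin("LeftSemi")
    case Not(_: InListQuery) => AsJoin("LeftAnti")
    case _                   => AsProjectOverFilter
  }

  def main(args: Array[String]): Unit = {
    val predicates = Seq(
      Exists("SELECT ..."),
      Not(Exists("SELECT ...")),
      InListQuery("a", "SELECT ..."),
      Not(InListQuery("a", "SELECT ...")),
      Other("a > 1"))
    predicates.foreach(p => println(s"$p => ${rewriteFor(p)}"))
  }
}
```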
`apply` is part of the Rule abstraction.