OptimizeIn Logical Optimization¶
OptimizeIn
is a base logical optimization that <
-
Replaces an
In
expression that has an empty list and the value expression not nullable tofalse
-
Eliminates duplicates of Literal expressions in an In predicate expression that is inSetConvertible
-
Replaces an
In
predicate expression that is inSetConvertible with InSet expressions when the number of literal expressions in the list expression is greater than spark.sql.optimizer.inSetConversionThreshold internal configuration property
OptimizeIn
is part of the Operator Optimization before Inferring Filters fixed-point batch in the standard batches of the Logical Optimizer.
OptimizeIn
is simply a Catalyst rule for transforming logical plans, i.e. Rule[LogicalPlan]
.
// Use Catalyst DSL to define a logical plan
// HACK: Disable symbolToColumn implicit conversion
// It is imported automatically in spark-shell (and makes demos impossible)
// implicit def symbolToColumn(s: Symbol): org.apache.spark.sql.ColumnName
trait ThatWasABadIdea
implicit def symbolToColumn(ack: ThatWasABadIdea) = ack
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val rel = LocalRelation('a.int, 'b.int, 'c.int)
import org.apache.spark.sql.catalyst.expressions.{In, Literal}
val plan = rel
.where(In('a, Seq[Literal](1, 2, 3)))
.analyze
scala> println(plan.numberedTreeString)
00 Filter a#6 IN (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]
// In --> InSet
spark.conf.set("spark.sql.optimizer.inSetConversionThreshold", 0)
import org.apache.spark.sql.catalyst.optimizer.OptimizeIn
val optimizedPlan = OptimizeIn(plan)
scala> println(optimizedPlan.numberedTreeString)
00 Filter a#6 INSET (1,2,3)
01 +- LocalRelation <empty>, [a#6, b#7, c#8]
Executing Rule¶
apply(plan: LogicalPlan): LogicalPlan
apply
...FIXME
apply
is part of the Rule abstraction.