EliminateResolvedHint Logical Optimization¶
EliminateResolvedHint is a default logical optimization.
Non-Excludable Rule¶
EliminateResolvedHint is a non-excludable rule.
Executing Rule¶
apply(
plan: LogicalPlan): LogicalPlan
apply is part of the Rule abstraction.
apply transforms Join logical operators with no hints defined in the given LogicalPlan:
- Extracts hints from the left and right sides of the join (which gives a new operator and the HintInfos for each side)
- Creates a new JoinHint with the hints merged for the left and right sides
- Creates a new Join logical operator with the new left and right operators and the new JoinHint
In the end, apply finds the remaining ResolvedHints and, if found, requests the HintErrorHandler to joinNotFoundForJoinHint and ignores the hint (returns the child of the ResolvedHint).
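The two passes described above can be sketched with a toy plan model in plain Scala. This is a minimal sketch, not Spark's code: Plan, Relation, Hinted, JoinNode and ToyEliminateHints are hypothetical stand-ins for LogicalPlan, a leaf relation, ResolvedHint, Join (with its JoinHint) and the rule itself. Two simplifications are assumed: extraction only follows hint nodes stacked directly on a join side, and merging keeps the outermost hint.

```scala
// Toy model only: hypothetical stand-ins, not Spark's classes.
sealed trait Plan
case class Relation(name: String) extends Plan
// Stands in for ResolvedHint
case class Hinted(strategy: String, child: Plan) extends Plan
// Stands in for Join together with its JoinHint
case class JoinNode(
    left: Plan,
    right: Plan,
    leftHint: Option[String] = None,
    rightHint: Option[String] = None) extends Plan

object ToyEliminateHints {
  // Strips hint nodes stacked on top of a plan and collects their strategies.
  // Assumption: extraction stops at the first node that is not a hint.
  def extractHints(plan: Plan): (Plan, Seq[String]) = plan match {
    case Hinted(strategy, child) =>
      val (stripped, hints) = extractHints(child)
      (stripped, strategy +: hints)
    case other => (other, Seq.empty)
  }

  // Assumption: the outermost hint wins when several are collected.
  def mergeHints(hints: Seq[String]): Option[String] = hints.headOption

  // Pass 1 (bottom-up): move hints from the sides of hint-less joins into the join.
  def pullUp(plan: Plan): Plan = plan match {
    case JoinNode(l, r, leftHint, rightHint) =>
      val (left, right) = (pullUp(l), pullUp(r))
      if (leftHint.isEmpty && rightHint.isEmpty) {
        val (newLeft, leftHints) = extractHints(left)
        val (newRight, rightHints) = extractHints(right)
        JoinNode(newLeft, newRight, mergeHints(leftHints), mergeHints(rightHints))
      } else {
        JoinNode(left, right, leftHint, rightHint)
      }
    case Hinted(strategy, child) => Hinted(strategy, pullUp(child))
    case r: Relation => r
  }

  // Pass 2: a hint node that is still in the plan has no join to attach to.
  // The real rule notifies the HintErrorHandler at this point; the toy just drops it.
  def dropLeftovers(plan: Plan): Plan = plan match {
    case Hinted(_, child) => dropLeftovers(child)
    case JoinNode(l, r, leftHint, rightHint) =>
      JoinNode(dropLeftovers(l), dropLeftovers(r), leftHint, rightHint)
    case r: Relation => r
  }

  def apply(plan: Plan): Plan = dropLeftovers(pullUp(plan))
}
```

In this model, ToyEliminateHints(JoinNode(Hinted("shuffle_hash", Relation("t1")), Hinted("merge", Relation("t2")))) gives a JoinNode over the two bare relations with both hints attached, while a Hinted node with no join above it is reduced to its child.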
HintErrorHandler¶
hintErrorHandler: HintErrorHandler
hintErrorHandler is the default HintErrorHandler.
Extracting Hints from Logical Plan¶
extractHintsFromPlan(
plan: LogicalPlan): (LogicalPlan, Seq[HintInfo])
extractHintsFromPlan collects (extracts) HintInfos from the ResolvedHint unary logical operators in the given LogicalPlan and gives a pair of:
- Transformed plan with ResolvedHint nodes removed
- HintInfos
While collecting, extractHintsFromPlan removes the ResolvedHint unary logical operators.
Note
It is possible (yet still unclear) that some ResolvedHints won't get extracted.
extractHintsFromPlan is used when:
- EliminateResolvedHint is requested to execute
- CacheManager is requested to useCachedData
Merging Hints¶
mergeHints(
hints: Seq[HintInfo]): Option[HintInfo]
mergeHints
...FIXME
Demo¶
Logical Query Plan¶
Create a logical plan using Catalyst DSL.
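// dsl.expressions._ provides the 'id.long attribute syntax used below
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._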
import org.apache.spark.sql.catalyst.plans.logical.{SHUFFLE_HASH, SHUFFLE_MERGE}
import org.apache.spark.sql.catalyst.plans.logical.LocalRelation
val t1 = LocalRelation('id.long, 'name.string).hint(SHUFFLE_HASH.displayName)
val t2 = LocalRelation('id.long, 'age.int).hint(SHUFFLE_MERGE.displayName)
val logical = t1.join(t2)
scala> println(logical.numberedTreeString)
00 'Join Inner
01 :- 'UnresolvedHint shuffle_hash
02 :  +- LocalRelation <empty>, [id#0L, name#1]
03 +- 'UnresolvedHint merge
04    +- LocalRelation <empty>, [id#2L, age#3]
Analyze Plan¶
val analyzed = logical.analyze
scala> println(analyzed.numberedTreeString)
00 Join Inner
01 :- ResolvedHint (strategy=shuffle_hash)
02 :  +- LocalRelation <empty>, [id#0L, name#1]
03 +- ResolvedHint (strategy=merge)
04    +- LocalRelation <empty>, [id#2L, age#3]
Optimize Plan¶
Optimize the plan (using EliminateResolvedHint only).
import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint
val optimizedPlan = EliminateResolvedHint(analyzed)
scala> println(optimizedPlan.numberedTreeString)
00 Join Inner, leftHint=(strategy=shuffle_hash), rightHint=(strategy=merge)
01 :- LocalRelation <empty>, [id#0L, name#1]
02 +- LocalRelation <empty>, [id#2L, age#3]
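The other branch of the rule (a ResolvedHint with no join to attach to) can be explored the same way. The following is a sketch that assumes a spark-shell session on a Spark 3.x build; the value names hinted and withoutHint are made up for illustration, and no output is shown since it was not captured from a real session.

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._
import org.apache.spark.sql.catalyst.dsl.plans._
import org.apache.spark.sql.catalyst.optimizer.EliminateResolvedHint
import org.apache.spark.sql.catalyst.plans.logical.{LocalRelation, SHUFFLE_HASH}

// A hinted relation that is not a side of any join
val hinted = LocalRelation('id.long, 'name.string)
  .hint(SHUFFLE_HASH.displayName)
  .analyze

// With no Join to receive the hint, the rule is expected to report it
// to the HintErrorHandler and return the child of the ResolvedHint
val withoutHint = EliminateResolvedHint(hinted)
println(withoutHint.numberedTreeString)
```

If the rule behaves as described, the printed tree should contain the LocalRelation only, with no ResolvedHint node above it.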