HashJoin — Hash-Based Join Physical Operators¶
HashJoin is an extension of the JoinCodegenSupport abstraction for hash-based join physical operators with support for Java code generation.
Contract¶
BuildSide¶
buildSide: BuildSide
Preparing HashedRelation¶
prepareRelation(
ctx: CodegenContext): HashedRelationInfo
Used when:
- HashJoin is requested to codegenInner, codegenOuter, codegenSemi, codegenAnti, and codegenExistence
Implementations¶
Build Keys¶
buildKeys: Seq[Attribute]
Lazy Value
buildKeys
is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
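The once-only initialization described above can be observed with a small, self-contained sketch. The object name, the counter, and the key names are hypothetical and exist only to make the behavior visible; they are not part of HashJoin.

```scala
object LazyValSketch {
  // Hypothetical counter, used only to observe how often the initializer runs.
  var initCount = 0

  // The initializer block runs on first access only; the result is cached afterwards.
  lazy val keys: Seq[String] = {
    initCount += 1
    Seq("id", "name")
  }
}
```

Accessing `LazyValSketch.keys` any number of times leaves `initCount` at 1, and the counter stays at 0 until the first access.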
buildKeys is the left keys when the BuildSide is BuildLeft, and the right keys when it is BuildRight.
Important
HashJoin assumes that there are as many left keys as right keys and that the keys have the same types (position-wise).
Streamed Keys¶
streamedKeys: Seq[Attribute]
Lazy Value
streamedKeys
is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
streamedKeys is the opposite of the build keys: the right keys when the BuildSide is BuildLeft, and the left keys when it is BuildRight.
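How the build and streamed keys relate to the build side can be sketched as follows. This is a simplified model under stated assumptions, not the Spark source: `Key` is a hypothetical stand-in for a join key (a name plus a data type label), and `select` is an illustrative helper name.

```scala
object KeySelectionSketch {
  sealed trait BuildSide
  case object BuildLeft extends BuildSide
  case object BuildRight extends BuildSide

  // Hypothetical stand-in for a join key: a name plus a data type label.
  final case class Key(name: String, dataType: String)

  // Returns (buildKeys, streamedKeys) for the given build side.
  def select(
      buildSide: BuildSide,
      leftKeys: Seq[Key],
      rightKeys: Seq[Key]): (Seq[Key], Seq[Key]) = {
    // The assumption stated in the text: same number of keys, same types position-wise.
    require(leftKeys.length == rightKeys.length, "Key counts differ")
    require(
      leftKeys.zip(rightKeys).forall { case (l, r) => l.dataType == r.dataType },
      "Key types differ position-wise")
    buildSide match {
      case BuildLeft  => (leftKeys, rightKeys)
      case BuildRight => (rightKeys, leftKeys)
    }
  }
}
```

With `BuildLeft` the left keys come back as the build keys and the right keys as the streamed keys; `BuildRight` swaps the two.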
Output Attributes¶
output: Seq[Attribute]
output
is a collection of Attributes based on the joinType.
joinType | Output Schema |
---|---|
InnerLike | output of the left operator followed by the output of the right operator |
LeftOuter | output of the left operator followed by the output of the right operator (with nullability on) |
RightOuter | output of the left operator (with nullability on) followed by the output of the right operator |
ExistenceJoin | output of the left followed by the exists attribute |
LeftSemi | output of the left operator |
LeftAnti | output of the left operator |
output
is part of the QueryPlan abstraction.
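The output schema rules in the table can be modeled with a small sketch. `Attr` is a hypothetical, simplified attribute (a name and a nullability flag), and `Inner` stands in for the InnerLike join types; none of these definitions are the Spark classes themselves.

```scala
object OutputSketch {
  // Hypothetical simplified attribute: a name and a nullability flag.
  final case class Attr(name: String, nullable: Boolean = false) {
    def withNullability(n: Boolean): Attr = copy(nullable = n)
  }

  sealed trait JoinType
  case object Inner extends JoinType // stands in for InnerLike
  case object LeftOuter extends JoinType
  case object RightOuter extends JoinType
  case object LeftSemi extends JoinType
  case object LeftAnti extends JoinType
  final case class ExistenceJoin(exists: Attr) extends JoinType

  // Mirrors the table: which attributes appear in the output, and which become nullable.
  def output(joinType: JoinType, left: Seq[Attr], right: Seq[Attr]): Seq[Attr] =
    joinType match {
      case Inner                 => left ++ right
      case LeftOuter             => left ++ right.map(_.withNullability(true))
      case RightOuter            => left.map(_.withNullability(true)) ++ right
      case ExistenceJoin(exists) => left :+ exists
      case LeftSemi | LeftAnti   => left
    }
}
```

The side that may have no match in an outer join is the one whose attributes are made nullable, since unmatched rows are padded with nulls on that side.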
CodegenSupport — Generating Java Source Code¶
HashJoin
is a CodegenSupport (indirectly as a JoinCodegenSupport) and supports generating Java source code for consume and produce execution paths.
Consume Path¶
doConsume(
ctx: CodegenContext,
input: Seq[ExprCode],
row: ExprCode): String
doConsume generates Java source code for the "consume" execution path based on the join type.
joinType | doConsume |
---|---|
InnerLike | codegenInner |
LeftOuter | codegenOuter |
RightOuter | codegenOuter |
LeftSemi | codegenSemi |
LeftAnti | codegenAnti |
ExistenceJoin | codegenExistence |
doConsume
is part of the CodegenSupport abstraction.
Anti Join¶
codegenAnti(
ctx: CodegenContext,
input: Seq[ExprCode]): String
codegenAnti
...FIXME
codegenAnti is used when:
- BroadcastHashJoinExec physical operator is requested to codegenAnti (with the isNullAwareAntiJoin flag off)
- HashJoin is requested to doConsume
Inner Join¶
codegenInner(
ctx: CodegenContext,
input: Seq[ExprCode]): String
codegenInner
prepares a HashedRelation (with the given CodegenContext).
Note
Preparing a HashedRelation is implementation-specific.
codegenInner then calls genStreamSideJoinKey and getJoinCondition.
For isEmptyHashedRelation, codegenInner returns the following text (which is simply a comment with no executable code):
// If HashedRelation is empty, hash inner join simply returns nothing.
For keyIsUnique, codegenInner returns the following code:
// generate join key for stream side
[keyEv.code]
// find matches from HashedRelation
UnsafeRow [matched] = [anyNull] ? null: (UnsafeRow)[relationTerm].getValue([keyEv.value]);
if ([matched] != null) {
[checkCondition] {
[numOutput].add(1);
[consume(ctx, resultVars)]
}
}
For all other cases, codegenInner returns the following code:
// generate join key for stream side
[keyEv.code]
// find matches from HashRelation
Iterator[UnsafeRow] [matches] = [anyNull] ?
null : (Iterator[UnsafeRow])[relationTerm].get([keyEv.value]);
if ([matches] != null) {
while ([matches].hasNext()) {
UnsafeRow [matched] = (UnsafeRow) [matches].next();
[checkCondition] {
[numOutput].add(1);
[consume(ctx, resultVars)]
}
}
}
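The two generated-code templates above differ only in how matches are looked up: a single `getValue` call when the key is unique, and an iterator over all matching rows otherwise. The following Scala sketch models those semantics on plain collections. It is an illustration under stated assumptions, not the generated Java code: `Row`, `joinUnique`, and `joinGeneral` are hypothetical names, a `None` key models a null join key, and the `condition` parameter plays the role of `checkCondition`.

```scala
object InnerJoinSketch {
  // Hypothetical simplified row: an optional join key (None models a null key) and a payload.
  final case class Row(key: Option[Int], value: String)

  // Unique-key variant: at most one build row per key, so one lookup is enough.
  def joinUnique(
      streamed: Seq[Row],
      relation: Map[Int, Row],
      condition: (Row, Row) => Boolean = (_, _) => true): Seq[(Row, Row)] =
    for {
      s <- streamed
      k <- s.key.toSeq                  // a null key never matches
      matched <- relation.get(k).toSeq  // models getValue returning null on a miss
      if condition(s, matched)
    } yield (s, matched)

  // General variant: iterate over every build row that shares the key.
  def joinGeneral(
      streamed: Seq[Row],
      relation: Map[Int, Seq[Row]],
      condition: (Row, Row) => Boolean = (_, _) => true): Seq[(Row, Row)] =
    for {
      s <- streamed
      k <- s.key.toSeq
      matched <- relation.getOrElse(k, Seq.empty)
      if condition(s, matched)
    } yield (s, matched)
}
```

In this model the size of the returned sequence corresponds to the number of times the generated code would call `[numOutput].add(1)`.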
Produce Path¶
doProduce(
ctx: CodegenContext): String
doProduce assumes that the streamedPlan is a CodegenSupport and requests it to generate Java source code for the "produce" execution path.
doProduce
is part of the CodegenSupport abstraction.
join¶
join(
streamedIter: Iterator[InternalRow],
hashed: HashedRelation,
numOutputRows: SQLMetric): Iterator[InternalRow]
join branches off per JoinType to create a joined rows iterator (off the rows from the input streamedIter and hashed):
- outerJoin for a LeftOuter or a RightOuter join
- existenceJoin for an ExistenceJoin join
join creates a result projection.
In the end, for every row in the joined rows iterator, join increments the input numOutputRows SQL metric and applies the result projection.
join reports an IllegalArgumentException for an unsupported JoinType:
HashJoin should not take [joinType] as the JoinType
join
is used when:
- BroadcastHashJoinExec and ShuffledHashJoinExec physical operators are executed
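The overall shape of join (branch per join type, then count and project every joined row) can be sketched on plain Scala collections. Everything here is a simplified assumption made for illustration: `Metric` is a hypothetical stand-in for SQLMetric, the hashed relation is a plain `Map`, rows are reduced to integer keys and string payloads, the streamed side is taken to be the left side, and `Unsupported` is a made-up join type used only to trigger the error path.

```scala
object JoinSketch {
  sealed trait JoinType
  case object Inner extends JoinType
  case object LeftOuter extends JoinType
  case object Unsupported extends JoinType // hypothetical, only to show the error path

  // Hypothetical stand-in for a SQL metric.
  final class Metric {
    private var v = 0L
    def add(n: Long): Unit = v += n
    def value: Long = v
  }

  def join(
      joinType: JoinType,
      streamedIter: Iterator[Int],
      hashed: Map[Int, Seq[String]],
      numOutputRows: Metric): Iterator[String] = {
    // Step 1: branch per join type to build the joined rows iterator.
    val joinedIter: Iterator[(Int, Option[String])] = joinType match {
      case Inner =>
        streamedIter.flatMap(k => hashed.getOrElse(k, Nil).iterator.map(b => (k, Option(b))))
      case LeftOuter =>
        streamedIter.flatMap { k =>
          val matches = hashed.getOrElse(k, Nil)
          if (matches.isEmpty) Iterator.single((k, Option.empty[String]))
          else matches.iterator.map(b => (k, Option(b)))
        }
      case other =>
        throw new IllegalArgumentException(s"HashJoin should not take $other as the JoinType")
    }
    // Step 2: a result projection (here, simply rendering the joined pair as text).
    val resultProj: ((Int, Option[String])) => String = {
      case (k, b) => s"$k:${b.getOrElse("null")}"
    }
    // Step 3: for every joined row, bump the metric and apply the projection.
    joinedIter.map { r =>
      numOutputRows.add(1)
      resultProj(r)
    }
  }
}
```

Because the result is a lazy iterator, the metric in this sketch only advances as rows are consumed, while the unsupported join type is rejected eagerly when join is called.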