HashJoin — Hash-Based Join Physical Operators¶
HashJoin is an extension of the JoinCodegenSupport abstraction for hash-based join physical operators with support for Java code generation.
Contract¶
BuildSide¶
buildSide: BuildSide
Preparing HashedRelation¶
prepareRelation(
ctx: CodegenContext): HashedRelationInfo
Used when:
HashJoinis requested to codegenInner, codegenOuter, codegenSemi, codegenAnti, codegenExistence
Implementations¶
Build Keys¶
buildKeys: Seq[Attribute]
Lazy Value
buildKeys is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
buildKeys is the left keys for the BuildSide as BuildLeft while the right keys as BuildRight.
Important
HashJoin assumes that the number of the left and the right keys are the same and are of the same types (position-wise).
Streamed Keys¶
streamedKeys: Seq[Attribute]
Lazy Value
streamedKeys is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
streamedKeys is the opposite of the build keys.
streamedKeys is the right keys for the BuildSide as BuildLeft while the left keys as BuildRight.
Output Attributes¶
output: Seq[Attribute]
output is a collection of Attributes based on the joinType.
| joinType | Output Schema |
|---|---|
InnerLike | output of the left followed by the right operator's |
LeftOuter | output of the left followed by the right operator's (with nullability on) |
RightOuter | output of the left (with nullability on) followed by the right operator's |
ExistenceJoin | output of the left followed by the exists attribute |
LeftSemi | output of the left operator |
LeftAnti | output of the left operator |
output is part of the QueryPlan abstraction.
CodegenSupport — Generating Java Source Code¶
HashJoin is a CodegenSupport (indirectly as a JoinCodegenSupport) and supports generating Java source code for consume and produce execution paths.
Consume Path¶
doConsume(
ctx: CodegenContext,
input: Seq[ExprCode],
row: ExprCode): String
doConsume generates a Java source code for "consume" execution path based on the given join type.
| joinType | doConsume |
|---|---|
InnerLike | codegenInner |
LeftOuter | codegenOuter |
RightOuter | codegenOuter |
LeftSemi | codegenSemi |
LeftAnti | codegenAnti |
ExistenceJoin | codegenExistence |
doConsume is part of the CodegenSupport abstraction.
Anti Join¶
codegenAnti(
ctx: CodegenContext,
input: Seq[ExprCode]): String
codegenAnti...FIXME
codegenAnti is used when:
BroadcastHashJoinExecphysical operator is requested to codegenAnti (with the isNullAwareAntiJoin flag off)HashJoinis requested to doConsume
Inner Join¶
codegenInner(
ctx: CodegenContext,
input: Seq[ExprCode]): String
codegenInner prepares a HashedRelation (with the given CodegenContext).
Note
Preparing a HashedRelation is implementation-specific.
codegenInner genStreamSideJoinKey and getJoinCondition.
For isEmptyHashedRelation, codegenInner returns the following text (which is simply a comment with no executable code):
// If HashedRelation is empty, hash inner join simply returns nothing.
For keyIsUnique, codegenInner returns the following code:
// generate join key for stream side
[keyEv.code]
// find matches from HashedRelation
UnsafeRow [matched] = [anyNull] ? null: (UnsafeRow)[relationTerm].getValue([keyEv.value]);
if ([matched] != null) {
[checkCondition] {
[numOutput].add(1);
[consume(ctx, resultVars)]
}
}
For all other cases, codegenInner returns the following code:
// generate join key for stream side
[keyEv.code]
// find matches from HashRelation
Iterator[UnsafeRow] [matches] = [anyNull] ?
null : (Iterator[UnsafeRow])[relationTerm].get([keyEv.value]);
if ([matches] != null) {
while ([matches].hasNext()) {
UnsafeRow [matched] = (UnsafeRow) [matches].next();
[checkCondition] {
[numOutput].add(1);
[consume(ctx, resultVars)]
}
}
}
Produce Path¶
doProduce(
ctx: CodegenContext): String
doProduce assumes that the streamedPlan is a CodegenSupport and requests it to generate a Java source code for "produce" execution path.
doProduce is part of the CodegenSupport abstraction.
join¶
join(
streamedIter: Iterator[InternalRow],
hashed: HashedRelation,
numOutputRows: SQLMetric): Iterator[InternalRow]
join branches off per JoinType to create an joined rows iterator (off the rows from the input streamedIter and hashed):
-
outerJoin for a LeftOuter or a RightOuter join
-
existenceJoin for a ExistenceJoin join
join creates a result projection.
In the end, for every row in the joined rows iterator join increments the input numOutputRows SQL metric and applies the result projection.
join reports an IllegalArgumentException for unsupported JoinType:
HashJoin should not take [joinType] as the JoinType
join is used when:
- BroadcastHashJoinExec and ShuffledHashJoinExec physical operators are executed