Skip to content

Streaming Join

In Spark Structured Streaming, a streaming join is an operator in a streaming query that is described (built) using the high-level streaming operators:

Streaming joins can be stateless or stateful:

Stream-Stream Joins

Spark Structured Streaming supports stream-stream joins with the following:

Stream-stream equi-joins are planned as <> physical operators of two ShuffleExchangeExec physical operators (per <>).

Join State Watermark

Stream-stream joins may optionally define Join State Watermark for state removal (cf. <>).

A join state watermark can be specified on the following:

. <> (key state)

. <> (value state)

A join state watermark can be specified on key state, value state or both.

IncrementalExecution

Under the covers, the <> create a logical query plan with one or more Join logical operators.

Spark Structured Streaming uses IncrementalExecution for planning streaming queries for execution.

At query planning, IncrementalExecution uses the StreamingJoinStrategy execution planning strategy for planning stream-stream joins as StreamingSymmetricHashJoinExec physical operators.

Demos

Further Reading Or Watching