SparkStructuredStreamingRunner

SparkStructuredStreamingRunner is a PipelineRunner that produces a SparkStructuredStreamingPipelineResult.
SparkStructuredStreamingRunner can be configured using [SparkStructuredStreamingPipelineOptions](#SparkStructuredStreamingPipelineOptions).
```scala
import org.apache.beam.sdk.Pipeline
import org.apache.beam.sdk.PipelineResult
import org.apache.beam.sdk.io.TextIO
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner

// Create the pipeline options and request SparkStructuredStreamingRunner
val opts = PipelineOptionsFactory.as(classOf[SparkStructuredStreamingPipelineOptions])
opts.setRunner(classOf[SparkStructuredStreamingRunner])

val p = Pipeline.create(opts)

// Read all text files in the current directory and write the lines out
val linesIn = TextIO.read().from("*.txt")
val lines = p.apply(linesIn)
val countsOut = TextIO.write().to("counts.txt")
lines.apply(countsOut)

// Execute the pipeline and wait until it has finished
val r = p.run()
r.waitUntilFinish()

assert(r.getState == PipelineResult.State.DONE)
```
{blurb, class: information} FIXME How to configure the SparkStructuredStreamingRunner logger? {/blurb}
SparkStructuredStreamingRunner uses [SparkStructuredStreamingPipelineOptions](#SparkStructuredStreamingPipelineOptions).

```scala
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner

// Create a SparkStructuredStreamingRunner with the default options
val runner = SparkStructuredStreamingRunner.create()
```
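As a sketch of how the options could be configured before being handed to the runner: `setSparkMaster` is inherited from the common Spark options, `local[*]` is an illustrative master URL, and `fromOptions` is the standard `PipelineRunner` factory method.

```scala
import org.apache.beam.sdk.options.PipelineOptionsFactory
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingRunner

val opts = PipelineOptionsFactory.as(classOf[SparkStructuredStreamingPipelineOptions])
opts.setRunner(classOf[SparkStructuredStreamingRunner])

// local[*] runs Spark locally with one worker thread per core (illustrative value)
opts.setSparkMaster("local[*]")

// fromOptions binds a runner instance to these options
val runner = SparkStructuredStreamingRunner.fromOptions(opts)
```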
{blurb, class: information} TODO Review SparkStructuredStreamingPipelineOptions options {/blurb}
A Pipeline with unbounded PCollections automatically switches execution from the default batch mode to streaming mode. You can also enable streaming mode explicitly using the streaming option (part of StreamingOptions).
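A minimal sketch of turning the flag on explicitly. Any PipelineOptions instance can be viewed as StreamingOptions via `as(...)`, so no runner-specific API is assumed here.

```scala
import org.apache.beam.sdk.options.{PipelineOptionsFactory, StreamingOptions}
import org.apache.beam.runners.spark.structuredstreaming.SparkStructuredStreamingPipelineOptions

val opts = PipelineOptionsFactory.as(classOf[SparkStructuredStreamingPipelineOptions])

// View the options through the StreamingOptions interface and enable streaming
val streamingOpts = opts.as(classOf[StreamingOptions])
streamingOpts.setStreaming(true)
```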
PipelineTranslatorStreaming is used in streaming mode, while PipelineTranslatorBatch is used in batch mode.
{blurb, class: information} TODO Review PipelineTranslator to understand how the translation of Beam concepts to Spark’s works. {/blurb}