TriggeredGraphExecution

TriggeredGraphExecution is a GraphExecution for the DataflowGraph (in PipelineUpdateContext).

Creating Instance

TriggeredGraphExecution takes the following to be created:

TriggeredGraphExecution is created when:

pipelineState Lookup Table

pipelineState: Map[TableIdentifier, StreamState]

pipelineState is a lookup table (of TableIdentifiers with their StreamStates) supporting full concurrency of retrievals and high expected concurrency for updates.
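
The concurrency guarantees above match the contract of java.util.concurrent.ConcurrentHashMap. The following is a minimal sketch, assuming such a map backs pipelineState. TableId, State, Queued and Running are hypothetical stand-ins (for TableIdentifier and StreamState) so that the example runs on its own.

```scala
import java.util.concurrent.ConcurrentHashMap

object PipelineStateSketch {
  // Hypothetical stand-ins for TableIdentifier and StreamState
  final case class TableId(name: String)
  sealed trait State
  case object Queued extends State
  case object Running extends State

  // Assumption: a ConcurrentHashMap backs the lookup table
  val pipelineState = new ConcurrentHashMap[TableId, State]()

  // Updates are safe from any thread without external locking
  pipelineState.put(TableId("orders"), Queued)
  pipelineState.put(TableId("orders"), Running)

  // Looks up the current state of a flow (None when the flow is unknown)
  def stateOf(id: TableId): Option[State] = Option(pipelineState.get(id))
}
```

Wrapping the lookup in Option avoids the null that ConcurrentHashMap.get returns for a missing key.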

pipelineState is used when:

StreamState

A flow can be in exactly one StreamState:

  • CANCELED
  • EXCLUDED
  • IDLE
  • QUEUED
  • RUNNING
  • SKIPPED
  • SUCCESSFUL
  • TERMINATED_WITH_ERROR

A flow's state can be looked up in the pipelineState registry.

Sealed Trait

StreamState is a Scala sealed trait, which means that all of its direct implementations must be defined in the same compilation unit (a single file).

Learn more in the Scala Language Specification.
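
A sealed trait lets the compiler check pattern matches for exhaustiveness. The following is a minimal sketch, assuming one case object per state listed above. The isActive helper is hypothetical and only illustrates an exhaustive match.

```scala
object StreamStateSketch {
  // Assumption: one case object per state
  sealed trait StreamState
  object StreamState {
    case object CANCELED extends StreamState
    case object EXCLUDED extends StreamState
    case object IDLE extends StreamState
    case object QUEUED extends StreamState
    case object RUNNING extends StreamState
    case object SKIPPED extends StreamState
    case object SUCCESSFUL extends StreamState
    case object TERMINATED_WITH_ERROR extends StreamState
  }

  // Hypothetical helper: a flow is active while it is QUEUED or RUNNING.
  // Dropping a case below makes the compiler warn about a non-exhaustive match.
  def isActive(state: StreamState): Boolean = state match {
    case StreamState.QUEUED | StreamState.RUNNING => true
    case StreamState.CANCELED | StreamState.EXCLUDED | StreamState.IDLE |
        StreamState.SKIPPED | StreamState.SUCCESSFUL |
        StreamState.TERMINATED_WITH_ERROR => false
  }
}
```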

Topological Execution Thread

topologicalExecutionThread: Option[Thread]

topologicalExecutionThread is the thread of execution (known as Topological Execution) that runs this pipeline update.

topologicalExecutionThread is initialized and started when TriggeredGraphExecution is requested to start.

topologicalExecutionThread runs until awaitCompletion or stopInternal.

Start Pipeline

GraphExecution
start(): Unit

start is part of the GraphExecution abstraction.

start registers the stream listener.

start requests this PipelineUpdateContext for the flows to be refreshed and...FIXME

start creates a Topological Execution thread and starts its execution.

Create Topological Execution Thread

buildTopologicalExecutionThread(): Thread

buildTopologicalExecutionThread creates a new thread of execution known as Topological Execution.

When started, the thread does topological execution.
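
Creating the thread and starting it can be sketched as follows. This is a simplified illustration, not the actual implementation: TopologicalThreadSketch is a hypothetical stand-in for TriggeredGraphExecution, the body of topologicalExecution is a placeholder, and using Topological Execution as the thread name is an assumption.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for TriggeredGraphExecution (illustration only)
final class TopologicalThreadSketch {
  // Records whether the thread body ran
  val executed = new AtomicBoolean(false)

  // Empty until start is called
  @volatile var topologicalExecutionThread: Option[Thread] = None

  // Placeholder for the real topologicalExecution
  private def topologicalExecution(): Unit = executed.set(true)

  // Creates (but does not start) the thread; the thread name is an assumption
  def buildTopologicalExecutionThread(): Thread =
    new Thread("Topological Execution") {
      override def run(): Unit = topologicalExecution()
    }

  // Builds the thread, remembers it and starts it
  def start(): Unit = {
    val thread = buildTopologicalExecutionThread()
    topologicalExecutionThread = Some(thread)
    thread.start()
  }
}
```

Keeping thread creation separate from starting it leaves the Option[Thread] field empty until start is requested.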

topologicalExecution

topologicalExecution(): Unit

topologicalExecution finds the flows that are in QUEUED or RUNNING state, or that failed but can be retried.

For each flow in RUNNING state, topologicalExecution...FIXME

topologicalExecution checks leaking permits.

FIXME Explain

topologicalExecution starts flows that are ready to start.
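
What makes a flow ready to start is not spelled out here. The following sketch illustrates one plausible rule for a topological order, under the assumption that a QUEUED flow becomes ready once all of its upstream flows are SUCCESSFUL. The readyToStart helper, the String flow names and the upstream map are all hypothetical.

```scala
object ReadyFlowsSketch {
  // Only the states needed for the illustration
  sealed trait StreamState
  case object QUEUED extends StreamState
  case object RUNNING extends StreamState
  case object SUCCESSFUL extends StreamState

  // Hypothetical rule: a QUEUED flow is ready when every upstream flow is SUCCESSFUL
  def readyToStart(
      states: Map[String, StreamState],
      upstream: Map[String, Set[String]]): Set[String] =
    states.collect {
      case (flow, QUEUED)
          if upstream
            .getOrElse(flow, Set.empty[String])
            .forall(u => states.get(u).contains(SUCCESSFUL)) =>
        flow
    }.toSet
}
```

With a chain of flows a, b and c where only a has finished, the rule selects b and leaves c queued until b succeeds.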

Start Single Flow

startFlow(
  flow: ResolvedFlow): Unit

startFlow prints out the following INFO message to the logs:

Starting flow [flow_identifier]

startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordPlanningForBatchFlow.

startFlow plans and starts the flow (that may or may not give a FlowExecution).

For a flow that has been planned and started successfully (and is available as a FlowExecution), startFlow marks it as RUNNING (in the pipelineState registry) and prints out the following INFO message to the logs:

Flow [flowIdentifier] started.

Otherwise, for a ONCE flow, startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordIdle and marks it as IDLE (in the pipelineState registry).

For a non-ONCE flow that has not been executed successfully, startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordSkipped and marks it as SKIPPED (in the pipelineState registry).

For any non-fatal exception, startFlow calls recordFailed.
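
The state transitions above can be sketched as follows. This is a simplified illustration under stated assumptions: Flow and Execution are hypothetical stand-ins for ResolvedFlow and FlowExecution, planAndStart is a hypothetical callback, logging and the FlowProgressEventLogger are left out, and the failures map merely stands in for recordFailed.

```scala
import scala.collection.concurrent.TrieMap
import scala.util.control.NonFatal

object StartFlowSketch {
  sealed trait StreamState
  case object RUNNING extends StreamState
  case object IDLE extends StreamState
  case object SKIPPED extends StreamState

  // Hypothetical stand-ins for ResolvedFlow and FlowExecution
  final case class Flow(identifier: String, once: Boolean)
  final case class Execution(flow: Flow)

  // Stand-in for the pipelineState registry
  val pipelineState = TrieMap.empty[String, StreamState]
  // Stand-in for recordFailed: remembers the error message per flow
  val failures = TrieMap.empty[String, String]

  // planAndStart may or may not give an execution
  def startFlow(flow: Flow)(planAndStart: Flow => Option[Execution]): Unit =
    try {
      planAndStart(flow) match {
        case Some(_)           => pipelineState.update(flow.identifier, RUNNING)
        case None if flow.once => pipelineState.update(flow.identifier, IDLE)
        case None              => pipelineState.update(flow.identifier, SKIPPED)
      }
    } catch {
      // Only non-fatal exceptions are handled; fatal errors propagate
      case NonFatal(e) => failures.update(flow.identifier, String.valueOf(e.getMessage))
    }
}
```

scala.util.control.NonFatal lets errors such as OutOfMemoryError propagate instead of being recorded as a flow failure.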

Streaming Trigger

GraphExecution
streamTrigger(
  flow: Flow): Trigger

streamTrigger is part of the GraphExecution abstraction.

streamTrigger is AvailableNowTrigger (Spark Structured Streaming).

awaitCompletion

GraphExecution
awaitCompletion(): Unit

awaitCompletion is part of the GraphExecution abstraction.

awaitCompletion waits for this Topological Execution thread to die.
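
Waiting for a thread to die is what Thread.join does. The following is a minimal sketch, assuming awaitCompletion joins the optional thread. AwaitCompletionSketch and its thread body are hypothetical and only simulate some work.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for TriggeredGraphExecution (illustration only)
final class AwaitCompletionSketch {
  // Set once the simulated work is done
  val finished = new AtomicBoolean(false)

  // Stand-in for the Topological Execution thread
  val topologicalExecutionThread: Option[Thread] =
    Some(new Thread("Topological Execution") {
      override def run(): Unit = {
        Thread.sleep(50) // simulate some work
        finished.set(true)
      }
    })

  def start(): Unit = topologicalExecutionThread.foreach(_.start())

  // Assumption: blocks the caller with Thread.join; a no-op when there is no thread
  def awaitCompletion(): Unit = topologicalExecutionThread.foreach(_.join())
}
```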

stopInternal

stopInternal(
  stopTopologicalExecutionThread: Boolean): Unit

stopInternal...FIXME


stopInternal is used when:

  • TriggeredGraphExecution is requested to start and stop

Stop Execution

GraphExecution
stop(): Unit

stop is part of the GraphExecution abstraction.

stop calls stopInternal (with the stopTopologicalExecutionThread flag enabled).