TriggeredGraphExecution¶
TriggeredGraphExecution is a GraphExecution for the DataflowGraph (in PipelineUpdateContext).
Creating Instance¶
TriggeredGraphExecution takes the following to be created:
- DataflowGraph
- PipelineUpdateContext
- onCompletionCallback (RunTerminationReason => Unit, default: do nothing)
- Clock (default: SystemClock)
TriggeredGraphExecution is created when:
- PipelineExecution is requested to run a pipeline update
pipelineState Lookup Table¶
pipelineState is a lookup table (of TableIdentifiers with their StreamStates) supporting full concurrency of retrievals and high expected concurrency for updates.
pipelineState is used when:
- TriggeredGraphExecution is requested to flowsWithState, getRunTerminationReason, recordFailed, recordSkippedIfSelected, recordSuccess, start, stopInternal, topologicalExecution
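The concurrency guarantees described above match those of java.util.concurrent.ConcurrentHashMap. The following is a minimal sketch under that assumption, not the actual implementation. FlowId and State are hypothetical String stand-ins for TableIdentifier and StreamState, and the record helper is made up for illustration; only the flowsWithState name comes from this page.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object PipelineStateSketch {
  // Hypothetical stand-ins: the real keys are TableIdentifiers
  // and the real values are StreamStates.
  type FlowId = String
  type State = String

  // Assumption: a ConcurrentHashMap backs the lookup table.
  val pipelineState = new ConcurrentHashMap[FlowId, State]()

  // Hypothetical helper: registers or overwrites the state of a flow.
  def record(flow: FlowId, state: State): Unit =
    pipelineState.put(flow, state)

  // Collects all the flows currently registered in the given state.
  def flowsWithState(state: State): Set[FlowId] =
    pipelineState.asScala.collect { case (flow, s) if s == state => flow }.toSet
}
```

With such a table, several threads can record and look up flow states without external locking.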
StreamState¶
A flow can be in exactly one StreamState:
- CANCELED
- EXCLUDED
- IDLE
- QUEUED
- RUNNING
- SKIPPED
- SUCCESSFUL
- TERMINATED_WITH_ERROR
A flow's state can be looked up in the pipelineState registry.
Sealed Trait
StreamState is a Scala sealed trait which means that all of the implementations are in the same compilation unit (a single file).
Learn more in the Scala Language Specification.
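As a rough illustration of why a sealed trait is convenient here, the sketch below models the eight states as case objects (an assumption about how they are defined) and adds a hypothetical isActive helper. Because the trait is sealed, the compiler can warn when a pattern match misses a state.

```scala
object StreamStateSketch {
  // Assumption: the states are modelled as case objects of a sealed trait.
  sealed trait StreamState
  case object CANCELED extends StreamState
  case object EXCLUDED extends StreamState
  case object IDLE extends StreamState
  case object QUEUED extends StreamState
  case object RUNNING extends StreamState
  case object SKIPPED extends StreamState
  case object SUCCESSFUL extends StreamState
  case object TERMINATED_WITH_ERROR extends StreamState

  // Hypothetical helper (not part of the documented API):
  // true for flows that are waiting to run or currently running.
  def isActive(state: StreamState): Boolean = state match {
    case QUEUED | RUNNING => true
    case CANCELED | EXCLUDED | IDLE | SKIPPED | SUCCESSFUL | TERMINATED_WITH_ERROR => false
  }
}
```

Removing one of the alternatives from the second case would make the match non-exhaustive, which the compiler reports as a warning.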
Topological Execution Thread¶
topologicalExecutionThread is the Topological Execution thread of execution of this pipeline update.
topologicalExecutionThread is initialized and started when TriggeredGraphExecution is requested to start.
topologicalExecutionThread runs until awaitCompletion or stopInternal.
Start Pipeline¶
start registers the stream listener.
start requests this PipelineUpdateContext for the flows to be refreshed and...FIXME
start creates a Topological Execution thread and starts its execution.
Create Topological Execution Thread¶
buildTopologicalExecutionThread creates a new thread of execution known as Topological Execution.
When started, the thread does topological execution.
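The thread lifecycle can be sketched with plain java.lang.Thread. This is a simplified illustration, not the actual code: the body of topologicalExecution is a placeholder, the daemon flag is an assumption, and startAndAwait is a hypothetical helper that combines what start and awaitCompletion are described to do.

```scala
import java.util.concurrent.atomic.AtomicBoolean

object TopologicalThreadSketch {
  // Set by the placeholder body so that callers can observe completion.
  val finished = new AtomicBoolean(false)

  // Placeholder for the real topological execution.
  def topologicalExecution(): Unit = finished.set(true)

  def buildTopologicalExecutionThread(): Thread = {
    val thread = new Thread(
      new Runnable { override def run(): Unit = topologicalExecution() },
      "Topological Execution")
    thread.setDaemon(true) // assumption: the thread should not keep the JVM alive
    thread
  }

  // Hypothetical helper: starts the thread and waits for it to die.
  def startAndAwait(): Unit = {
    val thread = buildTopologicalExecutionThread()
    thread.start()
    thread.join()
  }
}
```

Thread.join returns once the thread has terminated, which is the "waits for the thread to die" behaviour in its simplest form.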
topologicalExecution¶
topologicalExecution finds the flows that are in QUEUED or RUNNING state, or that have failed but can be retried.
For each flow in RUNNING state, topologicalExecution...FIXME
topologicalExecution checks leaking permits.
FIXME Explain
topologicalExecution starts flows that are ready to start.
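What makes a flow "ready to start" is not spelled out here. The sketch below assumes one plausible reading: a flow is ready once all of its upstream flows have completed. The Flow case class and executionOrder are hypothetical names, and the real implementation also deals with running flows, retries and permits, which this sketch leaves out.

```scala
object TopologicalExecutionSketch {
  // Hypothetical model: every flow names the flows it depends on.
  final case class Flow(name: String, upstream: Set[String])

  // Returns the names of the flows in an order that respects dependencies,
  // one batch of "ready" flows per loop iteration.
  def executionOrder(flows: Seq[Flow]): Seq[String] = {
    var done = Set.empty[String]
    var queued = flows
    val order = Seq.newBuilder[String]
    while (queued.nonEmpty) {
      // Assumption: a flow is ready when all of its upstream flows are done.
      val (ready, waiting) = queued.partition(_.upstream.subsetOf(done))
      require(ready.nonEmpty, "cycle or unknown upstream flow")
      ready.foreach(flow => order += flow.name)
      done ++= ready.map(_.name)
      queued = waiting
    }
    order.result()
  }
}
```

In this simplified model every started flow completes immediately, so the loop only has to track which flows are done and which are still queued.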
Start Single Flow¶
startFlow prints out the following INFO message to the logs:
startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordPlanningForBatchFlow.
startFlow plans and starts the flow (that may or may not give a FlowExecution).
For the flow that has been executed successfully (and available as the FlowExecution), startFlow marks it as RUNNING (in the pipelineState registry). startFlow prints out the following INFO message to the logs:
Otherwise, for an ONCE flow, startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordIdle and marks it as IDLE (in the pipelineState registry).
For a non-ONCE flow that has not been executed successfully, startFlow requests this PipelineUpdateContext for the FlowProgressEventLogger to recordSkipped and marks it as SKIPPED (in the pipelineState registry).
For any non-fatal exception, startFlow records the flow as failed (recordFailed).
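The branching described above can be condensed into a small decision function. This is a sketch only: the Outcome type, the once flag and the planAndStart parameter are hypothetical, and RecordedFailed merely stands for "recordFailed was called" without claiming which state that sets.

```scala
import scala.util.control.NonFatal

object StartFlowSketch {
  // Hypothetical outcomes mirroring the branches described in the text.
  sealed trait Outcome
  case object MarkedRunning extends Outcome
  case object MarkedIdle extends Outcome
  case object MarkedSkipped extends Outcome
  final case class RecordedFailed(error: Throwable) extends Outcome

  // once: whether the flow is a ONCE flow (assumed to be known upfront).
  // planAndStart: hypothetical stand-in for planning and starting a flow;
  // gives Some(execution) when a FlowExecution is available.
  def startFlow(once: Boolean)(planAndStart: () => Option[String]): Outcome =
    try {
      planAndStart() match {
        case Some(_)      => MarkedRunning
        case None if once => MarkedIdle
        case None         => MarkedSkipped
      }
    } catch {
      // Only non-fatal exceptions are handled; fatal ones propagate.
      case NonFatal(e) => RecordedFailed(e)
    }
}
```

scala.util.control.NonFatal does not match fatal errors such as OutOfMemoryError or InterruptedException, so those still propagate to the caller.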
Streaming Trigger¶
GraphExecution
streamTrigger is part of the GraphExecution abstraction.
streamTrigger is AvailableNowTrigger (Spark Structured Streaming).
awaitCompletion¶
awaitCompletion waits for this Topological Execution thread to die.
stopInternal¶
stopInternal...FIXME
stopInternal is used when:
Stop Execution¶
stop calls stopInternal (with the stopTopologicalExecutionThread flag enabled).