PipelineExecution¶
PipelineExecution manages the lifecycle of a GraphExecution (in the given PipelineUpdateContext).
PipelineExecution is part of PipelineUpdateContext.
Creating Instance¶
PipelineExecution takes the following to be created:
PipelineUpdateContext
PipelineExecution is created alongside PipelineUpdateContext.
Run Pipeline (and Wait for Completion)¶
runPipeline(): Unit
runPipeline starts this pipeline (startPipeline) and then waits for the execution to complete (awaitCompletion).
runPipeline is used when:
PipelinesHandler is requested to start a pipeline run
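The two-step nature of runPipeline can be illustrated with a minimal sketch. PipelineExecutionSketch below is a hypothetical, simplified stand-in (not the Spark class) that only records the order in which the two steps are requested.

```scala
object RunPipelineSketch {
  // Hypothetical stand-in for PipelineExecution: it only records call order.
  final class PipelineExecutionSketch {
    var calls: List[String] = Nil

    // Stand-in for starting the pipeline update.
    def startPipeline(): Unit = {
      calls = calls :+ "startPipeline"
    }

    // Stand-in for blocking until the execution finishes.
    def awaitCompletion(): Unit = {
      calls = calls :+ "awaitCompletion"
    }

    // Start first, then wait for completion.
    def runPipeline(): Unit = {
      startPipeline()
      awaitCompletion()
    }
  }
}
```

With this shape, a caller that only needs to kick off an update can use startPipeline alone, while runPipeline is the blocking variant.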
Run Pipeline Update¶
startPipeline(): Unit
startPipeline resolves and validates the dataflow graph (of this pipeline update).
startPipeline materializes the datasets (of this dataflow graph).
startPipeline creates a new TriggeredGraphExecution for the materialized dataflow graph.
In the end, startPipeline requests the TriggeredGraphExecution to start.
startPipeline is used when:
PipelineExecution is requested to runPipeline
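The sequence of steps in startPipeline can be sketched as follows. All the types are hypothetical, simplified stand-ins (Graph for DataflowGraph, TriggeredExecution for TriggeredGraphExecution, Execution for PipelineExecution), and each graph step is reduced to flipping a flag so that the ordering stays visible.

```scala
object StartPipelineSketch {
  // Hypothetical stand-in for DataflowGraph: lifecycle flags only.
  final case class Graph(
      resolved: Boolean = false,
      validated: Boolean = false,
      materialized: Boolean = false)

  // Hypothetical stand-in for TriggeredGraphExecution.
  final class TriggeredExecution(val graph: Graph) {
    var started: Boolean = false
    def start(): Unit = {
      started = true
    }
  }

  // Hypothetical stand-in for PipelineExecution.
  final class Execution(unresolved: Graph) {
    var graphExecution: Option[TriggeredExecution] = None

    def startPipeline(): Unit = {
      // 1. Resolve and validate the dataflow graph.
      val validated = unresolved.copy(resolved = true, validated = true)
      // 2. Materialize the datasets of the graph.
      val materialized = validated.copy(materialized = true)
      // 3. Create the execution for the materialized graph.
      val execution = new TriggeredExecution(materialized)
      graphExecution = Some(execution)
      // 4. Request the execution to start.
      execution.start()
    }
  }
}
```

Keeping the execution in an Option mirrors the graphExecution: Option[GraphExecution] signature: it is empty until startPipeline has run.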
Await Completion¶
awaitCompletion(): Unit
awaitCompletion requests this GraphExecution to awaitCompletion.
awaitCompletion is used when:
PipelineExecution is requested to runPipeline
Initialize Dataflow Graph¶
initializeGraph(): DataflowGraph
initializeGraph requests this PipelineUpdateContext for the unresolved DataflowGraph, which is then resolved and validated.
In the end, initializeGraph materializes the tables (datasets).
initializeGraph is used when:
PipelineExecution is requested to start the pipeline
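A minimal sketch of the resolve, validate, then materialize ordering of initializeGraph is shown below. Graph, materialize and the uniqueness check in validate are assumptions made for illustration only; the actual validation rules are not described here.

```scala
object InitializeGraphSketch {
  // Hypothetical stand-in for DataflowGraph: table names plus lifecycle flags.
  final case class Graph(
      tables: Seq[String],
      resolved: Boolean = false,
      materialized: Boolean = false) {

    def resolve(): Graph = copy(resolved = true)

    // Assumed rule for illustration only: table names must be unique.
    def validate(): Graph = {
      require(tables.distinct.size == tables.size, "duplicate table names")
      this
    }
  }

  // Stand-in for materializing the tables (datasets) of a validated graph.
  def materialize(graph: Graph): Graph = graph.copy(materialized = true)

  // Resolve and validate first; materialize only if validation passed.
  def initializeGraph(unresolved: Graph): Graph =
    materialize(unresolved.resolve().validate())
}
```

Because validation runs before materialization, a graph that fails validation never reaches the materialization step in this sketch.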
GraphExecution¶
graphExecution: Option[GraphExecution]
graphExecution is the GraphExecution of the pipeline.
PipelineExecution creates a TriggeredGraphExecution when requested to startPipeline.
Used in:
Is Execution Started¶
executionStarted: Boolean
executionStarted is a flag that indicates whether this GraphExecution has been created or not.
executionStarted is used when:
SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution
stopPipeline¶
stopPipeline(): Unit
stopPipeline requests this GraphExecution to stop.
If this GraphExecution has not been created (started) yet, stopPipeline reports an IllegalStateException:
Pipeline execution has not started yet.
stopPipeline is used when:
SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution
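The relationship between executionStarted and stopPipeline can be sketched as below. GraphExecutionLike and Execution are hypothetical stand-ins (not the Spark types); the sketch assumes the flag is simply derived from whether the optional graph execution is defined.

```scala
object StopPipelineSketch {
  // Hypothetical stand-in for GraphExecution.
  trait GraphExecutionLike {
    def stop(): Unit
  }

  // Hypothetical stand-in for PipelineExecution.
  final class Execution {
    var graphExecution: Option[GraphExecutionLike] = None

    // True once a graph execution has been created.
    def executionStarted: Boolean = graphExecution.nonEmpty

    // Stop the graph execution, or fail if there is nothing to stop yet.
    def stopPipeline(): Unit = graphExecution match {
      case Some(execution) => execution.stop()
      case None =>
        throw new IllegalStateException("Pipeline execution has not started yet.")
    }
  }
}
```

A caller that wants to avoid the exception can check executionStarted before requesting stopPipeline.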
Resolve Dataflow Graph¶
resolveGraph(): DataflowGraph
resolveGraph requests this PipelineUpdateContext for the unresolved DataflowGraph and then requests the graph to resolve and validate.
resolveGraph is used when: