
PipelineExecution

PipelineExecution manages the lifecycle of a GraphExecution (in the given PipelineUpdateContext).

PipelineExecution is part of PipelineUpdateContext.

Creating Instance

PipelineExecution takes the following to be created:

  • PipelineUpdateContext

PipelineExecution is created alongside PipelineUpdateContext.

Run Pipeline (and Wait for Completion)

runPipeline(): Unit

runPipeline starts the pipeline and then waits for the execution to complete.
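A minimal sketch of how runPipeline could compose the two operations documented on this page (startPipeline and awaitCompletion); an illustration, not the literal Spark source:

// Sketch: run the pipeline by starting it and blocking until it finishes.
def runPipeline(): Unit = {
  startPipeline()    // resolve and validate the graph, then start a TriggeredGraphExecution
  awaitCompletion()  // block until the GraphExecution completes
}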


runPipeline is used when:

Start Pipeline

startPipeline(): Unit

startPipeline resolves and validates the pipeline graph.

startPipeline creates a new TriggeredGraphExecution.

In the end, startPipeline requests the GraphExecution to start.
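A simplified sketch of the startPipeline flow described above; the constructor arguments of TriggeredGraphExecution shown here are an assumption made for illustration:

// Sketch: resolve the graph, create a TriggeredGraphExecution and start it.
def startPipeline(): Unit = {
  val resolvedGraph: DataflowGraph = initializeGraph()  // resolve and validate the pipeline graph
  graphExecution = Some(new TriggeredGraphExecution(resolvedGraph, context))  // assumed arguments
  graphExecution.foreach(_.start())  // request the GraphExecution to start
}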


startPipeline is used when:

Await Completion

awaitCompletion(): Unit

awaitCompletion requests this GraphExecution to awaitCompletion.
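In other words, awaitCompletion forwards the call to the underlying GraphExecution, if one has been created. A sketch:

// Sketch: delegate to the GraphExecution created by startPipeline (if any).
def awaitCompletion(): Unit = {
  graphExecution.foreach(_.awaitCompletion())
}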


awaitCompletion is used when:

Initialize Dataflow Graph

initializeGraph(): DataflowGraph

initializeGraph takes the unresolved DataflowGraph of this PipelineUpdateContext and resolves and validates it.

In the end, initializeGraph materializes the tables (datasets).
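A sketch of initializeGraph, assuming the unresolved graph is available as context.unresolvedGraph with resolve and validate steps; materializeDatasets is a hypothetical helper name for the table-materialization step:

// Sketch: resolve and validate the unresolved graph, then materialize the tables.
def initializeGraph(): DataflowGraph = {
  val resolvedGraph = context.unresolvedGraph.resolve().validate()  // assumed API
  materializeDatasets(resolvedGraph)  // hypothetical helper that materializes the tables (datasets)
}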


initializeGraph is used when:

GraphExecution

graphExecution: Option[GraphExecution]

graphExecution is the GraphExecution of the pipeline.

PipelineExecution creates a TriggeredGraphExecution when requested to startPipeline.
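A sketch of how the field could be declared (empty until startPipeline sets it):

// Sketch: there is no GraphExecution until the pipeline is started.
var graphExecution: Option[GraphExecution] = None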

Used in:

Is Execution Started

executionStarted: Boolean

executionStarted is a flag that indicates whether the GraphExecution has been created (i.e., whether the pipeline execution has started).
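Assuming the flag is derived from graphExecution, a one-line sketch:

// Sketch: the execution has started once a GraphExecution exists.
def executionStarted: Boolean = graphExecution.isDefined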


executionStarted is used when:

  • SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution

Stop Pipeline

stopPipeline(): Unit

stopPipeline requests this GraphExecution to stop.

If the GraphExecution has not been created (started) yet, stopPipeline reports an IllegalStateException:

Pipeline execution has not started yet.
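A sketch of stopPipeline that matches the behaviour described above:

// Sketch: stop the GraphExecution if it exists, otherwise fail fast.
def stopPipeline(): Unit = graphExecution match {
  case Some(execution) => execution.stop()
  case None => throw new IllegalStateException("Pipeline execution has not started yet.")
}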

stopPipeline is used when:

  • SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution