Skip to content

PipelineExecution

PipelineExecution manages the lifecycle of a GraphExecution (in the given PipelineUpdateContext).

PipelineExecution is part of PipelineUpdateContext.

Creating Instance

PipelineExecution takes the following to be created:

PipelineExecution is created alongside PipelineUpdateContext.

Run Pipeline (and Wait for Completion)

runPipeline(): Unit

runPipeline starts this pipeline and requests the PipelineExecution (of this PipelineUpdateContext) to wait for the execution to complete.


runPipeline is used when:

Dry-Run Pipeline Update

dryRunPipeline(): Unit

dryRunPipeline resolves and validates this dataflow graph.

UnresolvedPipelineException

Resolving and validating this dataflow graph may fail with an UnresolvedPipelineException.

If successful, dryRunPipeline requests this PipelineUpdateContext to emit an RunCompletion PipelineEvent.


dryRunPipeline is used when:

Run Pipeline Update

startPipeline(): Unit

startPipeline resolves and validates this dataflow graph.

For a full-refresh update, startPipeline resets the state of all the flows in the DataflowGraph.

startPipeline materializes the datasets (of this dataflow graph).

startPipeline creates a new TriggeredGraphExecution for the materialized dataflow graph.

In the end, startPipeline requests the TriggeredGraphExecution to start.


startPipeline is used when:

Await Completion

awaitCompletion(): Unit

awaitCompletion requests this GraphExecution to awaitCompletion.


awaitCompletion is used when:

Initialize Dataflow Graph

initializeGraph(): DataflowGraph

initializeGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to be resolved and validated.

In the end, initializeGraph materializes the tables (datasets).


initializeGraph is used when:

GraphExecution

graphExecution: Option[GraphExecution]

graphExecution is the GraphExecution of the pipeline.

PipelineExecution creates a TriggeredGraphExecution when startPipeline.

Used in:

Is Execution Started

executionStarted: Boolean

executionStarted is a flag that indicates whether this GraphExecution has been created or not.


executionStarted is used when:

  • SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution

stopPipeline

stopPipeline(): Unit

stopPipeline requests this GraphExecution to stop.

In case this GraphExecution has not been created (started) yet, stopPipeline reports a IllegalStateException:

Pipeline execution has not started yet.

stopPipeline is used when:

  • SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution

Resolve and Validate Dataflow Graph

resolveGraph(): DataflowGraph

resolveGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to resolve and validate.

UnresolvedPipelineException

In case of an UnresolvedPipelineException, resolveGraph handleInvalidPipeline and passes the exception along.


resolveGraph is used when:

  • PipelineExecution is requested to dry-run and run a pipeline update