PipelineExecution¶

PipelineExecution manages the lifecycle of a GraphExecution (in the given PipelineUpdateContext).

PipelineExecution is part of PipelineUpdateContext.

Creating Instance¶

PipelineExecution takes the following to be created:

PipelineUpdateContext

PipelineExecution is created alongside PipelineUpdateContext.

Run Pipeline (and Wait for Completion)¶

runPipeline(): Unit

runPipeline starts this pipeline and requests the PipelineExecution (of this PipelineUpdateContext) to wait for the execution to complete.

runPipeline is used when:

PipelinesHandler is requested to start a pipeline run

Dry-Run Pipeline Update¶

dryRunPipeline(): Unit

dryRunPipeline resolves and validates this dataflow graph.

UnresolvedPipelineException

Resolving and validating this dataflow graph may fail with an UnresolvedPipelineException.

If successful, dryRunPipeline requests this PipelineUpdateContext to emit an RunCompletion PipelineEvent.

dryRunPipeline is used when:

PipelinesHandler is requested to start a dry-run pipeline update

Run Pipeline Update¶

startPipeline(): Unit

startPipeline resolves and validates this dataflow graph.

For a full-refresh update, startPipeline resets the state of all the flows in the DataflowGraph.

startPipeline materializes the datasets (of this dataflow graph).

startPipeline creates a new TriggeredGraphExecution for the materialized dataflow graph.

In the end, startPipeline requests the TriggeredGraphExecution to start.

startPipeline is used when:

PipelineExecution is requested to runPipeline

Await Completion¶

awaitCompletion(): Unit

awaitCompletion requests this GraphExecution to awaitCompletion.

awaitCompletion is used when:

PipelineExecution is requested to runPipeline

Initialize Dataflow Graph¶

initializeGraph(): DataflowGraph

initializeGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to be resolved and validated.

In the end, initializeGraph materializes the tables (datasets).

initializeGraph is used when:

PipelineExecution is requested to start the pipeline

GraphExecution¶

graphExecution: Option[GraphExecution]

graphExecution is the GraphExecution of the pipeline.

PipelineExecution creates a TriggeredGraphExecution when startPipeline.

Used in:

awaitCompletion
executionStarted
startPipeline
stopPipeline

Is Execution Started¶

executionStarted: Boolean

executionStarted is a flag that indicates whether this GraphExecution has been created or not.

executionStarted is used when:

SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution

stopPipeline¶

stopPipeline(): Unit

stopPipeline requests this GraphExecution to stop.

In case this GraphExecution has not been created (started) yet, stopPipeline reports a IllegalStateException:

Pipeline execution has not started yet.

stopPipeline is used when:

SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution

Resolve and Validate Dataflow Graph¶

resolveGraph(): DataflowGraph

resolveGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to resolve and validate.

UnresolvedPipelineException

In case of an UnresolvedPipelineException, resolveGraph handleInvalidPipeline and passes the exception along.

resolveGraph is used when:

PipelineExecution is requested to dry-run and run a pipeline update