PipelineExecution¶
PipelineExecution manages the lifecycle of a GraphExecution (in the given PipelineUpdateContext).
PipelineExecution is part of PipelineUpdateContext.
Creating Instance¶
PipelineExecution takes the following to be created:
PipelineUpdateContext
PipelineExecution is created alongside PipelineUpdateContext.
Run Pipeline (and Wait for Completion)¶
runPipeline(): Unit
runPipeline starts this pipeline and requests the PipelineExecution (of this PipelineUpdateContext) to wait for the execution to complete.
runPipeline is used when:
PipelinesHandler is requested to start a pipeline run
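The start-then-wait order described above can be sketched as follows. This is a minimal illustration, not Spark's source: RunPipelineSketch and its calls log are hypothetical, and the two steps only record that they ran.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for PipelineExecution: each step only logs its name.
class RunPipelineSketch {
  val calls: ListBuffer[String] = ListBuffer.empty[String]

  def startPipeline(): Unit = {
    calls.append("startPipeline")
  }

  def awaitCompletion(): Unit = {
    calls.append("awaitCompletion")
  }

  // Start the pipeline first, then wait for the execution to complete.
  def runPipeline(): Unit = {
    startPipeline()
    awaitCompletion()
  }
}
```

Calling runPipeline on a fresh instance leaves calls holding startPipeline followed by awaitCompletion.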
Dry-Run Pipeline Update¶
dryRunPipeline(): Unit
dryRunPipeline resolves and validates this dataflow graph.
UnresolvedPipelineException
Resolving and validating this dataflow graph may fail with an UnresolvedPipelineException.
If successful, dryRunPipeline requests this PipelineUpdateContext to emit a RunCompletion PipelineEvent.
dryRunPipeline is used when:
PipelinesHandler is requested to start a dry-run pipeline update
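A dry run can be pictured as validation followed by a single completion event. The sketch below assumes a simplified model: DryRunSketch, UnresolvedPipelineError, the graphIsValid flag and the events list are all hypothetical names, and a plain string stands in for the RunCompletion event.

```scala
// Hypothetical stand-in for UnresolvedPipelineException.
final case class UnresolvedPipelineError(message: String) extends Exception(message)

// Hypothetical model of a dry run over a graph that is either valid or not.
class DryRunSketch(graphIsValid: Boolean) {
  var events: List[String] = Nil

  private def resolveAndValidate(): Unit = {
    if (!graphIsValid) throw UnresolvedPipelineError("unresolved flow")
  }

  def dryRunPipeline(): Unit = {
    // May throw; in that case no completion event is emitted.
    resolveAndValidate()
    events = events :+ "RunCompletion"
  }
}
```

In this model a failed validation surfaces as an exception to the caller and leaves events empty, which mirrors the "if successful" wording above.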
Run Pipeline Update¶
startPipeline(): Unit
startPipeline resolves and validates this dataflow graph.
For a full-refresh update, startPipeline resets the state of all the flows in the DataflowGraph.
startPipeline materializes the datasets (of this dataflow graph).
startPipeline creates a new TriggeredGraphExecution for the materialized dataflow graph.
In the end, startPipeline requests the TriggeredGraphExecution to start.
startPipeline is used when:
PipelineExecution is requested to runPipeline
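The sequence of steps in startPipeline can be sketched as an ordered log. This is only an illustration under assumptions: StartPipelineSketch, the fullRefresh flag and the step names are hypothetical, and each step does nothing but record itself.

```scala
// Hypothetical model: every step only appends its name to a log.
class StartPipelineSketch(fullRefresh: Boolean) {
  var steps: Vector[String] = Vector.empty

  private def step(name: String): Unit = {
    steps = steps :+ name
  }

  def startPipeline(): Unit = {
    step("resolve and validate graph")
    // Flow state is reset only for a full-refresh update.
    if (fullRefresh) step("reset flow state")
    step("materialize datasets")
    step("create TriggeredGraphExecution")
    step("start execution")
  }
}
```

The point of the sketch is the ordering: the state reset, when it happens, sits between graph resolution and dataset materialization, and the execution is started last.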
Await Completion¶
awaitCompletion(): Unit
awaitCompletion requests this GraphExecution to awaitCompletion.
awaitCompletion is used when:
PipelineExecution is requested to runPipeline
Initialize Dataflow Graph¶
initializeGraph(): DataflowGraph
initializeGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to be resolved and validated.
In the end, initializeGraph materializes the tables (datasets).
initializeGraph is used when:
PipelineExecution is requested to start the pipeline
GraphExecution¶
graphExecution: Option[GraphExecution]
graphExecution is the GraphExecution of the pipeline.
PipelineExecution creates a TriggeredGraphExecution when requested to startPipeline.
Used in:
Is Execution Started¶
executionStarted: Boolean
executionStarted is a flag that indicates whether this GraphExecution has been created or not.
executionStarted is used when:
SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution
stopPipeline¶
stopPipeline(): Unit
stopPipeline requests this GraphExecution to stop.
In case this GraphExecution has not been created (started) yet, stopPipeline reports an IllegalStateException:
Pipeline execution has not started yet.
stopPipeline is used when:
SessionHolder (Spark Connect) is requested to removeCachedPipelineExecution
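How the optional graphExecution, the executionStarted flag and stopPipeline relate can be sketched as below. The classes StoppableExecution and StopPipelineSketch are hypothetical stand-ins; only the error message is taken from the text above.

```scala
// Hypothetical stand-in for a GraphExecution that can only be stopped.
class StoppableExecution {
  var stopped: Boolean = false

  def stop(): Unit = {
    stopped = true
  }
}

class StopPipelineSketch {
  // None until the pipeline is started.
  var graphExecution: Option[StoppableExecution] = None

  // The flag is derived from whether an execution has been created.
  def executionStarted: Boolean = graphExecution.nonEmpty

  def startPipeline(): Unit = {
    graphExecution = Some(new StoppableExecution)
  }

  def stopPipeline(): Unit = graphExecution match {
    case Some(execution) => execution.stop()
    case None =>
      throw new IllegalStateException("Pipeline execution has not started yet.")
  }
}
```

A caller that cannot be sure the pipeline has started would check executionStarted before calling stopPipeline, rather than catching the exception.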
Resolve and Validate Dataflow Graph¶
resolveGraph(): DataflowGraph
resolveGraph requests this PipelineUpdateContext for the unresolved DataflowGraph to resolve and validate.
UnresolvedPipelineException
In case of an UnresolvedPipelineException, resolveGraph calls handleInvalidPipeline with the exception and passes the exception along.
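The handle-then-propagate behaviour can be sketched as a catch block that reacts to the failure and rethrows the same exception. This is a sketch under assumptions: ResolveGraphSketch, GraphResolutionError and the resolve function parameter are hypothetical, and a String stands in for the resolved DataflowGraph.

```scala
// Hypothetical stand-in for UnresolvedPipelineException.
final case class GraphResolutionError(message: String) extends Exception(message)

// `resolve` is a hypothetical function that yields the resolved graph or throws.
class ResolveGraphSketch(resolve: () => String) {
  var handled: Option[GraphResolutionError] = None

  private def handleInvalidPipeline(e: GraphResolutionError): Unit = {
    handled = Some(e)
  }

  def resolveGraph(): String =
    try resolve()
    catch {
      case e: GraphResolutionError =>
        handleInvalidPipeline(e) // react to the failure first
        throw e                  // then pass the same exception along
    }
}
```

Rethrowing the original exception, instead of wrapping it, lets the caller still match on the specific failure type.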
resolveGraph is used when: