DataflowGraph¶
DataflowGraph is a GraphRegistrationContext with tables, sinks, views and flows fully-qualified, resolved and de-duplicated.
Creating Instance¶
DataflowGraph takes the following to be created:
DataflowGraph is created when:
DataflowGraph is requested to reanalyzeFlow
GraphRegistrationContext is requested to convert to a DataflowGraph
Outputs¶
output: Map[TableIdentifier, Output]
Lazy Value
output is a Scala lazy value to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
Learn more in the Scala Language Specification.
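The once-only initialization of a lazy value can be observed with a small sketch. LazyValDemo and initCount are hypothetical names used for illustration only, and the Map[String, Int] merely stands in for the real value type.

```scala
object LazyValDemo {
  // Counts how many times the initializer body of `output` runs.
  var initCount = 0

  // Hypothetical stand-in for an expensive computation.
  // The body runs on first access only; later accesses reuse the cached value.
  lazy val output: Map[String, Int] = {
    initCount += 1
    Map("t1" -> 1, "t2" -> 2)
  }
}
```

Reading LazyValDemo.output any number of times leaves initCount at 1, and every read returns the same Map instance.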
output is a collection of unique Outputs (tables and sinks) by their TableIdentifier.
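A minimal sketch of such a lookup structure follows. The types below are simplified, hypothetical stand-ins (they are not Spark's TableIdentifier, Output, Table or Sink), and the way duplicates are handled is an assumption: toMap silently keeps the last entry for a repeated key, which may differ from what DataflowGraph actually does.

```scala
object OutputsDemo {
  // Simplified, hypothetical stand-ins for the real classes.
  final case class TableIdentifier(table: String, database: Option[String] = None)
  sealed trait Output { def identifier: TableIdentifier }
  final case class Table(identifier: TableIdentifier) extends Output
  final case class Sink(identifier: TableIdentifier) extends Output

  final case class Graph(tables: Seq[Table], sinks: Seq[Sink]) {
    // Built on first access only, then keyed by identifier for direct lookup.
    lazy val output: Map[TableIdentifier, Output] = {
      val all: Seq[Output] = tables ++ sinks
      // Assumption: a repeated identifier is resolved by keeping the last entry.
      all.map(o => o.identifier -> o).toMap
    }
  }
}
```

With this shape, finding the destination of a flow is a single map lookup by identifier, for both tables and sinks.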
output is used when:
FlowPlanner is requested to plan a flow for execution (to find the destination table of a flow)
DataflowGraph is requested for the materialized flows
reanalyzeFlow¶
reanalyzeFlow(
srcFlow: Flow): ResolvedFlow
reanalyzeFlow finds the upstream datasets.
reanalyzeFlow finds the upstream flows (for the upstream datasets that could be found in the resolvedFlows registry).
reanalyzeFlow finds the upstream views (for the upstream datasets that could be found in the view registry).
reanalyzeFlow creates a new (sub)DataflowGraph for the upstream flows, views and a single table (the destination of the given Flow).
reanalyzeFlow requests the subgraph to resolve and returns the ResolvedFlow for the given Flow.
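The subgraph assembly described above can be sketched as follows. Everything in this block is a hypothetical, heavily simplified model: datasets are identified by plain strings, Flow, View, Table, SubGraph and Graph are illustrative stand-ins, and subgraphFor is not a real method. The sketch stops before the resolve step, and since the text does not say whether the given flow itself becomes part of the subgraph, it is left out here.

```scala
object ReanalyzeSketch {
  // Hypothetical stand-ins; none of these mirror Spark's real classes.
  final case class Flow(name: String, destination: String, inputs: Set[String])
  final case class View(name: String)
  final case class Table(name: String)
  final case class SubGraph(flows: Seq[Flow], views: Seq[View], tables: Seq[Table])

  final case class Graph(flows: Seq[Flow], views: Seq[View], tables: Seq[Table]) {
    // Registries keyed by name (an assumption of this sketch).
    private lazy val resolvedFlows: Map[String, Flow] = flows.map(f => f.name -> f).toMap
    private lazy val viewsByName: Map[String, View] = views.map(v => v.name -> v).toMap

    def subgraphFor(srcFlow: Flow): SubGraph = {
      // 1. Upstream datasets of the given flow (sorted for a deterministic order).
      val upstreamDatasets = srcFlow.inputs.toSeq.sorted
      // 2. Upstream flows: only datasets that are found in the flow registry.
      val upstreamFlows = upstreamDatasets.flatMap(resolvedFlows.get)
      // 3. Upstream views: only datasets that are found in the view registry.
      val upstreamViews = upstreamDatasets.flatMap(viewsByName.get)
      // 4. The single destination table of the given flow.
      val destination = tables.filter(_.name == srcFlow.destination)
      SubGraph(upstreamFlows, upstreamViews, destination)
    }
  }
}
```

Upstream datasets that are in neither registry (for example, an external table) simply drop out of the subgraph in this model.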
reanalyzeFlow is used when:
BatchTableWrite is requested to executeAsync (and executeInternal)
StreamingTableWrite is requested to executeAsync (and startStream)
Resolve¶
resolve(): DataflowGraph
resolve...FIXME
resolve is used when:
DataflowGraph is requested to reanalyzeFlow
PipelineExecution is requested to initializeGraph
Validate¶
validate(): DataflowGraph
validate...FIXME
validate is used when:
PipelineExecution is requested to initialize the dataflow graph