Flow¶
Flow is an extension of the GraphElement abstraction for flows in dataflow graphs.
Flows can be batch or streaming.
Flows can be defined explicitly or implicitly (while defining other pipeline elements) in SQL and Python.
Flows must be successfully analyzed (resolved) in order to determine whether they are streaming or not.
Flows and DataFrames
Think of flows as Spark DataFrames (that declaratively describe computations over batch or streaming data sources in Apache Spark).
Contract (Subset)¶
FlowFunction¶
func: FlowFunction
FlowFunction of this Flow
Used to create an UnresolvedFlow
See:
once¶
once: Boolean
One-time flows Unsupported
One-time flows are not supported yet (and defineFlow reports an AnalysisException for DefineFlows with once enabled).
Indicates whether this is a ONCE flow or not. ONCE flows can only be run once per full refresh.
- ONCE flows are planned for execution as AppendOnceFlows (FlowResolver)
- ONCE flows are marked as IDLE when
TriggeredGraphExecutionis requested to start flows in topologicalExecution. - ONCE flows must be batch (not streaming).
- For ONCE flows or when the logical plan for the flow is streaming,
GraphElementTypeUtilsconsiders a ResolvedFlow as aSTREAMING_TABLE(in getDatasetTypeForMaterializedViewOrStreamingTable).
Default: false
See:
Used when:
TriggeredGraphExecutionis requested to topologicalExecutionPipelinesErrorsis requested to checkStreamingErrorsAndRetry (to skip ONCE flows with no exception)GraphValidationsis requested to validateFlowStreamingnessGraphElementTypeUtilsis requested to getDatasetTypeForMaterializedViewOrStreamingTableFlowResolveris requested to convertResolvedToTypedFlow