
Flow

Flow is an extension of the GraphElement abstraction for flows in dataflow graphs.

Flows can be batch or streaming.

Flows can be defined explicitly or implicitly (while defining other pipeline elements) in SQL and Python.

Flows must be successfully analyzed (resolved) in order to determine whether they are streaming or not.
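The dependency between resolution and the streaming flag can be illustrated with a minimal, self-contained sketch. It does not use the Spark API: `ToyDataFrame`, `ToyUnresolvedFlow`, `ToyResolvedFlow`, and `resolve` are hypothetical names invented for illustration, and the only assumption is that a flow function yields a DataFrame-like object that knows whether it is streaming.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyDataFrame:
    """Hypothetical stand-in for a Spark DataFrame (not the real class)."""
    source: str
    is_streaming: bool


@dataclass(frozen=True)
class ToyUnresolvedFlow:
    """Hypothetical flow that has not been analyzed yet."""
    name: str
    func: Callable[[], ToyDataFrame]  # plays the role of a FlowFunction


@dataclass(frozen=True)
class ToyResolvedFlow:
    """Hypothetical flow after successful analysis."""
    name: str
    df: ToyDataFrame

    @property
    def is_streaming(self) -> bool:
        # Only known once the flow function has produced a DataFrame.
        return self.df.is_streaming


def resolve(flow: ToyUnresolvedFlow) -> ToyResolvedFlow:
    """Run the flow function to obtain the DataFrame it describes."""
    return ToyResolvedFlow(flow.name, flow.func())


batch = resolve(ToyUnresolvedFlow("daily", lambda: ToyDataFrame("parquet", False)))
stream = resolve(ToyUnresolvedFlow("events", lambda: ToyDataFrame("kafka", True)))
print(batch.is_streaming, stream.is_streaming)
```

In this sketch the unresolved flow deliberately has no `is_streaming` attribute: the answer exists only on the resolved flow, which mirrors the requirement stated above.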

Flows and DataFrames

Think of flows as Spark DataFrames, which declaratively describe computations over batch or streaming data sources in Apache Spark.

Contract (Subset)

FlowFunction

func: FlowFunction

FlowFunction of this Flow

Used to create an UnresolvedFlow


once

once: Boolean

Note: One-time flows are unsupported

One-time flows are not supported yet. defineFlow reports an AnalysisException for DefineFlows with once enabled.

Indicates whether this is a ONCE flow or not. ONCE flows can only be run once per full refresh.

Default: false
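The rejection of one-time flows described above can be sketched as a simple guard. This is an illustrative model only, not the actual defineFlow implementation: `ToyAnalysisException`, `ToyDefineFlow`, and `define_flow` are hypothetical names, and the error message is invented.

```python
from dataclasses import dataclass


class ToyAnalysisException(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


@dataclass(frozen=True)
class ToyDefineFlow:
    """Hypothetical flow definition request; `once` defaults to False."""
    name: str
    once: bool = False


def define_flow(request: ToyDefineFlow) -> str:
    """Register a flow and return its name, rejecting ONCE flows."""
    # Reject one-time flows up front, as the text describes.
    if request.once:
        raise ToyAnalysisException(
            f"ONCE flows are not supported: {request.name}"
        )
    return request.name


print(define_flow(ToyDefineFlow("daily")))  # once is False by default
try:
    define_flow(ToyDefineFlow("backfill", once=True))
except ToyAnalysisException as e:
    print(e)
```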


Implementations