UnresolvedFlow
UnresolvedFlow is a Flow that represents a flow in the Python and SQL transformations in Spark Declarative Pipelines:
- register_flow in PySpark's decorators
- CREATE FLOW ... AS INSERT INTO ... BY NAME
- CREATE MATERIALIZED VIEW
- CREATE STREAMING TABLE ... AS
- CREATE VIEW and the other variants of CREATE VIEW
UnresolvedFlow is registered with a GraphRegistrationContext when it is requested to register a flow.
UnresolvedFlow is analyzed and resolved to ResolvedFlow (by FlowResolver when DataflowGraph is requested to resolve).
UnresolvedFlow must have unique identifiers (or an AnalysisException is reported).
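The uniqueness requirement can be illustrated with a minimal Python sketch. It is a toy model, not Spark code: `DuplicateFlowError` is a hypothetical stand-in for AnalysisException, `assert_unique_flow_identifiers` is an invented helper name, and flow identifiers are simplified to plain strings. When exactly Spark performs the check is not modelled here.

```python
from collections import Counter
from typing import Iterable


class DuplicateFlowError(Exception):
    """Hypothetical stand-in for Spark's AnalysisException."""


def assert_unique_flow_identifiers(identifiers: Iterable[str]) -> None:
    """Raise if any flow identifier occurs more than once."""
    counts = Counter(identifiers)
    # Collect every identifier that was used by two or more flows.
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise DuplicateFlowError(
            "Duplicate flow identifiers: " + ", ".join(duplicates)
        )


# Distinct identifiers pass silently.
assert_unique_flow_identifiers(["demo.sales", "demo.orders"])

# A repeated identifier is reported as an error.
try:
    assert_unique_flow_identifiers(["demo.sales", "demo.sales"])
except DuplicateFlowError as e:
    print(e)
```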
Creating Instance
UnresolvedFlow takes the following to be created:
- TableIdentifier
- Flow destination (TableIdentifier)
- FlowFunction
- QueryContext
- SQL Config
- once flag
- QueryOrigin
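To make the shape of these arguments concrete, here is a rough Python sketch that mirrors the list as a dataclass. Every name in it is hypothetical (`UnresolvedFlowSketch`, `TableIdentifierSketch`, and all field names), and the types are simplified stand-ins: it assumes the first TableIdentifier names the flow itself, and it models FlowFunction as a zero-argument callable, and QueryContext, SQL Config and QueryOrigin as plain mappings or strings.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class TableIdentifierSketch:
    """Toy stand-in for a table identifier (not Spark's class)."""
    table: str
    database: Optional[str] = None

    def unquoted(self) -> str:
        # Join the non-empty name parts with dots, e.g. "demo.sales".
        return ".".join(p for p in (self.database, self.table) if p)


@dataclass(frozen=True)
class UnresolvedFlowSketch:
    identifier: TableIdentifierSketch   # assumed: identifier of the flow itself
    destination: TableIdentifierSketch  # flow destination
    func: Callable[[], str]             # stand-in for FlowFunction
    query_context: Mapping[str, str]    # stand-in for QueryContext
    sql_conf: Mapping[str, str]         # SQL configuration entries
    once: bool                          # the once flag
    origin: str                         # stand-in for QueryOrigin


sales = TableIdentifierSketch(table="sales", database="demo")
flow = UnresolvedFlowSketch(
    identifier=sales,
    destination=sales,
    func=lambda: "SELECT * FROM raw_sales",
    query_context={"current_database": "demo"},
    sql_conf={"spark.sql.shuffle.partitions": "8"},
    once=False,
    origin="pipeline.sql:1",
)
print(flow.destination.unquoted(), flow.once)
```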
UnresolvedFlow is created when:
- PipelinesHandler is requested to define a flow
- SqlGraphRegistrationContext is requested to handle SQL queries
once Flag
UnresolvedFlow is given the once flag when created.
The once flag is explicitly disabled (false) in all of the following:
- CreateFlowHandler
- CreateMaterializedViewAsSelectHandler
- CreatePersistedViewCommandHandler
- CreateStreamingTableAsSelectHandler
- CreateTemporaryViewHandler
- PipelinesHandler (when requested to define a flow)
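The effect of the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `FlowSketch` and `create_flow` are hypothetical names, and the creation sites are modelled as plain strings taken from the list. The point is that every creation path passes once as false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowSketch:
    """Toy flow carrying only what matters here (hypothetical type)."""
    identifier: str
    once: bool


# Creation sites named in the list above, modelled as plain strings.
CREATION_SITES = (
    "CreateFlowHandler",
    "CreateMaterializedViewAsSelectHandler",
    "CreatePersistedViewCommandHandler",
    "CreateStreamingTableAsSelectHandler",
    "CreateTemporaryViewHandler",
    "PipelinesHandler",
)


def create_flow(site: str, identifier: str) -> FlowSketch:
    """Create a flow the way each listed site does: with once disabled."""
    if site not in CREATION_SITES:
        raise ValueError(f"Unknown creation site: {site}")
    return FlowSketch(identifier=identifier, once=False)


flows = [create_flow(site, f"flow_{i}") for i, site in enumerate(CREATION_SITES)]
print(any(f.once for f in flows))
```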
No ONCE UnresolvedFlows
It turns out that none of the UnresolvedFlows created are ONCE flows.
According to this commit:
However, the server does not currently implement this behavior yet. To avoid accidentally releasing APIs that don't actually work, we should take these arguments out for now. And add them back in when we actually support this functionality.