SparkConnectGraphElementRegistry¶
SparkConnectGraphElementRegistry
is a GraphElementRegistry.
SparkConnectGraphElementRegistry
acts as a communication bridge between Spark Declarative Pipelines' Python execution environment and Spark Connect Server (with PipelinesHandler).
Creating Instance¶
SparkConnectGraphElementRegistry
takes the following to be created:
-
SparkSession
(SparkConnectClient
) - Dataflow Graph ID
SparkConnectGraphElementRegistry
is created when:
pyspark.pipelines.cli
is requested to run
register_dataset¶
GraphElementRegistry
register_dataset(
self,
dataset: Dataset,
) -> None
register_dataset
is part of the GraphElementRegistry abstraction.
register_dataset
makes sure that the given Dataset
is either MaterializedView
, StreamingTable
or TemporaryView
.
register_dataset
requests this SparkConnectClient to execute a PipelineCommand.DefineDataset
command.
PipelinesHandler
DefineDataset
commands are handled by PipelinesHandler on Spark Connect Server.
register_flow¶
GraphElementRegistry
register_flow(
self,
flow: Flow
) -> None
register_flow
is part of the GraphElementRegistry abstraction.
register_flow
requests this SparkConnectClient to execute a PipelineCommand.DefineFlow
command.
PipelinesHandler
DefineFlow
commands are handled by PipelinesHandler on Spark Connect Server.