Skip to content

SparkConnectGraphElementRegistry

SparkConnectGraphElementRegistry is a GraphElementRegistry.

SparkConnectGraphElementRegistry acts as a communication bridge between Spark Declarative Pipelines' Python execution environment and Spark Connect Server (with PipelinesHandler).

Creating Instance

SparkConnectGraphElementRegistry takes the following to be created:

  • SparkSession (SparkConnectClient)
  • Dataflow Graph ID

SparkConnectGraphElementRegistry is created when:

  • pyspark.pipelines.cli is requested to run

register_dataset

GraphElementRegistry
register_dataset(
    self,
    dataset: Dataset,
) -> None

register_dataset is part of the GraphElementRegistry abstraction.

register_dataset makes sure that the given Dataset is either MaterializedView, StreamingTable or TemporaryView.

register_dataset requests this SparkConnectClient to execute a PipelineCommand.DefineDataset command.

PipelinesHandler

DefineDataset commands are handled by PipelinesHandler on Spark Connect Server.

register_flow

GraphElementRegistry
register_flow(
    self,
    flow: Flow
) -> None

register_flow is part of the GraphElementRegistry abstraction.

register_flow requests this SparkConnectClient to execute a PipelineCommand.DefineFlow command.

PipelinesHandler

DefineFlow commands are handled by PipelinesHandler on Spark Connect Server.