
SparkConnectGraphElementRegistry

SparkConnectGraphElementRegistry is a GraphElementRegistry.

SparkConnectGraphElementRegistry bridges the Python execution environment of Spark Declarative Pipelines and Spark Connect Server: it sends pipeline commands to the server, where PipelinesHandler handles them.

Creating Instance

SparkConnectGraphElementRegistry takes the following to be created:

  • SparkSession (SparkConnectClient)
  • Dataflow Graph ID

SparkConnectGraphElementRegistry is created when:

  • pyspark.pipelines.cli is requested to run
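
The following is a minimal sketch of the two constructor inputs listed above, not the actual pyspark code. FakeSession, FakeConnectClient and ConnectRegistrySketch are hypothetical stand-ins, and the idea that the registry takes the client off the session is an assumption based on the "SparkSession (SparkConnectClient)" wording.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


class GraphElementRegistry(ABC):
    """Stand-in for the abstraction; only the two methods named on this page."""

    @abstractmethod
    def register_flow(self, flow: object) -> None: ...

    @abstractmethod
    def register_output(self, output: object) -> None: ...


@dataclass
class FakeConnectClient:
    """Hypothetical substitute for SparkConnectClient."""
    remote: str


@dataclass
class FakeSession:
    """Hypothetical substitute for a Spark Connect SparkSession."""
    client: FakeConnectClient


class ConnectRegistrySketch(GraphElementRegistry):
    def __init__(self, spark: FakeSession, dataflow_graph_id: str) -> None:
        # Assumption: the registry keeps the session's client and the graph ID.
        self._client = spark.client
        self._dataflow_graph_id = dataflow_graph_id

    def register_flow(self, flow: object) -> None:
        pass  # placeholder; registration is not modelled in this sketch

    def register_output(self, output: object) -> None:
        pass  # placeholder; registration is not modelled in this sketch


session = FakeSession(FakeConnectClient("sc://localhost"))
registry = ConnectRegistrySketch(session, "graph-1")
```

Holding on to the dataflow graph ID at construction time would let every later registration refer to the same graph without the caller passing it again.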

Register Flow

GraphElementRegistry
register_flow(
    self,
    flow: Flow
) -> None

register_flow is part of the GraphElementRegistry abstraction.

register_flow requests this SparkConnectClient to execute a PipelineCommand.DefineFlow command.

PipelinesHandler on Spark Connect Server

DefineFlow commands are handled by PipelinesHandler on Spark Connect Server.
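
The client side of this exchange can be sketched as follows. This is an illustration under stated assumptions, not the pyspark implementation: Flow and DefineFlow here are reduced, hypothetical dataclasses (their field names are guesses), and RecordingClient is a test double whose execute_command method merely records what it is given.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """Hypothetical, reduced flow description."""
    name: str
    target: str


@dataclass(frozen=True)
class DefineFlow:
    """Hypothetical command payload; field names are assumptions."""
    dataflow_graph_id: str
    flow_name: str
    target_dataset_name: str


@dataclass
class RecordingClient:
    """Test double standing in for SparkConnectClient."""
    sent: list = field(default_factory=list)

    def execute_command(self, command: object) -> None:
        # A real client would send the command to Spark Connect Server.
        self.sent.append(command)


class RegistrySketch:
    def __init__(self, client: RecordingClient, dataflow_graph_id: str) -> None:
        self._client = client
        self._dataflow_graph_id = dataflow_graph_id

    def register_flow(self, flow: Flow) -> None:
        # Wrap the flow in a command and hand it to the client.
        command = DefineFlow(self._dataflow_graph_id, flow.name, flow.target)
        self._client.execute_command(command)


client = RecordingClient()
RegistrySketch(client, "graph-1").register_flow(Flow("load_orders", "orders"))
```

The registry itself stays thin in this sketch: it only translates a flow into a command, and everything else is left to the server.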

Register Output

GraphElementRegistry
register_output(
    self,
    output: Output
) -> None

register_output is part of the GraphElementRegistry abstraction.

register_output requests this SparkConnectClient to execute a PipelineCommand.DefineOutput command.

PipelinesHandler on Spark Connect Server

DefineOutput commands are handled by PipelinesHandler on Spark Connect Server.
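
To illustrate the server side, here is a rough sketch of dispatching on the command type. HandlerSketch is a hypothetical stand-in written in Python for illustration only; it does not reproduce the real logic of PipelinesHandler, and the command dataclasses and their fields are assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DefineFlow:
    """Hypothetical command payload."""
    dataflow_graph_id: str
    flow_name: str


@dataclass(frozen=True)
class DefineOutput:
    """Hypothetical command payload."""
    dataflow_graph_id: str
    output_name: str


@dataclass
class HandlerSketch:
    """Hypothetical stand-in for PipelinesHandler; groups elements by graph ID."""
    flows: dict = field(default_factory=dict)
    outputs: dict = field(default_factory=dict)

    def handle(self, command: object) -> None:
        # Route each command to the bookkeeping for its kind.
        if isinstance(command, DefineOutput):
            names = self.outputs.setdefault(command.dataflow_graph_id, [])
            names.append(command.output_name)
        elif isinstance(command, DefineFlow):
            names = self.flows.setdefault(command.dataflow_graph_id, [])
            names.append(command.flow_name)
        else:
            raise ValueError(f"Unsupported pipeline command: {command!r}")


handler = HandlerSketch()
handler.handle(DefineOutput("graph-1", "orders"))
handler.handle(DefineFlow("graph-1", "load_orders"))
```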