Skip to content

DataflowGraphRegistry

DataflowGraphRegistry is a registry of Dataflow Graphs (a mere wrapper around a collection of GraphRegistrationContexts)

Scala object

DataflowGraphRegistry is an object in Scala which means it is a class that has exactly one instance (itself). A Scala object is created lazily when it is referenced for the first time.

Learn more in Tour of Scala.

Demo

import org.apache.spark.sql.connect.pipelines.DataflowGraphRegistry

val graphId = DataflowGraphRegistry.createDataflowGraph(
  defaultCatalog=spark.catalog.currentCatalog(),
  defaultDatabase=spark.catalog.currentDatabase,
  defaultSqlConf=Map.empty)
assert(DataflowGraphRegistry.getAllDataflowGraphs.size == 1)

Dataflow Graphs

dataflowGraphs: ConcurrentHashMap[String, GraphRegistrationContext]

DataflowGraphRegistry manages GraphRegistrationContexts (by graph IDs).

A new GraphRegistrationContext is added when DataflowGraphRegistry is requested to create a new dataflow graph.

A single GraphRegistrationContext can be looked up with getDataflowGraph and getDataflowGraphOrThrow.

All the GraphRegistrationContexts can be returned with getAllDataflowGraphs.

A GraphRegistrationContext is removed when dropDataflowGraph.

dataflowGraphs is cleared up with dropAllDataflowGraphs.

Create Dataflow Graph

createDataflowGraph(
  defaultCatalog: String,
  defaultDatabase: String,
  defaultSqlConf: Map[String, String]): String

createDataflowGraph generates a graph ID (as a pseudo-randomly generated UUID).

createDataflowGraph registers a new GraphRegistrationContext with the graph ID (in this dataflowGraphs registry).

In the end, createDataflowGraph returns the graph ID.


createDataflowGraph is used when:

Find Dataflow Graph (or Throw SparkException)

getDataflowGraphOrThrow(
  dataflowGraphId: String): GraphRegistrationContext

getDataflowGraphOrThrow looks up the GraphRegistrationContext for the given dataflowGraphId or throws an SparkException if it does not exist.

Dataflow graph with id [graphId] could not be found

getDataflowGraphOrThrow is used when:

Find Dataflow Graph

getDataflowGraph(
  graphId: String): Option[GraphRegistrationContext]

getDataflowGraph finds the GraphRegistrationContext for the given graphId (in this dataflowGraphs registry).


getDataflowGraph is used when: