DataflowGraphRegistry

DataflowGraphRegistry is a registry of Dataflow Graphs (a thin wrapper around a collection of GraphRegistrationContexts keyed by graph ID).

Scala object

DataflowGraphRegistry is a Scala object, i.e. a class that has exactly one instance (itself). A Scala object is created lazily, when it is referenced for the first time.

Learn more in Tour of Scala.
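
The lazy initialization described above can be observed with a minimal, self-contained sketch. All names here (LazyObjectDemo, Registry, events) are hypothetical and used only for illustration; they are not part of Spark.

```scala
import scala.collection.mutable.ArrayBuffer

object LazyObjectDemo {
  // Records what happened, in order (illustration only)
  val events: ArrayBuffer[String] = ArrayBuffer.empty[String]

  // Hypothetical singleton; its body runs once, on first reference
  object Registry {
    events += "Registry initialized"
    def size: Int = 0
  }

  def run(): List[String] = {
    events += "before first reference"
    val first = Registry.size  // triggers the initialization of Registry
    events += "after first reference"
    val second = Registry.size // Registry is already initialized, nothing is recorded
    events.toList
  }
}
```

Calling LazyObjectDemo.run() once should record the initialization between the two surrounding markers, and only once.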

Demo

import org.apache.spark.sql.connect.pipelines.DataflowGraphRegistry

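// Assumes a spark-shell session (with spark in scope)
val graphId = DataflowGraphRegistry.createDataflowGraph(
  defaultCatalog = spark.catalog.currentCatalog(),
  defaultDatabase = spark.catalog.currentDatabase,
  defaultSqlConf = Map.empty)
// Holds only when no other graphs have been registered before
assert(DataflowGraphRegistry.getAllDataflowGraphs.size == 1)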

Dataflow Graphs

dataflowGraphs: ConcurrentHashMap[String, GraphRegistrationContext]

DataflowGraphRegistry manages GraphRegistrationContexts (by graph IDs).

A new GraphRegistrationContext is added when DataflowGraphRegistry is requested to createDataflowGraph.

A single GraphRegistrationContext can be looked up with getDataflowGraph and getDataflowGraphOrThrow.

All the GraphRegistrationContexts can be returned with getAllDataflowGraphs.

A GraphRegistrationContext is removed with dropDataflowGraph.

All the GraphRegistrationContexts are removed (dataflowGraphs is cleared) with dropAllDataflowGraphs.
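
The lifecycle above (create, look up, list, drop, drop all) can be sketched as a small registry over a ConcurrentHashMap. This is a simplified model, not the Spark implementation: GraphContext is a hypothetical stand-in for GraphRegistrationContext, SketchRegistry and its method names are made up, and generating graph IDs with a random UUID is an assumption.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

// Hypothetical stand-in for GraphRegistrationContext
final case class GraphContext(
    defaultCatalog: String,
    defaultDatabase: String,
    defaultSqlConf: Map[String, String])

object SketchRegistry {
  // Graph ID -> context, safe for concurrent access
  private val graphs = new ConcurrentHashMap[String, GraphContext]()

  // Registers a new context and returns its ID (UUID is an assumption)
  def create(
      catalog: String,
      database: String,
      conf: Map[String, String]): String = {
    val id = UUID.randomUUID().toString
    graphs.put(id, GraphContext(catalog, database, conf))
    id
  }

  // ConcurrentHashMap.get returns null for a missing key; Option turns it into None
  def get(id: String): Option[GraphContext] = Option(graphs.get(id))

  def getAll: Seq[GraphContext] = graphs.values().asScala.toSeq

  def drop(id: String): Unit = {
    graphs.remove(id)
    ()
  }

  def dropAll(): Unit = graphs.clear()
}
```

A ConcurrentHashMap lets multiple threads register and drop graphs without external locking, which is presumably why the registry uses one.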

createDataflowGraph

createDataflowGraph(
  defaultCatalog: String,
  defaultDatabase: String,
  defaultSqlConf: Map[String, String]): String

createDataflowGraph...FIXME


createDataflowGraph is used when:

Find Dataflow Graph (or Throw SparkException)

getDataflowGraphOrThrow(
  dataflowGraphId: String): GraphRegistrationContext

getDataflowGraphOrThrow looks up the GraphRegistrationContext for the given dataflowGraphId or, if it does not exist, throws a SparkException with the following error message:

Dataflow graph with id [graphId] could not be found
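
The difference between an Option-returning lookup and a throwing one can be sketched as follows. This is an illustration under assumptions: LookupSketch, GraphNotFoundException and the String-valued map are hypothetical (the real method throws a SparkException and returns a GraphRegistrationContext), and building the throwing variant on top of the Option-returning one is an assumption, not something the source confirms.

```scala
import java.util.concurrent.ConcurrentHashMap

object LookupSketch {
  // Hypothetical exception used in place of SparkException to stay self-contained
  final class GraphNotFoundException(message: String)
    extends RuntimeException(message)

  // Graph ID -> graph (a String stands in for GraphRegistrationContext)
  private val graphs = new ConcurrentHashMap[String, String]()

  def register(id: String, graph: String): Unit = {
    graphs.put(id, graph)
    ()
  }

  // Safe lookup: a missing ID yields None
  def find(id: String): Option[String] = Option(graphs.get(id))

  // Strict lookup: a missing ID is reported as an error
  def findOrThrow(id: String): String =
    find(id).getOrElse {
      throw new GraphNotFoundException(
        s"Dataflow graph with id $id could not be found")
    }
}
```

Callers that can handle a missing graph would use the Option-returning variant, while callers for which a missing graph is an error would use the throwing one.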

getDataflowGraphOrThrow is used when:

Find Dataflow Graph

getDataflowGraph(
  graphId: String): Option[GraphRegistrationContext]

getDataflowGraph finds the GraphRegistrationContext for the given graphId (in this dataflowGraphs registry).


getDataflowGraph is used when: